eDiscovery has come a long way over the years. Before, we had the labor-intensive process of scanning documents and reviewing boxes, Redwelds, folders, and envelopes full of paper documents. Today, eDiscovery thrives on technological efficiencies. The goal is to reduce review time and cost while ensuring that relevant documents are not missed. Traditional approaches such as search term reporting, email threading, and advanced analytics such as continuous active learning (CAL) and technology-assisted review (TAR) continue to help us reduce the time and cost of document review. However, there is no one-size-fits-all solution for every case. Often, these techniques can be used in tandem with additional, non-conventional culling strategies to reduce review time even further. Below are some of the techniques that we execute at iDS based on the case background, data volume, review strategy, and project timelines.

Embedded objects – Do you or your team find yourselves reviewing embedded objects with no significant content? If so, consider removing them from the review universe. Items like these can be mass-tagged for exclusion from review while still being included in the ESI production. In one of iDS’ recent matters, we noticed that the embedded objects within PowerPoints were inflating the document population by over 40% after the application of search terms. We developed a strategy to quickly identify the file types that had reviewable content and to exclude the ones that did not. We then automated this workflow for all future data loads, which reduced review time by over 50%.

Domain Parsing: With email and web data, domain parsing is an extremely undervalued feature. Using it during Early Case Assessment (ECA) can efficiently cull your review universe by removing non-responsive domains before data sets are finalized. For instance, domain parsing can help identify junk, advertisements, monthly subscriptions, or newsletters that can be removed from the review universe. When we combine domain parsing with search term application, we can reduce the document review population by 5–10%. Additionally, domain parsing combined with date filtering can help to identify questionable communications in intellectual property theft litigation.
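
As a rough illustration of the idea, the sketch below tallies sender domains from a list of From headers so that high-volume newsletter or advertising domains stand out. The sample headers and the function names `sender_domain` and `domain_tally` are hypothetical; a real review platform would expose this through its own reporting features.

```python
from collections import Counter
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Return the lower-cased domain of a From header, or '' if none is found."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def domain_tally(from_headers):
    """Count documents per sender domain, most frequent first."""
    counts = Counter(sender_domain(header) for header in from_headers)
    return counts.most_common()


# Hypothetical sample data, for illustration only.
sample = [
    "Jane Doe <jane.doe@example.com>",
    "Deals <news@newsletter.example.net>",
    "news@newsletter.example.net",
    "John Roe <JOHN@Example.com>",
    "Deals <offers@newsletter.example.net>",
]

for domain, count in domain_tally(sample):
    print(f"{domain}\t{count}")
```

Domains at the top of such a tally are candidates for a quick spot check before being excluded in bulk.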

High Attachment Count – You may consider this a no-brainer, but it often flies under the radar. Do you have documents in your document universe with extremely high attachment counts? Based on the issues in your case, you may be able to limit your review to the parent documents. If the parent is not relevant, it may be a waste of time to review the attachments. This is an easy workflow adjustment that can quickly reduce your review pool.
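
One way to surface these families is to count attachments per parent. The sketch below assumes each record is a hypothetical `(doc_id, parent_id)` pair, with `parent_id` set to `None` for parent documents; the field layout and the threshold are illustrative, not a platform export format.

```python
from collections import Counter


def high_attachment_parents(records, threshold):
    """Return {parent_id: attachment_count} for parents at or above threshold.

    records: iterable of (doc_id, parent_id); parent_id is None for parents.
    """
    counts = Counter(parent for _, parent in records if parent is not None)
    return {parent: n for parent, n in counts.items() if n >= threshold}


# Hypothetical sample data, for illustration only.
records = [
    ("E1", None), ("A1", "E1"), ("A2", "E1"), ("A3", "E1"),
    ("E2", None), ("A4", "E2"),
]

print(high_attachment_parents(records, threshold=3))
```

The parents returned here could be routed to review first, with their attachments held back until the parent is coded relevant.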

File Type Reports – File type reporting can help to quickly identify non-relevant file types. These file types can be mass-tagged and excluded from review, reducing the document population. For instance, source code and compiled file types (such as Java, class, or bin files) might be completely irrelevant in your corruption investigation and can potentially be excluded from review.
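
A minimal sketch of a file type report follows, assuming all we have is a list of file names. It tallies by extension and then filters out a caller-supplied set of extensions. The function names and sample file names are hypothetical, and extension-based typing is a simplification; processing tools generally identify file types from file signatures rather than names.

```python
from collections import Counter
from pathlib import PurePath


def file_type_report(filenames):
    """Count files per lower-cased extension, most frequent first."""
    counts = Counter(
        PurePath(name).suffix.lower() or "(none)" for name in filenames
    )
    return counts.most_common()


def exclude_by_extension(filenames, excluded):
    """Return the file names whose extension is not in the excluded set."""
    excluded = {ext.lower() for ext in excluded}
    return [
        name for name in filenames
        if PurePath(name).suffix.lower() not in excluded
    ]


# Hypothetical sample data, for illustration only.
files = [
    "Main.java", "Main.class", "firmware.bin",
    "contract.docx", "budget.xlsx", "Util.java", "README",
]

print(file_type_report(files))
print(exclude_by_extension(files, {".java", ".class", ".bin"}))
```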

Normalizing Email Subjects/Filenames and Tally Reports: I highly recommend this method for investigations and beyond. Normalizing typically involves stripping out reply and forward prefixes (such as RE: and FWD:) from email subjects so that messages in the same conversation share one subject line. A tally of the normalized subjects then shows the top subject lines and how often each occurs. This can also help to identify what is clearly ‘junk’. There may be newsletters or highly repetitive conversations you can exclude from review. Running a tally report on filenames can also help with culling non-case-related documents.
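
The sketch below shows one possible way to normalize subjects and tally them. The prefix list (RE, FW, FWD) and the choice to lower-case and collapse whitespace are assumptions for illustration; real data often needs additional prefixes, including non-English ones. The function names are hypothetical.

```python
import re
from collections import Counter

# Leading reply/forward markers such as "RE:", "Fw:", "FWD:", possibly repeated.
_PREFIX = re.compile(r"^\s*(?:(?:re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading reply/forward prefixes, collapse whitespace, lower-case."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def subject_tally(subjects):
    """Count messages per normalized subject, most frequent first."""
    return Counter(normalize_subject(s) for s in subjects).most_common()


# Hypothetical sample data, for illustration only.
subjects = [
    "RE: Q3 Budget",
    "FWD: RE: Q3 budget",
    "Weekly Newsletter",
    "Fw: Weekly  Newsletter",
    "re: q3 budget",
]

for subject, count in subject_tally(subjects):
    print(f"{count}\t{subject}")
```

The same tally function can be pointed at filenames instead of subjects by skipping the prefix-stripping step.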

Data Visualization: Visualizing our data can help to identify patterns and deficiencies. For instance, timeline charts could reveal potential gaps in your collection. Other visualization tools, such as concept clusters, can help test the relevancy of search terms before reviewing tens of thousands of documents.

Sampling: This is another approach that I highly recommend, especially when you are optimizing your search terms. Consider sampling techniques such as random or statistical sampling when your search terms are yielding extremely high document hits. For example, you can estimate the number of relevant documents in the total population by extrapolating the relevancy rate observed in your sample set. If you end up finding more non-relevant documents than relevant ones in a properly drawn sample set, that is a strong signal that your search terms need to be optimized. With this simple approach, you can fine-tune your search terms to improve overall precision and recall.
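
To make the extrapolation concrete, here is a minimal sketch that draws a simple random sample and projects the sample's relevancy rate onto the full population. It assumes a normal-approximation confidence interval at roughly 95% (z = 1.96) and ignores the finite population correction; the function names and the numbers in the example are hypothetical.

```python
import math
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a simple random sample of document IDs without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(list(doc_ids), sample_size)


def estimate_relevant(population_size, sample_size, relevant_in_sample, z=1.96):
    """Extrapolate the sample relevancy rate to the whole population.

    Uses a normal-approximation interval, which is an assumption that works
    best when the sample is reasonably large and the rate is not near 0 or 1.
    """
    rate = relevant_in_sample / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    low = max(0.0, rate - margin)
    high = min(1.0, rate + margin)
    return {
        "rate": rate,
        "estimated_relevant": round(
            relevant_in_sample * population_size / sample_size
        ),
        "low": math.floor(low * population_size),
        "high": math.ceil(high * population_size),
    }


# Hypothetical numbers: 50,000 search hits, 400 sampled, 60 coded relevant.
sample_ids = draw_sample(range(50_000), 400)
print(estimate_relevant(50_000, len(sample_ids), 60))
```

The width of the interval is a reminder that a sample supports an estimate with a margin of error, so borderline results may justify a larger sample before search terms are changed.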

Although various factors will drive your project, taking a consultative approach to understanding them can help us identify the most efficient workflow for your case. We may use some of the non-conventional techniques discussed above to help get answers faster. When properly deployed, these techniques can increase review efficiency, reduce review time, and refine review strategy.

