Use Data to Shape Other Discovery
This is the tenth installment in a blog series on Fact Crashing™, the acceleration of the consideration of ACTION data (Ambient, Contextual, Transactional, IoT, Operational, Navigational) to the benefit of resolving disputes.
There are 9 Principles of Fact Crashing™. Earlier blogs covered:
Now, let’s take a look at the eighth principle.
One of the interesting things about structured data is that the cost of dealing with it is very different from the cost of dealing with unstructured data.
The Cost Curve of Unstructured Data
With unstructured data (emails and documents) we are accustomed to a linear cost curve. Every document has a similar cost – the cost of human review. Technological efficiencies can change the cost slope (less cost per document) or the cost duration (ending the review sooner), but otherwise the cost of review (and privilege review) is linear.
Example of the Cost Curve for Unstructured Data
To make document review even worse, it is not unusual in litigation to not know, initially, which custodians, which events, and which dates are of most interest. This is the “fog of litigation.” To navigate this unknown, parties traditionally do what they do best – they review documents and records. They navigate through the institutional knowledge of the company as recorded in emails, and as revealed through interviews. While traditional, this approach is expensive, not only because of the linear cost of document review, but also because of the broad initial scope with which many investigations and litigations start. The linear cost, combined with the broad scope, creates a double impact.
The Cost Curve of Structured Data
However, Structured Data has a very different cost profile. Structured Data costs more than Unstructured Data on the front-end to ensure that you understand what the data represents. At the same time, after that initialization, Structured Data costs much less than Unstructured Data on a marginal (record-by-record) basis.
This is, in part, because Structured Data records are typically highly consistent from record to record. Not in content, which can vary from record to record, but in their format, the types of values they contain, and the range of values they can contain (domain) or the full spectrum of values they do contain (range).
Once the manner in which a particular field is populated and used is understood, those characteristics can be easily anticipated and rapidly extrapolated across additional records. Hence the marginal cost can be next to zero.
However, initially learning the manner of population, the application, the data type, the format, the domain, and the range can be expensive. Hence the upfront cost can be high.
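As a rough illustration of that upfront learning, the sketch below profiles a single field: how often it is populated, which data types appear, and the observed range of values. The function name `profile_field`, the payroll-style rows, and the field names are all hypothetical, and a real profiling exercise would also look at formats and at the application that writes the data.

```python
from collections import Counter
from datetime import date

def profile_field(records, field):
    """Summarize how one field is populated across a list of dict records."""
    values = [r.get(field) for r in records]
    # Treat None and empty strings as "not populated".
    present = [v for v in values if v is not None and v != ""]
    types = Counter(type(v).__name__ for v in present)
    profile = {
        "field": field,
        "records": len(values),
        "populated": len(present),
        "types": dict(types),
        "distinct": len(set(present)),  # assumes hashable values
    }
    # An observed range only makes sense when all values share one type.
    if present and len(types) == 1:
        profile["min"] = min(present)
        profile["max"] = max(present)
    return profile

# Hypothetical payroll rows, invented for illustration only.
rows = [
    {"employee_id": "E100", "week_ending": date(2021, 1, 8), "hours": 42.5},
    {"employee_id": "E100", "week_ending": date(2021, 1, 15), "hours": 40.0},
    {"employee_id": "E200", "week_ending": date(2021, 1, 8), "hours": None},
]
hours_profile = profile_field(rows, "hours")
print(hours_profile)
```

Once a profile like this has been confirmed against a sample, the same function can be run over every additional record at essentially no extra human cost.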
For this reason, the per-record cost profile slopes rapidly downward, with low to zero cost for each additional record. Dealing with 10,000 records is not much less expensive than dealing with 1,000,000 – at least from a data management and evaluation standpoint.
Example of the Cost Curve for Structured Data
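The two cost profiles can be compared with a very simple model: a linear cost for Unstructured Data, and a fixed setup cost plus a tiny marginal cost for Structured Data. Every dollar figure below is an invented placeholder chosen only to show the shape of the curves, not an actual review or data analysis rate.

```python
def unstructured_cost(n_docs, cost_per_doc=1.00):
    """Linear model: every document carries a similar review cost (assumed $1.00)."""
    return n_docs * cost_per_doc

def structured_cost(n_records, setup_cost=50_000.0, cost_per_record=0.0001):
    """Front-loaded model: assumed setup cost, then a near-zero marginal cost."""
    return setup_cost + n_records * cost_per_record

for n in (10_000, 1_000_000):
    u = unstructured_cost(n)
    s = structured_cost(n)
    print(f"{n:>9,} items: unstructured ${u:,.0f}, "
          f"structured ${s:,.0f} (${s / n:.4f} per record)")
```

Under these assumed numbers, the structured total barely moves between 10,000 and 1,000,000 records, while the unstructured total grows a hundredfold.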
As a result of the different cost profile, it is economically attractive to deal with Structured Data first, and then to use Structured Data to refine and clarify the scope and relevancy of Unstructured Data.
For example, payroll data and the coding of hours can sometimes better clarify a class period for an unpaid overtime case. Similarly, putative class members for a products liability matter might be better identified through actual sales records than through internal email communications.
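To make the payroll example concrete, here is a minimal sketch of how coded hours might suggest a candidate class period. The field names, the sample rows, and the 40-hour weekly threshold are assumptions for illustration; the actual overtime rule and data layout depend on the jurisdiction and the payroll system involved.

```python
from datetime import date

OVERTIME_THRESHOLD = 40.0  # assumed weekly threshold, not a legal conclusion

def candidate_class_period(payroll_rows):
    """Return (first, last) week-ending dates where overtime was worked
    but not fully paid, or None if no such week exists."""
    flagged = [
        row["week_ending"]
        for row in payroll_rows
        if row["hours_worked"] - OVERTIME_THRESHOLD > row["overtime_hours_paid"]
    ]
    if not flagged:
        return None
    return min(flagged), max(flagged)

# Hypothetical payroll rows, invented for illustration only.
payroll = [
    {"employee_id": "E100", "week_ending": date(2020, 3, 6),
     "hours_worked": 45.0, "overtime_hours_paid": 5.0},
    {"employee_id": "E100", "week_ending": date(2020, 6, 12),
     "hours_worked": 48.0, "overtime_hours_paid": 0.0},
    {"employee_id": "E200", "week_ending": date(2020, 9, 18),
     "hours_worked": 44.0, "overtime_hours_paid": 2.0},
    {"employee_id": "E200", "week_ending": date(2020, 12, 25),
     "hours_worked": 38.0, "overtime_hours_paid": 0.0},
]
period = candidate_class_period(payroll)
print(period)
```

The same pass over the data also yields the employees behind the flagged weeks, which is one way the putative class members could be narrowed.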
Further – in some cases, Structured Data can not only illuminate the case boundaries better than Unstructured Data, it can also better define which Unstructured Data should even be considered or reviewed.
Consider the following – imagine a market-timing case where the case team reviews hundreds of thousands of emails. They then use those emails to determine which stock trades they need to dig into. The cost of reviewing those hundreds of thousands of emails can be millions of dollars. The alternative Fact Crashing approach is to examine the stock trades, even hundreds of millions of trades, to determine which transactions are of interest, and then review the emails most closely related to those trades. By putting Structured Data ahead of Unstructured Data, the cost of finding and investigating trades of interest, and even of resolving the dispute, can be dramatically reduced, and the work can move much faster.
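The trades-first workflow can be sketched in a few lines. The size threshold used to flag trades is only a stand-in for whatever screen a real market-timing analysis would apply, and the field names, the 24-hour window, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

def trades_of_interest(trades, min_quantity):
    """Hypothetical screen: keep trades at or above a size threshold."""
    return [t for t in trades if t["quantity"] >= min_quantity]

def emails_near_trades(emails, trades, window=timedelta(hours=24)):
    """Keep emails sent by a trader within `window` of one of their flagged trades."""
    selected = []
    for email in emails:
        for trade in trades:
            same_person = email["sender"] == trade["trader"]
            close_in_time = abs(email["sent_at"] - trade["executed_at"]) <= window
            if same_person and close_in_time:
                selected.append(email)
                break  # one matching trade is enough; avoids duplicates
    return selected

# Hypothetical data, invented for illustration only.
trades = [
    {"id": "T1", "trader": "alice", "quantity": 500_000,
     "executed_at": datetime(2019, 3, 4, 15, 55)},
    {"id": "T2", "trader": "bob", "quantity": 100,
     "executed_at": datetime(2019, 3, 5, 10, 0)},
]
emails = [
    {"id": "M1", "sender": "alice", "sent_at": datetime(2019, 3, 4, 16, 10)},
    {"id": "M2", "sender": "alice", "sent_at": datetime(2019, 4, 1, 9, 0)},
    {"id": "M3", "sender": "bob", "sent_at": datetime(2019, 3, 5, 10, 5)},
]

flagged = trades_of_interest(trades, min_quantity=100_000)
review_set = emails_near_trades(emails, flagged)
print([e["id"] for e in review_set])
```

The point of the design is the ordering: the inexpensive pass over trades runs first, and only the emails tied to flagged trades go on to human review.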
Because the cost profile of Structured Data is so different, the per-record cost is negligible, but the per-record value is not. As such, high-volume record sets become a benefit instead of a detriment. They are a benefit because each individual record has potential value, and because large collections of records can reveal trends, chronologies, statistical variations, and other metrics in ways that Unstructured Data cannot.
“The Smoking Gun is now the Smoking Trend”
As a result, Structured Data can be used to define, or shape, many of the dimensions of a given litigation, including:
- The class period
- The putative class members
- The potential exposure
- The particular transactions at issue
- Potential employees (or custodians) of interest
- Which product models might be defective, or at issue
- The key dates of communications of interest
- Etc. Etc. Etc.
In return, these dimensions, when used as filters, can lead to great efficiencies in the collection, processing, categorization, and review of Unstructured data from memos, emails, texts, instant messaging, documents, and social media.
In summary, Structured Data has a different cost curve (and benefit curve) than Unstructured Data. As a result, it is often more economical to try to resolve the entire dispute with Structured Data. Even when you cannot, you may be able to resolve part of the dispute, and you may be able to limit (shape) the size, complexity, and costs of your document discovery.
iDS provides consultative data solutions to corporations and law firms around the world, giving them a decisive advantage – both in and out of the courtroom. Our subject matter experts and data strategists specialize in finding solutions to complex data problems – ensuring data can be leveraged as an asset and not a liability.