By Dan Rupprecht and Tim LaTulippe on 7/10/20 12:12 PM
Reading Maria Lancri and Mary Ann Carpenter’s very informative article, “Internal Investigations in France – Legal controls to be performed before using a Discovery Platform”, had me thinking about my many experiences in the field of eDiscovery as I travelled around the world in search of data in the rough.
Over my 15 years’ experience, first in the US, then in Belgium and now in the UK, it has never ceased to amaze me how the tools we use and the techniques we apply differ in so many ways from one jurisdiction to the next. Technology is the one constant, give or take, but how it is deployed really showcases why discovery advisors like the ones found at iDS need to understand the pressures imposed by the various legal systems. In doing so, we become better equipped to develop workflows that address these issues in a way that enables investigators to get what they are after while at the same time managing the nuances of subject rights and court/regulatory obligations.
Maria and Mary Ann very aptly provided the legal considerations, which form the foundation of what we as service providers can and cannot do when we begin the arduous task of finding needles in haystacks. Something we always tell our clients is that while we do not provide legal advice, it’s important that we understand what is and is not allowed. This begins with collections. Unlike the scorched-earth approach found in the US, where virtually all potentially relevant information is up for grabs, most Europeans would agree that such a broad-based approach runs afoul of all European-based jurisdictional requirements.
Preservation and Collection of Data
During an investigation or litigation, the need to triage or otherwise review documents or data records will arise; disclosure is another story. The act of identifying, collecting and thereby preserving documents sounds straightforward and is usually painless, apart from technical hurdles related to bad luck.
The purpose of a sound ‘forensic collection’ of data is to protect the integrity and evidentiary veracity of documents or other data points. Data forensics is as much art as it is science: equal parts integrity and repeatable process. Data can be preserved without first being collected if proper safeguards are put in place, for instance through a legal hold module for e-mail. Once data are safely preserved and a snapshot in time has been established, the triage and analysis can begin.
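One widely used way to make the integrity of a collection repeatable and checkable is to record a cryptographic hash of each item at collection time and recompute it later. The sketch below is illustrative only: it assumes SHA-256 as the hash, and the function names `sha256_of_file` and `verify_unchanged` are hypothetical helpers, not part of any particular forensic tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_unchanged(path, recorded_hash):
    """Return True if the file still matches the hash recorded at collection time."""
    return sha256_of_file(path) == recorded_hash
```

If the hash recorded at collection still matches at review or production, the content has not changed in between; any mismatch signals that the item needs to be looked at again.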
Need for a targeted, sensible approach to Data Collection
Many of the tools and techniques used to acquire data are built on 1) integrity and veracity, and 2) speed. To achieve speed in data collection, many of the tools (software or machine based) collect the entirety of a data stream. For instance, an entire computer hard drive can be captured very quickly if all of its ones and zeroes are collected at once, rather than large amounts being selectively extracted.
This approach of quickly collecting ever-increasing data volumes has dominated the conversation; however, the need to be more targeted in collection efforts is not only economically beneficial in some cases but is also far more Eurocentric as regards data protection and the sensibilities of the general population. As the technology and methods have advanced, it has become simpler to target specific documents, folders and other records whilst preserving the important underlying metadata (the authorship, times created and other information about the data). Using a single data subject as an example, a forensic consultant could extract only the most relevant data from their laptop, capturing for the remainder merely the metadata of those documents, sans content. This method is valuable because an investigation can require an iterative approach, at which point the case team would at least have knowledge of the documents not initially captured (i.e. the names, dates, authors, where they were located on the laptop), which would help a secondary collection effort be similarly targeted.
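The idea of collecting content only for the relevant items while keeping a metadata record of everything else can be sketched as follows. This is a simplified illustration, not a forensic tool: the function name `targeted_collect` and the `is_relevant` callback are hypothetical, only file-system metadata (path, size, modified time) is recorded, and document-level metadata such as authorship would need format-specific parsing that is not shown here.

```python
import os
import shutil
from datetime import datetime, timezone
from pathlib import Path


def targeted_collect(source_root, dest_root, is_relevant):
    """Copy only relevant files, but list every file's metadata in a manifest."""
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    manifest = []
    for folder, _subdirs, filenames in os.walk(source_root):
        for name in sorted(filenames):
            src = Path(folder) / name
            rel = src.relative_to(source_root)
            info = src.stat()
            entry = {
                "path": rel.as_posix(),
                "size_bytes": info.st_size,
                "modified_utc": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
                "collected": False,
            }
            if is_relevant(rel):
                target = dest_root / rel
                target.parent.mkdir(parents=True, exist_ok=True)
                # copy2 also attempts to preserve timestamps on the copy
                shutil.copy2(src, target)
                entry["collected"] = True
            manifest.append(entry)
    return manifest
```

A call such as `targeted_collect(laptop_folder, evidence_folder, lambda rel: rel.suffix == ".docx")` would copy only Word documents, while the returned manifest still lists every other file, giving the case team a starting point for any secondary, similarly targeted collection.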
Occasionally the collection effort can feel a bit like a ‘black box’ exercise, leaving data subjects understandably uncomfortable. The presence of unfamiliar people with lots of kit, wires and computers ‘invading your space’ is naturally disconcerting. Whether remote or in person, the selective collection of potentially relevant documents can be handled very collaboratively, often directly between the data subject(s) and the forensic consultants (with counsel present as desired).
Quick wins through analytics
I often say that where Tim builds the universe of documentation to consider, it is my job to tear it apart. We do not have enough hours in the day, days in the week or months in the year to review every bit of information available. I have also never been presented with a scenario where we have conducted the necessary collections, taken a step back to contemplate the task at hand, and said “that is not as much as I thought it was going to be”. Quite the opposite in fact.
Technology can get us some quick wins. Deduplication and email threading each regularly deliver around a 25% reduction in volume. Coupled with a well-thought-out custodian list and a date range that captures the relevant time period, you are already seeing significant culling of data. That said, much remains, and what is left will most certainly contain information that might be considered personal, privileged or confidential as outlined by Maria and Mary Ann.
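The culling steps described above (custodian list, date range, deduplication) can be sketched in a few lines. This is a minimal illustration under stated assumptions: documents are plain dictionaries with hypothetical `custodian`, `sent` and `body` fields, and duplicates are detected by hashing the body text alone, whereas review platforms typically hash a normalised set of fields. Email threading is not shown.

```python
import hashlib
from datetime import date


def cull(documents, custodians, start, end):
    """Keep documents from listed custodians, inside the date range, without duplicates."""
    seen = set()
    kept = []
    for doc in documents:
        if doc["custodian"] not in custodians:
            continue  # outside the agreed custodian list
        if not (start <= doc["sent"] <= end):
            continue  # outside the relevant time period
        fingerprint = hashlib.sha256(doc["body"].encode("utf-8")).hexdigest()
        if fingerprint in seen:
            continue  # exact duplicate of a document already kept
        seen.add(fingerprint)
        kept.append(doc)
    return kept
```

As a rough piece of arithmetic, if deduplication and threading each removed about 25% and the second were applied to what the first left behind, roughly 0.75 × 0.75 ≈ 56% of the original volume would remain. That assumes the two reductions are independent, which will not hold exactly in practice.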
Search term lists are not dead yet
To accommodate these restrictions, we can run a number of checks to ensure data that needs to be withheld is accounted for. Although I have often been heard saying search terms are no longer effective as a valid single strategy, if there is a list of individuals, whether attorneys or personal contacts, whose communications should be set aside for further evaluation, that list should be run at the earliest stage. It should also be updated regularly to accommodate additions as new names come to light.
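Running a list of names against the document set can be as simple as a case-insensitive, whole-word search. The following is a minimal sketch: the function names are hypothetical, documents are assumed to be available as plain text keyed by an identifier, and the name list is rebuilt into a pattern on every run so that newly added names are picked up.

```python
import re


def build_name_pattern(names):
    """Compile one case-insensitive, whole-word pattern covering every name."""
    # Longest names first so a full name wins over a shorter overlapping one
    escaped = sorted((re.escape(n) for n in names), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_for_review(documents, names):
    """Return {doc_id: [matched names]} for documents mentioning a listed name."""
    if not names:
        return {}
    pattern = build_name_pattern(names)
    flagged = {}
    for doc_id, text in documents.items():
        hits = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if hits:
            flagged[doc_id] = hits
    return flagged
```

Documents returned by `flag_for_review` would be set aside for further evaluation rather than excluded outright; the match only indicates that a listed name appears somewhere in the text.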
Domain identification would be a natural next step. Knowing personal accounts or law firm names quickly singles out potentially sensitive information, which can again be withheld from production to authorities or opposing counsel. To the extent a corporation is dealing with a Subject Access Request driven by GDPR requirements, this same domain identification technology can be used to locate and redact personally identifiable information.
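Domain identification can be illustrated by pulling the domains out of a message's address headers and comparing them with known lists. This is a sketch under assumptions: the two domain sets are placeholders (the law firm domain is invented for illustration), the labels are hypothetical, and a real workflow would maintain these lists per matter.

```python
from email.utils import getaddresses

# Illustrative lists only; real lists would be built and maintained per matter
PERSONAL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}
LAW_FIRM_DOMAINS = {"examplelaw.example"}  # hypothetical law firm domain


def classify_message(address_headers):
    """Label a message from the domains found in its From/To/Cc header values."""
    domains = {
        addr.rsplit("@", 1)[1].lower()
        for _name, addr in getaddresses(address_headers)
        if "@" in addr
    }
    labels = set()
    if domains & PERSONAL_DOMAINS:
        labels.add("potentially personal")
    if domains & LAW_FIRM_DOMAINS:
        labels.add("potentially privileged")
    return labels
```

For example, `classify_message(["Jane Doe <jane@gmail.com>", "counsel@examplelaw.example"])` would return both labels, and the message could then be routed for a closer look before any production.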
As we move through the evolution of discovery technology and think about new and strategic ways of applying it, we can begin to recognise efficiencies in both time and cost. Large volumes of data, once processed, can benefit from advanced conceptual analytics. Thematic similarities start to materialise and very quickly, with little human interaction, identify data to prioritise or to set aside for other purposes, such as documents of a personal nature, privileged and confidential material, or proprietary business secrets.
In the same vein, Continuous Active Learning (CAL) takes machine learning technology and applies it to the investigative space. Our ability to train computers to identify thematically related documents gets us much closer to AI than ever before. Equipped with exemplar documents or a seed set of data that relates to the types of documents sought, relevant information is quickly prioritised and moved to the front of the queue. Ensuring lawyers are looking at the right documents as early as possible is always the end goal and something that is music to the ears of all corporates and clients who are ultimately responsible for paying the bills. CAL provides considerable efficiencies in this respect.
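The CAL loop itself (train on what has been reviewed, rank what has not, review the top of the queue, retrain) can be sketched with a deliberately simple stand-in model. Everything here is an assumption made for illustration: the term-counting scorer replaces the far more capable classifiers in commercial platforms, the `reviewer` callback stands in for a lawyer's coding decision, and the loop runs until everything is reviewed, whereas a real review would stop once few relevant documents are still surfacing.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labelled):
    """Term weights: +1 for each relevant document containing a term, -1 otherwise."""
    weights = Counter()
    for text, is_relevant in labelled:
        for token in set(tokenize(text)):
            weights[token] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    """Sum the weights of the distinct terms in a document (unknown terms count 0)."""
    return sum(weights[t] for t in set(tokenize(text)))


def continuous_active_learning(documents, seed_labels, reviewer, batch_size=2):
    """Repeatedly retrain, rank unreviewed documents, and review the top batch."""
    labels = dict(seed_labels)  # doc_id -> True/False coding decisions so far
    review_order = []
    while len(labels) < len(documents):
        weights = train([(documents[d], lab) for d, lab in labels.items()])
        unreviewed = [d for d in documents if d not in labels]
        unreviewed.sort(key=lambda d: score(weights, documents[d]), reverse=True)
        for doc_id in unreviewed[:batch_size]:
            labels[doc_id] = reviewer(doc_id)  # human decision feeds the next round
            review_order.append(doc_id)
    return review_order, labels
```

The point of the design is the feedback loop: each batch of coding decisions immediately reshapes the ranking of what remains, which is how likely-relevant documents move to the front of the queue.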
My favourite response to the age-old question of “when will the computers take the place of lawyers” has always been “computers will not take the place of lawyers, but lawyers who understand computers will take the place of lawyers who do not”. When I think about the myriad issues faced by legal teams as they begin to understand all that needs to be considered in and around digital investigations, it is clear how crucial it is to set in motion multidisciplinary teams that are completely aligned in terms of end goals.
From the attorney charged with deciphering the law and determining what can and cannot be done from a legal perspective, to the investigative technology specialist collecting and analysing vast amounts of electronically stored information, through to the review team charged with a document-by-document evaluation, clear coordination is essential across the board. In the end, the legal considerations outlined by Maria and Mary Ann become moot unless their advice can be technologically adhered to by the discovery support team instructed to create workflows that accommodate the many regulatory constraints imposed across the various jurisdictions. To this end, a consultative, expert-led approach is key to making sure all are operating on the same page.
Dan Rupprecht and Tim LaTulippe are Directors at iDiscovery Solutions (iDS), an award-winning, global, and expert services firm that delivers customized, innovative solutions for legal and corporate clients’ complex challenges. iDS’ subject matter experts testify and consult in connection with electronic discovery (eDiscovery), digital forensics, data analytics, and cybersecurity/information governance.