Data Discourse

The Most Important Aspects of E-Discovery

September 03, 2024 Peter L. Mansmann, Esq. Season 1 Episode 4

In Episode 4 of Data Discourse, Pete Mansmann and Bill Saccani focus on the essential process of e-discovery and its role in managing electronically stored information in legal cases. Throughout this episode, you’ll get a comprehensive look at the strategies and best practices needed for effective e-discovery, including the careful data management and strategic approaches that support successful litigation outcomes.

Pete and Bill cover the risks of both under-collection and over-collection, stressing that inadequate data identification can lead to excessive, irrelevant information that complicates the review process. This can result in increased costs and delays, as failing to gather all necessary data upfront often means scrambling to meet deadlines later, thus compromising the thoroughness of the review.

Another significant topic covered is the value of iterative search strategies and why we advise against overly broad search terms, which can generate an unmanageable volume of irrelevant documents. Instead, Bill recommends iteratively refining search terms to narrow the dataset, so that only pertinent information is reviewed.

Key Topics Covered

  • Proper Data Collection: The crucial first step in e-discovery involving the careful identification and collection of relevant data sources.
  • Risks of Data Collection: The dangers of under-collection and over-collection, including increased costs and delays.
  • Iterative Search Strategies: The need for refining search terms through iterative approaches to manage data relevance effectively.
  • Hit Reports: Using hit reports to analyze search results and improve the accuracy of search terms.
  • Documentation: The importance of documenting search strategies and decisions to support the e-discovery process and defend against challenges.
  • Impact on Litigation: How strategic e-discovery practices can influence the efficiency and effectiveness of legal proceedings.


Precise is your trusted resource for all things mobile forensics and e-discovery. We look forward to partnering with your firm and helping you win your next case!

Visit our website to learn more and set up a free consultation:
Click here to get started

Or call us at 866-721-5378


All right. Welcome, everybody, to our latest episode of Data Discourse: Practical Advice and Insights about Digital Forensics and E-Discovery. Joining me for today's episode and the next one is Bill Saccani. Bill is an attorney who I actually went to school with at multiple stages of my life, both in law school and high school, and I believe even middle school, right, sir? Bill and I have known each other for a long time. He is currently president of Precise Discovery, and he handles day-to-day interactions with clients and data on a regular basis. When it comes to e-discovery needs, he regularly interfaces with our forensics team. If you've watched prior episodes, you've already met Jeff Stigler.


The two of them work hand in hand in a lot of cases. Jeff works in data collections and different things like that. And once the collection is complete, Bill's job really comes into play, where he starts to manage this data and take it from a large volume of data, you know, typically down to a production set, with a lot of stages in between. So what we want to talk about in today's episode is really the nuts and bolts of e-discovery. Without getting too in the weeds and detailed, what are some things that you need to be thinking about if you have a case that involves e-discovery? One of the things I will start off with is that pretty much every single case has some type of e-discovery related to it.


The "e" in e-discovery just means that it's some type of electronic form of data that is being collected at the front end. So if you think about any situation, you'd be hard pressed to think of a case where there isn't some source of data or information that exists in electronic form. Emails, documents, cell phones, social media, all the types of things that we talked about in previous episodes could eventually make their way into the e-discovery process and arena, where Bill handles it and ultimately gets it to a point where we can produce documents. After that long introduction, welcome, Bill. Hi, Pete. How are you? Wonderful.


So let's jump right into it. Let's talk about the importance of proper collection and identifying data sources early on in a case. How important is that? Where do you see it rear its ugly head when it's not given enough attention early on in a project? Well, for us, you know, and for the clients, it's an overabundance of data. If you don't take the time to consider where you're collecting from, or which custodians you're collecting from, we end up with a lot more data and a lot more cost upfront to process this data and then finally winnow it down. So, you know, getting that identification done upfront, before the start of collection, is very important to the overall cost of the case and the way that the review is going to go later on. And timing as well, right? More data means more time to get through it and more questions that need to be answered in order to get to a point where we can produce documents. Correct.


So like you said, the more data there is, the more chances for it taking a long time to process, to review and to get to your relevant set of documents. And I guess the flip side, just to be fair, can also apply here, where you can miss data sources and later on have to go back when you realize you under-collected or under-identified. And then you're scrambling to respond to document or discovery requests that you don't have the information for, because it got missed somehow at this stage.


Correct. The same goes for that: under-collection will force you to go back and recollect, which is a cost and a time consideration for, you know, inevitably getting documents produced. And we've done this long enough to know that what ends up happening there is that there are documents missing that should have been collected, and it's a mad scramble at the end to try to get stuff produced before some production deadline or discovery deadline. And you know, you don't want to rush doing this stuff.


That's where mistakes are made. That's where things aren't given the proper consideration. So this is an important first step, right? Correct. OK, so let's talk about proper identification of data sources and collection of it. 


So there are times, you know, when we're involved with that. Sometimes we get involved, or you get involved, when somebody has just said, here's what I have, I need to deal with this, but what do I do next? But from your experience in dealing with attorneys, how are they identifying data sources? Who needs to be involved in those discussions with them? What participation can we have that might help them uncover or discover where they need to potentially gather information? I think our involvement, if they engage us early enough, is critical, in that we've seen this done over multiple cases, over multiple years of doing this. So we can help guide the attorneys to the appropriate sources where the data needs to be collected, whether that's just email, email servers, or some kind of cloud-based system, wherever they're storing data.


But we will work in conjunction with the attorneys, and then the other people involved would be, of course, the client and maybe their IT people, to help identify where these sources of data exist and the appropriate way to collect each one. And as you mentioned, I've done probably hundreds, if not thousands, of cases at this point over years of doing this. One of the big factors I think that really slows this piece down is that attorneys might not know the right technical questions to ask, especially if there's IT people involved, whether a company has its own internal IT resources or is using an external IT company to manage data. My experience has been that IT people speak very specifically; they talk in very defined parameters. And if you're not asking them the right questions in the right way, or don't know where to probe based upon an answer, you may not be getting to what you strategically are looking for from a legal standpoint.


Does that correspond with what you've seen over the years? Yes, it does. You know, speaking of that, recently we've had the opportunity to work with a client who has their own IT resources, and what is needed on the e-discovery side for the legal standard and what the IT person is used to were two completely different things. So taking that lawyer's advice and turning it into what the IT guy understands and can give us appropriately for the e-discovery process is definitely a role that, you know, we're likely to be involved with. And I think it's something that we do well and that we've honed our skills on over the years: sitting down with the attorney at the beginning of the case and saying, what are we trying to accomplish here? What's being requested? What are the relevant facts in the case? Who are the important people? What are their roles? And laying that out, if it hasn't already been done, to say, OK, how do we turn that into an actionable plan, so that I can tell, you know, Mr. IT, this is what we need to do? Because it's that marrying of a legal strategy with an action plan that has technical components to it that becomes very important.


You know, oftentimes they don't know what all the data sources are. Is there, you know, a Gmail email environment? Is it an Office 365 environment? Do they manage emails on premises or in the cloud? Do they use a document management system? Do they just save things on a storage server? Do they save things in the cloud on Google Drive, or on individual computers? All those things are important to be able to say, well, here's where we've got to go to potentially collect this information, or at least be aware that these data sources are out there, whether you agree or disagree that they should be collected.


Just be aware that they're there, and that you can make your arguments to the judge or the other side as to whether you should or shouldn't have to go into these data sources. And I think that's become an even more prominent role of ours as the years have gone on, where we translate that legal strategy into an actionable discovery plan and can say, all right, this is what we have to go after.


We have a good picture of what it is that we're going to potentially need to collect. And that leads into the next question, which is: when is it reasonable to utilize internal resources from a client? There's all different kinds of cases, right? We've dealt with ones where it's individuals, and obviously an individual probably doesn't have an IT resource. So we're dealing with them directly to talk about where they keep data and get a picture of where their footprint is. But what about a company, where we're working with a law firm or with general counsel, whoever it might be, and they've got an IT department or an outsourced IT department, and they're saying to us, I want to utilize them? Where is it reasonable to do that? And where does it start to become an issue, where you have to be careful about what you're asking them to do? I think it depends. And you know, in e-discovery, it all depends, but it depends on the tone of the case, I think.


So if we're dealing with a case that has a small electronic discovery component to it, and, you know, that's kind of an in-the-background type of thing, I think that's appropriate for an IT department. But if it's a contentious case upfront and you know that the sides are going to have these arguments over electronic data, you know, that may be the time to step back and say, do I really want my IT person doing the collection and turning this data over for production if there's a possibility that the data is going to be contested? Like, somebody's going to say, well, this wasn't collected properly, this was manipulated. Do you really want your IT people on the stand if it comes down to that? At that point, it may be better to have an independent forensics person go ahead and do the collection. It just saves you a lot of hassle at the end.


Well, and I think that becomes the critical question. Look at that IT person and say, do I want them on the stand? Most of the time the answer is no. And it doesn't mean that they can't be involved. I think we've taken a hybrid approach oftentimes, where it's like, look, let us come in and tell you where we think you need to collect the data from, what data sources it needs to be pulled from, and what filtering is reasonable to do on the front end. Then they can push the button, saving some, you know, hours on our end, but they're doing it at our direction. And then if there ever is a question about why something was done a certain way or not, it's really us answering to say, well, we did it because this was a reasonable approach to the discovery, given the way that it was requested and the way the data was set up. So I think that hybrid approach can also be a nice way to bridge the gap: utilize some internal resources, but make sure they're doing it under our direction.


If there's ever a question about the reasonableness of how these things were collected originally, that becomes less of an issue. We're also seeing, I believe, and I'd like to get your perspective on this, that the data sources have certainly changed over time. I can't think of very many cases in the last couple of years that haven't involved at least one phone, you know, some mobile device. I mean, you occasionally get just the email collections and things like that. But clearly phones have become an important part of most cases. But there's even more: Microsoft Teams.


A lot of organizations adopted Teams as their internal meeting platform and all the rest of it. It's an alternative to Zoom that people might be familiar with, in the sense that it allows you to have web meetings, but Teams is also more than that. It allows you to have internal chat and discussions, and people can use it as a less formal way of discussing things with each other. So rather than an email chain, you may have people that have chat messages and groups or whatnot within the Teams app. Are you seeing more of that? And do you see that continuing to expand in the future, that something like that may be a more desirable target for somebody seeking discovery, because it's a less formal way of discussing things and communicating, and they're going to want to see what people are talking about? Yeah, I can definitely see it growing and being used that way. Personally, I haven't seen too much of Teams. I don't know whether it's that Zoom has such a, you know, market share right now, but I haven't seen as much of the Teams aspect of the Microsoft suite come through.


The rest of Microsoft, of course, is pretty prominent, but that one is one that I think will grow. And, like you said, people feel less formal when communicating through that. So that's definitely a good place for people to look for relevant data on any particular case. Yeah, it's like text messages, where they want to say, hey, I want the more spontaneous sort of communication that's come through Teams. And again, there are other similar platforms, like Slack, and I forget the names of some of the other ones that people utilize in this way, that may become more targeted sources of information. And it kind of goes back to what we've been talking about, which is identifying your data sources.


You don't want to find out later on, well, heck, this team used Teams regularly to communicate about the issues at hand, you know, every day, all day long; this was their primary communication method, and somebody just didn't think to ask about it and see if it existed out there. So that kind of goes back to the point, I guess, about making sure that the proper data sources are identified at the beginning. And as we've discussed in other episodes, and I know you're very aware, this is an ever-changing target.


I'm sure you've seen that data sources have morphed over time. You know, it used to be that you could count on a server with server directories that you could go into and pull out data, and you see a lot less of that anymore. I mean, it's morphed into more cloud-based storage and things just kept on people's individual computers, probably in an even less structured format, which means that collecting it and searching it becomes even more critical. Does that align with what you've seen over the years? Yes, and you know, it has morphed that way over the years. It also means that now you have duplicate data sources and things like that, because some people will save a file to their computer because they want to work with it offline, but now it's also stored up in the cloud. You know, it's ever changing and it's always going to be a moving target; it has morphed over the years of e-discovery. And I think it morphs over the life cycle of a case as well.
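The duplicate-source point above (the same file saved locally and in the cloud) can be illustrated with a small sketch. This is not how any particular e-discovery platform works; it only assumes the common idea of comparing content hashes. The function name `group_duplicates` and the sample file paths are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_duplicates(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Group document locations by the SHA-256 hash of their content.

    `documents` maps a location (hypothetical path) to raw file bytes.
    Locations that share a hash hold byte-identical content.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for location, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(location)
    return dict(groups)


# Hypothetical collection: one file exists both on a laptop and in the cloud.
collected = {
    "laptop/Documents/contract.docx": b"final contract text",
    "cloud/Shared/contract.docx": b"final contract text",
    "cloud/Shared/notes.txt": b"meeting notes",
}

for digest, locations in group_duplicates(collected).items():
    if len(locations) > 1:
        print("Same content found in:", locations)
```

A byte-level hash only catches exact copies; a file that was edited offline and saved again would hash differently and would still need review.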


Like, you learn and identify different sources as the case moves forward. It's not set in stone that these are the only sources of data that we have to grab at the onset of the case, and that we're stuck with what we decide. Well, and I think, too, that with the pandemic and more people working remotely, data sources shifted to be more online or more localized in some instances, where, you know, people are not coming into the office every day from 9:00 to 5:00, connecting to a server and saving data in defined locations.


They may do a similar process with something that's stored online, or they may be doing it locally and working off their individual computers. And that has certainly changed a bit where we look for data sources, particularly depending upon how an organization is structured. Yes. We talked for a minute about timing, the importance of collecting quickly.


Why is that potentially important? Well, I mean, this goes more back towards the, you know, IT side of things, but for one, data can get changed; it can get overwritten. If you're not paying attention, there are so many different things that can happen in a time span like that. So as soon as you know that that's the data source that you want to collect, you have to move quickly, just to make sure no corruption of that data takes place, and get it, you know, captured as it is for your litigation or your case. Well, and collecting can be the same step as preserving, but it doesn't have to be. You know, preservation can take a different form.


That doesn't mean it needs to be collected. I can think of instances where, you know, maybe it's early on and your client is saying, look, we're anticipating a lawsuit, or something's been filed, but we don't know where it's going to go yet.


We're not ready to go full bore into discovery at this point. There are steps you can take to preserve data, so you're not fighting a spoliation argument somewhere along the way, that don't necessarily require us coming in and collecting it at that stage. Maybe you're not ready for discovery yet. I can think of a couple of instances. For example, in Office 365, which is a pretty common platform, people use Microsoft Outlook as their email. You have the ability in most of those versions to go in and put a lit hold on emails, on custodians' mailboxes, or on emails from a certain period of time. With a lit hold, what's your understanding of what that does to an email when it's placed upon it? Well, that's basically just making sure that anybody in your organization cannot get rid of, cannot delete, that email, or, and I'm not sure if it goes over to OneDrive or not, anything within that Microsoft suite.


But it stops the ability to delete this data without authorization from IT or your legal team, so that, you know, you can't have a spoliation issue later on where, hey, John Smith deleted all of his inbox and there was email in there that was needed. And depending on the settings and the platform you're using, my understanding with some of these lit holds is that it really doesn't affect the end users.


They can still move something out of their mailbox, for instance, and hit what looks like delete to get it off their server, which allows them to go about their normal business. But the lit hold is preserving it at the server level, so it's basically a marrying of the two, where it's like, look, we know this stuff's potentially important.


We have to save it back here in the background. So even though they hit delete, it's not deleting it. It's just removing it from their view. And it can apply across Teams accounts, it can apply across OneDrive like you mentioned, different things like that. So that's one example of a lit hold, using a platform that people regularly have. Preservation may also be as simple as saying, look, we got a laptop back from a former employee. We think we might get sued or whatever. Lock it up and put it in a drawer. You know, don't touch it, don't recycle it, don't turn it on, just leave it in a drawer.


That's preservation. And that may be perfectly reasonable, to say, you know, we took these steps to make sure this data didn't go away. So preservation has a little bit of a different meaning than collection at that point. But if you do know, to your point, that, hey, this data is important and we're going to have to move it into discovery, sitting on this is not a good idea, because we've seen it.


You know, people will delete stuff. Not nefariously; sometimes it's nefarious, but usually not. Hard drive crashes, whatever it might be.


And you know, you got to go explain to a judge somewhere along the way that hey, I know I should have preserved this stuff. I should have taken some steps to make sure nothing happened to this, but I didn't. 


Oops. And that doesn't always fly. And who wants to spend a bunch of money arguing a spoliation argument if you can take these simple steps? So I think that having the collecting, analyzing, and preserving discussions with IT is a very important part of the whole process.


It's something we regularly get involved with. It allows you and Jeff to have the tech speak about what it is we're dealing with. And it also starts to form the basis for, hey, here's what we potentially might have to deal with in terms of volumes and data sources and things like that, which everybody wants to know as quickly as possible. And we'll get to that. You know, what's this going to cost me? You know, how much stuff am I going to have to get? So all these things are important. 


Another step that's more and more common, certainly in federal jurisdictions, and we're seeing it creep up a lot more regularly in state jurisdictions, is the meet and confer. The meet and confer is essentially what it sounds like, if people haven't done this before: a judge is asking you to meet with your opposing counsel to discuss discovery issues, including e-discovery, and see if you can come to an agreement about how you want to handle certain things. And this is an area that can really take away some uncertainty about how to approach things and how you're going to formulate a consistent and reasonable e-discovery plan.


I'd also point out that in a meet and confer, any, you know, agreement that comes out of there doesn't have to be the finalized agreement. It can be an iterative process that says, all right, we agree to take these steps now, find out what we're dealing with, and come back and discuss it later, and then you work your way through it. And I think throughout the discovery process, taking an iterative approach is important. I'd like to get your thoughts on an iterative approach, both for e-discovery in general and as you're bringing things to the meet and confer. Why is that something that becomes important? It goes back to, you know, as we discussed, it's a moving target. You don't know volumes.


You don't know what you have until you start getting in there and looking. You don't know, you know, you may find something that points you in a different direction, like, oh, wait, this person was using an external drive that they were saving all their documents to. 


You didn't know that. You might not necessarily have known that right off the bat, but it's something you found out later, you know, going through the data. So I don't think there's any one case, or, you know, e-discovery in general, where you can just say everything's going to fit into this one box.


So, you know, to not say that it's iterative, and that you're going to move along and progress and change things as you move forward, is kind of foolhardy. Yeah. And I think, to your point, you don't know until you know, and you find things out by actually spending a little bit of time and effort to examine, to calculate some volumes, to look at some different things. And, you know, you can be preserving in that same step, or collecting in that same step, and having real information in front of you rather than just guesses. I mean, I can't think of how many times we've had somebody come to us and say, how much is it going to cost us to do e-discovery for, you know, 10 people? And that's like us saying to them, well, tell me how much this lawsuit is going to cost me.


Well, the lawyer would turn back to me and say, well, what's the case? Which side are you on? What's involved? How many people are potentially relevant witnesses? And on and on and on. A lot of questions have to be answered in order to say, well, based upon what you're telling me, here's what I think a lawsuit like this would entail, effort-wise and therefore cost-wise. It's the same thing with e-discovery: we have to get some basic information. Otherwise we can give you an educated guess, but it's just that, a guess, until we know. And a lot of times, and I know you've heard this, people will say, well, we'll just assume 10 gigabytes of email per person, and there's 10 people. And we'll say, all right, but we'll forewarn them. We'll say, just so you know, it could be something drastically different. It could be half as much; it could be 10 times as much.


We don't know. We're just guessing at this point. But they always seem to forget that and say, well, you told me this was going to be X, Y and Z. And we say, we were estimating in the dark at that point.
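To put numbers on that warning, here is a minimal arithmetic sketch using only the figures mentioned in the conversation: 10 people, an assumed 10 gigabytes each, and a result that could be half as much or 10 times as much. The function name and the idea of reporting a range are illustrative assumptions, not a pricing method.

```python
def estimate_volume_gb(custodians: int, gb_per_custodian: float,
                       low_factor: float = 0.5, high_factor: float = 10.0):
    """Return (low, assumed, high) total volume in GB.

    The default factors mirror the "half as much" and "10 times as much"
    caveat from the discussion; they are not measured values.
    """
    assumed = custodians * gb_per_custodian
    return assumed * low_factor, assumed, assumed * high_factor


low, assumed, high = estimate_volume_gb(custodians=10, gb_per_custodian=10)
print(f"Assumed: {assumed:.0f} GB, plausible range: {low:.0f} to {high:.0f} GB")
# prints "Assumed: 100 GB, plausible range: 50 to 1000 GB"
```

The spread between the low and high ends is the point: any budget built on the assumed figure alone inherits a twentyfold uncertainty until real volumes are measured.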


You really need to take these steps if you want any clarity. Has that been your experience in how people have come to us with inquiries about e-discovery? Oh, yes. I mean, every attorney, and rightfully so, wants to know the cost for their client upfront.


It's just a moving target. And you cannot know just by getting on a phone call one time with the attorney; there's no way to guess volumes. And, you know, doing this over the years, it's come down to: how long, what's the time frame? What's the date filtering? What type of business is your client in? I mean, that's huge. For some, a year of email for a person might be, you know, 3 gigs, and that seems reasonable. But if it's an architectural firm where they're emailing CAD drawings or a lot of attachments back and forth, you know, that same time frame could be 30 gigs.


You just don't know until you get in there and see what the data is showing you. And a lot of this goes back to the meet and confer that we started talking about. That's the kind of thing that the judge is going to ask you to discuss with the other side, to try to get some clarity around it. And so, you know, understanding specifics and not guessing, understanding volumes and not guessing, are all the prelude to having meaningful meet and confer discussions, setting budgetary expectations, and understanding what type of time frame and legal resources you're going to need to get through all this stuff. So it's something you should be encouraging the client to do, or doing yourself, as early as possible, so you have some clarity on this. Now, in these meetings, our involvement, I think, has typically been more in developing what might become what's called the e-discovery protocol.


Let's talk about that for a second, because I want to discuss what an e-discovery protocol is, how these come to be, and what we typically see in e-discovery protocols when we're approached with them. Can you talk a little bit, first of all, about what an e-discovery protocol is? An e-discovery protocol is going to be your road map for, one, how you're going to receive data from the other side and/or produce it to them. Yes, that's the flip side: whatever you're agreeing to, that's the way you want the data, and that's the way you have to produce the data. So it's how you're going to exchange the information between the two parties. And there's a set standard, so you can expect that it's going to be X, Y, and Z, and there are no surprises.


This is what you agreed to. This is how we're going to produce this data and move forward. It always goes by the protocol, and the protocol can have what you just described, which is the format of production, right? So maybe they want PDF files, or maybe they want load files to put them in their own database program.


Maybe they want a printed copy. It can be whatever the parties agreed to. But typically what we see is an agreed-upon set of fairly standard metadata fields that are exchanged, carrying information about these files that can be important. And we'll get into what some of these metadata fields are, but the protocol can also cover how you're going to search this data.


And that can be very helpful because it takes a lot of the uncertainty out of, all right, what am I supposed to do with all this to meet my discovery obligations? And the same goes for the other side. So search terms are still a fairly typical strategy used in a lot of cases, partially because attorneys understand them.


They understand basically how they work. They're more comfortable with them. It's imperfect, certainly, but most parties are OK with dealing with the imperfections that come along with search terms. E-discovery is never 100% perfect, but it allows them to move this forward in a way that everybody tends to understand.


We've seen this, I've seen this, and I know you have: you can't go into something and say, well, I'm going to apply this set of search terms, and that's what it is, and there is no discussion on it past that.


This is how we're going to do it. That's not a reasonable approach to this. Going back to the conversation around an iterative process: how important is it to have an iterative process for the searching that you do? And why is that? Well, recently we had a chance to work with a client where we got the search terms and could tell right away, just by doing this for a number of years, that these were going to be overly broad. So we processed the data, got it ready to search, ran the search terms, and we probably hit on 95% of the data. So those search terms did not narrow it down in any significant way. So now you have to go back and look at these search terms and say, why is it hitting on so many documents? Is there a way for us to narrow this down? And that's where the iterative process comes in, because now we can go back and look and say, search term A is the main offender hitting all this data.


So why is that? Let's take a look and talk with the attorneys, get that squared away, and see if we can either remove the search term completely or apply some kind of narrowing connector to bring this broader term down to something more reasonable that produces relevant data. Well, and a couple of examples of that come to mind. Looking at some search terms, we can say this is going to cause a problem because it's just way too broad. You would never run the word "the," right? Every other document, if not every document, is going to get a hit on the word "the." So we don't need to understand the case or the issues to say that it's probably going to return a lot of false positives.
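
A hit report of the kind discussed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document IDs, the sample text, and the `hit_report` function are hypothetical and are not taken from any particular review platform.

```python
import re

# Hypothetical extracted text keyed by document ID (illustrative data only).
documents = {
    "DOC-001": "The contract was signed by the vendor on Friday.",
    "DOC-002": "Please review the attached invoice.",
    "DOC-003": "Lunch order for the team meeting.",
    "DOC-004": "Vendor invoice dispute regarding the contract terms.",
}


def hit_report(docs, terms):
    """Return, per term, (number of documents hit, share of all documents)."""
    report = {}
    for term in terms:
        # Whole-word, case-insensitive match on the extracted text.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = sum(1 for text in docs.values() if pattern.search(text))
        report[term] = (hits, hits / len(docs))
    return report


report = hit_report(documents, ["the", "invoice", "dispute"])
for term, (hits, share) in report.items():
    print(f"{term!r}: {hits} of {len(documents)} documents ({share:.0%})")
```

On this toy data, "the" hits every document while "dispute" hits one, which is the kind of number that makes an overly broad term obvious without anyone needing to understand the case.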


So we know from experience looking at cases when a word is too generic or too common to be a reasonable search term on its own. And we can then suggest to them, well, maybe we need to use "the" in relation to other words, or make it a phrase, like it has to be "the whatever" in quotes, so the two have to be next to each other. Or it has to be within so many words of another word. There are lots of ways to Boolean search this stuff, as it's called, to build it out. But you don't know whether a term is completely unreasonable until you've done one of these hit reports. Now you're looking at numbers and you're saying, look, this one search term is returning 95% of the documents in the case. That alone should tell us it's not a reasonable term.
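
The two narrowing techniques mentioned here, an exact phrase and a "within so many words" proximity connector, can be sketched as follows. This is a simplified illustration, not the syntax or behavior of any specific review tool; the function names, the company name "Acme," and the sample text are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens with punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def has_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def within(text, term_a, term_b, distance):
    """True if term_a and term_b occur within `distance` words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)


# Hypothetical examples: a footer-only mention versus a substantive one.
footer_only = "Thanks, see you at lunch. Sent from Acme Corp."
substantive = "Acme agreed to the pricing terms in the draft contract."

print(has_phrase(substantive, "pricing terms"))       # True
print(within(substantive, "acme", "contract", 10))    # True
print(within(footer_only, "acme", "contract", 10))    # False
```

A bare search for "Acme" would hit both examples; requiring it near "contract" keeps only the substantive one.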


We're dealing with a case right now where that's the situation, because one of the search terms proposed is the name of the company. It's in every email footer. And also, anytime you, Bill, would write an email and sign it "Bill Saccani, Precise," if "Precise" was one of our words, it's going to get a hit.


It doesn't matter what the content of the email is. So you can look at that and say, logically, we don't even have to sample this stuff. We know that this is a bad search term because of that. But sometimes it can be an unusual one where you're like, boy, we thought this term was specific enough, but now, looking at the number of hits, it's not. Why is that? You can sample through the data and find where your false positive hits are. That gives you some guidance for how to narrow these in. But to your point earlier, and particularly on this meet and confer as it relates to search strategies and narrowing this stuff down: you can't do this all in one fell swoop. You have to look at results, make adjustments, be willing to argue those adjustments, and try to get agreement from the other side as to why you're adjusting. And most of the time, I've seen, they agree too, because they don't want to get tons and tons of garbage. So you try to work this out, and if you can't, you get the court involved, or you say, look, we are going to go this route and we believe it's reasonable. If you need to bring this up with the court and argue that it's not, that's up to you.


That's the iterative process that lets you get to a point where you can say we have a defined search strategy that's documented. We showed how the decisions were made and why they were made, and it was based on real numbers, not on us guessing, because nobody knows until you get to that point.


I think that's very important. Yes, I think it is important, and it helps the client or the attorneys a lot more than just speaking generically: well, yeah, we think this is a good search term, we think this is a good plan. If you have the numbers in front of you, there's no disputing that, hey, this is unreasonable and needs to be adjusted, right? And reasonableness, like in any case, is very subjective.


Maybe the other side is going to say the word "the" is a reasonable search term because your company's called "The." OK, they can make that argument. You can make your argument.


You have to try to find some middle ground where possible. But the point in all this is that dealing with real data and real numbers starts to become very important to the ability to put together a reasonable e-discovery strategy.


So we talked about the importance of collection, and the importance of getting to a point where we have real information that we're discussing: real volumes, data sources, and things like that. And we touched a minute ago on format of production and metadata.


Number one, what is metadata? And number two, why is it important? Metadata is going to be all the information about your file as it existed when it was collected, and hopefully it was collected soundly, so all that metadata stays intact. We can discuss later why it's important to collect appropriately versus, and we've had clients do this, "I'm just going to copy this file and send it to the e-discovery vendor via email and have them throw it into the case." That kind of messes up the metadata, because now you've transferred that file multiple times without any consideration for keeping it intact. But metadata is going to be all of your information on that file the way it was when it was created and collected. So it's going to have your date created, date modified, author, file title, all that type of information, so that you can use it to help you narrow down to specific files, and it goes to search strategy as well.


Sometimes you want a particular file type, particular dates, and things like that. So the metadata is important because it is another tool for the attorneys to sift through their data and narrow down to relevant data. So if, for example, your search strategy said we're going to use these key terms and the relevant time period is January 1st to November 1st, and that date is not accurate, then you have an issue, because the document could wrongly fall either outside of your date range or inside it.
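
The date-range point can be illustrated with a short sketch. The record layout, the `date_modified` field name, and the year 2023 are assumptions made for the example; the conversation only gives "January 1st to November 1st."

```python
from datetime import date

# Hypothetical metadata records; field names are illustrative only.
records = [
    {"doc_id": "DOC-001", "date_modified": date(2023, 3, 15)},
    {"doc_id": "DOC-002", "date_modified": date(2023, 12, 2)},
    {"doc_id": "DOC-003", "date_modified": None},  # date lost in an unsound copy
]


def split_by_date_range(records, start, end):
    """Sort document IDs into in-range, out-of-range, and missing-date groups."""
    inside, outside, unknown = [], [], []
    for rec in records:
        modified = rec["date_modified"]
        if modified is None:
            unknown.append(rec["doc_id"])
        elif start <= modified <= end:
            inside.append(rec["doc_id"])
        else:
            outside.append(rec["doc_id"])
    return inside, outside, unknown


# Assumed relevant period: January 1 to November 1 (year chosen for illustration).
inside, outside, unknown = split_by_date_range(
    records, date(2023, 1, 1), date(2023, 11, 1)
)
print("In range:", inside)
print("Out of range:", outside)
print("No reliable date:", unknown)
```

Keeping the missing-date documents in their own group, rather than silently dropping them, is one way to surface the accuracy problem described above.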


But it is just a tool to narrow those down. So it gives additional levels of searchability that most people have come to expect at this point. Yes. And you mentioned some of the metadata fields. Most of those, I would think, are related to what we'll call e-docs, standalone non-emails. What are some common metadata fields for emails? Well, same thing: you're going to have your dates, but it's going to be date sent and date received, who the author of the email was, who sent the email, who the email was to, who was CC'd, and it even goes as far as capturing the BCC, the blind copy, as well. So you can use that to help narrow in on time frames.


And email subject, which is an important one. And I always use examples like a PDF file, for instance. If you or the other side accepted a hard copy of that file or a metadata-scrubbed version of that file, they might read the file and say, all right, I can read the content on here. But maybe they didn't know that the title of the file was "How we're going to put our competitor out of business in 2023." That might be important information, and if the file is not handled and exchanged properly, that's a piece of evidence that should have been exchanged in discovery and may be critical for some documents. And so managing the metadata is important.


I also want to, while we're talking about metadata, talk about how family relationships are maintained with emails. Can you explain what a family relationship is with emails and why it matters? Sure. Well, let me step back. A family relationship just means that those documents are tied together.


Usually the email is going to be the parent of the family, so that is the main document. All the attachments within that email are considered children, so those sit under the parent email. Once it's in a database, we're going to see that relationship: email A is the parent, and PDF B and PDF C are the children of that email. That relationship is kept within the system so that the attorneys know, hey, this goes along with this family group. And for instance, an email says "!!! Read immediately" in the subject line.


And all it says in the body is "see attached." You're probably going to want to see what was attached to that email. So maintaining that relationship between the email and its attachments, in that simple example, becomes very important. And that's what email families, in a very simple way, allow you to do. Correct. And there's also something called threading on emails, where an email gets replied to or forwarded, or it's reply to reply to reply, and it's a chain of communications that lots of emails may have come into. For ease of review, sometimes people want to see threaded emails together, even though they may be separate documents, and say, look, I know all these emails are dealing with this one subject. So if I read the great, great, great grandparent of all of these emails that contains all the others in it, I don't need to look at the rest, because I've already consumed them within this email thread.
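
The parent and child relationship in an email family can be sketched as a simple grouping. The record layout and the `parent_id` field are hypothetical; real load files use their own field names for this relationship.

```python
from collections import defaultdict

# Hypothetical records: parent_id is None for a parent email,
# and points at the parent for each attachment.
records = [
    {"doc_id": "EMAIL-A", "parent_id": None, "name": "!!! Read immediately"},
    {"doc_id": "PDF-B", "parent_id": "EMAIL-A", "name": "report.pdf"},
    {"doc_id": "PDF-C", "parent_id": "EMAIL-A", "name": "exhibit.pdf"},
    {"doc_id": "EMAIL-D", "parent_id": None, "name": "Lunch"},
]


def build_families(records):
    """Map each parent document ID to the IDs of every member of its family."""
    families = defaultdict(list)
    for rec in records:
        family_id = rec["parent_id"] or rec["doc_id"]
        families[family_id].append(rec["doc_id"])
    return dict(families)


def expand_to_families(hit_ids, records):
    """Given documents that hit on a search, return their complete families."""
    family_of = {r["doc_id"]: (r["parent_id"] or r["doc_id"]) for r in records}
    families = build_families(records)
    expanded = []
    # dict.fromkeys removes repeated family IDs while keeping their order.
    for family_id in dict.fromkeys(family_of[h] for h in hit_ids):
        expanded.extend(families[family_id])
    return expanded


print(build_families(records))
print(expand_to_families(["PDF-C"], records))
```

Expanding a hit on one attachment to the whole family is what lets a reviewer see the "see attached" email together with what was actually attached.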


That's another thing that email metadata allows you to do, is that correct? Correct. So, we talked about that. 
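
Threading can be sketched along these lines. This is a rough illustration that assumes each email records the ID of the message it replies to (the `in_reply_to` field here is hypothetical) and that every reply quotes the full chain before it, so the last reply in a chain is the one that contains the rest. Real threading engines use more signals than this.

```python
# Hypothetical email records; IDs and field names are illustrative only.
emails = [
    {"id": "M1", "in_reply_to": None, "subject": "Budget"},
    {"id": "M2", "in_reply_to": "M1", "subject": "RE: Budget"},
    {"id": "M3", "in_reply_to": "M2", "subject": "RE: RE: Budget"},
    {"id": "M4", "in_reply_to": None, "subject": "Lunch"},
]


def thread_root(msg_id, by_id):
    """Follow in_reply_to links back to the first message of the chain."""
    while by_id[msg_id]["in_reply_to"] is not None:
        msg_id = by_id[msg_id]["in_reply_to"]
    return msg_id


def build_threads(emails):
    """Group message IDs by the root message of their conversation."""
    by_id = {e["id"]: e for e in emails}
    threads = {}
    for e in emails:
        threads.setdefault(thread_root(e["id"], by_id), []).append(e["id"])
    return threads


def most_inclusive(thread_ids, emails):
    """Messages in the thread that nobody replied to (the ends of the chain)."""
    replied_to = {e["in_reply_to"] for e in emails}
    return [m for m in thread_ids if m not in replied_to]


threads = build_threads(emails)
print(threads)
print(most_inclusive(threads["M1"], emails))
```

Under the stated assumption, a reviewer could read only the messages returned by `most_inclusive` and still see the content of the whole chain.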


That's the importance of metadata. Another metadata field that we collect, if the collection's done properly, and that we extract when we process data, is something called a hash value.


Can you explain for a second what a hash value is? MD5 and SHA hash values are the two typical ones you see. First of all, what does it look like, what is it, and what's the importance of it? Sure. In generic terms, it is a long string of numbers, and it is unique to each file. There are different ways that processing engines produce their hash value.


It's usually based off of the document metadata: file name, file size, date, things like that. And I know there are some programs that will allow you to include the email sender, To, and CC fields as well. So if all of that is truly the same, then those two hash values are going to line up. It's going to be somewhere north of 15 characters.


It's a big long string of numbers and letters. But what that allows the system to do is de-duplicate the data. So if document A has a hash value that is equal to document C's, then the system knows that those are two identical documents, and it can set the duplicate aside so that the attorneys don't have to look at multiple copies of the same document. So for instance, an email goes out in the organization and 10 people are CC'd on it, and it says, hey, we're meeting in the conference room today at 10:00 AM for an important meeting. It's the same email to 10 people. Nine of them could get identified as duplicates to say, look, we don't need to produce this 10 times.


We could produce it once; all the information's within it. One of the things I think we've learned over the years is that sometimes people want to know, well, who else had it? We can see the recipients there, but we can also flag it and say it also came out of these other people's mailboxes. That's the only difference that getting it from over here versus from over there would show you. That saves you a lot of volume in terms of documents you may be processing, producing, or reviewing. It's a good, automated, reasonable process. It's been accepted as a standard e-discovery practice at this point, and it allows you to get rid of a lot of volume without having to look at each individual record. Correct. So proper use of de-duplication and metadata collection and all the rest of this allows you to do things like that. So we take the data, we process the data, we extract the metadata, and we also extract the text out of files, right? Correct.
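
The de-duplication step, including the "who else had it" flag, can be sketched with Python's standard `hashlib` module. Which fields feed the hash is an assumption made for this example (sender, To, CC, subject, and body); as noted in the conversation, processing engines differ on this. The mailbox names and email content are hypothetical.

```python
import hashlib


def email_hash(email):
    """MD5 over selected fields. The choice of fields is an assumption here."""
    parts = [
        email["sender"],
        ";".join(sorted(email["to"])),
        ";".join(sorted(email["cc"])),
        email["subject"],
        email["body"],
    ]
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()


def deduplicate(emails):
    """Keep the first copy of each hash and record every custodian that held it."""
    unique = {}
    for email in emails:
        h = email_hash(email)
        if h not in unique:
            unique[h] = {"email": email, "custodians": []}
        unique[h]["custodians"].append(email["custodian"])
    return unique


# Hypothetical collection: the same meeting notice pulled from three mailboxes.
meeting = {
    "sender": "ceo@example.com",
    "to": ["all@example.com"],
    "cc": [],
    "subject": "Meeting",
    "body": "We're meeting in the conference room today at 10:00 AM.",
}
collected = [
    {**meeting, "custodian": "alice"},
    {**meeting, "custodian": "bob"},
    {**meeting, "custodian": "carol"},
    {**meeting, "subject": "Meeting moved", "custodian": "alice"},
]

unique = deduplicate(collected)
for h, entry in unique.items():
    print(h, entry["email"]["subject"], entry["custodians"])
```

Four collected copies reduce to two unique emails, and the custodian list on the meeting notice preserves the fact that it came out of three mailboxes.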


OK. And what's the importance of pulling the text out of individual files? Well, that allows you to search the data, search the documents. You can think of it this way: yes, you can open up a Word document, search that Word document for your keyword, and pull it up, and that's perfectly fine.


What these review systems do is extract that text out into its own individual text files, so now it's embedded into the system. So when you run your keyword search, it's not against one document, it's against all 100,000 documents in your database, and instead of 100,000, it's going to pull back the 10.


These ten documents have this keyword. So it just allows you to search more robustly within your data set, so that you can do these searches at one time across everything as opposed to opening up individual documents: well, I think this document has this word in it, I'm not sure, but I'm going to open it up and search for it.
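
Searching across all the extracted text at once, rather than opening documents one by one, is commonly done with an index from words to documents. The following is a minimal sketch of that general idea with hypothetical document IDs and text; it is not a description of how any particular review system is built.

```python
import re
from collections import defaultdict

# Hypothetical extracted text, one entry per document in the database.
extracted_text = {
    "DOC-001": "Quarterly forecast and budget review",
    "DOC-002": "Lunch menu for Friday",
    "DOC-003": "Revised budget attached for approval",
}


def build_index(texts):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of every document containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(extracted_text)
print(search(index, "budget"))
print(search(index, "merger"))
```

The index is built once during processing, so each keyword search is a lookup across the whole data set instead of a document-by-document scan.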


You don't have to do that now. You can just go into your review system and run your keyword search across the extracted text. So how does a system like that handle documents that are a scanned image or a PDF that may not be searchable on its own? Sure. So as we process this data, we identify those documents.


Of course, you're going to want to use the extracted text where you can. The extracted text is exactly the way the text exists. So if it's a Word file, extraction is going to pull the exact verbiage out of your document and create the text file that way, which is almost 100% accurate. Now, what handles the rest is OCR, optical character recognition. I'm sure a lot of people know what that is now; it's so commonly used today when you're dealing with these cases.


So what that does is go through and look at your document, and it will actually create the text file based on what it's seeing. So the text in, say, a scanned image is now going to be converted to a text file so that you can search that data as well. Now, one thing with that:


It's not 100% accurate, and it depends on how clean your scanned version is and what system you're using to OCR. So it's an interpretation, not a copy-paste. Correct. And so you have to be aware that when you're utilizing OCR to do text searches, it's inherently not going to be 100%. It's better than nothing, but it's not 100%, right? There are a lot of times where an L will be read as an exclamation point, things like that. So it's a tool to help make that document searchable. It's not 100% reliable in the sense of, hey, if I run this search term and it doesn't show up, then it doesn't exist. But it's a tool to help get through that data.
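
The effect of an OCR misread on keyword searching can be shown with a small sketch. The OCR output below is made up for illustration, and the fuzzy matching via the standard library's `difflib.get_close_matches` is just one possible way to tolerate a misread character; it is not something the speakers describe using, and it can introduce false positives of its own.

```python
import difflib

# Hypothetical OCR output where "l" was misread as "!" and as "1".
ocr_text = "the !iability c1ause applies to all vendors"


def exact_hit(text, term):
    """True only if the term appears as an exact whitespace-delimited token."""
    return term.lower() in text.lower().split()


def fuzzy_hits(text, term, cutoff=0.8):
    """Tokens that closely resemble the term, tolerating a misread character."""
    candidates = text.lower().split()
    return difflib.get_close_matches(term.lower(), candidates, n=5, cutoff=cutoff)


print(exact_hit(ocr_text, "liability"))   # False: the exact search misses it
print(fuzzy_hits(ocr_text, "liability"))  # ['!iability']
```

The exact search reports no hit even though the word is on the page, which is why a missing hit in OCR text does not prove the term is absent.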


And when you do all this processing, you have your documents OCR'd, you've pulled out the metadata, you've extracted the text, you've de-duped, and all the rest of it. How do we take advantage of all that when a client comes to us and says, well, I've got to look at this? Either it's a production that's been produced from the other side in that format, or it's our own documents where we're deciding what needs to be produced or not.


Is something privileged? Going through those types of things. How do we provide them with visibility into these documents so that they can easily do their review of these records? Well, we work with them to, one, come up with a review strategy to help them decide what they're going to look at. A lot of times our clients want to pick off the more relevant stuff first, so we can help segregate that out. Going into that, you're looking at keyword searches. We can use what we've previously done and say these terms seem to be more important and are ultimately going to return more important documents than some of the lesser ones. So we can use that to help pull documents for the clients or the attorneys to review first. We can use date ranges. We can use the tools available to us to segregate these into buckets so that they can review them in a timely fashion and work their way from the most relevant to the least.
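
Ordering documents so the likeliest-relevant ones are reviewed first can be sketched as a simple weighted scoring. The term weights, documents, and the `prioritize` function are all hypothetical; in practice the weighting would come from the attorneys' judgment about which terms matter most.

```python
# Hypothetical documents and term weights (higher weight = more important term).
docs = [
    {"doc_id": "DOC-001", "text": "lunch order for friday"},
    {"doc_id": "DOC-002", "text": "termination of the supply agreement"},
    {"doc_id": "DOC-003", "text": "draft agreement attached"},
]
term_weights = {"termination": 3, "agreement": 2, "draft": 1}


def score(doc, weights):
    """Sum the weights of every search term that appears in the document."""
    words = set(doc["text"].lower().split())
    return sum(weight for term, weight in weights.items() if term in words)


def prioritize(docs, weights):
    """Return document IDs ordered from highest to lowest score."""
    ranked = sorted(docs, key=lambda d: score(d, weights), reverse=True)
    return [d["doc_id"] for d in ranked]


review_order = prioritize(docs, term_weights)
print(review_order)
```

The resulting order can then be cut into buckets, for example everything scoring above some threshold first, so reviewers work from the most relevant material to the least.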