Data Discourse

An Inside Look at Actual E-Discovery Case Specifics

September 03, 2024 Peter L. Mansmann, Esq. Season 1 Episode 5

In the Season 1 finale of Data Discourse (Episode 5), our discussion centers on the transformative changes in e-discovery practices and technologies. Throughout this episode featuring Pete Mansmann and Bill Saccani, you’ll discover valuable insights into how these advancements are reshaping data management during litigation, offering practical advice and highlighting emerging trends that are setting the stage for the future of e-discovery.

One of the key topics covered is the role of Artificial Intelligence (AI) in e-discovery. AI is revolutionizing the way legal teams handle large volumes of data by simplifying the search process through natural language understanding. 

However, despite its potential to improve efficiency, there is some reluctance among legal professionals to fully trust AI for relevancy determinations, particularly when dealing with sensitive information. As you’ll learn, AI can streamline tasks…but maintaining human oversight remains crucial.

This episode also addresses the importance of cost-effective production formats in e-discovery, such as avoiding the production of overly large PDF files (courts prefer document-level productions that are easier to navigate and organize). By adopting more manageable formats, legal professionals can facilitate smoother proceedings and demonstrate professionalism, ultimately avoiding complications with opposing counsel.

Key Topics Covered

  • The Role of AI in E-Discovery: How AI is changing the landscape by improving data search and processing, while highlighting the ongoing need for human oversight.
  • Challenges with AI: The hesitance among legal professionals to rely fully on AI for relevancy determinations due to trust issues with automated systems.
  • Cost-Effective Production Formats: The advantages of using manageable document formats over large PDFs to enhance efficiency and organization in e-discovery.
  • Document-Level Productions: Courts' preference for document-level productions that streamline navigation and reduce burdens on opposing counsel.
  • Data Security and Integrity: The importance of securing sensitive data and maintaining compliance throughout the e-discovery process.
  • Emerging Trends in E-Discovery: How advancements in technology are influencing e-discovery practices and shaping future developments in this field.



Precise is your trusted resource for all things mobile forensics and e-discovery. We look forward to partnering with your firm and helping you win your next case!

Visit our website to learn more and set up a free consultation:
Click here to get started

Or call us at 866-721-5378


All right everyone, welcome to Data Discourse: practical advice and insights about digital forensics and e-discovery. Here again today I have Bill Saccani, Esquire, who is President of Precise Discovery. He was in our prior episode and, as you may remember from that episode, he runs our e-discovery company as a whole. He's heavily involved in consulting with attorneys on e-discovery issues, helping them develop e-discovery plans, searches, and strategies, handling all the data from start to finish of the e-discovery matter, and providing his expert guidance throughout. So Bill, welcome back.

Great to be back.


So in today's episode, we are going to talk in generics, without giving up any names or case specifics, about some cases we've worked on and some issues we've come across or requests we've had that we've dealt with: things that we think are important for you to understand in the decision-making process when you may want to go one direction versus another. We sat down over lunch today and discussed some of the questions that we get from people and are regularly answering, which we thought would be worth addressing in this podcast so that people understand why they may want to move in one direction versus another. So I wanted to start first with some cases in the news.


As you can imagine, with e-discovery cases in the news, the newsworthiness is probably a little more esoteric and more law-journal type of information. But there have been some cases in the news not too long ago that got national attention. I think that is worth discussing, just to show you the importance of e-discovery and how it can show up in different cases. The one I wanted to talk about today was the defamation trial against Alex Jones.


It has been at this point a year or more since a verdict was rendered in the case, but there were some interesting e-discovery issues that arose. As people may remember, Alex Jones was the owner of, and spokesperson or broadcaster for, a company called Infowars. The claim, and the verdict, was that Alex Jones had defamed the parents of the Sandy Hook victims, who he said weren't actually killed, and that this was all part of an elaborate hoax. That's the simplest explanation of that case, and they won a defamation lawsuit against him for a very large number that, it appears, is essentially going to be putting him into bankruptcy in the near future.


Why that case is relevant to the e-discovery arena is that during the course of discovery, Alex Jones's cell phone was collected, and the text messages in particular were pulled off that cell phone. They were given to his attorney on the case, and his attorney turned over the entire contents of the cell phone to the plaintiffs' attorney inadvertently, or as he claims inadvertently, with no claim of privilege or confidentiality. One of the key issues in the case was that Alex Jones had previously testified, I believe in his deposition, that he had never texted about Sandy Hook and never had any conversations about it outside the Infowars broadcasts; in particular, he had not texted about it. And lo and behold, when these text messages get turned over, there are a large number of text messages directly dealing with Sandy Hook.


So it contradicted what he said. It was certainly an argument that he lied under oath. And even though his attorney said, "I inadvertently turned these over; the plaintiffs should have immediately destroyed them," the judge didn't agree with him and allowed all that information to come in. It just goes to show that the process of handling information and data, in this case the cell phone data and text messages, is very important. It should go through steps: double checks, QC, all those things, to make sure that you're producing what you intended to produce, especially on an issue that was apparently that important for that case.


Maybe they would have had to turn over all that stuff anyway. But my understanding was they were going to argue privilege on at least some of it, and that may have protected it. Because that case was in the news recently, I thought it was worth pointing out that it was an e-discovery issue that had popped up in a case I'm sure people would recognize. We deal with a wide variety of cases: contract disputes, personal injury claims, truck driving accidents, employment disputes, patent cases, you name it.


We've probably dealt with it in one way, shape, or form over the years. And we've dealt with all types of attorneys, from solo practitioners to mega law firms and everything in between. And there are different levels of familiarity with e-discovery.


There are certainly people who give it more attention than others. So we've got, I think, a wide variety of exposure to different types of cases and people. Just to give some background here: what's the size of some of the smallest cases you've dealt with, and some of the biggest, in terms of just volumes?

It runs the gamut. We've had small cases with solo practitioners where they may not have agreed to a search protocol or something like that, and the other side just gave them, not a data dump, but a small production that they didn't have any other way to look at other than to put it into something or have it converted to hard copy or PDF. So I've had cases under a gigabyte of data, maybe 5,000 documents, and I've had cases with multiple millions of documents, over 3 terabytes of data, and everything in between.


So most of our cases are going to fall well between that, of course. But we've had very small to very large cases.

And for the stuff that we do, take a really small case, say 100 docs or less. A lawyer may be able to manage that just by having folders on their desktop or reviewing them somewhere else. It's not so voluminous that they necessarily need our assistance, unless it's a very technical case. But then you get into thousands of documents, or maybe you have lots of cases with hundreds of documents at a time and need to organize lots of little cases over time.


That's when we typically find ourselves getting involved. Would you agree?

Yes. It's more efficient to have these set up in a database, organized, and to be able to code and tag things. And that stays with the document through the life cycle of the case. So the attorney doesn't have to remember, "I went through this folder already," or "I went through this folder and these ones are relevant." All that data is captured while it's in the system.

And I think there are two important points in what you just said. Number one is that most cases have hiatuses, right? We see it all the time, where it's hot and heavy at the beginning. There's a lot going on: discovery is exchanged, and the review of documents is happening to get through initial productions. And then you might see a long lull while discovery depositions are being scheduled, or whatever it might be.


There are these inevitable lull periods where people may not come back to a database for months at a time. When that happens, and it happens fairly regularly, what do you see as the value of having this stuff in a database when the case heats back up again?

Well, there are multiple benefits. The first is that it's all in the system.


We know what was produced already. Say you're working outside of a database system and you have folders on the server in your office, or whatever it may be. You wonder, "Wait a second, did I produce this? Did I not produce this?" That's all captured within the system. We know exactly what was produced, when it was produced, and what Bates numbers are associated with that production.


All of your relevancy tagging is captured in the system, so that's all going to stay with the document. There's really no way to do that on a Windows-based file system, short of maybe renaming each file with its review status, and that's not very practical. All of that is captured within the review platform so that you can pick up where you left off fairly easily.

And that's true for attorney work product information as well, correct?

Correct.

So if someone is issue tagging, or identifying documents for use with somebody at a deposition, and wants to write a couple of notes about the document or highlight what they particularly want to talk about, all that information is captured and preserved there. When they come back to it, it's much easier to pick back up because it's right where they left off, right?

I would also say that this becomes useful if an attorney leaves and somebody else is picking up the case file. Again, it's not a matter of looking through notes and all the rest of it.


There is some organizational structure to how these documents were reviewed and handled. And our involvement typically lets us go in and say, "All right, here's what was set up by this other attorney. Here's what they wanted in here." We can provide some guidance as to what their thought process was and how they were organizing these documents for future use.

Yes.


And I think, again, when you have these inevitable hiatuses coming in and out, it becomes very valuable to be able to pick back up quickly.


But one of the things we talked about as well is that it's very important that there is consistency in the way data is captured and managed within a database. If it's a single person going into a database and it's their own work product, it's a lot easier, right? They should be able to recognize what they were doing at any point in time. But it's pretty regular to have multiple people working on cases. We'll have one centralized place where all this data exists, with coding and tagging being done on it, but you might have multiple people doing that. Maybe one person's prepping for a deposition and another person's prepping for a different deposition. Or you have multiple people working on a review for production to get it done in a certain time frame. You now have to organize multiple people's input into the database. We always call it the normalization of data. I always like to use this example: you don't want people looking at a picture with one person saying, "That's a photograph," somebody else saying, "That's a photo," somebody else saying, "That's a picture," and somebody else saying something different.


They could all be describing the same thing, but they're now using four different words, and you have to get into each of their minds and ask, "How would Joe have described this when he tagged it?" If you have one choice, "photo," then everybody's choosing things the same way, and there isn't a whole lot of translation that needs to happen. Does that type of workflow normalization apply throughout a database as you're setting these things up for people to use?

Oh, yes. One example: I have a client that prefers "responsive" over "relevant." Somebody else in the firm will say, "This is relevant data, this is relevant, this is relevant," and he doesn't like that term. He uses "responsive," so everything he does is "responsive." So we normalize it in the database: you can choose "responsive," since that's what he likes, but everybody has to choose "responsive."


So it takes out that ambiguity when someone goes back later and asks, exactly like you described, "How did this particular person describe this document or this issue?" Instead of trying to reconcile four different terms into one set, it was normalized from the start.
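As a rough illustration of the tag normalization described here, the sketch below maps the free-text labels a reviewer might type onto one approved choice. The synonym map and the function name `normalize_tag` are assumptions made up for this example; they are not taken from any particular review platform.

```python
# Hypothetical synonym map: every variant a reviewer might type
# resolves to a single approved tag.
CANONICAL_TAGS = {
    "responsive": "Responsive",
    "relevant": "Responsive",
    "photo": "Photo",
    "photograph": "Photo",
    "picture": "Photo",
}


def normalize_tag(raw_tag: str) -> str:
    """Return the single approved tag for a reviewer's label."""
    key = raw_tag.strip().lower()
    if key not in CANONICAL_TAGS:
        # Reject unknown labels rather than guess, so new synonyms
        # cannot creep into the database unnoticed.
        raise ValueError(f"Unapproved tag: {raw_tag!r}")
    return CANONICAL_TAGS[key]
```

In practice a review platform would more likely offer a fixed pick list than clean up free text afterwards; the point of the sketch is only that every reviewer ends up with the same term.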


We know that everything is responsive. That's our universe of producible documents.

Well, and as things are happening in a database, whether it's designating documents for production, designating documents to be withheld for privilege, redacting, tagging, or any of the things we've described, does a database like this maintain an audit trail of who's doing what and when?

Yes, completely.


From the time the document is processed all the way through production and afterwards, the system will tell you when the document was looked at, whether it was tagged a certain way, what user tagged it, what date that happened, and whether it was produced. I believe it goes all the way down to viewing: if it was produced, you can look later on and say, "Oh, this user viewed the document." So it's down to the level of somebody just viewing that document. It has a trail of that.

And why is that important? Why is audit trail information potentially important in a case?

Well, one, it helps the attorneys: if there is a question, they can go back to whoever made the decision and get clarification on it.


That definitely is a helpful tool. And it just helps you understand what happened to that document throughout the life cycle.

Well, and this was an area we touched upon. I think this ties into documentation of the decisions that are made throughout the course of a case: use this search term over that one; we're not going to produce this document, or this area of documents, or whatever it might be. It's the decision making along the way and how that translates into essentially work orders for us. And I imagine that some people sometimes wonder why we send back emails that say, "All right, I'm going to be doing X, Y, and Z."


It's our way of documenting: here are the decisions made along the way, who made them, and why. Is that important if we're ever called on later, a year or two years later, to say why something happened the way it did?

Yes. When you look at a document a year or two years later, you're looking at it, I would say, in a new light. You're far removed from what was happening at the time of review. And it may be that they don't think it should have been produced at that time, or they think it should have been coded a different way. Without that audit trail, without that documentation, it's just, "Well, that's what happened." But if we have the audit trail and the decision process, we can say, "This is why that document was produced or was not produced. User A told me it was relevant and available for production at this particular time."

Right. And again, that's all about making sure that questions can be answered at a later time if some issue arises out of the decision making along the way.
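To make the audit trail idea concrete, here is a minimal sketch of an append-only event log that records who did what to which document and when. The class names, field names, and action labels are hypothetical; real review platforms keep this history internally and expose it in their own way.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    doc_id: str
    user: str
    action: str        # e.g. "viewed", "tagged", "produced"
    detail: str
    timestamp: datetime


class AuditTrail:
    """Append-only log: events are added, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, doc_id, user, action, detail="", when=None) -> AuditEvent:
        # Default to the current UTC time when no timestamp is supplied.
        stamp = when or datetime.now(timezone.utc)
        event = AuditEvent(doc_id, user, action, detail, stamp)
        self._events.append(event)
        return event

    def history(self, doc_id: str) -> list[AuditEvent]:
        """Everything that happened to one document, oldest first."""
        matching = [e for e in self._events if e.doc_id == doc_id]
        return sorted(matching, key=lambda e: e.timestamp)
```

Calling `history("DOC-1")` a year later would answer the kind of question raised above: which user tagged the document, how, and on what date.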


So let's run this next little section of questions this way. I'm going to come to you like a client and either ask you a question or make a request of you. And I would like to hear your response as to either (a) what the answer is or (b) why you think I maybe shouldn't be doing what I'm requesting. OK?

All right.

All right, so here's number one, and these are all based on things we've had people ask us to do in the past.


I want to give them one large PDF file as my production. I don't care if it's 10,000 pages, just give them one big PDF file and nothing else. 


First off, I would ask you, is there a production protocol in place? One, that's going to dictate how you have to produce. 


Two, I would advise you against doing that for a multitude of reasons. One, courts are usually not going to like that. 


Usually you're going to have to produce in the way those documents were kept in the ordinary course of business. What that means varies, though, and it can be left to interpretation: for example, whether we produce a Word document as a Word document. That's something you can decide; image files are perfectly acceptable to the court.


But you're making this more difficult for the other side, and they're going to respond the same way. They're going to produce to you in the same way, which will be cumbersome for you. So the way I would view it is that you want to produce in a reasonable format. If you want to produce PDFs, that may be acceptable, but I would advise against 10,000-page PDFs. Do it at a document level with Bates stamps so that you've made a reasonable effort to produce in good faith. Otherwise, I think it makes you look like you're intentionally doing something to make things more difficult for the other side, and I would advise against it.

And have you ever seen that be the case? Somebody has just insisted that's their production, that's what they want, and then it has come back and we've ended up having to reproduce in the way we suggested?


Yes, I've had clients instruct me to produce in thousand-page PDFs all merged together. And inevitably it comes back that the other side goes to the court and says this is unreasonable, and we have to reproduce in a new format with document-level breaks for that production. And when you receive a production like that, it's cumbersome as well, because now we have to either do manual document breaks or logical document breaks and split it up how we see fit. Or you have to go back to the other side and ask them to reproduce in a usable format.

What's good for the goose is good for the gander.
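The difference between one merged PDF and a document-level production can be sketched as a numbering exercise: every page still gets a sequential Bates number, but each document keeps its own begin and end range. The function below is only an illustration; the `ABC` prefix, the six-digit width, and the sample file names are invented for the example.

```python
def assign_bates_ranges(documents, prefix="ABC", start=1, width=6):
    """Number pages sequentially and return each document's Bates range.

    documents: list of (name, page_count) tuples in production order.
    """
    ranges = []
    next_number = start
    for name, page_count in documents:
        if page_count < 1:
            raise ValueError(f"{name} must have at least one page")
        begin = next_number
        end = next_number + page_count - 1
        ranges.append({
            "document": name,
            "begin_bates": f"{prefix}{begin:0{width}d}",
            "end_bates": f"{prefix}{end:0{width}d}",
        })
        # The next document starts on the page after this one ends.
        next_number = end + 1
    return ranges


if __name__ == "__main__":
    sample = [("email.pdf", 2), ("contract.pdf", 10), ("memo.pdf", 1)]
    for row in assign_bates_ranges(sample):
        print(row["document"], row["begin_bates"], "-", row["end_bates"])
```

With ranges like these, the receiving side can open, cite, and organize each document on its own instead of scrolling through thousands of merged pages.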


Yes. All right, here's the next one. I absolutely will never produce in native format. That just lets the other side manipulate those files, and they can change them however they want.


OK, so again, as someone advising the client: it's kind of a misconception that with native files they can do anything with that data. They can take a PDF and change a PDF as well, although it's difficult to do. Producing in native format is usually not recommended across the whole board, but there are going to be times when you want to produce certain files in native format. If you produce an MSG file, we have a copy of the original and a copy of the one that was given to the other side, so if the other side manipulates it, we can prove that. As for image format versus native, Excel is one of those file types. Excel files can be very large, and depending on how the print settings are set up in the file, if you try to print it, well, I've seen this happen multiple times: a one-page Excel file, the way the print was set up in the file itself, printed off 10,000 blank pages. Almost every cell was its own page. That's the way the format was set up for the printer. So now we have 10,000 blank pages, each with a page number at the bottom. You've just expanded a one-page document to 10,000 pages.

And so I think that's one of the benefits of doing an image production, where an image can mean a PDF or a TIFF image, although we're seeing more and more people want PDF files. You can put Bates numbers on every single page of an image.

Yes, that's very important.

You obviously can't do that with a native file. So how do you get around the Bates numbering issue with a native file? How do you identify what the Bates number is for that native file?


Usually what we do is slip-sheet that file. We give it a slip sheet with the Bates number that says, "This has been produced in native format, Bates number 001." If it's an MSG file, the native file is named with the same number as the Bates number, so you can link those together. We would have an image file with Bates number 001 saying it's been produced in native format, and an electronic file in the native folder called 001.msg, and it links it all together.

Yes, OK. And that lets you keep a Bates number organization, which people like, and understandably so. When you're referencing individual pages of documents in depositions and things like that, it certainly makes it a lot easier. And then, to your point, some files you just can't produce as images. I can't make a video file a PDF.


It's impossible. So you're going to have to produce that stuff natively. And if there is an all-native production, if that's what the request is, it's not the end of the world. It can still be dealt with if that's what they're going to fight for or fight over. The data manipulation thing, to me, is not an issue, because we would be able to tell, right?

But one thing I was thinking about that is an issue is the Bates number. Now you have two different people that have this native file, and that's fine, but say you're going to print that off to go depose somebody. Different programs print differently. You can have an email file that your system prints off as four pages and somebody else's system prints off as two pages. Now, when you're talking about that in a deposition, you're trying to correlate which page is correct, or you want to go back and find that page in your system.


It's just cleaner, I think, to produce page-level Bates numbers, which you can't do with a full native production.

Yeah. And I think more and more we're seeing hybrid approaches, where documents that can be converted to an image file are, and that's typically emails, Word files, PDFs, and things like that. And then there are usually specific file types where they say, "We want these in native, no questions asked": Excel files, and people often request PowerPoint files, largely because the comments and speaker notes, for instance, don't always show up when they're converted to images and people want to be able to see them. Database files, too. And then there's usually a sort of catch-all in the agreement that says any other files that can't be produced in image format will be produced in native. That gives you the cover-all, so if you've got a movie file and you have to produce it, you produce it in native.


That's how we typically see these things. It kind of gives you the best of both worlds, covers as many of the issues as you can, and gives you Bates numbering for the large part, right?
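The hybrid approach and the slip-sheet convention discussed above can be sketched together: files on an agreed native list are produced natively, named after their Bates number, with a slip sheet carrying that same number, and everything else goes out as an image. The extension list, function name, and slip-sheet wording below are assumptions for illustration; an actual production protocol would define them.

```python
from pathlib import PurePath

# Hypothetical protocol choice: file types produced natively instead of as images.
NATIVE_EXTENSIONS = {".xlsx", ".xls", ".pptx", ".mdb", ".mp4", ".mov"}


def plan_production(filename: str, bates_number: str) -> dict:
    """Decide how one file is produced and how it links to its Bates number."""
    ext = PurePath(filename).suffix.lower()
    if ext in NATIVE_EXTENSIONS:
        return {
            "format": "native",
            # The native file is renamed to its Bates number so the slip
            # sheet and the electronic file can be matched up later.
            "native_file": f"{bates_number}{ext}",
            "slip_sheet_text": (
                f"{bates_number}: This document has been produced in native format."
            ),
        }
    # Everything else is converted to page images that carry page-level Bates stamps.
    return {"format": "image", "native_file": None, "slip_sheet_text": None}
```

For example, `plan_production("Budget.XLSX", "ABC000001")` yields a native file named `ABC000001.xlsx` plus a one-page slip sheet bearing the same number, so the spreadsheet still has a place in the Bates sequence.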


Yeah. All right. Here's another one. 


I want to produce just the email but not the attachments, or vice versa.

Well, we have clients, and have had clients, tell me that's the way they want to produce. If I produce the email and I don't feel that the attachments are relevant, then we're going to split that family. In the industry, we'd call that a split family, where the email is relevant and the children are considered not relevant, or vice versa: there might be an attachment that somebody says is relevant, but they say the email is not. If it's just an issue of relevance, I would suggest that you produce entire families, because it saves you an argument later about why you didn't keep the family together and why one was relevant and one was not. If you produce an attachment without an email, that's probably going to be a little less obvious than if you produce an email that says "see attachments one, two, and three" and you don't produce any attachments. The other side's going to come back and say, "Hey, why aren't these attachments produced?" Then you have to explain that they're not relevant. "Well, if this is relevant, why isn't this?" So now you have an argument you have to make to the other side, and potentially the court if they go that route. So that's the risk of splitting up families. And if it's privileged, then you just slip-sheet it and say, "We're withholding this for privilege," but you're not hiding that it was there.

Or at the very least, if you're going to do this relevancy split because maybe there's confidential or personal information you've got extra concern about, you need to slip-sheet it to say, "We recognize it's here. We're not hiding that it's here. This is the reason we're not producing it." Correct? And to your point, particularly on the email level, you can usually see in the image file of the produced email the names of all the attached files.


So they're going to know that something's missing. And oftentimes, in the metadata that's produced for the attachments, it'll say, "Here's a parent that relates to this," and it'll point to something that doesn't exist.


So I think you're always better off either addressing that by agreement with the other side in your protocol or, if that's the position you're going to take, making sure that you slip-sheet the ones you're withholding. Or at the very least, if it's irrelevant but it's not damaging or confidential or any of those other issues, then just produce it and say, "We made the decision to keep the families together for consistency in the data, even though it's an irrelevant document." That's most often how it's done.
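A pre-production check for split families might look something like the sketch below: it flags any family member that is withheld without a slip sheet while another member of the same family is being produced. The status values (`produced`, `slip_sheeted`, `withheld`) and field names are hypothetical labels chosen for this example.

```python
from collections import defaultdict


def find_silent_family_gaps(documents):
    """Return (family_id, doc_id) pairs withheld with no slip sheet
    even though another member of the same family is produced.

    documents: list of dicts with 'id', 'family_id', and 'status', where
    status is 'produced', 'slip_sheeted', or 'withheld' (assumed labels).
    """
    families = defaultdict(list)
    for doc in documents:
        families[doc["family_id"]].append(doc)

    gaps = []
    for family_id, members in families.items():
        if not any(m["status"] == "produced" for m in members):
            continue  # nothing produced, so nothing points at a missing item
        for member in members:
            if member["status"] == "withheld":
                gaps.append((family_id, member["id"]))
    return sorted(gaps)
```

Anything the function returns is an item the other side is likely to ask about, so it should either be produced with its family or replaced by a slip sheet stating why it is withheld.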


Yes. OK, this is a big one. My clients are very concerned about, you know, the data that's being collected, where is it going to live? There's personal information on here. This is their family photos and, you know, login information, you know, where's this stuff going to be? Who's going to see it? How do I know that this isn't just going to be spread out there for everybody to see? Well, you know, that's a concern. And you know, the more and more we see in the news and data breaches and things like that, I see why people think that that's going to be an issue. 


For the most part, you know, it's kept on servers that are protected. And personally, it sounds kind of bad, but I don't care, and nobody here cares, what data is on there. We are set with a specific task: to collect the data and search it for certain criteria, and that goes to the attorneys.


The attorneys are the ones that see this data. We don't look at anything else that is outside of the scope or beyond that. So, you know, I understand the concern, and we recently had a client that refused to turn over cell phones. That's going to be an argument for the court because, you know, I'm sure the other side's going to want them, but they are refusing to do so until they have a court order. I mean, they know that it may be coming, but they're not going to voluntarily turn over certain data. And when you say you don't care, I think what you're saying is you've got too much stuff to do to be snooping through people's, you know, non-relevant data. Yes, there's no reason for us to be in that data. We usually have 10, 15, 20 cases going on at any one time.


We don't have the time or resources or inclination to try to go through somebody's personal data just for fun. Yeah. And I think when we get a chance to explain to people, here's the process: we're not just going in there and clicking, clicking, clicking, looking through things. We're using computer programs, encrypted software, encrypted file storage, password-protected access, all those things. And we're taking this systematic, logical approach to target relevant information, further filter, further filter, further filter, before anybody ever, you know, puts a set of eyes on things. That usually gives them a lot more comfort to be like, all right, someone's not just going to be looking through all my stuff.


There's a process here. And this is how it's done. And this is fairly regular. And it's not, it's not a snooping type of scenario, Right. 


OK. Here's another one. And I think this is maybe less of a client request than an issue to point out, and I think you talked about this a bit at the end of our last podcast: it's inevitable you're going to come across files that are corrupt, have some type of issue with them, might be password protected, might be old.


You know, we made a couple notes here of a couple things that you do come across, or some examples that we've come across. Oversized files: in particular, this shows up a lot, I'm thinking, in architectural drawings that may be converted to PDF files or may be in the original format, but they're designed to be large files because you need to be able to narrow in on small pieces of a big picture and still see it with clarity.


Talk about some of the practical issues that you have to deal with, and dealing with large files like that. There's multiple issues with drawing files. So I'm going to say drawing files, we do get some of them in PDF format. If you look at the properties of the PDF, it's going to be, you know, 24 inches or larger by, you know, they're formatted large because they're meant to be printed off on large scale so that you can read them. 


So once you create an image file for a production from something like that, it's going to be 8 1/2 by 11, which is what the system is going for. That's the standard size. So now when you go to blow it up, it's going to be hard to read. So that's, you know, the issue with the static image file when it's a large PDF or something like that. I've also run across, you know, large drawing files. You just have to be careful because a lot of these CAD system files have layers, just like an Excel workbook has sheets you can tab through, and you may only print the top layer. So now you're truly missing a few of the layers that may build upon that file. So, you know, that's one of the reasons why CAD drawings are usually exchanged natively. That's one of the file types, I think, that would definitely be fair game for suggesting that they be produced natively.
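
To put a number on the readability problem, here is a rough arithmetic sketch. The 36 by 24 inch sheet and the 9-point label are assumed values for illustration only (the discussion above says only "24 inches or larger"), and `fit_scale` is a hypothetical helper, not a function from any imaging tool.

```python
def fit_scale(src_w_in, src_h_in, page_w_in=11.0, page_h_in=8.5):
    """Scale factor that fits the whole sheet on one page, keeping proportions."""
    return min(page_w_in / src_w_in, page_h_in / src_h_in)


# Assumed example: a 36 x 24 inch drawing imaged onto a landscape letter page.
scale = fit_scale(36.0, 24.0)

# Assumed example: a label drawn at 9 points on the full-size sheet.
label_pt = 9.0
shrunk_pt = label_pt * scale

print(f"scale factor: {scale:.3f}")
print(f"a {label_pt:.0f} pt label becomes {shrunk_pt:.2f} pt on the image")
```

Under those assumptions the drawing is reduced to roughly 31% of its size, so a label that was comfortable to read on the plotted sheet ends up under 3 points on the letter-size image.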


And I think we've come across and recommended to clients, you know, look, if you've got a large amount of drawing files segregated, or video files or photographs, something like that, where you're like, look, you know they're relevant and, for instance, you're not concerned that there's privileged content in them.


We don't necessarily need to run this through the traditional e-discovery processing that would incur charges and things like that to make them searchable. They are what they are, and there may be faster, more efficient, cost-effective ways of getting them organized and numbered and added to a production file like we discussed, without having to process a large amount of data, where the cost is going to be driven by file size. Yes, that's definitely an approach to help the clients.


Yeah. We've also come across old PDF files where, you know, they view OK, but they don't print. 


All right. And the printing in this case is relevant when we're converting these to be a uniform production to go out. Essentially, when we convert them to image files, it's the same basic process that a print function would use. But instead of going to a printer, it goes to a file.


So you want to just talk about that scenario for a second. And it's just an example of some of the stuff we run across. Yeah, I mean you know it's, it was a PDF file, a particular client had multiple PDF files within a data set. 


You would think PDF is PDF, but unfortunately it was being created on an old scanner. I believe that, you know, it was version one of PDF or Adobe. So it was creating these files that had something embedded in them that the newer version did not like and was reading as just a big black box. So every time we would print, it would print a big black box over certain places on the page. It wasn't consistent. It was just something weird in the file. So, you know, it goes back to when I say no data set is the same. I think I said that in the last podcast. Something as simple as, hey, a PDF is a PDF? No, you know, everything is different and you just have to be aware that there are issues with everything and it's not 100% right. And this goes back to that.


There's manual steps that are always going to be involved. It's inevitable. Yes. All right, well, here's another request. 


I want you to use OCR on my documents and then dedupe based upon the OCR. That one we get a lot. So, a few things with that. One is that OCR is not perfect either.


OCR is run across static images to, you know, recognize the text that is on the page and create a text file so that you can search it. For one, each OCR engine is going to treat it differently. But even if it's just us taking this document and OCRing it with the same engine, it could still pick up letters differently depending on the way that it was scanned. So, you know, even if it was the same page, it could have multiple characters that are different from another copy of that page that may be duplicated further down in the documents. So it's a tool, and there is near-dupe analysis that we can look at and say, OK, technically there's, you know, an algorithm-based method to say these ones are similar.


You can set the similarity threshold. So you say, if it's 80% similar, I want to see them. If it's 95% similar, I want to see them. So you can say they are similar. But right now there's no practical way, and it's not accepted in the industry, to say that we're going to use OCR to do a deduplication. It's really just on that hash value right now, which is what gives you accurate deduplication. Correct. It's going to be based on, you know, the metadata of the document.
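
A small sketch of why OCR text is a poor basis for exact deduplication. The two strings are made-up OCR outputs of the same page, SHA-256 stands in for whatever hash a given platform uses, and `difflib.SequenceMatcher` is only a stand-in for a real near-duplicate engine.

```python
import hashlib
from difflib import SequenceMatcher


def text_hash(text):
    """Hash of the extracted text; any single-character change alters it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def similarity(a, b):
    """Rough 0..1 similarity score, standing in for near-dupe analysis."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical OCR output from two scans of the same page. The second scan
# misreads two zeros as the letter O and the final digit 1 as a lowercase l.
scan_a = "Invoice 1047: total due 3,250.00 by March 1"
scan_b = "Invoice 1O47: total due 3,25O.00 by March l"

exact_duplicate = text_hash(scan_a) == text_hash(scan_b)
score = similarity(scan_a, scan_b)
near_at_80 = score >= 0.80
near_at_95 = score >= 0.95

print("exact duplicate by hash:", exact_duplicate)
print(f"similarity score: {score:.2f}")
print("flagged at 80% threshold:", near_at_80)
print("flagged at 95% threshold:", near_at_95)
```

The hash comparison treats the two scans as different documents, while the similarity score surfaces them for a person to look at, and whether they surface at all depends on where the threshold is set. That is the gap between near-dupe analysis as a review aid and hash-based deduplication as something you can rely on.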


All right, here's another one. I, I want you to just export the documents and I'll make some final changes to them before I produce them to the other side. 


Yes, we can set up a document set, so to speak, in the review platform and I can give you access to it. I would advise against going ahead and making changes once we produce, because, one, I don't know what changes you made, and if we come back to that, it's going to be a mess to try to figure out what happened to a document production after I created it, exported it, and gave it to you. Whatever you did to it beyond that is going to be extremely difficult to trace. Not to mention that, you know, if you're removing stuff and not logging it, now we have gaps in Bates numbers, or, if it's at the end of the production, I can use a Bates number that is inappropriate for the next production.


There's a multitude of things that can create issues that just expand upon themselves. So, you know, now we have missing Bates numbers, and then we use a wrong Bates number, and then we do a couple more productions. Now we're, you know, three or four productions in and nothing's lining up.
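
The two failure modes above can be sketched with a few lines of bookkeeping. The `ABC` prefix, the six-digit padding, and the removed documents are all hypothetical; the point is only to compare what the vendor's log says went out with what was actually delivered after unlogged edits.

```python
import re


def bates_number(label):
    """Numeric part of a label such as ABC000123 (prefix format is assumed)."""
    match = re.fullmatch(r"[A-Za-z]+(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized Bates label: {label!r}")
    return int(match.group(1))


# What the vendor exported and logged: ABC000001 through ABC000010.
logged = [f"ABC{n:06d}" for n in range(1, 11)]

# What actually went out after the client quietly pulled two documents.
removed_by_client = {"ABC000004", "ABC000010"}
delivered = [label for label in logged if label not in removed_by_client]

# Numbers the log says were produced but the other side never received.
unlogged_gaps = sorted(set(logged) - set(delivered))

# Where each party believes the next production should start.
vendor_next = max(bates_number(label) for label in logged) + 1
apparent_next = max(bates_number(label) for label in delivered) + 1

print("unlogged gaps:", unlogged_gaps)
print("vendor would start next production at:", vendor_next)
print("delivered set suggests next number is:", apparent_next)
```

One removal in the middle leaves a hole nobody documented, and one removal at the end means the two sides no longer agree on the next starting number. Rerunning the production on the vendor's side keeps the log and the delivery identical.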


It just creates more of a headache for the client than it's worth. And it's just easier to do it correctly the first time. You know, I've had clients say, OK, well, I shouldn't have left this document in. I'm just going to pull it out. No, don't do that. Just let me pull it out on our end, reclassify it, and re-export so that we have consistent Bates numbers.


Yeah. And, and I think maybe they think that they're saving, you know, costs or time by doing it that way. They're, they're just, they're creating three times as many problems further down the road. 


It's fine if you want an export and work off the export to say, well, I've identified 10 documents I've decided not to produce. That's OK, just tell us and let us rerun the production for you so that it's consistent. And like we talked about earlier, it's documented. 


The process is there. We have a clean delineation of what was produced and when it was produced and why it was produced, and if we need to pick up from there and start again, it's all clean. Otherwise, there's just a lot of unraveling, and the further that time expands between when you did your last production and when you're expecting to do the next one, the more difficult it gets. Correct.


This is another one that goes back to a case we worked on. Why is my database so large? You know, and I'll just give a little background on this. The client had agreed to a production format before we were ever involved and then came to us with the output of this production and said, I need you guys to host this, which we did.


But it ended up being I think almost two terabytes of data for a relatively small production, maybe 100,000 documents or something like that. It wasn't, it should not have been that big. 


And they were mad that, you know, it was costing them as much money as it was to host it, because we charge on a per-GB rate.


And then, you know, they said, why is my database so big? You know, it shouldn't be this big for 100,000 documents. And what did it turn out to be? Well, that one ended up being that they agreed to produce everything in color. And not only that, it was color TIFF files. So, you know, if you're producing color and it's a JPEG or PDF, they get compressed a little bit so they're not quite as large. But these were color TIFFs, and every single page was produced as a color TIFF. Even if it was a black-and-white email, it was still a color TIFF. So it just exploded the size of that database compared to what it should have been. I mean, I remember back on that case, I think we estimated at some point that if it had been produced in what would be a normal or reasonable format, the thing would have been closer to 50 gigabytes instead of 2000 gigabytes.
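
A back-of-the-envelope sketch of why color matters so much. It assumes letter-size pages imaged at 300 dpi and compares raw, uncompressed pixel data only; real TIFF sizes depend on the compression used, so these are order-of-magnitude figures. The per-GB hosting rate is a made-up number for illustration, while the 2000 GB and 50 GB figures are the ones quoted above.

```python
def raw_page_bytes(bits_per_pixel, dpi=300, width_in=8.5, height_in=11.0):
    """Uncompressed size of one page image, ignoring file headers."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


color_bytes = raw_page_bytes(24)  # 24-bit color
bw_bytes = raw_page_bytes(1)      # 1-bit black and white
raw_ratio = color_bytes / bw_bytes

# Figures quoted in the discussion: about 2000 GB as produced, about 50 GB
# estimated for a normal format.
actual_gb, expected_gb = 2000, 50
reported_ratio = actual_gb / expected_gb

# Hypothetical hosting rate, purely to show how per-GB billing scales.
rate_per_gb_month = 10.0
actual_cost = actual_gb * rate_per_gb_month
expected_cost = expected_gb * rate_per_gb_month

print(f"raw color page: {color_bytes / 1e6:.1f} MB")
print(f"raw black-and-white page: {bw_bytes / 1e6:.2f} MB")
print(f"raw color to black-and-white ratio: {raw_ratio:.0f}x")
print(f"reported database ratio: {reported_ratio:.0f}x")
print(f"monthly hosting at assumed rate: {actual_cost:.0f} vs {expected_cost:.0f}")
```

Before any compression, a color page carries 24 times the data of a black-and-white page, which is the same order of magnitude as the 40x difference described in the case. With per-GB billing, the hosting cost scales by the same factor.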


So it exploded in size. I can't remember the exact numbers, but it was a substantial amount of data that was being used for no reason. And I'm sure that somebody said, oh, you know, color is going to be important in this case for whatever reason, and then made a blanket decision that had this effect without knowing what the potential effect was.


I'm sure there would have been ways to work around that. And I've even seen protocols that address it. Yeah, that's the new standard in the protocol: basically, you know, you're going to produce everything as black-and-white TIFF. And if the other side says, hey, I think this is something that needs to be seen in color, then, you know, you honor that request. It's going to save you a lot of time and a lot of money in producing, and in volume as far as GB size. And then, you know, you just have to have it set up so that you're going to honor any reasonable request to go ahead and produce that in color.


All right, so for the last bit of our podcast discussion, I just want to turn our attention to where things are going. And I think a good way to see where things are going is to talk about where things have come from.


And yeah, we discussed this a bit, you know, what the evolution of e-discovery has been since we've been involved in it, and we've been involved in it since the term e-discovery was coined.


So from the very beginning, and I remember the days, and I'm sure Bill does as well, where, you know, e-discovery was ridiculously expensive compared to what it is now. There weren't a whole lot of software options, and the amount of data we were dealing with was a lot smaller than the amounts of data we're dealing with today. So a few bullet points, I guess, that I would ask you, Bill: what do you think are some of the bigger evolutions of e-discovery from those early days to where we are now? What do you think has changed probably the most? Well, you know, a bunch of stuff has changed, but the speed at which we can process the data is definitely increasing. The rate of errored files has gone down. You know, how we're storing it, the cost of storage, that has all come down. So, you know, it's definitely made strides.


But you know, the review platform is also much better and more user friendly than it was when we first started, so that, you know, attorneys can filter through their data sets a lot quicker. Do you see much of any on-premises servers and software and things like that, or is everything in the cloud at this point? I would hesitate to say everything is there. We still have clients that have AOL accounts, so nothing's unreasonable to think, but for the most part everything is moving to the cloud. None of the old big server rooms. Yeah, and even on the e-discovery side, I mean, software, tools, access, all the rest of that, I think it has shifted; 99% of it is online. Definitely. And it's good in the sense of the ease of accessibility, the ease of sharing things, and updates: having the latest software and access to more processing power and things like that are available.


You know, people have their fears about, you know, accessibility, hacking, and all that, but it's been done that way now for 10, maybe 15 years, and you don't hear about a lot of data breaches, just because, you know, these companies are on top of making sure security is a primary focus of what they do. Yes. So, you know, those are some things that have changed significantly. Things have gotten faster, prices have come down. We can get through a lot more data quickly, with fewer errors. And along with that comes the misconception that everything, you know, can be done with the push of a button and turned around in an hour or two.


That's not true. You need to put thought behind what you're doing. You know, sometimes things move very smoothly and they are fast. 


Sometimes they don't. Regardless, you want to leave yourself some wiggle room time-wise, just to make sure that you're accounting for the inevitable bumps in the road that come with dealing with this stuff.


Definitely. And you know, that goes with every case. Like, we may be able to get it done faster, but, you know, plan for the worst just to make sure, because it's not 100% accurate. It's not that easy to just push a button and it's done.


There is some definite effort involved, and processing, and a thought process that goes behind how you deal with this stuff and what the best way is to get it corrected. And I think the last thing I want to talk about today is where is it going? You know, AI is infiltrating everywhere now and it's not going away anytime soon. And I think AI may have the potential to have a big impact in e-discovery. Now, most people probably have heard about lawyers getting in trouble with AI by using it for legal research or having it write briefs, and AI creating its own cases and stuff like that, where they totally, 100% relied on AI and just handed it in.


That's not good lawyering. But you know, you, there are limitations to things right now. But where I think AI might have a big impact is in the attorney's abilities to interact with an interface that doesn't require a higher level of knowledge of how to run searches. And that's, that's one of the possible issues right now, with you needing a little bit of proficiency to do your own searching. 


We do a lot of searching for our clients. They tell us what they want. We say, OK, based on what you're telling me, here's how we're going to search it. We search it, we set up the results, they look at it. 


Some attorneys have gotten better at running those searches themselves. They're more comfortable with those kinds of interfaces and can do that a bit more. There's others that aren't. I think AI has the potential to bridge the gap in the search requests, where you can talk in more of a natural-language type of request and have the search engine go look for information for you using very powerful analytics tools and search terms and combinations of all the above. From my point of view, I think of that as potentially one of the more impactful ways that AI infiltrates our space.


That's going to be one of them. I think we're a far way away from AI being accepted as the determiner of relevancy for production. Predictive coding and technology-assisted review have now been around for a long time. They never got quite the traction that they were supposed to get at the time they came out. And as a real quick background, the premise behind those is that by lawyers reviewing a small set of sample docs, a computer can then take the analysis that the lawyers put into those relevancy calls and apply it to a larger set to say, all right, based on what you told us.


Here's other documents that are probably relevant and should be produced. And it makes sense: you're sort of taking this hybrid approach. Put the human brain in there, but let the computer do the heavy lifting. Yet we don't come across that very often at all. And I think part of the reason is that people don't want to just turn over, you know, the decision of relevancy to a black box, understandably so.
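
The premise described above (reviewers code a small seed set, and the computer applies those calls to the rest) can be illustrated with a toy word-frequency scorer. This is a simplified naive-Bayes-style sketch with made-up documents, not how any commercial predictive coding or technology-assisted review product works, and every name in it is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(seed_docs):
    """seed_docs: list of (text, is_relevant) pairs coded by a reviewer."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in seed_docs:
        counts[is_relevant].update(tokenize(text))
    return counts


def relevance_score(counts, text):
    """Sum of log-odds per known word; above zero leans relevant."""
    vocab = set(counts[True]) | set(counts[False])
    total_rel = sum(counts[True].values())
    total_not = sum(counts[False].values())
    score = 0.0
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no signal
        # Add-one smoothing so an unseen word/class pair is not zero.
        p_rel = (counts[True][word] + 1) / (total_rel + len(vocab))
        p_not = (counts[False][word] + 1) / (total_not + len(vocab))
        score += math.log(p_rel / p_not)
    return score


# Made-up seed set with the reviewer's relevancy calls.
seed = [
    ("pump warranty claim denied by vendor", True),
    ("warranty dispute over defective pump", True),
    ("lunch order for friday team meeting", False),
    ("holiday party schedule and parking", False),
]
model = train(seed)

likely_relevant = relevance_score(model, "vendor response on pump warranty")
likely_not = relevance_score(model, "parking for team lunch")
print(f"{likely_relevant:.2f} {likely_not:.2f}")
```

Even in a toy like this, the output is a score rather than an explanation, which hints at why handing the relevancy decision to a far more complex model can feel like trusting a black box.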


And they don't, it's not something they can practically understand on a deep level. And so therefore they shy away from it. And so unless you're in a really big case with millions and millions of documents where the cost savings are too big to ignore, people tend to shy away from that type of technology. AI is even more so like that, I think, where you know, you're not going to know how it's doing its own processing, you're just going to have to accept the results. So I don't think on a production standpoint or relevancy that will be there anytime soon. But for you to be able to query your own database and say, show me something that deals with this or that and or has this importance level, that could be very valuable because you know you would be doing the same thing essentially running your own searches. 


You're just letting a search engine do it for you. So I think that's probably where the impact comes into play here on AI. The processing speeds are always going to get faster and faster and make it easier to process data and get through stuff. But I think the ability to translate legal strategy into technical workflow and, you know, an actual working plan is still going to require somebody with an understanding and level of expertise that somebody like yourself has gained over years of doing it, and that's probably not going away anytime soon. So unfortunately for you, Bill, you're stuck in the e-discovery world for a while longer.


So unless you have anything else to add, I think we're at the end of our podcast. I don't think so, not at this time. All right, well, we hope this was useful information. 


We thank everybody for listening. Bill, thank you for joining us on this and on the e-discovery side. Again, this is Data Discourse, practical advice and insights about digital forensics and e-discovery.


We hope to see you in future episodes. Thanks everyone. Thank you.