Understanding Quality Measurement in the Legal Environment - by R. Sam Gilcrist
NALSM VIEWS, National Association of Litigation Support Managers, Spring 2004

As case complexity increases, outsourcing (arguably) enables law firms to improve efficiency, decrease costs, and draw on specialized skills that may not be available on smaller cases. On the other hand, firms lose the craftsman's control they exercise over smaller, less complex matters. To maintain control of these complex, often partially outsourced cases, suppliers and consumers of legal services need a way to measure the quality of their work. This lets firms capture the efficiencies of outsourcing while maintaining the level of excellence that clients expect. To this end, I present one simple, tested method for measuring quality that is available to both the managing attorney and the supplier of legal work. It produces a fact-based, reproducible measurement that shows whether the outsourced product meets the agreed standard of excellence or is substandard and needs to be reworked.

Before we start, I would like to address attitude and the desire to excel. I doubt there is a provider of legal data who knowingly produces anything less than perfect workmanship. Yet in this imperfect world, errors are inevitable: people fall asleep, and machines misread text. Because errors are inevitable, legal professionals need practical quality measurement techniques to understand the true quality of the work they produce and use.

There are a number of ways to do this wrong, all generally filed under the category of Quality Control, or "QC" as it is commonly known. Under this banner, well-intentioned firms advocate and implement expensive, inadequate, and occasionally frivolous document inspections, data checks, and general hand-wringing that achieve no defensible objective. One very common form of nonproductive QC is to check "a bunch" of the work to make sure it is right. Worse is the firm that employs a specialized person to check all or some of the work without any prescribed technique for deciding what to inspect, how much to inspect, and whether to accept or reject the work in front of them. In both cases, time, effort, and money flow down the drain, no value is added to the product, and no meaningful work is done.

Having taken this shot at my well-intentioned colleagues, it is time to describe a system that data providers and consumers can use to prove that they have produced the high-quality work promised. Additionally, where conflict arises, it can be used to defend or refute work quality either in court or at the bargaining table. Having promised to deliver, let me introduce - or perhaps reintroduce - the legal industry to one of the oldest, simplest methods of measuring work quality, namely MIL-STD-105.

MIL-STD-105 (pronounced "mil standard 105") has been a manufacturing standard since World War II. Because it is straightforward and easy to implement, it maintains a stalwart position in modern MBA operational science courses, alongside the more complex statistical theory that makes this simple system work.

MIL-STD-105 has two defining features:

1. It is easy to use.
2. It is repeatable and defensible.

Originally the standard was designed to create a non-arbitrary, meaningful and repeatable method of determining whether products supplied to the U.S. Armed Services met agreed quality standards. As in many situations, the services reduced the complexity of quality measurement into two simple tables that are both easy to use and technically concise. However, to implement this standard, several terms need to be defined.

  1. ACCEPTABLE QUALITY LEVEL (AQL). This is the standard of perfection on which both the consumer and the producer agree. It starts as a percentage of expected quality, such as 97% to 99.999% compliant, and is then translated into the number of defects (or errors) allowed in the inspected sample before the whole batch is sent back to the producer for rework (presumably at the producer's expense).

  2. ATTRIBUTES. This is probably the most misunderstood term in quality measurement. Attributes are the measurable features that define the product. In the case of a 12-foot-long 2x4, measurable attributes would include the height of the board (2 inches), the width of the board (4 inches), and its length. In the legal world, measurable attributes might include coded fields, extracted data, or text legibility.

  3. DEFECTS: A defect is any deviation outside the standard set for an attribute. In the 2x4 example, the actual standard may state that heights greater than 1.75 inches or less than 1.5 inches are not acceptable (i.e., are defects). In legal work, a defect could be a misspelled name, a missing field entry, an incorrectly entered data item, or an image that is inaccessible or unusable.

  4. LOT SIZE: Lot size is the quantity of items produced. It might be 130,000 pages imaged or 50,000 docs coded.

  5. SAMPLE SIZE: This is the quantity of items to be measured. It is determined mainly by the lot size and the inspection level; the AQL then determines how many defects that sample may contain.

  6. ACCEPT / REJECT CRITERIA. This is both the simple strength of MIL-STD-105 and the characteristic that allows this standard to withstand cross-examination. Based on the AQL established during negotiations, it states explicitly how many defects the sample from an acceptable lot of data, images, or other work may contain without losing confidence that the job is as good as expected or, perhaps, as good as humanly possible. Conversely, it defines the point at which statistical confidence is lost and the job cannot be accepted. Rigid, fact-based accept/reject criteria, along with predetermined sample sizes, distinguish MIL-STD-105 from the well-intentioned QC programs described earlier.

With this standard, there are no do-overs and no "maybe I should inspect a few more." The test is statistically sound and designed so that a minimum number of samples (i.e., a known, minimum cost) can provide an accurate and meaningful description of the overall quality of the work on hand. This does not mean that attorneys will not debate the issue. (Heaven forbid, for those of us who support you.) Rather, it provides a concrete, reproducible test that both the consumer and the provider of data can use to assure themselves that they are producing the quality of work expected and advertised.

Implementing MIL-STD-105, A Case Study
Having reviewed the merits of using MIL-STD-105, I would like to create a simple, realistic case study that we can use to learn how to implement the tool. As an example, let's assume the following:

  1. This is a coding job with 5 fields to be coded per document.
  2. There are 15,232 documents to be coded.
  3. The agreed AQL is 99.85%. That is, there can be no more than 15 documents with one or more incorrect entries per 10,000 documents, which corresponds to a defect rate of no more than 0.15% (100.00 - 99.85).
  4. Finally, let's assume that we are dealing with a supplier, or internal department, that has a history of good, but not perfect, work.
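The arithmetic in item 3 can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and `Decimal` is used because ordinary binary floating point does not represent 100 - 99.85 as exactly 0.15.

```python
from decimal import Decimal

def max_defect_rate(accuracy_pct: Decimal) -> Decimal:
    """Maximum defect rate (in percent) implied by an agreed accuracy (in percent)."""
    return Decimal("100") - accuracy_pct

def allowed_defective(per_units: int, defect_rate_pct: Decimal) -> Decimal:
    """Number of defective items allowed per `per_units` items at the given rate."""
    return per_units * defect_rate_pct / Decimal("100")

rate = max_defect_rate(Decimal("99.85"))
print(rate)                             # 0.15
print(allowed_defective(10_000, rate))  # 15.00
```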

Using our example, before we review the first item, we need to make a tactical choice: do we count each field individually, or do we count documents? In the first case, our batch size would be 5 fields times 15,232 documents, or 76,160 items. In the second, the batch size is 15,232 documents. In our example, I am really interested in how well each document is coded, so I will treat the document pool as the batch. Having made this decision, I must look at each sample document in its entirety, meaning that we as the inspection team need to verify that each of the 5 required fields is coded correctly. If any field is coded incorrectly, the document fails; if all are correct, the document passes.

Similar reasoning could be used to treat each field as a single item. In that case, one incorrect field would be one error in a batch of 76,160.
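The two ways of counting can be made concrete with a small sketch. The sample data here is hypothetical: each sampled document is a list of five True/False flags, one per coded field, where False marks an incorrectly coded field.

```python
FIELDS_PER_DOC = 5
DOCUMENTS = 15_232

def lot_size(count_fields: bool) -> int:
    """Lot size when each field is an item (True) or each document is an item (False)."""
    return DOCUMENTS * FIELDS_PER_DOC if count_fields else DOCUMENTS

def count_defects(sample: list[list[bool]]) -> tuple[int, int]:
    """Return (defective documents, incorrect fields) for a sample.

    A document is defective if any one of its fields is wrong.
    """
    bad_docs = sum(1 for fields in sample if not all(fields))
    bad_fields = sum(fields.count(False) for fields in sample)
    return bad_docs, bad_fields

# Hypothetical sample of three documents; the second has two wrong fields.
sample = [
    [True] * 5,
    [True, False, True, False, True],
    [True] * 5,
]
print(lot_size(count_fields=False), lot_size(count_fields=True))  # 15232 76160
print(count_defects(sample))                                      # (1, 2)
```

Under document counting, the second document is one defect in a lot of 15,232; under field counting, it contributes two defects in a lot of 76,160.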

Determining the Correct Inspection Level
Table I: Sample size code letters

To determine how many documents to inspect, we need to establish our inspection level. In Table I, you will notice a variety of inspection levels, including three general inspection levels and four special levels. As in many cases, we will start in the middle and work outward. According to the American Society for Quality (ASQ), general level II, the standard's default level, is appropriate for unknown suppliers and for suppliers of modest quality, so we will use it in our example, starting with normal inspection. However, the standard is designed to be fluid: the severity of inspection changes as the quality of the work changes. For example, if 10 consecutive lots are accepted under normal inspection, inspection can be relaxed to reduced inspection, which uses smaller samples (the standard attaches further conditions to this switch). On the other hand, if 2 of 5 consecutive lots are rejected, inspection is tightened until 5 consecutive lots are accepted. Finally, if any lot is rejected under reduced inspection, inspection automatically returns to normal. The special levels, S-1 through S-4, produce very small samples. In our setting we would rarely use them: a job small enough that a tiny sample seems adequate would probably be done in-house and reviewed in its entirety rather than sampled.
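The switching rules can be sketched as a small state machine. In the standard these rules move between normal, tightened, and reduced inspection. This sketch is a simplification under stated assumptions: each lot is treated as simply accepted or rejected, and the standard's additional conditions for moving to reduced inspection are ignored. The class and constant names are hypothetical.

```python
from collections import deque

NORMAL, TIGHTENED, REDUCED = "normal", "tightened", "reduced"

class SwitchingRules:
    """Simplified normal/tightened/reduced switching, one lot result at a time."""

    def __init__(self) -> None:
        self.severity = NORMAL
        self.window = deque(maxlen=5)  # last 5 lot results under normal inspection
        self.streak = 0                # consecutive accepted lots at current severity

    def record(self, accepted: bool) -> str:
        """Record one lot's result and return the severity for the next lot."""
        if self.severity == NORMAL:
            self.window.append(accepted)
            self.streak = self.streak + 1 if accepted else 0
            if self.window.count(False) >= 2:   # 2 of 5 consecutive lots rejected
                self._switch(TIGHTENED)
            elif self.streak >= 10:             # 10 consecutive lots accepted
                self._switch(REDUCED)
        elif self.severity == TIGHTENED:
            self.streak = self.streak + 1 if accepted else 0
            if self.streak >= 5:                # 5 consecutive lots accepted
                self._switch(NORMAL)
        elif self.severity == REDUCED and not accepted:
            self._switch(NORMAL)                # any rejection ends reduced inspection
        return self.severity

    def _switch(self, severity: str) -> None:
        self.severity = severity
        self.window.clear()
        self.streak = 0

rules = SwitchingRules()
history = [True] * 10 + [False]
print([rules.record(ok) for ok in history][-2:])  # ['reduced', 'normal']
```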

Knowing the batch size (15,232) and our inspection level (general inspection level II), we use Table I to find the sample size code letter. Following down the left-hand column, we find that 15,232 falls between 15,001 and 500,000. Reading across the top of the table, we find general inspection level II. At the intersection of our row and column, we find the code letter "P." So how many is "P"? To answer this, we need one more lookup. With "P" written on the back of our hand, we go to Table II-A to translate the code letter into an actual sample size. Following down the left-hand column to "P," we find just to its right that we need to inspect 800 documents.
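The second lookup, from code letter to sample size, is a fixed mapping. The sketch below encodes the single-sampling sample sizes listed for each code letter in MIL-STD-105E Table II-A (normal inspection); verify the values against the edition of the standard you are using. The first lookup, from lot size and inspection level to a code letter, is left to Table I, and the names here are illustrative.

```python
# Single-sampling sample sizes by code letter (MIL-STD-105E, Table II-A).
# The letters I and O are not used as code letters.
SAMPLE_SIZE = {
    "A": 2, "B": 3, "C": 5, "D": 8, "E": 13, "F": 20, "G": 32, "H": 50,
    "J": 80, "K": 125, "L": 200, "M": 315, "N": 500, "P": 800, "Q": 1250,
    "R": 2000,
}

def sample_size(code_letter: str) -> int:
    """Convert a Table I code letter into the number of items to inspect."""
    return SAMPLE_SIZE[code_letter.upper()]

print(sample_size("P"))  # 800
```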

At this point we can see why "choose a few" is totally inadequate as a QC measure. In my experience, very few firms would actually review 800 documents to prove they are really 99.85% accurate; most would review a few score and call it a day. Yet to achieve the accuracy demanded, 800 is the required sample size. On the other hand, other firms might attempt to inspect all 15,000-plus documents. Eight hundred is far fewer than 15,000, and for a number of reasons, chiefly inspection fatigue, the smaller inspection is just as reliable.

Having learned our sample size, and knowing that we are looking for 99.85% accuracy (a 0.15% defect rate), we read across Table II-A to the 0.15 column and find that in those 800 documents we can find as many as 3 with errors and still accept the job. If we find 4 or more, we cannot state with any certainty that the job is good enough, and the whole job needs to be redone and resubmitted. Arrows pointing up or down in the table mean that if there is no number in our row, we follow the arrow up or down to the nearest accept and reject numbers provided (and use the sample size that goes with them). This allows one simple table to handle the widest possible range of AQLs and sample sizes and still remain uncluttered.
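Why a sample of 800 with an acceptance number of 3 is a meaningful test, and why "a few score" is not, can be seen from the plan's operating characteristic: the probability of accepting a lot as a function of its true defect rate. This sketch is an illustration rather than part of the standard. It uses the binomial distribution as an approximation (strictly, drawing without replacement from a finite lot is hypergeometric), and the 50-document, zero-error spot check is a hypothetical comparison.

```python
from math import comb

def prob_accept(n: int, ac: int, p: float) -> float:
    """Chance that a random sample of n items shows at most `ac` defectives
    when a fraction p of the lot is defective (binomial approximation)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(ac + 1))

# Plan from the case study: inspect 800, accept on 3 or fewer defectives.
for p in (0.0015, 0.005, 0.01):
    print(f"defect rate {p:.2%}: accepted with probability {prob_accept(800, 3, p):.3f}")

# Hypothetical "look at a few" check: 50 documents, accept only if none are wrong.
print(f"50-document spot check at 1.00%: {prob_accept(50, 0, 0.01):.3f}")
```

Run as written, the 800/3 plan accepts a lot running at the agreed 0.15% defect rate most of the time but rarely accepts one running at 1%, whereas the hypothetical 50-document check passes a 1% lot more often than not.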

Inspecting the Job
Knowing our sample size, the job of inspecting is very simple.

  1. Grab 800 documents at random. Review them to make sure that each document is coded correctly.
  2. Record the number of error-free documents, and record the number of documents with errors.
  3. Compare the results to the requirements of Table II-A.
  4. Accept or reject the job. That is, if 3 or fewer documents with errors are found, correct the errors (of course) and accept the job; otherwise, reject the job and send it back to be reworked and resubmitted.
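Steps 2 through 4 reduce to counting and one comparison. A minimal sketch, assuming each inspected document has already been marked True (has at least one error) or False (error-free); the function and field names are hypothetical, and the accept and reject numbers are the case study's 3 and 4.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    inspected: int
    defective: int
    accepted: bool

def decide(has_error: list[bool], sample_size: int,
           accept_max: int, reject_min: int) -> Decision:
    """Single-sampling accept/reject decision.

    has_error: one flag per sampled document, True if it has any coding error.
    """
    if len(has_error) != sample_size:
        raise ValueError(f"expected {sample_size} inspected documents, got {len(has_error)}")
    if reject_min != accept_max + 1:
        raise ValueError("single sampling requires reject_min == accept_max + 1")
    defective = sum(has_error)
    return Decision(len(has_error), defective, accepted=defective <= accept_max)

# Case-study plan: 800 documents, accept on 3 or fewer, reject on 4 or more.
print(decide([True] * 3 + [False] * 797, 800, 3, 4).accepted)  # True
print(decide([True] * 4 + [False] * 796, 800, 3, 4).accepted)  # False
```

The length check guards against quietly inspecting fewer documents than the table requires.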

At this point we have completed our review of MIL-STD-105; however, two matters remain: gathering a truly random sample and keeping records. Our first impulse may be to simply grab 800 documents, but this almost always favors some attribute or person (like sampling only the file boxes on top and in the aisle). To prevent this unintentional skewing of the results, there are a couple of simple ways to get random numbers. Before computers, random number tables were commonly available (perhaps they still are). Today, with computers on almost every desktop, generating a custom random list is a fairly simple task that we can do ourselves.

Creating a Random Number Table
There are a number of ways to create random number tables, but you can use the steps below to create a custom random number table in MS Excel that, admittedly, is not perfect but is certainly good enough. Let's begin.


  1. Type the random number function "=rand()" into the first cell (A1). This generates a number between 0 and 1.
  2. In cell B1, scale A1 by our lot size of 15,232, round down, and add 1 (=INT(A1*15232)+1) to get a whole document or record number in the range 1 to 15,232.
  3. Copy A1 and B1, and highlight down 800 cells to get 800 random numbers between 1 and 15,232.
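For readers working outside Excel, a similar list can be generated with Python's standard `random` module. This sketch differs from the spreadsheet method in two deliberate ways: `random.sample` draws without replacement, so no document number appears twice (the spreadsheet method can repeat a number), and a recorded seed makes the draw reproducible. The function name and the seed value are placeholders.

```python
import random
from typing import Optional

def draw_sample(lot_size: int, sample_size: int, seed: Optional[int] = None) -> list[int]:
    """Return `sample_size` distinct document numbers from 1..lot_size, sorted."""
    rng = random.Random(seed)  # dedicated generator; the same seed gives the same draw
    return sorted(rng.sample(range(1, lot_size + 1), sample_size))

doc_numbers = draw_sample(15_232, 800, seed=12345)  # placeholder seed
print(len(doc_numbers), doc_numbers[:5])
```

Sorting the result also takes care of ordering the list to match the documents.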

We now have a valid random number table. The remaining steps are optional but helpful in creating a functional spreadsheet that we can use as an inspection document:

  1. Sort the list so that the numbers are in the same order as the documents. This will make pulling the documents a lot easier. To do this, we need to turn the auto-calculate option off, or else we will simply get another unsorted list of new random numbers. (I did this a couple of times while writing this paper, so I know from experience.) To toggle auto-calculation, go to Tools>Options>Calculation and set the option to Manual.

  2. After turning auto-calculation off, the list sorts correctly; however, the list will still regenerate itself each time we reopen the spreadsheet or press F9. This destroys any traceability, which we will need if we ever go back and review our inspection results. To freeze the record numbers or DocIDs, one simple solution is to export the values to CSV format and then re-import them into Excel or Access. This removes all references to the RAND function and makes the numbers permanent. (Alternatively, we could set Excel not to recalculate before saving, but this strategy is too risky for records we are going to retain for any period of time.)

  3. Either before or after we freeze the numbers, use the cell format function to eliminate the decimal values. They don't mean anything in our context, so they should be eliminated to avoid confusion. To do this, select column B, go to Format>Cells>Number, and type 0 into the Decimal places box.

  4. Finally, don't forget to turn auto-calculate back on, or else the rest of your spreadsheets will not update as you expect. (Forgetting this causes a lot of head scratching the first few times.)
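The export-and-re-import idea in step 2 amounts to writing plain values to a CSV file that no longer depends on a random function. A minimal Python sketch of that round trip follows; the column names and seed are illustrative assumptions, and an in-memory buffer stands in for a real file so the example is self-contained.

```python
import csv
import io
import random

def write_sample_csv(out, doc_numbers: list[int], seed: int) -> None:
    """Write the frozen sample, one row per document, recording the seed used."""
    writer = csv.writer(out)
    writer.writerow(["seq", "doc_number", "seed"])
    for seq, number in enumerate(doc_numbers, start=1):
        writer.writerow([seq, number, seed])

def read_sample_csv(inp) -> list[int]:
    """Read the document numbers back; plain values do not change on reopening."""
    return [int(row["doc_number"]) for row in csv.DictReader(inp)]

seed = 12345  # placeholder; record whatever seed was actually used
doc_numbers = sorted(random.Random(seed).sample(range(1, 15_233), 800))

buffer = io.StringIO(newline="")  # stand-in for open("sample.csv", "w", newline="")
write_sample_csv(buffer, doc_numbers, seed)
buffer.seek(0)
print(read_sample_csv(buffer) == doc_numbers)  # True
```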

Having gone to the trouble of creating and formatting a random list, we might as well use it to record our results. This provides meaningful traceability and a way to prove that we did what we said. In our case, we already have a document number for each of the 800 sampled documents. In another case, this might be a Bates number or a DocID. In all cases, it should be the unique identifier that tells us precisely which documents we reviewed. From here we could add:

  1. Personal ID and client information: This would include the name (or names) of the inspector(s), the date of the inspection, and the matter being inspected.
  2. Pass / Fail Results: Mark any records that contain errors with a check mark or perhaps with the inspector's initials. To avoid excessive documentation, I would leave accepted records blank.
  3. Error Description: This would include the reason for rejecting the document, like "address misspelled" or "incorrect author." This information is valuable for understanding typical errors and error sources even when the project meets the AQL.
  4. Results summary: This would include number passed, number failed, and whether the batch is accepted or rejected.
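The record-keeping columns above map naturally onto a small data structure. This is a hypothetical sketch, not a prescribed format: the class names, the matter name, the inspector, and the sample records are all made up, and the acceptance number of 3 comes from the case study.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Record:
    doc_id: str                  # Bates number, DocID, or document number
    error: Optional[str] = None  # left blank (None) when the document passes

@dataclass
class InspectionLog:
    matter: str
    inspectors: list[str]
    inspected_on: date
    records: list[Record] = field(default_factory=list)

    def summary(self, accept_max: int) -> dict:
        """Results summary: number passed, number failed, and the lot decision."""
        failed = sum(1 for r in self.records if r.error)
        return {
            "passed": len(self.records) - failed,
            "failed": failed,
            "lot_accepted": failed <= accept_max,
        }

    def error_types(self) -> Counter:
        """Tally error descriptions to reveal typical errors and their sources."""
        return Counter(r.error for r in self.records if r.error)

# Hypothetical example: matter, inspector, and records are illustrative only.
log = InspectionLog("Example Matter", ["A. Inspector"], date(2004, 4, 1))
log.records += [Record("DOC-00017"), Record("DOC-00342", "incorrect author"),
                Record("DOC-01205", "address misspelled"), Record("DOC-07731")]
print(log.summary(accept_max=3))
print(log.error_types().most_common())
```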

Meaningful Results
Having taken all of these steps - pulling the applicable number of documents at random, reviewing the documents and recording our results - we now stand ready to state with authority that the work we are producing or purchasing meets a known quality standard. If there is a question about whether the job should be reworked, the consumer and the producer of the data can review the test documents and see the exact results of the original testing. Finally, and hopefully in most cases, both the consumer and producer of the legal service will have a meaningful, repeatable measurement that proves the work product meets the level of quality expected. This, in turn, is a vital first step in understanding and improving overall quality for our clients.

The inspection measurement technique reviewed in this article is not the most complete inspection tool available to modern litigation support managers and attorneys. Moreover, there are valid criticisms of using MIL-STD-105. The primary complaint is that it does not in any way improve quality; it simply measures each batch pushed through the process. Still, the standard has enough strength that after 60 years of use it remains a mainstay of quality measurement, for several reasons:

  1. It is simple enough that it can be implemented with very little training.
  2. The technique is valid on both big and small batches.
  3. The results are definitive rather than arbitrary, and
  4. The results are reproducible.

In summary, MIL-STD-105 provides a simple means of measuring quality on a day-to-day basis without involving complex math or training, while at the same time creating a just and reproducible quality measurement system that employees and clients can understand.

Further Reading
* Quality Council of Indiana, Terre Haute, Indiana. The Quality Council of Indiana provides a definitive and useful guide for serious students of product and process quality who are preparing for ASQ exams. They are found on the web at www.qualitycouncil.com.

* American Society for Quality (ASQ) is a professional association "advancing learning, quality improvement and knowledge exchange to improve business results." Additionally, they provide examination and testing for nationally recognized certifications such as Certified Quality Engineer, Certified Quality Manager, and several others. They may be found at www.ASQ.org.

* MIL-STD-105E is available through the Navy Publishing and Printing Service Office and is sold by any number of technical publication vendors. You can also locate various editions of the standard in PDF by running a search on http://assist.daps.dla.mil/quicksearch.

R. SAM GILCRIST is an independent litigation technical consultant who does trial work through Litigation Tech and who does computer consulting and e-discovery consulting through Gilcrist.com. He is completing a BS in Computer Science and holds a BS and MBA in management from Georgia State College, Augusta. You may reach Sam for trial work at sgilcrist@litigationtech.com or for computer programming or further information on forensic production at sam@gilcrist.com.