AI / ML Product Design
Experimenting Towards an Automated Law Clerk

Delve into the challenges and breakthroughs in creating an AI product that acts as a clerical bot, aiming to revolutionize pretrial discovery by saving time and reducing manual effort.
Two men sorting documents in a pre-trial document dump
"So, tell me about this SaaS idea of yours so we can take it for a spin," I said.

In April 2016, I met Matt, a junior associate at a global law firm who was deeply involved in the St. Louis software startup scene. Since we shared an interest in emerging SaaS models, he invited me to explore ways technology could support document filing and management for an ongoing case. With NDAs in place to protect client confidentiality, we decided to meet and discuss collaborating on this exploratory project.

Pastaria
We were seated for lunch at Pastaria, a St. Louis favorite of mine because of its striking branding by Atomicdust and cuisine courtesy of owner and three-time James Beard nominee, Gerard Craft.

Background

"One of the things I've learned as a junior associate," Matt began, his gaze fixed on his salad with a look of determined frustration, "is how much time an opposing legal team can waste with a document dump. It’s a pre-trial tactic intended to overwhelm us, often leading to missed details and extended litigation periods—not just a massive drain on the firm's resources, but on our ability to effectively serve our clients."

His voice carried a blend of exasperation and excitement as he leaned forward. "A buddy of mine from college is pursuing a CS degree and already experimenting with our tech stack—let me CC you into our email thread."

I ate a few bites of my Caesar salad as Matt tapped at his phone screen for a few moments, then looked up expectantly. "Hey, do you think you could show me exactly what a document dump looks like?" I asked.

Downtown Clayton, MO
We walked around the block to an auxiliary office, temporarily rented to manage pretrial discovery.

As we approached the office, Matt chuckled, half in resignation. "We had to rent extra space just to manage this case," he explained as he opened the door to a modest commercial office. The room was stark—a temporary setup with a cluster of folding tables in the corner covered with towering stacks of paperwork, and a scanner whirring quietly in the background.

"Welcome to the chaos," he said, gesturing towards the bustling room. Three legal assistants sat heads-down typing on laptops among piles of documents, occasionally pausing to scan pages using the networked computers. Their percussive rhythm of mouse clicks, scanner motors, and focused activity was punctuated by the occasional snap of a legal binder closing.

"This," Matt waved his hand across the room, "is what we call a document dump in action. Imagine sifting through all of this manually for nuggets of relevant information."

The practical demonstration of the problem was more impactful than any description could have been. As we wove between the tables, the scale of the task at hand was unmistakably clear. Each box represented hours of potential work—work that Matt was convinced could be streamlined significantly.

Robert "Double Diamond" Duebelbeis Logo ©1980

The Problem

\( C = D \times E \times T \times W \)

where D is the number of documents, E is the OCR error rate, T is the manual review time per flagged document (in hours), and W is the reviewer's hourly wage (in dollars); C is the total cost of manual labor.
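To illustrate with purely hypothetical values (not figures from the case): 10,000 documents at a 20% error rate, 0.05 hours of review per flagged document, and a $25 hourly wage would give

\[ C = 10{,}000 \times 0.20 \times 0.05 \times 25 = \$2{,}500 \]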

Existing OCR solutions often fall short in accuracy, particularly with irregularly scanned documents, requiring significant manual labor for corrections.

The Hypothesis

By training machine learning models on human-corrected output, we could significantly reduce the error rates associated with Optical Character Recognition (OCR) document processing in pretrial discovery. Specifically, we believed that every one-point reduction in OCR error rate would yield a compounding cost savings of $0.15 per document, leading to enhanced efficiency and accuracy in legal document handling.
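Put to numbers: under this hypothesis, the seven-point improvement we eventually measured (see Results below) should be worth

\[ \text{Cost Savings} = 7 \times \$0.15 = \$1.05 \text{ per document} \]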

Our Approach

Our team was distributed and fully remote, relying on various tools to collaborate effectively. During a series of discussions over Google Hangouts and GitHub, the three of us meticulously planned the OCR training-loop process. I focused on collecting accurate data through a training UI, while Ryan, our software engineer, took charge of the backend and machine learning components. Matt was our lead user and subject-matter expert, ultimately responsible for helping the team navigate the problem space.

"Ok, now who is going to be training this?," I asked.

User Research

Our primary users were paralegals working in pretrial discovery; they alone could help us understand the features an OCR correction tool would need. To understand them, I needed to observe their daily operations.

A journey map made up of colorful sticky notes on a wooden surface. The top row outlines the steps of the process, including 'Load Scanner,' 'Configure OCR Tool,' 'Initiate Scanning,' 'Wait for Scans,' 'Manual Review,' and 'Filing Documents.' The middle row uses emoji-style faces to represent the emotions felt at each step, ranging from neutral to frustrated. The bottom row outlines key elements and outcomes associated with each step, such as 'Scanner & Papers,' 'Remove Bottleneck,' and 'Decrease Costs.' The layout visually illustrates the workflow and the emotional experience of the process.

During some visits to that auxiliary office, I got the chance to map out the work of three paralegals as they loaded document scanners with thousands of papers and configured an OCR tool to automatically transcribe the text onto a networked server. They were extremely candid, happy to answer my questions, and shared several ideas to make their work more efficient. The manual review process was the team's main constraint and source of pain.

Technical Requirements

I relied on Ryan to help make sense of the technical aspects of our work. He provided me with a sample OCR output in JSON format from a scanning result obtained from our test document. The essential artifacts for conducting a manual review were all there: the original document, coordinates and dimensions for low-confidence passage highlights, and transcribed text awaiting correction.

{
  "document_id": "12345",
  "pages": [
    {
      "page_number": 1,
      "blocks": [
        {
          "text": "ST. MARY'S OF MICHIGAN: WASHINGTON AVE. EMERGENCY RECORD",
          "confidence": 95,
          "coordinates": { "x": 10, "y": 20, "width": 300, "height": 50 }
        },
        {
          "text": "Patient: John Doe",
          "confidence": 82,
          "coordinates": { "x": 10, "y": 100, "width": 150, "height": 20 }
        },
        {
          "text": "Date of Birth: 01/01/1970",
          "confidence": 78,
          "coordinates": { "x": 10, "y": 130, "width": 200, "height": 20 }
        }
      ]
    }
  ]
}
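To make the payload concrete, here is a minimal sketch in vanilla JavaScript (the stack the training UI used) of how a correction interface might flag blocks for review. The 86% cutoff matches the threshold we standardized on later; the function and field names are illustrative.

// Flag OCR blocks that fall below the manual-review threshold.
const REVIEW_THRESHOLD = 86; // confidence scores below this get corrected by hand

function blocksNeedingReview(ocrResult) {
  return ocrResult.pages.flatMap((page) =>
    page.blocks
      .filter((block) => block.confidence < REVIEW_THRESHOLD)
      .map((block) => ({
        documentId: ocrResult.document_id,
        pageNumber: page.page_number,
        text: block.text,
        confidence: block.confidence,
        coordinates: block.coordinates, // used later to position highlights
      }))
  );
}

Run against the sample above, this flags the 82- and 78-confidence blocks while letting the 95-confidence header pass through.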

My challenge was finding an open-source technology that would let me use this data to re-create the low-confidence scenario for a human reviewer. I was inspired by my previous experience with 4/4 print-press setup, creating negative film for copper-plate printing. That process involves separating the color layers (CMYK) of a design into four different plates that are over-printed, and registration marks are critical to ensuring the colors print in the right places. Using a similar approach, I could layer the original document with highlighted areas indicating low-confidence OCR results, ensuring the reviewer could focus on those specific sections.
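A rough sketch of that "registration" idea in the browser: absolutely positioned, translucent highlight layers drawn over the scanned page image, registered to the same top-left origin as the OCR coordinates. The styling values here are illustrative, not the production ones.

// Overlay low-confidence highlights on the scanned page, like
// over-printed plates registered to a shared origin.
function renderHighlights(pageContainer, blocks) {
  pageContainer.style.position = 'relative'; // registration point: top-left corner

  for (const block of blocks) {
    const { x, y, width, height } = block.coordinates;
    const highlight = document.createElement('div');
    Object.assign(highlight.style, {
      position: 'absolute',
      left: `${x}px`,
      top: `${y}px`,
      width: `${width}px`,
      height: `${height}px`,
      background: 'rgba(255, 230, 0, 0.35)', // translucent yellow "plate"
      outline: '1px solid rgba(255, 180, 0, 0.9)',
    });
    highlight.title = `Confidence: ${block.confidence}%`;
    pageContainer.appendChild(highlight);
  }
}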

Yellow sticky note with a black drawing depicting a wide-open mouth and two lines above it, suggesting an expression of frustration or anger. It represents a critical pain point in the machine learning journey map: the moment a user encounters a complex or confusing part of the process, such as struggling with the usability of the tool, facing unexpected errors, or dealing with a lack of clear instructions.

Prioritizing "Moments That Matter Most"

One of our first spirited disagreements was over implementation. Matt saw immediate value in providing basic Cloud OCR access and wanted to prioritize document upload features. However, we ultimately decided this was a "pipeline dependent" feature: its success depended on a reason to automate, which presumed a successfully trained model that we didn't yet have.

If we wanted accurate training data, we were going to need to change attitudes around the manual review process with the right design. We agreed to focus on what was making our users scream: the endless pile of documents awaiting manual transcription.

Solution: A Machine Learning Pipeline

I animated this pipeline diagram in Figma because I'm a nerd, but I sorta feel like all pipelines need to be animated for full effect.

Proof of Concept: Human-Centered ML Training

I designed our first UI as a web app using HTML5, CSS3, and vanilla JavaScript. To integrate seamlessly into existing workflows, and to make model trainers feel productive and respected, my goals were zero installation and minimal instruction.

I also designed the manual correction tool to be used during OCR processing wait time, so paralegals could make corrections in parallel with transcription, maximizing their productivity and reducing idle time.
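One way to read that design, sketched with illustrative names: each page's OCR job hands its flagged blocks to the correction queue the moment it resolves, rather than blocking on the whole batch.

// Feed pages into the correction queue as each finishes OCR,
// instead of waiting for the entire batch to transcribe.
function correctWhileScanning(pageOcrPromises, enqueueForReview) {
  for (const pagePromise of pageOcrPromises) {
    // Each promise resolves when OCR completes one page; the
    // reviewer can begin correcting it immediately.
    pagePromise.then((page) => enqueueForReview(page));
  }
}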

GitHub
By highlighting low-confidence OCR text areas directly on the document, trainers could immediately identify where their attention was needed. This visualization significantly reduced the time spent searching for errors.
Document review often requires scrutinizing small text details, especially when dealing with poor-quality scans. Adjustable zoom levels enable focus on specific areas, improving the accuracy of corrections.
Categorizing errors (e.g., 'Misread characters', 'Incomplete text') provided crucial data for refining the OCR model. This feature was based on the understanding that detailed error data is essential for iterative improvement.
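A hedged sketch of the record each correction produced (field names are illustrative; the categories mirror those named above and echoed in the Results):

// Error categories offered to reviewers in the training UI.
const ERROR_CATEGORIES = [
  'Misread characters',
  'Incomplete text',
  'Formatting issues',
];

// Build the training record for one human correction.
function recordCorrection(block, correctedText, category) {
  if (!ERROR_CATEGORIES.includes(category)) {
    throw new Error(`Unknown error category: ${category}`);
  }
  return {
    documentId: block.documentId,
    pageNumber: block.pageNumber,
    originalText: block.text,     // what the OCR engine produced
    correctedText,                // what the reviewer typed
    confidence: block.confidence, // the engine's original score
    category,                     // the signal used to refine the model
  };
}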

Results

The team conducted two phases of manual corrections to train the ML model. We established a baseline threshold using Google Cloud OCR and flagged all transcription confidence scores lower than 86% for manual correction.

With a custom web interface, we accurately collected and categorized each correction made by our human reviewers, segmenting errors by type (e.g., misreads, incomplete transcriptions, formatting issues). After applying these corrections, the OCR model was retrained with the refined data. We re-evaluated the model’s accuracy using a separate validation set not included in the initial training.
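A simplified sketch of that re-evaluation step, assuming two aligned arrays of transcribed strings and block-level exact-match scoring (real OCR evaluation often uses character-level error rates instead; the names here are illustrative):

// Score a retrained model against the held-out validation set by
// comparing its transcriptions to reviewer-verified ground truth.
function evaluateAccuracy(predictedBlocks, verifiedBlocks) {
  let correct = 0;
  for (let i = 0; i < verifiedBlocks.length; i++) {
    if (predictedBlocks[i].trim() === verifiedBlocks[i].trim()) {
      correct += 1;
    }
  }
  return (correct / verifiedBlocks.length) * 100; // percent of blocks matched exactly
}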

250 Document Training Set
In this initial phase, we focused on a smaller sample set to evaluate baseline improvements. Corrections were meticulously recorded and analyzed, with the most common issues related to poor-quality scans and non-text artifacts. Our preliminary goal was to achieve a 3% improvement in accuracy from this set.
1,000 Document Validation Set
To validate our early improvements, we scaled to a 1,000-document set. Using the training UI, each paralegal worked on a distinct document set to diversify the data and categorized every correction to ensure robustness.

\[ \text{Accuracy Improved} = 7\%\]

\[ \text{Cost Savings} =  \$1.05\text{ per document}\]

The expanded training set enabled the model to handle a broader range of document types and quality levels. By categorizing errors, trainers provided nuanced data that significantly enhanced the model's learning process. For example, categorizing errors by font type, document quality, and document section (headers, body text) allowed the model to apply corrections more intelligently.

Key Learnings

Our product demonstrated significant improvements in processing efficiency and accuracy through the integration of advanced OCR and ML technologies. By involving users in both the design and training process, the system quickly evolved to handle document dumps more effectively, ultimately reducing the manual effort required and allowing legal teams to focus on more critical tasks.

Human-Centered Loop Approach
Engaging users in the training process proved essential. Paralegals' involvement ensured corrections were accurate and relevant, leading to high-quality training data that significantly improved the ML model.
Hypothesis-Driven Design
Setting clear, measurable goals upfront allowed us to focus on the most impactful features. Prioritizing assumption testing over feature buildout ensured the tool evolved effectively.
High-Quality Training Data
Scaling from 250 to 1,000 documents confirmed that accurate and diverse training data are crucial for model robustness and performance, directly reducing error rates.
Respecting Users' Time and Expertise
Providing a high-quality tool increased user engagement and motivation, leading to better training data and more efficient ML model development.

Future Considerations

Following our success, an unexpected challenge arose regarding intellectual property ownership. Because the project was developed using our sponsor's cases, personnel, and resources under an informal, non-contractual arrangement, new terms around IP ownership emerged. These complexities ultimately became a barrier, making it difficult for our engineer to proceed with further development.

Many of us will find ourselves training computers to do our work within our lifetimes. The key to success in the design of any automation is the careful consideration of humans: treating everyone involved with respect and dignity.

Automating Categorization
Expanding the model to automatically categorize complex document types will enhance its versatility, allowing it to handle diverse formats and varying quality with high accuracy.
Further Reducing Error Rates
Continued refinement of the model, focusing on reducing error rates, will involve fine-tuning algorithms and incorporating more data to maintain and improve model performance.
Continuous UI Improvement
Ongoing UI enhancements, such as improved annotation tools, will ensure the tool remains user-friendly and effective, accommodating evolving user needs.
Monetization Strategy
Building on demonstrated cost savings and efficiency, developing a monetization strategy, including subscription models or partnerships, could position us well in the market.
Do you need help finding the right people within your organization to experiment with AI/ML and process automation?