How is Natural Language Search Changing The Face of Legal Research?

Stergios Anastasiadis
April 8, 2019
Legal Research

In order to understand how ROSS’s A.I. is changing legal research, it’s important to understand machine learning, natural language processing, and other A.I. basics. From that basis, we can more meaningfully discuss how those systems are being deployed to improve legal research. In this post, we’re going to discuss the fundamentals of natural language search and how it improves legal queries and legal search results in ROSS.

ROSS’s natural language processing (NLP) allows lawyers to phrase their research queries the way they would phrase a question to a colleague. A system that understands and can leverage the semantic and syntactic context of your query will improve both your legal research questions and your search results.

A query optimized with the help of NLP will surface the most accurate and relevant decisions because the system has been evaluated against the prior queries that yielded the best legal search results. Since the system has analyzed all relevant case law, optimized queries deliver high recall (the percentage of all relevant decisions that are returned) and high precision (the percentage of returned decisions that are relevant).
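To make the two measures concrete, here is a minimal Python sketch that computes precision and recall for a single query. The function name `precision_recall` and the case identifiers are hypothetical and chosen only for illustration; this is not how ROSS measures its own results.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query.

    returned: decision IDs the search returned
    relevant: decision IDs that are actually relevant to the query
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant decisions that were returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 decisions returned, 3 of them relevant,
# out of 6 relevant decisions in the whole collection.
returned = ["case_a", "case_b", "case_c", "case_x"]
relevant = ["case_a", "case_b", "case_c", "case_d", "case_e", "case_f"]

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.75 recall=0.50
```

In this example the search is fairly precise (three of the four results are relevant) but misses half of the relevant decisions, which is why both numbers matter.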

The four areas that drive NLP are natural language understanding, natural language generation, speech recognition, and speech synthesis.

  • Natural language understanding (NLU) is narrower in purpose than NLP, focusing primarily on machine reading comprehension: getting the computer to comprehend what a body of text really means. After all, if a machine cannot comprehend the content, it cannot process it.
  • Natural language generation (NLG) is another subset of NLP. NLG turns data into narratives and reports expressed in easy-to-read language.
  • Speech recognition (SR) transcribes spoken language into text.
  • Speech synthesis (SS) converts text into spoken language.

The system allows NLP queries to be typed into a search engine, spoken aloud with voice search or posed as a question to a digital assistant. The goal is to make interactions feel exactly like interactions among humans.

A great overview can be found in this Medium article, "Chapter 9: Natural Language Processing" by Madhu Sanjeevi. The article provides a good understanding of the moving parts without getting bogged down in technical details.

Now let’s dig into how ROSS makes this happen.

The complex world of Language, Linguistics, Terms, Expansions and Simplifications is outlined nicely by Jocelyn D’Souza in a great 3 minute read: “A dive into Natural Language Processing”.

The lawyer enters a query to find relevant cases. The NLP system interprets the query and “matches” it with legal text.

As you type, your query is processed instantly on three levels:

  • Syntax – understanding the grammar of the text
  • Semantics – understanding the meaning of the text
  • Pragmatics – understanding the context of the text

In milliseconds, the system performs this three-level processing and assesses:

  1. Language representations: representing the meaning of words at the word, phrase, sentence, and passage level. This potentially makes matching a lot easier.
  2. Linguistic features: grammatical alternatives are used to help with query complexity. A major feature is part of speech (noun vs. verb); others include person, number (plurality), and tense.
  3. Term suggestions: legal term suggestions specifically help users write better queries that incorporate legal context.
  4. Expansions: query expansion provides alternate queries to users based on a global analysis, using some form of thesaurus. This can be combined with legal term suggestions.
  5. Simplifications: feedback to users on their queries that attempts to eliminate complexity.

ROSS currently includes items 1, 2, 4, and 5, with “Term suggestions” coming soon. “Expansions” will use the same model as “Term suggestions” but will be automated on behalf of the user.
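As a rough illustration of thesaurus-based query expansion, the following Python sketch generates alternate queries by swapping in synonyms one word at a time. The thesaurus entries and the `expand_query` function are hypothetical and hand-written for this example; the post does not describe the model ROSS actually uses.

```python
# Hypothetical, hand-written thesaurus. A real system would rely on a
# much larger resource; this one exists only for illustration.
LEGAL_THESAURUS = {
    "landlord": ["lessor"],
    "tenant": ["lessee"],
    "fired": ["terminated", "dismissed"],
}


def expand_query(query, thesaurus=LEGAL_THESAURUS):
    """Return the original query plus one alternate per synonym substitution."""
    words = query.lower().split()
    alternates = [query]
    for i, word in enumerate(words):
        for synonym in thesaurus.get(word, []):
            # Replace only the word at position i, keep the rest unchanged.
            alternates.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return alternates


for alternate in expand_query("can a landlord evict a tenant"):
    print(alternate)
# prints:
# can a landlord evict a tenant
# can a lessor evict a tenant
# can a landlord evict a lessee
```

Each alternate could then be run alongside the original query, or offered to the user as a suggestion.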

A simple overview of the way Google applies this thinking to its search can be found in the “Google Inside Search” blog.

Before you hit the “Ask the Question?” button, it's also helpful to understand how data is stored and retrieved.

The system contains millions of legal decisions and hundreds of millions of passages that have already been processed by machine learning algorithms. The ingestion of legal data happens daily. The algorithms are trained against a corpus of queries and legal decisions. Once the algorithms meet acceptable statistical thresholds, they are then let loose to perform searches against the millions of decisions and hundreds of millions of passages.

These algorithms are usually referred to as ranking and retrieval algorithms. In the example below, the lightly colored boxes make up the graph used to narrow down from your query to the nodes displayed in the answer cards at the bottom of the illustration. The algorithms are fast, accurate, and reliable.

As the tree is traversed, the query is scored against each node on the way down to the leaf nodes. Each leaf node gets a final score, which completes the ranking process. The system then returns the top leaf nodes based on their final score. This is called retrieval.
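The ranking and retrieval steps can be sketched in Python as follows. This is a toy illustration under stated assumptions, not ROSS's implementation: the `Node` class, the word-overlap scoring function, and the choice to sum scores along the path to each leaf are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)


def overlap_score(query, text):
    """Toy scoring function (an assumption): count of distinct shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def rank_leaves(node, query, inherited=0):
    """Ranking: score every node on the way down, yield (final_score, text) per leaf."""
    score = inherited + overlap_score(query, node.text)
    if not node.children:
        yield score, node.text
    for child in node.children:
        yield from rank_leaves(child, query, score)


def retrieve(root, query, k=2):
    """Retrieval: return the k leaves with the highest final score."""
    return heapq.nlargest(k, rank_leaves(root, query), key=lambda pair: pair[0])


# Hypothetical three-level tree: root, topics, passages (leaves).
tree = Node("law", [
    Node("employment law", [
        Node("wrongful dismissal of an employee"),
        Node("overtime pay for an employee"),
    ]),
    Node("property law", [
        Node("eviction of a tenant"),
    ]),
])

print(retrieve(tree, "wrongful dismissal employment", k=2))
# prints: [(3, 'wrongful dismissal of an employee'), (1, 'overtime pay for an employee')]
```

A production system would use learned scoring models rather than word overlap, but the shape is the same: score on the way down, then keep the top leaves.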

Now hit the “Ask the Question?” button. The query is submitted and, voilà, the ranking and retrieval algorithms do their magic, analyzing decisions and other data in milliseconds.

It’s an enormous win when lawyers can use their familiar speaking or writing muscles to retrieve authoritative decisions quickly, comprehensively, and accurately. When NLP is done right, it’s a powerful tool for legal research.

Stergios Anastasiadis

Stergios is the Head of Engineering at ROSS. He has spent 25 years in the tech industry, is an angel investor, and wants to help the tech startup community continue to succeed.