System and method for classifying legal concepts using legal topic scheme

US 6,502,081 B1
Filed: 08/04/2000
Issued: 12/31/2002
Est. Priority Date: 08/06/1999
Status: Expired due to Term

Abstract

An economic, scalable machine learning system and process perform document (concept) classification with high accuracy using large topic schemes, including large hierarchical topic schemes. One or more highly relevant classification topics are suggested for a given document (concept) to be classified. The invention includes training and concept classification processes. The invention also provides methods that may be used as part of the training and/or concept classification processes, including: a method of scoring the relevance of features in training concepts, a method of ranking concepts based on relevance score, and a method of voting on topics associated with an input concept. In a preferred embodiment, the invention is applied to the legal (case law) domain, classifying legal concepts (rules of law) according to a proprietary legal topic classification scheme (a hierarchical scheme of areas of law).

20 Claims

1. A computer-implemented method of building a knowledge base for a legal topic classification system, the method comprising:
- inputting a plurality of training documents;
- parsing the plurality of training documents to extract classified legal concepts;
- extracting features from the legal concepts;
- generating relevance scores for each feature; and
- storing features, topics, and relevance scores in a knowledge base, using an inverted index.
2. The method as set forth in claim 1, the step of parsing comprising the steps of:
3. The method as set forth in claim 1, the step of extracting features comprising the steps of:
- extracting terms, excluding stop words;
- extracting legal phrases; and
- extracting embedded case citations.
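
The three extraction steps of claim 3 might be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the legal phrase list, and the citation pattern (`STOP_WORDS`, `LEGAL_PHRASES`, `CITATION_RE`) are hypothetical stand-ins, since the claims do not specify any of them.

```python
import re

# Hypothetical resources: the claims do not specify the stop-word list,
# the legal phrase dictionary, or the citation format.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "for", "and", "that"}
LEGAL_PHRASES = ["summary judgment", "due process", "burden of proof"]
CITATION_RE = re.compile(r"\b\d+\s+(?:U\.S\.|F\.\dd|S\. Ct\.)\s+\d+\b")


def extract_features(concept_text):
    """Return terms, legal phrases, and case citations found in one concept."""
    citations = CITATION_RE.findall(concept_text)
    # Remove citations first so volume and page numbers are not counted as terms.
    remainder = CITATION_RE.sub(" ", concept_text).lower()
    phrases = [p for p in LEGAL_PHRASES if p in remainder]
    words = re.findall(r"[a-z]+", remainder)
    terms = [w for w in words if w not in STOP_WORDS]
    return {"terms": terms, "phrases": phrases, "citations": citations}
```

Calling `extract_features` on a rule of law that ends with a reporter citation returns the citation separately from the ordinary terms, so each feature type can be scored on its own.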
4. The method as set forth in claim 1, the step of generating relevance scores including the steps of:
- converting features to terms;
- generating, for each training concept, term frequency (TF) for each term, as number of occurrences of that term in that training concept;
- generating, for each training concept, document frequency (DF) for each term, as total number of training concepts in which term appears;
- generating inverse document frequency (IDF) for each term; and
- generating a relevance score for each term for each concept.
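
A minimal sketch of how claims 1 and 4 could fit together is given below. It rests on several assumptions: the input layout `(concept_id, topics, terms)` and the function names are hypothetical, the IDF formula is borrowed from claim 7, the logarithm is taken as natural, and plain TF × IDF stands in for the relevance score, which claim 4 does not define.

```python
import math
from collections import Counter, defaultdict


def idf(df, dbsize):
    # Formula as recited in claim 7; the log base is not stated, natural log assumed.
    return math.log((dbsize - df + 0.5) / (df + 0.05))


def build_knowledge_base(training_concepts):
    """training_concepts: list of (concept_id, topics, terms) tuples (hypothetical layout)."""
    dbsize = len(training_concepts)
    # TF: occurrences of each term within each training concept.
    tf = {cid: Counter(terms) for cid, _, terms in training_concepts}
    # DF: number of training concepts in which each term appears.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each concept counts once per term

    index = defaultdict(dict)  # inverted index: term -> {concept_id: score}
    for cid, counts in tf.items():
        for term, freq in counts.items():
            # Plain TF x IDF is an assumption; claim 4 only requires "a relevance score".
            index[term][cid] = freq * idf(df[term], dbsize)

    topics = {cid: list(tps) for cid, tps, _ in training_concepts}
    return {"index": dict(index), "topics": topics, "df": dict(df), "dbsize": dbsize}
```

Keying the index by term means that a later lookup for an input concept touches only the training concepts that share at least one term with it, rather than scanning the whole training set.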

5. A computer-implemented method of building a knowledge base for a legal topic classification system, the method comprising:
- analyzing previously classified legal concepts to determine distinguishing features for each concept;
- generating relevance scores for each feature in each training concept; and
- storing features, topics, and relevance scores in a knowledge base, using an inverted index.
6. The method as set forth in claim 5, the step of generating relevance scores including the steps of:
7. The method as set forth in claim 6, wherein the step of generating IDF is performed using the formula $IDF = \log((DBSIZE - DF + 0.5)/(DF + 0.05))$.
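
As an illustration of how the claim 7 formula behaves, consider a hypothetical training set with $DBSIZE = 1000$, and assume the logarithm is natural (the claim does not state the base). A rare term with $DF = 10$ gets

$IDF = \log((1000 - 10 + 0.5)/(10 + 0.05)) = \log(990.5/10.05) \approx 4.59$

while a very common term with $DF = 900$ gets

$IDF = \log((1000 - 900 + 0.5)/(900 + 0.05)) = \log(100.5/900.05) \approx -2.19$

so under these assumptions the formula rewards terms that appear in few training concepts and turns negative once a term appears in more than about half of them.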

8. A computer-implemented method of processing an input concept from a document text to provide, from a topic scheme, a list of one or more topics that are relevant to the input concept, the method comprising:
- analyzing the input concept to arrive at a set of distinguishing features;
- converting candidate concept features to candidate terms;
- searching a database of concepts, previously classified according to the topic scheme, for concepts similar to the input concept based on features;
- ranking the similar concepts based on relevance score; and
- voting on topics associated with the concepts within the database to form the list of topics relevant to the input concept.
9. The method as set forth in claim 8, the step of ranking including the steps of:
10. The method as set forth in claim 9, the step of ranking further including, before the step of retrieving, the steps of:
- sorting candidate terms by document frequency (DF) of each term, as number of knowledge base training concepts in which term occurs; and
- reducing candidate term list to least common terms.
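
The two steps of claim 10 could look roughly like this. The function name, the `max_terms` cutoff, and the decision to drop terms unknown to the knowledge base are assumptions made for the sketch; the claim does not say how many terms are kept.

```python
def reduce_candidate_terms(candidate_terms, df, max_terms=3):
    """Keep the max_terms candidate terms with the lowest document frequency."""
    # Terms absent from the knowledge base cannot match anything, so drop them (assumption).
    known = [t for t in set(candidate_terms) if df.get(t, 0) > 0]
    # Sort by ascending DF; ties are broken alphabetically for a stable result.
    known.sort(key=lambda t: (df[t], t))
    return known[:max_terms]
```

Keeping only the least common terms limits retrieval to the postings most likely to discriminate between topics, which also keeps the lookup cheap for long input concepts.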
11. The method as set forth in claim 8, the step of voting including the steps of:
- retrieving topics associated with each training concept from a knowledge base;
- grouping training concepts and scores by associated topics;
- calculating a total topic relevance score for each topic, as a sum of training concept scores for each topic; and
- sorting topics by total topic relevance score to create a topic list.
12. The method as set forth in claim 11, further comprising, within a hierarchical topic scheme, the steps of:
- grouping topics by tier;
- weighting the topic list according to number of occurrences of each tier topic;
- generating a final topic list using the weighted topic list; and
- sorting the final topic list by tier.
13. The method as set forth in claim 11, the step of sorting including comparing each total topic relevance score to a threshold and eliminating from the topic list those topics having a total topic relevance score below the threshold.
14. The method as set forth in claim 11, the step of sorting including the steps of:
- determining a number of times each topic occurs;
- comparing the number to a threshold; and
- eliminating from the topic list those topics having a number of occurrences below the threshold.
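
The voting of claim 11, together with the two threshold filters of claims 13 and 14, might be sketched as below. The function name, the argument layout, and the optional `min_score` and `min_count` parameters are hypothetical; the claims give no threshold values, and the tier weighting of claim 12 is not attempted here.

```python
from collections import defaultdict


def vote_on_topics(ranked_concepts, concept_topics, min_score=None, min_count=None):
    """ranked_concepts: list of (concept_id, score); concept_topics: concept_id -> topics."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cid, score in ranked_concepts:
        for topic in concept_topics.get(cid, []):
            totals[topic] += score  # total topic relevance score (claim 11)
            counts[topic] += 1      # number of times the topic occurs (claim 14)

    kept = [
        (topic, totals[topic])
        for topic in totals
        if (min_score is None or totals[topic] >= min_score)   # claim 13 filter
        and (min_count is None or counts[topic] >= min_count)  # claim 14 filter
    ]
    # Highest total topic relevance score first.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

With no thresholds set, every topic attached to a retrieved training concept appears in the list; setting either threshold trims topics supported by weak or isolated matches.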

15. A computer-implemented method of processing an input concept from a document text to provide, from a topic scheme incorporating a plurality of training concepts, a list of one or more topics that are relevant to the input concept, the method comprising:
- retrieving topics associated with the training concepts from a knowledge base, the training concepts having been previously classified and scored in accordance with the topic scheme;
- grouping training concepts and scores by associated topics;
- calculating a total topic relevance score for each topic, as a sum of training concept scores for each topic; and
- sorting topics by total topic relevance score to create a topic list relevant to the input concept.
16. The method as set forth in claim 15, further comprising, within a hierarchical topic scheme, the steps of:

17. A computer-implemented method of processing an input concept from a document text to identify, within a knowledge base incorporating a plurality of training concepts, concepts similar to the input concept and to rank these similar concepts, the method comprising:
- identifying features of the input concept as candidate terms;
- retrieving, from the knowledge base, relevance scores for training concepts similar to the input concept;
- calculating a total relevance score for each retrieved training concept, as a sum of candidate term relevance scores for that concept; and
- sorting retrieved training concepts by total relevance scores.
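
The ranking recited in claim 17 can be illustrated with a short sketch. It assumes a hypothetical inverted-index layout, `term -> {concept_id: relevance score}`, and treats "similar" as "shares at least one candidate term"; neither detail is fixed by the claim.

```python
from collections import defaultdict


def rank_similar_concepts(candidate_terms, index):
    """index: term -> {concept_id: relevance score}, a hypothetical inverted-index layout."""
    totals = defaultdict(float)
    for term in set(candidate_terms):
        # Each posting contributes that term's stored relevance score for the concept.
        for cid, score in index.get(term, {}).items():
            totals[cid] += score
    # Training concepts with the highest total relevance score come first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A training concept that matches several candidate terms accumulates several scores, so it tends to outrank a concept that matches only one term strongly.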

18. A computer-implemented method of building a knowledge base for a legal topic classification system by identifying features within previously classified training concepts and generating relevance scores for these features, the method comprising the steps of:
- converting the features into terms;
- generating, for each training concept, term frequency (TF) for each term, as number of occurrences of that term in that training concept;
- generating, for each training concept, average term frequency (AVE_TF) of terms;
- generating, for each training concept, document frequency (DF) for each term, as total number of training concepts in which term appears;
- determining training set DBSIZE as total number of training concepts in the knowledge base;
- generating inverse document frequency (IDF) for each term; and
- generating a relevance score for each term for each concept.
19. The method as set forth in claim 18, wherein when a length of a current concept, doclength, is greater than an average length of concepts in a set, aveDocLength, the relevance score is calculated using the formula $TFwt \times IDF$,
20. The method as set forth in claim 18, wherein when a length of a current concept, doclength, is less than or equal to an average length of concepts in a set, aveDocLength, the relevance score is calculated using the formula $TFwt \times IDF$, where $TFwt = \frac{TF + TF / AVE\_TF}{TF + TF / AVE\_TF + 2 (\alpha + \beta \times (aveDocLength - doclength + 1) / aveDocLength)}$ and $IDF = \log((DBSIZE - DF + 0.5)/(DF + 0.05))$.
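
The claim 20 score for a concept no longer than average could be computed along these lines. The function name and the default values of `alpha` and `beta` are assumptions (the claim gives no values for α and β), and the logarithm is taken as natural because the base is not stated.

```python
import math


def relevance_short_concept(tf, ave_tf, df, dbsize, doclength, ave_doclength,
                            alpha=0.5, beta=0.5):
    """Relevance score TFwt x IDF for the case doclength <= aveDocLength (claim 20)."""
    if doclength > ave_doclength:
        raise ValueError("claim 20 covers only doclength <= aveDocLength")
    base = tf + tf / ave_tf
    # Length adjustment: grows as the concept gets shorter than average.
    length_term = alpha + beta * (ave_doclength - doclength + 1) / ave_doclength
    tfwt = base / (base + 2 * length_term)
    idf = math.log((dbsize - df + 0.5) / (df + 0.05))
    return tfwt * idf
```

Because `base` appears in both numerator and denominator, TFwt stays below 1 for positive α and β and saturates as TF grows, so repeating a term many times in one concept yields diminishing gains.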

Current Assignee: RELX Inc. (RELX PLC)
Original Assignee: LexisNexis Group Inc. (RELX PLC)
Inventors: Peck, James M.; Ahmed, Salahuddin; Humphrey, Timothy L.; Lu, X. Allan; Wiltshire, James S. Jr.; Morelock, John T.
Primary Examiner(s): Black, Thomas
Assistant Examiner(s): Hirl, Joseph P
Application Number: US09/633,266
Time in Patent Office: 879 Days
Field of Search: 706/45, 706/46, 706/12, 707/500, 700/90
US Class Current: 706/12
CPC Class Codes:
- G06F 16/353: into predefined classes
- G06N 5/025: Extracting rules from data
- G06Q 10/10: Office automation; Time man...
- G09B 7/00: Electrically-operated teach...
