System, method, and software for identifying historically related legal cases
Abstract
In the American legal system, judges and lawyers continually research an ever-expanding body of past judicial opinions, or case law, for those most relevant to resolving new disputes. To facilitate these searches, some companies collect and publish the judicial opinions of courts across the United States in both paper and electronic forms, with some of the cases containing references to prior cases from other courts that have previously ruled on all or part of the same dispute. Identifying these prior cases is difficult because, for example, conventional computer text-matching both suggests too many non-prior cases and misses too many actual prior cases. Accordingly, the present inventors devised systems, methods, and software that generally facilitate identification of one or more documents that are related to a given document, and particularly facilitate identification of prior cases for a given case. One specific embodiment retrieves prior-case candidates based on information extracted from an input case, and then uses a support vector machine to determine which of the prior-case candidates are most probably prior cases for the input case.
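The candidate-retrieval stage described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `Case` record, the `STOPWORDS` list, the choice of title tokens as the "extracted information", and the rule that a candidate must not be newer than the input case. The ranking stage (the support vector machine) is deliberately left out of this sketch.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Case:
    """Hypothetical record for a judicial opinion (fields are assumptions)."""
    case_id: str
    title: str
    court: str
    year: int


# Assumed filler words that carry no party information in a case title.
STOPWORDS = {"v", "vs", "the", "of", "and", "inc", "corp", "co", "et", "al", "in", "re"}


def extract_party_terms(title: str) -> Set[str]:
    """Extract lowercase alphabetic tokens from a title, minus filler words."""
    words = re.findall(r"[a-z]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def retrieve_candidates(input_case: Case, corpus: List[Case]) -> List[Case]:
    """Return cases, no newer than the input, that share a party term with it."""
    terms = extract_party_terms(input_case.title)
    return [
        c for c in corpus
        if c.case_id != input_case.case_id
        and c.year <= input_case.year
        and terms & extract_party_terms(c.title)
    ]


if __name__ == "__main__":
    corpus = [
        Case("A1", "Smith v. Jones Corp.", "D. Minn.", 1998),
        Case("B2", "Doe v. Roe", "N.D. Cal.", 1997),
        Case("C3", "Jones Corp. v. Smith", "8th Cir.", 2001),
    ]
    appeal = Case("X9", "Smith v. Jones Corp.", "8th Cir.", 2000)
    for candidate in retrieve_candidates(appeal, corpus):
        print(candidate.case_id, candidate.title)
```

A loose filter of this kind intentionally over-retrieves. As the abstract notes, text matching alone yields too many non-prior cases, which is why a learned ranking step follows retrieval.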
26 Claims
1. A computerized method implemented using a processor and memory, the method comprising:
extracting information from a first input document;
retrieving one or more second documents based on the extracted information;
identifying one or more of the second documents as more probably related to the first input document than one or more of the other second documents using a learning machine; and
wherein the step of identifying the one or more of the second documents includes defining a multi-dimensional feature vector for each second document, with the vector having a set of features including a similarity feature indicating similarity of at least a portion of a second document to a portion of the first input document and processing the multi-dimensional feature vector using support-vector processing wherein each of the feature vectors is based on a title similarity score for one or more portions of a second document and a said first input document.
Dependent claims: 2, 3, 4
5. A computerized method for retrieving documents, the method implemented using at least one processor and memory and comprising:
searching for one or more second documents based on at least one input document;
identifying one or more second documents as more probably related to the at least one input document more than one or more of the other second documents using a learning machine; and
wherein identifying the one or more of the second documents includes defining a multi-dimensional feature vector for each second document, with the vector having a set of features including a similarity feature indicating similarity of at least a portion of a said second document to a portion of the at least one input document and processing the multi-dimensional feature vector using support-vector processing, wherein each of the feature vectors is based on a title similarity score for one or more portions of a said second document and the at least one input document.
Dependent claims: 6, 7, 8, 9, 10
11. A computerized method for identifying related documents, the method implemented using a processor and memory and comprising:
receiving an input document;
searching at least one database for a set of one or more related second documents based on content of the input document;
identifying one or more of the related second documents as more probably related to the input document than one or more of the other related second documents using a support vector machine; and
wherein identifying the one or more of the related second documents includes defining a multi-dimensional feature vector for each related second document, with the vector having a set of features including a similarity feature indicating similarity of at least a portion of the related second document to a portion of the input document and processing the multi-dimensional feature vector using support-vector processing wherein each of the feature vectors is based on a title similarity score for one or more portions of a said related second document and a said input document.
Dependent claims: 12, 13
14. A system comprising:
means, including a processor and memory, for extracting information from a first input document;
means, including a processor and memory, for retrieving one or more second documents based on the extracted information;
a learning machine for identifying one or more of the second documents as more probably related to the first input document than one or more of the other second documents; and
wherein the learning machine comprises;
support-vector processor means, including a processor and memory, for defining a multi-dimensional feature vector for each second document, with the vector having a set of features including a similarity feature indicating similarity of at least a portion of the second document to a portion of the first input document wherein each of the feature vectors is based on a title similarity score for one or more portions of a said second document and a said first input document.
Dependent claims: 15, 16, 17
18. A computerized system for retrieving documents, comprising:
means, including a processor and memory, for searching one or more second documents based on at least one input document;
a learning machine for identifying one or more of the second documents as more probably related to the at least one input document than one or more of the other second documents; and
wherein the learning machine comprises;
support-vector processor means, including a processor and memory, for defining a multi-dimensional feature vector for each second document, with the vector having a set of features including a similarity feature indicating similarity of at least a portion of a said second document to a portion of the at least one input document wherein each of the feature vectors is based on a title similarity score for one or more portions of a said second document and the at least one input document.
Dependent claims: 19, 20, 21, 22, 23
24. A system for identifying related documents, the system comprising:
means, including a processor and memory, for searching at least one database for a set of one or more related second documents based on content of an input document;
a support vector machine for identifying one or more of the related second documents as more probably related to the input document than one or more of the other related second documents; and
wherein the support vector machine comprises;
support-vector processor means, including a processor and memory, for defining a multi-dimensional feature vector for each related second document, with the vector having a set of features including a similarity feature indicating similarity of at least a portion of the related second document to a portion of the input document wherein each of the feature vectors is based on a title similarity score for one or more portions of a said related second document and the input document.
Dependent claims: 25, 26
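The feature-vector and support-vector steps recited in the independent claims can be sketched roughly as follows. The claims do not say how the title similarity score is computed, so this sketch assumes Jaccard overlap of title tokens. The second feature (text overlap), the dictionary layout of a document, and the `WEIGHTS` and `BIAS` values are likewise hypothetical placeholders. In practice the weights would come from training a support vector machine on labeled pairs. A linear decision function is used here as one simple form of support-vector processing.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; an assumed stand-in for a similarity score."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def feature_vector(input_doc: Dict[str, str], candidate: Dict[str, str]) -> List[float]:
    """Build a two-dimensional vector: title similarity, then text similarity."""
    return [
        jaccard(input_doc["title"], candidate["title"]),
        jaccard(input_doc["text"], candidate["text"]),
    ]


# Placeholder parameters of a linear decision function f(x) = w . x + b.
# These numbers are made up for illustration, not learned from data.
WEIGHTS = [2.0, 1.0]
BIAS = -1.0


def decision_value(x: List[float]) -> float:
    """Signed score; larger values mean 'more probably related'."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def rank_candidates(
    input_doc: Dict[str, str], candidates: List[Dict[str, str]]
) -> List[Tuple[float, str]]:
    """Score every candidate and sort from most to least probably related."""
    scored = [
        (decision_value(feature_vector(input_doc, c)), c["id"]) for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    query = {"id": "X", "title": "Smith v. Jones",
             "text": "appeal from summary judgment on contract claim"}
    pool = [
        {"id": "B", "title": "Doe v. Roe", "text": "criminal sentencing appeal"},
        {"id": "A", "title": "Smith v. Jones",
         "text": "summary judgment on contract claim granted"},
    ]
    for score, doc_id in rank_candidates(query, pool):
        print(doc_id, round(score, 3))
```

Ranking by the signed decision value, rather than thresholding it, matches the claim language of identifying some second documents as "more probably related" than others.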
Specification