Method and system for describing and identifying concepts in natural language text for information retrieval and processing
Abstract
A method for information retrieval that matches occurrences of concepts in natural language text documents against descriptions of concepts in user queries. Said method, implemented in a computer system, includes a preferred version of the method that comprises (1) annotating natural language text in documents and other text-forms with linguistic information and Concepts and Concept Rules expressed in a Concept Specification Language (CSL) for a particular domain, (2) pruning and optimizing synonyms for a particular domain, (3) defining and learning said CSL Concepts and Concept Rules, (4) checking user-defined descriptions of Concepts represented in CSL (including user queries), and (5) retrieval by matching said user-defined descriptions (and queries) against said annotated text. CSL is a language for expressing linguistically-based patterns. Said patterns can represent the linguistic manifestations of concepts in text. Said concepts may derive from the sublanguages used by experts to analyze specialized domains including, but not limited to, insurance claims, police incident reports, medical reports, and aviation incident reports.
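To make the idea of a "linguistically-based pattern" concrete, the following is a minimal sketch in Python. It does not use actual CSL syntax, which this excerpt does not show. The `Token` class, the predicate helpers `lemma_in` and `pos_is`, the `find_pattern` function, and the `VEHICLE_COLLISION` concept are all hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Token:
    """A word with the linguistic information attached to it (assumed shape)."""
    text: str
    lemma: str
    pos: str  # part-of-speech tag, e.g. "NOUN", "VERB", "DET"


Predicate = Callable[[Token], bool]


def lemma_in(*lemmas: str) -> Predicate:
    """Match any token whose lemma is one of the given lemmas."""
    wanted = set(lemmas)
    return lambda tok: tok.lemma in wanted


def pos_is(tag: str) -> Predicate:
    """Match any token carrying the given part-of-speech tag."""
    return lambda tok: tok.pos == tag


def find_pattern(tokens: List[Token], pattern: List[Predicate]) -> List[Tuple[int, int]]:
    """Return (start, end) spans where the predicates match consecutive tokens."""
    spans = []
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        if all(pred(tokens[start + i]) for i, pred in enumerate(pattern)):
            spans.append((start, start + n))
    return spans


# Hypothetical concept: a collision verb, a determiner, then a vehicle noun.
VEHICLE_COLLISION = [
    lemma_in("hit", "strike", "collide"),
    pos_is("DET"),
    lemma_in("car", "truck", "vehicle"),
]

sentence = [
    Token("The", "the", "DET"),
    Token("driver", "driver", "NOUN"),
    Token("struck", "strike", "VERB"),
    Token("a", "a", "DET"),
    Token("truck", "truck", "NOUN"),
]
matches = find_pattern(sentence, VEHICLE_COLLISION)
```

Because the pattern is written over lemmas and part-of-speech tags rather than surface strings, the same concept description would also match "hit the car" or "collided a vehicle" without listing each wording.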
228 Citations
93 Claims
1. A method of information retrieval, performed on a computer system that matches text in documents against user-defined descriptions of concepts, comprising:
a) identification of linguistic entities in the text of documents;
b) annotation of said identified linguistic entities in a text markup language to produce linguistically annotated documents;
c) identification of concepts using linguistic information, where said concepts are represented in a concept specification language and said concepts occur in one of:
1) said text of documents in which linguistic entities have been identified;
or 2) said linguistically annotated documents;
or 3) said stored linguistically annotated documents;
d) annotation of said identified concepts in said text markup language to produce conceptually annotated documents;
e) checking user-defined descriptions of concepts represented in said concept specification language; and
f) retrieval by matching said user-defined descriptions of concepts against said conceptually annotated documents.
Dependent claims: 2-33, 35-69, 71-75
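Steps a) through f) of claim 1 can be read as a small pipeline. The sketch below is only one possible illustration under heavy simplifying assumptions: words are split on whitespace, lemmas come from a toy lookup table, the "text markup language" is approximated with XML elements built via Python's `xml.etree.ElementTree`, and a concept is just a fixed sequence of lemmas. The names `LEMMAS`, `CONCEPTS`, `annotate_linguistic`, `annotate_concepts`, `check_query`, and `retrieve` are hypothetical and do not come from the patent.

```python
import xml.etree.ElementTree as ET

# Toy lexicon (assumption): surface form -> lemma.
LEMMAS = {"struck": "strike", "trucks": "truck"}

# Toy concept definitions (assumption): concept name -> sequence of lemmas.
CONCEPTS = {"vehicle_collision": ["strike", "a", "truck"]}


def annotate_linguistic(doc_id, text):
    """Steps a) and b): identify words and lemmas, record them as markup."""
    doc = ET.Element("doc", id=doc_id)
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if not word:
            continue
        w = ET.SubElement(doc, "w", lemma=LEMMAS.get(word, word))
        w.text = word
    return doc


def annotate_concepts(doc, concepts):
    """Steps c) and d): find concept patterns over lemmas, record them as markup."""
    lemmas = [w.get("lemma") for w in doc.findall("w")]
    for name, pattern in concepts.items():
        n = len(pattern)
        for start in range(len(lemmas) - n + 1):
            if lemmas[start:start + n] == pattern:
                ET.SubElement(doc, "concept", name=name,
                              start=str(start), end=str(start + n))
    return doc


def check_query(query, concepts):
    """Step e): reject a user query that names an undefined concept."""
    if query not in concepts:
        raise ValueError(f"unknown concept: {query}")
    return query


def retrieve(query, docs):
    """Step f): return ids of documents annotated with the queried concept."""
    return [d.get("id") for d in docs
            if any(c.get("name") == query for c in d.findall("concept"))]


texts = {
    "d1": "The driver struck a truck.",
    "d2": "The claimant reported a stolen bicycle.",
}
docs = [annotate_concepts(annotate_linguistic(i, t), CONCEPTS)
        for i, t in texts.items()]
hits = retrieve(check_query("vehicle_collision", CONCEPTS), docs)
```

In this sketch the query is matched against the concept annotations rather than the raw text, which mirrors the claim's separation between annotating documents (steps a to d) and retrieving them (steps e and f).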
34. A method for information retrieval, performed on a computer system that matches text in documents and other text-forms against user-defined descriptions of concepts, comprising:
a) identification of linguistic entities in the text of documents;
b) annotation of said identified linguistic entities in, but not limited to, a Text Markup Language (TML) to produce linguistically annotated documents;
c) storage of said linguistically annotated documents;
d) identification of Concepts and Concept Rules using linguistic information, where said Concepts and Concept Rules are represented in a Concept Specification Language (CSL) and said Concepts-to-be-identified and Concept Rules-to-be-identified occur in one of:
1) said text of documents in which linguistic entities have been identified as per a);
or 2) said linguistically annotated documents of b);
or 3) said stored linguistically annotated documents of c);
e) annotation of said identified Concepts and Concept Rules in said TML to produce conceptually annotated documents;
f) storage of said conceptually annotated documents;
g) defining and learning CSL Concepts and Concept Rules;
h) checking user-defined descriptions of Concepts and Concept Rules represented in CSL; and
i) retrieval by matching said user-defined descriptions of CSL Concepts and Concept Rules against said conceptually annotated documents.
Dependent claims: 70, 76-93
Specification