Method and system for describing and identifying concepts in natural language text for information retrieval and processing
First Claim
1. A method of information retrieval, performed on a computer system that matches text in documents and other text-forms against user-defined descriptions of concepts, comprising:
a) identification of linguistic entities in the text of documents and other text-forms;
b) annotation of said identified linguistic entities in a text markup language to produce linguistically annotated documents and other text-forms;
c) storage of said linguistically annotated documents and other text-forms;
d) identification of concepts using linguistic information, where said concepts are represented in a concept specification language and said concepts occur in one of;
1) said text of documents and other text-forms in which linguistic entities have been identified in step a); or
2) said linguistically annotated documents and other text-forms of step b); or
3) stored linguistically annotated documents and other text-forms of step c);
e) annotation of said identified concepts in said text markup language to produce conceptually annotated documents and other text-forms;
f) storage of said conceptually annotated documents and other text-forms;
g) defining and learning concept representations of said concept specification language, including;
1) marking up instances of concepts in the text of documents and other text-forms;
2) creating new concept representations in the concept specification language from said marked up instances of concepts; and
3) adding and, if necessary, integrating said new concept representations in the concept specification language with pre-existing concept representations in said language;
h) checking user-defined descriptions of concepts represented in said concept specification language; and
i) retrieval by matching said user-defined descriptions of concepts against said conceptually annotated documents and other text-forms.
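Step g) can be pictured with a small sketch. The claim does not say how new concept representations are derived from marked-up instances, so the generalization rule below is purely an assumption made for illustration: a position keeps its lemma when every instance agrees on it, and otherwise falls back to the shared part-of-speech tag. The names generalize and integrate, and the (lemma, tag) token format, are hypothetical and do not come from the patent.

```python
def generalize(instances):
    """Build one pattern from marked-up instances of a concept.

    Each instance is a list of (lemma, pos) pairs. This sketch assumes all
    instances have the same length, which real marked-up text would not.
    """
    pattern = []
    for column in zip(*instances):
        lemmas = {lemma for lemma, _ in column}
        tags = {pos for _, pos in column}
        if len(lemmas) == 1:
            # Every instance uses the same word here: keep it.
            pattern.append(("lemma", lemmas.pop()))
        elif len(tags) == 1:
            # Words differ but share a part of speech: generalize to the tag.
            pattern.append(("pos", tags.pop()))
        else:
            # No agreement at all: accept any token at this position.
            pattern.append(("any", None))
    return tuple(pattern)


def integrate(concepts, name, new_pattern):
    """Add a pattern to the named concept unless it is already present."""
    patterns = concepts.setdefault(name, [])
    if new_pattern not in patterns:
        patterns.append(new_pattern)
    return concepts


# Two hypothetical marked-up instances of the same concept.
marked_up = [
    [("strike", "VERB"), ("a", "DET"), ("car", "NOUN")],
    [("strike", "VERB"), ("the", "DET"), ("truck", "NOUN")],
]
learned = generalize(marked_up)
store = integrate({}, "VehicleCollision", learned)
```

Storing each concept as a list of alternative patterns is one simple way to read "adding and, if necessary, integrating": a new pattern is appended only when it is not already there.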
Abstract
A method for information retrieval that matches occurrences of concepts in natural language text documents against descriptions of concepts in user queries. Said method, implemented in a computer system, includes a preferred version of the method that comprises (1) annotating natural language text in documents and other text-forms with linguistic information and Concepts and Concept Rules expressed in a Concept Specification Language (CSL) for a particular domain, (2) pruning and optimizing synonyms for a particular domain, (3) defining and learning said CSL Concepts and Concept Rules, (4) checking user-defined descriptions of Concepts represented in CSL (including user queries), and (5) retrieval by matching said user-defined descriptions (and queries) against said annotated text. CSL is a language for expressing linguistically-based patterns. Said patterns can represent the linguistic manifestations of concepts in text. Said concepts may derive from the sublanguages used by experts to analyze specialized domains including, but not limited to, insurance claims, police incident reports, medical reports, and aviation incident reports.
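As a rough illustration of the annotate-then-match flow described in the abstract, the sketch below tags tokens with a lemma and a part of speech, looks for a concept expressed as an ordered pattern over those annotations, and retrieves the documents in which the concept occurs. This excerpt does not give CSL's syntax, so the dictionary-based pattern format, the toy lexicon, the max_gap parameter, and every function name here are assumptions, not the patented implementation.

```python
import re
from dataclasses import dataclass

# Toy lexicon standing in for a real tagger and lemmatizer (hypothetical data).
LEXICON = {
    "struck": ("strike", "VERB"),
    "hit": ("hit", "VERB"),
    "car": ("car", "NOUN"),
    "vehicle": ("vehicle", "NOUN"),
}


@dataclass
class Token:
    text: str
    lemma: str
    pos: str


def annotate(text):
    """Linguistic annotation: attach a lemma and a tag to every word."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        lemma, pos = LEXICON.get(word.lower(), (word.lower(), "X"))
        tokens.append(Token(word, lemma, pos))
    return tokens


def matches(token, element):
    """A pattern element maps attribute names to sets of allowed values."""
    return all(getattr(token, attr) in allowed for attr, allowed in element.items())


def find_concept(tokens, pattern, max_gap=2):
    """Return (start, end) spans where the pattern elements occur in order.

    The first element must match at `start`; each later element may be
    preceded by at most `max_gap` unmatched tokens.
    """
    spans = []
    for start in range(len(tokens)):
        pos, found = start, True
        for i, element in enumerate(pattern):
            limit = pos if i == 0 else pos + max_gap
            j = pos
            while j < len(tokens) and j <= limit and not matches(tokens[j], element):
                j += 1
            if j >= len(tokens) or j > limit:
                found = False
                break
            pos = j + 1
        if found:
            spans.append((start, pos))
    return spans


# A hypothetical concept: a striking verb followed closely by a vehicle noun.
CONCEPTS = {
    "VehicleCollision": [{"lemma": {"strike", "hit"}}, {"lemma": {"car", "vehicle"}}],
}


def build_index(documents, concepts):
    """Conceptual annotation: record which concepts occur in each document."""
    index = {}
    for doc_id, text in documents.items():
        tokens = annotate(text)
        index[doc_id] = {n for n, p in concepts.items() if find_concept(tokens, p)}
    return index


def retrieve(index, concept_name):
    """Retrieval: return the ids of documents annotated with the concept."""
    return sorted(d for d, names in index.items() if concept_name in names)


documents = {
    "r1": "The driver struck a parked car",
    "r2": "The driver parked the car",
    "r3": "A truck hit the vehicle",
}
index = build_index(documents, CONCEPTS)
results = retrieve(index, "VehicleCollision")
```

Matching on lemmas rather than surface words is what lets "struck" and "hit" satisfy the same pattern, and the gap allowance lets intervening words such as "a parked" sit between the verb and the noun.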
94 Claims
Claim 1 is set out in full under First Claim above. Claims 2 through 94 are dependent claims.
Specification