Concept-based search and retrieval system

US 6,675,159 B1
Filed: 07/27/2000
Issued: 01/06/2004
Est. Priority Date: 07/27/2000
Status: Active Grant

6 Assignments

0 Petitions

Abstract

A concept-based indexing and search system indexes collections of documents with ontology-based predicate structures through automated and/or human-assisted methods. The system extracts the concepts behind user queries to return only those documents that match those concepts. The concept based search and retrieval system comprehends the intent behind a query from a user, and returns results matching that intent. The system can perform off-line searches for unanswered user queries and notify the user when a match is found.

1447 Citations

45 Claims

1. A method of performing concept-based searching of text documents comprising the steps of:
- transforming said text documents into predicate structures to form predicate libraries of said documents;
  
  inputting a natural language query;
  
  creating a query predicate structure representing logical relationships between words in said natural language query, said predicate structure containing a predicate and an argument;
  
  matching said query predicate structure to said document predicate structures in said predicate libraries; and
  
  presenting said matched predicate structures from said text documents.
- View Dependent Claims (2, 3, 4)
  - 2. A method of performing concept-based searching of text documents as recited in claim 1, wherein said predicate is one of a verb and a preposition.
  - 3. A method of performing concept-based searching of text documents as recited in claim 1, wherein said argument is any part of speech.
  - 4. A method of performing concept-based searching of text documents as recited in claim 1, wherein said argument is a noun.
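
As an illustration of claims 1 to 4, the following is a minimal Python sketch of matching a query predicate structure against a predicate library. The `Predicate` tuple, the dictionary-based library, and the "same predicate, arguments contained" matching rule are assumptions made for the example; the claims do not prescribe a data layout or a matching rule.

```python
from collections import defaultdict
from typing import NamedTuple


class Predicate(NamedTuple):
    """Hypothetical predicate structure: one predicate plus its arguments."""
    predicate: str      # a verb or preposition (claim 2)
    arguments: tuple    # any part of speech (claim 3), here nouns (claim 4)


def build_library(doc_predicates):
    """Index document predicate structures by predicate (a 'predicate library')."""
    library = defaultdict(list)
    for doc_id, preds in doc_predicates.items():
        for p in preds:
            library[p.predicate].append((doc_id, p))
    return library


def match(query, library):
    """Return (doc_id, structure) pairs that share the query's predicate
    and contain every query argument (an assumed matching rule)."""
    wanted = set(query.arguments)
    return [(doc_id, p) for doc_id, p in library.get(query.predicate, [])
            if wanted <= set(p.arguments)]


docs = {
    "d1": [Predicate("acquire", ("company", "startup"))],
    "d2": [Predicate("acquire", ("museum", "painting"))],
}
library = build_library(docs)
hits = match(Predicate("acquire", ("startup",)), library)
print(hits)  # the matched predicate structures that would be presented to the user
```

Indexing by predicate keeps the lookup cheap, since only structures with the same verb or preposition are compared argument by argument.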

5. A method of performing a concept-based searching of text documents comprising the steps of:
- transforming a natural language query into predicate structures representing logical relationships between words in said natural language query;
  
  providing an ontology containing lexical semantic information about words;
  
  transforming said text documents into predicate structures;
  
  probabilistically classifying said document predicate structures and said query predicate structures;
  
  filtering said document predicate structures against said query predicate structures to produce a set of said document predicate structures matching said query predicate structures; and
  
  ranking said set of matching predicate structures.
- View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
  - 6. A method of performing concept-based searching of text documents as recited in claim 5, further comprising the step of storing said ontology, said probabilistic classifications and said predicate structures in a data repository.
  - 7. A method of performing concept-based searching of text documents as recited in claim 5, wherein words and associated probabilities, comprising a statistically-derived category, are used to determine if a particular document belongs to a specific domain.
  - 8. A method of performing concept-based searching of text documents as recited in claim 7, further comprising the step of collecting all attributes occurring in said document and determining if said document belongs to said specified domain.
  - 9. A method of performing concept-based searching of text documents as recited in claim 6, further comprising the steps of:
10. A method of performing concept-based searching of text documents as recited in claim 5, wherein upon failure to match said document predicate structures to said query predicate structures, comparing documents added to said data repository or newly located ones of said documents to said query predicate structure, and notifying a user in the event of a match.
11. A method of performing concept-based searching of text documents as recited in claim 5, wherein upon failure to match said document predicate structures to said query predicate structures, determining whether said query is formulated in terms not previously included in said ontology, and if said determination is positive, designating said query terms as new concepts and adding said query terms to said ontology.
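
The behaviour recited in claims 10 and 11 can be sketched as a small agent that stores unanswered queries and re-checks them when documents arrive. This is a hedged illustration only: the `PersistentAgent` class, its method names, and the use of plain term sets in place of full predicate structures are assumptions, not details taken from the patent.

```python
class PersistentAgent:
    """Hypothetical agent that remembers unanswered queries and re-checks them."""

    def __init__(self, ontology):
        self.ontology = set(ontology)   # known concept terms
        self.pending = []               # (user, query terms) awaiting a match
        self.notifications = []         # (user, doc_id) pairs to deliver

    def register_unanswered(self, user, query_terms):
        # Claim 11: terms not yet in the ontology are added as new concepts.
        new_terms = set(query_terms) - self.ontology
        self.ontology |= new_terms
        self.pending.append((user, frozenset(query_terms)))
        return new_terms

    def on_new_document(self, doc_id, doc_terms):
        # Claim 10: compare each newly added document to the stored queries
        # and record a notification for every query it satisfies.
        doc_terms = set(doc_terms)
        still_pending = []
        for user, terms in self.pending:
            if terms <= doc_terms:
                self.notifications.append((user, doc_id))
            else:
                still_pending.append((user, terms))
        self.pending = still_pending


agent = PersistentAgent({"search", "document"})
added = agent.register_unanswered("alice", ["search", "hologram"])
agent.on_new_document("d7", ["index", "document"])               # no match yet
agent.on_new_document("d8", ["search", "hologram", "archive"])   # match
print(added, agent.notifications)
```
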
12. A method of performing concept-based searching of text documents as recited in claim 5, further comprising the step of clustering results of said search, said clustering step comprising the following steps of:
- forming a concept pattern vector from said document predicate structures;
  
  providing a feature map that self-adaptively clusters said concept pattern vectors according to said concept patterns in said documents;
  
  producing a cluster model representing documents, identified in said concept-based searching, that reflects statistical distribution of said concept pattern vectors representing said documents; and
  
  providing at least one sample from said cluster model to focus search results.
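
The clustering steps of claim 12 can be sketched roughly as follows. The code uses a winner-take-all competitive-learning map, which is a simplification of a self-organizing feature map (it has no neighbourhood function) and is swapped in here only to keep the example short. The function names, the three-dimensional concept pattern vectors, and the initial unit positions are all assumptions.

```python
def nearest_unit(units, v):
    """Index of the map unit closest to vector v (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda i: sum((u - x) ** 2 for u, x in zip(units[i], v)))


def train_feature_map(vectors, units, epochs=10, rate=0.5):
    """Adapt the units to the data: each sample pulls only its nearest unit."""
    units = [list(u) for u in units]          # work on a copy
    for epoch in range(epochs):
        lr = rate * (1 - epoch / epochs)      # learning rate shrinks each epoch
        for v in vectors:
            w = nearest_unit(units, v)
            units[w] = [u + lr * (x - u) for u, x in zip(units[w], v)]
    return units


def cluster_model(doc_vectors, units):
    """Group document ids by the unit their concept pattern vector maps to."""
    model = {}
    for doc_id, v in doc_vectors.items():
        model.setdefault(nearest_unit(units, v), []).append(doc_id)
    return model


# Hypothetical concept pattern vectors for four documents.
doc_vectors = {
    "d1": [1.0, 0.0, 0.0], "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 0.0, 1.0], "d4": [0.0, 0.1, 0.9],
}
units = train_feature_map(list(doc_vectors.values()),
                          [[0.6, 0.0, 0.4], [0.4, 0.0, 0.6]])
model = cluster_model(doc_vectors, units)
# One sample document per cluster, offered to the user to focus the results.
samples = {unit: docs[0] for unit, docs in model.items()}
print(model, samples)
```
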
13. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming said text documents into predicate structures comprises the steps of:
- removing words that serve as placeholders in English-language;
  
  removing lexemes representing adjective concepts;
  
  grouping proper nouns into single lexical nouns;
  
  removing modal verbs; and
  
  removing lexemes containing adverb concepts.
14. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming a natural language query into predicate structures comprises the steps of:
- removing words that serve as placeholders in English-language;
  
  removing lexemes representing adjective concepts;
  
  grouping proper nouns into single lexical nouns;
  
  removing modal verbs;
  
  removing lexemes containing adverb concepts; and
  
  removing modal verbs from said query.
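
The five filtering steps shared by claims 13 and 14 can be sketched as a pipeline over part-of-speech-tagged tokens. This is an illustrative sketch under several assumptions: the Penn Treebank style tags (`JJ`, `RB`, `MD`, `NNP`), the tiny stop-word list, and the function names are not taken from the patent, and the additional final step recited in claim 14 is not modelled.

```python
STOP_WORDS = {"the", "a", "an", "of", "it", "there"}   # illustrative list only


def drop_tags(tokens, tags):
    """Remove every (word, tag) pair whose tag is in `tags`."""
    return [(w, t) for w, t in tokens if t not in tags]


def group_proper_nouns(tokens):
    """Merge runs of adjacent proper nouns (NNP) into one lexical noun."""
    out = []
    for word, tag in tokens:
        if tag == "NNP" and out and out[-1][1] == "NNP":
            out[-1] = (out[-1][0] + " " + word, "NNP")
        else:
            out.append((word, tag))
    return out


def apply_filters(tokens):
    """Apply the filters in the order the claims list them."""
    tokens = [(w, t) for w, t in tokens if w.lower() not in STOP_WORDS]
    tokens = drop_tags(tokens, {"JJ"})     # adjective concepts
    tokens = group_proper_nouns(tokens)    # proper nouns -> single lexical noun
    tokens = drop_tags(tokens, {"MD"})     # modal verbs
    tokens = drop_tags(tokens, {"RB"})     # adverb concepts
    return tokens


sentence = [("The", "DT"), ("United", "NNP"), ("Nations", "NNP"),
            ("should", "MD"), ("quickly", "RB"), ("approve", "VB"),
            ("the", "DT"), ("new", "JJ"), ("treaty", "NN")]
filtered = apply_filters(sentence)
print(filtered)
```
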
15. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming said natural language query comprises the steps of:
- transforming said natural language query into multiple sequences of part-of-speech-tagged ontological concepts from said ontology;
  
  reducing the number of said multiple sequences based on rules relating to sequences of syntactic tags;
  
  creating syntactic tree structures, based on said syntactic tags, representing grammatical relations between said ontological concepts; and
  
  reducing the number of said tree structures based on rules relating to improbable syntactic structures, and rules concerning conflicting ontological specifications.
16. A method of performing concept-based searching of text documents as recited in claim 15, further comprising the step of converting said tree structures into predicate structures.
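
To make the conversion step of claim 16 concrete, here is a small sketch that turns a syntactic tree into a predicate structure. The nested-tuple tree encoding, the node labels `V` and `N`, and the rule "first verb is the predicate, all nouns are arguments" are assumptions chosen for the example; the patent's own conversion rules are not reproduced here.

```python
def leaves(tree, label):
    """Collect, left to right, the words under every node labelled `label`.

    A tree is a tuple (label, child, ...); a leaf node is (label, word)."""
    tag, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]] if tag == label else []
    found = []
    for child in children:
        found.extend(leaves(child, label))
    return found


def to_predicate_structure(tree):
    """Return (predicate, arguments), or None if the tree has no verb."""
    verbs = leaves(tree, "V")
    nouns = leaves(tree, "N")
    return (verbs[0], tuple(nouns)) if verbs else None


# Hypothetical parse of "dog chase cat" after lexical filtering.
tree = ("S",
        ("NP", ("N", "dog")),
        ("VP", ("V", "chase"), ("NP", ("N", "cat"))))
structure = to_predicate_structure(tree)
print(structure)
```
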
17. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming said text documents comprises the steps of:
- transforming said documents into multiple sequences of part-of-speech-tagged ontological concepts from said ontology;
  
  reducing the number of said multiple sequences based on rules relating to sequences of syntactic tags;
  
  creating syntactic tree structures representing grammatical relations between said ontological concepts based on said syntactic tags; and
  
  reducing the number of said tree structures based on rules relating to improbable syntactic structures, and rules concerning conflicting ontological specifications.
18. A method of performing concept-based searching of text documents as recited in claim 17, further comprising the step of converting said tree structures into predicate structures.
19. A method of performing concept-based searching of text documents as recited in claim 12, further comprising the step of using said ontology to develop said feature map to cluster said concept patterns.

20. An apparatus for use in an information retrieval system for retrieving information in response to a query, comprising:
- a query ontological parser that transforms a natural language query into predicate structures;
  
  an ontology providing information about words, said information comprising lexical semantic representation and syntactic types;
  
  a document ontological parser that transforms documents into predicate structures;
  
  a Bayes classifier probabilistically classifying said documents and said query;
  
  adaptive filters for filtering said documents against said query to produce a set of said documents matching said query; and
  
  a ranking module for ranking said set of matching documents.
- View Dependent Claims (21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 39, 41, 42, 43, 44, 45)
  - 21. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, further comprising a data repository storing said ontology, results from said Bayes classifier, and said predicate structures from said document ontological structure.
  - 22. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said query ontological parser comprises:
23. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 21, wherein said Bayes classifier comprises a learner that produces a set of trained document examples from data obtained from said data repository.
24. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 21, wherein said Bayes classifier comprises a reasoner that determines a probability that a classified document matches said query.
25. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said Bayes classifier comprises a reasoner that determines a probability that a classified document matches said query.
26. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 24, wherein said Bayes classifier is document-domain-specific so that words representative of a concept are used to determine if a particular document belongs to a specific domain, and said reasoner determines a probability that a pre-classified document belongs to said specific domain that said Bayes classifier is trained to classify.
27. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 26, further comprising an attribute extractor that collects all attributes occurring in said documents and sends said attributes to said reasoner to determine if said documents belong to said specified domain.
28. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 24, wherein said Bayes classifier is query-topic specific so that words that form said query are used to determine a topic of said query.
29. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 28, wherein said Bayes classifier further comprises a learner that produces a set of trained document examples from data obtained from said data repository.
30. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 29, wherein said reasoner classifies said topic based on said trained set of document examples and provides a list of possible topics ranked in order of probability of correctness.
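
The learner and reasoner split described in claims 23 to 30 can be sketched with a small multinomial naive Bayes classifier. This is a hedged illustration: the class names, the Laplace smoothing, and the toy training data are assumptions, and the patent does not state that its Bayes classifier is implemented this way.

```python
import math
from collections import Counter, defaultdict


class Learner:
    """Accumulates per-topic word counts from trained document examples."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, topic, words):
        self.doc_counts[topic] += 1
        self.word_counts[topic].update(words)
        self.vocab.update(words)


class Reasoner:
    """Ranks topics for a query by posterior probability."""

    def __init__(self, learner):
        self.m = learner

    def rank_topics(self, words):
        total_docs = sum(self.m.doc_counts.values())
        scores = {}
        for topic, n_docs in self.m.doc_counts.items():
            counts = self.m.word_counts[topic]
            denom = sum(counts.values()) + len(self.m.vocab)
            log_p = math.log(n_docs / total_docs)             # topic prior
            for w in words:
                log_p += math.log((counts[w] + 1) / denom)    # Laplace smoothing
            scores[topic] = log_p
        # Normalise the log scores into probabilities that sum to one.
        top = max(scores.values())
        exp = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(exp.values())
        return sorted(((t, e / z) for t, e in exp.items()),
                      key=lambda pair: pair[1], reverse=True)


learner = Learner()
learner.train("finance", ["stock", "market", "bank"])
learner.train("sports", ["match", "goal", "team"])
ranked = Reasoner(learner).rank_topics(["bank", "stock"])
print(ranked)  # topics in descending order of probability
```
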
31. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said document ontological parser comprises:
- a sentence lexer that transforms said documents into multiple sequences of part-of-speech-tagged ontological concepts from said ontology;
  
  post-lexer filters that reduce the number of said multiple sequences produced by said sentence lexer, based on rules relating to sequences of syntactic tags;
  
  a parser that creates syntactic tree structures representing grammatical relations between said ontological concepts based on said syntactic tags; and
  
  post-parser filters that reduce the number of said parse trees based on rules relating to improbable syntactic structures, and rules concerning conflicting ontological specifications.
32. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, further comprising a persistent agent maintaining at least one of said predicate structures extracted from said query, wherein, upon failure to match said documents to said query, documents added to said data repository or newly located ones of said documents parsed by said document ontological parser are compared to said at least one predicate structure extracted from said query, and a notification is sent to a user upon a match.
33. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, further comprising a persistent agent maintaining at least one of said predicate structures extracted from said query, wherein, upon failure to match said documents to said query, a determination is made whether said query is formulated in terms not previously included in said ontology, and if said determination is positive, said query terms are designated as new concepts and added to said ontology.
34. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said ranking module determines similarity between said query and each of said documents returned from said data repository.
35. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said ranking module determines similarity between said predicate structure of said query and each predicate structure of said documents returned from said data repository.
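
One way to picture the ranking module of claims 34 and 35 is a function that scores each document by how closely its predicate structures resemble the query's. The sketch below is an assumption-laden example: the Jaccard overlap of arguments, the requirement that predicates be identical, and the function names are illustrative choices, not the similarity measure defined in the patent.

```python
def similarity(query, candidate):
    """Hypothetical score: 0 if the predicates differ,
    otherwise the Jaccard overlap of the two argument sets."""
    q_pred, q_args = query
    c_pred, c_args = candidate
    if q_pred != c_pred:
        return 0.0
    q, c = set(q_args), set(c_args)
    return len(q & c) / len(q | c) if q | c else 1.0


def rank(query, doc_predicates):
    """Score each document by its best-matching predicate structure
    and return (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, max((similarity(query, p) for p in preds), default=0.0))
              for doc_id, preds in doc_predicates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = ("acquire", ("company", "startup"))
doc_predicates = {
    "d3": [("sell", ("company",))],
    "d2": [("acquire", ("company", "patent"))],
    "d1": [("acquire", ("company", "startup"))],
}
ranked_docs = rank(query, doc_predicates)
print(ranked_docs)
```
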
37. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said predicate structures for each of said documents forms at least one concept pattern vector for each of said documents.
39. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said query predicate structure and said document predicate structures comprise a predicate and an argument, said predicate is one of a verb and a preposition, and said argument is any part of speech.
41. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said adaptive filters comprise a feature map that clusters said matching documents according to concept patterns in said query and produces a cluster model representing a statistical probability distribution of said matching documents.
42. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 22, wherein said post-lexer filters comprise:
- a stop word filter that removes words that serve as placeholders in English-language;
  
  an adjective filter that removes lexemes representing adjective concepts;
  
  a proper noun filter that groups proper nouns into single lexical nouns;
  
  a modal verb filter that removes modal verbs;
  
  an adverb filter that removes lexemes containing adverb concepts; and
  
  a pseudo-predicate filter that removes verbs from said queries.
43. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 31, wherein said post-lexer filters comprise:
- a stop word filter that removes words that serve as placeholders in English-language;
  
  an adjective filter that removes lexemes representing adjective concepts;
  
  a proper noun filter that groups proper nouns into single lexical nouns;
  
  a modal verb filter that removes modal verbs; and
  
  an adverb filter that removes lexemes containing adverb concepts.
44. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 22, wherein said parser comprises a parse tree converter for converting parse trees into predicate structures.
45. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 31, wherein said parser comprises a parse tree converter for converting parse trees into predicate structures.

36. An apparatus for use in an information retrieval system for retrieving information in response to a query comprising:
- a query ontological parser that transforms a natural language query into predicate structures;
  
  an ontology providing information about words, said information comprising syntactic uses and definitions;
  
  a document ontological parser that transforms documents into predicate structures;
  
  a Bayes classifier probabilistically classifying said documents and said query;
  
  adaptive filters for filtering said predicate structures of said documents against said predicate structures of said query to group said documents according to similarity of concept patterns contained in said documents relative to said query or additional feedback; and
  
  a ranking module for ranking said set of matching predicate structures.
- View Dependent Claims (38, 40)
  - 38. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 36, wherein said predicate structures for each of said documents forms at least one concept pattern vector for each of said documents.
  - 40. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 36, wherein said query predicate structure and said document predicate structures comprise a predicate and an argument, said predicate is one of a verb and a preposition, and said argument is any part of speech.

Current Assignee
Leidos, Inc. (Leidos Holdings, Inc.)
Original Assignee
Science Applications International Corporation
Inventors
Wang, Lei; Tseng, Jason Chun-Ming; Tijerino, Yuri Adrian; Caudill, Maureen; Busch, Justin Eliot; Graydon, Patrick John; Lin, Albert Deirchow; Klein, Kenneth Scott; Pancho, Bryner Sabido; Chinchor, Nancy Ann
Primary Examiner(s)
Metjahic, Safet
Assistant Examiner(s)
Al Hashemi, Sana A.

Application Number

US09/627,295
Time in Patent Office

1,258 Days
Field of Search

707/2, 707/104.1, 707/5, 707/103.R, 704/9
US Class Current

1/1
CPC Class Codes

G06F 16/2457   with adaptation to user needs

G06F 16/3334   Selection or weighting of t...

G06F 16/3338   Query expansion

G06F 16/353   into predefined classes

G06F 16/367   Ontology

G06F 40/205   Parsing

G06F 40/216   using statistical methods

G06F 40/253   Grammatical analysis; Style...

Y10S 707/99933   Query processing, i.e. sear...
