Concept-based search and retrieval system
Abstract
A concept-based indexing and search system indexes collections of documents with ontology-based predicate structures through automated and/or human-assisted methods. The system extracts the concepts behind user queries to return only those documents that match those concepts. The concept-based search and retrieval system comprehends the intent behind a query from a user, and returns results matching that intent. The system can perform off-line searches for unanswered user queries and notify the user when a match is found.
45 Claims
1. A method of performing concept-based searching of text documents comprising the steps of:
transforming said text documents into predicate structures to form predicate libraries of said documents;
inputting a natural language query;
creating a query predicate structure representing logical relationships between words in said natural language query, said predicate structure containing a predicate and an argument;
matching said query predicate structure to said document predicate structures in said predicate libraries; and
presenting said matched predicate structures from said text documents.
Dependent claims: 2, 3, 4
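As an illustration only, the matching and presenting steps of claim 1 might be sketched in Python as follows. The sketch assumes the documents have already been transformed into predicate structures (the transformation itself is not shown) and uses exact equality as a stand-in for the claimed matching. The names PredicateStructure, build_library, and match_query are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PredicateStructure:
    """A predicate together with its arguments (hypothetical representation)."""
    predicate: str
    arguments: Tuple[str, ...]


def build_library(docs: Dict[str, List[PredicateStructure]]) -> Dict[PredicateStructure, List[str]]:
    """Map each predicate structure to the ids of the documents that contain it."""
    library: Dict[PredicateStructure, List[str]] = {}
    for doc_id, structures in docs.items():
        for ps in structures:
            library.setdefault(ps, []).append(doc_id)
    return library


def match_query(query: PredicateStructure, library: Dict[PredicateStructure, List[str]]) -> List[str]:
    """Return ids of documents holding a structure equal to the query structure."""
    return library.get(query, [])
```

A real matcher would presumably tolerate synonyms and partial argument overlap; equality keeps the sketch short.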
5. A method of performing a concept-based searching of text documents comprising the steps of:
transforming a natural language query into predicate structures representing logical relationships between words in said natural language query;
providing an ontology containing lexical semantic information about words;
transforming said text documents into predicate structures;
probabilistically classifying said document predicate structures and said query predicate structures;
filtering said document predicate structures against said query predicate structures to produce a set of said document predicate structures matching said query predicate structures; and
ranking said set of matching predicate structures.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
determining a topic of said query predicate structure;
providing a set of trained document examples from said data repository;
classifying said topic based on said trained set of document examples; and
providing a list of possible topics ranked in order of probability of correctness.
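The topic-classification steps above could be realised, for example, with a multinomial naive Bayes model using add-one smoothing. The claims specify only probabilistic (Bayes) classification, so the particular model, the smoothing, and the names train and rank_topics are assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: iterable of (topic, list_of_words) pairs taken from a labelled document set."""
    word_counts = defaultdict(Counter)  # topic -> word frequencies
    doc_counts = Counter()              # topic -> number of training documents
    vocab = set()
    for topic, words in examples:
        doc_counts[topic] += 1
        word_counts[topic].update(words)
        vocab.update(words)
    return word_counts, doc_counts, vocab


def rank_topics(query_words, model):
    """Return (topic, probability) pairs, most probable topic first."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for topic in doc_counts:
        total_words = sum(word_counts[topic].values())
        score = math.log(doc_counts[topic] / total_docs)  # log prior
        for w in query_words:
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[topic][w] + 1) / (total_words + len(vocab)))
        log_scores[topic] = score
    # Normalise the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    norm = sum(weights.values())
    return sorted(((t, w / norm) for t, w in weights.items()), key=lambda item: item[1], reverse=True)
```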
10. A method of performing concept-based searching of text documents as recited in claim 5, wherein upon failure to match said document predicate structures to said query predicate structures, comparing documents added to said data repository or newly located ones of said documents to said query predicate structure, and notifying a user in the event of a match.
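The deferred matching recited in claim 10 might be sketched as a small agent that remembers unmatched query predicate structures and checks each newly added document against them. The class name PersistentAgent, the callback signature, and the use of plain tuples for predicate structures are assumptions for illustration, not details from the patent.

```python
class PersistentAgent:
    """Holds unanswered query structures and notifies users when a new document matches."""

    def __init__(self, notify):
        self.notify = notify  # hypothetical callback: notify(user, query_structure, doc_id)
        self.pending = {}     # user -> set of unmatched query predicate structures

    def register(self, user, query_structure):
        """Remember a query structure that found no match at search time."""
        self.pending.setdefault(user, set()).add(query_structure)

    def on_new_document(self, doc_id, doc_structures):
        """Compare a newly added document against every pending query structure."""
        doc_set = set(doc_structures)
        for user, queries in self.pending.items():
            hits = queries & doc_set
            for query_structure in hits:
                self.notify(user, query_structure, doc_id)
            queries -= hits  # an answered query is no longer pending
```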
11. A method of performing concept-based searching of text documents as recited in claim 5, wherein upon failure to match said document predicate structures to said query predicate structures, determining whether said query is formulated in terms not previously included in said ontology, and if said determination is positive, designating said query terms as new concepts and adding said query terms to said ontology.
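A minimal sketch of the ontology update in claim 11, assuming the ontology is a plain dictionary keyed by term; the function name add_new_concepts and the placeholder entry format are hypothetical.

```python
def add_new_concepts(query_terms, ontology):
    """Add query terms missing from the ontology and return the terms that were added."""
    # dict.fromkeys removes duplicate terms while keeping their original order.
    new_terms = [t for t in dict.fromkeys(query_terms) if t not in ontology]
    for term in new_terms:
        ontology[term] = {"status": "new concept"}  # placeholder entry, format is assumed
    return new_terms
```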
12. A method of performing concept-based searching of text documents as recited in claim 5, further comprising the step of clustering results of said search, said clustering step comprising the following steps of:
forming a concept pattern vector from said document predicate structures;
providing a feature map that self-adaptively clusters said concept pattern vectors according to said concept patterns in said documents;
producing a cluster model representing documents, identified in said concept-based searching, that reflects statistical distribution of said concept pattern vectors representing said documents; and
providing at least one sample from said cluster model to focus search results.
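One common kind of self-adaptive feature map is a Kohonen self-organizing map. The claim does not say which kind is meant, so the sketch below is an assumption: it trains a tiny one-dimensional map on concept pattern vectors given as plain numeric lists. The function names, the deterministic initialisation, and all parameter values are illustrative only.

```python
import math


def best_node(vector, nodes):
    """Index of the map node closest to the vector (squared Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, nodes[i])))


def train_feature_map(vectors, n_nodes=2, epochs=20, lr=0.5, sigma=0.5):
    """Train a 1-D self-organizing map and return the node weight vectors."""
    # Deterministic initialisation: seed the nodes with evenly spaced input vectors.
    step = max(1, len(vectors) // n_nodes)
    nodes = [list(vectors[min(i * step, len(vectors) - 1)]) for i in range(n_nodes)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays linearly to zero
        for v in vectors:
            bmu = best_node(v, nodes)
            for i, node in enumerate(nodes):
                # Nodes near the best-matching unit on the map are pulled along with it.
                influence = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(node)):
                    node[d] += rate * influence * (v[d] - node[d])
    return nodes
```

Each document is then assigned to the cluster of its best-matching node, and a document from each cluster could serve as the sample used to focus the results.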
13. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming said text documents into predicate structures comprises the steps of:
removing words that serve as placeholders in English-language;
removing lexemes representing adjective concepts;
grouping proper nouns into single lexical nouns;
removing modal verbs; and
removing lexemes containing adverb concepts.
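The five filtering steps of claim 13 can be sketched over a list of (word, tag) pairs. The tag names, the stop-word list, and the function names are assumptions. The sketch also groups proper nouns before removing anything, so that a deleted word never causes two unrelated names to merge; the claim itself does not fix an order.

```python
STOP_WORDS = {"the", "a", "an", "of", "it", "there"}  # illustrative, not the patent's list
REMOVED_TAGS = {"ADJ", "MODAL", "ADV"}                 # hypothetical tag names


def group_proper_nouns(tokens):
    """Merge runs of adjacent proper nouns into one lexical noun."""
    grouped = []
    for word, tag in tokens:
        if tag == "PROPN" and grouped and grouped[-1][1] == "PROPN":
            grouped[-1] = (grouped[-1][0] + " " + word, "PROPN")
        else:
            grouped.append((word, tag))
    return grouped


def apply_filters(tokens):
    """Drop stop words, adjectives, modal verbs, and adverbs from a tagged sentence."""
    kept = []
    for word, tag in group_proper_nouns(tokens):
        if word.lower() in STOP_WORDS or tag in REMOVED_TAGS:
            continue
        kept.append((word, tag))
    return kept
```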
14. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming a natural language query into predicate structures comprises the steps of:
removing words that serve as placeholders in English-language;
removing lexemes representing adjective concepts;
grouping proper nouns into single lexical nouns;
removing modal verbs;
removing lexemes containing adverb concepts; and
removing modal verbs from said query.
15. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming said natural language query comprises the steps of:
transforming said natural language query into multiple sequences of part-of-speech-tagged ontological concepts from said ontology;
reducing the number of said multiple sequences based on rules relating to sequences of syntactic tags;
creating syntactic tree structures, based on said syntactic tags, representing grammatical relations between said ontological concepts; and
reducing the number of said tree structures based on rules relating to improbable syntactic structures, and rules concerning conflicting ontological specifications.
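The first two steps of claim 15 (generating multiple tagged sequences, then pruning them with rules on tag sequences) might look like the following sketch. It tags plain words rather than ontological concepts, and the lexicon, the single pruning rule, and the function names are all hypothetical; tree construction and the post-parse pruning are not shown.

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to the syntactic tags it may carry.
LEXICON = {"time": ["NOUN", "VERB"], "flies": ["NOUN", "VERB"]}

# Hypothetical rule on tag sequences: two verbs in a row are rejected.
FORBIDDEN_TAG_PAIRS = {("VERB", "VERB")}


def tag_sequences(words):
    """Enumerate every combination of candidate tags for the words."""
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]


def post_lexer_filter(sequences):
    """Keep only sequences that contain no forbidden pair of adjacent tags."""
    kept = []
    for seq in sequences:
        tags = [tag for _, tag in seq]
        if not any(pair in FORBIDDEN_TAG_PAIRS for pair in zip(tags, tags[1:])):
            kept.append(seq)
    return kept
```

Pruning before parsing matters because the number of candidate sequences grows multiplicatively with each ambiguous word.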
16. A method of performing concept-based searching of text documents as recited in claim 15, further comprising the step of converting said tree structures into predicate structures.
17. A method of performing concept-based searching of text documents as recited in claim 5, wherein said step of transforming said text documents comprises the steps of:
transforming said documents into multiple sequences of part-of-speech-tagged ontological concepts from said ontology;
reducing the number of said multiple sequences based on rules relating to sequences of syntactic tags;
creating syntactic tree structures representing grammatical relations between said ontological concepts based on said syntactic tags; and
reducing the number of said tree structures based on rules relating to improbable syntactic structures, and rules concerning conflicting ontological specifications.
18. A method of performing concept-based searching of text documents as recited in claim 17, further comprising the step of converting said tree structures into predicate structures.
19. A method of performing concept-based searching of text documents as recited in claim 12, further comprising the step of using said ontology to develop said feature map to cluster said concept patterns.
20. An apparatus for use in an information retrieval system for retrieving information in response to a query, comprising:
a query ontological parser that transforms a natural language query into predicate structures;
an ontology providing information about words, said information comprising lexical semantic representation and syntactic types;
a document ontological parser that transforms documents into predicate structures;
a Bayes classifier probabilistically classifying said documents and said query;
adaptive filters for filtering said documents against said query to produce a set of said documents matching said query; and
a ranking module for ranking said set of matching documents.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 39, 41, 42, 43, 44, 45
a sentence lexer that transforms said natural language query into multiple sequences of part-of-speech-tagged ontological concepts from said ontology;
post-lexer filters that reduce the number of said multiple sequences produced by said sentence lexer, based on rules relating to sequences of syntactic tags;
a parser that creates syntactic tree structures representing grammatical relations between said ontological concepts based on said syntactic tags; and
post-parser filters that reduce the number of said parse trees based on rules relating to improbable syntactic structures, and rules concerning conflicting ontological specifications.
23. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 21, wherein said Bayes classifier comprises a learner that produces a set of trained document examples from data obtained from said data repository.
24. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 21, wherein said Bayes classifier comprises a reasoner that determines a probability that a classified document matches said query.
25. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said Bayes classifier comprises a reasoner that determines a probability that a classified document matches said query.
26. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 24, wherein said Bayes classifier is document-domain-specific so that words representative of a concept are used to determine if a particular document belongs to a specific domain, and said reasoner determines a probability that a pre-classified document belongs to said specific domain that said Bayes classifier is trained to classify.
27. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 26, further comprising an attribute extractor that collects all attributes occurring in said documents and sends said attributes to said reasoner to determine if said documents belong to said specified domain.
28. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 24, wherein said Bayes classifier is query-topic specific so that words that form said query are used to determine a topic of said query.
29. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 28, wherein said Bayes classifier further comprises a learner that produces a set of trained document examples from data obtained from said data repository.
30. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 29, wherein said reasoner classifies said topic based on said trained set of document examples and provides a list of possible topics ranked in order of probability of correctness.
31. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said document ontological parser comprises:
a sentence lexer that transforms said documents into multiple sequences of part-of-speech-tagged ontological concepts from said ontology;
post-lexer filters that reduce the number of said multiple sequences produced by said sentence lexer, based on rules relating to sequences of syntactic tags;
a parser that creates syntactic tree structures representing grammatical relations between said ontological concepts based on said syntactic tags; and
post-parser filters that reduce the number of said parse trees based on rules relating to improbable syntactic structures, and rules concerning conflicting ontological specifications.
32. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, further comprising a persistent agent maintaining at least one of said predicate structures extracted from said query,
wherein, upon failure to match said documents to said query, documents added to said data repository or newly located ones of said documents parsed by said document ontological parser are compared to said at least one predicate structure extracted from said query, and a notification is sent to a user upon a match.
33. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, further comprising a persistent agent maintaining at least one of said predicate structures extracted from said query,
wherein, upon failure to match said documents to said query, a determination is made whether said query is formulated in terms not previously included in said ontology, and if said determination is positive, said query terms are designated as new concepts and added to said ontology.
34. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said ranking module determines similarity between said query and each of said documents returned from said data repository.
35. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said ranking module determines similarity between said predicate structure of said query and each predicate structure of said documents returned from said data repository.
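The claim does not define the similarity measure used by the ranking module. As one hypothetical choice, the sketch below scores a document predicate structure against the query structure by combining predicate equality with the Jaccard overlap of the arguments, and ranks each document by its best-scoring structure. The equal weighting and the function names are assumptions; predicate structures are written as (predicate, arguments) tuples.

```python
def similarity(query, candidate):
    """Score two (predicate, arguments) structures between 0.0 and 1.0."""
    q_pred, q_args = query
    c_pred, c_args = candidate
    pred_score = 1.0 if q_pred == c_pred else 0.0
    union = set(q_args) | set(c_args)
    arg_score = len(set(q_args) & set(c_args)) / len(union) if union else 0.0
    return 0.5 * pred_score + 0.5 * arg_score  # equal weighting is an assumption


def rank_documents(query, doc_structures):
    """Order document ids by the best similarity any of their structures reaches."""
    scored = [
        (max(similarity(query, ps) for ps in structures), doc_id)
        for doc_id, structures in doc_structures.items()
        if structures
    ]
    return [doc_id for _, doc_id in sorted(scored, key=lambda item: item[0], reverse=True)]
```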
37. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said predicate structures for each of said documents forms at least one concept pattern vector for each of said documents.
39. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said query predicate structure and said document predicate structures comprise a predicate and an argument, said predicate is one of a verb and a preposition, and said argument is any part of speech.
41. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 20, wherein said adaptive filters comprise a feature map that clusters said matching documents according to concept patterns in said query and produces a cluster model representing a statistical probability distribution of said matching documents.
42. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 22, wherein said post-lexer filters comprise:
a stop word filter that removes words that serve as placeholders in English-language;
an adjective filter that removes lexemes representing adjective concepts;
a proper noun filter that groups proper nouns into single lexical nouns;
a modal verb filter that removes modal verbs;
an adverb filter that removes lexemes containing adverb concepts; and
a pseudo-predicate filter that removes verbs from said queries.
43. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 31, wherein said post-lexer filters comprise:
a stop word filter that removes words that serve as placeholders in English-language;
an adjective filter that removes lexemes representing adjective concepts;
a proper noun filter that groups proper nouns into single lexical nouns;
a modal verb filter that removes modal verbs; and
an adverb filter that removes lexemes containing adverb concepts.
44. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 22, wherein said parser comprises a parse tree converter for converting parse trees into predicate structures.
45. An apparatus for use in an information retrieval system for retrieving information in response to a query as recited in claim 31, wherein said parser comprises a parse tree converter for converting parse trees into predicate structures.
36. An apparatus for use in an information retrieval system for retrieving information in response to a query comprising:
a query ontological parser that transforms a natural language query into predicate structures;
an ontology providing information about words, said information comprising syntactic uses and definitions;
a document ontological parser that transforms documents into predicate structures;
a Bayes classifier probabilistically classifying said documents and said query;
adaptive filters for filtering said predicate structures of said documents against said predicate structures of said query to group said documents according to similarity of concept patterns contained in said documents relative to said query or additional feedback; and
a ranking module for ranking said set of matching predicate structures.
Dependent claims: 38, 40
Specification