Method and apparatus for automatic construction of faceted terminological feedback for document retrieval
Abstract
Iterative information retrieval from a large database of textual or text-containing documents is facilitated by automatic construction of faceted representations. Facets are chosen heuristically based on lexical dispersion, a measure of the number of different words with which a particular search expression co-occurs within a given type of lexical construct (e.g., a noun phrase) appearing in the document set. Words having high dispersion rates represent “facets” that may be used to organize the documents conceptually in accordance with the search expression, effectively providing a concise, structured summary of the contents of a result set as well as presenting a set of candidate terms for query reformulation.
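The dispersion measure can be illustrated with a minimal sketch. It assumes the lexical constructs have already been extracted as plain strings, and it treats two constructs as textually identical after lowercasing and whitespace normalization; that normalization, like the function names, is an assumption made for illustration and is not stated in the source.

```python
from collections import defaultdict

def dispersion_rates(constructs):
    """Map each word to the number of textually distinct constructs containing it."""
    containing = defaultdict(set)
    for phrase in constructs:
        # Assumption: case and spacing differences do not make a construct distinct.
        normalized = " ".join(phrase.lower().split())
        for word in normalized.split():
            containing[word].add(normalized)
    return {word: len(phrases) for word, phrases in containing.items()}

def rank_facets(constructs):
    """Rank words by descending dispersion rate; ties are broken alphabetically."""
    rates = dispersion_rates(constructs)
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))

phrases = [
    "solar energy", "wind energy", "energy policy",
    "solar panel", "Solar Energy",  # same construct as "solar energy" after case folding
]
ranked = rank_facets(phrases)
print(ranked[0])  # the word with the highest dispersion rate
```

Here "energy" appears in three distinct constructs and "solar" in two, so "energy" would be offered first as a candidate facet.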
136 Citations
21 Claims
-
1. A method of selecting and organizing documents from a document corpus in response to a user-provided search expression, the method comprising the steps of:
-
a. locating, within the document corpus, documents matching the search expression;
b. identifying, within the located documents, instances of a lexical construct conforming to a selected syntactic pattern that includes two or more parts of speech;
c. assigning dispersion rates to words within the lexical constructs, each dispersion rate corresponding to the number of textually distinct lexical constructs containing the word;
d. ranking the words in accordance with their dispersion rates;
e. presenting a list of at least some of the words;
f. facilitating selection of a listed word;
g. appending the word to a base search expression to form a new search expression; and
h. facilitating access to documents from the corpus matching the new search expression.
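The refinement loop in the final steps of claim 1 (select a listed word, append it to a base search expression, search again) can be sketched as follows. The toy corpus, the `search` and `refine` helpers, and the conjunctive all-terms-must-match semantics are assumptions chosen for illustration; the claim does not specify how matching is performed.

```python
def matches(document, expression):
    """Assumed semantics: a document matches when it contains every query term."""
    words = set(document.lower().split())
    return all(term in words for term in expression.lower().split())

def search(corpus, expression):
    """Return the documents in the corpus that match the expression."""
    return [doc for doc in corpus if matches(doc, expression)]

def refine(base_expression, selected_word):
    """Append the selected word to the base expression to form a new expression."""
    return f"{base_expression} {selected_word}"

corpus = [
    "solar energy policy debate",
    "wind energy storage",
    "solar panel installation",
]

first_pass = search(corpus, "energy")        # initial result set
new_expression = refine("energy", "solar")   # user picks "solar" from the list
second_pass = search(corpus, new_expression)
print(new_expression, second_pass)
```

Appending a term narrows the result set under these assumed semantics; a disjunctive engine would behave differently.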
-
-
4. A method of selecting and organizing documents from a document corpus in response to a user-provided search expression, the method comprising the steps of:
-
a. locating, within the document corpus, documents matching the search expression;
b. identifying, within the located documents, instances of a lexical construct conforming to a selected syntactic pattern that includes two or more parts of speech;
c. assigning dispersion rates to words within the lexical constructs, each dispersion rate corresponding to the number of textually distinct lexical constructs containing the word;
d. ranking the words in accordance with their dispersion rates;
e. facilitating selection of at least one of the words;
f. for each selected word, presenting a sorted list of lexical constructs that appear in the located documents and contain the word; and
g. facilitating selection of a listed lexical construct.
-
-
5. The method of claim 4 further comprising the steps of:
-
d. appending the lexical construct to a base search expression to form a new search expression; and
e. facilitating access to documents from the corpus that match the new search expression.
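The per-word listing of lexical constructs, and the appending of a chosen construct to a base search expression, might look like the sketch below. The claims say only that the list is sorted; ordering by frequency and then alphabetically is an assumption, as is quoting the construct so that it is searched as a phrase. The helper names are hypothetical.

```python
from collections import Counter

def constructs_for_word(constructs, word):
    """List the distinct constructs containing the word, most frequent first."""
    counts = Counter(phrase for phrase in constructs if word in phrase.split())
    # Assumed sort order: descending frequency, then alphabetical.
    return sorted(counts, key=lambda phrase: (-counts[phrase], phrase))

def append_construct(base_expression, construct):
    """Assumption: the construct is appended as a quoted phrase."""
    return f'{base_expression} "{construct}"'

found = ["solar energy", "wind energy", "solar energy", "energy policy", "solar panel"]
listing = constructs_for_word(found, "energy")
new_expression = append_construct("renewable", listing[0])
print(listing, new_expression)
```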
-
-
6. The method of claim 4 further comprising the step of facilitating access to documents from the corpus that match the selected lexical construct.
-
7. The method of claim 4 wherein the base search expression is the user-provided query.
-
8. The method of claim 4 wherein the base search expression is a new user-provided query.
-
11. Text-searching apparatus comprising:
-
a. a digitally searchable corpus of documents;
b. an interface for receiving a search expression;
c. a search module, responsive to the search expression, for locating documents in the corpus matching the search expression; and
d. a control module configured to:
i. identify, within the located documents, instances of a lexical construct conforming to a selected syntactic pattern that includes two or more parts of speech;
ii. assign dispersion rates to words within the lexical constructs, each dispersion rate corresponding to the number of textually distinct lexical constructs containing the word; and
iii. rank the words in accordance with their dispersion rates.
-
-
12. The apparatus of claim 11 wherein the interface comprises:
-
a. a web server;
b. means for generating web pages for transmission to remote users via the server; and
c. means for receiving, via the server, the search expression remotely entered into a generated web page.
-
-
13. The apparatus of claim 11 wherein the interface is configured to:
-
a. present a list of at least some of the words; and
b. facilitate selection of a listed word and append the selected word to a base search expression to form a new search expression, the search module receiving the new search expression and, in response thereto, locating documents from the corpus matching the new search expression.
-
-
14. The apparatus of claim 13 wherein the initial search expression is the user-provided expression.
-
15. The apparatus of claim 13 wherein the initial search expression is a new user-provided expression, the interface being configured to receive the new user-provided expression.
-
16. The apparatus of claim 11 wherein the control module is further configured to:
-
a. facilitate selection of at least one of the words; and
b. present, for each selected word, a sorted list of lexical constructs that appear in the located documents and contain the words, the interface being configured to present the list and facilitate selection of a lexical construct from the list.
-
-
17. The apparatus of claim 16 wherein the interface, in response to selection of a lexical construct, is further configured to append the lexical construct to an initial search expression to form a new search expression, the search module being responsive to the new search expression and locating documents from the document database matching the new search expression.
-
18. The apparatus of claim 17 wherein the initial search expression is the user-provided expression.
-
19. The apparatus of claim 17 wherein the initial search expression is a new user-provided expression, the interface being configured to receive the new user-provided expression.
-
20. The apparatus of claim 11 wherein the syntactic pattern is ?<adjective> <noun>+.
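Reading the pattern of claim 20 as an optional adjective followed by one or more nouns, construct extraction over part-of-speech-tagged text can be sketched as a simple scanner. The `(word, tag)` input format and the tag names `ADJ` and `NOUN` are assumptions for illustration; the source does not name a tagger or tag set.

```python
def extract_constructs(tagged_tokens):
    """Collect runs that match: optional adjective, then one or more nouns."""
    constructs = []
    i, n = 0, len(tagged_tokens)
    while i < n:
        start = i
        if tagged_tokens[i][1] == "ADJ":   # optional leading adjective
            i += 1
        noun_start = i
        while i < n and tagged_tokens[i][1] == "NOUN":  # one or more nouns
            i += 1
        if i > noun_start:
            constructs.append(" ".join(word for word, _ in tagged_tokens[start:i]))
        elif i == start:
            i += 1  # token starts no match; move on
        # An adjective with no following noun is skipped without emitting anything.
    return constructs

sentence = [
    ("the", "DET"), ("renewable", "ADJ"), ("energy", "NOUN"), ("policy", "NOUN"),
    ("is", "VERB"), ("new", "ADJ"), ("and", "CONJ"), ("wind", "NOUN"), ("power", "NOUN"),
]
found = extract_constructs(sentence)
print(found)
```

The adjective "new" yields no construct because no noun follows it directly, while the bare noun run "wind power" still matches since the adjective is optional.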
-
21. The apparatus of claim 11 wherein the control module is further configured to remove lexical constructs matching a noiseword filter.
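One possible reading of the noiseword filter of claim 21 is a stoplist applied to extracted constructs before dispersion rates are assigned. The word list below is hypothetical, and dropping a construct when any one of its words is a noise word is an assumption; the claim does not define the filter.

```python
# Hypothetical noise words; a real filter would be tuned to the corpus.
NOISE_WORDS = frozenset({"thing", "way", "lot", "kind"})

def remove_noise(constructs, noise_words=NOISE_WORDS):
    """Keep only constructs in which no word appears in the noise-word list."""
    return [
        phrase for phrase in constructs
        if not any(word in noise_words for word in phrase.lower().split())
    ]

kept = remove_noise(["solar energy", "good way", "Kind words"])
print(kept)
```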
Specification