Multiple engine information retrieval and visualization system
Abstract
An information retrieval and visualization system utilizes multiple search engines for retrieving documents from a document database based upon user input queries. Search engines include an n-gram search engine and a vector space model search engine using a neural network training algorithm. Each search engine produces a common mathematical representation of each retrieved document. The retrieved documents are then combined and ranked. The mathematical representation of each respective document is mapped onto a display. Information displayed includes a three-dimensional display of keywords from the user input query. The three-dimensional visualization capability, based upon the mathematical representation of information within the information retrieval and visualization system, gives users an intuitive understanding of the results, so that relevance feedback and query refinement techniques can be better utilized, resulting in higher retrieval accuracy (precision).
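The combining and ranking step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the n-gram engine is approximated by cosine similarity over character trigram counts, the VSM engine by cosine similarity over raw term counts (no neural network training), and the two scores are merged with hypothetical weights `w_ngram` and `w_vsm`. The sample documents and all function names are likewise invented.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of the whitespace-normalized, lowercased text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def term_counts(text):
    """Count whole words; a stand-in for a vector space model representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_ranking(query, docs, w_ngram=0.5, w_vsm=0.5):
    """Score each document with both stand-in engines, then merge and rank."""
    q_ngrams, q_terms = char_ngrams(query), term_counts(query)
    scored = []
    for doc_id, text in docs.items():
        ngram_score = cosine(q_ngrams, char_ngrams(text))
        vsm_score = cosine(q_terms, term_counts(text))
        scored.append((doc_id, w_ngram * ngram_score + w_vsm * vsm_score))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document database.
docs = {
    "d1": "neural network training for document retrieval",
    "d2": "three dimensional display of keywords",
    "d3": "network hardware installation guide",
}
ranking = combined_ranking("neural network training", docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Because both stand-in engines return a score on the same 0 to 1 scale, a weighted sum is enough to merge them here; a real system would need whatever normalization makes its engines' outputs comparable.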
447 Citations
40 Claims
1. An information retrieval system for selectively retrieving documents from a document database, the system comprising:
an input interface for accepting at least one user search query;
an n-gram search engine for retrieving documents from the document database based upon the at least one user search query, said n-gram search engine producing a common mathematical representation of each retrieved document;
a vector space model (VSM) search engine for retrieving documents from the document database based upon the at least one user search query, said VSM search engine producing a common mathematical representation of each retrieved document;
a display; and
visualization display means for mapping respective mathematical representations of the retrieved documents onto said display.
21. A method for selectively retrieving documents from a document database using an information retrieval system comprising an n-gram search engine and a vector space model (VSM) search engine, the method comprising:
generating at least one user search query;
retrieving documents from the document database using the n-gram search engine based upon the at least one user search query, the n-gram search engine producing a common mathematical representation of each retrieved document;
retrieving documents from the document database using the VSM search engine based upon the at least one user search query, the VSM search engine producing a common mathematical representation of each retrieved document; and
mapping respective mathematical representations of the retrieved documents onto a display.
producing a document context vector representation of each retrieved document; and
producing an axis context vector representation of each retrieved document.
23. A method according to claim 22, wherein the mapping comprises mapping the axis and the document context vector representations of the retrieved documents onto the display.
24. A method according to claim 22, wherein the document context vector is based upon a sum of all words in a document after reducing low content words.
25. A method according to claim 22, wherein the axis context vector is based upon a sum of all words in each axis after reducing low content words.
26. A method according to claim 21, wherein the mapping further comprises displaying keywords from the at least one user search query onto a three-dimensional display.
27. A method according to claim 26, wherein the mapping comprises displaying retrieved documents in clusters surrounding respective user search queries.
28. A method according to claim 27, wherein the displaying comprises displaying retrieved documents in a color different from a color used for displaying the keywords from the at least one user search query.
29. A method according to claim 27, wherein the displaying comprises displaying different aspects of the retrieved documents on the three-dimensional display.
30. A method according to claim 21, further comprising providing a list of retrieved documents, and wherein each retrieved document has an assigned score indicating relevance to the search query with respect to the other retrieved documents.
31. A method according to claim 21, further comprising:
receiving an input for selecting a retrieved document mapped onto the display; and
displaying the text of the selected document on the display.
32. A method according to claim 21, further comprising combining and ranking retrieved documents from the n-gram search engine and the VSM search engine.
33. A method according to claim 21, wherein the n-gram search engine comprises n-gram training means for least frequency training of training documents.
34. A method according to claim 21, wherein the VSM search engine comprises VSM training means for processing training documents.
35. A method according to claim 21, further comprising providing relevance feedback means for accepting relevance feedback from a user.
36. A method according to claim 35, wherein providing relevance feedback means comprises selecting one or more retrieved documents as a next search query.
37. A method according to claim 21, further comprising assigning a weighting percentage to each search engine.
38. A method according to claim 21, wherein the generating comprises generating at least one keyword.
39. A method according to claim 21, wherein the generating comprises generating at least one document.
40. A method according to claim 21, wherein the generating comprises generating at least one document cluster.
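The context vector claims (document and axis context vectors summed over words after reducing low content words, then mapped onto a three-dimensional display) can be illustrated with a rough sketch. All specifics here are assumptions, not the patented method: word vectors are deterministic pseudo-random stand-ins rather than vectors learned by neural network training, `STOP_WORDS` is a hypothetical low-content word list, the summed vectors are normalized to unit length, and each display coordinate is taken as the dot product between the document context vector and one axis context vector.

```python
import math
import random

# Hypothetical low-content words to drop before summing.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "in", "to"}
DIM = 16  # Assumed dimensionality of the word vectors.


def word_vector(word, dim=DIM):
    """Deterministic pseudo-random vector per word (stand-in for trained vectors)."""
    rng = random.Random(word)  # Seeding with the word makes the vector repeatable.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def context_vector(text, dim=DIM):
    """Sum the word vectors of all non-stop words, then scale to unit length."""
    total = [0.0] * dim
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        for i, x in enumerate(word_vector(word, dim)):
            total[i] += x
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def map_to_3d(document, axes):
    """Project a document onto three axes, each defined by its own keywords."""
    doc_vec = context_vector(document)
    return tuple(
        sum(d * a for d, a in zip(doc_vec, context_vector(axis)))
        for axis in axes
    )


# Each axis is a group of query keywords (hypothetical example).
axes = ("neural network", "document retrieval", "display")
point = map_to_3d("retrieval of the document", axes)
print(point)
```

In this sketch a document whose content words match an axis's keywords lands near 1.0 on that axis, which is one simple way documents could cluster around the query keywords they resemble.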
Specification