Information retrieval system providing secondary content analysis on collections of information objects

US 6,581,056 B1
Filed: 06/27/1996
Issued: 06/17/2003
Est. Priority Date: 06/27/1996
Status: Expired due to Fees

5 Assignments

0 Petitions

Abstract

An information retrieval system having a secondary content analysis engine for use on collections of documents. Such collections of documents arise dynamically as the result of queries to one or more, possibly distal, information sources. The secondary content analysis engine resides on an Information Access client computer system and allows the user to: 1) iteratively refine queries in more powerful ways than are typically supported by relevance feedback or other query modification methods, 2) browse a medium-sized collection of documents (on the order of 1000 items) more effectively than is traditionally possible, or 3) obtain more information for increasing user understanding of the collection.

20 Claims

1. In an information retrieval system for accessing information from one or more Information Sources, an improved user workspace for interfacing into said information retrieval system for obtaining information responsive to a query directed to a plurality of Information Sources, said improved user workspace having a display, interface means for interfacing with said plurality of Information Sources, and query input means, said improvement comprising:
- receiving means for receiving documents from said plurality of Information Sources;
- storage means coupled to said receiving means, said storage means for storing a collection of documents;
- a secondary content analysis engine for analyzing said collection of documents and generating statistical information relating to the content of said collection of documents; and
- one or more functional means coupled to said secondary content analysis engine, each of said one or more functional means for generating information describing said collection that may be used for refining a query or understanding said collection.
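
Claim 1 pairs one secondary content analysis engine with several pluggable "functional means". The following is a minimal Python sketch of that arrangement. It assumes the engine computes shared statistics once and each tool reads them. The names `ContentAnalysisEngine`, `CollectionStats`, `FunctionalMeans` and `CollectionSummarizer` are hypothetical. The summarizer's output (collection size plus the most widespread terms) is an illustrative choice, not the patent's method.

```python
from collections import Counter
from typing import Dict, List, Protocol
import re


class CollectionStats:
    """Statistics the engine computes once per collection (hypothetical structure)."""

    def __init__(self, docs: Dict[str, str]) -> None:
        # Term counts for each document, using lowercase alphabetic tokens (an assumption).
        self.doc_terms: Dict[str, Counter] = {
            doc_id: Counter(re.findall(r"[a-z]+", text.lower()))
            for doc_id, text in docs.items()
        }
        # Document frequency: how many documents contain each term.
        self.doc_freq: Counter = Counter()
        for counts in self.doc_terms.values():
            self.doc_freq.update(counts.keys())


class FunctionalMeans(Protocol):
    """Any tool that turns shared collection statistics into a description."""

    def describe(self, stats: CollectionStats) -> Dict[str, object]:
        ...


class CollectionSummarizer:
    """Illustrative tool: collection size plus the terms found in the most documents."""

    def __init__(self, top_n: int = 3) -> None:
        self.top_n = top_n

    def describe(self, stats: CollectionStats) -> Dict[str, object]:
        top = [term for term, _ in stats.doc_freq.most_common(self.top_n)]
        return {"size": len(stats.doc_terms), "top_terms": top}


class ContentAnalysisEngine:
    """Analyzes a collection once, then hands the statistics to every registered tool."""

    def __init__(self, tools: List[FunctionalMeans]) -> None:
        self.tools = tools

    def analyze(self, docs: Dict[str, str]) -> List[Dict[str, object]]:
        stats = CollectionStats(docs)
        return [tool.describe(stats) for tool in self.tools]
```

Adding another tool in this sketch, such as a similarity search browser, means writing one more class with a `describe` method. The statistics are still computed only once per collection.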

2. The improved user workspace as recited in claim 1 wherein one of said one or more functional means is a scatter/gather browser.

3. The improved user workspace as recited in claim 1 wherein one of said one or more functional means is a snippet search browser.

4. The improved user workspace as recited in claim 1 wherein one of said one or more functional means is a relative relevance analyzer.

5. The improved user workspace as recited in claim 1 wherein one of said one or more functional means is a similarity search browser.

6. The improved user workspace as recited in claim 1 wherein one of said one or more functional means is a collection summarizer.

7. The improved user workspace as recited in claim 6 wherein one of said one or more functional means provides a visualization of said collection summarizer.

8. The improved user workspace as recited in claim 1 wherein said collection of documents is ephemeral.

9. The improved user workspace as recited in claim 8 wherein each document in said collection of documents is comprised primarily of text and said secondary content analysis engine is further comprised of:

10. The improved user workspace as recited in claim 9 wherein said statistics collector is further comprised of a first means for collecting statistics at a document level and a second means for collecting statistics at a collection level.
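
Claim 2 names a scatter/gather browser, but the claim does not fix a clustering algorithm. This sketch therefore substitutes a small k-means over word-count vectors, using cosine similarity and seeding with the first k documents. That substitution is an assumption. `scatter` and `gather` are hypothetical names. Scatter splits the collection into groups, and gather keeps the union of the groups a user picks.

```python
from collections import Counter
from typing import Dict, List
import math
import re


def vectorize(text: str) -> Counter:
    """Word-count vector for one document (lowercase alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scatter(docs: List[str], k: int, rounds: int = 5) -> List[List[int]]:
    """Scatter: split docs into at most k groups of indices with a small k-means."""
    vecs = [vectorize(d) for d in docs]
    k = min(k, len(docs))
    centroids: List[Dict[str, float]] = [dict(v) for v in vecs[:k]]  # seeds: first k docs
    groups: List[List[int]] = [[] for _ in range(k)]
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for i, v in enumerate(vecs):
            # Assign each document to the centroid it is most similar to.
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            groups[best].append(i)
        for c, members in enumerate(groups):
            if members:
                # Summed member vectors point the same way as the mean, which is all cosine needs.
                total: Counter = Counter()
                for i in members:
                    total.update(vecs[i])
                centroids[c] = dict(total)
    return groups


def gather(docs: List[str], groups: List[List[int]], chosen: List[int]) -> List[str]:
    """Gather: the union of the groups the user selected becomes the new collection."""
    return [docs[i] for c in chosen for i in groups[c]]
```

Calling `scatter` again on the gathered documents re-clusters the smaller set. That is the browse-by-narrowing loop a scatter/gather interface offers.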

11. A method for querying a plurality of Information Sources comprising the steps of:
- a) a user generating a query from a client workspace;
- b) processing said query into a plurality of sub-queries, each of said plurality of sub-queries directed to a corresponding one of said plurality of Information Sources;
- c) receiving results of each of said sub-queries at said client workspace, each instance of said results of said sub-queries comprising one or more documents;
- d) generating a document collection from said results of each of said sub-queries;
- e) analyzing said document collection to create statistical content information for said document collection at said client workspace;
- f) creating query refinement information using said statistical content information;
- g) said user refining a query based on said query refinement information;
- h) said user repeating steps f)-g) until said refined query yields a result acceptable to said user.
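
Steps b) through h) of claim 11 can be sketched as a single loop. This minimal illustration rests on several assumptions:
- Sources are plain Python callables.
- Every sub-query reuses the user's query text unchanged.
- The "statistical content information" is just document frequency.
- The search is re-run for each refined query.

`run_session`, `Source`, `Refiner` and `terms` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional
import re

# Hypothetical Information Source: takes a sub-query string, returns matching document texts.
Source = Callable[[str], List[str]]
# Hypothetical user callback: sees the query and suggested terms, returns a refined
# query, or None to accept the current result.
Refiner = Callable[[str, List[str]], Optional[str]]


def terms(text: str) -> List[str]:
    """Lowercase alphabetic tokens, de-duplicated in first-seen order."""
    return list(dict.fromkeys(re.findall(r"[a-z]+", text.lower())))


def run_session(query: str, sources: Dict[str, Source], refine: Refiner,
                max_rounds: int = 5) -> List[str]:
    collection: List[str] = []
    for _ in range(max_rounds):
        # b)-d): one sub-query per source (the same text here, an assumption);
        # the results are merged into a single document collection.
        collection = [doc for source in sources.values() for doc in source(query)]
        # e): statistical content information, here the document frequency of each term.
        doc_freq = Counter(t for doc in collection for t in terms(doc))
        # f): refinement information, the most common terms not already in the query.
        used = set(terms(query))
        suggestions = [t for t, _ in doc_freq.most_common() if t not in used]
        # g)-h): the user refines the query, or stops when the result is acceptable.
        refined = refine(query, suggestions)
        if refined is None:
            break
        query = refined
    return collection
```

The `max_rounds` cap is a safeguard added for the sketch, so a user callback that never accepts cannot loop forever.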

12. The method as recited in claim 11 wherein each document in said document collection includes a text portion and said step of analyzing said document collection to create statistical content information for said document collection at said client workspace is further comprised of the step of:
13. The method as recited in claim 12 wherein said step of creating query refinement information using said statistical content information is further comprised of the step of generating a scatter representation of said document collection into related groups, and said step of said user refining a query based on said query refinement information is comprised of said user selecting one or more of said related groups.
14. The method as recited in claim 12 wherein said step of said user refining a query based on said query refinement information is further comprised of the step of said user selecting relevant documents.
15. The method as recited in claim 14 wherein in subsequent iterations of said step f) creating query refinement information using said statistical content information is comprised of the step of said system generating a new query based on said selected relevant documents.
16. The method as recited in claim 12 wherein said step of creating query refinement information using said statistical content information is further comprised of the steps of:
- mapping each document to an n-dimensional vector space based on word occurrence;
- identifying similar documents based on their proximity in said n-dimensional vector space; and
- providing said identified similar documents as query refinement information.
17. The method as recited in claim 12 wherein said step of generating statistics from said remaining set of tokens is further comprised of the steps of:
- generating a first set of statistics at the document level; and
- generating a second set of statistics at the collection level.
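
Claim 16 maps documents into an n-dimensional space based on word occurrence and treats nearby documents as similar. The following minimal sketch assumes raw word counts as coordinates, so n is the vocabulary size. It also assumes cosine similarity as the proximity measure. Euclidean distance or weighted terms would fit the claim's wording equally well. The function names are hypothetical.

```python
from typing import List, Tuple
import math
import re


def to_vectors(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each document to an n-dimensional word-occurrence vector (n = vocabulary size)."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0] * len(vocab)
        for t in toks:
            v[index[t]] += 1  # one coordinate per vocabulary word
        vectors.append(v)
    return vocab, vectors


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine of the angle between two dense vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(docs: List[str], target: int, k: int = 2) -> List[int]:
    """Indices of the k documents closest to docs[target], most similar first."""
    _, vectors = to_vectors(docs)
    scores = [(cosine(vectors[target], v), i) for i, v in enumerate(vectors) if i != target]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties by index
    return [i for _, i in scores[:k]]
```

The documents returned by `most_similar` would serve as the "identified similar documents" offered as query refinement information.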

18. An information retrieval system for retrieving information from a plurality of Information Sources, said plurality of Information Sources accessible through a network, said information retrieval system comprising:
- an information access client for performing queries requesting information from said one or more Information Sources, said information access client comprising a statistical content analysis engine for analyzing a collection of documents resulting from a query and generating statistical information reflecting said collection of documents, each document in said collection of documents having a text part, and one or more query refinement means coupled to said content analysis engine, each of said one or more query refinement means for generating information that may be used for refining a query;
- an intermediary server coupled to each of said one or more Information Sources and coupled to said information access client, said intermediary server for mediating queries between said information access client and said one or more Information Sources, said intermediary server comprising:
  - network access means for accessing information servers on said network;
  - means for translating an access request of said user workspace to particular protocols utilized by said each of said one or more Information Sources; and
  - means for receiving the requested information from said access request and merging said received requested information into said collection of documents.
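
The intermediary server of claim 18 translates one access request into each source's own protocol and merges what comes back. The sketch below models that with a hypothetical `SourceAdapter` (a query-translation function plus a search function) and a hypothetical `Mediator`. Two details are assumptions, not claim text: the query syntaxes used for each source, and de-duplication by document id.

```python
from typing import Callable, Dict, List, Tuple

Doc = Tuple[str, str]  # (document id, text)


class SourceAdapter:
    """Hypothetical adapter for one Information Source."""

    def __init__(self, name: str,
                 translate: Callable[[List[str]], str],
                 search: Callable[[str], List[Doc]]) -> None:
        self.name = name
        self.translate = translate  # generic query terms -> source-specific query string
        self.search = search        # source-specific query string -> matching documents


class Mediator:
    """Sketch of the intermediary server: translate, forward, merge."""

    def __init__(self, adapters: List[SourceAdapter]) -> None:
        self.adapters = adapters

    def query(self, terms: List[str]) -> List[Doc]:
        merged: Dict[str, Doc] = {}
        for adapter in self.adapters:
            native_query = adapter.translate(terms)
            for doc_id, text in adapter.search(native_query):
                # Keep the first copy when several sources return the same document id.
                merged.setdefault(doc_id, (doc_id, text))
        # Dicts preserve insertion order, so results stay in source order.
        return list(merged.values())
```

For example, an adapter whose `translate` is `lambda ts: " AND ".join(ts)` would send `solar AND cost` to a Boolean-style source. An adapter whose `translate` is `"+".join` would send `solar+cost` to a keyword-style one.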

19. The system as recited in claim 18 wherein said statistical content analysis engine is further comprised of:

20. The system as recited in claim 19 wherein said statistics collector is further comprised of a first means for collecting statistics at a document level and a second means for collecting statistics at a document collection level.
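
Claims 10, 17 and 20 split the statistics collector into a document-level part and a collection-level part. Claims 9, 12 and 19 are cut off above. Even so, claim 17's reference to a "remaining set of tokens" suggests that tokenizing and filtering come first. This sketch therefore assumes lowercase alphabetic tokens and a small hypothetical stop-word list. `collect`, `DocStats` and `CollectionStats` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Tuple
import re

# Hypothetical stop-word list; the claims shown here do not say how tokens are filtered.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


@dataclass
class DocStats:
    """Document-level statistics."""
    length: int           # tokens remaining after filtering
    term_freq: Counter    # occurrences of each remaining token in this document


@dataclass
class CollectionStats:
    """Collection-level statistics."""
    num_docs: int = 0
    doc_freq: Counter = field(default_factory=Counter)    # documents containing each term
    total_freq: Counter = field(default_factory=Counter)  # occurrences across the collection


def collect(docs: Dict[str, str]) -> Tuple[Dict[str, DocStats], CollectionStats]:
    per_doc: Dict[str, DocStats] = {}
    collection = CollectionStats()
    for doc_id, text in docs.items():
        # Tokenize, then drop stop words to get the remaining set of tokens.
        tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
        tf = Counter(tokens)
        per_doc[doc_id] = DocStats(length=len(tokens), term_freq=tf)
        collection.num_docs += 1
        collection.doc_freq.update(tf.keys())
        collection.total_freq.update(tf)
    return per_doc, collection
```

Both sets of statistics are produced in one pass over the collection. This suits a medium-sized, ephemeral collection that is rebuilt after every query.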

Current Assignee
Xerox Corporation (Xerox Holdings Corp.)
Original Assignee
Xerox Corporation (Xerox Holdings Corp.)
Inventors
Rao, Ramana B.
Primary Examiner(s)
RONES, CHARLES

Application Number
US08/670,546
Time in Patent Office
2,546 Days
Field of Search
707/5, 707/10, 707/6, 707/4, 707/2, 707/16, 395/346, 395/140, 395/770, 704/10, 704/9
US Class Current
1/1

CPC Class Codes

G06F 16/338   Presentation of query results

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99934   Query formulation, input pr...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99936   Pattern matching access
