Apparatus and method for discovering context groups and document categories by mining usage logs

US 6,502,091 B1
Filed: 02/23/2000
Issued: 12/31/2002
Est. Priority Date: 02/23/2000
Status: Expired due to Term

Abstract

An apparatus is provided for relating user queries and documents. The apparatus includes a client, a server, and a database being mutually coupled to a communications pathway. The client is configured to enable a user to submit user queries to locate documents. The server has a data mining mechanism configured to receive the user queries and generate information retrieval sessions. The database stores data in the form of usage logs generated from the information retrieval sessions. The data mining mechanism includes a clustering algorithm operative to identify context groups and usage categories. The data mining mechanism is operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs. A method is provided for associating user queries and documents in accordance with the apparatus.
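As an illustration of the usage-log input described in the abstract, the following is a minimal sketch of grouping log records into information retrieval sessions. The record layout (`session_id`, `query`, `opened_doc_id`) and the names `LogRecord` and `build_sessions` are assumptions made for this example; the patent text here does not specify a log format.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One hypothetical usage-log row: a document opened during a session."""
    session_id: str
    query: str
    opened_doc_id: str


def build_sessions(records):
    """Group records by session.

    Returns a dict mapping session_id to a pair
    (frozenset of lower-cased query keywords, set of opened document ids).
    The keywords are taken from the first record seen for the session.
    """
    sessions = {}
    for rec in records:
        _keywords, docs = sessions.setdefault(
            rec.session_id,
            (frozenset(rec.query.lower().split()), set()),
        )
        docs.add(rec.opened_doc_id)
    return sessions
```

Each session then pairs the query keywords with the set of documents the user actually opened, which is the kind of information the claims refer to when grouping queries by opened document ids.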

31 Claims

1. An apparatus for relating user queries and documents, comprising:
- a client configured to enable a user to submit user queries to locate documents;
  
  a server having a data mining mechanism configured to receive the user queries and generate information retrieval sessions;
  
  a communications pathway extending between the client and the server; and
  
  a database provided in communication with the client and the server, the database storing data in the form of usage logs generated from the information retrieval sessions generated by a user at the client;
  
  wherein the data mining mechanism includes a clustering algorithm identifying context groups and usage categories, and operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs.
  - 2. The apparatus of claim 1 wherein each query context comprises a set of all queries that belong to a connected component of G, wherein G is an undirected query graph with a vertex for each query in a usage log.
  - 3. The apparatus of claim 1 wherein individual context groups comprise one or more query contexts, wherein queries are grouped solely based on corresponding sets of opened document ids, with a query comprising at least one keyword.
  - 4. The apparatus of claim 1 wherein the clustering algorithm clusters together query contexts to form the context groups.
  - 5. The apparatus of claim 1 wherein each context group is represented as a directed acyclic graph (DAG) comprising a multi-level context DAG having at least one query context node and at least one general context node.
  - 6. The apparatus of claim 5 wherein individual documents are attached to individual nodes of the multi-level context DAG.
  - 7. The apparatus of claim 6 wherein a relevant document set, rel-doc (Q), that is associated with a query context, Q, comprises a group of individual documents attached to a query context node.
  - 8. The apparatus of claim 1 wherein individual queries are grouped based on corresponding sets of opened document ids from the usage logs.
  - 9. The apparatus of claim 1 wherein each query is an individual user retrieval session.
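Claims 2, 7, and 8 can be illustrated with a small sketch that computes query contexts as connected components of an undirected query graph. The claims quoted here do not state when two queries share an edge; this example assumes an edge exists whenever two queries have at least one opened document id in common. The function names `query_contexts` and `rel_doc`, and the choice of taking rel-doc(Q) as the union of opened documents, are likewise assumptions for illustration.

```python
def query_contexts(opened_docs):
    """Return query contexts as connected components of the query graph.

    opened_docs maps each query to the set of document ids opened for it.
    Assumption: two queries are adjacent if their opened-document sets
    intersect. Components are found with a union-find structure.
    """
    parent = {q: q for q in opened_docs}

    def find(q):
        # Follow parent links to the component root, compressing the path.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    first_query_for_doc = {}
    for q, docs in opened_docs.items():
        for d in docs:
            if d in first_query_for_doc:
                # Sharing a document links q to the earlier query.
                parent[find(q)] = find(first_query_for_doc[d])
            else:
                first_query_for_doc[d] = q

    components = {}
    for q in opened_docs:
        components.setdefault(find(q), set()).add(q)
    return list(components.values())


def rel_doc(context, opened_docs):
    """Assumed rel-doc(Q): union of documents opened by queries in context Q."""
    return set().union(*(opened_docs[q] for q in context))
```

For example, with `{"q1": {"d1", "d2"}, "q2": {"d2", "d3"}, "q3": {"d9"}}`, queries `q1` and `q2` fall into one context because both opened `d2`, while `q3` forms a context of its own.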

10. A method for relating user queries and documents using usage logs from retrieval sessions from a text retrieval system, comprising:
- identifying contexts associated with a user query comprising at least one specific query keyword;
  
  identifying user queries having similar query contexts;
  
  partitioning user queries into groups based upon similarity of the query contexts;
  
  merging the groups to compute multiple contexts associated with specific query keywords; and
  
  applying a clustering algorithm to identify similar query contexts based upon the query keywords to generate context groups that associate keywords with documents accessed by users.
  - 11. The method of claim 10 wherein each query context is identified as a vector.
  - 12. The method of claim 11 wherein the clustering algorithm comprises a self-organizing map (SOM) clustering algorithm configured to order the vectors onto a map such that similar vectors lie proximate each other.
  - 13. The method of claim 11 wherein the map comprises a regular grid of units, with each unit being assigned a model vector, wherein each vector is mapped to the unit whose model vector best represents the input vector.
  - 14. The method of claim 10 wherein the step of partitioning comprises representing each context group as a directed acyclic graph, with each context group containing query contexts.
  - 15. The method of claim 14 wherein the acyclic graph comprises a multi-level context directed acyclic graph (DAG).
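The self-organizing map described in claims 12 and 13 can be sketched as follows. This is a toy illustration, not the patented implementation: the 2x2 grid, the Euclidean distance, the Gaussian neighbourhood, and the names `best_matching_unit` and `train_step` are all assumptions, and the claims do not say how query-context vectors are constructed.

```python
import math


def best_matching_unit(grid, vector):
    """Return (row, col) of the unit whose model vector is closest to vector.

    grid is a list of rows; each cell holds a model vector (list of floats).
    Closeness is Euclidean distance, an assumption for this sketch.
    """
    return min(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
        key=lambda rc: math.dist(grid[rc[0]][rc[1]], vector),
    )


def train_step(grid, vector, rate=0.5, radius=1.0):
    """Pull the best-matching unit and its grid neighbours toward vector.

    Units within `radius` grid steps are updated, weighted by a Gaussian
    of their grid distance to the best-matching unit. Updates in place.
    """
    br, bc = best_matching_unit(grid, vector)
    for r, row in enumerate(grid):
        for c, model in enumerate(row):
            grid_dist = math.hypot(r - br, c - bc)
            if grid_dist <= radius:
                influence = rate * math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                row[c] = [m + influence * (x - m) for m, x in zip(model, vector)]
    return br, bc
```

Repeating `train_step` over all input vectors gradually orders the map so that similar vectors land on the same or neighbouring units; each vector's final unit is given by `best_matching_unit`.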

16. A method for associating user queries in the form of keywords and documents accessed from usage logs in response to submission of the user queries during user retrieval sessions from a text retrieval system, comprising:
- identifying contexts associated with a user query comprising at least one specific query keyword;
  
  identifying similar query contexts from individual user queries;
  
  partitioning the user queries into groups based upon the identified similar query contexts;
  
  associating the groups to identify at least one query context associated with each specific query keyword;
  
  clustering similar query contexts based upon the query keywords to generate context groups that associate the keywords with documents accessed by users; and
  
  graphically depicting the contexts associated with documents from most general to most specific.
  - 17. The method of claim 16 wherein multi-level context directed acyclic graphs (DAGs) are used to display to a user the contexts associated with individual documents.
  - 18. The method of claim 16 wherein the step of graphically depicting the contexts comprises displaying a multi-level context directed acyclic graph (DAG) to a user via a graphical user interface.
  - 19. The method of claim 18 wherein the displayed DAG comprises query context nodes and general context nodes.
  - 20. The method of claim 19 wherein the query context nodes identify a document set comprising a relevant document set, rel-doc (Q), associated with a query context, Q, where the query context is a set of all the queries that belong to a connected component of a query graph, G.
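The depiction step in claims 16 through 20 can be sketched as a text rendering of a multi-level context DAG, ordered from most general to most specific. The edge representation (general node mapped to its more specific children), the indentation-based output, and the name `depict` are assumptions for illustration; the claims call for a graphical user interface, which this sketch does not attempt to reproduce.

```python
def depict(edges, docs):
    """Render a context DAG as indented lines, general contexts first.

    edges maps a node to the list of its more specific child nodes.
    docs maps a node to the set of document ids attached to it.
    A node's level is the length of the longest path from any root,
    so a node never appears above one of its parents.
    """
    nodes = set(edges) | {c for children in edges.values() for c in children}
    parents = {n: [] for n in nodes}
    for parent, children in edges.items():
        for child in children:
            parents[child].append(parent)

    depth = {}

    def level(n):
        # Roots have no parents and get level 0.
        if n not in depth:
            depth[n] = 1 + max((level(p) for p in parents[n]), default=-1)
        return depth[n]

    lines = []
    for n in sorted(nodes, key=lambda n: (level(n), n)):
        attached = ", ".join(sorted(docs.get(n, [])))
        lines.append("  " * depth[n] + n + (f" [{attached}]" if attached else ""))
    return lines
```

Here a general context node such as a shared keyword sits at level 0, and query context nodes carrying their attached document sets appear indented beneath it.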

21. An apparatus for relating user queries and documents, comprising:
- a client configured to enable a user to submit user queries to locate documents;
  
  a server having a data mining mechanism configured to receive the user queries and generate information retrieval sessions;
  
  a communications pathway extending between the client and the server; and
  
  a database provided in communication with the client and the server, the database storing data in the form of usage logs generated from the information retrieval sessions generated by a user at the client, wherein the data mining mechanism includes a clustering algorithm identifying context groups and usage categories, and operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs, and wherein each query context comprises a set of all queries that belong to a connected component of G, wherein G is an undirected query graph with a vertex for each query in a usage log.

22. An apparatus for relating user queries and documents, comprising:
- a client configured to enable a user to submit user queries to locate documents;
  
  a server having a data mining mechanism configured to receive the user queries and generate information retrieval sessions;
  
  a communications pathway extending between the client and the server; and
  
  a database provided in communication with the client and the server, the database storing data in the form of usage logs generated from the information retrieval sessions generated by a user at the client, wherein the data mining mechanism includes a clustering algorithm identifying context groups and usage categories, and operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs, and wherein individual context groups comprise one or more query contexts, wherein queries are grouped solely based on corresponding sets of opened document IDs, with a query comprising at least one keyword.

23. An apparatus for relating user queries and documents, comprising:
- a client configured to enable a user to submit user queries to locate documents;
  
  a server having a data mining mechanism configured to receive the user queries and generate information retrieval sessions;
  
  a communications pathway extending between the client and the server; and
  
  a database provided in communication with the client and the server, the database storing data in the form of usage logs generated from the information retrieval sessions generated by a user at the client, wherein the data mining mechanism includes a clustering algorithm identifying context groups and usage categories, and operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs, and wherein each context group is represented as a directed acyclic graph (DAG) comprising a multi-level context DAG having at least one query context node and at least one general context node.
  - 24. The apparatus of claim 23 wherein individual documents are attached to individual nodes of the multi-level context DAG.
  - 25. The apparatus of claim 24 wherein a relevant document set, rel-doc (Q), that is associated with a query context, Q, comprises a group of individual documents attached to a query context node.

26. An apparatus for relating user queries and documents, comprising:
- a client configured to enable a user to submit user queries to locate documents;
  
  a server having a data mining mechanism configured to receive the user queries and generate information retrieval sessions;
  
  a communications pathway extending between the client and the server; and
  
  a database provided in communication with the client and the server, the database storing data in the form of usage logs generated from the information retrieval sessions generated by a user at the client, wherein the data mining mechanism includes a clustering algorithm identifying context groups and usage categories, and operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs, and wherein individual queries are grouped based on corresponding sets of opened document IDs from the usage logs.

27. A method for relating user queries and documents using usage logs from retrieval sessions from a text retrieval system, comprising:
- identifying contexts associated with a user query comprising at least one specific query keyword;
  
  identifying user queries having similar query contexts;
  
  partitioning user queries into groups based upon similarity of the query contexts;
  
  merging the groups to compute multiple contexts associated with specific query keywords; and
  
  applying a clustering algorithm to identify similar query contexts based upon the query keywords to generate context groups that associate keywords with documents accessed by users, wherein each query context is identified as a vector.
  - 28. The method of claim 27 wherein the clustering algorithm comprises a self-organizing map (SOM) clustering algorithm configured to order the vectors onto a map such that similar vectors lie proximate each other.
  - 29. The method of claim 27 wherein the map comprises a regular grid of units, with each unit being assigned a model vector, wherein each vector is mapped to the unit whose model vector best represents the input vector.

30. A method for relating user queries and documents using usage logs from retrieval sessions from a text retrieval system, comprising:
- identifying contexts associated with a user query comprising at least one specific query keyword;
  
  identifying user queries having similar query contexts;
  
  partitioning user queries into groups based upon similarity of the query contexts;
  
  merging the groups to compute multiple contexts associated with specific query keywords; and
  
  applying a clustering algorithm to identify similar query contexts based upon the query keywords to generate context groups that associate keywords with documents accessed by users, wherein the step of partitioning comprises representing each context group as a directed acyclic graph, with each context group containing query contexts.
  - 31. The method of claim 30 wherein the acyclic graph comprises a multi-level context directed acyclic graph (DAG).

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Hewlett-Packard Company (HP Inc.)
Inventors: Chundi, Parvathi; Hsu, Meichun; Dayal, Umeshwar
Primary Examiner(s): Robinson, Greta
Assistant Examiner(s): Black, Linh
Application Number: US09/511,195
Time in Patent Office: 1,042 Days
Field of Search: 707/7-10, 707/100, 707/104.1, 704/9, 705/4, 705/5, 395/749
US Class Current: 707/738
CPC Class Codes:
G06F 16/00   Information retrieval; Data...
G06F 16/951   Indexing; Web crawling tech...
Y10S 707/917   Text
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99934   Query formulation, input pr...
Y10S 707/99935   Query augmenting and refini...
