Method for Indexing for Retrieving Documents Using Particles
First Claim
1. A computer implemented method for indexing and retrieving documents in a database, comprising the steps of:
converting each document in a collection of documents to a document particle graph, the document graph including particles selected from a set of the particles;
extracting, for each document, a set of document keys from the corresponding particle graph;
storing the document keys for each document in an index to a database storing the collection of documents;
converting a query to a query particle graph including a set of query particles, the query graph including particles selected from the set of the particles;
extracting a set of query keys from the query particle graph;
retrieving relevant documents from the database according to the query keys and the document keys stored in the index; and
outputting the relevant documents to a user.
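The steps of the claim can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the claim: the toy particle set `PARTICLES`, the representation of a particle graph as a list of `(start, end, particle)` edges, the choice of single particles and adjacent particle pairs as keys, and the overlap-count ranking in `retrieve`.

```python
from collections import defaultdict

# Hypothetical toy particle set; the claim does not specify its contents.
PARTICLES = {"in", "dex", "index", "ing", "re", "trie", "val",
             "doc", "u", "ment", "s"}

def particle_graph(text, particles=PARTICLES):
    """Return graph edges (start, end, particle), one per particle occurrence.

    Edges never span a word boundary: each word gets its own offset range.
    """
    edges, offset = [], 0
    for word in text.lower().split():
        n = len(word)
        for i in range(n):
            for j in range(i + 1, n + 1):
                if word[i:j] in particles:
                    edges.append((offset + i, offset + j, word[i:j]))
        offset += n + 1  # gap of one position keeps words disconnected
    return edges

def extract_keys(edges):
    """Keys (an assumption): each particle, plus each pair of adjacent particles."""
    keys = {p for _, _, p in edges}
    by_start = defaultdict(list)
    for start, _, p in edges:
        by_start[start].append(p)
    for _, end, p in edges:
        for q in by_start[end]:
            keys.add(p + "+" + q)
    return keys

def build_index(docs):
    """Map each key to the set of document ids whose graph contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for key in extract_keys(particle_graph(text)):
            index[key].add(doc_id)
    return index

def retrieve(query, index):
    """Rank documents by the number of keys shared with the query."""
    scores = defaultdict(int)
    for key in extract_keys(particle_graph(query)):
        for doc_id in index.get(key, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because the query passes through the same `particle_graph` and `extract_keys` functions as the documents, a query word that never appears whole in the collection can still match documents that share some of its particles.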
Abstract
An information retrieval system stores and retrieves documents using particles and a particle-based language model. A set of particles for a collection of documents in a particular language is constructed from training documents such that a perplexity of the particle-based language model is substantially lower than the perplexity of a word-based language model constructed from the same training documents. The documents can then be converted to document particle graphs from which particle-based keys are extracted to form an index to the documents. Users can then retrieve relevant documents using queries also in the form of particle graphs.
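The perplexity comparison mentioned in the abstract can be made concrete with a small Python sketch. It rests on assumptions the abstract does not state: a maximum-likelihood unigram model evaluated on its own training tokens, perplexity normalized per token, and a hypothetical greedy longest-match segmenter `to_particles` standing in for whatever procedure actually produces the particles.

```python
import math
from collections import Counter

def unigram_perplexity(tokens):
    """Per-token perplexity of a maximum-likelihood unigram model,
    evaluated on the same tokens it was estimated from."""
    counts = Counter(tokens)
    total = len(tokens)
    log_prob = sum(c * math.log(c / total) for c in counts.values())
    return math.exp(-log_prob / total)

def to_particles(word, particles):
    """Greedy longest-match segmentation (an assumption, for illustration).

    Characters not covered by any particle become one-character particles.
    """
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in particles or j == i + 1:
                out.append(word[i:j])
                i = j
                break
    return out

def compare(text, particles):
    """Return (word-model perplexity, particle-model perplexity) for one text."""
    words = text.lower().split()
    parts = [p for w in words for p in to_particles(w, particles)]
    return unigram_perplexity(words), unigram_perplexity(parts)
```

Calling `compare(training_text, candidate_particles)` on the same training text yields both figures side by side, which is the kind of comparison a particle set would need to pass under the criterion in the abstract. The two values are normalized per token of different sizes, so this sketch shows only the mechanics of the comparison.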
26 Claims
-
1. A computer implemented method for indexing and retrieving documents in a database, comprising the steps of:
converting each document in a collection of documents to a document particle graph, the document graph including particles selected from a set of the particles; extracting, for each document, a set of document keys from the corresponding particle graph; storing the document keys for each document in an index to a database storing the collection of documents; converting a query to a query particle graph including a set of query particles, the query graph including particles selected from the set of the particles; extracting a set of query keys from the query particle graph; retrieving relevant documents from the database according to the query keys and the document keys stored in the index; and outputting the relevant documents to a user.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
-
24. An information retrieval system, comprising:
-
means for converting each document in a collection of documents to a document particle graph, the document graph including particles selected from a set of the particles; means for extracting, for each document, a set of document keys from the corresponding particle graph; means for storing the document keys for each document in an index to a database storing the collection of documents; means for converting a query to a query particle graph including a set of query particles, the query graph including particles selected from the set of the particles; means for extracting a set of query keys from the query particle graph; means for retrieving relevant documents from the database according to the query keys and the document keys stored in the index; and means for outputting the relevant documents to a user.
-
-
25. A computer implemented method for indexing and retrieving documents in a database, comprising the steps of:
-
constructing a particle set from training documents using a particle-based language model, in which a perplexity of the particle-based language model is substantially lower than the perplexity of a word-based language model constructed from the same training documents; converting each document in a collection of documents to a document particle graph, the document graph including particles selected from the set of the particles; extracting, for each document, a set of document keys from the corresponding particle graph to form an index to the document; and retrieving, by a user, relevant documents using queries in a form of a query particle graph and keys extracted from the query particle graph.
-
-
26. An information retrieval system, comprising:
-
a database storing a collection of documents; an index to the database, in which entries in the index are in the form of particles, in which the particles are selected from a set of particles constructed from training documents using a particle-based language model, and in which a perplexity of the particle-based language model is substantially lower than the perplexity of a word-based language model constructed from the same training documents; and means for accessing the documents by a user via the index using the particles.
-
Specification