Methods and systems for automatically generating semantic/concept searches
First Claim
1. A method comprising:
receiving, at one or more computer systems, a plurality of documents;
for each term in a set of terms associated with each document in the plurality of documents, generating a term vector for the term with one or more processors associated with the one or more computer systems based on a set of randomly indexed document vectors associated with documents in the plurality of documents in which the term appears, wherein each of the randomly indexed document vectors comprises a set of values, each of the set of values being a random value;
storing each generated term vector in association with its corresponding term in a storage device associated with the one or more computer systems;
generating, with the one or more processors associated with the one or more computer systems, a document vector for each document in the plurality of documents based on a set of term vectors associated with terms that appear in the document, wherein the set of term vectors is merged into each document vector based on the frequency of each term in each document;
storing each generated document vector in association with its corresponding document in the storage device associated with the one or more computer systems;
generating, with the one or more processors associated with the one or more computer systems, a query term vector for one or more query terms based on term vectors for terms that correspond to the one or more query terms;
generating, with the one or more processors associated with the one or more computer systems, a query based on a set of terms whose term vectors satisfy one or more conditions related to the query term vector; and
executing, with the one or more processors associated with the one or more computer systems, the query to obtain a set of documents in the plurality of documents that are relevant to a concept defined by the one or more query terms.
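The vector-generation steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the dimensionality `DIM`, the uniform distribution used for the random values, plain summation as the way a term vector is "based on" the randomly indexed document vectors, and the in-memory dictionaries standing in for the storage device are all assumptions made for illustration. The function names are hypothetical.

```python
import random
from collections import Counter

DIM = 64  # assumed dimensionality; the claim does not fix one


def random_index_vector(rng, dim=DIM):
    """A randomly indexed document vector: every value is a random value."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def add_scaled(target, vec, weight=1.0):
    """Accumulate weight * vec into target, in place."""
    for i, value in enumerate(vec):
        target[i] += weight * value


def build_semantic_space(docs, dim=DIM, seed=0):
    """docs maps a document id to the list of terms appearing in it."""
    rng = random.Random(seed)
    index_vectors = {doc_id: random_index_vector(rng, dim) for doc_id in docs}

    # Term vector: sum of the index vectors of the documents containing the term.
    term_vectors = {}
    for doc_id, terms in docs.items():
        for term in set(terms):
            vec = term_vectors.setdefault(term, [0.0] * dim)
            add_scaled(vec, index_vectors[doc_id])

    # Document vector: term vectors merged in, weighted by term frequency.
    doc_vectors = {}
    for doc_id, terms in docs.items():
        vec = [0.0] * dim
        for term, freq in Counter(terms).items():
            add_scaled(vec, term_vectors[term], weight=freq)
        doc_vectors[doc_id] = vec
    return term_vectors, doc_vectors


# Hypothetical three-document corpus.
docs = {
    "d1": ["contract", "breach", "damages", "contract"],
    "d2": ["contract", "agreement", "breach"],
    "d3": ["football", "score", "league"],
}
term_vectors, doc_vectors = build_semantic_space(docs)
```

In this sketch, terms that occur in the same documents receive similar term vectors, which is what later allows a query to reach documents that share a concept without sharing the literal query term.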
8 Assignments
0 Petitions
Abstract
In various embodiments, a semantic space associated with a corpus of electronically stored information (ESI) may be created and used for concept searches. Documents (and any other objects in the ESI, in general) may be represented as vectors in the semantic space. Vectors may correspond to identifiers, such as, for example, indexed terms. The semantic space for a corpus of ESI can be used in information filtering, information retrieval, indexing, and relevancy rankings.
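As one illustration of relevancy ranking in such a semantic space, documents may be ordered by how closely their vectors align with a query vector. The sketch below assumes cosine similarity as the measure, which the abstract does not specify, and uses hypothetical three-dimensional vectors and document names.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_relevance(query_vector, doc_vectors):
    """Return (score, document id) pairs, most similar first."""
    scored = [(cosine(query_vector, vec), doc_id) for doc_id, vec in doc_vectors.items()]
    return sorted(scored, reverse=True)


# Hypothetical document vectors in a tiny semantic space.
doc_vectors = {
    "memo": [0.9, 0.1, 0.0],
    "invoice": [0.7, 0.6, 0.1],
    "newsletter": [0.0, 0.2, 0.9],
}
ranking = rank_by_relevance([1.0, 0.0, 0.0], doc_vectors)
```

Cosine similarity ignores vector length, so a long document is not favored merely because its vector accumulates more term vectors.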
82 Citations
20 Claims
1. A method as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A non-transitory computer-readable medium storing a plurality of instructions that cause a computer to perform operations comprising:
generating, for each term in a set of terms associated with each document in a plurality of documents, a term vector for the term based on a set of randomly indexed document vectors associated with documents in the plurality of documents in which the term appears, wherein each of the randomly indexed document vectors comprises a set of values, each of the set of values being a random value;
storing each generated term vector in association with its corresponding term;
generating a document vector for each document in the plurality of documents based on a set of term vectors associated with terms that appear in the document, wherein the set of term vectors is merged into each document vector based on the frequency of each term in each document;
storing each generated document vector in association with its corresponding document;
generating a query term vector for one or more query terms based on term vectors for terms that correspond to the one or more query terms;
generating a query based on a set of terms whose term vectors satisfy one or more conditions related to the query term vector; and
executing the query to obtain a set of documents in the plurality of documents that are relevant to a concept defined by the one or more query terms.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
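The query-generation and query-execution operations can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the "one or more conditions" are taken to be a cosine-similarity threshold, the query term vector is formed by summing term vectors, the generated query is treated as an OR over the selected terms, and the term vectors, threshold, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.hypot(*u) * math.hypot(*v)
    return dot / norms if norms else 0.0


def query_term_vector(query_terms, term_vectors):
    """Sum the stored term vectors of the query terms (assumed merge rule)."""
    dim = len(next(iter(term_vectors.values())))
    total = [0.0] * dim
    for term in query_terms:
        for i, value in enumerate(term_vectors.get(term, [0.0] * dim)):
            total[i] += value
    return total


def generate_query(query_terms, term_vectors, threshold=0.8):
    """Select every term whose vector is close enough to the query term vector."""
    qvec = query_term_vector(query_terms, term_vectors)
    return sorted(t for t, vec in term_vectors.items() if cosine(qvec, vec) >= threshold)


def execute_query(expanded_terms, docs):
    """Treat the generated query as an OR over its terms."""
    wanted = set(expanded_terms)
    return sorted(doc_id for doc_id, terms in docs.items() if wanted & set(terms))


# Hypothetical stored term vectors and documents.
term_vectors = {
    "contract": [1.0, 0.1],
    "agreement": [0.9, 0.2],
    "breach": [0.8, 0.3],
    "football": [0.0, 1.0],
}
docs = {
    "d1": ["agreement", "signed"],
    "d2": ["football", "league"],
    "d3": ["breach", "notice"],
}
expanded = generate_query(["contract"], term_vectors)
hits = execute_query(expanded, docs)
```

Here the single query term "contract" expands to three related terms, and the executed query returns `d1` and `d3` even though neither document contains the word "contract".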
Specification