Context vector generation and retrieval
Abstract
A system and method for generating context vectors for use in storage and retrieval of documents and other information items. Context vectors represent conceptual relationships among information items by quantitative means. A neural network operates on a training corpus of records to develop relationship-based context vectors based on word proximity and co-importance using a technique of “windowed co-occurrence”. Relationships among context vectors are deterministic, so that a context vector set has one logical solution, although it may have a plurality of physical solutions. No human knowledge, thesaurus, synonym list, knowledge base, or conceptual hierarchy is required. Summary vectors of records may be clustered to reduce searching time by forming a tree of clustered nodes. Once the context vectors are determined, records may be retrieved using a query interface that allows a user to specify content terms, Boolean terms, and/or document feedback. The present invention further facilitates visualization of textual information by translating context vectors into visual and graphical representations. Thus, a user can explore visual representations of meaning, and can apply human visual pattern recognition skills to document searches.
343 Citations
35 Claims
1. In a computer having a processor, a storage, and a display unit, a computer implemented method of generating context vectors representing information items for retrieval of the information items or records containing the information items, the method comprising:
- initializing the context vectors such that the context vectors are substantially orthogonal to each other in a vector space;
- assigning a context vector to each of a plurality of information items;
- determining proximal co-occurrences of the information items;
- adjusting the context vectors based on the proximal co-occurrences of the information items, such that the information items that frequently proximally co-occur have context vectors with similar orientations in the vector space; and
- using said adjusted context vectors for retrieving said information items or records containing said information items; and
- displaying on said display unit said retrieved information items or records containing the information items.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20)
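The steps of claim 1 can be illustrated with a minimal sketch. The claim does not fix an update rule, so everything specific here is an assumption: random Gaussian unit vectors stand in for "substantially orthogonal" initialization, a fixed-size window defines proximal co-occurrence, and each target vector is nudged toward its neighbor and renormalized. The names `init_vectors`, `cooccurrences`, and `train` are hypothetical.

```python
import math
import random


def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Dot product of two unit vectors, i.e. the cosine of the angle between them."""
    return sum(x * y for x, y in zip(a, b))


def init_vectors(items, dim, rng):
    """Assumption: random Gaussian unit vectors, which are nearly orthogonal in high dimensions."""
    return {w: unit([rng.gauss(0.0, 1.0) for _ in range(dim)]) for w in items}


def cooccurrences(records, window):
    """Yield (target, neighbor) pairs that lie within `window` positions of each other."""
    for rec in records:
        for i, target in enumerate(rec):
            for j in range(max(0, i - window), min(len(rec), i + window + 1)):
                if j != i:
                    yield target, rec[j]


def train(records, dim=200, window=2, rate=0.1, epochs=5, seed=0):
    """Assumed update rule: move each target a small step toward its neighbor, then renormalize."""
    rng = random.Random(seed)
    vocab = sorted({w for rec in records for w in rec})
    vecs = init_vectors(vocab, dim, rng)
    for _ in range(epochs):
        for target, neighbor in cooccurrences(records, window):
            vecs[target] = unit(
                [a + rate * b for a, b in zip(vecs[target], vecs[neighbor])]
            )
    return vecs
```

With a toy corpus of two unrelated topics, items that share records end up with similar orientations, while items from different topics stay close to orthogonal. On a connected corpus this naive rule would eventually pull all vectors together, so a practical version would need some counterweight; the sketch omits that.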
18. In a computer having a processor, a database, and a display unit, a computer implemented method of generating vectors representing information items for retrieval of the information items, the method comprising:
- selecting a set of R information items in the database;
- determining for the selected set of information elements an R×R mutual co-occurrence matrix based on proximal co-occurrences of the information items in a plurality of documents;
- applying Singular Value Decomposition to the mutual co-occurrence matrix to produce a set of first context vectors; and
- wherein each first context vector is uniquely associated with one of the selected information items, and wherein information items having similar meaning have respective first context vectors with similar orientations in the vector space; and
- using said context vectors for retrieval of information items from said database; and
- displaying said retrieved information items on said display unit.
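A small sketch of the matrix route in claim 18, using NumPy's `numpy.linalg.svd`. The window size, the raw-count weighting, the scaling of the left singular vectors by their singular values, and the function names are assumptions made for illustration; the claim only requires an R×R mutual co-occurrence matrix and an SVD.

```python
import numpy as np


def cooccurrence_matrix(documents, items, window=2):
    """Count how often each pair of selected items occurs within `window` positions."""
    index = {w: k for k, w in enumerate(items)}
    counts = np.zeros((len(items), len(items)))
    for doc in documents:
        for i, a in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i and a in index and doc[j] in index:
                    counts[index[a], index[doc[j]]] += 1.0
    return counts


def svd_context_vectors(counts, dim):
    """Keep the top `dim` singular directions; row k is the vector for item k."""
    u, s, _vt = np.linalg.svd(counts)
    vectors = u[:, :dim] * s[:dim]          # assumption: scale by singular values
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0               # leave all-zero rows untouched
    return vectors / norms
```

Because each row is normalized, the dot product of two rows is the cosine of the angle between the corresponding context vectors.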
21. In a computer having a processor, a database, and a display unit, a computer implemented method of retrieving a record from the database containing a plurality of records, each record containing at least one information item having an associated context vector, the method comprising:
- for each of a plurality of information items, storing context vectors associated with the information item, the context vectors having the properties that information items having similar meaning have context vectors with similar orientations in a vector space, and information items having dissimilar meanings have context vectors with dissimilar orientations in the vector space;
- for each of the plurality of records from said database containing said plurality of records, storing a summary context vector derived from context vectors respectively associated with information items that comprise the record;
- receiving a query;
- deriving at least one query information item from the query;
- generating a query context vector from the query information item; and
- selecting at least one record from said database having a summary context vector with orientation in the vector space that is similar to the orientation of the query context vector; and
- displaying said selected at least one record on said display unit.
Dependent Claims (22)
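The retrieval flow of claim 21 can be sketched as follows. The claim says only that a summary vector is "derived from" the item vectors, so the unweighted normalized sum used here is an assumption, as are the names `summary_vector`, `build_index`, and `search`.

```python
import math


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def summary_vector(items, context):
    """Assumption: a summary vector is the normalized sum of the known item vectors."""
    known = [context[w] for w in items if w in context]
    if not known:
        return None
    return normalize([sum(col) for col in zip(*known)])


def build_index(records, context):
    """Store one summary vector per record, skipping records with no known items."""
    index = {}
    for rec_id, items in records.items():
        vec = summary_vector(items, context)
        if vec is not None:
            index[rec_id] = vec
    return index


def search(query_items, index, context, top_k=1):
    """Rank records by the cosine between the query vector and each summary vector."""
    query = summary_vector(query_items, context)
    if query is None:
        return []
    scored = [
        (sum(a * b for a, b in zip(query, vec)), rec_id)
        for rec_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [rec_id for _score, rec_id in scored[:top_k]]
```

Storing the summary vectors once in `build_index` mirrors the claim's "storing a summary context vector" step, so a query costs one dot product per record.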
23. In a computer having a processor, a data storage, and a display unit, a computer implemented method of providing a space for information items, said information items encoding and representing predetermined meanings, the method comprising:
- selecting a set of first information items in a corpus of records in said data storage;
- creating a first set of context vectors based on proximal co-occurrences of the first information items, each first context vector uniquely associated with one of the first information items, the context vectors having an orientation in a vector space, such that first information items having similar predetermined encoded meaning have context vectors with similar orientations in the vector space;
- selecting a set of second information items in said data storage, the second information items being different from the first information items;
- selecting a subset of the first information items;
- for each first information item in the subset, selecting a corresponding second information item representing a predetermined encoded meaning substantially identical to the predetermined encoded meaning of the first information item;
- for each of the selected second information items, associating the second information item with the context vector of the corresponding first information item;
- assigning a context vector to each non-selected second information item;
- adjusting the context vectors of the non-selected second information items using the context vectors of the selected second information items; and
- using any of said context vectors for later retrieval of said first information items; and
- displaying said retrieved first information items on said display unit.
Dependent Claims (24, 25, 26)
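One way to read claim 23 is that a few second-vocabulary items are tied to first-vocabulary items of the same meaning, and the remaining second-vocabulary items are then trained around those fixed anchors. The sketch below assumes that reading. The `ties` mapping, the random initialization, the windowed update rule, and the function name are all hypothetical and are not taken from the claim.

```python
import math
import random


def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def train_second_vocabulary(first_vecs, ties, second_records,
                            rate=0.2, epochs=20, window=2, seed=0):
    """`ties` maps a second-vocabulary item to the first-vocabulary item with the same meaning."""
    rng = random.Random(seed)
    dim = len(next(iter(first_vecs.values())))
    second = {}
    for rec in second_records:
        for w in rec:
            if w in ties:
                # Selected item: reuse the first item's vector and keep it fixed.
                second[w] = list(first_vecs[ties[w]])
            elif w not in second:
                # Non-selected item: assumed random starting vector.
                second[w] = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(epochs):
        for rec in second_records:
            for i, target in enumerate(rec):
                if target in ties:
                    continue  # anchors are never adjusted
                for j in range(max(0, i - window), min(len(rec), i + window + 1)):
                    if j != i:
                        second[target] = unit(
                            [a + rate * b
                             for a, b in zip(second[target], second[rec[j]])]
                        )
    return second
```

Because the tied items never move, non-selected items that co-occur with them drift into the same region of the first vocabulary's vector space.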
27. In a computer having a processor, a database of records, and a display unit, a computer implemented method of generating a dictionary of information items for said database of records, each record including at least one information item, each information item associated with a context vector, each information item having a determinate proximity to other information items in a record, wherein a neighbor information item is an information item that occurs proximate a target information item in at least one record in the database, the method comprising:
- initializing the context vectors such that initial context vectors are substantially orthogonal to each other in a vector space;
- associating the context vectors with information items in the dictionary of information items for said database of records; and
- for each information item being a target information item;
  - selecting neighbor information items of the target information item in at least one record; and
  - modifying the context vector of the target information item using the context vectors of each selected neighbor information items as a function of the proximity of each neighbor information item to the target information item, and a co-importance of the target information item and the neighbor information item;
- using said dictionary for retrieving information items representing media during a search; and
- displaying said retrieved information items on said display unit.
Dependent Claims (28, 29, 30, 31, 32)
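Claim 27 weights each neighbor by its proximity to the target and by a co-importance of the pair, without giving formulas in the claim itself. The sketch below picks simple stand-ins that are purely assumptions: proximity is the reciprocal of the positional distance, importance is an inverse-document-frequency score, and co-importance is the product of the two importances. All function names are hypothetical.

```python
import math


def importance(item, records):
    """Assumed importance: an inverse document frequency, higher for rarer items."""
    containing = sum(1 for rec in records if item in rec)
    return math.log(1.0 + len(records) / containing)


def neighbor_weight(target_pos, neighbor_pos, target, neighbor, records):
    """Assumed weight: (1 / distance) times the product of the two importances."""
    proximity = 1.0 / abs(target_pos - neighbor_pos)
    co_importance = importance(target, records) * importance(neighbor, records)
    return proximity * co_importance


def update_target(vecs, record, pos, records, window=3, rate=0.1):
    """Return the target's new unit vector after one weighted pull toward its neighbors."""
    target = record[pos]
    delta = [0.0] * len(vecs[target])
    for j in range(max(0, pos - window), min(len(record), pos + window + 1)):
        if j == pos:
            continue
        w = neighbor_weight(pos, j, target, record[j], records)
        delta = [d + w * x for d, x in zip(delta, vecs[record[j]])]
    moved = [a + rate * d for a, d in zip(vecs[target], delta)]
    norm = math.sqrt(sum(x * x for x in moved))
    return [x / norm for x in moved]
```

With these choices a frequent item such as "the" contributes little to any update even when it sits next to the target, while a rare, adjacent neighbor contributes the most.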
33. In a computer system including a storage device containing a plurality of records, each record containing a plurality of information items, a computer readable medium for configuring and controlling the computer system to generate a plurality of context vectors, the computer readable medium comprising:
- an initial context vector generation module, adapted to read and write to the storage device, which initializes to each of a plurality of selected information items an initial context vector, such that the initial context vectors are substantially orthogonal to each other in a vector space, and which writes the initial context vectors to the storage device in association with respective information items;
- a vector training module, adapted to read and write to the storage device, for modifying the context vector of a selected information item, being a target information item, using the context vectors of neighbor information items that proximally co-occur with the target information item, as a function of the proximity of each neighbor information item to the target information item, and a co-importance of the target information item and the neighbor information item;
- a retrieval module configured to use said initial context vector generation module and said vector training module in retrieving information items or records containing said information items, said information items representing media; and
- a display module for displaying said retrieved information items or records.
34. In a computer having a processor, a database, and a display unit, a computer implemented method of automatically indexing documents using a defined index of terms, the method comprising:
- providing an indexed collection of documents in said database, each document having at least one index term assigned to the document;
- providing a plurality of terms, including the index terms, each term associated with a context vector, the context vector having the properties that terms having similar meaning have context vectors with similar orientations in a vector space, terms having dissimilar meanings have context vectors with dissimilar orientations in the vector space, and terms which frequently proximally co-occur have context vectors with similar orientations in the vector space; and
- generating for each indexed document a context vector from the context vectors of selected terms that comprise the document;
- receiving a new document in said database to be indexed;
- generating a new context vector of the new document, the new context vector generated from the context vectors of selected terms that comprise the new document;
- selecting at least one indexed document having a context vector similar to the new context vector;
- assigning to the new document at least one index term assigned to a selected indexed document; such that said new document can later be retrieved from said database; and
- displaying said retrieved document on said display unit.
Dependent Claims (35)
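The automatic indexing of claim 34 amounts to a nearest-neighbor lookup over document vectors followed by copying index terms. A minimal sketch, assuming a document vector is the normalized sum of its term vectors and that the `k` most similar indexed documents donate their index terms; `doc_vector` and `suggest_index_terms` are hypothetical names.

```python
import math


def doc_vector(terms, context):
    """Assumption: a document vector is the normalized sum of its known term vectors."""
    known = [context[t] for t in terms if t in context]
    total = [sum(col) for col in zip(*known)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def suggest_index_terms(new_terms, indexed_docs, context, k=1):
    """`indexed_docs` is a list of (terms, index_terms) pairs for already indexed documents."""
    new_vec = doc_vector(new_terms, context)
    ranked = sorted(
        indexed_docs,
        key=lambda doc: dot(new_vec, doc_vector(doc[0], context)),
        reverse=True,
    )
    suggestions = []
    for _terms, index_terms in ranked[:k]:
        for term in index_terms:
            if term not in suggestions:  # keep order, drop duplicates
                suggestions.append(term)
    return suggestions
```

Raising `k` trades precision for coverage: more neighbors contribute index terms, at the cost of pulling in terms from less similar documents.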
Specification