System and method of context vector generation and retrieval
Abstract
A system and method for generating context vectors for use in storage and retrieval of documents and other information items. Context vectors represent conceptual relationships among information items by quantitative means. A neural network operates on a training corpus of records to develop relationship-based context vectors based on word proximity and co-importance using a technique of "windowed co-occurrence". Relationships among context vectors are deterministic, so that a context vector set has one logical solution, although it may have a plurality of physical solutions. No human knowledge, thesaurus, synonym list, knowledge base, or conceptual hierarchy is required. Summary vectors of records may be clustered to reduce searching time, by forming a tree of clustered nodes. Once the context vectors are determined, records may be retrieved using a query interface that allows a user to specify content terms, Boolean terms, and/or document feedback. The present invention further facilitates visualization of textual information by translating context vectors into visual and graphical representations. Thus, a user can explore visual representations of meaning, and can apply human visual pattern recognition skills to document searches.
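As a rough illustration of the retrieval step mentioned in the abstract, the sketch below ranks records by comparing a query vector against stored summary vectors. It assumes relevance is scored by cosine similarity, which the abstract does not specify. The function names (`cosine`, `rank_records`) and the three-dimensional vectors are hypothetical and made up for this example.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom > 0 else 0.0

def rank_records(query_vector, summary_vectors):
    """Return (record_id, score) pairs sorted with the best match first."""
    scored = [(rid, cosine(query_vector, vec)) for rid, vec in summary_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical summary vectors, invented for illustration only.
summary_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.0],
}
ranking = rank_records([1.0, 0.0, 0.0], summary_vectors)
```

Because only the relative orientation of the vectors matters for this score, rotating every vector by the same rotation would leave the ranking unchanged, which is one way to read the abstract's distinction between one logical solution and many physical solutions.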
39 Claims
1. In a computer having a processor and storage, a computer-implemented process of generating a set of summary vectors in a relative vector space such that for any subset of summary vectors associated with a subset of records there is a single logical relative orientation of the summary vectors that defines the relative meaning of the records, and a plurality of absolute orientations of the summary records, comprising the steps of:
(a) providing a training set of records for processing by the processor, each record containing a plurality of information elements;
(b) assigning to selected information elements in each record an initial context vector consisting solely of a plurality of randomly generated component data values;
(c) for selected information elements in each record, modifying the initial context vector of the selected information element by a function of the context vectors of information elements within a selected proximity to the selected information element and a proximity constraint that varies a magnitude of the modification to the initial context vector;
(d) for each record, determining a summary vector by combining the modified context vectors of the information elements of the record according to program instructions in the storage and executed on the processor; and
(e) storing the determined summary vectors in the computer storage.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
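Steps (a) through (e) of claim 1 can be pictured with a minimal sketch under stated assumptions. It is not the patented implementation. It treats words as information elements, makes a single update pass, weights each neighbor by the inverse of its distance (a hypothetical stand-in for the claimed proximity constraint), and combines vectors by a normalized sum. The names `train_context_vectors` and `summary_vector`, and the parameters `dim`, `window`, and `rate`, are assumptions introduced for illustration.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def train_context_vectors(records, dim=8, window=2, rate=0.1, seed=0):
    """Steps (b) and (c): random initial vectors, then one windowed update pass."""
    rng = random.Random(seed)
    # (b) one random unit vector per distinct information element (here, a word)
    vectors = {}
    for record in records:
        for word in record:
            if word not in vectors:
                vectors[word] = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    # (c) accumulate a correction from neighbors inside the window
    correction = {word: [0.0] * dim for word in vectors}
    for record in records:
        for i, target in enumerate(record):
            lo, hi = max(0, i - window), min(len(record), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                weight = 1.0 / abs(i - j)  # assumed proximity weighting
                for k, x in enumerate(vectors[record[j]]):
                    correction[target][k] += weight * x
    return {
        word: normalize([v + rate * c for v, c in zip(vectors[word], correction[word])])
        for word in vectors
    }

def summary_vector(record, vectors):
    """Step (d): combine a record's context vectors by a normalized sum."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in record:
        for k, x in enumerate(vectors[word]):
            total[k] += x
    return normalize(total)

# (a) a tiny made-up training set; (e) "storage" is just an in-memory list here
records = [["neural", "network", "training"], ["vector", "space", "retrieval"]]
vectors = train_context_vectors(records)
summaries = [summary_vector(record, vectors) for record in records]
```

A different seed would give different absolute vectors. The claim's preamble suggests the relative orientation is what carries meaning, though a single pass over toy data like this is far too small to demonstrate that property.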
24. A system for generating a set of summary vectors, comprising:
a storage device containing a training set of records, each record containing a plurality of information elements;
an initial context vector generator, coupled to the storage device, for assigning to selected information elements in each record an initial context vector comprising solely a plurality of randomly generated component data values;
an iterative training device, coupled to the storage device, for modifying the initial context vector of the selected information elements by a function of the context vectors of information elements within a selected proximity to the selected information element and a proximity constraint that varies a magnitude of the modification to the initial context vector;
a vector combiner, coupled to the storage device, for combining the context vectors of the information elements of each record to determine a summary vector for the record and storing the determined summary vectors in the storage device.
Dependent Claims (25, 26, 27, 28)
29. A computer-implemented process of generating a dictionary of information elements each associated with a context vector for a database of records, each record containing at least one information element, each information element having a determinate proximity to other information elements, comprising the steps of:
(a) selecting a plurality of information elements for including in the dictionary;
(b) assigning to the selected information elements an initial context vector consisting solely of a plurality of randomly generated component data values;
(c) for each selected information element being a target information element, iteratively modifying its context vector by the context vectors of neighbor information elements as a function of the proximity of the neighbor information elements to the target information element, the modified context vectors of the selected information elements forming the dictionary of context vectors.
Dependent Claims (30, 31, 32, 33, 34, 35, 36)
37. In a computer system including a storage device containing a plurality of records, each record containing a plurality of information elements, a computer readable memory for configuring and controlling a processor in the computer system to generate a dictionary of context vectors, the computer readable memory comprising:
an initial context vector generator, coupled to the storage device, for assigning to selected information elements in each record an initial context vector comprising solely a plurality of randomly generated component data values; and
an iterative training device, coupled to the storage device, for iteratively modifying the context vector of each selected information element, being a target information element, by the context vectors of neighbor information elements as a function of the proximity of the neighbor information element to the target information element, the modified context vectors of the selected information elements forming the dictionary of context vectors.
Dependent Claims (38, 39)
Specification