Three-dimensional latent semantic analysis
Abstract
In some examples, a computing system may access multiple information files, generate term-passage matrix data based on the multiple information files, and decompose the term-passage matrix data to generate a reduced-dimensional semantic space, which may be used for information retrieval.
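As an illustration of the term-passage matrix the claims describe (one row per distinct word plus one row per distinct word pair, one column per passage, cells holding occurrence counts), the following Python sketch builds such a matrix for a toy corpus. It is not taken from the patent: the whitespace tokenizer, the reading of "word pair" as two adjacent words within a passage, and the names `tokenize` and `build_term_passage_matrix` are all assumptions made for the example.

```python
from collections import Counter


def tokenize(passage):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return passage.lower().split()


def build_term_passage_matrix(passages):
    """Return (words, pairs, matrix) for a list of passage strings.

    Rows are the distinct words followed by the distinct word pairs;
    columns are passages; each cell is an occurrence count.
    ASSUMPTION: a "word pair" is two adjacent words in the same passage.
    """
    word_counts = []  # one Counter of words per passage
    pair_counts = []  # one Counter of adjacent word pairs per passage
    for passage in passages:
        tokens = tokenize(passage)
        word_counts.append(Counter(tokens))
        pair_counts.append(Counter(zip(tokens, tokens[1:])))

    words = sorted(set().union(*word_counts))
    pairs = sorted(set().union(*pair_counts))

    # A Counter returns 0 for missing keys, so absent terms get a zero cell.
    word_rows = [[wc[w] for wc in word_counts] for w in words]
    pair_rows = [[pc[p] for pc in pair_counts] for p in pairs]
    return words, pairs, word_rows + pair_rows


# Toy corpus, invented for this example.
passages = ["the cat sat", "the dog sat", "the cat saw the dog"]
words, pairs, matrix = build_term_passage_matrix(passages)
print(len(matrix), "rows x", len(passages), "columns")
```

For the three toy passages this gives 5 distinct words and 6 distinct adjacent pairs, so the matrix has 11 rows and 3 columns, matching the "sum of distinct words and distinct word pairs" row count recited in the claims.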
16 Claims
1. A method, comprising:
accessing, by one or more processors, a plurality of information files;
generating, by the one or more processors, term-passage matrix data based on the plurality of information files;
decomposing the term-passage matrix data to generate a reduced-dimensional semantic space,
wherein a number of rows of the term-passage matrix data corresponds to a sum of a number of distinct words in the plurality of information files and a number of distinct word pairs in the plurality of information files,
wherein a number of columns of the term-passage matrix data corresponds to a number of passages in the plurality of information files, and
wherein the term-passage matrix data indicates a frequency of occurrence of each individual word of the distinct words in the plurality of information files and a frequency of occurrence of each individual word pair of the distinct word pairs in the plurality of information files;
responsive to a query, determining a pseudo object associated with the query in the reduced-dimensional semantic space;
examining one or more similarities between the pseudo object and words in the plurality of information files in the reduced-dimensional semantic space; and
determining a passage from the plurality of information files based on the one or more similarities.
Dependent claims: 2, 3, 4, 5, 6
7. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions executable by one or more processors to perform operations comprising:
generating term-passage matrix data to represent a plurality of information files, wherein the term-passage matrix data indicates a frequency of occurrence of each individual word of a plurality of distinct words in the plurality of information files and further indicates a frequency of occurrence of each individual word combination of a plurality of distinct word combinations in the plurality of information files;
decomposing the term-passage matrix data to generate a reduced-dimensional semantic space;
in response to a query, determining a pseudo object associated with the query in the reduced-dimensional semantic space;
examining one or more similarities between the pseudo object and words in the plurality of the information files in the reduced-dimensional semantic space; and
determining a passage from the plurality of information files based on the one or more similarities.
Dependent claims: 8, 9, 10, 11
12. An apparatus, comprising:
one or more processors; and
a memory configured to store a plurality of components executable by the one or more processors, the plurality of components comprising:
an information accessing module configured to access a plurality of information files;
a latent semantic analysis (LSA) module configured to:
generate term-passage matrix data based on the plurality of information files,
wherein a number of rows of the term-passage matrix data corresponds to a sum of a number of distinct words in the plurality of information files and a number of distinct word pairs in the plurality of information files,
wherein a number of columns of the term-passage matrix data corresponds to a number of passages in the plurality of information files, and
wherein the term-passage matrix data indicates a frequency of occurrence of each individual word of the distinct words in the plurality of information files and a frequency of occurrence of each individual word pair of the distinct word pairs in the plurality of information files; and
generate a reduced-dimensional semantic space based on the term-passage matrix data; and
an information retrieval module configured to:
responsive to a query, determine a pseudo object associated with the query in the reduced-dimensional semantic space;
examine one or more similarities between the pseudo object and words in the plurality of information files in the reduced-dimensional semantic space; and
determine a passage in the plurality of information files based on the one or more similarities.
Dependent claims: 13, 14, 15, 16
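The decomposition and retrieval steps recited in the independent claims can be sketched in Python with NumPy as follows. This is one possible reading and not the patented implementation. The claims do not name a decomposition or a scoring rule, so the sketch assumes a truncated singular value decomposition, the conventional LSA fold-in formula for the query's pseudo object, cosine similarity against word vectors scaled by the square root of the singular values, and a passage score equal to the count-weighted average of those word similarities. The function names and the toy matrix are invented for the example.

```python
import numpy as np


def build_semantic_space(A, k):
    """ASSUMPTION: truncated SVD keeping the k largest singular triplets."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]


def pseudo_object(query_counts, U_k, s_k):
    """Fold a raw term-count vector into the k-dimensional space."""
    return (query_counts @ U_k) / s_k


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom > 0 else 0.0


def rank_passages(query_counts, A, k):
    """Return (word similarities, passage scores) for one query.

    ASSUMPTION: a passage's score is the average similarity of the terms
    it contains, weighted by their counts. Every column of A must be nonzero.
    """
    U_k, s_k = build_semantic_space(A, k)
    root = np.sqrt(s_k)
    q = pseudo_object(query_counts, U_k, s_k) * root
    term_vecs = U_k * root  # one row per word or word pair
    sims = np.array([cosine(q, t) for t in term_vecs])
    weights = A / A.sum(axis=0, keepdims=True)
    return sims, sims @ weights


# Toy term-passage matrix with made-up counts: four word rows followed by
# two word-pair rows, and four passage columns.
terms = ["cat", "dog", "stock", "bond", "cat dog", "stock bond"]
A = np.array([
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 2, 3],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)

query = np.array([1.0 if t == "cat" else 0.0 for t in terms])
sims, scores = rank_passages(query, A, k=2)
best = int(np.argmax(scores))
print("best passage index:", best)
```

With `k=2` the toy space keeps one dimension per topic, so a query for "cat" scores the two cat/dog passages far above the two stock/bond passages. Other scoring rules, such as comparing the pseudo object directly with passage vectors, would be equally consistent with standard LSA practice.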
Specification