System and method of structuring data for search using latent semantic analysis techniques
First Claim
1. A computer-based method of organizing data for search, the method comprising the steps of:
accessing a domain corpus;
parsing the domain corpus into a plurality of documents;
parsing each document into at least one term that corresponds to the document;
generating a term-to-document matrix that correlates each document with the at least one term that corresponds to the document, the at least one term defining a document node for the document;
performing a singular value decomposition and a dimension reduction on the term-to-document matrix to form a reformed term-to-document matrix having document nodes with fewer dimensions than the document nodes of the term-to-document matrix;
comparing at least one document node of the reformed term-to-document matrix against another document node of the reformed term-to-document matrix; and
combining at least one document node of the term-to-document matrix with another document node of the term-to-document matrix, based on the comparison of the at least one document node of the reformed term-to-document matrix against the another document node of the reformed term-to-document matrix, to form a combined document node representing the combination of the at least one document node of the term-to-document matrix with the another document node of the term-to-document matrix, thereby clustering at least two document nodes of the term-to-document matrix.
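The steps recited in claim 1 can be illustrated with a minimal Python sketch using NumPy. It is only one possible reading under stated assumptions, not the patented implementation: the whitespace tokenizer, the use of cosine similarity for the comparison, the choice to combine two document nodes by summing their term counts, and all function names are illustrative and do not come from the claim.

```python
import numpy as np


def term_document_matrix(documents):
    """Return (sorted vocabulary, terms x documents count matrix)."""
    vocab = sorted({t for doc in documents for t in doc.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for t in doc.lower().split():
            matrix[index[t], j] += 1.0
    return vocab, matrix


def reduced_document_nodes(matrix, k):
    """Truncated SVD: one k-dimensional row vector per document."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k, :]).T


def most_similar_pair(nodes):
    """Indices (i, j), i < j, of the two rows with the highest cosine similarity.

    Cosine similarity is an assumption; the claim only recites a comparison.
    """
    norms = np.linalg.norm(nodes, axis=1, keepdims=True)
    unit = nodes / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pair a document with itself
    i, j = np.unravel_index(np.argmax(sim), sim.shape)
    return (int(min(i, j)), int(max(i, j)))


def merge_columns(matrix, i, j):
    """Combine columns i and j of the ORIGINAL matrix into one document node.

    Summing term counts is an assumed combination rule. The combined node
    is appended as the last column; the two source columns are dropped.
    """
    combined = matrix[:, i] + matrix[:, j]
    keep = [c for c in range(matrix.shape[1]) if c not in (i, j)]
    return np.column_stack([matrix[:, keep], combined])
```

Under these assumptions, one clustering step builds the matrix with `term_document_matrix`, picks a pair with `most_similar_pair(reduced_document_nodes(matrix, k))`, and merges that pair in the original matrix with `merge_columns`. The comparison runs on the reduced nodes while the merge is applied to the unreduced matrix, which mirrors the split between the comparing and combining steps of the claim.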
Abstract
The disclosed embodiments provide a system and method for using modified Latent Semantic Analysis techniques to structure data for efficient search and display. The present invention creates a hierarchy of clustered documents, representing the topics of a domain corpus, through a process of optimal agglomerative clustering. The output from a search query is displayed in a fisheye view corresponding to the hierarchy of clustered documents. The fisheye view may link to a two-dimensional self-organizing map that represents semantic relationships between documents.
23 Claims
1. A computer-based method of organizing data for search, the method comprising the steps of:
accessing a domain corpus;
parsing the domain corpus into a plurality of documents;
parsing each document into at least one term that corresponds to the document;
generating a term-to-document matrix that correlates each document with the at least one term that corresponds to the document, the at least one term defining a document node for the document;
performing a singular value decomposition and a dimension reduction on the term-to-document matrix to form a reformed term-to-document matrix having document nodes with fewer dimensions than the document nodes of the term-to-document matrix;
comparing at least one document node of the reformed term-to-document matrix against another document node of the reformed term-to-document matrix; and
combining at least one document node of the term-to-document matrix with another document node of the term-to-document matrix, based on the comparison of the at least one document node of the reformed term-to-document matrix against the another document node of the reformed term-to-document matrix, to form a combined document node representing the combination of the at least one document node of the term-to-document matrix with the another document node of the term-to-document matrix, thereby clustering at least two document nodes of the term-to-document matrix.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 23
13. A non-transitory computer readable medium storing instructions that when executed by a processor, cause the processor to:
access a domain corpus;
parse the domain corpus into a plurality of documents;
parse each document into at least one term that corresponds to the document;
generate a term-to-document matrix that correlates each document with the at least one term that corresponds to the document, the at least one term defining a document node for the document;
perform singular value decomposition and a dimension reduction on the term-to-document matrix to form a reformed term-to-document matrix having document nodes with fewer dimensions than the document nodes of the term-to-document matrix;
compare at least one document node of the reformed term-to-document matrix against another document node of the reformed term-to-document matrix; and
combine at least one document node of the term-to-document matrix with another document node of the term-to-document matrix, based on the comparison of the at least one document node of the reformed term-to-document matrix against the another document node of the reformed term-to-document matrix, to form a combined document node representing the combination of the at least one document node of the term-to-document matrix with the another document node of the term-to-document matrix, thereby clustering at least two document nodes of the term-to-document matrix.
Dependent claims: 14, 15, 16
17. A computer-based method of ascertaining semantic relationships between documents, the method comprising the steps of:
accessing a domain corpus;
parsing the domain corpus into a plurality of documents;
parsing each document into at least one term that corresponds to the document;
generating a term-to-document matrix that correlates each document with the at least one term that corresponds to the document, the at least one term defining a document node for the document;
performing a singular value decomposition and a dimension reduction on the term-to-document matrix to form a reformed term-to-document matrix having document nodes with fewer dimensions than the document nodes of the term-to-document matrix;
selecting a document node of the reformed term-to-document matrix;
filtering terms of at least one document node of the reformed term-to-document matrix that is not the selected document node, based on which at least one term defines a document node in the term-to-document matrix that corresponds to the document node that has been selected, to form at least one filtered document node that has not been selected; and
displaying an output of a similarity between the document node that has been selected, and the at least one filtered document node that has not been selected, thereby displaying a semantic relationship between the document node that has been selected and the at least one filtered document node that has not been selected.
Dependent claims: 18, 19, 20
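The selecting, filtering, and similarity steps of claim 17 can be sketched in Python with NumPy as follows. This is a hedged illustration rather than the patented method, and it rests on several assumptions that the claim does not state: the reformed matrix is taken to be the rank-k reconstruction of the term-to-document matrix (so its rows still correspond to terms), filtering is taken to mean zeroing every term that does not occur in the selected document's original column, similarity is taken to be cosine similarity, and the function names and the tolerance `eps` are hypothetical.

```python
import numpy as np


def rank_k_reconstruction(matrix, k):
    """Rank-k approximation of a terms x documents matrix via truncated SVD.

    Assumption: this stands in for the 'reformed term-to-document matrix'.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def filtered_similarities(matrix, reformed, selected, eps=1e-12):
    """Cosine similarity between the selected document and every other
    document, after restricting the other documents to the terms that
    occur in the selected document's ORIGINAL column.

    Returns a dict mapping document index -> similarity.
    """
    mask = matrix[:, selected] > 0          # terms defining the selected node
    target = reformed[:, selected]
    sims = {}
    for j in range(reformed.shape[1]):
        if j == selected:
            continue
        filtered = np.where(mask, reformed[:, j], 0.0)  # drop all other terms
        denom = np.linalg.norm(target) * np.linalg.norm(filtered)
        # Treat numerically empty vectors as having no similarity.
        sims[j] = float(target @ filtered / denom) if denom > eps else 0.0
    return sims
```

The returned dictionary can be printed or passed to whatever display the system uses; the claim recites only that an output of the similarity is displayed, so the display itself is left out of the sketch.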
21. A non-transitory computer readable medium storing instructions that when executed by a processor, cause the processor to:
access a domain corpus;
parse the domain corpus into a plurality of documents;
parse each document into at least one term that corresponds to the document;
generate a term-to-document matrix that correlates each document with the at least one term that corresponds to the document, the at least one term defining a document node for the document;
perform a singular value decomposition and a dimension reduction on the term-to-document matrix to form a reformed term-to-document matrix having document nodes with fewer dimensions than the document nodes of the term-to-document matrix;
select a document node of the reformed term-to-document matrix;
filter terms of at least one document node of the reformed term-to-document matrix that is not the selected document node, based on which at least one term defines a document node in the term-to-document matrix that corresponds to the document node that has been selected, to form at least one filtered document node that has not been selected; and
display an output of a similarity between the document node that has been selected, and the at least one filtered document node that has not been selected, thereby displaying a semantic relationship between the document node that has been selected and the at least one filtered document node that has not been selected.
Dependent claims: 22
Specification