Document processing
Abstract
A search system and method are disclosed. In one general aspect the method features fingerprinting digitally stored documents and queries based on one or more topic models. Similarities between document and query fingerprints can be detected to search for documents. Topic summaries can also be derived from the documents and queries based on the topic models.
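The fingerprint-and-compare idea in the abstract can be illustrated with a minimal Python sketch. The abstract specifies none of the choices made here, so they are illustrative assumptions and not the patented method. Each hypothetical topic ID (such as T_FINANCE) maps to a keyword set that serves as a deliberately simple topic model. A fingerprint is the set of topic IDs detected in a text, and similarity between fingerprints is measured with the Jaccard index.

```python
import re

# Hypothetical topic models: topic ID -> keywords that signal the topic.
TOPIC_MODELS = {
    "T_FINANCE": {"loan", "interest", "bank"},
    "T_HEALTH": {"patient", "clinic", "diagnosis"},
    "T_SPORTS": {"match", "score", "league"},
}


def fingerprint(text: str, topic_models: dict) -> frozenset:
    """Return the set of topic IDs whose keywords occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return frozenset(tid for tid, kws in topic_models.items() if tokens & kws)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two fingerprints (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query: str, documents: dict, topic_models: dict) -> list:
    """Rank documents by fingerprint similarity to the query, best first."""
    q_fp = fingerprint(query, topic_models)
    scored = [(doc_id, jaccard(q_fp, fingerprint(text, topic_models)))
              for doc_id, text in documents.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: s[1], reverse=True)


documents = {
    "d1": "The bank raised the interest rate on the loan.",
    "d2": "The patient visited the clinic.",
    "d3": "A bank sponsored the league match.",
}
print(search("interest rates at my bank", documents, TOPIC_MODELS))
# -> [('d1', 1.0), ('d3', 0.5)]
```

Document d3 scores lower because its fingerprint also contains a sports topic that the query lacks. The health-only document d2 shares no topic with the query and is dropped.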
35 Claims
1. A computerized search system comprising a processor and a storage device including instructions that are configured to run on the processor for accessing digitally stored data, comprising:
a document interface configured to access a plurality of digitally stored documents that each includes content that relates to one or more subject matter topics,
a topic model interface configured to access one or more digitally stored topic models that each classify information about one of the subject matter topics that can occur in the content, wherein the topic model interface is configured to access topic models that include hierarchical concept maps that map each of one or more ancestor concepts to a plurality of descendent concepts,
document fingerprinting logic embodied in the computerized search system and responsive to the digitally stored topic models through the topic model interface and to the digitally stored documents through the document interface, and configured to create document fingerprints that each include a set of identifiers that each identify one of the topics from the digitally stored topic models in the content of the digitally stored documents,
a query interface configured to receive user-specified queries,
query fingerprinting logic embodied in the computerized search system and responsive to the digitally stored topic models through the topic model interface and to queries through the query interface, and configured to create query fingerprints that identify topics from the stored topic models in the queries, and
search logic embodied in the computerized search system and configured to identify one or more of the digitally stored documents that are relevant to the queries, based on the query fingerprints and the document fingerprints.
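Claim 1 recites topic models that include hierarchical concept maps, along with fingerprints made of topic identifiers. A minimal Python sketch of one possible reading follows. It assumes a tree-shaped map, regex tokenization, and exact word matching, so plurals such as "cars" are not matched. Each concept found in a text is added to the fingerprint together with all of its ancestors, which lets a query for an ancestor concept ("vehicle") match a document that mentions only a descendant ("sedan"). The concept names, the ancestor roll-up, and the overlap-count ranking in search are hypothetical choices; the claim does not specify them.

```python
import re

# Hypothetical hierarchical concept map: each ancestor concept maps to a
# plurality of descendant concepts. Assumed to form a tree (one parent per concept).
CONCEPT_MAP = {
    "vehicle": ["car", "truck", "bicycle"],
    "car": ["sedan", "coupe"],
    "fruit": ["apple", "banana"],
}


def parent_index(concept_map: dict) -> dict:
    """Invert ancestor -> descendants into child -> parent."""
    return {child: ancestor
            for ancestor, children in concept_map.items()
            for child in children}


def fingerprint(text: str, concept_map: dict) -> frozenset:
    """Concept identifiers found in the text, each expanded with all its ancestors."""
    parents = parent_index(concept_map)
    known = set(concept_map) | set(parents)
    ids = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        node = token if token in known else None
        # Walk up the hierarchy, stopping early if this chain was already added.
        while node is not None and node not in ids:
            ids.add(node)
            node = parents.get(node)
    return frozenset(ids)


def search(query: str, documents: dict, concept_map: dict) -> list:
    """Return (doc_id, shared identifier count) pairs, best match first."""
    q_fp = fingerprint(query, concept_map)
    hits = [(doc_id, len(q_fp & fingerprint(text, concept_map)))
            for doc_id, text in documents.items()]
    return sorted((h for h in hits if h[1] > 0), key=lambda h: (-h[1], h[0]))


documents = {
    "d1": "A red sedan parked outside.",
    "d2": "She rode her bicycle to work.",
    "d3": "An apple a day.",
}
print(search("car", documents, CONCEPT_MAP))
# -> [('d1', 2), ('d2', 1)]
```

The query "car" expands to {car, vehicle}. The sedan document shares both identifiers. The bicycle document shares only the common ancestor "vehicle", so it ranks lower.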
34. A computer implemented method, employing a processor and a storage device including instructions that are configured to run on the processor, comprising:
accessing digitally stored documents that each include content that relates to one or more subject matter topics,
accessing one or more topic models that each classify information about one of the subject matter topics that can occur in the content, wherein the step of accessing accesses topic models that include hierarchical concept maps that map each of one or more ancestor concepts to a plurality of descendent concepts,
fingerprinting the documents accessed in the step of accessing documents based on the topic models accessed in the step of accessing topic models,
accessing one or more user-specified queries,
fingerprinting the queries accessed in the step of accessing queries based on the topic models accessed in the step of accessing topic models, and
detecting similarities between fingerprints produced by the step of fingerprinting documents and the step of fingerprinting queries.
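The steps of claim 34 can be walked through in order with a short Python sketch. Two assumptions go beyond the claim. First, a fingerprint is a weighted count of keyword hits per topic. Second, "detecting similarities" is implemented as cosine similarity against a hypothetical threshold parameter. The function names and sample topics are likewise illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical topic models: topic ID -> keywords that signal the topic.
TOPIC_MODELS = {
    "T_FINANCE": {"loan", "interest", "bank"},
    "T_SPORTS": {"match", "score", "league"},
}


def fingerprint(text: str, topic_models: dict) -> Counter:
    """Weighted fingerprint: topic ID -> number of keyword hits in the text."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for topic_id, keywords in topic_models.items():
            if token in keywords:
                counts[topic_id] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse topic-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def run_method(documents: dict, topic_models: dict, queries: list, threshold: float = 0.5) -> dict:
    # Steps 1-3: access the documents and topic models, then fingerprint each document.
    doc_fps = {doc_id: fingerprint(text, topic_models) for doc_id, text in documents.items()}
    results = {}
    # Steps 4-5: access each query and fingerprint it with the same topic models.
    for query in queries:
        q_fp = fingerprint(query, topic_models)
        # Step 6: detect similarities between the query and document fingerprints.
        results[query] = [doc_id for doc_id, d_fp in doc_fps.items()
                          if cosine(q_fp, d_fp) >= threshold]
    return results


documents = {
    "d1": "Bank loan interest rates rose.",
    "d2": "The league match ended with a low score.",
    "d3": "The bank sponsored the league match.",
}
print(run_method(documents, TOPIC_MODELS, ["loan interest", "league score"]))
# -> {'loan interest': ['d1'], 'league score': ['d2', 'd3']}
```

Document d3 mixes one finance hit with two sports hits. Its cosine score with the sports query (about 0.89) clears the threshold, but its score with the finance query (about 0.45) does not.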
35. A computerized search system, comprising a processor and a storage device including instructions that are configured to run on the processor for accessing digitally stored data, comprising:
means for accessing digitally stored documents that each include content that relates to one or more subject matter topics,
means for accessing one or more topic models that each classify information about one of the subject matter topics that can occur in the content,
means for fingerprinting documents accessed by the means for accessing documents based on the topic models accessed by the means for accessing topic models,
means for accessing one or more user-specified queries, wherein the means for accessing accesses topic models that include hierarchical concept maps that map each of one or more ancestor concepts to a plurality of descendent concepts,
means for fingerprinting the queries accessed by the means for accessing queries based on the topic models accessed by the means for accessing topic models, and
means for detecting similarities between fingerprints produced by the means for fingerprinting documents and the means for fingerprinting queries.
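Claim 35 restates the method as a set of "means". One illustrative way to picture that structure in software is a small component whose accessing, fingerprinting, and similarity-detecting functions are supplied as separate callables. The SearchSystem class, its field names, the keyword fingerprint, and the overlap score are all hypothetical; the claim does not prescribe any particular implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

# Hypothetical type aliases: a topic model maps topic IDs to keyword sets,
# and a fingerprint is the set of topic IDs detected in a text.
TopicModels = Mapping[str, set]
Fingerprint = frozenset


@dataclass
class SearchSystem:
    """Hypothetical wiring of the claimed 'means' as pluggable callables."""
    access_documents: Callable[[], Mapping[str, str]]
    access_topic_models: Callable[[], TopicModels]
    fingerprint: Callable[[str, TopicModels], Fingerprint]
    similarity: Callable[[Fingerprint, Fingerprint], float]

    def search(self, queries: Iterable[str], threshold: float = 0.0) -> dict:
        models = self.access_topic_models()
        # Fingerprint every accessed document once.
        doc_fps = {doc_id: self.fingerprint(text, models)
                   for doc_id, text in self.access_documents().items()}
        results = {}
        for query in queries:
            q_fp = self.fingerprint(query, models)
            # Keep documents whose similarity to the query exceeds the threshold.
            results[query] = sorted(doc_id for doc_id, d_fp in doc_fps.items()
                                    if self.similarity(q_fp, d_fp) > threshold)
        return results


def keyword_fingerprint(text: str, models: TopicModels) -> Fingerprint:
    """Set of topic IDs whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return frozenset(tid for tid, kws in models.items() if tokens & kws)


def overlap(a: Fingerprint, b: Fingerprint) -> float:
    """Number of shared topic IDs, as a float score."""
    return float(len(a & b))


system = SearchSystem(
    access_documents=lambda: {"d1": "Bank loan approved.", "d2": "Clinic hours changed."},
    access_topic_models=lambda: {"T_FINANCE": {"bank", "loan"}, "T_HEALTH": {"clinic", "patient"}},
    fingerprint=keyword_fingerprint,
    similarity=overlap,
)
print(system.search(["my bank loan", "patient visit"]))
# -> {'my bank loan': ['d1'], 'patient visit': ['d2']}
```

Because each means is a separate callable, one component, such as the similarity function, can be replaced without touching document access or fingerprinting.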
Specification