Document analyzer and metadata generation and use
First Claim
1. A method comprising:
receiving a collection of text-based terms associated with a document;
performing a statistical analysis on the text-based terms to identify a distribution and relative frequency of the text-based terms in the document;
receiving a listing of multiple themes associated with the document, the listing of multiple themes being derived as a result of performing a semantic analysis of the document; and
utilizing the distribution and relative frequency information derived from the statistical analysis to rank the multiple themes.
Abstract
A document analyzer receives a collection of text-based terms associated with a document. The document analyzer performs a statistical analysis on the text-based terms to identify a distribution indicating where the text-based terms appear in the document and a relative frequency indicating how often they appear. The document analyzer uses the distribution and relative frequency information derived from the statistical analysis to rank multiple themes associated with the document. For example, a received listing of multiple themes may not be presented in any useful order, although it can be assumed that the themes in the listing are present in the document. By applying the distribution and relative frequency information derived from the analysis, the document analyzer can identify which themes are most relevant to the document as a whole and/or which of the themes correspond to different portions (e.g., pages or sections) of the document.
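The abstract combines two term statistics, relative frequency (how often a term appears) and distribution (where it appears), to order themes that arrive unranked. The Python sketch below shows one minimal, hypothetical way to do that. The section-based document model, the tokenizer, the scoring rule (mean of relative frequency times distribution over a theme's terms), and every function name are assumptions made for illustration, not the patented method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_statistics(sections):
    """Compute per-term statistics over a document split into sections.

    relative_frequency[t]: occurrences of t divided by all tokens in the document.
    distribution[t]: fraction of sections in which t appears at least once.
    """
    section_tokens = [tokenize(s) for s in sections]
    counts = Counter(t for toks in section_tokens for t in toks)
    total = sum(counts.values()) or 1
    relative_frequency = {t: c / total for t, c in counts.items()}
    hits = Counter(t for toks in section_tokens for t in set(toks))
    n_sections = len(sections) or 1
    distribution = {t: c / n_sections for t, c in hits.items()}
    return relative_frequency, distribution


def rank_themes(themes, sections):
    """Order themes (e.g. from a semantic analyzer) by document-wide relevance.

    Hypothetical score: mean over the theme's terms of
    relative_frequency * distribution, so frequent, widely spread terms win.
    """
    relative_frequency, distribution = term_statistics(sections)

    def score(theme):
        terms = tokenize(theme)
        total = sum(relative_frequency.get(t, 0.0) * distribution.get(t, 0.0)
                    for t in terms)
        return total / (len(terms) or 1)

    return sorted(themes, key=score, reverse=True)


def best_theme_per_section(themes, sections):
    """For each section, pick the theme whose terms occur most often in it."""
    picks = []
    for section in sections:
        counts = Counter(tokenize(section))  # missing terms count as 0
        picks.append(max(themes, key=lambda th: sum(counts[t] for t in tokenize(th))))
    return picks


if __name__ == "__main__":
    sections = [
        "Solar panels convert sunlight into power.",
        "Panel efficiency depends on sunlight angle.",
        "Battery storage smooths solar output.",
    ]
    themes = ["battery storage", "solar power", "panel efficiency"]
    print(rank_themes(themes, sections))
    print(best_theme_per_section(themes, sections))
```

With the sample sections, "solar power" ranks first overall because both of its terms are comparatively frequent and "solar" appears in two of the three sections, while the per-section picks are "solar power", "panel efficiency", and "battery storage". Multiplying the two statistics is only one plausible choice; a weighted sum would also fit the description.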
26 Claims
1. A method comprising:
receiving a collection of text-based terms associated with a document;
performing a statistical analysis on the text-based terms to identify a distribution and relative frequency of the text-based terms in the document;
receiving a listing of multiple themes associated with the document, the listing of multiple themes being derived as a result of performing a semantic analysis of the document; and
utilizing the distribution and relative frequency information derived from the statistical analysis to rank the multiple themes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method comprising:
receiving a ranking of text-based terms associated with a document, the ranking being based on a statistical analysis of the text-based terms in the document;
receiving a listing of multiple themes associated with the document, the listing of multiple themes being derived as a result of performing a semantic analysis of the document; and
utilizing the ranking of the text-based terms to rank the multiple themes.
Dependent claims: 15, 16
17. A computer readable medium having computer code thereon, the medium comprising:
instructions for receiving a collection of text-based terms associated with a document;
instructions for performing a statistical analysis on the text-based terms to identify a distribution and relative frequency of the text-based terms in the document;
instructions for receiving a listing of multiple themes associated with the document, the listing of multiple themes being derived as a result of performing a semantic analysis of the document; and
instructions for utilizing the distribution and relative frequency information derived from the statistical analysis to rank the multiple themes.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
25. A computer program product including a computer-readable medium having instructions stored thereon for processing data information, such that the instructions, when carried out by a processing device, enable the processing device to perform the operations of:
receiving a collection of text-based terms associated with a document;
performing a statistical analysis on the text-based terms to identify a distribution and relative frequency of the text-based terms in the document;
receiving a listing of multiple themes associated with the document, the listing of multiple themes being derived as a result of performing a semantic analysis of the document; and
utilizing the distribution and relative frequency information derived from the statistical analysis to rank the multiple themes.
26. A computer system comprising:
a processor;
a memory unit that stores instructions associated with an application executed by the processor; and
an interconnect coupling the processor and the memory unit, enabling the computer system to execute the application and perform operations of:
receiving a collection of text-based terms associated with a document;
performing a statistical analysis on the text-based terms to identify a distribution and relative frequency of the text-based terms in the document;
receiving a listing of multiple themes associated with the document, the listing of multiple themes being derived as a result of performing a semantic analysis of the document; and
utilizing the distribution and relative frequency information derived from the statistical analysis to rank the multiple themes.
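Claim 14 differs from claim 1 in that the ranking of text-based terms arrives already computed, and only that ranking is used to order the themes. The Python sketch below is a hypothetical illustration of that step. The reciprocal-rank weighting (1, 1/2, 1/3, ...), the mean-weight theme score, the tokenizer, and the function names are all assumptions, not details taken from the claims.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_themes_by_term_ranking(themes, ranked_terms):
    """Order themes using a term ranking received from a statistical analysis.

    ranked_terms lists terms from most to least significant. Each term gets a
    reciprocal-rank weight (1, 1/2, 1/3, ...); this weighting is an assumption.
    A theme's score is the mean weight of its terms; unranked terms count as 0.
    """
    weight = {}
    for rank, term in enumerate(ranked_terms, start=1):
        # Keep the best (earliest) rank if a term is listed more than once.
        weight.setdefault(term.lower(), 1.0 / rank)

    def score(theme):
        terms = tokenize(theme)
        return sum(weight.get(t, 0.0) for t in terms) / (len(terms) or 1)

    return sorted(themes, key=score, reverse=True)


if __name__ == "__main__":
    ranked_terms = ["solar", "power", "sunlight", "battery", "storage"]
    themes = ["battery storage", "panel efficiency", "solar power"]
    print(rank_themes_by_term_ranking(themes, ranked_terms))
```

Here "solar power" scores 0.75, "battery storage" 0.225, and "panel efficiency" 0, so the themes are reordered accordingly. Reciprocal rank keeps the top few terms dominant; if the statistical analysis exposes its raw scores, using those directly would preserve more information than rank position alone.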
Specification