Determination of a semantic snapshot
Abstract
A method and apparatus for characterizing a document are described, particularly for the recognition, organization or relating of documents. For this purpose, a series of statistical properties of the text in the document is determined: a list of words occurring in the document is determined, and a frequency of occurrence is determined for each word in the list. The series is then built up of pairs, each consisting of one word from the list and the frequency of that word, and the series forms a semantic snapshot of the document. The semantic snapshot is used to compare documents with one another, or to compare a document with a semantic snapshot of a specific area of attention or subject in order to determine the relevance of the document to that subject.
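The abstract does not say how two snapshots are compared. As one possible illustration only, the sketch below treats each snapshot as a sparse word-frequency vector and scores relevance with cosine similarity. The tokenization rule (lowercase, runs of word characters), the function names `snapshot` and `cosine_similarity`, and the choice of cosine similarity are all assumptions, not taken from the patent.

```python
import math
import re
from collections import Counter


def snapshot(text: str) -> Counter:
    """Map each word to its frequency (assumed tokenizer: lowercase, \\w+ runs)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two snapshots viewed as frequency vectors."""
    # Counter returns 0 for missing words without inserting them.
    dot = sum(freq * b[word] for word, freq in a.items())
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty snapshot is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Hypothetical usage: rank documents by relevance to a subject snapshot.
subject = snapshot("patent claims and patent documents")
docs = {
    "a": "The patent office published new patent claims.",
    "b": "The recipe calls for flour and sugar.",
}
ranked = sorted(
    docs,
    key=lambda k: cosine_similarity(snapshot(docs[k]), subject),
    reverse=True,
)
print(ranked)  # ['a', 'b']
```

Cosine similarity is used here because it ignores overall document length and depends only on the relative word frequencies; other measures could be substituted without changing how the snapshots are built.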
29 Claims
1. A method of characterizing a document wherein a series of statistical properties of text in the document is determined, the method comprising:
determining a list of words occurring in the document;
determining a frequency of occurrence for each word in the list; and
building up the series with pairs, each pair having one word from the list and the frequency of that word, wherein the series forms a semantic snapshot of the document.
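The three steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the tokenizer (lowercase, runs of word characters), the ordering of the list by first occurrence, and the function name `semantic_snapshot` are hypothetical choices, since the claim does not specify them.

```python
import re


def semantic_snapshot(text: str) -> list[tuple[str, int]]:
    # Hypothetical tokenizer: lowercase the text and take runs of word characters.
    words = re.findall(r"\w+", text.lower())

    # Step 1: list of distinct words occurring in the document
    # (kept in order of first occurrence, which is an assumption).
    word_list = list(dict.fromkeys(words))

    # Step 2: frequency of occurrence for each word in the list.
    frequency = {w: 0 for w in word_list}
    for w in words:
        frequency[w] += 1

    # Step 3: build up the series from (word, frequency) pairs.
    return [(w, frequency[w]) for w in word_list]


print(semantic_snapshot("The cat saw the other cat."))
# [('the', 2), ('cat', 2), ('saw', 1), ('other', 1)]
```

The claim does not fix an order for the pairs; sorting the series by descending frequency instead would give a snapshot that is easier to scan by eye.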
13. A computer program product embodied on at least one computer-readable medium, for characterizing a document, the computer program product comprising computer-executable instructions for:
determining a list of words occurring in the document;
determining a frequency of occurrence for each word in the list; and
building up the series with pairs, each pair having one word from the list and the frequency of that word, wherein the series forms a semantic snapshot of the document.
18. A data signal, wherein the signal represents a data structure of a semantic snapshot as formed by:
determining a list of words occurring in a document;
determining a frequency of occurrence for each word in the list; and
building up a series of statistical properties of text in the document with pairs, each pair having one word from the list and the frequency of the word, wherein the series forms the semantic snapshot of the document.
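Claim 18 covers a data signal that represents the snapshot's data structure but does not name an encoding. Purely as an assumed example, the sketch below serializes the (word, frequency) pairs as UTF-8 JSON bytes and decodes them back; the JSON format and the names `encode_snapshot` and `decode_snapshot` are hypothetical.

```python
import json


def encode_snapshot(pairs: list[tuple[str, int]]) -> bytes:
    # Assumed wire format: a JSON array of [word, frequency] arrays, UTF-8 encoded.
    return json.dumps([[w, f] for w, f in pairs], ensure_ascii=False).encode("utf-8")


def decode_snapshot(payload: bytes) -> list[tuple[str, int]]:
    # Inverse of encode_snapshot: rebuild the list of (word, frequency) tuples.
    return [(w, int(f)) for w, f in json.loads(payload.decode("utf-8"))]


pairs = [("the", 2), ("cat", 2), ("saw", 1)]
payload = encode_snapshot(pairs)
print(payload)  # b'[["the", 2], ["cat", 2], ["saw", 1]]'
```

A list of pairs, rather than a JSON object keyed by word, keeps whatever order the series was built in.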
20. An apparatus for processing documents, the apparatus comprising:
a module for characterizing a document by using a series of statistical properties of text of the document, wherein the module determines a list of words occurring in the document, determines a frequency of occurrence for each word in the list, and builds up the series from pairs, each pair having one word from the list and the frequency of that word, wherein the series forms a semantic snapshot of the document.
Specification