Organising and storing documents
Abstract
A data handling device has access to a store of existing metadata pertaining to existing documents having associated metadata terms. It analyses the metadata to generate statistical data as to the co-occurrence of pairs of terms in the metadata of one and the same document. When a fresh document is received, it is analysed to assign to it a set of terms and to determine for each term a measure of its strength of association with the document. Then, for each term of the set, a score is generated that is a monotonically increasing function of (a) the strength of association with the document and of (b) the relative frequency of co-occurrence of that term and another term that occurs in the set; metadata for the fresh document are then selected as the subset of the terms in the set having the highest scores.
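The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy data, the choice of the best co-occurring term, and the scoring formula `strength * (1 + relative_frequency)` are all assumptions chosen only because they increase with both quantities (for positive strengths), as the abstract requires.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(existing_metadata):
    """Count how many existing documents carry each term and each pair of terms."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in existing_metadata:
        unique = sorted(set(terms))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


def relative_cooccurrence(term, other, term_counts, pair_counts):
    """Assumed definition: share of documents tagged `term` that are also tagged `other`."""
    if term_counts[term] == 0:
        return 0.0
    pair = tuple(sorted((term, other)))
    return pair_counts[pair] / term_counts[term]


def select_metadata(candidates, term_counts, pair_counts, k):
    """`candidates` maps each term assigned to the fresh document to a positive strength."""
    scores = {}
    for term, strength in candidates.items():
        # Use the strongest co-occurrence with any other candidate term (an assumption).
        best = max(
            (relative_cooccurrence(term, other, term_counts, pair_counts)
             for other in candidates if other != term),
            default=0.0,
        )
        # Hypothetical scoring function, increasing in both strength and co-occurrence.
        scores[term] = strength * (1.0 + best)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy store of existing metadata (one list of terms per existing document).
existing = [
    ["finance", "banking", "loans"],
    ["finance", "banking"],
    ["finance", "markets"],
    ["river", "fishing"],
]
term_counts, pair_counts = cooccurrence_stats(existing)

# Terms assigned to a fresh document, with assumed strengths of association.
candidates = {"finance": 0.9, "banking": 0.5, "river": 0.55}
selected, scores = select_metadata(candidates, term_counts, pair_counts, k=2)
print(selected)
```

In this toy data, "river" has a slightly higher strength than "banking", but "banking" co-occurs with "finance" in the existing metadata and "river" does not, so "banking" is selected ahead of it.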
18 Claims
1. A data handling device for organising and storing documents for subsequent retrieval, the documents having associated metadata terms, the device comprising:
means configured to provide access to a store of existing metadata;
means configured to analyse the existing metadata to generate statistical data as to co-occurrence of pairs of terms in the metadata of a single document;
means configured to analyse a fresh document to assign to the fresh document a set of terms and configured to determine a measure of a strength of association of each term with the document;
means configured to determine for each term of the set of terms a score that is a monotonically increasing function of (a) the strength of association with the document and of (b) a relative frequency of co-occurrence, in the existing metadata, of the term and another term that occurs in the set of terms;
means configured to select, as metadata for the fresh document, a subset of the terms in the set of terms having highest scores.
10. A method of organising and storing documents in a computer system for subsequent retrieval, the documents having associated metadata terms, the method comprising:
providing access to a store of existing metadata in the computer system;
analysing the existing metadata to generate statistical data as to co-occurrence of pairs of terms in the metadata of a single document;
analysing a fresh document to assign to the fresh document a set of terms and determine for each term of the set a measure of a strength of association of the term with the document;
determining for each term of the set a score that is a monotonically increasing function of (a) the strength of association with the document and of (b) a relative frequency of co-occurrence, in the existing metadata, of the term and another term that occurs in the set; and
selecting, as metadata for the fresh document, a subset of the terms in the set having highest scores.
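The scoring step of the method can be written out in symbols as one possible reading. The notation is not taken from the claims and is assumed here for illustration: T is the set of terms assigned to the fresh document, w(t) the strength of association of term t with that document, n(t) the number of existing documents whose metadata contain t, and n(t,u) the number whose metadata contain both t and u. The definition of relative frequency as n(t,u)/n(t) and the example function f are likewise assumptions.

```latex
% Assumed definition of the relative frequency of co-occurrence of t with u
r(t,u) = \frac{n(t,u)}{n(t)}, \qquad t, u \in T,\; u \neq t,\; n(t) > 0

% Score of term t: any f that is increasing in each argument satisfies the claim language
s(t) = f\bigl(w(t),\, r(t,u)\bigr), \qquad
w_1 \le w_2 \;\text{and}\; r_1 \le r_2 \;\Longrightarrow\; f(w_1, r_1) \le f(w_2, r_2)

% One hypothetical choice, increasing in both arguments when w > 0
f(w, r) = w\,(1 + r)
```

The selected metadata are then the terms of T with the largest values of s(t).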
Specification