METHOD, APPARATUS, AND COMPUTER STORAGE MEDIUM FOR AUTOMATICALLY ADDING TAGS TO DOCUMENT
First Claim
1. A method for automatically adding a tag to a document, comprising:
determining, by an apparatus comprising a processor, a plurality of candidate tag words corresponding to the document;
determining, by the apparatus, a corpus comprising a plurality of texts;
selecting, by the apparatus, commonly-used words from the corpus as characteristic words;
determining, by the apparatus, for each of the characteristic words and each of the candidate tag words, a probability for co-occurrence of the candidate tag word with the characteristic word;
abstracting, by the apparatus, characteristic words from the document;
calculating, by the apparatus, a weight for each of the abstracted characteristic words;
calculating, by the apparatus, in the corpus, a weighted probability for co-occurrence of each of the candidate tag words with all of the characteristic words abstracted from the document; and
selecting, by the apparatus, the candidate tag word with a high weighted co-occurrence probability as a tag word to be added to the document.
Abstract
A method and apparatus for automatically adding a tag to a document are provided. The method comprises: determining a plurality of candidate tag words corresponding to the document; determining a corpus comprising a plurality of texts; selecting commonly-used words from the corpus as characteristic words; determining, for each of the characteristic words and each of the candidate tag words, a probability for co-occurrence of the candidate tag word with the characteristic word; abstracting characteristic words from the document and calculating a weight for each of the abstracted characteristic words; calculating, in the corpus, a weighted probability for co-occurrence of each of the candidate tag words with all of the characteristic words abstracted from the document; and selecting the candidate tag word with a high weighted co-occurrence probability as a tag word to be added to the document.
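As a rough illustration of the steps summarized above, the following Python sketch estimates co-occurrence probabilities from a small corpus and scores candidate tag words for a document. It is a minimal sketch, not the patented implementation, and it rests on assumptions the text does not specify: whitespace tokenization; "commonly-used" taken to mean a document frequency of at least min_df; the co-occurrence probability estimated as the fraction of corpus texts containing the characteristic word that also contain the tag word; weights taken as normalized term frequency in the document; and "high" taken to mean the top_k scores. All function and parameter names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization.
    return text.lower().split()


def select_characteristic_words(corpus, min_df=2):
    # Assumption: "commonly-used" means the word appears in at least
    # min_df texts of the corpus. Sorted for a deterministic result.
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(tokenize(text)))
    return sorted(w for w, n in doc_freq.items() if n >= min_df)


def cooccurrence_probabilities(corpus, candidate_tags, characteristic_words):
    # Assumption: P(tag | word) = (texts containing both) / (texts containing word).
    texts = [set(tokenize(t)) for t in corpus]
    probs = {}
    for word in characteristic_words:
        with_word = [t for t in texts if word in t]
        for tag in candidate_tags:
            both = sum(1 for t in with_word if tag in t)
            probs[(tag, word)] = both / len(with_word) if with_word else 0.0
    return probs


def word_weights(document, characteristic_words):
    # Assumption: weight = term frequency of each characteristic word in the
    # document, normalized so that the weights sum to 1.
    vocab = set(characteristic_words)
    counts = Counter(w for w in tokenize(document) if w in vocab)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def weighted_scores(candidate_tags, weights, probs):
    # Weighted co-occurrence of each tag with all characteristic words
    # abstracted from the document: sum of weight * P(tag | word).
    return {
        tag: sum(wt * probs.get((tag, w), 0.0) for w, wt in weights.items())
        for tag in candidate_tags
    }


def select_tags(scores, top_k=1):
    # Assumption: "high" means the top_k scores; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

The functions are meant to be called in the order of the method steps: select characteristic words from the corpus, build the probability table for the candidate tag words, weight the document's characteristic words, score each candidate, and keep the highest-scoring ones.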
20 Claims
1. A method for automatically adding a tag to a document, comprising:
determining, by an apparatus comprising a processor, a plurality of candidate tag words corresponding to the document;
determining, by the apparatus, a corpus comprising a plurality of texts;
selecting, by the apparatus, commonly-used words from the corpus as characteristic words;
determining, by the apparatus, for each of the characteristic words and each of the candidate tag words, a probability for co-occurrence of the candidate tag word with the characteristic word;
abstracting, by the apparatus, characteristic words from the document;
calculating, by the apparatus, a weight for each of the abstracted characteristic words;
calculating, by the apparatus, in the corpus, a weighted probability for co-occurrence of each of the candidate tag words with all of the characteristic words abstracted from the document; and
selecting, by the apparatus, the candidate tag word with a high weighted co-occurrence probability as a tag word to be added to the document.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus for automatically adding a tag to a document, comprising:
a candidate tag word determining module comprising a processor, configured to determine a plurality of candidate tag words corresponding to the document;
a co-occurrence probability determining module comprising a processor, configured to determine a corpus comprising a plurality of texts, select commonly-used words from the corpus as characteristic words, and determine, for each of the characteristic words and each of the candidate tag words, a probability for co-occurrence of the candidate tag word with the characteristic word;
a weight calculating module comprising a processor, configured to abstract characteristic words from the document, and calculate a weight for each of the abstracted characteristic words;
a weighted co-occurrence probability calculating module comprising a processor, configured to calculate, in the corpus, a weighted probability for co-occurrence of each of the candidate tag words with all of the characteristic words abstracted from the document; and
a tag word adding module comprising a processor, configured to select the candidate tag word with a high weighted co-occurrence probability as a tag word to be added to the document.
Dependent claims: 9, 10, 11, 12, 13, 14, 16, 17, 18, 19
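One way to picture the module arrangement recited in claim 8 is as a small class that wires the modules together. The Python sketch below is only an illustration under stated assumptions: each module is modeled as a plain callable, the class name TaggingApparatus and all field names are hypothetical, "high" is taken to mean the top_k scores, and the claim itself does not prescribe any of these interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (tag word, characteristic word) -> co-occurrence probability
CoocTable = Dict[Tuple[str, str], float]


@dataclass
class TaggingApparatus:
    # Hypothetical stand-ins for the modules named in the claim.
    determine_candidates: Callable[[str], List[str]]         # candidate tag word determining module
    determine_cooccurrence: Callable[[List[str]], CoocTable]  # co-occurrence probability determining module
    calculate_weights: Callable[[str], Dict[str, float]]     # weight calculating module
    top_k: int = 1  # Assumption: "high" means the top_k scores.

    def weighted_cooccurrence(self, tags, weights, table):
        # Weighted co-occurrence probability calculating module:
        # sum of weight * probability over the document's characteristic words.
        return {
            tag: sum(wt * table.get((tag, word), 0.0) for word, wt in weights.items())
            for tag in tags
        }

    def add_tags(self, document):
        # Tag word adding module: rank candidates and keep the best ones.
        tags = self.determine_candidates(document)
        table = self.determine_cooccurrence(tags)
        weights = self.calculate_weights(document)
        scores = self.weighted_cooccurrence(tags, weights, table)
        return sorted(tags, key=lambda t: (-scores[t], t))[: self.top_k]
```

Modeling the modules as injected callables keeps the sketch agnostic about how candidates, probabilities, and weights are actually obtained, which the independent claim leaves open.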
15. A computer storage medium storing computer program codes for implementing a method for automatically adding a tag to a document, executable by a computer, wherein the computer program codes comprise:
instructions for determining a plurality of candidate tag words corresponding to the document;
instructions for determining a corpus comprising a plurality of texts;
instructions for selecting commonly-used words from the corpus as characteristic words;
instructions for determining, for each of the characteristic words and each of the candidate tag words, a probability for co-occurrence of the candidate tag word with the characteristic word;
instructions for abstracting characteristic words from the document;
instructions for calculating a weight for each of the abstracted characteristic words;
instructions for calculating, in the corpus, a weighted probability for co-occurrence of each of the candidate tag words with all of the characteristic words abstracted from the document; and
instructions for selecting the candidate tag word with a high weighted co-occurrence probability as a tag word to be added to the document.
Dependent claims: 20
Specification