Method, device, and computer storage media for adding hyperlink to text
First Claim
1. A method for adding hyperlinks to hyperlink words in a text, comprising:
creating a hyperlink word list in advance, the hyperlink word list comprising a plurality of hyperlink words;
collecting a variety of texts, and generating a characteristic word list by performing word segmentation processing on each of the texts, the characteristic word list comprising a plurality of characteristic words;
determining an IDF (inverse document frequency) value for each characteristic word after generating the characteristic word list, wherein the IDF value is calculated by the following process:
obtaining a quotient by dividing the quantity of texts collected by the quantity of texts in which the characteristic word appears, and calculating a logarithm of the quotient;
for each of the characteristic words, computing a co-occurrence frequency between the characteristic word and each of the hyperlink words;
treating each text to which a hyperlink is to be added as a text X, and processing the text X by the following steps:
carrying out the word segmentation processing on the text X, and obtaining a segmentation result;
extracting, from the segmentation result, the hyperlink words that occur in the hyperlink word list and the characteristic words that occur in the characteristic word list;
computing a weight of each of the hyperlink words that occur in the hyperlink word list, and computing a weight of each of the characteristic words that occur in the characteristic word list, which comprises:
for each hyperlink word H, calculating the weight W_H of the hyperlink word H:
W_H = TF_H × IDF_H,
wherein TF_H represents the TF (term frequency) value of the hyperlink word H, that is, the number of times the hyperlink word H appears in the text X, and IDF_H represents the IDF value of the hyperlink word H;
for each characteristic word F, calculating the weight W_F of the characteristic word F:
W_F = TF_F × IDF_F,
wherein TF_F represents the TF value of the characteristic word F, and IDF_F represents the IDF value of the characteristic word F;
determining a final weight of each of the hyperlink words according to each co-occurrence frequency and the weights of the hyperlink words;
sorting the hyperlink words that occur in the hyperlink word list in descending order of their final weights, and obtaining the first K hyperlink words; and
adding hyperlinks to the K hyperlink words, wherein K is a positive integer.
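The IDF and weight formulas recited in the claim can be illustrated with a minimal Python sketch. Several details are assumptions rather than anything the claim specifies: the function names (segment, idf_table, tfidf_weights) are hypothetical, a simple regular-expression tokenizer stands in for the word segmentation step, the natural logarithm is used because the claim does not state a base, and the same IDF table is assumed to serve both hyperlink words and characteristic words.

```python
import math
import re
from collections import Counter


def segment(text):
    """Stand-in for word segmentation (assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(corpus):
    """IDF(w) = log(N / n_w): N texts collected, n_w texts in which w appears."""
    n_texts = len(corpus)
    doc_freq = Counter()
    for text in corpus:
        # A set ensures each text counts at most once per word.
        doc_freq.update(set(segment(text)))
    return {word: math.log(n_texts / n) for word, n in doc_freq.items()}


def tfidf_weights(text, word_list, idf):
    """W = TF * IDF for each listed word that occurs in the text (TF = raw count)."""
    counts = Counter(segment(text))
    return {
        word: counts[word] * idf[word]
        for word in word_list
        if counts[word] > 0 and word in idf
    }


# Illustrative data only, not taken from the patent.
corpus = [
    "python is a programming language",
    "java is a programming language",
    "python snakes are reptiles",
    "the stock market fell today",
]
idf = idf_table(corpus)
weights = tfidf_weights("python python programming", {"python", "java"}, idf)
```

Here "python" appears in 2 of the 4 collected texts and twice in the text being processed, so its weight is 2 × log(4/2); "java" does not occur in the text and receives no weight.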
Abstract
Methods and devices for adding hyperlinks to text are disclosed. A hyperlink word list and a characteristic word list are generated in advance, and the co-occurrence frequency of each characteristic word with each hyperlink word is determined. Each text X to which hyperlinks are to be added is processed by word segmentation; the hyperlink words that occur in the hyperlink word list and the characteristic words that occur in the characteristic word list are extracted from the segmentation result; a weight is determined for each extracted hyperlink word and each extracted characteristic word; and a final weight is obtained for each extracted hyperlink word from these weights and the co-occurrence frequencies between the extracted characteristic words and the extracted hyperlink words. The extracted hyperlink words are sorted in descending order of final weight, and hyperlinks are added to the first K hyperlink words, where K is a positive integer. The solution can improve the relevance of the added hyperlinks to the text and is easy to implement.
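The ranking stage summarized in the abstract might be sketched in Python as follows. The text reproduced here does not define how the co-occurrence frequency is computed or how it is combined with the weights, so both definitions below are assumptions made only for illustration: co-occurrence is taken as the fraction of texts containing a characteristic word F that also contain a hyperlink word H, and the final weight is taken as W_H plus the co-occurrence-weighted sum of the characteristic word weights. All function names and sample values are hypothetical.

```python
from collections import Counter


def cooccurrence(tokenized_corpus, characteristic_words, hyperlink_words):
    """Assumed definition: share of texts containing F that also contain H."""
    contains = Counter()
    both = Counter()
    for tokens in tokenized_corpus:
        present = set(tokens)
        for f in characteristic_words & present:
            contains[f] += 1
            for h in hyperlink_words & present:
                both[(f, h)] += 1
    return {pair: n / contains[pair[0]] for pair, n in both.items()}


def final_weights(hyper_w, char_w, cooc):
    """Assumed combination: W_H + sum over F of cooc(F, H) * W_F."""
    return {
        h: w_h + sum(cooc.get((f, h), 0.0) * w_f for f, w_f in char_w.items())
        for h, w_h in hyper_w.items()
    }


def top_k(weights, k):
    """Sort in descending order of final weight and keep the first k words."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Illustrative, already segmented texts and made-up weights.
tokenized_corpus = [
    ["apple", "iphone", "launch"],
    ["apple", "pie", "recipe"],
    ["iphone", "launch", "event"],
]
cooc = cooccurrence(tokenized_corpus, {"launch", "recipe"}, {"apple", "iphone"})
hyper_w = {"apple": 1.0, "iphone": 1.0}   # weights W_H of extracted hyperlink words
char_w = {"launch": 2.0}                  # weights W_F of extracted characteristic words
ranked = top_k(final_weights(hyper_w, char_w, cooc), k=1)
```

With these sample values the two hyperlink words start with equal weights, and the characteristic word "launch" tips the ranking toward "iphone" because it co-occurs with "iphone" more often than with "apple". This is the kind of context sensitivity the abstract attributes to the co-occurrence step.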
15 Citations
10 Claims
5. A device for adding hyperlinks to hyperlink words in a text, comprising:
a preprocessing module, configured to create a hyperlink word list in advance, collect a variety of texts, generate a characteristic word list by performing word segmentation processing on each of the texts, and, for each of the characteristic words, compute a co-occurrence frequency between the characteristic word and each hyperlink word, wherein the hyperlink word list comprises a plurality of hyperlink words and the characteristic word list comprises a plurality of characteristic words;
the preprocessing module being further configured to determine an inverse document frequency (IDF) value for each characteristic word, wherein the IDF value is calculated by obtaining a quotient by dividing the quantity of texts collected by the quantity of texts in which the characteristic word appears, and calculating a logarithm of the quotient;
an adding module, configured to treat each text to which a hyperlink is to be added as a text X, and process the text X by the following steps:
carrying out the word segmentation processing on the text X, and obtaining a segmentation result;
extracting, from the segmentation result, the hyperlink words that occur in the hyperlink word list and the characteristic words that occur in the characteristic word list;
computing a weight of each of the hyperlink words that occur in the hyperlink word list, and computing a weight of each of the characteristic words that occur in the characteristic word list;
determining a final weight of each of the hyperlink words according to each co-occurrence frequency and the weights of the hyperlink words;
sorting the hyperlink words that occur in the hyperlink word list in descending order of their final weights, and obtaining the first K hyperlink words; and
adding hyperlinks to the K hyperlink words, wherein K is a positive integer;
the adding module comprising a processing sub-unit module configured to:
extract, from the segmentation result, the hyperlink words that occur in the hyperlink word list and the characteristic words that occur in the characteristic word list;
for each hyperlink word H, calculate the weight W_H of the hyperlink word H:
W_H = TF_H × IDF_H,
wherein TF_H represents the TF (term frequency) value of the hyperlink word H, that is, the number of times the hyperlink word H appears in the text X, and IDF_H represents the IDF value of the hyperlink word H;
for each characteristic word F, calculate the weight W_F of the extracted characteristic word F:
W_F = TF_F × IDF_F,
wherein TF_F represents the TF value of the characteristic word F, and IDF_F represents the IDF value of the characteristic word F.
10. A non-transitory computer readable storage medium, storing one or more programs for execution by one or more processors of a computer having a display, the one or more programs comprising instructions for:
creating a hyperlink word list in advance, the hyperlink word list comprising a plurality of hyperlink words;
collecting a variety of texts, and generating a characteristic word list by performing word segmentation processing on each of the texts, the characteristic word list comprising a plurality of characteristic words;
determining an IDF (inverse document frequency) value for each characteristic word after generating the characteristic word list, wherein the IDF value is calculated by the following process:
obtaining a quotient by dividing the quantity of texts collected by the quantity of texts in which the characteristic word appears, and calculating a logarithm of the quotient;
for each of the characteristic words, computing a co-occurrence frequency between the characteristic word and each of the hyperlink words;
treating each text to which a hyperlink is to be added as a text X, and processing the text X by the following steps:
carrying out the word segmentation processing on the text X, and obtaining a segmentation result;
extracting, from the segmentation result, the hyperlink words that occur in the hyperlink word list and the characteristic words that occur in the characteristic word list;
computing a weight of each of the hyperlink words that occur in the hyperlink word list, and computing a weight of each of the characteristic words that occur in the characteristic word list, which comprises:
for each hyperlink word H, calculating the weight W_H of the hyperlink word H:
W_H = TF_H × IDF_H,
wherein TF_H represents the TF (term frequency) value of the hyperlink word H, that is, the number of times the hyperlink word H appears in the text X, and IDF_H represents the IDF value of the hyperlink word H;
for each characteristic word F, calculating the weight W_F of the characteristic word F:
W_F = TF_F × IDF_F,
wherein TF_F represents the TF value of the characteristic word F, and IDF_F represents the IDF value of the characteristic word F;
determining a final weight of each of the hyperlink words according to each co-occurrence frequency and the weights of the hyperlink words;
sorting the hyperlink words that occur in the hyperlink word list in descending order of their final weights, and obtaining the first K hyperlink words; and
adding hyperlinks to the K hyperlink words, wherein K is a positive integer.
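The final step of the claims, adding hyperlinks to the K selected words, could look like the following Python sketch. The claims do not say what form the hyperlink takes or where its target comes from, so the HTML anchor output, the add_hyperlinks function, and the url_for mapping are all hypothetical. The sketch links every occurrence of each selected word and relies on the regular-expression word boundary \b, which suits space-delimited text; a text without word delimiters would instead be rebuilt from its segmentation result.

```python
import html
import re


def add_hyperlinks(text, selected_words, url_for):
    """Wrap each selected word in an HTML anchor; all other text is HTML-escaped."""
    if not selected_words:
        return html.escape(text)
    # Longest first, so a longer word wins over a shorter word it starts with.
    alternatives = sorted(selected_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in alternatives) + r")\b"
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        word = match.group(1)
        pieces.append(html.escape(text[last:match.start()]))
        href = html.escape(url_for[word], quote=True)
        pieces.append(f'<a href="{href}">{html.escape(word)}</a>')
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


# Hypothetical target URL for the single selected word (K = 1).
linked = add_hyperlinks(
    "The iphone launch beat apple forecasts",
    ["iphone"],
    {"iphone": "https://example.com/iphone"},
)
```

Only the K words chosen by the ranking are passed in, so "apple" stays unlinked in this example even though it could also appear in the hyperlink word list.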
Specification