
Method, device, and computer storage media for adding hyperlink to text

  • US 9,483,447 B2
  • Filed: 02/08/2013
  • Issued: 11/01/2016
  • Est. Priority Date: 03/29/2012
  • Status: Active Grant
First Claim

1. A method for adding hyperlinks to hyperlink words in a text, comprising:

  • creating a hyperlink word list in advance, the hyperlink word list comprising a plurality of hyperlink words;

    collecting a variety of texts, and generating a characteristic word list by implementing word segmentation processing for each of the texts, the characteristic word list comprising a plurality of characteristic words;

    determining an IDF (inverse document frequency) value for each characteristic word after generating the characteristic word list by implementing word segmentation processing for each of the texts, wherein the IDF value is calculated by the following process;

    obtaining a quotient by dividing the quantity of texts collected by the quantity of texts in which the characteristic word appears, and calculating a logarithm of the quotient;

    for each of the characteristic words, computing a co-occurrence frequency between each of the characteristic words and each of the hyperlink words;

    treating each text to which hyperlinks are to be added as a text X, and processing the text X by the following steps;

    carrying out the word segmentation processing on the text X, and obtaining a segmentation result;

    extracting, from the segmentation result, the hyperlink words that occur in the hyperlink word list and the characteristic words that occur in the characteristic word list;

    computing a weight of each of the hyperlink words that occur in the hyperlink word list, and computing a weight of each of the characteristic words that occur in the characteristic word list, which comprises;

    for each hyperlink word H, calculating the weight W_H of the hyperlink word H;

    W_H = TF_H × IDF_H;

    wherein TF_H represents the TF (term frequency) value of the hyperlink word H, that is, the number of times the hyperlink word H appears in the text X, and IDF_H represents the IDF value of the hyperlink word H;

    for each characteristic word F, calculating the weight W_F of the characteristic word F;

    W_F = TF_F × IDF_F;

    wherein TF_F represents the TF value of the characteristic word F, and IDF_F represents the IDF value of the characteristic word F;

    determining a final weight of each of the hyperlink words according to each co-occurrence frequency and the weights of the hyperlink words;

    sorting the hyperlink words that occur in the hyperlink word list in descending order of their final weights, and obtaining the first K hyperlink words; and

    adding hyperlinks to the K hyperlink words, wherein K is a positive integer.
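
The steps of the claim can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation. The claim does not say how the co-occurrence frequency is defined or how it is combined with the TF×IDF weights into a final weight, so the definition used in `cooccurrence` (the share of texts containing F that also contain H) and the additive combination in `top_k_hyperlink_words` are assumptions made for illustration. All function names are hypothetical, and texts are assumed to be already segmented into word lists.

```python
import math
from collections import Counter


def idf_values(corpus):
    """IDF per word: log(number of texts / number of texts containing the word)."""
    n_texts = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each word at most once per text
    return {word: math.log(n_texts / df) for word, df in doc_freq.items()}


def cooccurrence(corpus, characteristic_words, hyperlink_words):
    """ASSUMED definition: share of texts containing F that also contain H."""
    token_sets = [set(tokens) for tokens in corpus]
    cooc = {}
    for f in characteristic_words:
        texts_with_f = [s for s in token_sets if f in s]
        if not texts_with_f:
            continue
        for h in hyperlink_words:
            cooc[(f, h)] = sum(h in s for s in texts_with_f) / len(texts_with_f)
    return cooc


def top_k_hyperlink_words(tokens, hyperlink_words, characteristic_words, idf, cooc, k):
    """Rank the hyperlink words found in text X and return the first K."""
    tf = Counter(tokens)  # term frequency within text X
    found_h = [w for w in tf if w in hyperlink_words]
    found_f = [w for w in tf if w in characteristic_words]

    def weight(word):
        # TF * IDF weight, as in the claim; unseen words get IDF 0 here.
        return tf[word] * idf.get(word, 0.0)

    final = {}
    for h in found_h:
        # ASSUMED combination: own weight plus co-occurrence-weighted
        # support from the characteristic words present in text X.
        support = sum(cooc.get((f, h), 0.0) * weight(f) for f in found_f)
        final[h] = weight(h) + support

    ranked = sorted(final, key=final.get, reverse=True)  # descending order
    return ranked[:k]


# Toy data, invented for illustration only.
corpus = [
    ["python", "code", "snake"],
    ["python", "code"],
    ["snake", "zoo"],
    ["zoo", "animal"],
]
hyperlink_words = {"python", "zoo"}
characteristic_words = {"code", "snake", "animal"}

idf = idf_values(corpus)
cooc = cooccurrence(corpus, characteristic_words, hyperlink_words)
text_x = ["python", "code", "code", "zoo"]
print(top_k_hyperlink_words(text_x, hyperlink_words, characteristic_words, idf, cooc, k=1))
```

In this toy run, "code" appears twice in text X and always co-occurs with "python" in the collected texts, so "python" is ranked above "zoo" even though both have the same TF×IDF weight on their own.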
