Classifying text into hierarchical categories

  • US 8,145,636 B1
  • Filed: 03/13/2009
  • Issued: 03/27/2012
  • Est. Priority Date: 03/13/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    classifying a text into first subject matter categories;

    identifying one or more second subject matter categories in a plurality of second subject matter categories, each of the second subject matter categories being a hierarchical classification of a plurality of confirmed valid search results for queries, and wherein at least one query for each identified second subject matter category comprises a term in the text;

    filtering the identified second subject matter categories by excluding identified second subject matter categories whose ancestors are not among the first subject matter categories;

    for each second subject matter category in the filtered second subject matter categories:

    extracting one or more constituent terms from the queries of whose confirmed valid search results the second subject matter category is the hierarchical classification, where the constituent terms appear in the text;

    calculating an initial weight of the second subject matter category, the calculating comprising determining a sum of term frequency-inverse document frequency (tf-idf) values of each extracted constituent term in relation to a corpus of documents; and

    selecting the second subject matter category based on the initial weight and based on a threshold where the threshold specifies a degree of relatedness between a selected subject matter category and the text; and

    where the selected second subject matter categories are a sufficient basis for recommending to a user content associated with one or more of the selected second subject matter categories.
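
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Category` class, the `tf_idf` and `select_categories` functions, whitespace tokenization, and the smoothed idf formula are all assumptions made for illustration. The claim does not say whether one or all ancestors must be among the first categories; the sketch assumes at least one. The first-level classification is taken as an input rather than computed.

```python
import math
from dataclasses import dataclass


@dataclass
class Category:
    """Hypothetical second subject matter category in a hierarchy."""
    name: str
    ancestors: set[str]  # names of ancestor categories
    queries: list[str]   # queries whose confirmed valid results it classifies


def tf_idf(term, text_tokens, corpus):
    """tf-idf of `term` in the text, relative to `corpus` (a list of token lists)."""
    tf = text_tokens.count(term) / len(text_tokens)
    df = sum(1 for doc in corpus if term in doc)
    # Smoothed idf; the exact formula is an assumption, not taken from the claim.
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def select_categories(text, first_categories, categories, corpus, threshold):
    """Return {category name: initial weight} for the selected categories."""
    tokens = text.lower().split()
    vocab = set(tokens)
    selected = {}
    for cat in categories:
        # Identify: at least one query of the category contains a term of the text.
        query_terms = {t for q in cat.queries for t in q.lower().split()}
        constituent_terms = query_terms & vocab
        if not constituent_terms:
            continue
        # Filter: drop categories with no ancestor among the first categories
        # (assumed reading: one matching ancestor is enough).
        if not (cat.ancestors & first_categories):
            continue
        # Initial weight: sum of tf-idf values of the extracted constituent terms.
        weight = sum(tf_idf(t, tokens, corpus) for t in constituent_terms)
        # Select: keep the category if its weight meets the relatedness threshold.
        if weight >= threshold:
            selected[cat.name] = weight
    return selected
```

In this sketch, a text about Paris hotels whose first-level category is "Travel" would keep a "Travel/Hotels" category but drop an "Arts/Celebrities" category reached only through the query "paris hilton", because none of that category's ancestors is among the first categories.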
