IDENTIFYING CULTURAL BACKGROUND FROM TEXT

  • US 20130282362A1
  • Filed: 03/28/2013
  • Published: 10/24/2013
  • Est. Priority Date: 03/28/2012
  • Status: Active Grant
First Claim

1. A method for determining a diaculture of text, comprising:

  • tokenizing words of the text with one or more processors according to a rule set to generate tokenized text, the rule set defining:

    a first set of grammatical types of words, which are words that are replaced, in the tokenizing, with tokens that respectively indicate a grammatical type of a respective word, and

    a second set of grammatical types of words, which are words that are passed, in the tokenizing, as tokens without changing;

  • constructing grams from the tokenized text, each gram including one or more of consecutive tokens from the tokenized text; and

  • comparing the grams to a training data set that corresponds to a known diaculture to obtain a comparison result that indicates how well the text matches the training data set for the known diaculture.
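
The three claimed steps (rule-based tokenizing, gram construction, comparison against training data) can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions made only for illustration: the toy `LEXICON`, the membership of `REPLACED_TYPES` and `PASSED_TYPES`, the `<UNK>` token for words outside the lexicon, and the overlap-fraction score in `match_score`. The claim itself does not specify a tagger, a gram length, or a scoring metric.

```python
import re

# Hypothetical toy lexicon mapping words to grammatical types (an assumption;
# a real system would more likely use a part-of-speech tagger).
LEXICON = {
    "dude": "NOUN", "wave": "NOUN", "surf": "VERB", "gnarly": "ADJ",
    "that": "DET", "a": "DET", "the": "DET",
    "was": "AUX", "i": "PRON", "we": "PRON",
}

# Rule set: the first set of types is replaced by a type token,
# the second set is passed through unchanged. Both sets are illustrative.
REPLACED_TYPES = {"NOUN", "VERB", "ADJ"}
PASSED_TYPES = {"DET", "AUX", "PRON"}


def tokenize(text):
    """Tokenize text according to the rule set above."""
    tokens = []
    for word in re.findall(r"[a-z']+", text.lower()):
        gtype = LEXICON.get(word)
        if gtype in REPLACED_TYPES:
            tokens.append(f"<{gtype}>")   # replace the word with its type
        elif gtype in PASSED_TYPES:
            tokens.append(word)           # pass the word through unchanged
        else:
            tokens.append("<UNK>")        # assumption: unknown words are masked
    return tokens


def build_grams(tokens, max_n=2):
    """Build all grams of 1..max_n consecutive tokens."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def match_score(grams, training_grams):
    """Fraction of the text's grams found in the training set (assumed metric)."""
    if not grams:
        return 0.0
    hits = sum(1 for gram in grams if gram in training_grams)
    return hits / len(grams)


# Training data for one known diaculture, built from a made-up sample sentence.
training_grams = set(build_grams(tokenize("The wave was gnarly, dude")))

text_grams = build_grams(tokenize("That was a gnarly wave"))
score = match_score(text_grams, training_grams)
```

In this sketch a higher `score` means more of the text's grams also occur in the training data for the known diaculture. Replacing content words with type tokens while passing function words through keeps the comparison focused on phrasing patterns and away from topic vocabulary, which is one plausible reading of why the rule set has two sets.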
