
Systematic mass normalization of international titles

  • US 10,678,827 B2
  • Filed: 02/26/2016
  • Issued: 06/09/2020
  • Est. Priority Date: 02/26/2016
  • Status: Active Grant
First Claim

1. A system for generating a database of labeled foreign titles comprising:

  • an interface to receive a title in a second language; and

    a processor to:

    store n-grams each with associated labels in a first language in a first database, wherein the first language and the second language are different;

    sanitize the title in the second language into a sanitized title in the second language;

    translate the sanitized title in the second language into a translated title in the first language;

    break the translated title in the first language into intermediary n-grams in the first language;

    determine parent n-grams of an intermediary n-gram of the intermediary n-grams, wherein the intermediary n-gram is a sub-string of the parent n-grams;

    retrieve a set of labels associated with the parent n-grams using the first database, wherein the set of labels are in the first language; and

    in response to determining that a matching threshold of a label of the set of labels is met, assign the label to the intermediary n-gram, wherein the matching threshold is a frequency the label occurs in the set of labels associated with the parent n-grams, wherein assigning the label comprises storing the label in the first language in a second database, and wherein the second database stores the title in the second language and the intermediary n-gram in the first language.
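The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is a hypothetical stand-in chosen for illustration: the in-memory dictionary `FIRST_DB` plays the role of the first database, `TOY_TRANSLATIONS` replaces a real translation service, the sanitizing rules (lowercasing, stripping punctuation and digits) are assumed, n-grams are taken to be word n-grams, and the matching threshold is modeled as a minimum count `min_count` of a label among the parent n-grams' labels.

```python
import re
from collections import Counter

# Hypothetical first database: first-language (English) n-grams with labels.
FIRST_DB = {
    "senior software engineer": ["engineering"],
    "software engineer intern": ["engineering"],
    "software sales manager": ["sales"],
}

# Hypothetical stand-in for a real translation service.
TOY_TRANSLATIONS = {"ingeniero de software": "software engineer"}


def sanitize(title):
    """Assumed sanitizing rules: lowercase, drop punctuation/digits, trim."""
    cleaned = re.sub(r"[^\w\s]|\d", " ", title.lower())
    return " ".join(cleaned.split())


def translate(sanitized_title):
    """Look up a translation; fall back to the input if none is known."""
    return TOY_TRANSLATIONS.get(sanitized_title, sanitized_title)


def word_ngrams(text, max_n=3):
    """Break text into word n-grams for n = 1..max_n."""
    words = text.split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def parent_ngrams(ngram, first_db):
    """Stored n-grams that contain `ngram` on word boundaries."""
    return [p for p in first_db if p != ngram and f" {ngram} " in f" {p} "]


def label_title(title, first_db=FIRST_DB, min_count=2):
    """Return second-database records for one foreign-language title."""
    translated = translate(sanitize(title))
    records = []
    for ngram in word_ngrams(translated):
        parents = parent_ngrams(ngram, first_db)
        # Count how often each label occurs among the parents' labels.
        counts = Counter(lbl for p in parents for lbl in first_db[p])
        for label, count in counts.items():
            if count >= min_count:  # matching threshold met
                records.append(
                    {"title": title, "ngram": ngram, "label": label}
                )
    return records


if __name__ == "__main__":
    for record in label_title("  Ingeniero de Software!! "):
        print(record)
```

In this sketch each returned record keeps the original second-language title alongside the first-language intermediary n-gram and its assigned label, mirroring what the claim says the second database stores. A label that appears only once among the parents (here, "sales") does not meet the assumed threshold and is not assigned.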
