Machine learning dialect identification

  • US 9,899,020 B2
  • Filed: 09/23/2016
  • Issued: 02/20/2018
  • Est. Priority Date: 02/13/2015
  • Status: Active Grant
First Claim

1. A method, comprising:

  • selecting, by a computing device, an initial training data set as a current training data set, wherein the initial training data set is selected by:

    receiving one or more initial content items; and

    establishing dialect parameters of one or more of the initial content items, the establishing comprising:

      identifying the one or more of the initial content items associated with one or more specified geographic locations identified as correlated to a dialect;

      establishing the one or more specified geographic locations as part of the dialect parameters;

  • generating, by the computing device and based on the initial training data set, a dialect classifier configured to detect language dialects of content items to be classified;

  • augmenting, by the computing device, the current training data set with additional training data by applying the dialect classifier to candidate content items;

  • updating the dialect classifier based on the augmented current training data set; and

  • applying the dialect classifier to transform an input in a source language to an output in a target language, an output in the source language, or an output in a dialect of the source language.
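The selecting, generating, augmenting, and updating steps of the claim resemble a self-training loop: content items tagged with dialect-correlated locations seed the training set, a classifier is trained on them, and the classifier's own confident predictions on unlabeled candidates are added back before retraining. The Python sketch below illustrates that loop only and is not the patented implementation. The claim does not specify a classifier type or a rule for accepting predicted labels, so this sketch assumes a multinomial naive Bayes model over whitespace tokens and a fixed confidence threshold. `ContentItem`, `DIALECT_LOCATIONS`, `select_initial_training_set`, `NaiveBayesDialectClassifier`, and `self_train` are hypothetical names, and the toy data is invented for illustration. The final translation step is not sketched.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentItem:
    """Hypothetical content item: text plus an optional geographic tag."""
    text: str
    location: Optional[str] = None


def select_initial_training_set(items, dialect_locations):
    """Label items whose location is listed as correlated with a dialect.

    Items with no matching location are left out of the seed set.
    """
    labeled = []
    for item in items:
        for dialect, locations in dialect_locations.items():
            if item.location in locations:
                labeled.append((item.text, dialect))
                break
    return labeled


class NaiveBayesDialectClassifier:
    """Multinomial naive Bayes over lowercase whitespace tokens, Laplace-smoothed.

    Assumed model choice for illustration; the claim does not name one.
    """

    def fit(self, labeled):
        self.doc_counts = Counter(label for _, label in labeled)
        self.word_counts = defaultdict(Counter)
        for text, label in labeled:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict_proba(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in text.lower().split():
                count = self.word_counts[label][w]  # 0 for unseen words
                score += math.log((count + 1) / (self.total_words[label] + v))
            log_scores[label] = score
        # Normalize log scores into probabilities (numerically stable softmax).
        top = max(log_scores.values())
        exps = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        z = sum(exps.values())
        return {lab: e / z for lab, e in exps.items()}


def self_train(seed, candidates, threshold=0.9, max_rounds=5):
    """Generate a classifier, augment the training set, and update the classifier.

    `threshold` is an assumed acceptance rule for classifier-assigned labels.
    """
    training = list(seed)
    pool = list(candidates)
    clf = NaiveBayesDialectClassifier().fit(training)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for text in pool:
            probs = clf.predict_proba(text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to add; stop augmenting
        training.extend(confident)  # augment the current training set
        pool = remaining
        clf = NaiveBayesDialectClassifier().fit(training)  # update the classifier
    return clf, training


# Toy example with hypothetical data.
DIALECT_LOCATIONS = {"en-GB": {"London", "Manchester"}, "en-US": {"Chicago", "Boston"}}
items = [
    ContentItem("the lorry is in the car park", "London"),
    ContentItem("i love the colour of the flat", "Manchester"),
    ContentItem("the truck is in the parking lot", "Chicago"),
    ContentItem("i love the color of the apartment", "Boston"),
    ContentItem("no location here"),
]
seed = select_initial_training_set(items, DIALECT_LOCATIONS)
candidates = ["the lorry parked by the flat", "the truck parked by the apartment", "hello world"]
clf, training = self_train(seed, candidates, threshold=0.75)
```

With `threshold=0.75`, the two candidates that contain dialect-specific words (about 0.8 confidence each under the seed classifier) are added in the first round. "hello world" scores 0.5 and is never added, so the loop stops after the second round. Raising the threshold admits fewer candidates and reduces the risk of the classifier reinforcing its own mistakes.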

  • 2 Assignments