Identifying multiple languages in a content item

  • US 10,180,935 B2
  • Filed: 02/02/2017
  • Issued: 01/15/2019
  • Est. Priority Date: 12/30/2016
  • Status: Active Grant
First Claim

1. A method for improving language processing technologies by determining language segments of a content item, comprising:

  • receiving a content item derived from a social network item, the content item comprising two or more words, wherein at least a first portion of the two or more words were composed in a first language and at least a second portion of the two or more words were composed in a second language different from the first language;

    tokenizing the content item into an ordered set of tokens comprising one or more tokens;

    identifying:

    the first language for a first set of the one or more tokens by a machine learning model, and

    the second language for a second set of the one or more tokens by the machine learning model,

    wherein the identifying is performed by maximizing a probability computed for the ordered set of tokens based on a combination of transition probabilities, a respective transition probability corresponding to each token after the first token in the ordered set of tokens, wherein each respective transition probability indicates a likelihood of switching from a language of a previous token to a language of a current token in the ordered set of tokens; and

    grouping consecutive ones of the one or more tokens into the language segments based on the identifying, wherein a first of the language segment corresponds to the first language and a second language segment corresponds to the second language.
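
The steps recited in the claim (tokenizing, identifying a language per token by maximizing a probability that combines transition probabilities, and grouping consecutive tokens into segments) can be illustrated with a small sketch. This is not the patented implementation. It assumes whitespace tokenization, a two-language toy word list (`LEXICON`) standing in for the machine learning model, a fixed switch probability (`p_switch`), and Viterbi-style dynamic programming as the maximization procedure. All names in it are hypothetical.

```python
import math
from itertools import groupby

LANGS = ("en", "es")

# Hypothetical word lists standing in for the machine learning model's scores.
LEXICON = {
    "en": {"i", "love", "this", "so", "much"},
    "es": {"pero", "no", "tengo", "tiempo"},
}


def emission_logp(token, lang):
    """Toy log-score of a token under a language: high if in that word list."""
    return math.log(0.9 if token.lower() in LEXICON[lang] else 0.1)


def transition_logp(prev_lang, lang, p_switch=0.2):
    """Log-probability of going from the previous token's language to this one."""
    if prev_lang == lang:
        return math.log(1.0 - p_switch)
    # Assumed: the switch mass is split evenly over the other languages.
    return math.log(p_switch / (len(LANGS) - 1))


def identify_languages(tokens):
    """Return the per-token language sequence with the highest total log-score."""
    if not tokens:
        return []
    # best[lang] = (score of the best path ending in lang, that path)
    prior = math.log(1.0 / len(LANGS))
    best = {l: (prior + emission_logp(tokens[0], l), [l]) for l in LANGS}
    for tok in tokens[1:]:
        nxt = {}
        for l in LANGS:
            # One transition term is added for every token after the first.
            score, path = max(
                (best[p][0] + transition_logp(p, l), best[p][1]) for p in LANGS
            )
            nxt[l] = (score + emission_logp(tok, l), path + [l])
        best = nxt
    return max(best.values())[1]


def group_segments(tokens, langs):
    """Group consecutive tokens that share a language into segments."""
    segments = []
    for lang, run in groupby(zip(tokens, langs), key=lambda pair: pair[1]):
        segments.append((lang, [tok for tok, _ in run]))
    return segments


tokens = "I love this pero no tengo tiempo".split()  # assumed tokenizer
langs = identify_languages(tokens)
segments = group_segments(tokens, langs)
print(segments)
```

The transition term penalizes every language switch, so an isolated ambiguous token tends to take the language of its neighbors rather than forming a one-token segment of its own.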
