
IDENTIFYING MULTIPLE LANGUAGES IN A CONTENT ITEM

  • US 20180189259A1
  • Filed: 02/02/2017
  • Published: 07/05/2018
  • Est. Priority Date: 12/30/2016
  • Status: Active Grant
First Claim

1. A method for improving language processing technologies by determining language segments of a content item, comprising:

    receiving a content item derived from a social network item, the content item comprising two or more words, wherein at least a first portion of the two or more words were composed in a first language and at least a second portion of the two or more words were composed in a second language different from the first language;

    tokenizing the content item into an ordered set of tokens comprising one or more tokens;

    identifying:

    the first language for a first set of the one or more tokens by a machine learning model, and the second language for a second set of the one or more tokens by the machine learning model, wherein the identifying is performed by maximizing a probability computed for the ordered set of tokens; and

    grouping consecutive ones of the one or more tokens into the language segments based on the identifying, wherein a first of the language segments corresponds to the first language and a second of the language segments corresponds to the second language.
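
The steps recited in the claim (tokenizing, identifying a language per token by maximizing a probability over the ordered token sequence, and grouping consecutive tokens into segments) can be illustrated with a minimal Python sketch. This is not the patented implementation. The claim does not specify the model, so the sketch assumes a toy word-list scorer in place of a trained machine learning model and uses a Viterbi-style search as one possible way to maximize a sequence probability. The names LEXICON, STAY, tokenize, emission, transition, identify, and segment are hypothetical and chosen only for illustration.

```python
import math
import re
from itertools import groupby

# Assumption: tiny word lists stand in for a trained model's per-token scores.
LEXICON = {
    "en": {"i", "love", "this", "the", "is", "good", "my", "friend"},
    "es": {"me", "gusta", "mucho", "la", "es", "bueno", "mi", "amigo"},
}
LANGS = sorted(LEXICON)
STAY = 0.8  # Assumption: probability that the next token keeps the same language.


def tokenize(text):
    """Split a content item into an ordered list of lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def emission(token, lang):
    """Toy log P(token | lang): higher when the token is in that language's list."""
    return math.log(0.9 if token in LEXICON[lang] else 0.1)


def transition(prev, cur):
    """Log probability of moving from language `prev` to language `cur`."""
    switch = (1.0 - STAY) / (len(LANGS) - 1)
    return math.log(STAY if prev == cur else switch)


def identify(tokens):
    """Return the per-token language labels that maximize the joint log probability."""
    if not tokens:
        return []
    prior = math.log(1.0 / len(LANGS))
    score = {lang: prior + emission(tokens[0], lang) for lang in LANGS}
    back = []  # back[i][cur] = best previous language for token i + 1
    for tok in tokens[1:]:
        new_score, pointers = {}, {}
        for cur in LANGS:
            best = max(LANGS, key=lambda p: score[p] + transition(p, cur))
            new_score[cur] = score[best] + transition(best, cur) + emission(tok, cur)
            pointers[cur] = best
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final language.
    labels = [max(LANGS, key=lambda lang: score[lang])]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels


def segment(text):
    """Group consecutive tokens that share a language into (language, tokens) segments."""
    tokens = tokenize(text)
    pairs = zip(identify(tokens), tokens)
    return [(lang, [tok for _, tok in grp])
            for lang, grp in groupby(pairs, key=lambda pair: pair[0])]


print(segment("I love this me gusta mucho"))
```

Because the transition term favors staying in the same language, a token that neither word list recognizes tends to inherit the language of its neighbors instead of starting a new segment, which is one reason to score the ordered sequence as a whole and not each token in isolation.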
