Classifying languages for objects and entities

US 10,002,131 B2
Filed: 02/28/2017
Issued: 06/19/2018
Est. Priority Date: 06/11/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Technology for media item and user language classification is disclosed. Media item classification may use models for associating language identifiers or probability distributions for multiple languages with linguistic content. User language classification may define user language models for attributing to users indications of languages they speak, read, and/or write. The text classifications and user classifications may interact because the probability that given text is in a particular language may depend on a determined likelihood that the user who produced the text speaks that language; conversely, a user interacting with text in a particular language may increase the likelihood that they understand that language. Some embodiments use language-tagged social media content to train n-gram classifiers for use with other social media content.
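The abstract's last sentence mentions training n-gram classifiers on language-tagged social media content. The following is a minimal sketch of that idea, not the patented implementation: it assumes character 4-grams, add-one (Laplace) smoothed counts, and a naive-Bayes-style scorer with a uniform prior over languages. The function names (`char_ngrams`, `train_ngram_model`, `classify`) and the toy posts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return overlapping character n-grams, padding the text with a space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_ngram_model(tagged_posts, n=4):
    """Count n-grams per language from (text, language_tag) pairs."""
    counts = defaultdict(Counter)
    for text, lang in tagged_posts:
        counts[lang].update(char_ngrams(text, n))
    return counts


def classify(text, counts, n=4):
    """Return a probability distribution over the trained languages."""
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    log_scores = {}
    for lang, c in counts.items():
        total = sum(c.values())
        # Add-one smoothing; the extra +1 reserves mass for unseen n-grams.
        log_scores[lang] = sum(
            math.log((c[g] + 1) / (total + len(vocab) + 1))
            for g in char_ngrams(text, n)
        )
    # Softmax over log scores, subtracting the max for numerical stability.
    peak = max(log_scores.values())
    exp = {lang: math.exp(s - peak) for lang, s in log_scores.items()}
    z = sum(exp.values())
    return {lang: v / z for lang, v in exp.items()}


posts = [
    ("the weather is nice today", "en"),
    ("see you at the party tonight", "en"),
    ("el tiempo está muy bien hoy", "es"),
    ("nos vemos en la fiesta esta noche", "es"),
]
model = train_ngram_model(posts)
dist = classify("the party is tonight", model)
```

Smoothing keeps a single unseen n-gram from driving a language's probability to zero, which matters for short social media posts.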

20 Claims

1. A non-transitory computer readable storage medium storing instructions that, in response to being executed by a computing device, cause the computing device to perform operations for building a user language model that indicates one or more natural languages for a user associated with a user identifier, the operations comprising:
- operations for receiving an indication of a set of one or more characteristics associated with the user identifier, wherein at least some of the received characteristics correspond to a specified likelihood that the user is facile with a particular language;
- operations for combining the specified likelihoods to generate a baseline language prediction;
- operations for receiving indications of one or more user actions, wherein each user action corresponds to a specified expectation that the user is facile with a particular language; and
- operations for updating the baseline language prediction to form a current language prediction indicating one or more languages the user is facile with, the updating based on a modification of the baseline language prediction using the specified expectations;
- wherein, for a selected language of the one or more of the languages which the current language prediction indicates the user is facile with, the language model includes at least a first identifier indicating whether the user can read in the selected language and at least a second identifier, different from the first identifier, indicating whether the user can write in the selected language; and
- wherein the operations for updating of the baseline language prediction comprise operations for associating one or more user actions with a weight value based on an observed intensity or frequency of the user action.

2. The non-transitory computer readable storage medium of claim 1, wherein the one or more actions taken by the user comprise one or more of:
- interacting with media identified as corresponding to the particular language;
- producing a threshold number of media items identified as being in the particular language; and
- using a translation service to convert another media item to the particular language.

3. The non-transitory computer readable storage medium of claim 1, wherein the user characteristics comprise one or more of:
- a network location associated with the user corresponding to a particular locale;
- language settings of a profile associated with the user;
- language settings of the user's web browser;
- a determination, for a threshold number of friend accounts associated with the user, the friend accounts that are associated with users who are facile with the particular language; and
- language settings of the user's operating system.

4. The non-transitory computer readable storage medium of claim 1, wherein each specified likelihood is a probability associated with the user characteristic that the user uses a particular language; or wherein each specified expectation is a probability associated with an action taken by the user that the user uses a particular language.

5. The non-transitory computer readable storage medium of claim 1, further comprising updating a current prediction based on one or more of:
- a change or addition to the characteristics associated with the user;
- detecting a further user action;
- a change in a value of one or more of the determined likelihoods corresponding to the characteristic associated with the user; and
- a change in a value of one or more of the determined expectations corresponding to one or more of the user actions.

6. The non-transitory computer readable storage medium of claim 1, wherein:
- the one or more actions taken by the user comprise interacting with a media item determined to be in a particular language; and
- the media item was determined to be in a particular language based on one or more of:
  - a probability that the media item is from a particular source using context classifiers; and
  - a trained n-gram analysis of the media item using category specific trained classifiers.

7. The non-transitory computer readable storage medium of claim 1, wherein one or more of the user actions are actions taken by a user other than the user for which the language model is built.

8. The non-transitory computer readable storage medium of claim 1, wherein a baseline prediction or a current prediction comprises a probability distribution across multiple languages.

9. The non-transitory computer readable storage medium of claim 1, wherein operations for generating the baseline language prediction comprise operations for using a weight value associated with each of the characteristics to determine how much each characteristic affects the resulting baseline language prediction.
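Claims 1, 2, 8 and 9 together describe a pipeline: weighted characteristic likelihoods form a baseline distribution, weighted user actions update it, and each predicted language carries separate read and write indicators. The sketch below shows one hypothetical way to wire that up. The additive update, the saturating `action_weight` function, the `threshold` cutoff, the action kind names, and the rule that a predicted language implies reading while only produced content implies writing are all assumptions for illustration; the claims do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageAbility:
    # Separate indicators per language, mirroring claim 1's read/write identifiers.
    can_read: bool = False
    can_write: bool = False


@dataclass
class UserLanguageModel:
    user_id: str
    prediction: dict = field(default_factory=dict)  # language -> probability
    abilities: dict = field(default_factory=dict)   # language -> LanguageAbility


# Hypothetical action kinds treated as evidence of writing ability.
WRITE_ACTIONS = {"produce"}


def normalize(scores):
    """Scale non-negative scores into a probability distribution (cf. claim 8)."""
    total = sum(scores.values())
    return {lang: s / total for lang, s in scores.items()} if total else {}


def baseline_prediction(characteristics):
    """Combine (language, likelihood, weight) characteristics (cf. claim 9)."""
    scores = {}
    for lang, likelihood, weight in characteristics:
        scores[lang] = scores.get(lang, 0.0) + weight * likelihood
    return normalize(scores)


def action_weight(frequency, cap=10):
    """Assumed weighting: more frequent actions count more, saturating at `cap`."""
    return min(frequency, cap) / cap


def update_prediction(baseline, actions):
    """Add weighted expectations from (language, expectation, frequency, kind) actions."""
    scores = dict(baseline)
    for lang, expectation, frequency, _kind in actions:
        scores[lang] = scores.get(lang, 0.0) + action_weight(frequency) * expectation
    return normalize(scores)


def build_model(user_id, characteristics, actions, threshold=0.2):
    current = update_prediction(baseline_prediction(characteristics), actions)
    model = UserLanguageModel(user_id, current)
    for lang, p in current.items():
        if p < threshold:
            continue  # not predicted facile, so no read/write entry
        kinds = {kind for a_lang, _e, _f, kind in actions if a_lang == lang}
        model.abilities[lang] = LanguageAbility(
            can_read=True,  # assumption: predicted facility implies reading
            can_write=bool(kinds & WRITE_ACTIONS),
        )
    return model


# Browser set to English (weight 1.0), network locale suggests Spanish (weight 0.5),
# followed by frequent interaction with, and some posting of, Spanish content.
model = build_model(
    "user-123",
    characteristics=[("en", 0.9, 1.0), ("es", 0.6, 0.5)],
    actions=[("es", 0.8, 10, "interact"), ("es", 0.9, 5, "produce")],
)
# model.prediction is roughly {"en": 0.333, "es": 0.667}
```

Normalizing after each step keeps both the baseline and the current prediction as probability distributions across languages, which is the form claim 8 recites.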

10. A method for providing a language classification of a media item, the operations comprising:
- determining a context characteristic indicating one or more users who have interacted with the media item;
- wherein the context characteristic corresponds to a computed likelihood that the media item is in one or more languages based on determined language abilities of the users who have interacted with the media item;
- computing, based on the determined context characteristic and corresponding computed likelihood, a context prediction that the media item is in one or more first languages;
- applying a trained n-gram analysis of the media item to compute a trained prediction that the media item is in one or more second languages;
- wherein the trained n-gram analysis of the media item comprises, for one or more n-grams in the media item having a particular length, analyzing a specified probability distribution that the n-gram is in a specific language; and
- combining the context prediction with the trained prediction.

11. The method of claim 10, wherein the combining comprises defining a distribution across multiple languages that gauges whether the media item is in each of the multiple languages.

12. The method of claim 10, further comprising combining the context prediction with a prediction based on dictionary classifiers, wherein the dictionary classifiers select one or more words of the media item which indicate a particular probability that media items containing the selected words are in a certain language.

13. The method of claim 10, wherein the particular length for the trained n-gram analysis is four or five characters.

14. The method of claim 10, wherein the operations further comprise identifying the one or more users who have interacted with the media item as an author of the media item and wherein the corresponding computed likelihood is based on a language associated with the author indicating the author is facile with the one or more first languages.
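Claim 10 combines a context prediction, derived from the languages of users who interacted with an item (claim 14 singles out the author), with a prediction from a trained n-gram table whose n-grams are four or five characters long (claim 13). A minimal sketch, assuming the per-user language distributions and the per-n-gram table already exist, and that the two predictions are merged by linear interpolation with a hypothetical `context_weight`; the claims do not specify the combination rule, and all names and data here are illustrative.

```python
from collections import defaultdict


def normalize(scores):
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else {}


def context_prediction(interacting_users, user_languages):
    """Pool the language distributions of users who interacted with the item."""
    scores = defaultdict(float)
    for user in interacting_users:
        for lang, p in user_languages.get(user, {}).items():
            scores[lang] += p
    return normalize(scores)


def ngram_prediction(text, ngram_table, n=4):
    """Sum the per-n-gram language distributions found in a pre-trained table."""
    padded = f" {text.lower()} "
    scores = defaultdict(float)
    for i in range(len(padded) - n + 1):
        for lang, p in ngram_table.get(padded[i:i + n], {}).items():
            scores[lang] += p
    return normalize(scores)


def classify_media_item(text, interacting_users, user_languages, ngram_table,
                        context_weight=0.3, n=4):
    """Interpolate the context and trained predictions into one distribution."""
    context = context_prediction(interacting_users, user_languages)
    trained = ngram_prediction(text, ngram_table, n)
    langs = set(context) | set(trained)
    return normalize({
        lang: context_weight * context.get(lang, 0.0)
        + (1 - context_weight) * trained.get(lang, 0.0)
        for lang in langs
    })


user_languages = {"alice": {"pt": 0.9, "es": 0.1}, "bob": {"es": 1.0}}
ngram_table = {"obri": {"pt": 1.0}, "gado": {"pt": 0.5, "es": 0.5}}
result = classify_media_item("obrigado", ["alice", "bob"], user_languages, ngram_table)
# result is roughly {"pt": 0.66, "es": 0.34}
```

Here the context alone leans slightly toward Spanish (0.45 vs. 0.55), but the n-gram evidence pulls the combined distribution toward Portuguese, which illustrates why the claim combines both signals rather than relying on either one.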

15. A system for building a user language model that indicates one or more natural languages for a user associated with a user identifier, comprising:
- a processor and a memory;
- an interface to receive an indication of a set of one or more characteristics associated with the user identifier, wherein at least some of the received characteristics correspond to a specified likelihood that the user is facile with a particular language;
- a baseline language predictor to combine the specified likelihoods and to generate a baseline language prediction;
- the interface to receive an indication of one or more user actions, wherein each user action corresponds to a specified expectation that the user is facile with a particular language; and
- a user baseline predictor to update the baseline language prediction to form a current language prediction indicating one or more languages the user is facile with, the updating based on a modification of the baseline language prediction using the specified expectations;
- wherein, for a selected language of the one or more of the languages which the current language prediction indicates the user is facile with, the language model includes at least a first identifier indicating whether the user can read in the selected language and at least a second identifier, different from the first identifier, indicating whether the user can write in the selected language; and
- wherein the updating of the baseline language prediction comprises associating one or more user actions with a weight value based on an observed intensity or frequency of the user action.

16. The system of claim 15, wherein the one or more actions taken by the user comprise one or more of:
- interacting with media identified as corresponding to the particular language;
- producing a threshold number of media items identified as being in the particular language; and
- using a translation service to convert another media item to the particular language.

17. The system of claim 15, wherein the user characteristics comprise one or more of:
- a network location associated with the user corresponding to a particular locale;
- language settings of a profile associated with the user;
- language settings of the user's web browser;
- a determination, for a threshold number of friend accounts associated with the user, the friend accounts that are associated with users who are facile with the particular language; and
- language settings of the user's operating system.

18. The system of claim 15, wherein each specified likelihood is a probability associated with the user characteristic that the user uses a particular language; or wherein each specified expectation is a probability associated with an action taken by the user that the user uses a particular language.

19. The system of claim 15, comprising a user baseline predictor to update a current prediction based on one or more of:
- a change or addition to the characteristics associated with the user;
- detecting a further user action;
- a change in a value of one or more of the determined likelihoods corresponding to the characteristic associated with the user; and
- a change in a value of one or more of the determined expectations corresponding to one or more of the user actions.

20. The system of claim 15, wherein:
- the one or more actions taken by the user comprise interacting with a media item determined to be in a particular language; and
- the media item was determined to be in a particular language based on one or more of:
  - a probability that the media item is from a particular source using context classifiers; and
  - a trained n-gram analysis of the media item using category specific trained classifiers.
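Claim 17 (like claim 3) lists, among the user characteristics, a determination based on a threshold number of friend accounts whose users are facile with a language. One hypothetical reading is sketched below: a language becomes a characteristic only when at least `min_friends` friends are facile with it, and its likelihood is the share of friends who are. The function name, the threshold, and the share-based likelihood are assumptions, not details from the claims.

```python
from collections import Counter


def friend_language_characteristics(friend_languages, min_friends=5):
    """Turn friends' known languages into (language, likelihood) characteristics.

    friend_languages holds one set of languages per friend account.
    """
    if not friend_languages:
        return []
    counts = Counter(lang for langs in friend_languages for lang in langs)
    total = len(friend_languages)
    return sorted(
        (lang, count / total)
        for lang, count in counts.items()
        if count >= min_friends  # the assumed friend-count threshold
    )


friends = [{"en"}] * 6 + [{"en", "fr"}] * 3 + [{"de"}]
characteristics = friend_language_characteristics(friends, min_friends=3)
# characteristics == [("en", 0.9), ("fr", 0.3)]
```

Each resulting pair could then be treated as one of the likelihood-bearing characteristics that claim 15's baseline language predictor combines; German is dropped because only one friend is facile with it.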

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Herdagdelen, Amac; Green, Bradley Ray
Primary Examiner(s): SAINT CYR, LEONARD
Application Number: US15/445,978
Publication Number: US 20170270102A1
Time in Patent Office: 476 Days
Field of Search: 704 2- 10
CPC Class Codes:
- G06F 40/263 Language identification
- G06F 40/40 Processing or translation o...
- G06Q 10/00 Administration; Management
- G06Q 10/10 Office automation; Time man...
- H04L 67/02 based on web technology, e....
