Classifying languages for objects and entities
First Claim
1. A non-transitory computer readable storage medium storing instructions that, in response to being executed by a computing device, cause the computing device to perform operations for building a user language model that indicates one or more natural languages for a user associated with a user identifier, the operations comprising:
operations for receiving an indication of a set of one or more characteristics associated with the user identifier, wherein at least some of the received characteristics correspond to a specified likelihood that the user is facile with a particular language;
operations for combining the specified likelihoods to generate a baseline language prediction;
operations for receiving indications of one or more user actions, wherein each user action corresponds to a specified expectation that the user is facile with a particular language; and
operations for updating the baseline language prediction to form a current language prediction indicating one or more languages the user is facile with, the updating based on a modification of the baseline language prediction using the specified expectations;
wherein, for a selected language of the one or more of the languages which the current language prediction indicates the user is facile with, the language model includes at least a first identifier indicating whether the user can read in the selected language and at least a second identifier, different from the first identifier, indicating whether the user can write in the selected language; and
wherein the operations for updating of the baseline language prediction comprise operations for associating one or more user actions with a weight value based on an observed intensity or frequency of the user action.
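The steps recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation: the noisy-OR combination rule, the `threshold` and `saturation` parameters, the `"read"`/`"write"` action labels, and every function and class name below are assumptions chosen for illustration. Characteristics contribute specified likelihoods that are combined into a baseline; user actions then modify that baseline, each weighted by its observed frequency, and the resulting model carries separate read and write identifiers per language.

```python
from dataclasses import dataclass


def noisy_or(p, q):
    """Combine two independent likelihoods (assumed combination rule)."""
    return 1.0 - (1.0 - p) * (1.0 - q)


def baseline_prediction(characteristics):
    """characteristics: iterable of (language, specified likelihood)."""
    baseline = {}
    for lang, likelihood in characteristics:
        baseline[lang] = noisy_or(baseline.get(lang, 0.0), likelihood)
    return baseline


@dataclass
class LanguageEntry:
    read_score: float
    write_score: float
    can_read: bool   # first identifier: user can read the language
    can_write: bool  # second identifier: user can write the language


def current_prediction(baseline, actions, threshold=0.6, saturation=5):
    """actions: iterable of (language, skill, expectation, count).

    skill is "read" or "write" (hypothetical labels); count is how often
    the action was observed. The weight grows with frequency and is capped
    at 1.0 once count reaches `saturation` (an assumed cap).
    """
    read = dict(baseline)   # the baseline seeds both skills
    write = dict(baseline)
    for lang, skill, expectation, count in actions:
        weight = min(count, saturation) / saturation
        target = read if skill == "read" else write
        target[lang] = noisy_or(target.get(lang, 0.0), expectation * weight)

    model = {}
    for lang in set(read) | set(write):
        r, w = read.get(lang, 0.0), write.get(lang, 0.0)
        if max(r, w) >= threshold:  # keep only languages the user is facile with
            model[lang] = LanguageEntry(r, w, r >= threshold, w >= threshold)
    return model


# Hypothetical inputs: two characteristics point to Spanish, one to English.
baseline = baseline_prediction([("es", 0.5), ("es", 0.4), ("en", 0.3)])
model = current_prediction(
    baseline,
    [("en", "read", 0.8, 5),    # frequent reading of English content
     ("fr", "write", 0.9, 1)],  # a single French post, weighted low
)
```

In this sketch a frequently repeated action moves the prediction more than a one-off action, which is one way to read "a weight value based on an observed intensity or frequency of the user action".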
Abstract
Technology for media item and user language classification is disclosed. Media item classification may use models for associating language identifiers or probability distributions for multiple languages with linguistic content. User language classification may define user language models for attributing to users indications of languages they speak, read, and/or write. The text classifications and user classifications may interact because the probability that given text is in a particular language may depend on a determined likelihood the user who produced the text speaks that language, or conversely, a user interacting with text in a particular language may increase the likelihood they understand that language. Some embodiments use language-tagged social media content to train n-gram classifiers for use with other social media content.
20 Claims
1. A non-transitory computer readable storage medium storing instructions that, in response to being executed by a computing device, cause the computing device to perform operations for building a user language model that indicates one or more natural languages for a user associated with a user identifier, the operations comprising:
operations for receiving an indication of a set of one or more characteristics associated with the user identifier, wherein at least some of the received characteristics correspond to a specified likelihood that the user is facile with a particular language;
operations for combining the specified likelihoods to generate a baseline language prediction;
operations for receiving indications of one or more user actions, wherein each user action corresponds to a specified expectation that the user is facile with a particular language; and
operations for updating the baseline language prediction to form a current language prediction indicating one or more languages the user is facile with, the updating based on a modification of the baseline language prediction using the specified expectations;
wherein, for a selected language of the one or more of the languages which the current language prediction indicates the user is facile with, the language model includes at least a first identifier indicating whether the user can read in the selected language and at least a second identifier, different from the first identifier, indicating whether the user can write in the selected language; and
wherein the operations for updating of the baseline language prediction comprise operations for associating one or more user actions with a weight value based on an observed intensity or frequency of the user action.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for providing a language classification of a media item, the operations comprising:
determining a context characteristic indicating one or more users who have interacted with the media item;
wherein the context characteristic corresponds to a computed likelihood that the media item is in one or more languages based on determined language abilities of the users who have interacted with the media item;
computing, based on the determined context characteristic and corresponding computed likelihood, a context prediction that the media item is in one or more first languages;
applying a trained n-gram analysis of the media item to compute a trained prediction that the media item is in one or more second languages;
wherein the trained n-gram analysis of the media item comprises, for one or more n-grams in the media item having a particular length, analyzing a specified probability distribution that the n-gram is in a specific language; and
combining the context prediction with the trained prediction.
Dependent claims: 11, 12, 13, 14
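The method of claim 10 can be sketched in a few lines, under stated assumptions. The claim does not say how the per-n-gram distributions are aggregated or how the two predictions are combined, so the averaging of distributions, the linear blend, the `context_weight` parameter, the character-trigram choice, and all names and sample values below are hypothetical.

```python
from collections import defaultdict


def _normalize(totals):
    """Scale non-negative scores so they sum to 1 (empty dict if no mass)."""
    z = sum(totals.values())
    return {lang: v / z for lang, v in totals.items()} if z else {}


def context_prediction(user_models):
    """user_models: one {language: likelihood} dict per interacting user."""
    totals = defaultdict(float)
    for abilities in user_models:
        for lang, p in abilities.items():
            totals[lang] += p
    return _normalize(totals)


def ngram_prediction(text, ngram_table, n=3):
    """ngram_table: {n-gram: {language: probability}} from prior training."""
    totals = defaultdict(float)
    for i in range(len(text) - n + 1):
        dist = ngram_table.get(text[i:i + n])
        if dist:  # n-grams unseen in training contribute nothing
            for lang, p in dist.items():
                totals[lang] += p
    return _normalize(totals)


def combine(context, trained, context_weight=0.3):
    """Linear blend of the two predictions (assumed combination rule)."""
    langs = set(context) | set(trained)
    return {
        lang: context_weight * context.get(lang, 0.0)
        + (1.0 - context_weight) * trained.get(lang, 0.0)
        for lang in langs
    }


# Hypothetical trained table and interacting users.
table = {
    "the": {"en": 0.9, "de": 0.1},
    "der": {"de": 0.95, "en": 0.05},
}
users = [{"en": 0.9}, {"en": 0.8, "de": 0.4}]

context = context_prediction(users)
trained = ngram_prediction("the", table)
combined = combine(context, trained)
best = max(combined, key=combined.get)
```

Because both inputs to `combine` are normalized distributions, the blended result is also a distribution, and `best` is the most likely language for the media item under these assumptions.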
15. A system for building a user language model that indicates one or more natural languages for a user associated with a user identifier, comprising:
a processor and a memory;
an interface to receive an indication of a set of one or more characteristics associated with the user identifier, wherein at least some of the received characteristics correspond to a specified likelihood that the user is facile with a particular language;
a baseline language predictor to combine the specified likelihoods and to generate a baseline language prediction;
the interface to receive an indication of one or more user actions, wherein each user action corresponds to a specified expectation that the user is facile with a particular language; and
a user baseline predictor to update the baseline language prediction to form a current language prediction indicating one or more languages the user is facile with, the updating based on a modification of the baseline language prediction using the specified expectations;
wherein, for a selected language of the one or more of the languages which the current language prediction indicates the user is facile with, the language model includes at least a first identifier indicating whether the user can read in the selected language and at least a second identifier, different from the first identifier, indicating whether the user can write in the selected language; and
wherein the updating of the baseline language prediction comprises associating one or more user actions with a weight value based on an observed intensity or frequency of the user action.
Dependent claims: 16, 17, 18, 19, 20
Specification