Named entity recognition on chat data

US 10,765,956 B2
Filed: 01/07/2016
Issued: 09/08/2020
Est. Priority Date: 01/07/2016
Status: Active Grant

6 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a plurality of word strings in a first language, each received word string comprising a plurality of words, identifying one or more named entities in each received word string using a statistical classifier that was trained using training data comprising a plurality of features, wherein one of the features is a word shape feature that comprises a respective token for each letter of a respective word wherein each token signifies a case of the letter or whether the letter is a digit, and translating the received word strings from the first language to a second language including preserving the respective identified named entities in each received word string during translation.
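
The word shape feature in the abstract and claim 1 maps every letter of a word to a token for upper case, lower case, or digit. Below is a minimal Python sketch. It assumes the token alphabet `X`/`x`/`d`, which is common in NER feature sets but is not specified by the patent. It also assumes that characters outside those three classes, such as punctuation, pass through unchanged, because the claims do not say how such characters are treated. `word_shape` is a hypothetical helper name.

```python
def word_shape(word: str) -> str:
    """Map each character of `word` to a shape token.

    Assumed (hypothetical) token alphabet, not taken from the patent:
      'X' = upper-case letter, 'x' = lower-case letter, 'd' = digit.
    Characters in none of these classes (e.g. punctuation) are kept as-is,
    which is also an assumption; the claims only cover the three classes.
    """
    tokens = []
    for ch in word:
        if ch.isupper():
            tokens.append("X")
        elif ch.islower():
            tokens.append("x")
        elif ch.isdigit():
            tokens.append("d")
        else:
            tokens.append(ch)
    return "".join(tokens)
```

Under these assumptions, `word_shape("R2D2")` yields `XdXd` and `word_shape("iPhone")` yields `xXxxxx`. This exposes the irregular capitalization that is common in chat to the classifier as a feature.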

319 Citations

30 Claims

1. A method comprising performing by one or more computers:
- training a statistical classifier to identify named entities using training data comprising a plurality of features, wherein one of the features is a word shape feature that comprises a respective token for each letter of a respective word, the respective token indicating that each letter of the respective word is one of an upper case letter, a lower case letter, and a digit;
  
  receiving a plurality of word strings in a first language, each received word string comprising a plurality of words;
  
  identifying at least one named entity in each received word string using the trained statistical classifier; and
  
  translating the received word strings from the first language to a second language, wherein translating comprises preserving the identified at least one named entity in the first language.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 22)
- - 2. The method of claim 1 wherein translating the received word strings from the first language to a second language comprises:
    - for a particular received word string:
      
      selecting a respective template in the first language, the respective template comprising one or more placeholders for the identified named entities and having a corresponding translated template in the second language that preserves the placeholders; and
      
      translating the particular received word string by substituting its identified named entities in the placeholders in the corresponding translated template in the second language.
  - 3. The method of claim 2 wherein the respective template in the first language further comprises words in the first language that are translated, according to a dictionary, to words in the second language in the corresponding translated template.
  - 4. The method of claim 3 wherein the dictionary comprises:
    - words in the first language; and
      
      one or more words in the second language corresponding to each of the words in the first language.
  - 5. The method of claim 2 wherein the respective template in the first language further comprises a particular word whose count in the particular received word string exceeds a specified threshold.
  - 6. The method of claim 1, wherein a particular named entity comprises one or more proper nouns.
  - 7. The method of claim 1 wherein the plurality of features further comprises one or more of the following features:
    - a prefix, a suffix, a part-of-speech tag, and a word type.
  - 8. The method of claim 7, wherein the word type feature of a particular word describes whether the word shape feature of the particular word comprises tokens of a same type.
  - 9. The method of claim 1 wherein a particular feature is identified with an n-gram within an m-length window, wherein m is greater than n.
  - 10. The method of claim 1 wherein the statistical classifier is specific to the first language.
  - 11. The method of claim 1 wherein the statistical classifier comprises a conditional random field classifier that is configured to identify at least one named entity in a word string.
  - 22. The system of claim 1 wherein the statistical classifier comprises a conditional random field classifier that is configured to identify at least one named entity in a word string.
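
Claims 2 to 4 describe template-based translation. Named entities in the received string are replaced by placeholders. The resulting first-language template is paired with a second-language template that keeps the same placeholders, and the entities are then substituted back untranslated. The Python sketch below is one possible way to realize that flow, not the patented implementation. Several details are hypothetical: the entity spans, the English-to-Spanish `DICTIONARY` and `TEMPLATE_PAIRS` tables, the `{0}`-style placeholder syntax, and the rule of taking the first dictionary candidate. In the claimed method, the spans would come from the trained classifier.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical bilingual dictionary (claim 4): each first-language word maps
# to one or more second-language words. Entries are illustrative only.
DICTIONARY: Dict[str, List[str]] = {
    "hello": ["hola"],
    "welcome": ["bienvenido", "bienvenida"],
    "to": ["a"],
}

# Hypothetical store of curated template pairs (claim 2). Placeholders
# {0}, {1}, ... stand for named entities and appear in both templates.
TEMPLATE_PAIRS: Dict[str, str] = {
    "see you in {0}": "nos vemos en {0}",
}

PLACEHOLDER = re.compile(r"\{(\d+)\}")


def make_template(tokens: List[str], entity_spans: List[Tuple[int, int]]) -> Tuple[str, List[str]]:
    """Replace each non-overlapping entity span [start, end) with a numbered placeholder."""
    out: List[str] = []
    entities: List[str] = []
    i = 0
    for start, end in sorted(entity_spans):
        out.extend(tokens[i:start])
        out.append("{%d}" % len(entities))
        entities.append(" ".join(tokens[start:end]))
        i = end
    out.extend(tokens[i:])
    return " ".join(out), entities


def translate_template(template: str) -> str:
    """Return the second-language template, preserving placeholders."""
    if template in TEMPLATE_PAIRS:
        return TEMPLATE_PAIRS[template]
    # Fallback in the spirit of claim 3: translate template words via the
    # dictionary (first candidate, an assumption); unknown words stay as-is.
    words = []
    for w in template.split():
        words.append(w if PLACEHOLDER.fullmatch(w) else DICTIONARY.get(w.lower(), [w])[0])
    return " ".join(words)


def translate(tokens: List[str], entity_spans: List[Tuple[int, int]]) -> str:
    template, entities = make_template(tokens, entity_spans)
    target = translate_template(template)
    # Substitute entities back unchanged, so they stay in the first language.
    return PLACEHOLDER.sub(lambda m: entities[int(m.group(1))], target)
```

With the sample data, `translate("hello John welcome to Paris".split(), [(1, 2), (4, 5)])` returns `hola John bienvenido a Paris`. The names survive in the first language, and the surrounding words are translated.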

12. A system comprising one or more computers programmed to perform operations comprising:
- training a statistical classifier to identify named entities using training data comprising a plurality of features, wherein one of the features is a word shape feature that comprises a respective token for each letter of a respective word, the respective token indicating that each letter of the respective word is one of an upper case letter, a lower case letter, and a digit;
  
  receiving a plurality of word strings in a first language, each received word string comprising a plurality of words;
  
  identifying at least one named entity in each received word string using the trained statistical classifier; and
  
  translating the received word strings from the first language to a second language, wherein translating comprises preserving the identified at least one named entity in the first language.
- Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
- - 13. The system of claim 12 wherein translating the received word strings from the first language to a second language comprises:
    - for a particular received word string:
      
      selecting a respective template in the first language, the respective template comprising one or more placeholders for the identified named entities and having a corresponding translated template in the second language that preserves the placeholders; and
      
      translating the particular received word string by substituting its identified named entities in the placeholders in the corresponding translated template in the second language.
  - 14. The system of claim 13 wherein the respective template in the first language further comprises words in the first language that are translated, according to a dictionary, to words in the second language in the corresponding translated template.
  - 15. The system of claim 14 wherein the dictionary comprises:
    - words in the first language; and
      
      one or more words in the second language corresponding to each of the words in the first language.
  - 16. The system of claim 13 wherein the respective template in the first language further comprises a particular word whose count in the particular received word string exceeds a specified threshold.
  - 17. The system of claim 12, wherein a particular named entity comprises one or more proper nouns.
  - 18. The system of claim 12 wherein the plurality of features further comprises one or more of the following features:
    - a prefix, a suffix, a part-of-speech tag, and a word type.
  - 19. The system of claim 18, wherein the word type feature of a particular word describes whether the word shape feature of the particular word comprises tokens of a same type.
  - 20. The system of claim 12 wherein a particular feature is identified with an n-gram within an m-length window, wherein m is greater than n.
  - 21. The system of claim 12 wherein the statistical classifier is specific to the first language.
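
Claims 18 to 20 (mirroring claims 7 to 9) add prefix, suffix, part-of-speech, and word type features. They also allow a feature to be an n-gram drawn from an m-token window with m > n. The following Python sketch builds such a feature dictionary for one token. Several choices are assumptions made for illustration, and the claims fix none of them: the three-character affix length, the window positioned around the current token, the word type labels, the `X`/`x`/`d` shape alphabet, and POS tags supplied by an external tagger.

```python
from typing import Dict, List, Optional


def word_shape(word: str) -> str:
    """Per-character shape tokens (assumed alphabet: X upper, x lower, d digit)."""
    return "".join(
        "X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c
        for c in word
    )


def word_type(shape: str) -> str:
    """Word type: do all shape tokens share one type? Labels are hypothetical."""
    kinds = set(shape)
    if len(kinds) != 1:
        return "MIXED"
    return {"X": "ALL_UPPER", "x": "ALL_LOWER", "d": "ALL_DIGIT"}.get(shape[0], "ALL_OTHER")


def token_features(
    tokens: List[str],
    i: int,
    pos_tags: Optional[List[str]] = None,
    affix_len: int = 3,
    n: int = 2,
    m: int = 5,
) -> Dict[str, str]:
    """Feature dictionary for tokens[i]; n-grams come from an m-token window."""
    if m <= n:
        raise ValueError("window length m must exceed n-gram length n")
    word = tokens[i]
    shape = word_shape(word)
    feats = {
        "lower": word.lower(),
        "prefix": word[:affix_len],
        "suffix": word[-affix_len:],
        "shape": shape,
        "word_type": word_type(shape),
    }
    if pos_tags is not None:
        feats["pos"] = pos_tags[i]  # assumed to come from an external POS tagger
    # m-token window around i (assumed placement), clipped at the string edges.
    start = i - (m - 1) // 2
    lo, hi = max(0, start), min(len(tokens), start + m)
    window = [t.lower() for t in tokens[lo:hi]]
    for j in range(len(window) - n + 1):
        # The key records the n-gram's start offset relative to token i.
        feats[f"{n}gram[{lo + j - i}]"] = " ".join(window[j : j + n])
    return feats
```

For the chat fragment `meet Anna at 9PM`, the features of `Anna` include the shape `Xxxx`, the word type `MIXED`, and the window bigrams `meet anna`, `anna at`, and `at 9pm`.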

23. A storage device having instructions stored thereon that when executed by one or more computers perform operations comprising:
- training a statistical classifier to identify named entities using training data comprising a plurality of features, wherein one of the features is a word shape feature that comprises a respective token for each letter of a respective word, the respective token indicating that each letter of the respective word is one of an upper case letter, a lower case letter, and a digit;
  
  receiving a plurality of word strings in a first language, each received word string comprising a plurality of words;
  
  identifying at least one named entity in each received word string using the trained statistical classifier; and
  
  translating the received word strings from the first language to a second language, wherein translating comprises preserving the identified at least one named entity in the first language.
- Dependent Claims (24, 25, 26, 27, 28, 29, 30)
- - 24. The storage device of claim 23 wherein translating the received word strings from the first language to a second language comprises:
    - for a particular received word string:
      
      selecting a respective template in the first language, the respective template comprising one or more placeholders for the identified named entities and having a corresponding translated template in the second language that preserves the placeholders; and
      
      translating the particular received word string by substituting its identified named entities in the placeholders in the corresponding translated template in the second language.
  - 25. The storage device of claim 24 wherein the respective template in the first language further comprises words in the first language that are translated, according to a dictionary, to words in the second language in the corresponding translated template.
  - 26. The storage device of claim 25 wherein the dictionary comprises:
    - words in the first language; and
      
      one or more words in the second language corresponding to each of the words in the first language.
  - 27. The storage device of claim 24 wherein the respective template in the first language further comprises a particular word whose count in the particular received word string exceeds a specified threshold.
  - 28. The storage device of claim 23, wherein a particular named entity comprises one or more proper nouns.
  - 29. The storage device of claim 23 wherein the plurality of features further comprises one or more of the following features:
    - a prefix, a suffix, a part-of-speech tag, and a word type.
  - 30. The storage device of claim 29, wherein the word type feature of a particular word describes whether the word shape feature of the particular word comprises tokens of a same type.
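
Claims 11 and 22 name a conditional random field (CRF) as the statistical classifier. The Python sketch below covers only the decoding step of a linear-chain CRF. It uses Viterbi search to find the highest-scoring label sequence, given per-token emission scores (weighted sums of active features) and label-to-label transition scores. Training, which would fit the weights to labelled chat data, is omitted. The two-label set `O`/`ENT`, the capitalization features, and every weight are invented for illustration and are not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple

LABELS = ("O", "ENT")  # hypothetical label set: outside / named entity

# Hypothetical hand-set weights; a trained CRF would learn these from data.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("init_cap", "ENT"): 2.0,
    ("init_cap", "O"): -0.5,
    ("no_cap", "ENT"): -1.0,
}
# Hypothetical transition scores; unlisted label pairs score 0.
TRANSITIONS: Dict[Tuple[str, str], float] = {("ENT", "ENT"): 0.5}


def emission_scores(tokens: Sequence[str]) -> List[Dict[str, float]]:
    """Per-token label scores: sum of the weights of the features active on the token."""
    scores = []
    for w in tokens:
        feats = ["init_cap"] if w[:1].isupper() else ["no_cap"]
        scores.append({y: sum(WEIGHTS.get((f, y), 0.0) for f in feats) for y in LABELS})
    return scores


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[str]:
    """Highest-scoring label sequence under a linear-chain CRF scoring model."""
    if not emissions:
        return []
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in labels:
            # Best previous label for reaching label y at position t.
            prev, s = max(
                ((yp, best[-1][yp] + transitions.get((yp, y), 0.0)) for yp in labels),
                key=lambda pair: pair[1],
            )
            scores[y] = s + emissions[t][y]
            ptr[y] = prev
        best.append(scores)
        back.append(ptr)
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[-1][lab])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


def tag(tokens: Sequence[str]) -> List[str]:
    return viterbi(emission_scores(tokens), TRANSITIONS, LABELS)
```

With these toy weights, tagging `lol see you in New York` yields `O O O O ENT ENT`. The transition table is where a linear-chain CRF can reward patterns such as consecutive `ENT` labels in multi-word names like `New York`.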

Current Assignee
Mz Ip Holdings, LLC (AppLovin Corporation)
Original Assignee
Machine Zone, Inc. (AppLovin Corporation)
Inventors
Bojja, Nikhil; Kannan, Shivasankari; Wang, Pidong
Primary Examiner(s)
Serrou, Abdelali

Application Number

US14/990,540
Publication Number

US 20170197152A1
Time in Patent Office

1,706 Days
CPC Class Codes

A63F 13/87   Communicating with other pl...

G06F 40/216   using statistical methods

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/295   Named entity recognition

G06F 40/40   Processing or translation o...

G06F 40/58   Use of machine translation,...
