Method and system for robust tagging of named entities in the presence of source or translation errors
First Claim
1. An electronic device comprising:
- a storage device configured to store a plurality of named entities collected from a plurality of sources, wherein each of the named entities are tokenized into a common format of named entity tokens, wherein each of the named entities are associated with a label, and wherein each of the named entity tokens are one of a word or a syllable of a word; and
- one or more processors configured to convert one or more textual communications from a natural language source into a computer readable format for reading and processing by the electronic device, the one or more processors comprising a tagging apparatus configured to:
  - receive the one or more textual communications,
  - identify each of the one or more textual communications,
  - tokenize the one or more textual communications into a common format of textual tokens corresponding to a prefix tree,
  - match, as a function of a selection of a cluster on a deepest level of a multi-level harmonized clustering structure of tokens, the textual tokens with one or more of the named entity tokens stored in the storage device, in order to assign the textual tokens to the labels associated with each of the named entities,
  - tag the one or more textual communications based on the matching between the textual tokens and the named entity tokens, in order to identify an intended meaning of each of the one or more textual communications, wherein the intended meaning of the received one or more textual communications are identified based on an extracted conceptual concept from the one or more textual communications,
  - identify the intended meaning of the one or more textual communications based on applying the tags to the one or more textual communications; and
  - replace an error of the one or more converted textual communications with the identified intended meaning of the one or more textual communications, when the error is detected.
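The tokenize, match, and tag steps of claim 1 can be illustrated with a small prefix tree over named entity tokens. This is a minimal sketch under stated assumptions, not the patented implementation: the names `TrieNode`, `tokenize`, `build_trie`, `tag`, and the `ENTITIES` table are hypothetical, the "common format" is assumed to be lowercase whitespace-separated words, and the multi-level harmonized clustering structure recited in the claim is not modeled here.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a prefix tree keyed by whole tokens (words)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None  # set only where a full entity ends


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace-separated words are the common token format.
    return text.lower().split()


def build_trie(entities: Dict[str, str]) -> TrieNode:
    """Insert every named entity, token by token, and store its label at the end node."""
    root = TrieNode()
    for name, label in entities.items():
        node = root
        for token in tokenize(name):
            node = node.children.setdefault(token, TrieNode())
        node.label = label
    return root


def tag(text: str, root: TrieNode) -> List[Tuple[str, str]]:
    """Tag the longest entity match starting at each position; other tokens get "O"."""
    tokens = tokenize(text)
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        node = root
        best_end: Optional[int] = None
        best_label: Optional[str] = None
        j = i
        # Walk the trie as far as the input tokens allow, remembering the last full entity.
        while j < len(tokens) and tokens[j] in node.children:
            node = node.children[tokens[j]]
            j += 1
            if node.label is not None:
                best_end, best_label = j, node.label
        if best_end is None or best_label is None:
            tagged.append((tokens[i], "O"))
            i += 1
        else:
            tagged.append((" ".join(tokens[i:best_end]), best_label))
            i = best_end
    return tagged


# Hypothetical stored named entities and their labels.
ENTITIES = {"new york": "CITY", "new york times": "ORGANIZATION", "paris": "CITY"}
TRIE = build_trie(ENTITIES)
```

Calling `tag("I read the New York Times in Paris", TRIE)` labels "new york times" as a single ORGANIZATION span instead of stopping at the shorter CITY match, because the walk keeps the last complete entity it passed through.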
1 Assignment
0 Petitions
Abstract
A system includes a storage device configured to store a plurality of named entities that are collected from a plurality of sources, tokenized, and associated with a label. The system also includes a tagging apparatus configured to receive textual communications, identify each of the textual communications, tokenize the textual communications, match the textual tokens with the named entity tokens stored in the storage device in order to assign the textual tokens to the labels associated with the named entity tokens, tag the textual communications based on the matching between the textual tokens and the named entity tokens in order to identify the intended meaning of each of the textual communications, and identify the intended meaning of the textual communications based on applying the tags to the textual communications. A method capable of disambiguating named entities using a common sense reasoning approach is also disclosed.
24 Citations
20 Claims
9. A method comprising:
- converting, by one or more processors, one or more textual communications from a natural language source into a computer readable format for reading and processing by an electronic device;
- receiving, by a tagging apparatus, the converted one or more textual communications;
- identifying each of the one or more textual communications;
- tokenizing the one or more textual communications into a common format of textual tokens corresponding to a prefix tree;
- matching, as a function of a selection of a cluster on a deepest level of a multi-level harmonized clustering structure of tokens, the textual tokens with one or more named entity tokens stored in a storage device, in order to assign the textual tokens to labels associated with the named entity tokens, and wherein each of the one or more named entities is associated with a label;
- tagging the one or more textual communications based on the matching between the textual tokens and the named entity tokens, in order to identify an intended meaning of each of the one or more textual communications, wherein the intended meaning of the received one or more textual communications are identified based on an extracted conceptual concept from the one or more textual communications;
- identifying the intended meaning of the one or more textual communications based on applying the tags to the one or more textual communications; and
- replacing an error of the one or more converted textual communications with the identified intended meaning of the one or more textual communications, when the error is detected.
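The final step of claim 9, replacing a detected error with the intended entity, can be illustrated as follows. This is a hedged sketch only: it swaps in approximate string matching from Python's standard `difflib` module in place of the multi-level harmonized clustering structure that the claim recites, and the names `ENTITY_LABELS` and `correct_tokens`, along with the `cutoff` value, are assumptions made for illustration.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical stored named entity tokens and their labels.
ENTITY_LABELS: Dict[str, str] = {
    "london": "CITY",
    "madrid": "CITY",
    "amazon": "ORGANIZATION",
}


def correct_tokens(tokens: List[str], cutoff: float = 0.75) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return the corrected tokens and the (error, replacement) pairs that were applied.

    A token counts as an error here when it is not a stored entity but is
    similar enough (difflib ratio >= cutoff) to exactly one best candidate.
    """
    vocabulary = list(ENTITY_LABELS)
    corrected: List[str] = []
    replacements: List[Tuple[str, str]] = []
    for raw in tokens:
        token = raw.lower()
        if token in ENTITY_LABELS:
            corrected.append(token)  # exact match, nothing to repair
            continue
        close = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        if close:
            corrected.append(close[0])
            replacements.append((token, close[0]))
        else:
            corrected.append(token)  # not an entity and not close to one
    return corrected, replacements
```

For example, `correct_tokens("flights to londn".split())` repairs the source error "londn" to "london" and leaves the other tokens untouched. A lower `cutoff` repairs more aggressively at the cost of more false replacements.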
17. For use in an electronic device, a method comprising:
- converting, by one or more processors, one or more textual communications from a natural language source into a computer readable format configured for reading and processing by the electronic device;
- receiving the converted one or more textual communications by the electronic device, wherein the one or more textual communications comprise one or more named entities;
- grouping, as a function of a selection of a cluster on a deepest level of a multi-level harmonized structure, concepts associated with the one or more named entities in order to extract relevant information from a knowledge base;
- inferring relevant knowledge from the extracted relevant information from the knowledge base;
- calculating and ranking the inferred relevant knowledge for the one or more named entities; and
- identifying a relevant label based on the highest rank of the inferred relevant knowledge for each of the one or more named entities.
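The ranking and label-selection steps of claim 17 can be pictured with a toy disambiguation routine. This is an illustrative sketch under loud assumptions: the `KNOWLEDGE_BASE` contents, the function names `rank_labels` and `best_label`, and the scoring rule (counting context words that overlap with concepts grouped under each candidate label) are all hypothetical stand-ins; the claim does not specify how the inferred knowledge is calculated.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical knowledge base: entity -> candidate label -> grouped concepts.
KNOWLEDGE_BASE: Dict[str, Dict[str, Set[str]]] = {
    "apple": {
        "FRUIT": {"eat", "juice", "tree", "pie"},
        "COMPANY": {"phone", "buy", "store", "laptop"},
    },
}


def rank_labels(entity: str, context_tokens: List[str]) -> List[Tuple[str, int]]:
    """Score each candidate label by concept overlap with the context, best first."""
    candidates = KNOWLEDGE_BASE.get(entity.lower(), {})
    context = {token.lower() for token in context_tokens}
    scores = [(label, len(concepts & context)) for label, concepts in candidates.items()]
    # Sort by descending score; break ties alphabetically so the result is deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


def best_label(entity: str, context_tokens: List[str]) -> Optional[str]:
    """Return the highest-ranked label, or None when the entity is unknown."""
    ranked = rank_labels(entity, context_tokens)
    return ranked[0][0] if ranked else None
```

With this toy data, the context "i want to buy an apple phone" ranks COMPANY first, while "bake an apple pie" ranks FRUIT first. A real system would replace the overlap count with whatever inference the knowledge base supports.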
Specification