Extracting Tokens in a Natural Language Understanding Application
Abstract
A method of processing text within a natural language understanding system can include applying a first tokenization technique to a sentence using a statistical tokenization model. A second tokenization technique using a named entity can be applied to the sentence when the first tokenization technique does not extract a needed token according to a class of the sentence. A token determined according to at least one of the tokenization techniques can be output.
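The fallback described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the `REQUIRED_TOKENS` table, the `NAMED_ENTITIES` list, and all function names are hypothetical, and a simple regular expression stands in for a trained statistical tokenization model.

```python
import re

# Hypothetical: the token types each sentence class needs.
REQUIRED_TOKENS = {"book_flight": ["CITY"], "check_balance": ["ACCOUNT"]}

# Hypothetical named entity: a fixed list of known values per token type.
NAMED_ENTITIES = {"CITY": ["new york", "boston", "miami"]}


def statistical_tokenize(sentence):
    """Toy stand-in for a statistical tokenization model.

    A real model would be trained on annotated sentences; this version only
    recognises a capitalised word that directly follows the word "to".
    """
    match = re.search(r"\bto ([A-Z][a-z]+)\b", sentence)
    return {"CITY": match.group(1)} if match else {}


def named_entity_tokenize(sentence, token_type):
    """Second technique: search the sentence for any member of the named entity."""
    lowered = sentence.lower()
    for value in NAMED_ENTITIES.get(token_type, []):
        if value in lowered:
            return value
    return None


def extract_tokens(sentence, sentence_class):
    """Apply the first technique, then fall back for tokens the class still needs."""
    tokens = statistical_tokenize(sentence)
    for needed in REQUIRED_TOKENS.get(sentence_class, []):
        if needed not in tokens:
            found = named_entity_tokenize(sentence, needed)
            if found is not None:
                tokens[needed] = found
    return tokens
```

In this sketch the named entity is consulted only when the sentence class requires a token that the first technique did not produce, so sentences the stand-in model already handles never reach the second technique.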
20 Claims
1. A method of processing text within a natural language understanding system, the method comprising:
applying a first tokenization technique to a sentence using a statistical tokenization model;
applying a second tokenization technique to the sentence using a named entity when the first tokenization technique does not extract a needed token according to a class of the sentence; and
outputting a token determined according to at least one of the tokenization techniques.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method of processing text within a natural language understanding (NLU) system, the method comprising:
determining a class for a sentence received by the NLU system at runtime;
processing the sentence using a first statistical tokenization model;
processing the sentence using a named entity when a token that is needed according to the class is not extracted using the first statistical tokenization model;
processing the sentence using a second statistical tokenization model when a token that is needed according to the class is not extracted using the named entity; and
outputting a token determined according to at least one of the first statistical tokenization model, the named entity, or the second statistical tokenization model.
Dependent claims: 11, 12
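The three-stage ordering recited in claim 10 (class determination, first statistical model, named entity, second statistical model) can be pictured as an ordered list of stages tried in turn. The Python sketch below is a hypothetical illustration only: the keyword classifier, the `NEEDED` table, and the three toy stages are assumptions made for the example, with regular expressions standing in for trained models.

```python
import re


def classify(sentence):
    """Hypothetical runtime classifier; a keyword rule stands in for a trained one."""
    return "transfer" if "transfer" in sentence.lower() else "other"


# Hypothetical: the token type needed for each sentence class.
NEEDED = {"transfer": "AMOUNT"}


def first_model(sentence, token_type):
    # Toy stand-in for the first statistical model: amounts written with "$".
    match = re.search(r"\$(\d+)", sentence)
    return match.group(1) if match else None


def amount_entity(sentence, token_type):
    # Toy named entity: a small list of spelled-out amounts.
    words = {"fifty": "50", "hundred": "100"}
    for word in sentence.lower().split():
        if word in words:
            return words[word]
    return None


def second_model(sentence, token_type):
    # Toy stand-in for the second statistical model: any bare number.
    match = re.search(r"\b(\d+)\b", sentence)
    return match.group(1) if match else None


# Stages are tried in this order; a later stage runs only if every
# earlier stage failed to extract the needed token.
STAGES = [
    ("first_model", first_model),
    ("named_entity", amount_entity),
    ("second_model", second_model),
]


def process(sentence):
    """Return (class, name of the stage that produced the token, token)."""
    sentence_class = classify(sentence)
    needed = NEEDED.get(sentence_class)
    if needed is None:
        return sentence_class, None, None
    for name, stage in STAGES:
        token = stage(sentence, needed)
        if token is not None:
            return sentence_class, name, token
    return sentence_class, None, None
```

Returning the stage name alongside the token is a design choice of this sketch, made so a caller can see which technique produced the output; the claim itself only requires that a token be output.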
13. A computer program product comprising:
a computer-usable medium comprising computer-usable program code that processes text within a natural language understanding system, the computer-usable medium comprising:
computer-usable program code that applies a first tokenization technique to a sentence using a statistical tokenization model;
computer-usable program code that applies a second tokenization technique to the sentence using a named entity when the first tokenization technique does not extract a needed token according to a class of the sentence; and
computer-usable program code that outputs a token determined according to at least one of the tokenization techniques.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification