Extracting tokens in a natural language understanding application
Abstract
A method of processing text within a natural language understanding system can include applying a first tokenization technique to a sentence using a statistical tokenization model. A second tokenization technique using a named entity can be applied to the sentence when the first tokenization technique does not extract a needed token according to a class of the sentence. A token determined according to at least one of the tokenization techniques can be output.
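The fallback described in the abstract can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than anything taken from the patent: `REQUIRED_TOKENS`, `NAMED_ENTITIES`, and the regex-based `statistical_tokenize`, which merely takes the place of a trained statistical model. The named entity is assumed to be a fixed list of known values for a token type.

```python
import re
from typing import Dict, Set

# Hypothetical: token types that each sentence class needs.
REQUIRED_TOKENS: Dict[str, Set[str]] = {"flight_booking": {"city"}}

# Hypothetical named entity: a fixed list of known values per token type.
NAMED_ENTITIES: Dict[str, Set[str]] = {"city": {"boston", "new york", "denver"}}


def statistical_tokenize(sentence: str) -> Dict[str, str]:
    """Toy stand-in for a statistical model: matches 'to <Capitalized word>'."""
    match = re.search(r"\bto ([A-Z][a-z]+)\b", sentence)
    return {"city": match.group(1)} if match else {}


def named_entity_tokenize(sentence: str, needed: Set[str]) -> Dict[str, str]:
    """Look up the needed token types among the known named entity values."""
    lowered = sentence.lower()
    found: Dict[str, str] = {}
    for token_type in needed:
        for value in sorted(NAMED_ENTITIES.get(token_type, set())):
            if re.search(r"\b" + re.escape(value) + r"\b", lowered):
                found[token_type] = value
                break
    return found


def extract_tokens(sentence: str, sentence_class: str) -> Dict[str, str]:
    """Apply the first technique, then the named entity only if a needed token is missing."""
    tokens = statistical_tokenize(sentence)
    missing = REQUIRED_TOKENS.get(sentence_class, set()) - set(tokens)
    if missing:
        tokens.update(named_entity_tokenize(sentence, missing))
    return tokens
```

With these toy definitions, `extract_tokens("I want to fly to Denver", "flight_booking")` is resolved by the first technique alone, while `extract_tokens("book me a denver flight", "flight_booking")` falls through to the named entity lookup.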
20 Claims
1. A method of processing text within a natural language understanding system, the method comprising:
via a processor, applying a first tokenization technique to a sentence using a statistical tokenization model;
via the processor, applying a second subsequent tokenization technique to the sentence using a named entity only when the first tokenization technique does not extract a needed token according to a class of the sentence; and
via the processor, outputting a token determined according to at least one of the tokenization techniques.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method of processing text within a natural language understanding (NLU) system, the method comprising:
via a processor, determining a class for a sentence received by the NLU system at runtime;
via the processor, processing the sentence using a first statistical tokenization model;
via the processor, processing the sentence using a named entity when a token that is needed according to the class is not extracted using the first statistical tokenization model;
via the processor, processing the sentence using a second subsequent statistical tokenization model only when a token that is needed according to the class is not extracted using the named entity; and
via the processor, outputting a token determined according to at least one of the first statistical tokenization model, the named entity, or the second subsequent statistical tokenization model.
Dependent claims: 11, 12
13. A computer program product comprising:
a computer-readable storage comprising computer-usable program code stored thereon that processes text within a natural language understanding system, the computer-readable storage comprising:
computer-usable program code that applies a first tokenization technique to a sentence using a statistical tokenization model;
computer-usable program code that applies a second subsequent tokenization technique to the sentence using a named entity only when the first tokenization technique does not extract a needed token according to a class of the sentence; and
computer-usable program code that outputs a token determined according to at least one of the tokenization techniques.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
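The three-stage ordering recited in claim 10 (classify the sentence, run a first statistical model, fall back to a named entity, then fall back to a second statistical model) can be sketched as follows. This is only an illustration under stated assumptions: `run_cascade`, `REQUIRED`, and the toy stage functions are hypothetical names, and the keyword and regex logic stands in for trained models the claims do not specify.

```python
import re
from typing import Callable, Dict, List, Sequence, Set, Tuple

Tokens = Dict[str, str]
Stage = Callable[[str], Tokens]

# Hypothetical: token types needed for each sentence class.
REQUIRED: Dict[str, Set[str]] = {"transfer": {"amount"}}


def classify(sentence: str) -> str:
    """Toy classifier; a real system would use a trained model."""
    return "transfer" if "transfer" in sentence.lower() else "other"


def first_model(sentence: str) -> Tokens:
    """Toy first statistical model that never finds the amount."""
    return {}


def named_entity(sentence: str) -> Tokens:
    """Toy named entity: a fixed list of spelled-out amounts."""
    for phrase in ("fifty dollars", "one hundred dollars"):
        if phrase in sentence.lower():
            return {"amount": phrase}
    return {}


def second_model(sentence: str) -> Tokens:
    """Toy second statistical model: picks up the first run of digits."""
    match = re.search(r"\d+", sentence)
    return {"amount": match.group(0)} if match else {}


def run_cascade(
    sentence: str, stages: Sequence[Tuple[str, Stage]]
) -> Tuple[Tokens, List[str]]:
    """Run stages in order; each later stage runs only if a needed token is still missing."""
    needed = REQUIRED.get(classify(sentence), set())
    tokens: Tokens = {}
    used: List[str] = []
    for index, (name, stage) in enumerate(stages):
        # The first stage always runs; fallbacks are skipped once needs are met.
        if index > 0 and needed <= set(tokens):
            break
        used.append(name)
        for token_type, value in stage(sentence).items():
            tokens.setdefault(token_type, value)  # earlier stages take precedence
    return tokens, used


STAGES = [
    ("first_model", first_model),
    ("named_entity", named_entity),
    ("second_model", second_model),
]
```

Returning the list of stages that ran alongside the tokens is a design choice made for this sketch so the fallback order can be observed; it is not something the claims require.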
Specification