Methods and systems for natural language understanding using human knowledge and collected data
Abstract
Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
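The continuum described above can be pictured as a weight that shifts from human knowledge toward collected data as annotations accumulate. The following is a minimal sketch under that assumption, not the disclosed method: the function names `knowledge_weight` and `data_weight`, the parameter `half_point`, and the particular decay formula are hypothetical choices made only for illustration.

```python
def knowledge_weight(num_annotated: int, half_point: float = 100.0) -> float:
    """Return an illustrative weight in (0, 1] for the knowledge-based part.

    Assumption (not from the source): the weight decays as
    half_point / (half_point + num_annotated), so it is 1.0 with no
    annotated data and 0.5 once num_annotated equals half_point.
    """
    if num_annotated < 0:
        raise ValueError("num_annotated must be non-negative")
    if half_point <= 0:
        raise ValueError("half_point must be positive")
    return half_point / (half_point + num_annotated)


def data_weight(num_annotated: int, half_point: float = 100.0) -> float:
    """Complementary weight for the data-driven part; the two sum to 1."""
    return 1.0 - knowledge_weight(num_annotated, half_point)
```

With this choice the model relies entirely on human knowledge when nothing has been annotated, and the data-driven share grows smoothly with every additional annotated example, so no separate code path is needed for the zero-data case.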
20 Claims
1. A method comprising:
- operating, via a processor of a computing device, a natural language understanding application with a statistical model that generates a sequence of tags assigned to a first sequence of words;
- developing a first part without human knowledge, the first part being first data used to formulate a replacement statistical model;
- developing a second part using the human knowledge and annotated words, the second part being second data used to formulate the replacement statistical model;
- assigning weighted versions of the first part and the second part to yield the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
- during execution of the natural language understanding application in which a second sequence of words is received as speech for processing using the natural language understanding application, assigning, via the processor of the computing device executing the natural language understanding application, a new sequence of tags to the second sequence of words, the new sequence of tags being generated by the replacement statistical model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - operating a natural language understanding application with a statistical model that generates a first sequence of tags assigned to a sequence of words;
  - developing a first part without human knowledge, the first part being first data used to formulate a replacement statistical model;
  - developing a second part using the human knowledge and annotated words, the second part being second data used to formulate the replacement statistical model;
  - assigning weighted versions of the first part and the second part to yield the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
  - during execution of the natural language understanding application in which a second sequence of words is received as speech for processing using the natural language understanding application, assigning a new sequence of tags to the second sequence of words, the new sequence of tags being generated by the replacement statistical model.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- operating a natural language understanding application with a statistical model that generates a first sequence of tags assigned to a sequence of words;
- developing a first part without human knowledge, the first part being first data used to formulate a replacement statistical model;
- developing a second part using the human knowledge and annotated words, the second part being second data used to formulate the replacement statistical model;
- assigning weighted versions of the first part and the second part to yield the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
- during execution of the natural language understanding application in which a second sequence of words is received as speech for processing using the natural language understanding application, assigning a new sequence of tags to the second sequence of words, the new sequence of tags being generated by the replacement statistical model.
Dependent claims: 18, 19, 20
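The combining and tagging steps recited in the independent claims can be illustrated with a small sketch. It assumes, purely for illustration, that each part is a table mapping a word to per-tag scores and that the two parts are mixed linearly; the type alias `TagScores`, the functions `combine_parts` and `assign_tags`, the `default_tag` value "O", and the example words and tags are all hypothetical and are not taken from the patent. The mixing weight is passed in as a parameter, and how it is derived from the amount of annotated data is left open here.

```python
from typing import Dict, List

# Hypothetical representation: word -> {tag: score}. Not taken from the patent.
TagScores = Dict[str, Dict[str, float]]


def combine_parts(first_part: TagScores, second_part: TagScores,
                  second_weight: float) -> TagScores:
    """Linearly mix two per-word tag score tables into one replacement model."""
    if not 0.0 <= second_weight <= 1.0:
        raise ValueError("second_weight must be in [0, 1]")
    first_weight = 1.0 - second_weight
    combined: TagScores = {}
    for word in set(first_part) | set(second_part):
        scores_a = first_part.get(word, {})
        scores_b = second_part.get(word, {})
        combined[word] = {
            tag: first_weight * scores_a.get(tag, 0.0)
            + second_weight * scores_b.get(tag, 0.0)
            for tag in set(scores_a) | set(scores_b)
        }
    return combined


def assign_tags(words: List[str], model: TagScores,
                default_tag: str = "O") -> List[str]:
    """Pick the highest-scoring tag per word; fall back to default_tag."""
    tags = []
    for word in words:
        scores = model.get(word, {})
        best = max(scores, key=scores.get) if scores else None
        # Words that are unknown or have no positive score get the default tag.
        tags.append(best if best is not None and scores[best] > 0.0 else default_tag)
    return tags


# Illustrative data only.
first_part = {"boston": {"city": 0.6, "O": 0.4}, "to": {"O": 1.0}}
second_part = {"boston": {"city": 0.9, "O": 0.1}, "denver": {"city": 1.0}}
replacement_model = combine_parts(first_part, second_part, second_weight=0.5)
new_tags = assign_tags(["fly", "to", "denver"], replacement_model)
```

A per-word table keeps the sketch short; a real sequence tagger would normally also score tag-to-tag transitions, which this example deliberately omits.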
Specification