Methods and systems for natural language understanding using human knowledge and collected data
Abstract
Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
17 Claims
1. A method comprising:
developing, via a processor, a statistical model for a natural language understanding application exclusive of annotated data;
during first execution of the natural language understanding application, assigning a sequence of tags to a first received sequence of words using the statistical model, to yield annotated words;
developing a replacement statistical model for the natural language understanding application using the annotated words, wherein developing the replacement statistical model comprises:
developing a first part without human knowledge;
developing a second part using the human knowledge and the annotated words; and
assigning weighted versions of the first part and the second part to the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
during a second execution of the natural language understanding application, assigning a second sequence of tags to a second received sequence of words by using the replacement statistical model.
Dependent claims: 2, 3, 4, 5, 6

7. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
developing a statistical model for a natural language understanding application exclusive of annotated data;
during first execution of the natural language understanding application, assigning a sequence of tags to a first received sequence of words using the statistical model, to yield annotated words;
developing a replacement statistical model for the natural language understanding application using the annotated words, wherein developing the replacement statistical model comprises:
developing a first part without human knowledge;
developing a second part using the human knowledge and the annotated words; and
assigning weighted versions of the first part and the second part to the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
during a second execution of the natural language understanding application, assigning a second sequence of tags to a second received sequence of words by using the replacement statistical model.
Dependent claims: 8, 9, 10, 11, 12

13. A non-transitory computer readable storage medium having instructions stored which, when executed by a computing device, result in the computing device performing operations comprising:
developing a statistical model for a natural language understanding application exclusive of annotated data;
during first execution of the natural language understanding application, assigning a sequence of tags to a first received sequence of words using the statistical model, to yield annotated words;
developing a replacement statistical model for the natural language understanding application using the annotated words, wherein developing the replacement statistical model comprises:
developing a first part without human knowledge;
developing a second part using the human knowledge and the annotated words; and
assigning weighted versions of the first part and the second part to the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
during a second execution of the natural language understanding application, assigning a second sequence of tags to a second received sequence of words by using the replacement statistical model.
Dependent claims: 14, 15, 16, 17
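
The weighting step recited in the independent claims (two model parts combined with weights that depend on how much annotated data is available) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the tag set `TAGS`, the gazetteer standing in for "human knowledge", the 0.9/0.1 confidence values, the smoothing constant `k`, the linear interpolation, and the choice to let the data-driven part gain weight as annotations accumulate. The claims state only that the weights depend on the amount of annotated data.

```python
from collections import Counter, defaultdict

TAGS = ["O", "CITY"]  # hypothetical tag set, chosen for illustration


def data_part(annotated):
    """First part (no human knowledge): relative-frequency P(tag | word)."""
    counts = defaultdict(Counter)
    for word, tag in annotated:
        counts[word][tag] += 1

    def prob(word, tag):
        total = sum(counts[word].values())
        # Unseen words get a uniform distribution over the tags.
        return counts[word][tag] / total if total else 1.0 / len(TAGS)

    return prob


def knowledge_part(annotated, gazetteer):
    """Second part: a hand-built gazetteer (assumed form of human
    knowledge) extended with words already annotated as CITY."""
    cities = set(gazetteer) | {w for w, t in annotated if t == "CITY"}

    def prob(word, tag):
        p_city = 0.9 if word in cities else 0.1  # assumed confidences
        return p_city if tag == "CITY" else 1.0 - p_city

    return prob


def data_weight(n_annotated, k=50.0):
    """Weight of the data-driven part: 0 with no data, approaching 1 as
    annotations grow. The formula n / (n + k) is an assumption."""
    return n_annotated / (n_annotated + k)


def replacement_model(annotated, gazetteer):
    """Return a tagger that interpolates the two parts."""
    lam = data_weight(len(annotated))
    p1 = data_part(annotated)
    p2 = knowledge_part(annotated, gazetteer)

    def tag_sequence(words):
        return [
            max(TAGS, key=lambda t: lam * p1(w, t) + (1.0 - lam) * p2(w, t))
            for w in words
        ]

    return tag_sequence


# With no annotated data the weight on the data-driven part is zero,
# so tagging relies on the knowledge-based part alone.
bootstrap = replacement_model([], {"boston"})
first_tags = bootstrap(["fly", "to", "boston"])

# Once annotated words exist, the data-driven part dominates.
annotated = [("denver", "CITY")] * 200 + [("to", "O")] * 200
second_tags = replacement_model(annotated, {"boston"})(["to", "denver"])
```

In this sketch a single interpolation weight covers the whole range described in the abstract, from no annotated data to an arbitrary amount. A real system would presumably score whole tag sequences instead of tagging each word independently.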
Specification