Methods and systems for natural language understanding using human knowledge and collected data
First Claim
1. A method comprising:
developing, via a processor, a statistical model for a natural language understanding application using human knowledge exclusive of any annotated data;
during a first execution of the natural language understanding application via the processor:
receiving a sequence of words to yield a received sequence of words;
assigning a sequence of tags to the received sequence of words by using the statistical model, to yield annotated words; and
when sufficient annotated words become available, updating the statistical model;
developing a replacement statistical model for the natural language understanding application using the annotated words by:
developing a first part of the replacement model without the human knowledge;
developing a second part of the replacement model from both the human knowledge and the annotated words; and
when the received and subsequently annotated data is sufficient, weighting a contribution from the first part more to the assigning than when the received and subsequently annotated data is insufficient; and
during a second execution of the natural language understanding application, receiving a second sequence of words and assigning a second sequence of tags to the second sequence of words by using the replacement statistical model.
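The flow recited in claim 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the rule table `HUMAN_RULES`, the tag set, the pseudo-count prior, and the weighting schedule `n / (n + half_point)` are all hypothetical choices made for the example. The sketch shows an initial tagger built from human knowledge alone, and a replacement model that mixes a data-only part with a knowledge-plus-data part, giving the data-only part more weight as more annotated words accumulate.

```python
from collections import Counter

# Hypothetical hand-written rules standing in for "human knowledge".
HUMAN_RULES = {"boston": "CITY", "denver": "CITY", "monday": "DAY"}
TAGS = ("CITY", "DAY", "O")


def knowledge_probs(word):
    """Tag distribution built from human knowledge only (no annotated data)."""
    guess = HUMAN_RULES.get(word.lower(), "O")
    rest = 0.2 / (len(TAGS) - 1)
    return {t: (0.8 if t == guess else rest) for t in TAGS}


def initial_tag(words):
    """First execution: tag with the knowledge-only model."""
    tags = []
    for w in words:
        probs = knowledge_probs(w)
        tags.append(max(probs, key=probs.get))
    return tags


class ReplacementModel:
    """Two-part model; all numeric settings here are illustrative assumptions."""

    def __init__(self, annotated, pseudo_count=5.0, half_point=50):
        self.counts = {}
        for word, tag in annotated:
            self.counts.setdefault(word.lower(), Counter())[tag] += 1
        self.n = len(annotated)
        self.pseudo_count = pseudo_count
        self.half_point = half_point

    def data_only(self, word):
        """First part: relative frequencies, no human knowledge."""
        c = self.counts.get(word.lower(), Counter())
        total = sum(c.values())
        if total == 0:
            return {t: 1.0 / len(TAGS) for t in TAGS}
        return {t: c[t] / total for t in TAGS}

    def knowledge_and_data(self, word):
        """Second part: human-knowledge prior (as pseudo-counts) plus counts."""
        prior = knowledge_probs(word)
        c = self.counts.get(word.lower(), Counter())
        total = sum(c.values()) + self.pseudo_count
        return {t: (c[t] + self.pseudo_count * prior[t]) / total for t in TAGS}

    def weight(self):
        """Weight on the data-only part; grows with the amount of data."""
        return self.n / (self.n + self.half_point)

    def tag(self, words):
        """Second execution: tag with the weighted mixture of both parts."""
        lam = self.weight()
        tags = []
        for w in words:
            p1, p2 = self.data_only(w), self.knowledge_and_data(w)
            mix = {t: lam * p1[t] + (1 - lam) * p2[t] for t in TAGS}
            tags.append(max(mix, key=mix.get))
        return tags
```

With no annotated data the mixture weight is zero and the replacement model falls back entirely on the knowledge-informed part; as annotated words accumulate, the data-only part dominates. Any monotone schedule would serve the same illustrative purpose.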
Abstract
Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
12 Claims
1. A method comprising: (recited in full under First Claim above)
Dependent claims: 2, 3, 4
5. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing a method comprising:
during a first execution of the natural language understanding application:
receiving a sequence of words to yield a received sequence of words;
assigning a sequence of tags to the received sequence of words by using the statistical model, to yield annotated words; and
when sufficient annotated words become available, updating the statistical model;
developing a replacement statistical model for the natural language understanding application using the annotated words by:
developing a first part of the replacement model without the human knowledge;
developing a second part of the replacement model from both the human knowledge and the annotated words; and
when the received and subsequently annotated data is sufficient, weighting a contribution from the first part more to the assigning than when the received and subsequently annotated data is insufficient; and
during a second execution of the natural language understanding application, receiving a second sequence of words and assigning a second sequence of tags to the second sequence of words by using the replacement statistical model.
Dependent claims: 6, 7, 8, 9, 10, 11, 12
Specification