Method and system for automatically building natural language understanding models
First Claim
1. A method for building a language model configuration comprising the steps of:
categorizing a natural language understanding (NLU) application to produce an application categorization having a plurality of categories;
classifying a corpus of example expressions to produce a classified corpus by identifying at least one of the categories for each example expression of the example expressions; and
operating at least one computer configured with a plurality of instructions that, when executed, cause the at least one computer to train at least one statistical language model using said classified corpus by:
building from the classified corpus a first language model configuration comprising a first statistical language model;
evaluating an interpretation accuracy of the first language model configuration using test data;
determining whether the evaluated interpretation accuracy of the first language model configuration is less than a desired accuracy;
when it is determined that the evaluated interpretation accuracy of the first language model configuration is at least the desired accuracy, then adopting the first language model configuration; and
when it is determined that the evaluated interpretation accuracy of the first language model configuration is less than the desired accuracy, then:
sub-dividing the application categorization into a plurality of sub-categories; and
building a second language model configuration comprising a plurality of statistical language models corresponding to the plurality of sub-categories.
Abstract
The invention disclosed herein concerns a system (100) and method (600) for building a language model representation of an NLU application. The method (600) can include categorizing an NLU application domain (602), classifying a corpus in view of the categorization (604), and training at least one language model in view of the classification (606). The categorization produces a hierarchical tree of categories, sub-categories and end targets across one or more features for interpreting one or more natural language input requests. During development of an NLU application, a developer assigns sentences of the NLU application to categories, sub-categories or end targets across one or more features, associating each sentence with desired interpretations. A language model builder (140) iteratively builds multiple language models for this sentence data and evaluates them against a test corpus, partitioning the data based on the categorization and rebuilding models, so as to produce an optimal configuration of language models to interpret and respond to language input requests for the NLU application.
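As an illustration of the classified corpus described in the abstract, the following minimal sketch represents each sentence together with its path through the category tree (category, sub-category, end target) and partitions the sentences at a chosen depth of that tree. The example sentences, the category names and the function `partition_by_depth` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical example data: each sentence is paired with its path through
# the category tree, from top-level category down to the end target.
CLASSIFIED_CORPUS = [
    ("what is my checking balance", ("banking", "balance", "checking")),
    ("how much is in my savings", ("banking", "balance", "savings")),
    ("move fifty dollars to savings", ("banking", "transfer", "to_savings")),
    ("book a flight to boston", ("travel", "flight", "book")),
]


def partition_by_depth(corpus, depth):
    """Group sentences by the first `depth` levels of their category path.

    depth 0 puts every sentence in one partition; larger depths split the
    data along progressively finer sub-categories.
    """
    partitions = defaultdict(list)
    for sentence, path in corpus:
        partitions[path[:depth]].append(sentence)
    return dict(partitions)


top_level = partition_by_depth(CLASSIFIED_CORPUS, 1)
sub_level = partition_by_depth(CLASSIFIED_CORPUS, 2)
```

Each partition could then serve as the training data for one language model, so that a deeper partitioning yields a configuration with more, narrower models.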
20 Claims
14. A natural language understanding (NLU) system comprising:
a computer comprising at least one processor configured to perform acts of:
categorizing an NLU application to produce an application categorization having a plurality of categories;
classifying an NLU database corpus of example expressions to produce a classified corpus by identifying at least one of the categories for each example expression of the example expressions; and
building from the classified corpus a first language model configuration comprising a first statistical language model;
evaluating an interpretation accuracy of the first language model configuration using test data;
determining whether the evaluated interpretation accuracy is less than a desired accuracy;
when it is determined that the evaluated interpretation accuracy is at least the desired accuracy, then adopting the first language model configuration; and
when it is determined that the evaluated interpretation accuracy is less than the desired accuracy, then:
sub-dividing the application categorization into a plurality of sub-categories; and
building a second language model configuration comprising a plurality of statistical language models corresponding to the plurality of sub-categories.
20. A natural language understanding (NLU) system comprising:
a computer comprising at least one processor configured to perform acts of:
categorizing an NLU application to produce an application categorization having a plurality of categories;
classifying an NLU database corpus of example expressions to produce a classified corpus by identifying at least one of the categories for each example expression of the example expressions;
building from the classified corpus a first language model configuration comprising a first statistical language model;
evaluating an interpretation accuracy of the first language model configuration using test data;
determining whether the evaluated interpretation accuracy of the first language model configuration is less than a desired accuracy;
when it is determined that the evaluated interpretation accuracy of the first language model configuration is at least the desired accuracy, then adopting the first language model configuration; and
when it is determined that the evaluated interpretation accuracy of the first language model configuration is less than the desired accuracy, then:
sub-dividing the application categorization into a first plurality of sub-categories;
building a second language model configuration comprising a first plurality of statistical language models corresponding to the first plurality of sub-categories;
evaluating an interpretation accuracy of the second language model configuration using the test data;
determining whether the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy;
when it is determined that the evaluated interpretation accuracy of the second language model configuration is at least the desired accuracy, then adopting the second language model configuration; and
when it is determined that the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy, then:
determining whether the first plurality of sub-categories can be further sub-divided;
when it is determined that the first plurality of sub-categories can be further sub-divided, then:
further sub-dividing the first plurality of sub-categories of the application categorization into a second plurality of sub-categories; and
building a third language model configuration comprising a second plurality of statistical language models corresponding to the second plurality of sub-categories; and
when it is determined that the first plurality of sub-categories cannot be further sub-divided, then adopting a language model configuration selected from the group consisting of the first language model configuration and the second language model configuration, the selected language model configuration having the greatest evaluated interpretation accuracy.
Specification