Creating a language model for a language processing system
Abstract
A method for creating a language model from a task-independent corpus is provided. In one embodiment, a task dependent unified language model is created. The unified language model includes a plurality of context-free grammars having non-terminals and a hybrid N-gram model having at least some of the same non-terminals embedded therein.
43 Claims
1. A method for creating a task dependent unified language model for a selected application from a task independent corpus, the task dependent unified language model being for use in a language processing system and having embedded context-free grammar non-terminal tokens in a N-gram model, the method comprising:
obtaining a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts, each of the context-free grammars having words present in the task independent corpus to form the semantic or syntactic concepts;
parsing the task independent corpus with the plurality of context-free grammars to identify word occurrences of each of the semantic or syntactic concepts;
replacing each of the identified word occurrences with corresponding non-terminal tokens;
building a N-gram model having the non-terminal tokens; and
obtaining a second plurality of context-free grammars comprising at least some of the same non-terminals representing the same semantic or syntactic concepts, each of the context-free grammars of the second plurality being more appropriate for use in the selected application.
Dependent claims: 2
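The parse, replace, and build steps of claim 1 can be pictured with a small Python sketch. It is an illustration under simplifying assumptions, not the patented implementation: each context-free grammar is reduced to a flat list of phrases, parsing is a greedy longest-match scan, and the N-gram model is a table of bigram counts. The names `GRAMMARS`, `tag_sentence`, and `bigram_counts`, and the sample corpus, are hypothetical.

```python
from collections import Counter

# Hypothetical toy grammars: each non-terminal maps to the word
# sequences it can expand to (a finite, non-recursive stand-in for a CFG).
GRAMMARS = {
    "<DAY>": [["monday"], ["tuesday"], ["next", "week"]],
    "<PERSON>": [["alice"], ["bob", "smith"]],
}

def tag_sentence(words, grammars):
    """Replace each matched phrase with its non-terminal token."""
    # Try longer phrases first so "bob smith" is matched as a whole.
    phrases = sorted(
        ((tuple(p), nt) for nt, ps in grammars.items() for p in ps),
        key=lambda item: -len(item[0]),
    )
    out, i = [], 0
    while i < len(words):
        for phrase, nt in phrases:
            if tuple(words[i:i + len(phrase)]) == phrase:
                out.append(nt)
                i += len(phrase)
                break
        else:
            # No grammar matched at this position: keep the plain word.
            out.append(words[i])
            i += 1
    return out

def bigram_counts(sentences):
    """Count bigrams over token sequences that may contain non-terminals."""
    counts = Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts

corpus = ["meet bob smith on monday", "call alice next week"]
tagged = [tag_sentence(s.split(), GRAMMARS) for s in corpus]
counts = bigram_counts(tagged)
```

Here `tagged` holds the corpus with word occurrences replaced by tokens such as `<PERSON>` and `<DAY>`, so the bigram table carries statistics over concepts rather than over the specific names and dates that happened to occur in the corpus.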
3. A method for creating a task dependent unified language model for a selected application from a task independent corpus, the task dependent unified language model being for use in a language processing system and having embedded context-free grammar non-terminal tokens in a N-gram model, the method comprising:
obtaining a plurality of context-free grammars comprising a set of context-free grammars having non-terminal tokens representing task dependent semantic or syntactic concepts and at least one context-free grammar having a non-terminal token for a phrase that can be mistaken for one of the desired task dependent semantic or syntactic concepts;
parsing the task independent corpus with the plurality of context-free grammars to identify word occurrences for each of the semantic or syntactic concepts and phrases;
replacing each of the identified word occurrences with corresponding non-terminal tokens; and
building a N-gram model having the non-terminal tokens.
Dependent claims: 4, 5, 6
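The role of the extra grammar in claim 3, the one covering phrases that can be mistaken for a task concept, can be seen in a short sketch. The example is an assumption made for illustration: the month "may" versus the verb phrases "may be" and "may i". The grammar contents, the token `<NOT_MONTH>`, and the helper `tag` are hypothetical, and regular-expression matching stands in for real CFG parsing.

```python
import re

# Hypothetical grammars. TASK holds the desired concept; CONFUSABLE holds
# phrases that merely look like it.
TASK = {"<MONTH>": ["may", "june"]}
CONFUSABLE = {"<NOT_MONTH>": ["may be", "may i"]}

def tag(text, grammars):
    """Replace grammar phrases in text with their non-terminal tokens."""
    phrase_to_nt = {p: nt for nt, ps in grammars.items() for p in ps}
    # Longest alternatives first so "may be" is tried before "may".
    alternatives = sorted(phrase_to_nt, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b"
    )
    return pattern.sub(lambda m: phrase_to_nt[m.group(1)], text)

text = "it may be late in may"
naive = tag(text, TASK)                       # task grammar only
guarded = tag(text, {**TASK, **CONFUSABLE})   # with the confusable-phrase grammar
```

With the task grammar alone, both occurrences of "may" are tagged as `<MONTH>`. Adding the confusable-phrase grammar routes "may be" to its own token, so an N-gram model built from the tagged text would not count it as evidence for the month concept.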
7. A method for creating a language model for a selected application from a task independent corpus, the language model being for use in a language processing system, the method comprising:
obtaining a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts of the selected application;
generating word phrases from the plurality of context-free grammars;
formulating an information retrieval query from at least one of the word phrases;
querying the task independent corpus based on the query formulated;
identifying associated text in the task independent corpus based on the query; and
building a language model using the identified text.
Dependent claims: 8, 9, 10, 11, 12, 13
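The retrieval-driven flow of claim 7 can be sketched as follows. The claim does not fix a query format, a retrieval method, or a model type, so everything concrete here is an assumption: the toy grammar, a bag-of-words query, term-overlap retrieval with a hypothetical `min_overlap` threshold, and unigram counts as a stand-in for the language model.

```python
from collections import Counter
from itertools import product

# Hypothetical toy grammar: each rule lists alternatives; an alternative is a
# sequence of plain words and non-terminals (keys of GRAMMAR).
GRAMMAR = {
    "<REQUEST>": [["book", "<ITEM>"], ["cancel", "<ITEM>"]],
    "<ITEM>": [["flight"], ["hotel", "room"]],
}

def expand(symbol, grammar):
    """Yield every word sequence derivable from symbol (non-recursive grammars only)."""
    if symbol not in grammar:
        yield [symbol]
        return
    for alternative in grammar[symbol]:
        parts = [list(expand(s, grammar)) for s in alternative]
        for combo in product(*parts):
            yield [w for part in combo for w in part]

def retrieve(query_terms, documents, min_overlap=2):
    """Keep documents sharing at least min_overlap distinct terms with the query."""
    return [d for d in documents if len(query_terms & set(d.split())) >= min_overlap]

phrases = list(expand("<REQUEST>", GRAMMAR))
query = {w for p in phrases for w in p}  # bag-of-words query built from the phrases

corpus = [
    "please book a flight to boston",
    "the weather is nice today",
    "i need to cancel my hotel room",
]
selected = retrieve(query, corpus)
unigrams = Counter(w for d in selected for w in d.split())
```

Only the two sentences that overlap with the grammar-generated phrases are selected, so the counts reflect text related to the application rather than the whole task independent corpus.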
14. A method for creating a language model for a selected application from a task independent corpus, the language model being for use in a language processing system, the method comprising:
obtaining a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts of the selected application;
generating word phrases from the plurality of context-free grammars;
building a first N-gram language model from the word phrases;
formulating an information retrieval query from at least one of the word phrases;
querying the task independent corpus based on the query formulated;
identifying associated text in the task independent corpus based on the query; and
building a second N-gram language model from the identified text; and
combining the first N-gram language model and the second N-gram language model to form a third N-gram language model.
Dependent claims: 15, 16, 17, 18
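The final combining step of claim 14 does not name a combination method in the text above. One common choice is linear interpolation, shown here purely as an assumption, with unigram models (N = 1) to keep the example short. The function names, the sample sentences, and the weight of 0.5 are hypothetical.

```python
from collections import Counter

def unigram_model(sentences):
    """Maximum-likelihood unigram probabilities from whitespace-tokenized text."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(model_a, model_b, weight_a):
    """Linear interpolation: weight_a * P_a(w) + (1 - weight_a) * P_b(w)."""
    vocab = set(model_a) | set(model_b)
    return {
        w: weight_a * model_a.get(w, 0.0) + (1.0 - weight_a) * model_b.get(w, 0.0)
        for w in vocab
    }

# First model: from grammar-generated phrases. Second model: from retrieved text.
first = unigram_model(["book flight", "cancel flight"])
second = unigram_model(["please book a flight to boston"])
third = interpolate(first, second, weight_a=0.5)
```

Because both inputs are probability distributions and the weights sum to one, the interpolated model is also a distribution. In practice the weight would be tuned on held-out data from the selected application.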
19. A method for creating a unified language model for a selected application from a corpus, the method comprising:
obtaining a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts of the selected application;
building a word language model from the corpus; and
assigning probabilities to words of at least some of the context-free grammars as a function of corresponding probabilities obtained for the same words from the word language model wherein assigning probabilities includes normalizing the probabilities of the words from the language model in each of the context-free grammars as a function of the words allowed by the corresponding context-free grammar.
Dependent claims: 20, 21, 22
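The normalization in claim 19 can be read as rescaling the word language model's probabilities so that they sum to one over the words a given grammar allows. The sketch below follows that reading. The function name, the sample probabilities, and the uniform fallback for a grammar whose words are all unseen are assumptions, not taken from the claim.

```python
def normalize_within_grammar(word_probs, allowed_words):
    """Rescale word-model probabilities to sum to 1 over the grammar's allowed words."""
    mass = sum(word_probs.get(w, 0.0) for w in allowed_words)
    if mass == 0.0:
        # Assumption: with no evidence for any allowed word, fall back to uniform.
        return {w: 1.0 / len(allowed_words) for w in allowed_words}
    return {w: word_probs.get(w, 0.0) / mass for w in allowed_words}

# Hypothetical probabilities from a word language model built on the corpus.
word_lm = {"monday": 0.004, "friday": 0.006, "the": 0.05}

# Words allowed by a hypothetical <DAY> grammar; "sunday" is unseen in the corpus.
day_probs = normalize_within_grammar(word_lm, ["monday", "friday", "sunday"])
```

In this example "monday" and "friday" receive 0.4 and 0.6, while the unseen "sunday" receives zero. A real system would likely smooth the word model first so that every allowed word keeps some probability.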
23. A computer readable medium including instructions readable by a computer which, when implemented execute a method to build a task dependent unified language model for a language processing system, the method comprising:
accessing a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts, each of the context-free grammars having words present in a task independent corpus to form the semantic or syntactic concepts;
parsing the task independent corpus with the plurality of context-free grammars to identify word occurrences of each of the semantic or syntactic concepts;
replacing each of the identified word occurrences with corresponding non-terminal tokens;
building a N-gram model having the non-terminal tokens; and
storing the N-gram model and a second plurality of context-free grammars comprising at least some of the same non-terminals representing the same semantic or syntactic concepts, each of the context-free grammars of the second plurality being more appropriate for use in the selected application.
24. A computer readable medium including instructions readable by a computer which, when implemented execute a method to build a task dependent unified language model for a language processing system, the method comprising:
accessing a plurality of context-free grammars comprising a set of context-free grammars having non-terminal tokens representing task dependent semantic or syntactic concepts and at least one context-free grammar having a non-terminal token for a phrase that can be mistaken for one of the desired task dependent semantic or syntactic concepts;
parsing a task independent corpus with the plurality of context-free grammars to identify word occurrences for each of the semantic or syntactic concepts and phrases;
replacing each of the identified word occurrences with corresponding non-terminal tokens; and
building a N-gram model having the non-terminal tokens.
Dependent claims: 25, 26, 27
28. A computer readable medium including instructions readable by a computer which, when implemented execute a method to build language model for a language processing system, the method comprising:
accessing a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts of the selected application;
generating word phrases from the plurality of context-free grammars;
formulating an information retrieval query from at least one of the word phrases;
querying a task independent corpus based on the query formulated;
identifying associated text in the task independent corpus based on the query; and
building a language model using the identified text.
Dependent claims: 29, 30, 31, 32, 33, 34
35. A computer readable medium including instructions readable by a computer which, when implemented execute a method to build language model for a language processing system, the method comprising:
accessing a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts of the selected application;
generating word phrases from the plurality of context-free grammars;
building a first N-gram language model from the word phrases;
formulating an information retrieval query from at least one of the word phrases;
querying a task independent corpus based on the query formulated;
identifying associated text in the task independent corpus based on the query;
building a second N-gram language model from the identified text; and
combining the first N-gram language model and the second N-gram language model to form a third N-gram language model.
Dependent claims: 36, 37, 38, 39
40. A computer readable medium including instructions readable by a computer which, when implemented execute a method to build a unified language model for a selected application, the method comprising:
accessing a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts of the selected application;
building a word language model from a corpus; and
assigning probabilities to words of at least some of the context-free grammars as a function of corresponding probabilities obtained for the same terminals from the word language model wherein assigning probabilities includes normalizing the probabilities of the words from the word language model in each of the context-free grammars as a function of the words allowed by the corresponding context-free grammar.
Dependent claims: 41, 42, 43
Specification