Monte Carlo method for natural language understanding and speech recognition language models
Abstract
A Monte Carlo method for use with natural language understanding and speech recognition language models can include a series of steps. The steps can include identifying at least one phrase embedded in a body of text wherein the phrase can belong to a phrase class. An additional attribute corresponding to the identified phrase can be determined. The body of text can be copied and the identified phrase can be replaced with a different phrase selected from a plurality of phrases. The different phrase can belong to the phrase class and correspond to the attribute.
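The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Entry` record, the `INVENTORY` contents, the `determine_attribute` and `augment` functions, and the `n_samples` parameter are all hypothetical names and data invented for the example. It assumes the phrase and its phrase class have already been identified in the body of text, and it draws the replacement at random several times, which is one plausible reading of the "Monte Carlo" label.

```python
import random
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One member of the plurality of phrases (hypothetical layout)."""
    phrase: str
    phrase_class: str
    attribute: str  # subject matter attribute


# Invented example data; the source does not list any phrases.
INVENTORY = [
    Entry("IBM", "COMPANY", "technology"),
    Entry("Intel", "COMPANY", "technology"),
    Entry("Exxon", "COMPANY", "energy"),
    Entry("Chevron", "COMPANY", "energy"),
]


def determine_attribute(phrase: str, phrase_class: str,
                        inventory: List[Entry]) -> Optional[str]:
    """Look up the subject matter attribute of an identified phrase."""
    for entry in inventory:
        if entry.phrase == phrase and entry.phrase_class == phrase_class:
            return entry.attribute
    return None


def augment(corpus: List[str], text: str, phrase: str, phrase_class: str,
            inventory: List[Entry] = INVENTORY, n_samples: int = 3,
            rng: Optional[random.Random] = None) -> List[str]:
    """Append copies of `text` in which `phrase` is replaced by a randomly
    drawn different phrase of the same class and attribute."""
    rng = rng or random.Random()
    attribute = determine_attribute(phrase, phrase_class, inventory)
    candidates = [
        e.phrase for e in inventory
        if e.phrase != phrase
        and e.phrase_class == phrase_class
        and e.attribute == attribute
    ]
    if attribute is None or not candidates:
        return corpus  # nothing to substitute; corpus left unchanged
    for _ in range(n_samples):
        corpus.append(text.replace(phrase, rng.choice(candidates)))
    return corpus
```

For instance, calling `augment([], "buy 100 shares of Exxon", "Exxon", "COMPANY")` would add copies that mention "Chevron" instead, since that is the only other invented entry sharing both the class and the attribute.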
25 Claims
1. A Monte Carlo method of developing a training corpus for use with natural language understanding or speech recognition language models, said method comprising:
identifying at least one phrase embedded in a body of text, said phrase belonging to a phrase class;
determining at least one subject matter attribute corresponding to said identified phrase; and
augmenting the training corpus by copying said body of text and replacing said identified phrase with a different phrase selected from a plurality of phrases, said different phrase belonging to said phrase class and having said determined subject matter attribute.
Dependent claims: 2, 3, 4, 5, 6
7. A Monte Carlo method of developing a training corpus for use with natural language understanding or speech recognition language models, said method comprising:
identifying at least one phrase embedded within a body of text;
locating a second phrase within a plurality of phrases, said second phrase identically matching said identified phrase, wherein said second phrase belongs to a phrase class and has at least one subject matter attribute corresponding to said phrase class; and
copying said body of text and replacing said identified phrase with a different phrase selected from said plurality of phrases, said different phrase having a subject matter attribute that matches the subject matter attribute of said second phrase.
Dependent claims: 8, 9, 10, 11, 12
13. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
identifying at least one phrase embedded in a body of text, said phrase belonging to a phrase class;
determining at least one subject matter attribute corresponding to said identified phrase; and
augmenting the training corpus by copying said body of text and replacing said identified phrase with a different phrase selected from a plurality of phrases, said different phrase belonging to said phrase class and having said determined subject matter attribute.
Dependent claims: 14, 15, 16, 17, 18
19. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
identifying at least one phrase embedded within a body of text;
locating a second phrase within a plurality of phrases, said second phrase identically matching said identified phrase, wherein said second phrase belongs to a phrase class and has at least one subject matter attribute corresponding to said phrase class; and
copying said body of text and replacing said identified phrase with a different phrase selected from said plurality of phrases, said different phrase having a subject matter attribute that matches the subject matter attribute of said second phrase.
Dependent claims: 20, 21, 22, 23, 24
25. A Monte Carlo method of developing a training corpus for use with natural language understanding or speech recognition language models, said method comprising:
identifying at least one phrase embedded in a body of text, said phrase belonging to a phrase class;
determining at least one syntax-independent and semantics-independent subject matter attribute corresponding to said identified phrase; and
augmenting the training corpus by copying said body of text and replacing said identified phrase with a different phrase selected from a plurality of phrases, said different phrase belonging to said phrase class and having said determined subject matter attribute.
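Claims 7 and 19 recite a lookup-driven variant: the phrase found in the text is matched identically against the plurality of phrases, and the matching entry supplies the phrase class and subject matter attribute. A rough sketch of that flow is shown here, under stated assumptions: the `PHRASES` table, its city names and region attributes, and the functions `find_matches` and `copy_with_replacement` are hypothetical and do not come from the patent. Whole-word regular-expression matching is used as one simple way to approximate "identically matching".

```python
import random
import re
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical plurality of phrases:
# phrase -> (phrase class, subject matter attribute).
PHRASES: Dict[str, Tuple[str, str]] = {
    "Boston": ("CITY", "east_coast"),
    "New York": ("CITY", "east_coast"),
    "Seattle": ("CITY", "west_coast"),
    "Portland": ("CITY", "west_coast"),
}


def find_matches(text: str,
                 phrases: Dict[str, Tuple[str, str]]
                 ) -> Iterator[Tuple[str, int, int]]:
    """Yield (phrase, start, end) for every table phrase found verbatim."""
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            yield phrase, match.start(), match.end()


def copy_with_replacement(text: str,
                          phrases: Dict[str, Tuple[str, str]] = PHRASES,
                          rng: Optional[random.Random] = None) -> List[str]:
    """Return one copy of `text` per matched phrase, with that phrase swapped
    for a randomly chosen different phrase sharing its attribute."""
    rng = rng or random.Random()
    copies = []
    for phrase, start, end in find_matches(text, phrases):
        _cls, attribute = phrases[phrase]  # taken from the matching entry
        candidates = [
            p for p, (_c, a) in phrases.items()
            if p != phrase and a == attribute
        ]
        if candidates:
            copies.append(text[:start] + rng.choice(candidates) + text[end:])
    return copies
```

With the invented table above, `copy_with_replacement("fly to Boston tomorrow")` yields `["fly to New York tomorrow"]`, because "New York" is the only other entry carrying the same attribute. Text containing no table phrase yields an empty list.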
Specification