System and method of extracting clauses for spoken language understanding
Abstract
A clausifier and method of extracting clauses for spoken language understanding are disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted at least one boundary tag, at least one edit tag and at least one conjunction tag. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text.
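The abstract describes a clausifier built from three stages. The sketch below is a minimal illustration, not the disclosed implementation. It wires three classifiers together in the claimed order: boundary tagging, then edit tagging, then conjunction tagging over the unedited words only, and finally it splits the text into clauses. The trained classifiers are replaced by hypothetical rule-based stand-ins (`boundary_clf`, `edit_clf`, `conj_clf`). The per-token flag representation, the `clausify` name, and the choice to drop the conjunction word from the output clauses are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical classifier signature: given all words and a position, return True/False.
Classifier = Callable[[List[str], int], bool]


@dataclass
class Token:
    word: str
    boundary: bool = False  # a boundary tag follows this token
    edit: bool = False      # token is marked for removal
    conj: bool = False      # token is tagged as a coordinating conjunction


def clausify(words: List[str], boundary_clf: Classifier,
             edit_clf: Classifier, conj_clf: Classifier) -> List[str]:
    tokens = [Token(w) for w in words]
    # Pass 1: sentence boundary classifier.
    for i, tok in enumerate(tokens):
        tok.boundary = boundary_clf(words, i)
    # Pass 2: edit detector classifier.
    for i, tok in enumerate(tokens):
        tok.edit = edit_clf(words, i)
    # Pass 3: conjunction detector, applied only to unedited tokens.
    for i, tok in enumerate(tokens):
        tok.conj = (not tok.edit) and conj_clf(words, i)
    # Split at boundaries and conjunctions, dropping edited tokens.
    clauses: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok.conj:
            if current:
                clauses.append(current)
            current = []
            continue
        if not tok.edit:
            current.append(tok.word)
        if tok.boundary and current:
            clauses.append(current)
            current = []
    if current:
        clauses.append(current)
    return [" ".join(c) for c in clauses]


# Usage with toy stand-ins for the three trained classifiers.
COORDINATING = {"and", "but", "for", "nor", "or", "so", "yet"}
words = "i want to to check my balance and pay my bill".split()
clauses = clausify(
    words,
    boundary_clf=lambda ws, i: i == len(ws) - 1,                    # boundary only at the end
    edit_clf=lambda ws, i: i + 1 < len(ws) and ws[i] == ws[i + 1],  # repeated word
    conj_clf=lambda ws, i: ws[i] in COORDINATING,                   # lexical lookup
)
print(clauses)  # ['i want to check my balance', 'pay my bill']
```

Because each stage is a plain callable, one classifier can serve all three roles or each stage can be a separate model, which mirrors the abstract's note that the clausifier may use a single classifier or several.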
49 Citations
18 Claims
1. A method comprising:
annotating data by:
inserting, via a processor and via a discriminative classification approach independent of using n-grams, boundary tags at boundaries in a speech utterance text based on a training with weighted examples, wherein higher weights indicate more difficult examples, and wherein the boundary tags comprise one of a phrase boundary tag, a sentence boundary tag, and a paragraph boundary tag, to yield boundary marked speech utterance text;
thereafter, inserting an edit tag in the boundary marked speech utterance text, to yield edited text and unedited text, wherein the edit tag identifies a portion of the speech utterance text to be removed based on repeated words which do not contribute to language understanding, and wherein the speech utterance text has not been processed to identify clauses;
thereafter, inserting conjunction tags within the unedited text which identify, without relying on punctuation cues, coordinating conjunctions selected from a list comprising {and, but, for, nor, or, so, yet}, to yield conjunction tag text; and
identifying clauses within the speech utterance text based on the boundary marked speech utterance text, the edited text, and the conjunction tag text to yield annotated data; and
iteratively repeating the annotating of the data, where each successive iteration has a longer turn than an immediately preceding iteration and each successive iteration is used to retrain a model associated with the discriminative classification approach.
Dependent claims: 2, 3, 4, 5, 6.
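The boundary-tagging step trains a discriminative classifier on weighted examples, with higher weights on more difficult examples. Boosting is one well-known family of classifiers trained exactly this way; the claim does not name a specific algorithm, so using it here is an assumption. The sketch below is a minimal discrete AdaBoost with decision stumps over binary features. The feature strings (`w=...`, `next=...`) and all function names are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Set, Tuple

# One example per word position: (hypothetical binary features, label),
# where +1 means a boundary follows the word and -1 means it does not.
Example = Tuple[Set[str], int]
Stump = Tuple[str, int, float]  # (feature, polarity, alpha)


def stump_output(feature: str, polarity: int, feats: Set[str]) -> int:
    # A decision stump votes `polarity` if the feature is present, else -polarity.
    return polarity if feature in feats else -polarity


def train_adaboost(data: List[Example], rounds: int = 10) -> List[Stump]:
    features = sorted({f for feats, _ in data for f in feats})
    if not data or not features:
        return []
    weights = [1.0 / len(data)] * len(data)  # start with uniform weights
    model: List[Stump] = []
    for _ in range(rounds):
        # Choose the stump with the lowest *weighted* error.
        best = None
        for f in features:
            for polarity in (1, -1):
                err = sum(w for (feats, y), w in zip(data, weights)
                          if stump_output(f, polarity, feats) != y)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        if err >= 0.5:
            break  # no stump beats chance under the current weighting
        err = max(err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, polarity, alpha))
        if err <= 1e-10:
            break  # training data already separated
        # Misclassified (harder) examples gain weight; correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_output(f, polarity, feats))
                   for (feats, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Stump], feats: Set[str]) -> int:
    score = sum(a * stump_output(f, p, feats) for f, p, a in model)
    return 1 if score > 0 else -1


data = [
    ({"w=yes", "next=i"}, 1),
    ({"w=okay", "next=i"}, 1),
    ({"w=want", "next=to"}, -1),
    ({"w=to", "next=pay"}, -1),
    ({"w=my", "next=bill"}, -1),
]
model = train_adaboost(data)
print(predict(model, {"w=uh", "next=i"}))  # 1
```

The reweighting line is where "higher weights indicate more difficult examples" becomes concrete: each round multiplies the weight of every misclassified example by `exp(alpha)`, so later stumps concentrate on them.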
7. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
annotating data by:
inserting, via a processor and via a discriminative classification approach independent of using n-grams, boundary tags at boundaries in a speech utterance text based on a training with weighted examples, wherein higher weights indicate more difficult examples, and wherein the boundary tags comprise one of a phrase boundary tag, a sentence boundary tag, and a paragraph boundary tag, to yield boundary marked speech utterance text;
thereafter, inserting an edit tag in the boundary marked speech utterance text, to yield edited text and unedited text, wherein the edit tag identifies a portion of the speech utterance text to be removed based on repeated words which do not contribute to language understanding, and wherein the speech utterance text has not been processed to identify clauses;
thereafter, inserting conjunction tags within the unedited text which identify, without relying on punctuation cues, coordinating conjunctions selected from a list comprising {and, but, for, nor, or, so, yet}, to yield conjunction tag text; and
identifying clauses within the speech utterance text based on the boundary marked speech utterance text, the edited text, and the conjunction tag text to yield annotated data; and
iteratively repeating the annotating of the data, where each successive iteration has a longer turn than an immediately preceding iteration and each successive iteration is used to retrain a model associated with the discriminative classification approach.
Dependent claims: 8, 9, 10, 11, 12.
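The edit-tagging and conjunction-tagging operations recited above can be illustrated with simple heuristics. This is a minimal sketch under stated assumptions: a repeated-span rule stands in for the trained edit detector, a lexical lookup over the claimed conjunction list stands in for the conjunction detector, and the tag strings `EDIT`, `CONJ`, and `O` along with the function names are hypothetical. As the claim requires, conjunctions are tagged only in unedited text and no punctuation is consulted.

```python
from typing import List

# The coordinating conjunctions listed in the claim.
COORDINATING = {"and", "but", "for", "nor", "or", "so", "yet"}


def tag_edits(words: List[str], max_span: int = 3) -> List[str]:
    """Tag every earlier copy of an immediately repeated span as EDIT.

    For "i want i want to", the first "i want" is tagged and the last copy kept.
    """
    tags = ["O"] * len(words)
    i = 0
    while i < len(words):
        # Prefer the longest repeated span starting at position i.
        for k in range(max_span, 0, -1):
            if i + 2 * k <= len(words) and words[i:i + k] == words[i + k:i + 2 * k]:
                for j in range(i, i + k):
                    tags[j] = "EDIT"
                i += k  # resume at the second copy; it may repeat again
                break
        else:
            i += 1
    return tags


def tag_conjunctions(words: List[str], tags: List[str]) -> List[str]:
    """Tag coordinating conjunctions in the unedited text only.

    A plain lookup over-tags ambiguous words (e.g. "for" as a preposition),
    which is why a trained detector would be preferred in practice.
    """
    return ["CONJ" if tag == "O" and word.lower() in COORDINATING else tag
            for word, tag in zip(words, tags)]


words = "i want i want to pay my bill and and check my balance".split()
print(tag_conjunctions(words, tag_edits(words)))
# ['EDIT', 'EDIT', 'O', 'O', 'O', 'O', 'O', 'O', 'EDIT', 'CONJ', 'O', 'O', 'O']
```

Note that the first "and" in "and and" is tagged `EDIT` rather than `CONJ`, because the conjunction pass runs only over tokens the edit pass left untouched.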
13. A system comprising:
a processor; and
a computer-readable storage medium having stored therein instructions which, when executed by the processor, cause the processor to perform operations comprising:
annotating data by:
inserting, via a processor and via a discriminative classification approach independent of using n-grams, boundary tags at boundaries in a speech utterance text based on a training with weighted examples, wherein higher weights indicate more difficult examples, and wherein the boundary tags comprise one of a phrase boundary tag, a sentence boundary tag, and a paragraph boundary tag, to yield boundary marked speech utterance text;
thereafter, inserting an edit tag in the boundary marked speech utterance text, to yield edited text and unedited text, wherein the edit tag identifies a portion of the speech utterance text to be removed based on repeated words which do not contribute to language understanding, and wherein the speech utterance text has not been processed to identify clauses;
thereafter, inserting conjunction tags within the unedited text which identify, without relying on punctuation cues, coordinating conjunctions selected from a list comprising {and, but, for, nor, or, so, yet}, to yield conjunction tag text; and
identifying clauses within the speech utterance text based on the boundary marked speech utterance text, the edited text, and the conjunction tag text to yield annotated data; and
iteratively repeating the annotating of the data, where each successive iteration has a longer turn than an immediately preceding iteration and each successive iteration is used to retrain a model associated with the discriminative classification approach.
Dependent claims: 14, 15, 16, 17, 18.
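The final operation recited above annotates data repeatedly, with each iteration covering longer turns than the last and retraining the model after each pass. One plausible reading, which is an assumption since the claim fixes neither the schedule nor the annotation source, is to group turns into bands of increasing word count. The sketch below takes that reading. `iterative_retrain`, `length_limits`, `train`, `annotate`, and `seed_data` are hypothetical names, and the model and annotation types are left generic.

```python
from typing import Any, Callable, List, Sequence, Tuple

Turn = List[str]


def iterative_retrain(
    turns: Sequence[Turn],
    length_limits: Sequence[int],
    train: Callable[[List[Any]], Any],
    annotate: Callable[[Any, Turn], Any],
    seed_data: Sequence[Any] = (),
) -> Tuple[Any, List[Any]]:
    """Annotate turns in bands of increasing length, retraining after each band.

    Iteration k handles turns whose word count lies in
    (length_limits[k-1], length_limits[k]], so each iteration sees longer
    turns than the one before it.
    """
    if any(b <= a for a, b in zip(length_limits, length_limits[1:])):
        raise ValueError("length_limits must be strictly increasing")
    training = list(seed_data)
    model = train(training)
    lower = 0
    for upper in length_limits:
        batch = [t for t in turns if lower < len(t) <= upper]
        # Annotate the new, longer turns with the current model...
        training.extend(annotate(model, t) for t in batch)
        # ...then retrain on everything annotated so far.
        model = train(training)
        lower = upper
    return model, training


# Toy usage: the "model" is just the training-set size, and each annotation
# records which model produced it and the length of the turn it covers.
turns = [["a"], ["b", "c"], ["d", "e", "f"], ["g"] * 5]
model, data = iterative_retrain(turns, [1, 3, 5], train=len,
                                annotate=lambda m, t: (m, len(t)))
print(model, data)  # 4 [(0, 1), (1, 2), (1, 3), (3, 5)]
```

Starting with short turns lets the early models learn from easier, less ambiguous utterances before they are asked to annotate longer ones. This ordering is an interpretation offered for illustration, not a statement of the patent's rationale.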
Specification