Detecting repeated phrases and inference of dialogue models
Abstract
A method of speech recognition obtains acoustic data from a plurality of conversations. A plurality of pairs of utterances are selected from the plurality of conversations. At least one portion of the first utterance of the pair of utterances is dynamically aligned with at least one portion of the second utterance of the pair of utterances, and an acoustic similarity is computed. At least one pair that includes a first portion from a first utterance and a second portion from a second utterance is chosen, based on a criterion of acoustic similarity. A common pattern template is created from the first portion and the second portion.
162 Citations
36 Claims
1. A method of speech recognition, comprising:
obtaining acoustic data from a plurality of conversations;
selecting a plurality of pairs of utterances from said plurality of conversations;
dynamically aligning and computing acoustic similarity of at least one portion of the first utterance of said pair of utterances with at least one portion of the second utterance of said pair of utterances;
choosing at least one pair that includes a first portion from a first utterance and a second portion from a second utterance based on a criterion of acoustic similarity; and
creating a common pattern template from the first portion and the second portion.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
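The alignment and selection steps of claim 1 can be illustrated with a minimal sketch. It assumes each utterance portion is already a sequence of fixed-length feature frames (for example, cepstral vectors), uses classic dynamic time warping with Euclidean frame distance as the alignment, and treats a length-normalized cost threshold as the "criterion of acoustic similarity". The claim does not specify any of these choices, and the function names `frame_distance`, `dtw_distance`, and `choose_similar_pairs` are hypothetical. Template creation from the chosen pair is not shown.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized dynamic time warping cost between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both portions must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def choose_similar_pairs(portions, threshold):
    """Return index pairs (i, j), i < j, whose normalized DTW cost is <= threshold."""
    chosen = []
    for i in range(len(portions)):
        for j in range(i + 1, len(portions)):
            if dtw_distance(portions[i], portions[j]) <= threshold:
                chosen.append((i, j))
    return chosen


# Toy one-dimensional "frames": b is a time-stretched copy of a, c is unrelated.
a = [[0.0], [1.0], [2.0]]
b = [[0.0], [1.0], [1.0], [2.0]]
c = [[5.0], [6.0]]
similar = choose_similar_pairs([a, b, c], threshold=0.1)
```

In this toy input, `a` and `b` differ only in timing, so warping aligns them at zero cost, while `c` is far from both. An exhaustive pairwise comparison like this is quadratic in the number of portions; a practical system would presumably prune candidate pairs first.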
9. A speech recognition grammar inference method, comprising:
obtaining word scripts for utterances from a plurality of conversations based at least in part on a speech recognition process;
counting a number of times that each word sequence occurs in the said word scripts;
creating a set of common word sequences based on the frequency of occurrence of each word sequence;
selecting a set of sample phrases from said word scripts including a plurality of word sequences from said set of common word sequences; and
creating a plurality of phrase templates from said set of sample phrases by using said fixed template portions to represent said common word sequences and variable template portions to represent other word sequences in said set of sample phrases.
Dependent claims: 10, 11
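The counting and template steps of claim 9 can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: word scripts are plain whitespace-separated strings, "common" means a count at or above a chosen minimum, every script is used as a sample phrase, and fixed portions are found by greedy longest match. The names `count_word_sequences`, `common_sequences`, `make_template`, and the `<VAR>` slot marker are hypothetical.

```python
from collections import Counter


def count_word_sequences(scripts, max_len=4):
    """Count every contiguous word sequence of 1..max_len words across all scripts."""
    counts = Counter()
    for script in scripts:
        words = script.split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def common_sequences(counts, min_count, min_len=2):
    """Keep sequences of at least min_len words seen at least min_count times."""
    return {seq for seq, c in counts.items() if c >= min_count and len(seq) >= min_len}


def make_template(script, common, slot="<VAR>"):
    """Keep common sequences as fixed text; collapse all other words into one slot."""
    words = script.split()
    max_len = max((len(s) for s in common), default=0)
    out = []
    i = 0
    while i < len(words):
        match = 0
        # Greedy: try the longest common sequence starting at position i.
        for n in range(min(max_len, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in common:
                match = n
                break
        if match:
            out.extend(words[i:i + match])
            i += match
        else:
            if not out or out[-1] != slot:
                out.append(slot)  # adjacent variable words share one slot
            i += 1
    return " ".join(out)


scripts = [
    "i would like to fly to boston",
    "i would like to fly to denver",
    "i would like to book a hotel",
]
counts = count_word_sequences(scripts)
common = common_sequences(counts, min_count=3)
templates = {make_template(s, common) for s in scripts}
```

With a minimum count of 3, only the sequences shared by all three scripts count as common, so the three scripts collapse to a single template with one fixed portion and one variable portion.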
12. A speech recognition dialogue state space inference method, comprising:
obtaining word scripts for utterances from a plurality of conversations based at least in part on a speech recognition process;
representing the process of each speaker speaking in turn in a given conversation as a sequence of hidden random variables;
representing the probability of occurrence of words and common word sequences as based on the values of the sequence of hidden random variables; and
inferring the probability distributions of the hidden random variables for each word script.
Dependent claims: 13, 14, 15
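One common way to realize the inference step of claim 12 is a discrete hidden Markov model with forward-backward smoothing. The claim itself does not name a model family, so the sketch below is an assumption: each speaker turn is reduced to a single observed token (for example, a common-phrase label), the hidden variable per turn is a dialogue state, and all state names, tokens, and probabilities are made up for illustration.

```python
def forward_backward(observations, states, start_p, trans_p, emit_p):
    """Posterior P(state at turn t | all observations) for a discrete HMM."""
    n = len(observations)
    if n == 0:
        return []
    # Forward pass with per-turn scaling to avoid underflow.
    alpha, scale = [], []
    for t, obs in enumerate(observations):
        row = {}
        for s in states:
            if t == 0:
                prior = start_p[s]
            else:
                prior = sum(alpha[t - 1][r] * trans_p[r][s] for r in states)
            row[s] = prior * emit_p[s][obs]
        z = sum(row.values())
        if z == 0.0:
            raise ValueError("observation sequence has zero probability")
        alpha.append({s: v / z for s, v in row.items()})
        scale.append(z)
    # Backward pass, reusing the forward scaling factors.
    beta = [None] * n
    beta[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        nxt = observations[t + 1]
        beta[t] = {
            s: sum(trans_p[s][r] * emit_p[r][nxt] * beta[t + 1][r] for r in states)
            / scale[t + 1]
            for s in states
        }
    # Combine and renormalize to get the per-turn posterior distribution.
    posteriors = []
    for t in range(n):
        row = {s: alpha[t][s] * beta[t][s] for s in states}
        z = sum(row.values())
        posteriors.append({s: v / z for s, v in row.items()})
    return posteriors


# Hypothetical two-state dialogue model; all numbers are illustrative only.
states = ("greeting", "request")
start_p = {"greeting": 0.8, "request": 0.2}
trans_p = {
    "greeting": {"greeting": 0.3, "request": 0.7},
    "request": {"greeting": 0.1, "request": 0.9},
}
emit_p = {
    "greeting": {"hello": 0.9, "i_would_like": 0.1},
    "request": {"hello": 0.1, "i_would_like": 0.9},
}
turns = ["hello", "i_would_like", "i_would_like"]
posteriors = forward_backward(turns, states, start_p, trans_p, emit_p)
```

Each entry of `posteriors` is a probability distribution over dialogue states for one turn. In this toy model the first turn is most likely a greeting and the last a request.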
16. A speech recognition system, comprising:
means for obtaining acoustic data from a plurality of conversations;
means for selecting a plurality of pairs of utterances from said plurality of conversations;
means for dynamically aligning and computing acoustic similarity of at least one portion of the first utterance of said pair of utterances with at least one portion of the second utterance of said pair of utterances;
means for choosing at least one pair that includes a first portion from a first utterance and a second portion from a second utterance based on a criterion of acoustic similarity; and
means for creating a common pattern template from the first portion and the second portion.
Dependent claims: 17, 18, 19, 20, 21, 22
23. A speech recognition grammar inference system, comprising:
means for obtaining word scripts for utterances from a plurality of conversations based at least in part on a speech recognition process;
means for counting a number of times that each word sequence occurs in the said word scripts;
means for creating a set of common word sequences based on the frequency of occurrence of each word sequence;
means for selecting a set of sample phrases from said word scripts including a plurality of word sequences from said set of common word sequences; and
means for creating a plurality of phrase templates from said set of sample phrases by using said fixed template portions to represent said common word sequences and variable template portions to represent other word sequences in said set of sample phrases.
Dependent claims: 24, 25
26. A speech recognition dialogue state space inference system, comprising:
means for obtaining word scripts for utterances from a plurality of conversations based at least in part on a speech recognition process;
means for representing the process of each speaker speaking in turn in a given conversation as a sequence of hidden random variables;
means for representing the probability of occurrence of words and common word sequences as based on the values of the sequence of hidden random variables; and
means for inferring the probability distributions of the hidden random variables for each word script.
Dependent claims: 27, 28, 29
30. A program product having machine-readable program code for performing speech recognition, the program code, when executed, causing a machine to perform the following steps:
obtaining acoustic data from a plurality of conversations;
selecting a plurality of pairs of utterances from said plurality of conversations;
dynamically aligning and computing acoustic similarity of at least one portion of the first utterance of said pair of utterances with at least one portion of the second utterance of said pair of utterances;
choosing at least one pair that includes a first portion from a first utterance and a second portion from a second utterance based on a criterion of acoustic similarity; and
creating a common pattern template from the first portion and the second portion.
Dependent claims: 31, 32, 33, 34
35. A method of training recognition units and language models for speech recognition, comprising:
obtaining models for common pattern templates for a plurality of types of recognition units;
initializing language models for hidden stochastic processes;
computing probability distribution of hidden state random variables of the hidden stochastic processes representing hidden language model states according to a first predetermined algorithm;
estimating the language models and the models for the common pattern templates for the plurality of types of recognition units using a second predetermined algorithm; and
determining if a convergence criterion has been met for the estimating step, and if so, outputting the language models and the models for the common pattern templates for the plurality of types of recognition units, as an optimized set of models for use in speech recognition.
Dependent claims: 36
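The loop structure of claim 35 (initialize, compute hidden-state posteriors, re-estimate models, test convergence, output) resembles expectation-maximization. The claim leaves the "first" and "second predetermined algorithms" unspecified, so the sketch below is only an analogy: it substitutes a toy two-component mixture over one-dimensional data with unit variance and equal weights for the language and template models, and stops when the log-likelihood gain falls below a tolerance. All names and data are hypothetical.

```python
import math

# Toy observations standing in for training data; two obvious clusters.
DATA = [0.0, 0.2, -0.1, 5.0, 5.1, 4.9]


def e_step(means):
    """Posterior of the hidden component for each point, plus the log-likelihood
    (up to an additive constant)."""
    posteriors, log_lik = [], 0.0
    for x in DATA:
        weights = [0.5 * math.exp(-0.5 * (x - mu) ** 2) for mu in means]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
        log_lik += math.log(total)
    return posteriors, log_lik


def m_step(posteriors):
    """Re-estimate each component mean as a posterior-weighted average."""
    means = []
    for k in range(len(posteriors[0])):
        weight = sum(p[k] for p in posteriors)
        means.append(sum(p[k] * x for p, x in zip(posteriors, DATA)) / weight)
    return means


def train_until_converged(model, tol=1e-6, max_iter=100):
    """Alternate posterior computation and re-estimation until the gain is below tol."""
    prev = -math.inf
    for iteration in range(1, max_iter + 1):
        posteriors, log_lik = e_step(model)
        model = m_step(posteriors)
        if abs(log_lik - prev) < tol:  # convergence criterion
            return model, iteration
        prev = log_lik
    return model, max_iter


means, iterations = train_until_converged([1.0, 4.0])
```

Because the two clusters are well separated, the estimated means settle near the cluster averages within a few iterations. A real system would replace `e_step` and `m_step` with inference and estimation over the hidden language model states and the recognition unit models.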
Specification