Speech recognition process
First Claim
1. A method performed by one or more processing devices, comprising:
performing a preliminary recognition process on first audio, the preliminary recognition process comprising:
identifying one or more candidates for the first audio;
determining a plurality of path costs for the identified candidates, the plurality of path costs corresponding to sequences of sub-phonemes identified in the first audio;
determining a best path cost for each of the identified candidates based on the plurality of path costs;
associating the best path costs with the identified candidates; and
providing the identified candidates and associated best path costs;
generating first templates corresponding to the first audio, each first template comprising a number of elements corresponding to a sequence of sub-phonemes of the first audio;
selecting second templates corresponding to the identified candidates, the second templates representing second audio, each second template comprising elements that correspond to the elements in the first templates;
comparing the first templates to the second templates, wherein comparing comprises determining similarity metrics between the first templates and corresponding second templates, wherein the similarity metrics are based on exponentiated and scaled dynamic time warping (DTW) distances between the selected ones of the first templates and selected ones of the second templates;
applying weights to the similarity metrics to produce weighted similarity metrics, the weights being associated with corresponding second templates;
applying the weighted similarity metrics to corresponding best path costs to produce re-scored path costs, the re-scored path costs being associated with corresponding identified candidates; and
using the re-scored path costs to determine which of the identified candidates corresponds to the first audio.
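The re-scoring steps recited above can be sketched in Python. This is a minimal illustration, not the claimed implementation: the claim does not fix the form of the exponentiation and scaling, how the weighted similarity is combined with a best path cost, or whether lower costs are better. The sketch assumes a similarity of exp(-DTW / scale), subtracts the weighted similarity from the cost, and treats the lowest re-scored cost as the winner. The names `dtw_distance`, `similarity`, `rescore`, `pick_candidate`, and `scale` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] holds the cheapest alignment cost of a[:i] against b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def similarity(first_template, second_template, scale=1.0):
    """Assumed form of an 'exponentiated and scaled' DTW distance."""
    return math.exp(-dtw_distance(first_template, second_template) / scale)


def rescore(best_path_costs, first_template, second_templates, weights, scale=1.0):
    """Apply weighted similarities to each candidate's best path cost.

    best_path_costs:  {candidate: cost from the preliminary pass}
    second_templates: {candidate: [stored template, ...]}
    weights:          {candidate: [weight per stored template, ...]}
    """
    rescored = {}
    for candidate, cost in best_path_costs.items():
        weighted = sum(
            w * similarity(first_template, template, scale)
            for template, w in zip(second_templates[candidate], weights[candidate])
        )
        # Assumption: costs are "lower is better", so similarity reduces cost.
        rescored[candidate] = cost - weighted
    return rescored


def pick_candidate(rescored):
    """Return the candidate with the lowest re-scored path cost."""
    return min(rescored, key=rescored.get)
```

In a toy run where the preliminary pass slightly prefers one candidate but the stored template for another matches the input audio far more closely, the re-scored costs can reverse the ranking, which is the purpose of the second pass.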
Abstract
A speech recognition process may perform the following operations: performing a preliminary recognition process on first audio to identify candidates for the first audio; generating first templates corresponding to the first audio, where each first template includes a number of elements; selecting second templates corresponding to the candidates, where the second templates represent second audio, and where each second template includes elements that correspond to the elements in the first templates; comparing the first templates to the second templates, where comparing includes determining similarity metrics between the first templates and corresponding second templates; applying weights to the similarity metrics to produce weighted similarity metrics, where the weights are associated with corresponding second templates; and using the weighted similarity metrics to determine whether the first audio corresponds to the second audio.
16 Claims
15. One or more non-transitory machine-readable media storing instructions that are executable to perform operations comprising:
performing a preliminary recognition process on first audio, the preliminary recognition process comprising:
identifying one or more candidates for the first audio;
determining a plurality of path costs for the identified candidates, the plurality of path costs corresponding to sequences of sub-phonemes identified in the first audio;
determining a best path cost for each of the identified candidates based on the plurality of path costs;
associating the best path costs with the identified candidates; and
providing the identified candidates and associated best path costs;
generating first templates corresponding to the first audio, each first template comprising a number of elements corresponding to a sequence of sub-phonemes of the first audio;
selecting second templates corresponding to the identified candidates, the second templates representing second audio, each second template comprising elements that correspond to the elements in the first templates;
comparing the first templates to the second templates, wherein comparing comprises determining similarity metrics between the first templates and corresponding second templates, wherein the similarity metrics are based on exponentiated and scaled dynamic time warping (DTW) distances between the selected ones of the first templates and selected ones of the second templates;
applying weights to the similarity metrics to produce weighted similarity metrics, the weights being associated with corresponding second templates;
applying the weighted similarity metrics to corresponding best path costs to produce re-scored path costs, the re-scored path costs being associated with corresponding identified candidates; and
using the re-scored path costs to determine which of the identified candidates corresponds to the first audio.
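The preliminary recognition step, which reduces a plurality of path costs to one best path cost per candidate, might look like the following sketch. It assumes that "best" means the lowest cost and that the preliminary pass yields (candidate, sub-phoneme sequence, cost) tuples; neither detail is stated in the claim, and the function name `best_path_costs` is hypothetical.

```python
def best_path_costs(path_costs):
    """Associate each candidate with its best (assumed: lowest) path cost.

    path_costs: iterable of (candidate, sub_phoneme_sequence, cost) tuples,
    where several tuples may share the same candidate.
    """
    best = {}
    for candidate, _sub_phonemes, cost in path_costs:
        # Keep only the cheapest path seen so far for this candidate.
        if candidate not in best or cost < best[candidate]:
            best[candidate] = cost
    return best
```

The returned mapping plays the role of the "identified candidates and associated best path costs" that the later re-scoring steps consume.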
16. A system comprising:
memory to store an acoustic model; and
one or more processing devices to perform operations associated with the acoustic model, the acoustic model comprising:
a first pass module to perform a preliminary recognition process on first audio, the preliminary recognition process comprising:
identifying one or more candidates for the first audio;
determining a plurality of path costs for the identified candidates, the plurality of path costs corresponding to sequences of sub-phonemes identified in the first audio;
determining a best path cost for each of the identified candidates based on the plurality of path costs;
associating the best path costs with the identified candidates; and
providing the identified candidates and associated best path costs;
a second pass module to:
generate first templates corresponding to the first audio, each first template comprising a number of elements corresponding to a sequence of sub-phonemes of the first audio;
select second templates corresponding to the identified candidates, the second templates representing second audio, each second template comprising elements that correspond to the elements in the first templates;
compare the first templates to the second templates, wherein comparing comprises determining similarity metrics between the first templates and corresponding second templates, wherein the similarity metrics are based on exponentiated and scaled dynamic time warping (DTW) distances between the selected ones of the first templates and selected ones of the second templates;
apply weights to the similarity metrics to produce weighted similarity metrics, the weights being associated with corresponding second templates;
apply the weighted similarity metrics to corresponding best path costs to produce re-scored path costs, the re-scored path costs being associated with corresponding identified candidates; and
use the re-scored path costs to determine which of the identified candidates corresponds to the first audio.
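The two-module arrangement in this claim can be pictured as a small composition of a first pass and a second pass. The sketch below is only a structural illustration under stated assumptions: the class name `AcousticModel`, the `recognize` method, the callable signatures, and the toy stand-in functions are all hypothetical, and the lowest re-scored cost is assumed to win.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Audio = Sequence[float]
PathCosts = Dict[str, float]  # candidate -> path cost


@dataclass
class AcousticModel:
    # First pass: audio -> candidates with their best path costs.
    first_pass: Callable[[Audio], PathCosts]
    # Second pass: audio and first-pass costs -> re-scored path costs.
    second_pass: Callable[[Audio, PathCosts], PathCosts]

    def recognize(self, audio: Audio) -> str:
        best = self.first_pass(audio)
        rescored = self.second_pass(audio, best)
        # Assumption: the lowest re-scored cost identifies the candidate.
        return min(rescored, key=rescored.get)


def toy_first_pass(audio: Audio) -> PathCosts:
    """Stand-in for a preliminary recognizer; returns fixed costs."""
    return {"cat": 10.0, "cap": 9.8}


def toy_second_pass(audio: Audio, costs: PathCosts) -> PathCosts:
    """Stand-in for template re-scoring; applies fixed adjustments."""
    adjustments = {"cat": 1.0, "cap": 0.0}
    return {c: cost - adjustments[c] for c, cost in costs.items()}


model = AcousticModel(first_pass=toy_first_pass, second_pass=toy_second_pass)
```

Keeping the passes as separate callables mirrors the claim's division of labor: the first pass narrows the search to a few candidates, and the second pass only needs to compare templates for those candidates.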
Specification