Phoneme lattice construction and its application to speech recognition and keyword spotting
Abstract
An arrangement is provided for using a phoneme lattice for speech recognition and/or keyword spotting. The phoneme lattice may be constructed for an input speech signal and searched to produce a textual representation for the input speech signal and/or to determine if the input speech signal contains targeted keywords. An expectation maximization (EM) trained phoneme confusion matrix may be used when searching the phoneme lattice. The phoneme lattice may be constructed in a client and sent to a server, which may search the phoneme lattice to produce a result.
59 Claims
1. A method for processing a speech signal, comprising:
receiving an input speech signal;
constructing a phoneme lattice for the input speech signal;
searching the phoneme lattice to produce a likelihood score for each potential path; and
determining a processing result for the input speech signal based on the likelihood score of each potential path.
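The search and decision steps of claim 1 can be illustrated with a minimal Python sketch. It assumes a toy acyclic lattice stored as a list of arcs; the arc layout, the names `ARCS`, `enumerate_paths` and `best_path`, and the log-likelihood values are all hypothetical and are not taken from the patent. Every start-to-end path receives a score (the sum of its arc log-likelihoods), and the processing result is taken to be the highest-scoring path.

```python
from collections import defaultdict

# Hypothetical toy lattice: (from_vertex, to_vertex, phoneme, log_likelihood).
# Vertices are numbered in time order, so the graph is acyclic.
ARCS = [
    (0, 1, "k", -0.2),
    (0, 1, "g", -1.6),
    (1, 2, "ae", -0.3),
    (1, 2, "eh", -1.2),
    (2, 3, "t", -0.1),
    (2, 3, "d", -2.3),
]


def enumerate_paths(arcs, start, end):
    """Yield (phoneme_sequence, total_log_likelihood) for every start-to-end path."""
    outgoing = defaultdict(list)
    for src, dst, phoneme, score in arcs:
        outgoing[src].append((dst, phoneme, score))

    def walk(vertex, phonemes, total):
        if vertex == end:
            yield tuple(phonemes), total
            return
        for dst, phoneme, score in outgoing[vertex]:
            yield from walk(dst, phonemes + [phoneme], total + score)

    yield from walk(start, [], 0.0)


def best_path(arcs, start, end):
    """Pick the path with the highest likelihood score as the processing result."""
    return max(enumerate_paths(arcs, start, end), key=lambda item: item[1])


paths = list(enumerate_paths(ARCS, 0, 3))
result = best_path(ARCS, 0, 3)
```

Exhaustive enumeration is only practical for small lattices; a dynamic-programming search would normally replace it, but the enumeration makes the "score for each potential path" wording explicit.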
8. A method for constructing a phoneme lattice for an input audio signal comprising:
segmenting the input audio signal into frames;
extracting acoustic features for a frame of the input audio signal;
determining K-best initial phoneme paths leading to the frame based on a first score of each potential phoneme path leading to the frame; and
calculating a second score for each of the K-best phoneme paths for the frame.
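The K-best selection and rescoring steps of claim 8 might look like the following Python sketch. The claim text here does not define the first or second score, so both are assumptions: the first score is taken to be an accumulated per-frame log score (the hypothetical `FRAME_SCORES` stands in for the output of feature extraction and an acoustic model), and the second score adds a hypothetical penalty for each change of phoneme between frames.

```python
import heapq


def k_best_paths(frame_scores, k):
    """Keep the k highest-scoring phoneme paths leading to each frame.

    frame_scores: list of dicts, one per frame, mapping phoneme -> log score.
    Returns one beam per frame, each a list of (path, first_score) pairs.
    """
    beams = []
    beam = [((), 0.0)]  # the empty path before the first frame
    for scores in frame_scores:
        candidates = [
            (path + (phoneme,), total + s)
            for path, total in beam
            for phoneme, s in scores.items()
        ]
        beam = heapq.nlargest(k, candidates, key=lambda c: c[1])
        beams.append(beam)
    return beams


def second_score(path, first_score, switch_penalty=-0.5):
    """Hypothetical rescoring: penalise each change of phoneme between frames."""
    switches = sum(1 for a, b in zip(path, path[1:]) if a != b)
    return first_score + switch_penalty * switches


# Toy per-frame scores standing in for an acoustic model's output.
FRAME_SCORES = [
    {"s": -0.1, "z": -1.0},
    {"s": -0.4, "z": -0.3},
    {"iy": -0.2, "s": -2.0},
]
beams = k_best_paths(FRAME_SCORES, k=2)
rescored = [(p, second_score(p, s)) for p, s in beams[-1]]
```

In this toy data the path that ranks first on the first score ranks second after rescoring, which is the kind of reordering a second pass is meant to allow.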
11. A method for searching a phoneme lattice, comprising:
receiving a phoneme lattice;
traversing the phoneme lattice via potential paths; and
computing a score for a traversed path based on at least one of a phoneme confusion matrix and a plurality of language models.
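One way to combine a phoneme confusion matrix and a language model into a path score, as claim 11 describes, is sketched below in Python. The table contents, the one-to-one alignment between observed and intended phonemes, the single bigram model (the claim speaks of a plurality of language models), and the names `CONFUSION`, `BIGRAM`, `FLOOR` and `path_score` are all assumptions made for illustration.

```python
import math

# Hypothetical confusion probabilities, keyed (intended, observed).
CONFUSION = {
    ("k", "k"): 0.9, ("k", "g"): 0.1,
    ("ae", "ae"): 0.8, ("ae", "eh"): 0.2,
    ("t", "t"): 0.7, ("t", "d"): 0.3,
}
# Hypothetical phoneme bigram model P(next | previous); "<s>" marks the start.
BIGRAM = {("<s>", "k"): 0.5, ("k", "ae"): 0.4, ("ae", "t"): 0.6}

FLOOR = 1e-6  # probability assigned to events missing from either table


def path_score(observed, intended, acoustic_logps, lm_weight=1.0):
    """Log score of one traversed path against an intended phoneme sequence.

    Assumes a one-to-one alignment (no insertions or deletions).
    """
    if not (len(observed) == len(intended) == len(acoustic_logps)):
        raise ValueError("sequences must have equal length")
    score = 0.0
    previous = "<s>"
    for obs, ref, acoustic in zip(observed, intended, acoustic_logps):
        score += acoustic                                         # lattice arc score
        score += math.log(CONFUSION.get((ref, obs), FLOOR))       # confusion term
        score += lm_weight * math.log(BIGRAM.get((previous, ref), FLOOR))  # LM term
        previous = ref
    return score


exact = path_score(["k", "ae", "t"], ["k", "ae", "t"], [-0.2, -0.3, -0.1])
confused = path_score(["g", "ae", "d"], ["k", "ae", "t"], [-0.2, -0.3, -0.1])
```

A path whose phonemes match the intended sequence scores higher than one that relies on two confusions, but the confused path still receives a finite score, which is what lets a search tolerate recognition errors.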
15. A method for distributing speech processing, comprising:
receiving an input speech signal by a client;
constructing a phoneme lattice for the input speech signal by the client;
transmitting the phoneme lattice from the client to a server; and
searching the phoneme lattice to produce a result for the input speech signal for the purpose of at least one of recognizing speech and spotting keywords in the input speech signal.
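The client/server split of claim 15 can be sketched in Python as two functions that exchange a serialised lattice. The JSON wire format, the field names, and the functions `client_build_lattice` and `server_search` are hypothetical; the claim does not specify a transport or an encoding, and no network code is shown.

```python
import json


def client_build_lattice(phoneme_hypotheses):
    """Client side: turn per-segment hypotheses into a lattice and serialise it.

    phoneme_hypotheses: list of dicts, one per segment, phoneme -> log score.
    """
    arcs = [
        {"src": i, "dst": i + 1, "phoneme": p, "logp": s}
        for i, segment in enumerate(phoneme_hypotheses)
        for p, s in segment.items()
    ]
    lattice = {"start": 0, "end": len(phoneme_hypotheses), "arcs": arcs}
    return json.dumps(lattice).encode("utf-8")  # bytes for any transport


def server_search(payload):
    """Server side: decode the lattice and return its best phoneme path."""
    lattice = json.loads(payload.decode("utf-8"))
    best = {lattice["start"]: (0.0, [])}  # vertex -> (score, phonemes)
    # Arcs only go from vertex i to i + 1 here, so source order is a valid order.
    for arc in sorted(lattice["arcs"], key=lambda a: a["src"]):
        if arc["src"] not in best:
            continue
        score, phonemes = best[arc["src"]]
        candidate = (score + arc["logp"], phonemes + [arc["phoneme"]])
        if arc["dst"] not in best or candidate[0] > best[arc["dst"]][0]:
            best[arc["dst"]] = candidate
    score, phonemes = best[lattice["end"]]
    return {"phonemes": phonemes, "logp": score}


payload = client_build_lattice([{"hh": -0.1, "f": -2.0}, {"ay": -0.2, "ey": -0.9}])
response = server_search(payload)
```

Only the compact lattice crosses the boundary between the two functions, so the client never needs the search models and the server never needs the audio.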
21. A method for training a phoneme confusion matrix, comprising:
initializing the phoneme confusion matrix;
estimating confusion probabilities between phonemes based on a training database and the initial phoneme confusion matrix; and
updating the phoneme confusion matrix based on the estimated confusion probabilities.
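The initialise, estimate, update loop of claim 21 resembles an expectation maximization procedure, which the abstract names. The Python sketch below is one simplified reading, not the patented training method: it assumes each training item is already aligned to a single reference phoneme and carries hypothetical acoustic likelihoods for the candidate observed phonemes. The names `init_confusion`, `em_step`, `PHONEMES` and `DATA` are illustrative.

```python
from collections import defaultdict

PHONEMES = ["p", "b", "t"]


def init_confusion(phonemes, self_prob=0.8):
    """Initial matrix C[ref][obs]: most mass on the diagonal, rest spread evenly."""
    off = (1.0 - self_prob) / (len(phonemes) - 1)
    return {r: {o: (self_prob if r == o else off) for o in phonemes} for r in phonemes}


def em_step(confusion, training_items):
    """One EM-style update of the confusion matrix.

    training_items: list of (ref_phoneme, {obs_phoneme: acoustic_likelihood}).
    E-step: posterior over the observed phoneme, weighted by the current matrix.
    M-step: renormalise the expected counts for each reference phoneme.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for ref, likelihoods in training_items:
        weights = {o: confusion[ref][o] * l for o, l in likelihoods.items()}
        total = sum(weights.values())
        for obs, w in weights.items():
            counts[ref][obs] += w / total
    updated = {}
    for ref, row in confusion.items():
        row_total = sum(counts[ref].values())
        if row_total == 0.0:
            updated[ref] = dict(row)  # no evidence for this phoneme: keep the old row
        else:
            updated[ref] = {o: counts[ref].get(o, 0.0) / row_total for o in row}
    return updated


# Hypothetical training items: "p" is usually clear but sometimes sounds like "b".
DATA = [
    ("p", {"p": 0.9, "b": 0.1}),
    ("p", {"p": 0.9, "b": 0.1}),
    ("p", {"p": 0.2, "b": 0.8}),
    ("t", {"t": 1.0}),
]
matrix = init_confusion(PHONEMES)
for _ in range(5):
    matrix = em_step(matrix, DATA)
```

Each row of the matrix stays a probability distribution after every update, and rows for phonemes with no training evidence keep their initial values.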
24. A speech processing system, comprising:
a phoneme lattice constructor to construct a phoneme lattice for an input speech signal;
a phoneme lattice search mechanism to search the phoneme lattice for the purpose of at least one of recognizing speech and spotting keywords in the input speech signal;
a plurality of models for lattice construction; and
a plurality of models for lattice search.
28. A system for constructing a phoneme lattice, comprising:
an acoustic feature extractor to segment an input speech signal into frames and to extract acoustic features for a frame;
a phoneme path estimator to determine K-best initial phoneme paths leading to the frame;
a global score evaluator to determine M-best refined phoneme paths based on a cluster of K-best paths of at least one consecutive frame; and
a lattice parameter identifier to identify lattice vertices and arc parameters based on M-best refined phoneme paths of each frame.
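The last component of claim 28, deriving vertices and arcs from the M-best paths, could be sketched in Python as follows. This covers only the lattice parameter identifier and rests on assumptions: paths are given as one phoneme label per frame, an arc is a run of identical labels, and a vertex is a frame boundary where some arc starts or ends. The names `path_to_arcs`, `build_lattice` and `M_BEST` are hypothetical, and arc scores are omitted.

```python
from itertools import groupby


def path_to_arcs(frame_labels):
    """Collapse a frame-level phoneme path into (start_frame, end_frame, phoneme) arcs."""
    arcs, start = [], 0
    for phoneme, run in groupby(frame_labels):
        length = len(list(run))
        arcs.append((start, start + length, phoneme))
        start += length
    return arcs


def build_lattice(m_best_paths):
    """Merge arcs from several paths; vertices are the distinct arc boundaries."""
    arcs = sorted({arc for path in m_best_paths for arc in path_to_arcs(path)})
    vertices = sorted({frame for s, e, _ in arcs for frame in (s, e)})
    return vertices, arcs


# Hypothetical M-best refined paths, one phoneme label per frame.
M_BEST = [
    ["s", "s", "iy", "iy", "iy"],
    ["s", "s", "ih", "ih", "iy"],
    ["z", "s", "iy", "iy", "iy"],
]
vertices, arcs = build_lattice(M_BEST)
```

Arcs shared by several paths are stored once, which is why a lattice is a more compact representation than the list of paths it was built from.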
31. A distributed speech processing system, comprising:
a client to receive an input speech signal and to construct a phoneme lattice for the input speech signal; and
a server to search the phoneme lattice to produce a result for the input speech signal for the purpose of at least one of recognizing speech and spotting keywords in the input speech signal.
34. A system for training a phoneme confusion matrix, comprising:
a confusion matrix initializer to initialize the phoneme confusion matrix;
a phoneme lattice constructor to construct a phoneme lattice for each utterance in a training database; and
a phoneme lattice search mechanism to search the phoneme lattice to produce a phoneme sequence hypothesis for the corresponding utterance, based on the initial phoneme confusion matrix and a plurality of language models.
37. An article comprising:
a machine accessible medium having content stored thereon, wherein when the content is accessed by a processor, the content provides for processing a speech signal by:
receiving an input speech signal;
constructing a phoneme lattice for the input speech signal;
searching the phoneme lattice to produce a likelihood score for each potential path; and
determining a processing result for the input speech signal based on the likelihood score of each potential path.
44. An article comprising:
a machine accessible medium having content stored thereon, wherein when the content is accessed by a processor, the content provides for constructing a phoneme lattice for an input audio signal by:
segmenting the input audio signal into frames;
extracting acoustic features for a frame of the input audio signal;
determining K-best initial phoneme paths leading to the frame based on a first score of each potential phoneme path leading to the frame; and
calculating a second score for each of the K-best phoneme paths for the frame.
47. An article comprising:
a machine accessible medium having content stored thereon, wherein when the content is accessed by a processor, the content provides for searching a phoneme lattice by:
receiving a phoneme lattice;
traversing the phoneme lattice via potential paths; and
computing a score for a traversed path based on at least one of a phoneme confusion matrix and a plurality of language models.
51. An article comprising:
a machine accessible medium having content stored thereon, wherein when the content is accessed by a processor, the content provides for distributing speech processing by:
receiving an input speech signal by a client;
constructing a phoneme lattice for the input speech signal by the client;
transmitting the phoneme lattice from the client to a server; and
searching the phoneme lattice to produce a result for the input speech signal for the purpose of at least one of recognizing speech and spotting keywords in the input speech signal.
57. An article comprising:
a machine accessible medium having content stored thereon, wherein when the content is accessed by a processor, the content provides for training a phoneme confusion matrix by:
initializing the phoneme confusion matrix;
estimating confusion probabilities between phonemes based on a training database and the initial phoneme confusion matrix; and
updating the phoneme confusion matrix based on the estimated confusion probabilities.
Specification