Phoneme lattice construction and its application to speech recognition and keyword spotting

US 7,725,319 B2
Filed: 07/07/2003
Issued: 05/25/2010
Est. Priority Date: 07/07/2003
Status: Expired due to Fees

Abstract

An arrangement is provided for using a phoneme lattice for speech recognition and/or keyword spotting. The phoneme lattice may be constructed for an input speech signal and searched to produce a textual representation for the input speech signal and/or to determine if the input speech signal contains targeted keywords. An expectation maximization (EM) trained phoneme confusion matrix may be used when searching the phoneme lattice. The phoneme lattice may be constructed in a client and sent to a server, which may search the phoneme lattice to produce a result.
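As a rough illustration of the structure the abstract describes, the sketch below models a phoneme lattice as a set of scored arcs between frame indices and enumerates every path that spans the utterance. The `Arc` fields, the function name, and the toy scores are assumptions made for illustration; they are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One lattice arc: a phoneme hypothesis between two frame indices (hypothetical layout)."""
    phoneme: str
    start_frame: int
    end_frame: int      # must be greater than start_frame
    log_score: float    # log-likelihood of this phoneme over the span


def enumerate_paths(arcs, first_frame, last_frame):
    """Yield (phoneme tuple, summed log score) for each path from first_frame to last_frame."""
    by_start = {}
    for arc in arcs:
        by_start.setdefault(arc.start_frame, []).append(arc)

    def walk(frame, phonemes, score):
        if frame == last_frame:
            yield tuple(phonemes), score
            return
        for arc in by_start.get(frame, []):
            yield from walk(arc.end_frame, phonemes + [arc.phoneme], score + arc.log_score)

    yield from walk(first_frame, [], 0.0)


# Toy lattice with two competing hypotheses at the first and last positions.
arcs = [
    Arc("k", 0, 3, -0.5), Arc("g", 0, 3, -1.2),
    Arc("ae", 3, 7, -0.3),
    Arc("t", 7, 10, -0.4), Arc("d", 7, 10, -0.9),
]
paths = list(enumerate_paths(arcs, 0, 10))
best_phonemes, best_score = max(paths, key=lambda p: p[1])
```

Exhaustive enumeration is only practical for small lattices; it is used here to keep the relationship between arcs, paths, and likelihood scores visible.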

33 Citations

13 Claims

1. A method for processing a speech signal, comprising:
- using a memory, coupled to a processor, to receive an input speech signal;
- using the processor to construct a phoneme lattice for the input speech signal;
- determining vertices and arc parameters of the phoneme lattice for the input speech signal;
- searching the phoneme lattice to produce a likelihood score for each potential path; and
- determining a processing result for the input speech signal based on the likelihood score of each potential path;
- wherein constructing the phoneme lattice includes:
  - segmenting an input speech signal into frames,
  - extracting acoustic features for a frame of the input speech signal,
  - determining K-best initial phoneme paths leading to the frame based on a first score of each potential phoneme path leading to the frame, and
  - calculating a second score for each of the K-best phoneme paths for the frame;
- wherein searching the phoneme lattice comprises:
  - receiving a phoneme lattice;
  - traversing the phoneme lattice via potential paths;
  - computing a score for a traversed path based on at least one of a phoneme confusion matrix and a plurality of language models; and
  - modifying the score for the traversed path by allowing repetition of phonemes and allowing flexible endpoints for phonemes in a path such that at least one of a first arc that ends at a first frame and a second arc that starts at a third frame is extended so that the first arc and the second arc are directly connected at a second frame.

2. The method of claim 1, wherein determining the processing result comprises determining at least one of the following:
- at least one candidate textual representation of the input speech signal and a likelihood that the input speech signal contains targeted keywords.

3. The method of claim 1, wherein determining vertices and arc parameters of the phoneme lattice comprises:
- clustering together K-best initial phoneme paths for at least one consecutive frame; and
- selecting M-best refined phoneme paths among the clustered phoneme paths based on second scores of these paths.

4. The method of claim 1, wherein the first score and the second score comprise a score based on phoneme acoustic models and language models.
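
The K-best and M-best selection steps recited in claims 1 and 3 can be sketched as two top-N selections. This is a minimal illustration under stated assumptions: paths are represented as tuples of phoneme labels, the first scores are given as input, and the `rescore` callback stands in for whatever second score the acoustic and language models would produce. None of these names come from the patent.

```python
import heapq


def k_best_paths(candidates, k):
    """Keep the k candidate paths with the highest first score.

    candidates: list of (path, first_score) pairs for one frame.
    """
    return heapq.nlargest(k, candidates, key=lambda c: c[1])


def m_best_refined(frames_k_best, rescore, m):
    """Cluster the K-best paths of consecutive frames, rescore them, keep the m best.

    rescore: hypothetical callback returning the second score of a path.
    """
    cluster = {path for frame in frames_k_best for path, _ in frame}
    rescored = [(path, rescore(path)) for path in cluster]
    return heapq.nlargest(m, rescored, key=lambda c: c[1])


# Toy first scores for two consecutive frames (made-up log-likelihoods).
frame_10 = [(("k", "ae"), -2.0), (("g", "ae"), -2.5), (("k", "eh"), -3.1), (("t", "ae"), -4.0)]
frame_11 = [(("k", "ae"), -2.2), (("k", "eh"), -2.4), (("g", "ae"), -3.5)]
k_best = [k_best_paths(frame_10, 2), k_best_paths(frame_11, 2)]

# Toy second scores, standing in for a fuller acoustic plus language model score.
second_scores = {("k", "ae"): -1.8, ("g", "ae"): -2.9, ("k", "eh"): -2.1}
refined = m_best_refined(k_best, second_scores.__getitem__, 2)
```

Using a set for the cluster merges a path that appears in the K-best list of more than one frame, so it is rescored only once.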

5. A method for distributing speech processing, comprising:
- using a memory, included in a client, to receive an input speech signal;
- using a processor, included in the client and coupled to the memory, to construct a phoneme lattice for the input speech signal;
- determining vertices and arc parameters of the phoneme lattice for the input speech signal;
- transmitting the phoneme lattice from the client to a server; and
- searching the phoneme lattice to produce a result for the input speech signal for the purpose of at least one of recognizing speech and spotting keywords, in the input speech signal;
- wherein constructing the phoneme lattice includes:
  - segmenting an input speech signal into frames,
  - extracting acoustic features for a frame of the input speech signal,
  - determining K-best initial phoneme paths leading to the frame based on a first score of each potential phoneme path leading to the frame, and
  - calculating a second score for each of the K-best phoneme paths;
- wherein searching the phoneme lattice comprises:
  - receiving a phoneme lattice;
  - traversing the phoneme lattice via potential paths;
  - computing a score for a traversed path based on at least one of a phoneme confusion matrix and a plurality of language models; and
  - modifying the score for the traversed path by allowing repetition of phonemes and allowing flexible endpoints for phonemes in a path such that at least one of a first arc that ends at a first frame and a second arc that starts at a third frame is extended such that the first arc and the third arc are directly connected at a second frame.

6. The method of claim 5, wherein determining vertices and arc parameters of the phoneme lattice comprises:
- clustering together K-best initial phoneme paths for at least one consecutive frame; and
- selecting M-best refined phoneme paths among the clustered phoneme paths based on second scores of these paths.

7. The method of claim 5, wherein the first score and the second score comprise a score based on phoneme acoustic models and phoneme language models.
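
Claim 5 has the client build the lattice and transmit it to a server for searching. The claims do not specify a wire format, so the sketch below simply assumes a JSON payload of arcs; the field names `ph`, `s`, `e`, and `score` and both function names are hypothetical.

```python
import json


def encode_lattice(arcs):
    """Client side: pack (phoneme, start_frame, end_frame, log_score) arcs into bytes."""
    payload = {
        "arcs": [
            {"ph": phoneme, "s": start, "e": end, "score": score}
            for phoneme, start, end, score in arcs
        ]
    }
    return json.dumps(payload).encode("utf-8")


def decode_lattice(data):
    """Server side: recover the list of arc tuples from the received bytes."""
    payload = json.loads(data.decode("utf-8"))
    return [(a["ph"], a["s"], a["e"], a["score"]) for a in payload["arcs"]]


client_arcs = [("k", 0, 3, -0.5), ("ae", 3, 7, -0.3), ("t", 7, 10, -0.4)]
wire_bytes = encode_lattice(client_arcs)
server_arcs = decode_lattice(wire_bytes)
```

Sending the lattice instead of raw audio or full feature vectors keeps the payload small, which is one plausible motivation for splitting the work between client and server this way.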

8. A speech processing system, comprising:
- a phoneme lattice constructor to construct a phoneme lattice for an input speech signal;
- a phoneme lattice search mechanism to search the phoneme lattice for the purpose of at least one of recognizing speech and spotting keywords, in the input speech signal;
- a plurality of models for lattice construction; and
- a plurality of models for lattice search;
- wherein the phoneme lattice constructor includes:
  - an acoustic feature extractor to segment the input speech signal into frames and to extract acoustic features for a frame,
  - a phoneme path estimator to determine K-best initial phoneme paths leading to the frame,
  - a global score evaluator to determine M-best refined phoneme paths based on a cluster of K-best paths of at least one consecutive frame, and
  - a lattice parameter identifier to identify lattice vertices and arc parameters based on M-best refined phoneme paths of each frame, wherein at least one of a first arc that ends at a first frame and a second arc that starts at a third frame is extended such that the first arc and the third arc are directly connected at a second frame.

9. The system of claim 8, wherein the plurality of models for lattice construction comprise a plurality of phoneme acoustic models and a plurality of language models.

10. The system of claim 8, wherein the plurality of models for lattice search comprise a phoneme confusion matrix and a plurality of language models.
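
One way to picture how a phoneme confusion matrix (claim 10) could score a traversed path against a keyword pronunciation is shown below. It is a sketch under assumptions, not the patented scoring rule: the matrix is a dictionary keyed by (intended, observed) phoneme pairs with made-up probabilities, repeated phonemes are handled by collapsing consecutive duplicates, and unseen pairs receive a small floor probability.

```python
import math
from itertools import groupby


def collapse_repeats(phonemes):
    """Merge consecutive duplicates, e.g. ['ae', 'ae', 't'] becomes ['ae', 't']."""
    return [p for p, _ in groupby(phonemes)]


def confusion_log_score(observed, target, confusion, floor=1e-6):
    """Sum log P(observed | intended) over aligned phonemes.

    confusion: dict mapping (intended, observed) to a probability (hypothetical values).
    Returns -inf when the collapsed path and the target differ in length.
    """
    observed = collapse_repeats(observed)
    if len(observed) != len(target):
        return float("-inf")
    return sum(math.log(confusion.get((t, o), floor)) for t, o in zip(target, observed))


confusion = {
    ("k", "k"): 0.8, ("k", "g"): 0.2,
    ("ae", "ae"): 0.9, ("ae", "eh"): 0.1,
    ("t", "t"): 0.7, ("t", "d"): 0.3,
}
# The lattice path heard "g ae ae t"; the keyword pronunciation is "k ae t".
score = confusion_log_score(["g", "ae", "ae", "t"], ["k", "ae", "t"], confusion)
```

Collapsing duplicates also merges pronunciations that genuinely contain a doubled phoneme, so a fuller implementation would need alignment rather than this simplification.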

11. An article comprising:
- a machine accessible medium having content stored thereon, wherein the content is accessed by a processor, the content provides for processing a speech signal by:
  - receiving an input speech signal;
  - constructing a phoneme lattice for the input speech signal;
  - determining arc parameters of the phoneme lattice;
  - receiving a phoneme lattice;
  - traversing the phoneme lattice via potential paths;
  - computing a score for a traversed path based on at least one of a phoneme confusion matrix and a plurality of language models; and
  - modifying the score based on flexible endpoints for phonemes in the traversed path; and
  - determining a processing result for the input speech signal based on the modified score.

12. An article comprising:
- a machine accessible medium having content stored thereon, wherein when the content is accessed by a processor, the content provides for distributing speech processing by:
  - receiving an input speech signal by a client;
  - constructing a phoneme lattice for the input speech signal by the client;
  - determining vertices and arc parameters of the phoneme lattice for the input speech signal;
  - transmitting the phoneme lattice from the client to a server; and
  - searching the phoneme lattice to produce a result for the input speech signal for the purpose of at least one of recognizing speech and spotting keywords, in the input speech signal;
- wherein constructing the phoneme lattice includes:
  - segmenting an input speech signal into frames,
  - extracting acoustic features for a frame of the input speech signal,
  - determining K-best initial phoneme paths leading to the frame based on a first score of each potential phoneme path leading to the frame, and
  - calculating a second score for each of the K-best phoneme paths;
- wherein searching the phoneme lattice comprises:
  - receiving a phoneme lattice;
  - traversing the phoneme lattice via potential paths;
  - computing a score for a traversed path based on at least one of a phoneme confusion matrix and a plurality of language models; and
  - modifying the score for the traversed path by allowing flexible endpoints for phonemes in a path such that, based on the flexible endpoints, at least one of a first arc that ends at a first frame and a second arc that starts at a third frame is extended so that the first arc and the second arc are directly connected at a second frame.

13. The article of claim 12, wherein determining vertices and arc parameters of the phoneme lattice comprises:
- clustering together K-best initial phoneme paths for at least one consecutive frame; and
- selecting M-best refined phoneme paths among the clustered phoneme paths based on second scores of these paths.
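
The flexible-endpoint language in the claims (a first arc ending at a first frame and a second arc starting at a third frame, extended to meet at a second frame) can be illustrated with a small helper. The midpoint default and the per-frame penalty are assumptions chosen for the example; the claims do not say how the meeting frame is picked or how the score is modified.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """Hypothetical arc record: phoneme label, frame span, and log score."""
    phoneme: str
    start: int
    end: int
    log_score: float


def connect_arcs(first, second, meet=None, penalty_per_frame=0.1):
    """Extend one or both arcs so that first.end == second.start == meet.

    Each frame of extension costs penalty_per_frame in log score (an assumed rule).
    """
    if second.start < first.end:
        raise ValueError("arcs overlap; nothing to bridge")
    if meet is None:
        meet = (first.end + second.start) // 2  # assumed default: split the gap
    if not first.end <= meet <= second.start:
        raise ValueError("meeting frame must lie within the gap")
    new_first = first._replace(
        end=meet, log_score=first.log_score - penalty_per_frame * (meet - first.end))
    new_second = second._replace(
        start=meet, log_score=second.log_score - penalty_per_frame * (second.start - meet))
    return new_first, new_second


# A gap between frame 4 (first frame) and frame 6 (third frame) is closed at frame 5.
left, right = connect_arcs(Arc("k", 0, 4, -0.5), Arc("ae", 6, 10, -0.3))
```

Passing `meet=4` or `meet=6` extends only one of the two arcs, which matches the "at least one of" wording in the claims.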

Current Assignee: Dialogic, Inc. (Enghouse Systems Limited)
Original Assignee: Dialogic, Inc. (Enghouse Systems Limited)
Inventors: Aronowitz, Hagai
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US10/616,310
Publication Number: US 20050010412A1
Time in Patent Office: 2,514 Days
Field of Search: 704/253
US Class Current: 704/253
CPC Class Codes:
- G10L 15/02   Feature extraction for spee...
- G10L 15/04   Segmentation; Word boundary...
- G10L 15/06   Creation of reference templ...
- G10L 2015/025   Phonemes, fenemes or fenone...
