Methods and apparatus for speech segmentation using multiple metadata
Abstract
Methods and apparatus to process microphone signals by a speech enhancement module to generate an audio stream signal including first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata and transitioning to a speech state, in which speech recognition is performed, based upon the second metadata.
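The endpointing behavior summarized in the abstract can be pictured as a three-state machine. The following is a minimal sketch only, not the patented implementation: the names `State`, `Frame`, `Endpointer`, `fast_flag`, and `slow_flag` are hypothetical, each metadata stream is reduced to one boolean per frame, and the rules for returning to the silence state are assumptions, since the abstract does not describe them.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    SILENCE = auto()
    MAYBE_SPEECH = auto()
    SPEECH = auto()


@dataclass
class Frame:
    samples: bytes
    fast_flag: bool  # first metadata: low latency, lower confidence (assumed boolean)
    slow_flag: bool  # second metadata: higher latency, higher confidence (assumed boolean)


@dataclass
class Endpointer:
    state: State = State.SILENCE
    buffer: list = field(default_factory=list)      # audio held while in MAYBE_SPEECH
    recognized: list = field(default_factory=list)  # stand-in for audio passed to the recognizer

    def push(self, frame: Frame) -> None:
        if self.state is State.SILENCE:
            if frame.fast_flag:
                # First metadata fires: mark the endpoint and start buffering.
                self.state = State.MAYBE_SPEECH
                self.buffer = [frame.samples]
        elif self.state is State.MAYBE_SPEECH:
            self.buffer.append(frame.samples)
            if frame.slow_flag:
                # Second metadata confirms: recognize from the buffered endpoint.
                self.state = State.SPEECH
                self.recognized.extend(self.buffer)
                self.buffer = []
            elif not frame.fast_flag:
                # Assumption: an unconfirmed detection is dropped as a false alarm.
                self.state = State.SILENCE
                self.buffer = []
        else:  # State.SPEECH
            if frame.fast_flag or frame.slow_flag:
                self.recognized.append(frame.samples)
            else:
                # Assumption: both detectors going quiet ends the utterance.
                self.state = State.SILENCE
```

Because buffering starts at the fast detector's endpoint, the frames that arrive before the slower detector confirms speech are not lost once recognition begins.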
21 Claims
1. A method of performing automated speech recognition (ASR) in a system having a speech enhancement module for generating an audio stream signal and metadata, coupled to an ASR module for performing speech recognition on the audio stream signal using the metadata, the method comprising:
by the speech enhancement module, processing microphone signals to generate the audio stream signal;
by a first speech detector having a first response latency, generating first metadata that indicate the possible presence of speech in the audio stream signal with a first confidence level;
by a second speech detector having a second response latency that is higher than the first response latency, generating second metadata that indicate the possible presence of speech in the audio stream signal with a second confidence level that is higher than the first confidence level;
by the ASR module based on the first metadata, initiating buffering of the audio stream signal from an endpoint; and
by the ASR module based on the second metadata, initiating speech recognition on the buffered audio stream signal from the endpoint.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An article, comprising a non-transitory computer readable medium having stored instructions that when executed perform a method of automated speech recognition (ASR) in a system having a speech enhancement module for generating an audio stream signal and metadata, coupled to an ASR module for performing speech recognition on the audio stream signal using the metadata, the method comprising:
by the speech enhancement module, processing microphone signals to generate the audio stream signal;
by a first speech detector having a first response latency, generating first metadata that indicate the possible presence of speech in the audio stream signal with a first confidence level;
by a second speech detector having a second response latency that is higher than the first response latency, generating second metadata that indicate the possible presence of speech in the audio stream signal with a second confidence level that is higher than the first confidence level;
by the ASR module based on the first metadata, initiating buffering of the audio stream signal from an endpoint; and
by the ASR module based on the second metadata, initiating speech recognition on the buffered audio stream signal from the endpoint.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A system for performing automated speech recognition (ASR) comprising a speech enhancement module for generating an audio stream signal and metadata, coupled to an ASR module for performing speech recognition on the audio stream signal using the metadata, the system further comprising:
in the speech enhancement module, electronic circuitry configured to provide:
a first speech detector having a first response latency for generating first metadata that indicate the possible presence of speech in the audio stream signal with a first confidence level; and
a second speech detector having a second response latency that is higher than the first response latency for generating second metadata that indicate the possible presence of speech in the audio stream signal with a second confidence level that is higher than the first confidence level; and
in the ASR module, electronic circuitry configured to provide:
an endpointing module for initiating, based on the first metadata, buffering of the audio stream signal from an endpoint, and for initiating, based on the second metadata, speech recognition on the buffered audio stream signal from the endpoint.
Dependent claims: 19, 20, 21
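To illustrate how two detectors can trade response latency for confidence, here is a hedged sketch of the metadata-generating side. The claims do not specify any detection algorithm. The energy threshold, the `hold` parameter, and the function names `frame_energy` and `tag_frames` are all assumptions chosen for illustration: the fast detector flags any single frame above the threshold, while the slow detector waits for `hold` consecutive loud frames, so it responds later but ignores short noise bursts.

```python
def frame_energy(samples):
    """Mean squared amplitude of one non-empty frame of float samples."""
    return sum(s * s for s in samples) / len(samples)


def tag_frames(frames, threshold=0.01, hold=3):
    """Return one (fast_flag, slow_flag) pair per frame.

    fast_flag: the current frame's energy exceeds the threshold
               (responds immediately, lower confidence).
    slow_flag: the last `hold` frames all exceeded the threshold
               (responds hold - 1 frames later, higher confidence).
    Both rules are illustrative assumptions, not taken from the claims.
    """
    run = 0  # length of the current streak of loud frames
    metadata = []
    for samples in frames:
        fast = frame_energy(samples) > threshold
        run = run + 1 if fast else 0
        metadata.append((fast, run >= hold))
    return metadata
```

With `hold=3`, an isolated loud frame raises only the fast flag, whereas three loud frames in a row raise both flags on the third frame.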
Specification