Methods And Apparatus For Speech Segmentation Using Multiple Metadata
Abstract
Methods and apparatus to process microphone signals by a speech enhancement module to generate an audio stream signal including first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata and transitioning to a speech state, in which speech recognition is performed, based upon the second metadata.
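The two-stage endpointing described above amounts to a small state machine: a first (fast) detector moves the endpointer from silence into a tentative "maybe speech" state where audio is buffered, and a second (more reliable) detector confirms speech and releases the buffer to the recognizer. A minimal sketch follows; the class, state, and parameter names are illustrative, not taken from the patent.

```python
from enum import Enum, auto

class State(Enum):
    SILENCE = auto()
    MAYBE_SPEECH = auto()
    SPEECH_CONFIRMED = auto()

class Endpointer:
    """Two-detector endpointing state machine (illustrative sketch).

    first_flag  -- first metadata, from a fast/low-latency speech detector
    second_flag -- second metadata, from a slower, more reliable detector
    """
    def __init__(self):
        self.state = State.SILENCE
        self.buffer = []  # frames held while speech is tentative

    def process(self, frame, first_flag, second_flag):
        """Consume one audio frame; return frames to pass to the recognizer."""
        if self.state is State.SILENCE:
            if first_flag:                    # tentative speech onset
                self.state = State.MAYBE_SPEECH
                self.buffer = [frame]         # start buffering
        elif self.state is State.MAYBE_SPEECH:
            self.buffer.append(frame)
            if second_flag:                   # speech confirmed: flush buffer
                self.state = State.SPEECH_CONFIRMED
                out, self.buffer = self.buffer, []
                return out
            if not first_flag:                # false alarm: discard buffer
                self.state = State.SILENCE
                self.buffer = []
        elif self.state is State.SPEECH_CONFIRMED:
            if not first_flag:                # end of utterance
                self.state = State.SILENCE
            else:
                return [frame]                # stream directly to recognizer
        return []
```

Buffering in the maybe-speech state is what prevents the recognizer from losing the utterance onset: the frames seen before confirmation are replayed when the second detector fires.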
21 Claims
1. A method, comprising:
- processing microphone signals by a speech enhancement module to generate an audio stream signal;
- processing the microphone signals by a first speech detector to generate first metadata;
- processing the microphone signals by a second speech detector to generate second metadata;
- performing endpointing of the audio stream signal using the first and second metadata; and
- performing speech recognition on the audio stream signal using the endpointing, including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata, and transitioning to a speech confirmed state, in which speech recognition is performed, based upon the second metadata.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. An article, comprising:
- a non-transitory computer readable medium having stored instructions that enable a machine to:
- process microphone signals by a speech enhancement module to generate an audio stream signal;
- process the microphone signals by a first speech detector to generate first metadata;
- process the microphone signals by a second speech detector to generate second metadata;
- perform endpointing of the audio stream signal using the first and second metadata; and
- perform speech recognition on the audio stream signal using the endpointing, including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata, and transitioning to a speech confirmed state, in which speech recognition is performed, based upon the second metadata.

Dependent claims: 11, 12, 13, 14, 15, 16, 17.
18. A system, comprising:
- a speech enhancement module to process microphone signals for generating an audio stream signal, the speech enhancement module comprising:
  - a first speech detector to process the microphone signals for generating first metadata; and
  - a second speech detector to process the microphone signals for generating second metadata; and
- an automated speech recognition module to receive the audio stream from the speech enhancement module, the speech recognition module comprising:
  - an endpointing module to perform endpointing of the audio stream signal using the first and second metadata; and
  - a speech recognition module to perform speech recognition on the audio stream signal using the endpointing, including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata, and transitioning to a speech confirmed state, in which speech recognition is performed, based upon the second metadata.

Dependent claims: 19, 20, 21.
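The system claim wires an enhancement module and two detectors in front of the recognizer: every frame is enhanced, then tagged with first and second metadata before endpointing. A minimal sketch of that dataflow follows; the energy thresholds, frame format, and function names are assumptions for illustration, not details from the patent.

```python
# Illustrative wiring of the claimed front end: enhancement plus two
# speech detectors producing per-frame metadata for the endpointer.

def first_detector(frame):
    """Fast, permissive detector: short-term energy over a low threshold."""
    return sum(s * s for s in frame) / len(frame) > 0.01

def second_detector(frame, history, n=3):
    """Slower, stricter detector: energy sustained over n consecutive frames."""
    history.append(first_detector(frame))
    return len(history) >= n and all(history[-n:])

def enhance(frame):
    """Placeholder for the speech enhancement module (e.g. noise reduction)."""
    return frame

def run(frames):
    """Tag each enhanced frame with (first metadata, second metadata)."""
    history = []
    stream = []
    for frame in frames:
        audio = enhance(frame)
        meta1 = first_detector(audio)
        meta2 = second_detector(audio, history)
        stream.append((audio, meta1, meta2))
    return stream
```

The asymmetry between the detectors is the point of the design: the permissive first detector reacts quickly (triggering buffering), while the stricter second detector suppresses false alarms before recognition is committed.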