Methods And Apparatus For Speech Segmentation Using Multiple Metadata
Abstract
Methods and apparatus to process microphone signals by a speech enhancement module to generate an audio stream signal including first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata and transitioning to a speech state, in which speech recognition is performed, based upon the second metadata.
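The two-stage endpointing described above amounts to a small state machine: a first (fast) detector moves the endpointer from silence into a tentative "maybe speech" state where audio is buffered, and a second (more reliable) detector confirms speech and releases the buffer to the recognizer. A minimal sketch follows; the class, state, and parameter names are illustrative, not taken from the patent.

```python
from enum import Enum, auto

class State(Enum):
    SILENCE = auto()
    MAYBE_SPEECH = auto()
    SPEECH_CONFIRMED = auto()

class Endpointer:
    """Two-detector endpointing state machine (illustrative sketch).

    first_flag  -- first metadata, from a fast/low-latency speech detector
    second_flag -- second metadata, from a slower, more reliable detector
    """
    def __init__(self):
        self.state = State.SILENCE
        self.buffer = []  # frames held while speech is tentative

    def process(self, frame, first_flag, second_flag):
        """Consume one audio frame; return frames to pass to the recognizer."""
        if self.state is State.SILENCE:
            if first_flag:                    # tentative speech onset
                self.state = State.MAYBE_SPEECH
                self.buffer = [frame]         # start buffering
        elif self.state is State.MAYBE_SPEECH:
            self.buffer.append(frame)
            if second_flag:                   # speech confirmed: flush buffer
                self.state = State.SPEECH_CONFIRMED
                out, self.buffer = self.buffer, []
                return out
            if not first_flag:                # false alarm: discard buffer
                self.state = State.SILENCE
                self.buffer = []
        elif self.state is State.SPEECH_CONFIRMED:
            if not first_flag:                # end of utterance
                self.state = State.SILENCE
            else:
                return [frame]                # stream directly to recognizer
        return []
```

Buffering in the maybe-speech state is what prevents the recognizer from losing the utterance onset: the frames seen before confirmation are replayed when the second detector fires.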
21 Claims
1. A method, comprising:
- processing microphone signals by a speech enhancement module to generate an audio stream signal;
- processing the microphone signals by a first speech detector to generate first metadata;
- processing the microphone signals by a second speech detector to generate second metadata;
- performing endpointing of the audio stream signal using the first and second metadata; and
- performing speech recognition on the audio stream signal using the endpointing, including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata, and transitioning to a speech confirmed state, in which speech recognition is performed, based upon the second metadata.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. An article, comprising:
- a non-transitory computer readable medium having stored instructions that enable a machine to:
- process microphone signals by a speech enhancement module to generate an audio stream signal;
- process the microphone signals by a first speech detector to generate first metadata;
- process the microphone signals by a second speech detector to generate second metadata;
- perform endpointing of the audio stream signal using the first and second metadata; and
- perform speech recognition on the audio stream signal using the endpointing, including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata, and transitioning to a speech confirmed state, in which speech recognition is performed, based upon the second metadata.

Dependent claims: 11, 12, 13, 14, 15, 16, 17.
18. A system, comprising:
- a speech enhancement module to process microphone signals for generating an audio stream signal, the speech enhancement module comprising:
  - a first speech detector to process the microphone signals for generating first metadata; and
  - a second speech detector to process the microphone signals for generating second metadata; and
- an automated speech recognition module to receive the audio stream from the speech enhancement module, the speech recognition module comprising:
  - an endpointing module to perform endpointing of the audio stream signal using the first and second metadata; and
  - a speech recognition module to perform speech recognition on the audio stream signal using the endpointing, including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata, and transitioning to a speech confirmed state, in which speech recognition is performed, based upon the second metadata.

Dependent claims: 19, 20, 21.
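The system claim wires an enhancement module and two detectors in front of the recognizer: every frame is enhanced, then tagged with first and second metadata before endpointing. A minimal sketch of that dataflow follows; the energy thresholds, frame format, and function names are assumptions for illustration, not details from the patent.

```python
# Illustrative wiring of the claimed front end: enhancement plus two
# speech detectors producing per-frame metadata for the endpointer.

def first_detector(frame):
    """Fast, permissive detector: short-term energy over a low threshold."""
    return sum(s * s for s in frame) / len(frame) > 0.01

def second_detector(frame, history, n=3):
    """Slower, stricter detector: energy sustained over n consecutive frames."""
    history.append(first_detector(frame))
    return len(history) >= n and all(history[-n:])

def enhance(frame):
    """Placeholder for the speech enhancement module (e.g. noise reduction)."""
    return frame

def run(frames):
    """Tag each enhanced frame with (first metadata, second metadata)."""
    history = []
    stream = []
    for frame in frames:
        audio = enhance(frame)
        meta1 = first_detector(audio)
        meta2 = second_detector(audio, history)
        stream.append((audio, meta1, meta2))
    return stream
```

The asymmetry between the detectors is the point of the design: the permissive first detector reacts quickly (triggering buffering), while the stricter second detector suppresses false alarms before recognition is committed.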