Method and apparatus for transmitting speech activity in distributed voice recognition systems

US 20030061042A1
Filed: 05/28/2002
Published: 03/27/2003
Est. Priority Date: 06/14/2001
Status: Active Grant

Abstract

A system and method for transmitting speech activity in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises an advanced feature extraction (AFE) module that extracts features from a speech signal, and a voice activity detection (VAD) module that detects voice activity within a speech signal. The combined results from the VAD module and feature extraction module are provided in an efficient manner to a remote device, such as a server, in the form of advanced front end features, thereby enabling the server to process speech segments free of silence regions. Various aspects of efficient speech segment transmission are disclosed.
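
As a rough illustration of the arrangement the abstract describes, the Python sketch below runs a toy voice activity detector and a toy feature extractor over the same frames and forwards only the frames marked as speech. The energy threshold, the two-element feature vector, and every function name are assumptions made for illustration. The text here does not specify how the VAD or AFE modules are implemented.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def detect_voice(frame, threshold=0.01):
    """Toy VAD (assumption): speech if the frame energy exceeds a fixed threshold."""
    return frame_energy(frame) > threshold

def extract_features(frame):
    """Toy feature vector (assumption): log energy and peak absolute amplitude."""
    return [math.log(frame_energy(frame) + 1e-10), max(abs(s) for s in frame)]

def front_end(frames, threshold=0.01):
    """Return (frame_index, feature_vector) pairs for speech frames only."""
    out = []
    for i, frame in enumerate(frames):
        if detect_voice(frame, threshold):
            out.append((i, extract_features(frame)))
    return out

frames = [
    [0.0, 0.001, -0.001, 0.0],  # silence
    [0.5, -0.4, 0.3, -0.5],     # speech
    [0.0, 0.0, 0.0, 0.0],       # silence
    [0.2, 0.3, -0.2, 0.25],     # speech
]
speech = front_end(frames)
print([i for i, _ in speech])  # prints [1, 3]
```

Only the feature vectors of frames 1 and 3 would be sent to the server in this sketch, so the server never processes the silent frames.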

32 Claims

1. A method of providing detected voice activity information associated with a speech signal to a remote device, comprising:
- assembling detected voice activity information related to said speech signal;
  
  identifying feature extraction information related to said speech signal;
  
  selectively utilizing said detected voice activity information and said feature extraction information to form advanced front end data; and
  
  providing the advanced front end data comprising detected voice activity information to the remote device.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1, wherein said feature extraction information identification comprises extracting a set of features corresponding to segments of the speech signal.
  - 3. The method of claim 1, wherein said assembling, identifying, and selectively utilizing are performed at a subscriber device.
  - 4. The method of claim 3, wherein the providing comprises the subscriber device removing segments of silence and providing silence free speech segments to the remote device.
  - 5. The method of claim 3, wherein the providing comprises:
    - the subscriber device transmitting all speech including silence to the remote device;
      
      the subscriber device transmitting at least one indication where silence regions exist; and
      
      the remote device separates speech segments from silence and utilizes the speech segments.
  - 6. The method of claim 5, wherein the at least one indication is transmitted on a channel separate from a speech transmission channel.
  - 7. The method of claim 1, further comprising assembling detected voice activity information substantially in parallel to the feature extraction identification.
  - 8. The method of claim 7, wherein voice detection activity is quantized at a lower rate when feature extraction identification indicates silence regions.
  - 9. The method of claim 7, wherein detected voice activity information assembly comprises determining a voice activity vector, and segment extraction comprises determining a feature vector, and the method further comprises concatenating the voice activity vector and the feature vector to process and determine advanced front end features.
  - 10. The method of claim 1, wherein feature extraction identification comprises determining a feature vector.
  - 11. The method of claim 10, wherein said determining comprises:
    - detecting speech activity and upon detecting speech activity, computing an average feature vector corresponding to frames dropped; and
      
      transmitting a total number of frames dropped prior to transmitting speech frames.
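
One possible reading of claim 11 is a bookkeeping step: silent frames are dropped, and when speech resumes the sender first reports how many frames were dropped, together with their average feature vector, before sending the speech frames. The sketch below assumes per-frame VAD flags and feature vectors are already available. The message tuples (`"dropped"`, `"speech"`) are a hypothetical format, not one taken from the claims.

```python
def encode_with_drop_counts(flags, features):
    """Drop silence frames and summarize them when speech is next detected.

    flags[i] is True when frame i contains speech; features[i] is that
    frame's feature vector (a list of floats).
    Trailing silence is never reported in this sketch, because a summary
    is only emitted when speech activity is detected again.
    """
    messages = []
    dropped = []  # feature vectors of silence frames dropped since last speech
    for is_speech, vec in zip(flags, features):
        if not is_speech:
            dropped.append(vec)
            continue
        if dropped:
            n = len(dropped)
            # Component-wise mean over the dropped frames.
            avg = [sum(col) / n for col in zip(*dropped)]
            # Hypothetical message: drop count and average precede the speech frames.
            messages.append(("dropped", n, avg))
            dropped = []
        messages.append(("speech", vec))
    return messages

flags = [False, False, True, True, False, True]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 5.0], [6.0, 6.0], [0.0, 2.0], [7.0, 7.0]]
messages = encode_with_drop_counts(flags, features)
for m in messages:
    print(m)
```

With this input the first message is `("dropped", 2, [2.0, 3.0])`, followed by the two speech frames, then a second summary for the single dropped frame.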

12. An apparatus for transmitting speech activity, comprising:
- a voice activity detector;
  
  a feature extractor operating substantially in parallel to the voice activity detector;
  
  a transmitter; and
  
  a receiving device;
  
  wherein the feature extractor and voice activity detector operate to extract features from speech and detect voice activity information from speech and selectively utilize extracted features and detected voice activity information to form advanced front end data.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21):
  - 13. The apparatus of claim 12, wherein said voice activity detector, said feature extractor, and said transmitter reside on a subscriber unit.
  - 14. The apparatus of claim 13, wherein the subscriber unit removes segments of silence and provides silence free speech segments to the remote device.
  - 15. The apparatus of claim 13, wherein:
    - the subscriber device transmits all speech including silence to the remote device;
      
      the subscriber device transmits at least one indication of at least one silence region; and
      
      the remote device separates speech segments from silence and utilizes the speech segments.
  - 16. The apparatus of claim 15, wherein the at least one indication is transmitted by the transmitter on a channel separate from a speech transmission channel.
  - 17. The apparatus of claim 12, wherein the apparatus quantizes voice detection activity from the voice activity detector at a lower rate in circumstances where feature extraction indicates silence regions.
  - 18. The apparatus of claim 12, wherein the voice activity detector determines a voice activity vector, and the feature extractor determines a feature vector.
  - 19. The apparatus of claim 18, wherein the apparatus concatenates the voice activity vector and the feature vector to process and determine advanced front end data.
  - 20. The apparatus of claim 12, wherein the feature extractor determines a feature vector.
  - 21. The apparatus of claim 20, wherein the apparatus computes an average feature vector corresponding to frames dropped upon detecting speech activity and transmits a total number of frames dropped prior to transmitting speech frames.
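
Claims 15 and 16 describe an alternative in which the subscriber device sends every frame, silence included, and signals the silence regions separately so that the remote device can do the separation. The sketch below models the two channels as two plain Python lists. The `(start, end)` index-range encoding of a silence region and all function names are assumptions for illustration only.

```python
def silence_regions(flags):
    """Collapse per-frame VAD flags into (start, end) silence ranges, end exclusive."""
    regions = []
    start = None
    for i, is_speech in enumerate(flags):
        if not is_speech and start is None:
            start = i                    # a silence region begins
        elif is_speech and start is not None:
            regions.append((start, i))   # the region ends at the first speech frame
            start = None
    if start is not None:
        regions.append((start, len(flags)))  # silence runs to the end
    return regions

def subscriber_send(frames, flags):
    """Send all frames on the speech channel, silence indications on a side channel."""
    return list(frames), silence_regions(flags)

def remote_receive(speech_channel, side_channel):
    """Remote device: keep only frames outside the indicated silence regions."""
    silent = set()
    for start, end in side_channel:
        silent.update(range(start, end))
    return [f for i, f in enumerate(speech_channel) if i not in silent]

frames = ["f0", "f1", "f2", "f3", "f4"]
flags = [False, True, True, False, False]
speech_channel, side_channel = subscriber_send(frames, flags)
print(side_channel)                                  # prints [(0, 1), (3, 5)]
print(remote_receive(speech_channel, side_channel))  # prints ['f1', 'f2']
```

Compared with removing silence at the subscriber unit (claim 14), this variant moves the separation work to the remote device at the cost of transmitting the silent frames.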

22. A method of transmitting speech data to a remote device, comprising:
- extracting voice activity data from the speech data;
  
  identifying feature extraction data from the speech data; and
  
  selectively transmitting information related to said voice activity data and said feature extraction data in the form of advanced front end data to the remote device.
- Dependent claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 32):
  - 23. The method of claim 22, wherein said feature extraction data identification comprises extracting a set of features corresponding to segments of the speech signal.
  - 24. The method of claim 22, wherein said extracting and identifying occur at a subscriber device.
  - 25. The method of claim 24, wherein selective information transmitting comprises the subscriber device removing segments of silence and providing silence free speech segments to the remote device.
  - 26. The method of claim 24, wherein the selective information transmitting comprises:
    - the subscriber device transmitting all speech including silence to the remote device;
      
      the subscriber device transmitting at least one indication where at least one silence region exists; and
      
      the remote device separates speech segments from silence and utilizes the speech segments.
  - 27. The method of claim 26, wherein the at least one indication is transmitted on a channel separate from a speech transmission channel.
  - 28. The method of claim 22, further comprising extracting voice activity data substantially in parallel to feature extraction data identification.
  - 29. The method of claim 28, wherein voice activity data is quantized to advanced front end data at a lower rate when extracting and identifying indicate silence regions.
  - 30. The method of claim 28, wherein voice activity detection comprises determining a voice activity vector, and feature extraction comprises determining a feature vector, and the method further comprises concatenating the voice activity vector and the feature vector to process and determine extended features.
  - 31. The method of claim 22, wherein feature segment extraction comprises determining a feature vector.
  - 32. The method of claim 31, wherein said determining comprises:
    - detecting speech activity and upon detecting speech activity, computing an average feature vector corresponding to frames dropped; and
      
      transmitting a total number of frames dropped prior to transmitting speech frames.
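
The concatenation in claim 30 (and claims 9 and 19) can be pictured as appending each frame's voice activity vector to its feature vector. The sketch below assumes a one-element voice activity vector per frame and places it first in the extended vector. Neither the vector layout nor the function name comes from the claims.

```python
def extend_features(vad_vectors, feature_vectors):
    """Concatenate each frame's VAD vector with its feature vector."""
    if len(vad_vectors) != len(feature_vectors):
        raise ValueError("expected one VAD vector per feature vector")
    return [list(v) + list(f) for v, f in zip(vad_vectors, feature_vectors)]

vad_vectors = [[0.0], [1.0]]                # assumed layout: 1.0 marks speech
feature_vectors = [[0.1, 0.2], [0.8, 0.9]]  # placeholder feature values
extended = extend_features(vad_vectors, feature_vectors)
print(extended)  # prints [[0.0, 0.1, 0.2], [1.0, 0.8, 0.9]]
```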

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Garudadri, Harinanth
Granted Patent: US 7,203,643 B2
US Class Current: 704/254
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 25/78 Detection of presence or ab...