METHOD AND APPARATUS FOR TRANSMITTING SPEECH ACTIVITY IN DISTRIBUTED VOICE RECOGNITION SYSTEMS

US 20070192094A1
Filed: 03/01/2007
Published: 08/16/2007
Est. Priority Date: 06/14/2001
Status: Active Grant

Abstract

A system, method, apparatus, signal-bearing medium, and means for transmitting speech activity in a distributed voice recognition (VR) system. The distributed voice recognition system includes a local VR engine in a subscriber unit (102) and a server VR engine on a server (160). The local VR engine comprises a voice activity detection (VAD) module (106) that detects voice activity within a speech signal, and comprises an advanced feature extraction (AFE) module (104) that extracts features from a speech signal. The detected voice activity information is transmitted over a first wireless communication channel to the server (160). The feature extraction information is transmitted over a second wireless communication channel, separate from the first wireless communication channel, to the server (160). The server (160) processes the received information to determine a linguistic estimate of the electrical speech signal, and transmits the linguistic estimate to the subscriber unit (102).
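
The division of work described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the patent's implementation: the energy-threshold VAD, the two toy features, the frame length, and the `Channels` container that stands in for the two separate wireless channels are all hypothetical.

```python
import math
from dataclasses import dataclass, field

FRAME_LEN = 4           # samples per frame (hypothetical, tiny for readability)
ENERGY_THRESHOLD = 0.1  # hypothetical VAD threshold on mean squared amplitude


@dataclass
class Channels:
    """Two independent queues standing in for the two wireless channels."""
    vad: list = field(default_factory=list)       # first channel
    features: list = field(default_factory=list)  # second channel


def frames(samples, frame_len=FRAME_LEN):
    """Split samples into consecutive full frames, dropping any remainder."""
    n = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_voice(frame):
    """Toy VAD: a frame counts as speech when its energy exceeds the threshold."""
    return frame_energy(frame) > ENERGY_THRESHOLD


def extract_features(frame):
    """Toy feature vector: [log energy, zero-crossing count]."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(frame_energy(frame) + 1e-9), float(crossings)]


def front_end(samples):
    """Run VAD and feature extraction per frame and queue each on its own channel."""
    ch = Channels()
    for f in frames(samples):
        ch.vad.append(detect_voice(f))
        ch.features.append(extract_features(f))
    return ch
```

Calling `front_end` on a sample list returns one VAD flag and one feature vector per frame, held in separate queues so that each could be sent independently.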

34 Claims

1. A signal bearing medium embodying a set of machine-readable instructions executable by a data processor for operating a speech recognition system employed by a wireless subscriber station, comprising:
- receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal;
  
  assembling detected voice activity information related to the electrical speech signal;
  
  identifying feature extraction information related to the electrical speech signal;
  
  selectively utilizing said detected voice activity information and said feature extraction information to form advanced front end data;
  
  transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The signal bearing medium of claim 1, wherein the identification of feature extraction information comprises extracting a set of features corresponding to segments of the electrical speech signal.
  - 3. The signal bearing medium of claim 1, wherein the wireless subscriber station further comprises a cellular radiotelephone.
  - 4. The signal bearing medium of claim 2, wherein the transmitting the feature extraction information further comprises:
    - removing segments of silence from the electrical speech signal, and transmitting silence-free speech segments of the electrical speech signal over the second wireless communication channel to the wireless base station.
  - 5. The signal bearing medium of claim 1, wherein the transmitting the feature extraction information further comprises:
    - transmitting a representation of the electrical speech signal, including silence, over the second wireless communication channel to the wireless base station; and
      
      wherein the transmitting the detected voice activity information further comprises:
      
      transmitting at least one indication where silence regions exist over the first communication channel to permit periods of speech to be separated from silence regions for use of the periods of speech.
  - 6. The signal bearing medium of claim 4, wherein the transmitting the detected voice activity information further comprises:
    - transmitting over the first communication channel at least one indication where the segments of silence exist in the electrical speech signal to permit the silence-free speech segments transmitted over the second wireless communication channel to be separated responsive to the at least one indication for use of the silence-free speech segments.
  - 7. The signal bearing medium of claim 1, further comprising assembling detected voice activity information substantially in parallel to identifying the feature extraction information.
  - 8. The signal bearing medium of claim 7, wherein voice detection activity is quantized at a lower rate when identification of the feature extraction information indicates silence regions.
  - 9. The signal bearing medium of claim 7, wherein assembling the detected voice activity information comprises determining a voice activity vector, and identifying the feature extraction information comprises determining a feature extraction vector, and the method further comprises concatenating the voice activity vector and the feature extraction vector to process and determine the advanced front end data.
  - 10. The signal bearing medium of claim 1, wherein identification of feature extraction information comprises determining a feature extraction vector.
  - 11. The signal bearing medium of claim 10, wherein the determining comprises:
    - detecting speech activity and upon detecting speech activity, computing an average feature extraction vector corresponding to frames dropped; and
      
      transmitting a total number of frames dropped over one of the first and second wireless communication channel to the wireless base station prior to transmitting speech frames over the second wireless communication channel to the wireless base station.
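
Claim 11 can be read as a small bookkeeping step: silence frames are dropped, and when speech is detected a frame count and an average feature vector for the dropped frames are produced before the speech frames follow. The sketch below is one possible reading under stated assumptions. The function name, the message tuples, and the single ordered message list are hypothetical, and the claim does not say which channel carries the average vector. Trailing silence with no following speech is not reported in this sketch.

```python
def summarize_dropped(vad_flags, feature_vectors):
    """Return an ordered list of messages.

    Before each run of speech frames that follows dropped silence frames,
    a ('dropped', count, average_vector) message is emitted, then one
    ('speech', vector) message per speech frame.
    """
    messages = []
    dropped = []  # feature vectors of silence frames not yet summarized
    for is_speech, vec in zip(vad_flags, feature_vectors):
        if not is_speech:
            dropped.append(vec)
            continue
        if dropped:
            count = len(dropped)
            # Component-wise mean over the dropped frames.
            avg = [sum(col) / count for col in zip(*dropped)]
            messages.append(("dropped", count, avg))
            dropped = []
        messages.append(("speech", vec))
    return messages
```

The count message precedes the speech frames it relates to, matching the "prior to transmitting speech frames" ordering in the claim.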

12. A wireless subscriber station, comprising:
- means for receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station, and for converting the acoustic speech signal to an electrical speech signal;
  
  means for detecting voice activity information related to the electrical speech signal;
  
  means, operating substantially in parallel to the means for detecting voice activity, for identifying feature extraction information related to the electrical speech signal;
  
  means for selectively utilizing the detected voice activity information and the feature extraction information to form advanced front end data; and
  
  means for transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and means for transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21):
  - 13. The wireless subscriber station of claim 12, wherein the wireless subscriber station is a cellular radiotelephone.
  - 14. The wireless subscriber station of claim 12, wherein the means for transmitting the feature extraction information further comprises:
    - means for removing segments of silence from the electrical speech signal, and means for transmitting silence-free speech segments of the electrical speech signal over the second wireless communication channel to the wireless base station.
  - 15. The wireless subscriber station of claim 12, wherein:
    - the means for transmitting the feature extraction information further comprises:
      
      means for transmitting a representation of the electrical speech signal, including silence, over the second wireless communication channel to the wireless base station;
      
      the means for transmitting the detected voice activity information further comprises:
      
      means for transmitting at least one indication of where silence regions exist over the first communication channel to permit periods of speech to be separated from silence regions for use of the periods of speech.
  - 16. The wireless subscriber station of claim 14, wherein the means for transmitting the detected voice activity information further comprises:
    - means for transmitting over the first communication channel at least one indication where the segments of silence exist in the electrical speech signal to permit the silence-free speech segments transmitted over the second wireless communication channel to be separated responsive to the at least one indication for use of the silence-free speech segments.
  - 17. The wireless subscriber station of claim 12, further comprising:
    - means for quantizing voice detection activity from the means for detecting voice activity at a lower rate in circumstances where identification of feature extraction information indicates silence regions.
  - 18. The wireless subscriber station of claim 12, wherein the means for detecting voice activity determines a voice activity vector, and the means for identifying feature extraction information determines a feature extraction vector.
  - 19. The wireless subscriber station of claim 18, further comprising:
    - means for concatenating the voice activity vector and the feature extraction vector to process and determine the advanced front end data.
  - 20. The wireless subscriber station of claim 12, wherein the means for identifying feature extraction information determines a feature extraction vector.
  - 21. The wireless subscriber station of claim 20, further comprising:
    - means for computing an average feature extraction vector corresponding to frames dropped upon detecting speech activity, and for transmitting a total number of frames dropped over one of the first and second wireless communication channel to the wireless base station prior to transmitting speech frames over the second wireless communication channel to the wireless base station.
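
Claims 14 and 16 pair silence removal on the second channel with silence indications on the first channel. A minimal sketch of how the two streams could fit together follows. The `(start_index, length)` encoding of a silence region, the function names, and the explicit `total_frames` argument are assumptions for illustration, since the claims do not fix a format for the indication.

```python
def silence_regions(vad_flags):
    """Encode runs of non-speech frames as (start_index, length) pairs."""
    regions = []
    start = None
    for i, is_speech in enumerate(vad_flags):
        if not is_speech and start is None:
            start = i                       # a silence run begins
        elif is_speech and start is not None:
            regions.append((start, i - start))
            start = None                    # the silence run ended
    if start is not None:                   # silence runs to the end
        regions.append((start, len(vad_flags) - start))
    return regions


def strip_silence(vad_flags, feature_vectors):
    """Sender side: keep only the feature vectors of speech frames."""
    return [v for keep, v in zip(vad_flags, feature_vectors) if keep]


def restore_timeline(regions, speech_vectors, total_frames):
    """Receiver side: rebuild a per-frame list, with None marking silence."""
    silent = set()
    for start, length in regions:
        silent.update(range(start, start + length))
    it = iter(speech_vectors)
    return [None if i in silent else next(it) for i in range(total_frames)]
```

In this sketch `silence_regions` produces what would travel on the first channel and `strip_silence` what would travel on the second. `restore_timeline` shows that the receiver can place each silence-free segment back at its original position from those two inputs.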

22. A method of operating a distributed speech recognition system employed by a wireless subscriber station, comprising:
- receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to electrical speech data;
  
  extracting voice activity data from the electrical speech data;
  
  identifying feature extraction data from the electrical speech data; and
  
  transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station.
- Dependent claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 32):
  - 23. The method of claim 22, wherein identifying the feature extraction data comprises extracting a set of features corresponding to segments of the electrical speech data.
  - 24. The method of claim 22, wherein the wireless subscriber station further comprises a cellular radiotelephone.
  - 25. The method of claim 23, wherein transmitting the feature extraction information further comprises:
    - removing segments of silence from the electrical speech signal, and transmitting silence-free speech segments of the electrical speech signal over the second wireless communication channel to the wireless base station.
  - 26. The method of claim 22, wherein the transmitting the feature extraction information further comprises:
    - transmitting a representation of the electrical speech data, including silence, over the second wireless communication channel to the wireless base station; and
      
      wherein the transmitting the detected voice activity information further comprises:
      
      transmitting at least one indication where at least one silence region exists over the first communication channel to permit periods of speech to be separated from silence regions for use of the periods of speech.
  - 27. The method of claim 25, wherein the transmitting the detected voice activity information further comprises:
    - transmitting over the first communication channel at least one indication where the segments of silence exist in the electrical speech signal to permit the silence-free speech segments transmitted over the second wireless communication channel to be separated responsive to the at least one indication for use of the silence-free speech segments.
  - 28. The method of claim 22, further comprising extracting voice activity data substantially in parallel to identifying the feature extraction data.
  - 29. The method of claim 28, wherein voice activity data is quantized to advanced front end data at a lower rate when identification of feature extraction data indicates silence regions.
  - 30. The method of claim 28, wherein voice activity detection comprises determining a voice activity vector, and identification of the feature extraction data comprises determining a feature extraction vector, and the method further comprises concatenating the voice activity vector and the feature extraction vector to process and determine extended data.
  - 31. The method of claim 22, wherein identification of feature extraction data comprises determining a feature extraction vector.
  - 32. The method of claim 31, wherein the determining comprises:
    - detecting speech activity and upon detecting speech activity, computing an average feature extraction vector corresponding to frames dropped; and
      
      transmitting a total number of frames dropped over one of the first and second wireless communication channel to the wireless base station prior to transmitting speech frames over the second wireless communication channel to the wireless base station.
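
The concatenation step in claim 30 can be shown in a few lines. This is only an illustrative sketch: the function names, the list representation, and the convention that the voice activity vector comes first are assumptions, and the claim does not define the layout of the resulting extended data.

```python
def concatenate(vad_vector, feature_vector):
    """Join the per-frame VAD vector and feature vector into one vector."""
    return list(vad_vector) + list(feature_vector)


def split(extended_vector, vad_len):
    """Inverse of concatenate, given the length of the VAD part."""
    return extended_vector[:vad_len], extended_vector[vad_len:]
```

A receiver that knows the length of the voice activity part can recover both vectors with `split`.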

33. A method of operating a distributed speech recognition service, comprising:
- performing, by a wireless subscriber station, a first portion of the distributed speech recognition service, comprising:
  
  receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal;
  
  assembling detected voice activity information related to the electrical speech signal;
  
  identifying feature extraction information related to the electrical speech signal;
  
  selectively utilizing the detected voice activity information and the feature extraction information; and
  
  transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station; and
  
  performing, by a wireless base station, a second portion of the distributed speech recognition service, comprising:
  
  receiving the detected voice activity information over the first wireless communication channel and the feature extraction information over the second wireless communication channel;
  
  determining a linguistic estimate of the electrical speech signal responsive to receiving the detected voice activity information over the first wireless communication channel and the feature extraction information over the second wireless communication channel; and
  
  transmitting information over a third wireless communication channel from the wireless base station to the wireless subscriber station responsive to the linguistic estimate of the electrical speech signal for controlling the wireless subscriber station.

34. A method of operating a speech recognition service employed by a wireless base station, comprising:
- receiving from a wireless subscriber station advanced front end data, including detected voice activity information sent over a first wireless communication channel and feature extraction information sent over a second wireless communication channel, separate from the first wireless communication channel, wherein the wireless subscriber station comprises:
  
  receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal;
  
  assembling the detected voice activity information related to the electrical speech signal;
  
  identifying feature extraction information related to the electrical speech signal; and
  
  selectively utilizing the detected voice activity information and the feature extraction information to form the advanced front end data;
  
  determining a linguistic estimate of the electrical speech signal responsive to receiving the advanced front end data; and
  
  transmitting information over a third wireless communication channel from the wireless base station to the wireless subscriber station responsive to the linguistic estimate of the electrical speech signal for controlling the wireless subscriber station.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Garudadri, Harinath
Granted Patent: US 8,050,911 B2
US Class Current: 704/231
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 25/78 Detection of presence or ab...
