Method and apparatus for transmitting speech activity in distributed voice recognition systems

US 8,050,911 B2
Filed: 03/01/2007
Issued: 11/01/2011
Est. Priority Date: 06/14/2001
Status: Expired due to Fees

Abstract

A system, method, apparatus, signal-bearing medium, and means for transmitting speech activity in a distributed voice recognition (VR) system. The distributed voice recognition system includes a local VR engine in a subscriber unit (102) and a server VR engine on a server (160). The local VR engine comprises a voice activity detection (VAD) module (106) that detects voice activity within a speech signal, and comprises an advanced feature extraction (AFE) module (104) that extracts features from a speech signal. The detected voice activity information is transmitted over a first wireless communication channel to the server (160). The feature extraction information is transmitted over a second wireless communication channel, separate from the first wireless communication channel, to the server (160). The server (160) processes the received information to determine a linguistic estimate of the electrical speech signal, and transmits the linguistic estimate to the subscriber unit (102).
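
The subscriber-unit side of the abstract can be illustrated with a minimal sketch. It assumes an energy-threshold voice activity detector and a two-value placeholder feature vector; neither is taken from the patent, which does not specify the VAD or AFE algorithms here. The names `FRAME_LEN`, `ENERGY_THRESHOLD`, and `run_subscriber_unit` are hypothetical, and the two returned lists merely stand in for the two separate wireless channels.

```python
from typing import List, Tuple

FRAME_LEN = 4            # samples per frame (hypothetical value for illustration)
ENERGY_THRESHOLD = 0.1   # hypothetical VAD threshold on mean frame energy


def split_into_frames(samples: List[float], frame_len: int = FRAME_LEN) -> List[List[float]]:
    """Chop the signal into consecutive non-overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_voice_activity(frames: List[List[float]],
                          threshold: float = ENERGY_THRESHOLD) -> List[bool]:
    """Assumed VAD: a frame counts as speech when its mean energy exceeds the threshold."""
    return [frame_energy(f) > threshold for f in frames]


def extract_features(frames: List[List[float]]) -> List[List[float]]:
    """Placeholder features per frame: [mean energy, peak amplitude]."""
    return [[frame_energy(f), max(abs(x) for x in f)] for f in frames]


def run_subscriber_unit(samples: List[float]) -> Tuple[List[bool], List[List[float]]]:
    """Return (payload for the first channel, payload for the second channel)."""
    frames = split_into_frames(samples)
    vad_channel = detect_voice_activity(frames)
    feature_channel = extract_features(frames)
    return vad_channel, feature_channel
```

A server receiving both payloads could use the VAD flags to decide which feature vectors to pass to its recognizer.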

35 Claims

1. A speech recognition system employed by a wireless subscriber station, comprising a computer-readable storage comprising machine readable instructions that when executed perform a method comprising:
- receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal comprising frames;
  
  determining which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
  
  identifying feature extraction information related to the electrical speech signal; and
  
  transmitting the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information.
- - 2. The system of claim 1, wherein the identification of feature extraction information comprises extracting a set of features corresponding to frames of the electrical speech signal.
  - 3. The system of claim 1, wherein the wireless subscriber station further comprises a cellular radiotelephone.
  - 4. The system of claim 2, wherein the transmitting the feature extraction information further comprises:
    - removing frames of silence from the electrical speech signal, and transmitting silence-free speech frames of the electrical speech signal over a second wireless communication channel to the wireless base station.
  - 5. The system of claim 1, wherein the transmitting the feature extraction information further comprises:
    - transmitting a representation of the electrical speech signal, including silence, over a second wireless communication channel to the wireless base station; and
      
      wherein the transmitting the detected voice activity information further comprises;
      
      transmitting at least one indication where silence regions exist over a first communication channel to permit periods of speech to be separated from silence regions for use of the periods of speech.
  - 6. The system of claim 4, wherein the transmitting the detected voice activity information further comprises:
    - transmitting over a first wireless communication channel at least one indication where the frames of silence exist in the electrical speech signal to permit the silence-free speech frames transmitted over the second wireless communication channel to be separated responsive to the at least one indication for use of the silence-free speech frames.
  - 7. The system of claim 1, further comprising assembling voice activity information substantially in parallel to identifying the feature extraction information.
  - 8. The system of claim 7, wherein voice detection activity is quantized at a lower rate when identification of the feature extraction information indicates silence frames.
  - 9. The system of claim 7, wherein assembling the detected voice activity information comprises determining a voice activity vector, and identifying the feature extraction information comprises determining a feature extraction vector, and the method further comprises concatenating the voice activity vector and the feature extraction vector to process and determine advanced front end data.
  - 10. The system of claim 1, wherein identification of feature extraction information comprises determining a feature extraction vector.
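
The delay relationship in claim 1 can be made concrete with a small sketch. It assumes delays measured in whole frames and one packet per frame interval; the constants `VAD_DELAY` and `FE_DELAY` and the `Packet` type are hypothetical and only illustrate why a single transmission carries voice activity and feature information for different frames.

```python
from typing import List, NamedTuple, Optional

VAD_DELAY = 1  # frames of delay before VAD information is ready (hypothetical)
FE_DELAY = 3   # frames of delay before feature information is ready (hypothetical)


class Packet(NamedTuple):
    time: int                 # frame interval in which the packet is sent
    vad_frame: Optional[int]  # index of the frame whose VAD decision is carried
    fe_frame: Optional[int]   # index of the frame whose features are carried


def schedule_packets(num_frames: int,
                     vad_delay: int = VAD_DELAY,
                     fe_delay: int = FE_DELAY) -> List[Packet]:
    """List which frame's VAD and feature information goes out at each time step."""
    if not 0 < vad_delay < fe_delay:
        raise ValueError("expected 0 < vad_delay < fe_delay")
    packets = []
    # Keep sending until the features of the last frame have gone out.
    for t in range(num_frames + fe_delay):
        vad_frame = t - vad_delay
        fe_frame = t - fe_delay
        packets.append(Packet(
            time=t,
            vad_frame=vad_frame if 0 <= vad_frame < num_frames else None,
            fe_frame=fe_frame if 0 <= fe_frame < num_frames else None,
        ))
    return packets
```

With these assumed delays, the packet sent at time 3 carries the VAD decision for frame 2 and the features for frame 0, so the receiver learns about speech activity ahead of the matching features.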

11. A method of performing speech recognition, comprising:
- receiving an acoustic speech signal comprising speech and non-speech from a user of a wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal comprising frames;
  
  determining which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
  
  identifying feature extraction information related to the electrical speech signal; and
  
  transmitting the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information.
- - 12. The method of claim 11, wherein said wireless subscriber station is a cellular telephone.
  - 13. The method of claim 11, wherein transmitting the feature extraction information comprises removing frames of non-speech from the electrical speech signal.
  - 14. The method of claim 11, wherein transmitting the detected voice activity information comprises transmitting at least one indication where frames of non-speech exist.
  - 15. The method of claim 11, comprising quantizing frames of non-speech at a lower rate than frames comprising speech.
  - 16. The method of claim 11, wherein the voice activity information comprises voice activity vectors.
  - 17. The method of claim 11, wherein the feature extraction information comprises feature extraction vectors.
  - 18. The method of claim 11, wherein front end data is provided by concatenating the voice activity information and the feature extraction information.
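
Claims 13 and 14 (and claims 4 to 6 above) describe dropping non-speech frames from the transmitted data while separately indicating where they were. A minimal sketch of that idea follows, assuming one boolean flag per frame as the indication; the function names and the `filler` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def remove_non_speech(frames: Sequence[T], vad_flags: Sequence[bool]) -> List[T]:
    """Sender side: keep only the frames flagged as speech."""
    if len(frames) != len(vad_flags):
        raise ValueError("one VAD flag is required per frame")
    return [frame for frame, is_speech in zip(frames, vad_flags) if is_speech]


def reassemble(speech_frames: Sequence[T],
               vad_flags: Sequence[bool],
               filler: Optional[T] = None) -> List[Optional[T]]:
    """Receiver side: put speech frames back at their original positions.

    Positions flagged as non-speech are filled with `filler`.
    """
    if sum(vad_flags) != len(speech_frames):
        raise ValueError("number of speech flags must match number of speech frames")
    remaining = iter(speech_frames)
    return [next(remaining) if is_speech else filler for is_speech in vad_flags]
```

For example, with flags `[False, True, True, False]` only the two middle frames are sent, and the receiver can still recover where the silence regions fell.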

19. A wireless subscriber unit with voice recognition features, comprising:
- a microphone configured to receive an acoustic speech signal from a user, wherein the acoustic signal comprises speech and non-speech;
  
  an analog to digital converter configured to convert the acoustic speech signal to an electrical speech signal comprising frames;
  
  a voice activity detector configured to determine which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
  
  a feature extraction element configured to identify feature extraction information related to the electrical speech signal;
  
  a transmitter configured to transmit the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station for voice recognition processing, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information; and
  
  a receiver configured to receive a voice recognition processing information from said wireless base station.
- - 20. The system of claim 19, wherein said wireless subscriber station is a cellular telephone.
  - 21. The system of claim 19, wherein said feature extraction element is configured to removing frames of non-speech from the electrical speech signal.
  - 22. The system of claim 19, wherein the voice activity information comprises at least one indication where frames of non-speech exist.
  - 23. The system of claim 19, wherein the feature extraction element is configured to quantize frames of non-speech at a lower rate than frames comprising speech.
  - 24. The system of claim 19, wherein the voice activity information comprises voice activity vectors.
  - 25. The system of claim 19, wherein the feature extraction information comprises feature extraction vectors.
  - 26. The system of claim 19, wherein front end data is provided by concatenating the voice activity information and the feature extraction information.
  - 27. The system of claim 19, wherein said wireless base station comprises a word decoder configured to perform speech recognition using said voice activity information and said feature extraction information.
  - 28. The system of claim 19, wherein said voice recognition processing information comprises a command signal.
  - 29. The system of claim 19, wherein said voice recognition processing information comprises estimated words.
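
Claims 23 and 26 can be sketched together. The example assumes that "a lower rate" means fewer bits per quantized value and that concatenation simply prepends a one-element voice activity flag to the quantized features; both readings are assumptions, as are the bit widths, the value range, and the function names.

```python
from typing import List


def quantize(values: List[float], bits: int, lo: float = -1.0, hi: float = 1.0) -> List[int]:
    """Uniform scalar quantizer: map each value in [lo, hi] to an integer code of `bits` bits."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)  # clamp out-of-range values
        codes.append(round((clipped - lo) / (hi - lo) * levels))
    return codes


def build_front_end_vector(features: List[float],
                           is_speech: bool,
                           speech_bits: int = 8,
                           non_speech_bits: int = 4) -> List[int]:
    """Concatenate the VAD flag with features quantized at a frame-dependent bit width."""
    bits = speech_bits if is_speech else non_speech_bits
    return [1 if is_speech else 0] + quantize(features, bits)
```

Under these assumptions a speech frame costs 8 bits per feature and a non-speech frame 4, so silence consumes less of the channel.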

30. A system for performing speech recognition, comprising:
- means for receiving an acoustic speech signal comprising speech and non-speech from a user of a wireless subscriber station;
  
  means for converting the acoustic speech signal to an electrical speech signal comprising frames;
  
  means for determining which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
  
  means for identifying feature extraction information related to the electrical speech signal; and
  
  means for transmitting the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information.
- - 31. The system of claim 30, wherein said means for receiving comprises a microphone.
  - 32. The system of claim 30, wherein said means for converting comprises an analog to digital converter.
  - 33. The system of claim 30, wherein said means for determining comprises a voice activity detector.
  - 34. The system of claim 30, wherein said means for identifying comprises a feature extraction element.
  - 35. The system of claim 30, wherein said means for transmitting comprises an antenna.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Garudadri, Harinath
Primary Examiner(s): Wozniak; James S.
Assistant Examiner(s): Baker; Matthew
Application Number: US11/680,740
Publication Number: US 20070192094A1
Time in Patent Office: 1,706 Days
Field of Search: 704/214-215, 704/233
US Class Current: 704/208
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 25/78 Detection of presence or ab...
