Method and apparatus for transmitting speech activity in distributed voice recognition systems

US 7,203,643 B2
Filed: 05/28/2002
Issued: 04/10/2007
Est. Priority Date: 06/14/2001
Status: Active Grant

Abstract

A system and method for transmitting speech activity in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises an advanced feature extraction (AFE) module that extracts features from a speech signal, and a voice activity detection (VAD) module that detects voice activity within a speech signal. The combined results from the VAD module and feature extraction module are provided in an efficient manner to a remote device, such as a server, in the form of advanced front end features, thereby enabling the server to process speech segments free of silence regions. Various aspects of efficient speech segment transmission are disclosed.
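As a rough illustration of the split the abstract describes, the Python sketch below runs a voice activity detector and a feature extractor over the same frames and keeps only the features of frames flagged as speech. The energy threshold, the two toy features (mean energy and zero-crossing count), and all function names are assumptions made for illustration; the abstract does not specify how the VAD or AFE modules are implemented.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_voice_activity(frames: Sequence[Frame], threshold: float = 0.01) -> List[bool]:
    """Hypothetical VAD: a frame counts as speech if its energy exceeds the threshold."""
    return [frame_energy(f) > threshold for f in frames]


def extract_features(frame: Frame) -> Tuple[float, int]:
    """Hypothetical feature extractor: (mean energy, number of zero crossings)."""
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return frame_energy(frame), zero_crossings


def build_front_end_data(frames: Sequence[Frame], threshold: float = 0.01):
    """Return the VAD flags and the features of the speech frames as two separate streams."""
    vad_flags = detect_voice_activity(frames, threshold)
    features = [extract_features(f) for f, active in zip(frames, vad_flags) if active]
    return vad_flags, features
```

Keeping the flags and the features as two return values mirrors the idea that the two kinds of information can travel separately, so the receiving side only has to process speech frames.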

34 Claims

1. A method of operating a speech recognition system employed by a wireless subscriber station, comprising:
- receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal;
  
  assembling detected voice activity information related to the electrical speech signal;
  
  identifying feature extraction information related to the electrical speech signal;
  
  selectively utilizing said detected voice activity information and said feature extraction information to form advanced front end data;
  
  transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station.
- - 2. The method of claim 1, wherein the identification of feature extraction information comprises extracting a set of features corresponding to segments of the electrical speech signal.
  - 3. The method of claim 2, wherein the transmitting the feature extraction information further comprises:
    - removing segments of silence from the electrical speech signal, and transmitting silence-free speech segments of the electrical speech signal over the second wireless communication channel to the wireless base station.
  - 4. The method of claim 3, wherein the transmitting the detected voice activity information further comprises:
    - transmitting over the first communication channel at least one indication where the segments of silence exist in the electrical speech signal to permit the silence-free speech segments transmitted over second wireless communication channel to be separated responsive to the at least one indication for use of the silence-free speech segments.
  - 5. The method of claim 1, wherein the wireless subscriber station further comprises a cellular radiotelephone.
  - 6. The method of claim 1, wherein the transmitting the feature extraction information further comprises:
    - transmitting a representation of the electrical speech signal, including silence, over the second wireless communication channel to the wireless base station; and
      
      wherein the transmitting the detected voice activity information further comprises;
      
      transmitting at least one indication where silence regions exist over the first communication channel to permit periods of speech to be separated from silence regions for use of the periods of speech.
  - 7. The method of claim 1, further comprising assembling detected voice activity information substantially in parallel to identifying the feature extraction information.
  - 8. The method of claim 7, wherein voice detection activity is quantized at a lower rate when identification of the feature extraction information indicates silence regions.
  - 9. The method of claim 7, wherein assembling the detected voice activity information comprises determining a voice activity vector, and identifying the feature extraction information comprises determining a feature extraction vector, and the method further comprises concatenating the voice activity vector and the feature extraction vector to process and determine the advanced front end data.
  - 10. The method of claim 1, wherein identification of feature extraction information comprises determining a feature extraction vector.
  - 11. The method of claim 10, wherein the determining comprises:
    - detecting speech activity and upon detecting speech activity, computing an average feature extraction vector corresponding to frames dropped; and
      
      transmitting a total number of frames dropped over one of the first and second wireless communication channel to the wireless base station prior to transmitting speech frames over the second wireless communication channel to the wireless base station.
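One way to picture claims 3 and 4 together is sketched below in Python: the sender strips silent frames from the stream sent on the second channel and sends the positions of the silence regions on the first channel, and the receiver uses those positions to split the silence-free stream back into separate speech segments. The `(start, length)` encoding of a silence region, the `total_frames` argument, and the function names are assumptions; the claims only require "at least one indication" of where silence exists.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # (start frame index, number of frames); hypothetical encoding


def silence_regions(vad_flags: Sequence[bool]) -> List[Region]:
    """Run-length encode the silent stretches of a VAD flag sequence."""
    regions, start = [], None
    for i, active in enumerate(vad_flags):
        if not active and start is None:
            start = i
        elif active and start is not None:
            regions.append((start, i - start))
            start = None
    if start is not None:
        regions.append((start, len(vad_flags) - start))
    return regions


def strip_silence(frames: Sequence, vad_flags: Sequence[bool]) -> List:
    """Sender side: keep only the frames flagged as speech."""
    return [f for f, active in zip(frames, vad_flags) if active]


def split_segments(speech_frames: Sequence, regions: Sequence[Region], total_frames: int) -> List[List]:
    """Receiver side: use the silence indications to separate the speech segments."""
    silent = {i for start, length in regions for i in range(start, start + length)}
    remaining = iter(speech_frames)
    segments, current = [], []
    for i in range(total_frames):
        if i in silent:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(next(remaining))
    if current:
        segments.append(current)
    return segments
```

In this sketch the receiver never sees the silent frames themselves; it only needs the region list and the total frame count to recover the segment boundaries.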

12. A wireless subscriber station, comprising:
- a microphone for receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station, and for converting the acoustic speech signal to an electrical speech signal;
  
  a voice activity detector for detecting voice activity information related to the electrical speech signal;
  
  a feature extractor, operating substantially in parallel to the voice activity detector, for identifying feature extraction information related to the electrical speech signal;
  
  a processor for selectively utilizing the detected voice activity information and the feature extraction information to form advanced front end data; and
  
  a transmitter for transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station.
- - 13. The wireless subscriber station of claim 12, wherein the wireless subscriber station is a cellular radiotelephone.
  - 14. The wireless subscriber station of claim 12, wherein the transmitter transmitting the feature extraction information further comprises:
    - removing segments of silence from the electrical speech signal, and transmitting silence-free speech segments of the electrical speech signal over the second wireless communication channel to the wireless base station.
  - 15. The wireless subscriber station of claim 14, wherein the transmitter transmitting the detected voice activity information further comprises:
    - transmitting over the first communication channel at least one indication where the segments of silence exist in the electrical speech signal to permit the silence-free speech segments transmitted over second wireless communication channel to be separated responsive to the at least one indication for use of the silence-free speech segments.
  - 16. The wireless subscriber station of claim 12, wherein:
    - the transmitter transmitting the feature extraction information further comprises;
      
      transmitting a representation of the electrical speech signal, including silence, over the second wireless communication channel to the wireless base station;
      
      the transmitter transmitting the detected voice activity information further comprises;
      
      transmitting at least one indication of where silence regions exist over the first communication channel to permit periods of speech to be separated from silence regions for use of the periods of speech.
  - 17. The wireless subscriber station of claim 12, wherein the wireless subscriber station quantizes voice detection activity from the voice activity detector at a lower rate in circumstances where identification of feature extraction information indicates silence regions.
  - 18. The wireless subscriber station of claim 12, wherein the voice activity detector determines a voice activity vector, and the feature extractor determines a feature extraction vector.
  - 19. The wireless subscriber station of claim 18, wherein the wireless subscriber station concatenates the voice activity vector and the feature extraction vector to process and determine the advanced front end data.
  - 20. The wireless subscriber station of claim 12, wherein the feature extractor determines a feature extraction vector.
  - 21. The wireless subscriber station of claim 20, wherein the wireless subscriber station computes an average feature extraction vector corresponding to frames dropped upon detecting speech activity and transmits a total number of frames dropped over one of the first and second wireless communication channel to the wireless base station prior to transmitting speech frames over the second wireless communication channel to the wireless base station.
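The dropped-frame behavior recited in claim 21 could look roughly like the Python sketch below: feature vectors of silent frames are held back, and when speech is detected the station first emits the number of dropped frames together with their average feature vector, then the speech frames. The tuple-based message format and the function names are assumptions for illustration, not something the claim defines.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise average of equally sized feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def summarize_dropped(features: Sequence[Vector], vad_flags: Sequence[bool]) -> List[tuple]:
    """Build the outgoing message list for one utterance.

    Silent frames are dropped. On speech onset, a ("dropped", count, average)
    message precedes the speech frames that follow it.
    """
    messages, dropped = [], []
    for vector, active in zip(features, vad_flags):
        if not active:
            dropped.append(vector)
            continue
        if dropped:
            messages.append(("dropped", len(dropped), mean_vector(dropped)))
            dropped = []
        messages.append(("speech", list(vector)))
    return messages
```

Trailing silence with no following speech produces no summary here, since the claim ties the computation to detecting speech activity; a real system might flush it at the end of the utterance.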

22. A method of operating a distributed speech recognition system employed by a wireless subscriber station, comprising:
- receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to electrical speech data;
  
  extracting voice activity data from the electrical speech data;
  
  identifying feature extraction data from the electrical speech data; and
  
  transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station.
- - 23. The method of claim 22, wherein identifying the feature extraction data comprises extracting a set of features corresponding to segments of the electrical speech data.
  - 24. The method of claim 23, wherein transmitting the feature extraction information further comprises:
    - removing segments of silence from the electrical speech signal, and transmitting silence-free speech segments of the electrical speech signal over the second wireless communication channel to the wireless base station.
  - 25. The method of claim 24, wherein the transmitting the detected voice activity information further comprises:
    - transmitting over the first communication channel at least one indication where the segments of silence exist in the electrical speech signal to permit the silence-free speech segments transmitted over second wireless communication channel to be separated responsive to the at least one indication for use of the silence-free speech segments.
  - 26. The method of claim 22, wherein the wireless subscriber station further comprises a cellular radiotelephone.
  - 27. The method of claim 22, wherein the transmitting the feature extraction information further comprises:
    - transmitting a representation of the electrical speech data, including silence, over the second wireless communication channel to the wireless base station; and
      
      wherein the transmitting the detected voice activity information further comprises;
      
      transmitting at least one indication where at least one silence region exists over the first communication channel to permit periods of speech to be separated from silence regions for use of the periods of speech.
  - 28. The method of claim 22, further comprising extracting voice activity data substantially in parallel to identifying the feature extraction data.
  - 29. The method of claim 28, wherein voice activity data is quantized to advanced front end data at a lower rate when identification of feature extraction data indicates silence regions.
  - 30. The method of claim 28, wherein voice activity detection comprises determining a voice activity vector, and identification of the feature extraction data comprises determining a feature extraction vector, and the method further comprises concatenating the voice activity vector and the feature extraction vector to process and determine extended data.
  - 31. The method of claim 22, wherein identification of feature extraction data comprises determining a feature extraction vector.
  - 32. The method of claim 31, wherein the determining comprises:
    - detecting speech activity and upon detecting speech activity, computing an average feature extraction vector corresponding to frames dropped; and
      
      transmitting a total number of frames dropped over one of the first and second wireless communication channel to the wireless base station prior to transmitting speech frames over the second wireless communication channel to the wireless base station.
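Claims 29 and 30 can be read together as: concatenate the voice activity vector with the feature extraction vector, then quantize the result more coarsely when the frame is silent. The Python sketch below shows one possible reading, in which "a lower rate" is interpreted as fewer bits per value. That interpretation, the uniform quantizer, the 8-bit and 2-bit defaults, and the assumption that all values are normalized to the range 0.0 to 1.0 are illustrative choices, not details taken from the claims.

```python
from typing import List, Sequence, Tuple


def concatenate(va_vector: Sequence[float], fe_vector: Sequence[float]) -> List[float]:
    """Join the voice activity vector and the feature extraction vector into one vector."""
    return [*va_vector, *fe_vector]


def quantize(values: Sequence[float], bits: int, lo: float = 0.0, hi: float = 1.0) -> List[int]:
    """Uniform quantizer: map each value in [lo, hi] to an integer code of the given bit width."""
    levels = (1 << bits) - 1
    codes = []
    for value in values:
        clipped = min(max(value, lo), hi)
        codes.append(round((clipped - lo) / (hi - lo) * levels))
    return codes


def encode_frame(
    va_vector: Sequence[float],
    fe_vector: Sequence[float],
    is_silence: bool,
    speech_bits: int = 8,
    silence_bits: int = 2,
) -> Tuple[int, List[int]]:
    """Return (bits used per value, quantized codes) for one concatenated frame."""
    bits = silence_bits if is_silence else speech_bits
    return bits, quantize(concatenate(va_vector, fe_vector), bits)
```

Returning the bit width alongside the codes lets a decoder know how to interpret each frame; a deployed codec would more likely signal this through the frame header or infer it from the voice activity information.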

33. A method of operating a distributed speech recognition service, comprising:
- performing, by a wireless subscriber station, a first portion of the distributed speech recognition service, comprising;
  
  receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal;
  
  assembling detected voice activity information related to the electrical speech signal;
  
  identifying feature extraction information related to the electrical speech signal;
  
  selectively utilizing the detected voice activity information and the feature extraction information; and
  
  transmitting the detected voice activity information over a first wireless communication channel to a wireless base station, and transmitting the feature extraction information over a second wireless communication channel, separate from the first wireless communication channel, to the wireless base station; and
  
  performing, by a wireless base station, a second portion of the distributed speech recognition service, comprising;
  
  receiving the detected voice activity information over the first wireless communication channel and the feature extraction information over the second wireless communication channel;
  
  determining a linguistic estimate of the electrical speech signal responsive to the detected voice activity information over the first wireless communication channel and the feature extraction information over the second wireless communication channel; and
  
  transmitting information over a third wireless communication channel from the wireless base station to the wireless subscriber station responsive to the linguistic estimate of the electrical speech signal for controlling the wireless subscriber station.
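The base-station half of claim 33 can be sketched in Python as three steps: combine what arrived on the first channel (voice activity flags) with what arrived on the second (feature vectors), pass the speech-only features to a recognizer to obtain a linguistic estimate, and map that estimate to a control message for the third channel. The recognizer is passed in as a callable because the claim does not say how the estimate is computed; `toy_recognize`, the command table, and the `"NO_OP"` fallback are purely hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def speech_only(vad_flags: Sequence[bool], features: Sequence[Vector]) -> List[Vector]:
    """Use the first-channel flags to drop silent frames from the second-channel features."""
    return [vector for vector, active in zip(features, vad_flags) if active]


def handle_utterance(
    vad_flags: Sequence[bool],
    features: Sequence[Vector],
    recognize: Callable[[List[Vector]], str],
    commands: Dict[str, str],
) -> str:
    """Return the control message to send back to the subscriber station."""
    estimate = recognize(speech_only(vad_flags, features))
    return commands.get(estimate, "NO_OP")


def toy_recognize(speech_features: List[Vector]) -> str:
    """Stand-in recognizer (hypothetical): decides only by the number of speech frames."""
    return "call" if len(speech_features) >= 2 else "unknown"
```

This version assumes the second channel carried every frame, silence included, so the flags line up one-to-one with the feature vectors.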

34. A method of operating a speech recognition service employed by a wireless base station, comprising:
- receiving from a wireless subscriber station advanced front end data, including detected voice activity information sent over a first wireless communication channel and feature extraction information sent over a second wireless communication channel, separate from the first wireless communication channel, wherein the wireless subscriber station comprises;
  
  receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
  
  converting the acoustic speech signal to an electrical speech signal;
  
  assembling the detected voice activity information related to the electrical speech signal;
  
  identifying feature extraction information related to the electrical speech signal;
  
  selectively utilizing the detected voice activity information and the feature extraction information to form the advanced front end data; and
  
  determining a linguistic estimate of the electrical speech signal responsive to receiving the advanced front end data; and
  
  transmitting information over a third wireless communication channel from the wireless base station to the wireless subscriber station responsive to the linguistic estimate of the electrical speech signal for controlling the wireless subscriber station.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Garudadri, Harinath
Primary Examiner(s): Smits; Talivaldis Ivars
Assistant Examiner(s): Ng; Eunice
Application Number: US10/157,629
Publication Number: US 20030061042A1
Time in Patent Office: 1,778 Days
Field of Search: 704/233, 704/270.1
US Class Current: 704/233
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 25/78 Detection of presence or ab...
