Method and apparatus for transmitting speech activity in distributed voice recognition systems
Abstract
A system, method, apparatus, signal-bearing medium, and means for transmitting speech activity in a distributed voice recognition (VR) system. The distributed voice recognition system includes a local VR engine in a subscriber unit (102) and a server VR engine on a server (160). The local VR engine comprises a voice activity detection (VAD) module (106) that detects voice activity within a speech signal, and comprises an advanced feature extraction (AFE) module (104) that extracts features from a speech signal. The detected voice activity information is transmitted over a first wireless communication channel to the server (160). The feature extraction information is transmitted over a second wireless communication channel, separate from the first wireless communication channel, to the server (160). The server (160) processes the received information to determine a linguistic estimate of the electrical speech signal, and transmits the linguistic estimate to the subscriber unit (102).
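The subscriber-side split described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the energy-threshold detector, the log-energy "feature", the frame length, and every function name here are hypothetical stand-ins, not the VAD module (106) or AFE module (104) of the patent, and the two Python lists merely stand in for the two separate wireless channels.

```python
import math
from typing import List, Tuple

FRAME_LEN = 80           # assumed samples per frame (e.g. 10 ms at 8 kHz)
ENERGY_THRESHOLD = 0.01  # assumed threshold; the source does not specify one


def split_into_frames(samples: List[float], frame_len: int = FRAME_LEN) -> List[List[float]]:
    """Cut the signal into non-overlapping frames, dropping a trailing partial frame."""
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_voice_activity(frame: List[float], threshold: float = ENERGY_THRESHOLD) -> bool:
    """Hypothetical VAD: flag a frame as speech when its energy exceeds the threshold."""
    return frame_energy(frame) > threshold


def extract_features(frame: List[float]) -> List[float]:
    """Hypothetical feature extractor: a single log-energy value per frame."""
    return [math.log(frame_energy(frame) + 1e-12)]


def process(samples: List[float]) -> Tuple[List[Tuple[int, bool]], List[Tuple[int, List[float]]]]:
    """Produce two separate streams, one per (simulated) channel."""
    vad_channel = []      # stands in for the first wireless channel
    feature_channel = []  # stands in for the second wireless channel
    for index, frame in enumerate(split_into_frames(samples)):
        vad_channel.append((index, detect_voice_activity(frame)))
        feature_channel.append((index, extract_features(frame)))
    return vad_channel, feature_channel
```

Keeping the two streams in separate containers mirrors the abstract's point that voice activity information and feature extraction information travel independently to the server; a real front end would use far richer features than log energy.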
35 Claims
1. A speech recognition system employed by a wireless subscriber station, comprising a computer-readable storage comprising machine readable instructions that when executed perform a method comprising:
receiving an acoustic speech signal, including periods of speech and non-speech, from a user of the wireless subscriber station;
converting the acoustic speech signal to an electrical speech signal comprising frames;
determining which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
identifying feature extraction information related to the electrical speech signal; and
transmitting the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of performing speech recognition, comprising:
receiving an acoustic speech signal comprising speech and non-speech from a user of a wireless subscriber station;
converting the acoustic speech signal to an electrical speech signal comprising frames;
determining which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
identifying feature extraction information related to the electrical speech signal; and
transmitting the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
19. A wireless subscriber unit with voice recognition features, comprising:
a microphone configured to receive an acoustic speech signal from a user, wherein the acoustic signal comprises speech and non-speech;
an analog to digital converter configured to convert the acoustic speech signal to an electrical speech signal comprising frames;
a voice activity detector configured to determine which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
a feature extraction element configured to identify feature extraction information related to the electrical speech signal;
a transmitter configured to transmit the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station for voice recognition processing, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information; and
a receiver configured to receive a voice recognition processing information from said wireless base station.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
30. A system for performing speech recognition, comprising:
means for receiving an acoustic speech signal comprising speech and non-speech from a user of a wireless subscriber station;
means for converting the acoustic speech signal to an electrical speech signal comprising frames;
means for determining which frames comprise speech and which frames comprise non-speech to assemble voice activity information;
means for identifying feature extraction information related to the electrical speech signal; and
means for transmitting the voice activity information and the feature extraction information corresponding to different frames of the speech signal to a wireless base station, wherein the voice activity information is delayed from the speech signal, the feature extraction information is delayed from the speech signal, and the delay of the voice activity information is less than the delay of the feature extraction information.
Dependent claims: 31, 32, 33, 34, 35
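The delay relationship recited in the independent claims (voice activity information lags the speech signal by less than the feature extraction information does, so a given transmission carries information "corresponding to different frames") can be illustrated with a small timing sketch. The delay values, the frame-count units, and the function name below are hypothetical and chosen only to make the relationship visible; the claims themselves do not fix any numbers.

```python
from typing import List, Optional, Tuple

VAD_DELAY = 1  # hypothetical: VAD result is ready 1 frame after the frame is captured
FE_DELAY = 3   # hypothetical: features are ready 3 frames after the frame is captured


def transmission_schedule(
    num_frames: int,
    vad_delay: int = VAD_DELAY,
    fe_delay: int = FE_DELAY,
) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (transmit_time, vad_frame_index, feature_frame_index) tuples.

    At each transmit time t, the VAD information describes frame t - vad_delay
    and the features describe frame t - fe_delay. None marks a slot where no
    frame is available yet, or where all frames have already been sent.
    """
    if not 0 <= vad_delay < fe_delay:
        raise ValueError("VAD delay must be non-negative and less than the feature delay")
    schedule = []
    for t in range(num_frames + fe_delay):
        vad_frame = t - vad_delay
        fe_frame = t - fe_delay
        schedule.append((
            t,
            vad_frame if 0 <= vad_frame < num_frames else None,
            fe_frame if 0 <= fe_frame < num_frames else None,
        ))
    return schedule


if __name__ == "__main__":
    for slot in transmission_schedule(4):
        print(slot)
```

With these assumed delays, the slot at transmit time 3 pairs VAD information for frame 2 with features for frame 0, which is one way to read "corresponding to different frames of the speech signal".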
Specification