Client-server speech processing system, apparatus, method, and storage medium
Abstract
The system achieves high-accuracy speech recognition while limiting the amount of data transferred between the client and the server. To this end, a speech processing unit in the client compression-encodes speech parameters and sends the compression-encoded speech parameters to the server. The server receives the compression-encoded speech parameters, its speech processing unit performs speech recognition on them, and the server sends information corresponding to the speech recognition result back to the client.
47 Claims
1. A speech processing system in which speech information is input at a client side, and speech recognition is done at a server side,
said client comprising:
acoustic analysis means for generating speech parameters by acoustically analyzing speech information;
encoding means for compression-encoding the speech parameters; and
transmission means for transmitting the compression-encoded speech parameters, and
said server comprising:
reception means for receiving the compression-encoded speech parameters;
first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception means;
selection means for selecting states of acoustic models using only the first likelihood;
decoding means for decoding the compression-encoded speech parameters received by said reception means;
second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech parameters; and
speech recognition means for making speech recognition using the second likelihood obtained by said second computation means.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
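As an illustration of the client-side elements of claim 1 (acoustic analysis, compression encoding, transmission), the following is a minimal Python sketch rather than the patented implementation. The claims do not fix the kind of speech parameters or the codec, so the log-energy and zero-crossing-rate parameters, the 4-bit uniform scalar quantizer, the value ranges, the frame length, the packet layout, and all function names are assumptions chosen for brevity.

```python
import math
import struct

FRAME_LEN = 160                      # assumed: 20 ms frames at 8 kHz
LEVELS = 16                          # assumed: 4-bit code per parameter
RANGES = [(-25.0, 0.0), (0.0, 1.0)]  # assumed (low, high) range per parameter


def analyze(samples):
    """Acoustic analysis: one (log-energy, zero-crossing rate) pair per frame."""
    frames = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(x * x for x in frame) / FRAME_LEN
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        frames.append((math.log(energy + 1e-10), crossings / (FRAME_LEN - 1)))
    return frames


def quantize(value, low, high):
    """Uniform scalar quantization to an integer code in [0, LEVELS - 1]."""
    step = (high - low) / LEVELS
    code = int((value - low) / step)
    return max(0, min(LEVELS - 1, code))  # clamp out-of-range values


def encode(frames):
    """Compression-encode: pack the two 4-bit codes of each frame into one byte."""
    payload = bytearray()
    for params in frames:
        c0, c1 = (quantize(v, lo, hi) for v, (lo, hi) in zip(params, RANGES))
        payload.append((c0 << 4) | c1)
    return bytes(payload)


def build_packet(payload):
    """Prefix the payload with a 2-byte big-endian frame count (hypothetical format)."""
    return struct.pack(">H", len(payload)) + payload


# Usage: 40 ms of a 440 Hz tone becomes two frames, i.e. a 4-byte packet.
samples = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(320)]
packet = build_packet(encode(analyze(samples)))
```

The resulting packet would be handed to whatever transport the client uses; the network send itself is omitted here. In this sketch each frame costs one byte on the wire instead of two floating-point values, which is the kind of transfer reduction the abstract refers to.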
11. A speech processing apparatus comprising:
reception means for receiving compression-encoded speech parameters from a client via a network;
first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception means;
selection means for selecting states of acoustic models using only the first likelihood;
decoding means for decoding the compression-encoded speech parameters received by said reception means;
second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech parameters; and
speech recognition means for making speech recognition using the second likelihood obtained by said second computation means.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
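The two-stage likelihood computation recited in claim 11 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes each frame arrives as one byte holding two 4-bit scalar-quantized codes, models each acoustic-model state as a diagonal Gaussian, computes the first likelihood by table lookup on the top two bits of each code without decoding, and keeps the two best states. The state names, model values, value ranges, and function names are all hypothetical.

```python
import math

LEVELS = 16                          # assumed: 4-bit code per parameter
RANGES = [(-25.0, 0.0), (0.0, 1.0)]  # assumed (low, high) range per parameter
COARSE_SHIFT = 2                     # first pass uses only the top 2 bits of a code

# Hypothetical acoustic-model states: name -> (means, variances)
STATES = {
    "sil": ([-20.0, 0.05], [4.0, 0.01]),
    "a": ([-2.0, 0.10], [4.0, 0.01]),
    "s": ([-6.0, 0.60], [4.0, 0.02]),
}


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def decode_code(code, low, high, levels=LEVELS):
    """Decoding: map a quantizer code back to the center of its bin."""
    step = (high - low) / levels
    return low + (code + 0.5) * step


def build_coarse_table():
    """table[state][dim][coarse_code] -> log-likelihood at the coarse bin center."""
    coarse_levels = LEVELS >> COARSE_SHIFT
    return {
        name: [
            [log_gauss(decode_code(c, lo, hi, coarse_levels), m, v)
             for c in range(coarse_levels)]
            for (lo, hi), m, v in zip(RANGES, means, variances)
        ]
        for name, (means, variances) in STATES.items()
    }


COARSE_TABLE = build_coarse_table()


def first_likelihood(codes):
    """First computation: score every state directly from the encoded codes."""
    return {name: sum(dims[d][c >> COARSE_SHIFT] for d, c in enumerate(codes))
            for name, dims in COARSE_TABLE.items()}


def select_states(first, keep=2):
    """Selection: keep the best states using only the first likelihood."""
    return sorted(first, key=first.get, reverse=True)[:keep]


def second_likelihood(codes, selected):
    """Second computation: exact scores on decoded parameters, selected states only."""
    params = [decode_code(c, lo, hi) for c, (lo, hi) in zip(codes, RANGES)]
    return {name: sum(log_gauss(x, m, v) for x, m, v in zip(params, *STATES[name]))
            for name in selected}


def score_frame(byte):
    """Run both passes for one received frame byte."""
    codes = [byte >> 4, byte & 0x0F]
    selected = select_states(first_likelihood(codes))
    return second_likelihood(codes, selected)


# Usage: a loud, low zero-crossing frame is scored only against the selected states.
scores = score_frame(0xF1)
best_state = max(scores, key=scores.get)
```

In a full recognizer the second likelihoods would feed a search over word or phoneme sequences (for example a Viterbi decoder), which is omitted here. The design point the sketch tries to show is that the cheap first pass touches every state, while the exact computation is spent only on the states that survive selection.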
21. A speech processing method in which speech information is input at a client side, and speech recognition is done at a server side,
comprising at the client side:
an acoustic analysis step of generating speech parameters by acoustically analyzing speech information;
an encoding step of compression-encoding the speech parameters; and
a transmission step of transmitting the compression-encoded speech parameters, and
comprising at the server side:
a reception step of receiving the compression-encoded speech parameters;
a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
a selection step of selecting states of acoustic models using only the first likelihood;
a decoding step of decoding the compression-encoded speech parameters received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
Dependent claims: 22, 23, 24
25. A speech processing method comprising:
a reception step of receiving compression-encoded speech parameters from a client via a network;
a first computation step for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
a selection step of selecting states of acoustic models using only the first likelihood;
a decoding step of decoding the compression-encoded speech parameters received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
36. A speech processing program in which speech information is input at a client side, and speech recognition is done at a server side, said program implementing,
at the client side:
an acoustic analysis step of generating speech parameters by acoustically analyzing speech information;
an encoding step of compression-encoding the speech parameters; and
a transmission step of transmitting the compression-encoded speech parameters, and
at the server side:
a reception step of receiving compression-encoded speech parameters;
a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
a selection step of selecting states of acoustic models using only the first likelihood;
a decoding step of decoding the compression-encoded speech parameters received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
Dependent claims: 37, 38
39. A speech processing program implementing:
a reception step of receiving compression-encoded speech parameters from a client via a network;
a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
a selection step of selecting states of acoustic models using only the first likelihood;
a decoding step of decoding the compression-encoded speech parameters received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
Dependent claims: 40, 41
42. A speech processing system in which speech information is input at a client side, and speech recognition is done at a server side,
said client comprising:
an acoustic analysis unit adapted to generate speech parameters by acoustically analyzing speech information;
an encoding unit adapted to compression-encode the speech parameters; and
a transmission unit adapted to transmit the compression-encoded speech parameters, and
said server comprising:
a reception unit adapted to receive the compression-encoded speech parameters;
a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
a selection unit adapted to select states of acoustic models using only the first likelihood;
a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters; and
a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit.
Dependent claims: 43, 44
45. A speech processing apparatus comprising:
a reception unit adapted to receive compression-encoded speech parameters from a client via a network;
a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
a selection unit adapted to select states of acoustic models using only the first likelihood;
a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters; and
a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said computation unit.
Dependent claims: 46, 47
Specification