Client-server speech processing system, apparatus, method, and storage medium
Abstract
The system achieves high-accuracy speech recognition while limiting the amount of data transferred between the client and the server. To this end, a speech processing unit on the client compression-encodes speech parameters and sends the compression-encoded speech parameters to the server. The server receives the compression-encoded speech parameters, a speech processing unit on the server performs speech recognition on them, and the server sends information corresponding to the speech recognition result to the client.
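The data-transfer saving described in the abstract can be illustrated with a minimal sketch. The abstract does not name a codec, so everything specific here is an assumption made for illustration: a uniform 8-bit scalar quantizer, the parameter range `LO`/`HI`, and the function names `encode` and `decode`.

```python
import struct

# Assumed quantizer settings (hypothetical, not taken from the patent).
LO, HI = -10.0, 10.0          # assumed range of a speech parameter
LEVELS = 256                  # one byte per parameter
STEP = (HI - LO) / (LEVELS - 1)

def encode(params):
    """Client side: map each float parameter to an 8-bit code."""
    codes = []
    for x in params:
        clipped = min(max(x, LO), HI)          # keep the value inside the range
        codes.append(round((clipped - LO) / STEP))
    return bytes(codes)

def decode(payload):
    """Server side: map each 8-bit code back to an approximate float."""
    return [LO + code * STEP for code in payload]

frame = [0.5, -3.25, 7.0, 1.125]               # one frame of parameters (made up)
raw = struct.pack(f"<{len(frame)}f", *frame)   # uncompressed 32-bit floats
payload = encode(frame)                        # what the client would transmit
restored = decode(payload)                     # what the server would reconstruct
```

Under these assumptions the payload is one byte per parameter instead of four, and each restored value differs from the original by at most half a quantization step.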
42 Citations
49 Claims
1. A speech processing system in which speech information is input at a client side, and speech recognition is performed at a server side,
said client comprising:
encoding means for compression-encoding the speech information; and
transmission means for transmitting the compression-encoded speech information, and
said server comprising:
reception means for receiving the compression-encoded speech information;
first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received by said reception means;
selection means for selecting states of acoustic models using the first likelihood;
decoding means for decoding the compression-encoded speech information received by said reception means;
second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech information;
speech recognition means for accomplishing speech recognition using the second likelihood obtained by said second computation means;
execution means for executing a predetermined application program based on a recognition result of said speech recognition means; and
transmission means for transmitting, to said client, a result obtained from the predetermined application program executed by said execution means.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
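The two likelihood computations recited in claim 1 can be sketched as follows: a first likelihood taken directly from the compression-encoded codes, a selection of states, and a second likelihood computed from decoded values for the selected states only. The claim does not fix any of the details used here. The 8-bit scalar quantizer, the coarse lookup table indexed by the top bits of each code, the diagonal Gaussian state models, the top-K selection rule, and all names (`STATES`, `first_likelihood`, `second_likelihood`, `TOP_K`) are assumptions for illustration, and picking the single best state stands in for a full recognition search.

```python
import math

LO, HI, LEVELS = -10.0, 10.0, 256   # assumed 8-bit scalar quantizer (hypothetical)
STEP = (HI - LO) / (LEVELS - 1)
SHIFT = 4                           # coarse pass looks only at the top 4 bits of a code
TOP_K = 2                           # assumed number of states kept after the first pass

def encode(params):
    return bytes(round((min(max(x, LO), HI) - LO) / STEP) for x in params)

def decode(codes):
    return [LO + c * STEP for c in codes]

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical acoustic-model states: name -> (means, variances).
STATES = {
    "a1": ([0.0, 0.0], [1.0, 1.0]),
    "a2": ([2.0, -1.0], [1.0, 2.0]),
    "b1": ([-4.0, 3.0], [0.5, 0.5]),
    "b2": ([6.0, 6.0], [1.0, 1.0]),
}

def build_coarse_table(states):
    """Precompute, per state and dimension, a log-likelihood for each coarse bin."""
    table = {}
    for name, (means, variances) in states.items():
        rows = []
        for mean, var in zip(means, variances):
            row = []
            for b in range(LEVELS >> SHIFT):
                centre = LO + ((b << SHIFT) + (1 << (SHIFT - 1))) * STEP
                row.append(log_gauss(centre, mean, var))
            rows.append(row)
        table[name] = rows
    return table

COARSE = build_coarse_table(STATES)

def first_likelihood(codes):
    """First pass: table lookups on the encoded codes, with no decoding."""
    return {name: sum(rows[d][c >> SHIFT] for d, c in enumerate(codes))
            for name, rows in COARSE.items()}

def second_likelihood(codes, selected):
    """Second pass: exact log-likelihoods from decoded values, selected states only."""
    x = decode(codes)
    result = {}
    for name in selected:
        means, variances = STATES[name]
        result[name] = sum(log_gauss(xi, m, v) for xi, m, v in zip(x, means, variances))
    return result

codes = encode([2.1, -0.8])                      # what the server would receive
first = first_likelihood(codes)
selected = sorted(first, key=first.get, reverse=True)[:TOP_K]
second = second_likelihood(codes, selected)
best = max(second, key=second.get)               # stand-in for the recognition step
```

In this sketch the first pass costs one table lookup per state and dimension, and the more expensive exact computation runs only for the `TOP_K` states that survive selection.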

12. A speech processing apparatus comprising:
reception means for receiving compression-encoded speech information from a client via a network;
first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received by said reception means;
selection means for selecting states of acoustic models using the first likelihood;
decoding means for decoding the compression-encoded speech information received by said reception means;
second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech information;
speech recognition means for accomplishing speech recognition using the second likelihood obtained by said second computation means;
execution means for executing a predetermined application program based on a recognition result of said speech recognition means; and
transmission means for transmitting, to said client, a result obtained from the predetermined application program executed by said execution means.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21

22. A speech processing method in which speech information is input at a client side, and speech recognition is performed at a server side,
said method comprising:
an encoding step of compression-encoding the speech information;
a transmission step of transmitting the compression-encoded speech information;
a reception step of receiving the compression-encoded speech information;
a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
a selection step of selecting states of acoustic models using the first likelihood;
a decoding step of decoding the compression-encoded speech information received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information;
a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
an execution step of executing a predetermined application program based on a recognition result of said speech recognition step; and
a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
Dependent claims: 23, 24, 25, 26

27. A speech processing method comprising:
a reception step of receiving compression-encoded speech information from a client via a network;
a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
a selection step of selecting states of acoustic models using the first likelihood;
a decoding step of decoding the compression-encoded speech information received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information;
a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
an execution step of executing a predetermined application program based on a recognition result of said speech recognition step; and
a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37
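The last two steps of the server-side method in claim 27, executing a predetermined application program based on the recognition result and returning its result to the client, can be sketched as a simple dispatch. The claim does not say which applications exist or how the reply is structured, so the application functions, the `APPLICATIONS` table, the reply dictionary, and the stand-in recognizer below are all hypothetical.

```python
def weather_app():
    # Hypothetical application program; a real one would query some service.
    return "Sunny, 22 C"

def time_app():
    # Hypothetical application program with a fixed answer for the sketch.
    return "12:00"

# Assumed mapping from a recognition result to a predetermined application.
APPLICATIONS = {"weather": weather_app, "time": time_app}

def handle_request(payload, recognize):
    """Server-side steps after reception: recognize, execute, build the reply."""
    word = recognize(payload)                 # recognition result
    app = APPLICATIONS.get(word)              # choose the predetermined application
    if app is None:
        return {"status": "no_match", "recognized": word}
    return {"status": "ok", "recognized": word, "result": app()}

def fake_recognizer(payload):
    # Stand-in for the likelihood computation and recognition steps of the claim.
    return {b"\x01": "weather", b"\x02": "time"}.get(payload, "unknown")

reply = handle_request(b"\x01", fake_recognizer)    # reply to send back to the client
miss = handle_request(b"\x09", fake_recognizer)     # payload with no matching application
```

Passing the recognizer in as an argument keeps the dispatch logic independent of how recognition is actually performed.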

38. A speech processing program in which speech information is input at a client side, and speech recognition is performed at a server side,
said program comprising:
an encoding step of compression-encoding the speech information;
a transmission step of transmitting the compression-encoded speech information;
a reception step of receiving the compression-encoded speech information;
a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
a selection step of selecting states of acoustic models using the first likelihood;
a decoding step of decoding the compression-encoded speech information received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information;
a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
an execution step of executing a predetermined application program based on a recognition result of said speech recognition step; and
a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
Dependent claims: 39, 40

41. A speech processing program comprising:
a reception step of receiving compression-encoded speech parameters from a client via a network;
a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
a selection step of selecting states of acoustic models using the first likelihood;
a decoding step of decoding the compression-encoded speech parameters received in said reception step;
a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters;
a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
an execution step of executing a predetermined application program based on a recognition result of said speech recognition step; and
a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
Dependent claims: 42, 43

44. A speech processing system in which speech information is input at a client side, and speech recognition is performed at a server side,
said client comprising:
an acoustic analysis unit adapted to generate speech parameters by acoustically analyzing speech information;
an encoding unit adapted to compression-encode the speech parameters; and
a transmission unit adapted to transmit the compression-encoded speech parameters, and
said server comprising:
a reception unit adapted to receive the compression-encoded speech parameters;
a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
a selection unit adapted to select states of acoustic models using the first likelihood;
a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters;
a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit;
an execution unit adapted to execute a predetermined application program based on a recognition result of said speech recognition unit; and
a transmission unit adapted to transmit, to said client, a result obtained from the predetermined application program executed by said execution unit.
Dependent claims: 45, 46
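Claim 44 adds a client-side acoustic analysis unit that turns speech information into speech parameters before they are compression-encoded. A minimal sketch of that client path follows. The claim does not say which parameters are computed, so per-frame log energy is used here purely as an assumption to keep the example short; the frame length, the quantizer range, and the function names are likewise hypothetical.

```python
import math

FRAME_LEN = 4                 # samples per frame (hypothetical; kept tiny for the sketch)
E_LO, E_HI = -10.0, 10.0      # assumed range for the quantizer
LEVELS = 256                  # one byte per parameter

def log_energy_frames(samples, frame_len=FRAME_LEN):
    """Acoustic analysis: one log-energy value per non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))   # small floor avoids log(0)
    return feats

def encode(params, lo=E_LO, hi=E_HI, levels=LEVELS):
    """Compression-encode each parameter as an 8-bit code."""
    step = (hi - lo) / (levels - 1)
    return bytes(round((min(max(x, lo), hi) - lo) / step) for x in params)

# Made-up samples: a silent frame, a quiet frame, and a louder frame.
samples = [0.0, 0.0, 0.0, 0.0,
           0.5, -0.5, 0.5, -0.5,
           1.0, -1.0, 1.0, -1.0]

params = log_energy_frames(samples)   # speech parameters, one per frame
payload = encode(params)              # bytes the client would transmit
```

Only the encoded parameters leave the client in this sketch; the raw samples are never transmitted.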

47. A speech processing apparatus comprising:
a reception unit adapted to receive compression-encoded speech parameters from a client via a network;
a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
a selection unit adapted to select states of acoustic models using the first likelihood;
a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters;
a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit;
an execution unit adapted to execute a predetermined application program based on a recognition result of said speech recognition unit; and
a transmission unit adapted to transmit, to said client, a result obtained from the predetermined application program executed by said execution unit.
Dependent claims: 48, 49

Specification