Client-server speech processing system, apparatus, method, and storage medium

US 7,058,580 B2
Filed: 10/04/2004
Issued: 06/06/2006
Est. Priority Date: 05/24/2000
Status: Expired due to Fees

Abstract

The system implements high-accuracy speech recognition while suppressing the amount of data transferred between the client and the server. To this end, a speech processing unit in the client compression-encodes speech parameters and sends them to the server. The server receives the compression-encoded speech parameters, performs speech recognition on them in its own speech processing unit, and sends information corresponding to the speech recognition result to the client.

49 Claims

1. A speech processing system in which speech information is input at a client side, and speech recognition is performed at a server side, said client comprising:
- encoding means for compression-encoding the speech information; and
  
  transmission means for transmitting the compressed-encoded speech information, and said server comprising;
  
  reception means for receiving the compression-encoded speech information;
  
  first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received by said reception means;
  
  selection means for selecting states of acoustic models using the first likelihood;
  
  decoding means for decoding the compression-encoded speech information received by said reception means;
  
  second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech information;
  
  speech recognition means for accomplishing speech recognition using the second likelihood obtained by said second computation means;
  
  execution means for executing a predetermined application program based on a recognition result of said speech recognition means; and
  
  transmission means for transmitting, to said client, a result obtained from the predetermined application program executed by said execution means.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. The system according to claim 1, wherein said selection means selects states of acoustic models having output probabilities larger than a predetermined value.
  - 3. The system according to claim 1, wherein said selection means selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation means.
  - 4. The system according to claim 1, wherein said server further comprises transmission means for transmitting a recognition result of said speech recognition means to said client.
  - 5. The system according to claim 1, wherein said client further comprises acoustic analysis means for generating speech parameters by acoustically analyzing the speech information, wherein said encoding means compression-encodes the speech parameters, and said transmission means transmits the compression-encoded speech parameters.
  - 6. The system according to claim 5, wherein said client further comprises reception means for receiving a speech recognition result of said server using the speech parameters.
  - 7. The system according to claim 5, wherein said encoding means scalar-quantizes the speech parameters.
  - 8. The system according to claim 5, wherein the speech parameters include parameters indicating static and dynamic features.
  - 9. The system according to claim 5, wherein the speech parameters include parameters indicating static features.
  - 10. The system according to claim 9, wherein said reception means receives the speech parameters, said decoding means decodes the speech parameters, and said server further comprises feature parameter generation means for generating parameters indicating dynamic features using the speech parameters decoded by said decoding means.
  - 11. The system according to claim 10, wherein said server further comprises feature parameter encoding means for compression-encoding the parameters generated by said feature parameter generation means using an encoding method used to compression-encode the speech parameters received by said reception means.
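
The two-stage scoring recited in claims 1 to 3 can be illustrated with a small sketch. It is not the patented implementation: the quantizer levels, the toy Gaussian-mixture states, the use of log-probabilities, and every function name are assumptions made for illustration. The first pass scores all states directly from the received codes by table lookup, using only each state's heaviest mixture component as a cheap approximation. Selection follows either a fixed threshold (claim 2) or a range below the best score (claim 3). The second pass scores only the selected states with the full mixture on the decoded values.

```python
import math

# Toy scalar quantizer: code c decodes to LEVELS[c]. Illustrative values only.
LEVELS = [-1.0, 0.0, 1.0]

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def build_tables(states):
    """Coarse model: tabulate each state's heaviest component for every code."""
    tables = {}
    for name, components in states.items():
        _, means, variances = max(components, key=lambda comp: comp[0])
        tables[name] = [
            [log_gauss(level, m, v) for level in LEVELS]
            for m, v in zip(means, variances)
        ]
    return tables

def first_likelihood(tables, codes):
    """First pass: score every state straight from the codes by table lookup."""
    return {
        name: sum(table[dim][code] for dim, code in enumerate(codes))
        for name, table in tables.items()
    }

def select_above(scores, threshold):
    """Keep states whose first likelihood exceeds a fixed value (cf. claim 2)."""
    return {name for name, score in scores.items() if score > threshold}

def select_within(scores, width):
    """Keep states within `width` of the best first likelihood (cf. claim 3)."""
    best = max(scores.values())
    return {name for name, score in scores.items() if score >= best - width}

def decode(codes):
    """Decoding step: map codes back to parameter values."""
    return [LEVELS[code] for code in codes]

def second_likelihood(states, selected, params):
    """Second pass: full mixture score, computed only for the selected states."""
    result = {}
    for name in selected:
        terms = [
            math.log(weight)
            + sum(log_gauss(x, m, v) for x, m, v in zip(params, means, variances))
            for weight, means, variances in states[name]
        ]
        peak = max(terms)  # log-sum-exp for numerical stability
        result[name] = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return result

# Hypothetical states: each is a list of (weight, means, variances) components.
STATES = {
    "a": [(0.7, [1.0, 1.0], [0.5, 0.5]), (0.3, [0.0, 1.0], [0.5, 0.5])],
    "b": [(0.6, [0.0, 0.0], [0.5, 0.5]), (0.4, [1.0, 0.0], [0.5, 0.5])],
    "c": [(1.0, [-1.0, -1.0], [0.5, 0.5])],
}

codes = [2, 2]  # one received frame with two quantized dimensions
coarse = first_likelihood(build_tables(STATES), codes)
kept = select_within(coarse, width=3.0)
fine = second_likelihood(STATES, kept, decode(codes))
best_state = max(fine, key=fine.get)
```

In this toy run, state "c" falls outside the range below the best first likelihood, so the more expensive second pass is never evaluated for it.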

12. A speech processing apparatus comprising:
- reception means for receiving compression-encoded speech information from a client via a network;
  
  first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received by said reception means;
  
  selection means for selecting states of acoustic models using the first likelihood;
  
  decoding means for decoding the compression-encoded speech information received by said reception means;
  
  second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech information;
  
  speech recognition means for accomplishing speech recognition using the second likelihood obtained by said second computation means;
  
  execution means for executing a predetermined application program based on a recognition result of said speech recognition means; and
  
  transmission means for transmitting, to said client, a result obtained from the predetermined application program executed by said execution means.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
- - 13. The apparatus according to claim 12, wherein said selection means selects states of acoustic models having output probabilities larger than a predetermined value.
  - 14. The apparatus according to claim 12, wherein said selection means selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation means.
  - 15. The apparatus according to claim 12, wherein said reception means receives compression-encoded speech parameters from a client connected to a network.
  - 16. The apparatus according to claim 15, further comprising transmission means for transmitting a recognition result of said speech recognition means to the client.
  - 17. The apparatus according to claim 12, wherein said reception means receives scalar-quantized speech parameters.
  - 18. The apparatus according to claim 17, wherein the speech parameters include parameters indicating static and dynamic features.
  - 19. The apparatus according to claim 17, wherein the speech parameters include parameters indicating static features.
  - 20. The apparatus according to claim 19, wherein said reception means receives the speech parameters, said decoding means decodes the speech parameters, and said apparatus further comprises feature parameter generation means for generating parameters indicating dynamic features using the speech parameters decoded by said decoding means.
  - 21. The apparatus according to claim 20, further comprising feature parameter encoding means for compression-encoding the parameters generated by said feature parameter generation means using an encoding method used to compression-encode the speech parameters received by said reception means.
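
Claims 7, 17 and 33 refer to scalar-quantized speech parameters without fixing the quantizer. The sketch below assumes a uniform quantizer; the value range, the bit depth, and the function names are hypothetical choices, not taken from the patent. It shows how each parameter becomes a small integer code that fits in one byte for transmission, and how the receiving side maps codes back to approximate values.

```python
def make_quantizer(low, high, bits):
    """Uniform scalar quantizer over [low, high] with 2**bits levels (assumed design)."""
    levels = 2 ** bits
    step = (high - low) / (levels - 1)

    def encode(value):
        # Clip to the representable range, then round to the nearest level index.
        clipped = min(max(value, low), high)
        return round((clipped - low) / step)

    def decode(code):
        # Map a level index back to its reconstruction value.
        return low + code * step

    return encode, decode

encode, decode = make_quantizer(low=-8.0, high=8.0, bits=4)

frame = [0.3, -2.7, 7.9, -12.0]       # one frame of speech parameters (made up)
codes = [encode(x) for x in frame]    # client side: compression-encode
payload = bytes(codes)                # one byte per parameter on the wire
restored = [decode(c) for c in payload]  # server side: decode
```

Values inside the range are recovered to within half a quantization step, while out-of-range values such as -12.0 are clipped to the nearest end of the range.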

22. A speech processing method in which speech information is input at a client side, and speech recognition is performed at a server side, said method comprising:
- an encoding step of compression-encoding the speech information;
  
  a transmission step of transmitting the compressed-encoded speech information;
  
  a reception step of receiving the compression-encoded speech information;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech information received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information;
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
  
  an execution step of executing a predetermined application program based on a recognition result of said speech recognition step;
  
  a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
- Dependent claims (23, 24, 25, 26)
- - 23. The method according to claim 22, wherein said method further comprises an acoustic analysis step of generating speech parameters by acoustically analyzing the speech information, wherein the speech parameters are compression-encoded in said encoding step, and the compression-encoded speech parameters are transmitted in said transmission step.
  - 24. The method according to claim 22, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 25. The method according to claim 22, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.
  - 26. A storage medium that stores a control program for making a computer implement the method recited in claim 22.
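
The order of steps in claim 22 (transmit, receive, recognize, execute an application, transmit its result back) can be sketched as a single in-process round trip. Everything here is a stand-in: the recognizer is a stub with an arbitrary rule, the application table and its entries are hypothetical, and a plain function call replaces the network.

```python
def recognize(payload):
    """Stub for the recognition step; NOT a real recognizer (hypothetical rule)."""
    return "weather" if payload and payload[0] >= 128 else "time"

# Hypothetical "predetermined application programs", keyed by recognition result.
APPLICATIONS = {
    "weather": lambda: "weather application result (placeholder)",
    "time": lambda: "time application result (placeholder)",
}

def serve(payload):
    """Server side: receive, recognize, execute, and return what is sent back."""
    word = recognize(payload)
    return {"recognized": word, "result": APPLICATIONS[word]()}

def client_request(codes, send=serve):
    """Client side: transmit encoded codes and receive the server's reply."""
    return send(bytes(codes))

reply = client_request([200, 17, 3])
```

The reply carries both the recognition result and the application output, so the client never needs to run recognition itself.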

27. A speech processing method comprising:
- a reception step of receiving compression-encoded speech information from a client via a network;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech information received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information;
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
  
  an execution step of executing a predetermined application program based on a recognition result of said speech recognition step; and
  
  a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
- Dependent claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
- - 28. The method according to claim 27, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 29. The method according to claim 27, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.
  - 30. The method according to claim 27, wherein said reception step includes a step of receiving compression-encoded speech parameters from a client connected to a network.
  - 31. The method according to claim 27, further comprising a transmission step of transmitting a recognition result, obtained in said speech recognition step, to the client.
  - 32. A storage medium that stores a control program for making a computer implement the method recited in claim 27.
  - 33. The method according to claim 27, wherein the reception step includes a step of receiving scalar-quantized speech parameters.
  - 34. The method according to claim 33, wherein the speech parameters include parameters indicating static and dynamic features.
  - 35. The method according to claim 33, wherein the speech parameters include parameters indicating static features.
  - 36. The method according to claim 35, wherein the speech parameters are decoded in said decoding step and the method further comprises a feature parameter generation step of generating parameters indicating dynamic features using the speech parameters decoded in said decoding step.
  - 37. The method according to claim 36, further comprising a feature parameter encoding step of compression-encoding the parameters generated in said feature parameter generation step, using an encoding method used to compression-encode the speech parameters received in said reception step.
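
Claims 35 and 36 describe a server that receives only static features and generates the dynamic features itself after decoding. The claims do not state a formula, so the sketch below assumes the regression-style delta commonly used in speech front ends, with edge frames held constant; the window size and function name are illustrative.

```python
def delta_features(frames, window=2):
    """Regression-based deltas over +/- `window` frames, edges held constant."""
    norm = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for dim in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][dim]   # clamp at the final frame
                behind = frames[max(t - n, 0)][dim]     # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / norm)
        deltas.append(row)
    return deltas

# Decoded static parameters for five frames (made-up values).
static = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
dynamic = delta_features(static)

# Static and dynamic features side by side, one combined vector per frame.
combined = [s + d for s, d in zip(static, dynamic)]
```

Because the dynamic features are derived on the server, the client only has to send the static ones, which keeps the transmitted payload smaller.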

38. A speech processing program in which speech information is inputted at a client side, and speech recognition is performed at a server side, said program comprising:
- an encoding step of compression-encoding the speech information;
  
  a transmission step of transmitting the compression-encoded speech information,
  
  a reception step of receiving the compression-encoded speech information;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech information received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information;
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
  
  an execution step of executing a predetermined application program based on a recognition result of said speech recognition step; and
  
  a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
- Dependent claims (39, 40)
- - 39. The program according to claim 38, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 40. The program according to claim 38, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.

41. A speech processing program comprising:
- a reception step of receiving compression-encoded speech parameters from a client via a network;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech parameters received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters;
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step;
  
  an execution step of executing a predetermined application program based on a recognition result of said speech recognition step; and
  
  a transmission step of transmitting, to said client, a result obtained from the predetermined application program executed in said execution step.
- Dependent claims (42, 43)
- - 42. The program according to claim 41, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 43. The program according to claim 41, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.

44. A speech processing system in which speech information is input at a client side, and speech recognition is performed at a server side, said client comprising:
- an acoustic analysis unit adapted to generate speech parameters by acoustically analyzing speech information;
  
  an encoding unit adapted to compression-encode the speech parameters; and
  
  a transmission unit adapted to transmit the compression-encoded speech parameters, and said server comprising;
  
  a reception unit adapted to receive the compression-encoded speech parameters;
  
  a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
  
  a selection unit adapted to select states of acoustic models using the first likelihood;
  
  a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
  
  a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters;
  
  a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit;
  
  an execution unit adapted to execute a predetermined application program based on a recognition result of said speech recognition unit; and
  
  a transmission unit adapted to transmit, to said client, a result obtained from the predetermined application program executed by said execution unit.
- Dependent claims (45, 46)
- - 45. The system according to claim 44, wherein said selection unit selects states of acoustic models having output probabilities larger than a predetermined value.
  - 46. The system according to claim 44, wherein said selection unit selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation unit.

47. A speech processing apparatus comprising:
- a reception unit adapted to receive compression-encoded speech parameters from a client via a network;
  
  a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
  
  a selection unit adapted to select states of acoustic models using the first likelihood;
  
  a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
  
  a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters;
  
  a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit;
  
  an execution unit adapted to execute a predetermined application program based on a recognition result of said speech recognition unit; and
  
  a transmission unit adapted to transmit, to said client, a result obtained from the predetermined application program executed by said execution unit.
- Dependent claims (48, 49)
- - 48. The apparatus according to claim 47, wherein said selection unit selects states of acoustic models having output probabilities larger than a predetermined value.
  - 49. The apparatus according to claim 47, wherein said selection unit selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation unit.

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Yamada, Masayuki; Ueyama, Teruhiko; Komori, Yasuhiro; Kushida, Akihiro; Kosaka, Tetsuo
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US10/956,130
Publication Number: US 20050043946A1
Time in Patent Office: 610 Days
Field of Search: 704/270.1, 704/251
US Class Current: 704/270.1
CPC Class Codes: G10L 15/30 Distributed recognition, e....
