Client-server speech processing system, apparatus, method, and storage medium

US 6,813,606 B2
Filed: 12/20/2000
Issued: 11/02/2004
Est. Priority Date: 05/24/2000
Status: Expired due to Fees

Abstract

The system implements high-accuracy speech recognition while suppressing the amount of data transferred between the client and the server. For this purpose, a speech processing unit in the client compression-encodes speech parameters and sends the compression-encoded speech parameters to the server. The server receives the compression-encoded speech parameters, its speech processing unit performs speech recognition on them, and the server sends information corresponding to the speech recognition result to the client.
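
To make the data-transfer point concrete, the following is a minimal sketch of a client-side compression step. The abstract does not specify the codec; uniform scalar quantization is used here only because the dependent claims name scalar quantization. The function names, the 4-bit code width, and the value range `lo`/`hi` are all assumptions made for illustration.

```python
from typing import List, Sequence

BITS = 4                      # assumed code width; the patent text gives no figure
LEVELS = (1 << BITS) - 1      # highest code value (15 for 4 bits)


def quantize(params: Sequence[float], lo: float, hi: float) -> List[int]:
    """Uniform scalar quantization: map each parameter to an integer code."""
    codes = []
    for x in params:
        clipped = min(max(x, lo), hi)          # keep the value inside [lo, hi]
        codes.append(round((clipped - lo) / (hi - lo) * LEVELS))
    return codes


def dequantize(codes: Sequence[int], lo: float, hi: float) -> List[float]:
    """Inverse mapping: each code becomes its reconstruction value."""
    return [lo + c / LEVELS * (hi - lo) for c in codes]


def pack_nibbles(codes: Sequence[int]) -> bytes:
    """Pack 4-bit codes two per byte (an odd count is padded with a zero code)."""
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


def unpack_nibbles(payload: bytes, count: int) -> List[int]:
    """Recover the first `count` 4-bit codes from a packed payload."""
    codes: List[int] = []
    for byte in payload:
        codes.extend((byte >> 4, byte & 0x0F))
    return codes[:count]
```

Under these assumptions, five parameters occupy three bytes on the wire, compared with twenty bytes if they were sent as 32-bit floats, at the cost of a reconstruction error of at most half a quantization step per parameter.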

53 Citations

47 Claims

1. A speech processing system in which speech information is input at a client side, and speech recognition is done at a server side, said client comprising:
- acoustic analysis means for generating speech parameters by acoustically analyzing speech information;
  
  encoding means for compression-encoding the speech parameters; and
  
  transmission means for transmitting the compression-encoded speech parameters, and said server comprising:
  
  reception means for receiving the compression-encoded speech parameters;
  
  first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception means;
  
  selection means for selecting states of acoustic models using only the first likelihood;
  
  decoding means for decoding the compression-encoded speech parameters received by said reception means;
  
  second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech parameters; and
  
  speech recognition means for making speech recognition using the second likelihood obtained by said second computation means.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The system according to claim 1, wherein said encoding means scalar-quantizes the speech parameters.
  - 3. The system according to claim 1, wherein the speech parameters include parameters indicating static and dynamic features.
  - 4. The system according to claim 1, wherein said server further comprises transmission means for transmitting a recognition result of said speech recognition means to said client.
  - 5. The system according to claim 1, wherein said client further comprises reception means for receiving a speech recognition result of said server using the speech parameters.
  - 6. The system according to claim 1, wherein said selection means selects acoustic models having output probabilities larger than a predetermined value.
  - 7. The system according to claim 1, wherein said selection means selects acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation means.
  - 8. The system according to claim 1, wherein the speech parameters include parameters indicating static features.
  - 9. The system according to claim 8, wherein said server further comprises feature parameter generation means for generating parameters indicating dynamic features using the speech parameters decoded by said decoding means.
  - 10. The system according to claim 9, wherein said server further comprises feature parameter encoding means for compression-encoding the parameters generated by said feature parameter generation means using an encoding method that compression-encodes the speech parameters received by said reception means.
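
The server-side flow in claim 1 (first likelihood from the encoded parameters, state selection, decoding, second likelihood for the selected states only) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not say how the first likelihood is obtained from the codes, so the sketch assumes a lookup table built from one coarse diagonal Gaussian per state, and assumes the second likelihood comes from a more detailed Gaussian mixture evaluated on the decoded values. Selection uses the fixed-threshold rule of claim 6. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]   # (means, variances), diagonal covariance
Mixture = List[Tuple[float, Gaussian]]       # (weight, component) pairs


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def decode(codes: Sequence[int], lo: float, hi: float, levels: int) -> List[float]:
    """Map scalar-quantization codes back to reconstruction values."""
    step = (hi - lo) / (levels - 1)
    return [lo + c * step for c in codes]


def build_tables(coarse: Dict[str, Gaussian], lo: float, hi: float, levels: int):
    """Precompute, per state and dimension, the log-density at every code."""
    points = decode(range(levels), lo, hi, levels)
    return {
        state: [[log_gauss(p, m, v) for p in points] for m, v in zip(means, variances)]
        for state, (means, variances) in coarse.items()
    }


def first_likelihood(codes: Sequence[int], tables) -> Dict[str, float]:
    """First pass: table lookups indexed directly by the received codes."""
    return {
        state: sum(dim_table[c] for dim_table, c in zip(table, codes))
        for state, table in tables.items()
    }


def select_states(first: Dict[str, float], threshold: float) -> List[str]:
    """Keep states whose first log-likelihood exceeds a fixed threshold."""
    return [state for state, ll in first.items() if ll > threshold]


def second_likelihood(x: Sequence[float], mixtures: Dict[str, Mixture],
                      selected: Sequence[str]) -> Dict[str, float]:
    """Second pass: mixture log-likelihood on decoded values, selected states only."""
    out = {}
    for state in selected:
        comps = [
            math.log(w) + sum(log_gauss(xi, m, v) for xi, m, v in zip(x, means, variances))
            for w, (means, variances) in mixtures[state]
        ]
        peak = max(comps)   # log-sum-exp for numerical stability
        out[state] = peak + math.log(sum(math.exp(c - peak) for c in comps))
    return out


def score_frame(codes, tables, mixtures, threshold, lo, hi, levels):
    """Run both passes for one frame and return the second likelihoods."""
    selected = select_states(first_likelihood(codes, tables), threshold)
    return second_likelihood(decode(codes, lo, hi, levels), mixtures, selected)
```

The sketch stops at per-frame state scores. A real recognizer would feed the second likelihoods into a search over the acoustic models, which is outside what this example tries to show.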

11. A speech processing apparatus comprising:
- reception means for receiving compression-encoded speech parameters from a client via a network;
  
  first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception means;
  
  selection means for selecting states of acoustic models using only the first likelihood;
  
  decoding means for decoding the compression-encoded speech parameters received by said reception means;
  
  second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech parameters; and
  
  speech recognition means for making speech recognition using the second likelihood obtained by said second computation means.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
- - 12. The apparatus according to claim 11, wherein said reception means receives scalar-quantized speech parameters.
  - 13. The apparatus according to claim 11, wherein the speech parameters include parameters indicating static and dynamic features.
  - 14. The apparatus according to claim 11, wherein said selection means selects acoustic models having output probabilities larger than a predetermined value.
  - 15. The apparatus according to claim 11, wherein said selection means selects acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation means.
  - 16. The apparatus according to claim 11, wherein said reception means receives the compression-encoded speech parameters from a client connected to a network.
  - 17. The apparatus according to claim 16, further comprising transmission means for transmitting a recognition result of said speech recognition means to the client.
  - 18. The apparatus according to claim 11, wherein the speech parameters include parameters indicating static features.
  - 19. The apparatus according to claim 18, further comprising feature parameter generation means for generating parameters indicating dynamic features using the speech parameters decoded by said decoding means.
  - 20. The apparatus according to claim 19, further comprising feature parameter encoding means for compression-encoding the parameters generated by said dynamic speech parameter generation means using an encoding method that compression-encodes the speech parameters received by said reception means.
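
Claims 18 and 19 describe a variant in which only static features are received and the dynamic features are generated on the server from the decoded parameters. The claims do not state how the dynamic features are computed; the sketch below assumes the commonly used regression-based delta formula over a window of neighbouring frames, with edge frames replicated. The function name and the default window size are illustrative choices.

```python
from typing import List, Sequence


def delta_features(frames: Sequence[Sequence[float]], window: int = 2) -> List[List[float]]:
    """Regression-based deltas over decoded static features.

    delta[t] = sum_{n=1..window} n * (c[t+n] - c[t-n]) / (2 * sum_{n} n^2),
    where indices past either end are clamped to the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][d]    # clamp at the final frame
                behind = frames[max(t - n, 0)][d]      # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas
```

Computing the deltas on the server means the client transmits only static features, which is consistent with the goal of limiting client-to-server traffic.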

21. A speech processing method in which speech information is input at a client side, and speech recognition is done at a server side, comprising at the client side:
- an acoustic analysis step of generating speech parameters by acoustically analyzing speech information;
  
  an encoding step of compression-encoding the speech parameters; and
  
  a transmission step of transmitting the compression-encoded speech parameters, and comprising at the server side:
  
  a reception step of receiving the compression-encoded speech parameters;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
  
  a selection step of selecting states of acoustic models using only the first likelihood;
  
  a decoding step of decoding the compression-encoded speech parameters received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
- View Dependent Claims (22, 23, 24)
- - 22. A storage medium that stores a control program for making a computer implement the method recited in claim 21.
  - 23. The method according to claim 21, wherein in said selection step, acoustic models having output probabilities larger than a predetermined value are selected.
  - 24. The method according to claim 21, wherein in said selection step, acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.
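
Claims 23 and 24 give two selection rules: keep models whose output probability exceeds a predetermined value, or keep models whose output probability lies within a predetermined range below the largest value from the first computation step. A minimal sketch of the difference follows. Working with log-likelihoods, and the names `select_above` and `select_within_range`, are assumptions made for this example.

```python
from typing import Dict, List


def select_above(first: Dict[str, float], threshold: float) -> List[str]:
    """Claim 23 style rule: keep entries scoring above a fixed value."""
    return [name for name, ll in first.items() if ll > threshold]


def select_within_range(first: Dict[str, float], width: float) -> List[str]:
    """Claim 24 style rule: keep entries within `width` of the best score."""
    if not first:
        return []
    best = max(first.values())
    return [name for name, ll in first.items() if ll >= best - width]
```

One practical difference is that the range rule is relative to the best score in the frame, so it always keeps at least the best candidate, whereas a fixed threshold can return nothing when every score in a frame is low.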

25. A speech processing method comprising:
- a reception step of receiving compression-encoded speech parameters from a client via a network;
  
  a first computation step for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
  
  a selection step of selecting states of acoustic models using only the first likelihood;
  
  a decoding step of decoding the compression-encoded speech parameters received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
- View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35)
- - 26. The method according to claim 25, wherein said reception step includes a step of receiving the compression-encoded speech parameters from a client connected to a network.
  - 27. The method according to claim 25, further comprising a transmission step of transmitting a recognition result in said speech recognition step to the client.
  - 28. A storage medium that stores a control program for making a computer implement the method recited in claim 25.
  - 29. The method according to claim 25, wherein in said selection step, acoustic models having output probabilities larger than a predetermined value are selected.
  - 30. The method according to claim 25, wherein in said selection step, acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.
  - 31. The method according to claim 25, wherein the reception step includes the step of receiving scalar-quantized speech parameters.
  - 32. The method according to claim 25, wherein the speech parameters include parameters indicating static and dynamic features.
  - 33. The method according to claim 25, wherein the speech parameters include parameters indicating static features.
  - 34. The method according to claim 33, further comprising a feature parameter generation step of generating parameters indicating dynamic features using the speech parameters decoded in said decoding step.
  - 35. The method according to claim 34, further comprising a feature parameter encoding step of compression-encoding the parameters, which are generated in said dynamic speech parameter generation step and indicate dynamic features, using an encoding method that compression-encodes the speech parameters received in said reception step.

36. A speech processing program in which speech information is input at a client side, and speech recognition is done at a server side, said program implementing, at the client side:
- an acoustic analysis step of generating speech parameters by acoustically analyzing speech information;
  
  an encoding step of compression-encoding the speech parameters; and
  
  a transmission step of transmitting the compression-encoded speech parameters, and at the server side:
  
  a reception step of receiving compression-encoded speech parameters;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
  
  a selection step of selecting states of acoustic models using only the first likelihood;
  
  a decoding step of decoding the compression-encoded speech parameters received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
- View Dependent Claims (37, 38)
- - 37. The program according to claim 36, wherein in said selection step, acoustic models having output probabilities larger than a predetermined value are selected.
  - 38. The program according to claim 36, wherein in said selection step, acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.

39. A speech processing program implementing:
- a reception step of receiving compression-encoded speech parameters from a client via a network;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
  
  a selection step of selecting states of acoustic models using only the first likelihood;
  
  a decoding step of decoding the compression-encoded speech parameters received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition step of making speech recognition using the second likelihood obtained in said second computation step.
- View Dependent Claims (40, 41)
- - 40. The program according to claim 39, wherein in said selection step, acoustic models having output probabilities larger than a predetermined value are selected.
  - 41. The program according to claim 39, wherein in said selection step, acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.

42. A speech processing system in which speech information is input at a client side, and speech recognition is done at a server side, said client comprising:
- an acoustic analysis unit adapted to generate speech parameters by acoustically analyzing speech information;
  
  an encoding unit adapted to compression-encode the speech parameters; and
  
  a transmission unit adapted to transmit the compression-encoded speech parameters, and said server comprising:
  
  a reception unit adapted to receive the compression-encoded speech parameters;
  
  a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
  
  a selection unit adapted to select states of acoustic models using only the first likelihood;
  
  a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
  
  a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit.
- View Dependent Claims (43, 44)
- - 43. The system according to claim 42, wherein said selection unit selects acoustic models having output probabilities larger than a predetermined value.
  - 44. The system according to claim 42, wherein said selection unit selects acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation unit.

45. A speech processing apparatus comprising:
- a reception unit adapted to receive compression-encoded speech parameters from a client via a network;
  
  a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
  
  a selection unit adapted to select states of acoustic models using only the first likelihood;
  
  a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
  
  a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said computation unit.
- View Dependent Claims (46, 47)
- - 46. The apparatus according to claim 45, wherein said selection unit selects acoustic models having output probabilities larger than a predetermined value.
  - 47. The apparatus according to claim 45, wherein said selection unit selects acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation unit.

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Yamada, Masayuki; Ueyama, Teruhiko; Komori, Yasuhiro; Kushida, Akihiro; Kosaka, Tetsuo
Primary Examiner(s): To, Doris H.
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US09/739,878
Publication Number: US 20010056346A1
Time in Patent Office: 1,413 Days
Field of Search: 704/251, 704/252, 704/270.1
US Class Current: 704/270.1
CPC Class Codes: G10L 15/30 Distributed recognition, e....
