Client-server speech processing system, apparatus, method, and storage medium

US 20050043946A1
Filed: 10/04/2004
Published: 02/24/2005
Est. Priority Date: 05/24/2000
Status: Active Grant


0 Assignments

0 Petitions

Abstract

The system achieves high-accuracy speech recognition while reducing the amount of data transferred between the client and the server. To this end, the client compression-encodes speech parameters using a speech processing unit and sends the compression-encoded speech parameters to the server. The server receives them, a speech processing unit in the server performs speech recognition on the compression-encoded speech parameters, and the server sends information corresponding to the recognition result back to the client.
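The abstract does not say how the compression-encoded parameters are framed for transmission. The Python sketch below assumes a hypothetical format (a 3-byte header giving the frame count and the number of coefficients per frame, followed by one 8-bit quantization code per coefficient) and shows client-side packing and server-side unpacking. `pack_frames`, `unpack_frames`, and the header layout are illustrative assumptions, not taken from the patent.

```python
import struct

# Hypothetical wire format (an assumption, not specified in the patent):
#   header: big-endian uint16 frame count, uint8 coefficients per frame
#   body:   one uint8 quantization code per coefficient, frame by frame
HEADER = struct.Struct(">HB")


def pack_frames(codes: list[list[int]]) -> bytes:
    """Client side: serialize per-frame quantization codes (each 0-255)."""
    dim = len(codes[0]) if codes else 0
    if any(len(frame) != dim for frame in codes):
        raise ValueError("all frames must have the same number of coefficients")
    body = bytes(c for frame in codes for c in frame)  # ValueError if a code > 255
    return HEADER.pack(len(codes), dim) + body


def unpack_frames(payload: bytes) -> list[list[int]]:
    """Server side: recover the per-frame codes from a received payload."""
    n_frames, dim = HEADER.unpack_from(payload, 0)
    body = payload[HEADER.size:]
    if len(body) != n_frames * dim:
        raise ValueError("payload length does not match the header")
    return [list(body[i * dim:(i + 1) * dim]) for i in range(n_frames)]
```

Under this assumed format, a frame of 13 coefficients costs 13 bytes instead of the 52 bytes needed for 13 uncompressed 32-bit floats, which is the kind of data-transfer saving the abstract refers to.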

179 Citations

95 Claims

1-46. (Cancelled)

47. A speech processing system in which speech information is input at a client side, and speech recognition is performed at a server side, said client comprising:
- encoding means for compression-encoding the speech information; and
  
  transmission means for transmitting the compression-encoded speech information, and said server comprising:
  
  reception means for receiving the compression-encoded speech information;
  
  first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received by said reception means;
  
  selection means for selecting states of acoustic models using the first likelihood;
  
  decoding means for decoding the compression-encoded speech information received by said reception means;
  
  second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech information; and
  
  speech recognition means for accomplishing speech recognition using the second likelihood obtained by said second computation means.
- Dependent Claims (48, 49, 50, 51, 52, 53, 54, 55, 80, 81)
- - 48. The system according to claim 47, wherein said client further comprises acoustic analysis means for generating speech parameters by acoustically analyzing the speech information, wherein said encoding means compression-encodes the speech parameters, and said transmission means transmits the compression-encoded speech parameters.
  - 49. The system according to claim 48, wherein said encoding means scalar-quantizes the speech parameters.
  - 50. The system according to claim 48, wherein the speech parameters include parameters indicating static and dynamic features.
  - 51. The system according to claim 48, wherein the speech parameters include parameters indicating static features.
  - 52. The system according to claim 51, wherein said reception means receives the speech parameters, said decoding means decodes the speech parameters, and said server further comprises feature parameter generation means for generating parameters indicating dynamic features using the speech parameters decoded by said decoding means.
  - 53. The system according to claim 52, wherein said server further comprises feature parameter encoding means for compression-encoding the parameters generated by said feature parameter generation means using an encoding method used to compression-encode the speech parameters received by said reception means.
  - 54. The system according to claim 47, wherein said server further comprises transmission means for transmitting a recognition result of said speech recognition means to said client.
  - 55. The system according to claim 48, wherein said client further comprises reception means for receiving a speech recognition result of said server using the speech parameters.
  - 80. The system according to claim 47, wherein said selection means selects states of acoustic models having output probabilities larger than a predetermined value.
  - 81. The system according to claim 47, wherein said selection means selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation means.
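Claim 47 recites a two-pass likelihood computation: a first pass over the compression-encoded parameters, a selection of states, and a second pass on the decoded parameters for the selected states only. The sketch below is one way to realize this under assumptions the claims do not state: each state is a diagonal-covariance Gaussian mixture (`State`), the encoding is per-dimension scalar quantization with known reconstruction values (`centroids`), the first pass scores each state by its best single component using precomputed lookup tables, and the second pass evaluates the full mixture. `select_states` shows both selection rules of claims 80 and 81 (a fixed threshold, or a range whose largest value is the best first-pass score) on log scores, which preserve the ordering of the output probabilities. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """Hypothetical HMM state: a mixture of diagonal-covariance Gaussians."""
    weights: list[float]
    means: list[list[float]]      # means[m][d]
    variances: list[list[float]]  # variances[m][d]


def log_gauss(x: float, mu: float, var: float) -> float:
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def build_table(state: State, centroids: list[list[float]]) -> list[list[list[float]]]:
    """table[m][d][k] = log N(centroids[d][k]; mean, var) for component m, dimension d,
    code k, so the first pass needs only lookups and additions."""
    return [[[log_gauss(c, state.means[m][d], state.variances[m][d]) for c in centroids[d]]
             for d in range(len(centroids))]
            for m in range(len(state.weights))]


def first_likelihood(state: State, table, codes: list[int]) -> float:
    """First pass on the codes: score of the best single component, from lookups."""
    return max(math.log(w) + sum(table[m][d][k] for d, k in enumerate(codes))
               for m, w in enumerate(state.weights))


def select_states(first: dict[str, float], threshold: Optional[float] = None,
                  beam: Optional[float] = None) -> set[str]:
    """Claim 80 style: keep scores above `threshold`. Claim 81 style: keep scores
    within `beam` of the best score. Log scores keep the probabilities' ordering."""
    if threshold is not None:
        return {s for s, v in first.items() if v > threshold}
    if beam is None:
        raise ValueError("give either threshold or beam")
    best = max(first.values())
    return {s for s, v in first.items() if v >= best - beam}


def second_likelihood(state: State, x: list[float]) -> float:
    """Second pass on decoded values: full mixture log-likelihood via log-sum-exp."""
    terms = [math.log(w) + sum(log_gauss(xd, state.means[m][d], state.variances[m][d])
                               for d, xd in enumerate(x))
             for m, w in enumerate(state.weights)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def two_pass(states: dict[str, State], tables: dict, centroids: list[list[float]],
             codes: list[int], beam: float) -> dict[str, float]:
    """Score every state; only the selected ones get the more expensive second pass."""
    first = {n: first_likelihood(s, tables[n], codes) for n, s in states.items()}
    chosen = select_states(first, beam=beam)
    x = [centroids[d][k] for d, k in enumerate(codes)]  # decode the codes
    # Assumption of this sketch: unselected states keep their first-pass score.
    return {n: second_likelihood(states[n], x) if n in chosen else first[n] for n in states}
```

Because the decoded values here equal the reconstruction values used to build the tables, the second pass differs from the first only in summing over all mixture components; how much the second pass gains in a real system depends on what the first pass approximates.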

56. A speech processing apparatus comprising:
- reception means for receiving compression-encoded speech information from a client via a network;
  
  first computation means for computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received by said reception means;
  
  selection means for selecting states of acoustic models using the first likelihood;
  
  decoding means for decoding the compression-encoded speech information received by said reception means;
  
  second computation means for computing output probabilities of states of acoustic models selected by said selection means, as second likelihood, using the decoded speech information; and
  
  speech recognition means for accomplishing speech recognition using the second likelihood obtained by said second computation means.
- Dependent Claims (57, 58, 59, 60, 61, 62, 63, 82, 83)
- - 57. The apparatus according to claim 56, wherein said reception means receives scalar-quantized speech parameters.
  - 58. The apparatus according to claim 57, wherein the speech parameters include parameters indicating static and dynamic features.
  - 59. The apparatus according to claim 57, wherein the speech parameters include parameters indicating static features.
  - 60. The apparatus according to claim 59, wherein said reception means receives the speech parameters, said decoding means decodes the speech parameters, and said apparatus further comprises feature parameter generation means for generating parameters indicating dynamic features using the speech parameters decoded by said decoding means.
  - 61. The apparatus according to claim 60, further comprising feature parameter encoding means for compression-encoding the parameters generated by said feature parameter generation means using an encoding method used to compression-encode the speech parameters received by said reception means.
  - 62. The apparatus according to claim 56, wherein said reception means receives compression-encoded speech parameters from a client connected to a network.
  - 63. The apparatus according to claim 62, further comprising transmission means for transmitting a recognition result of said speech recognition means to the client.
  - 82. The apparatus according to claim 56, wherein said selection means selects states of acoustic models having output probabilities larger than a predetermined value.
  - 83. The apparatus according to claim 56, wherein said selection means selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation means.
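Claims 49 and 57 name scalar quantization as the encoding of the speech parameters. A minimal sketch follows, assuming a uniform quantizer per dimension with a fixed range and bit depth; the class name, the ranges, and the 4-bit default are illustrative assumptions, and the quantizer actually used in the patent may differ.

```python
from dataclasses import dataclass


@dataclass
class UniformScalarQuantizer:
    """Hypothetical per-dimension uniform scalar quantizer; the ranges and the
    bit depth are illustrative assumptions."""
    lows: list[float]
    highs: list[float]
    bits: int = 4

    @property
    def levels(self) -> int:
        return 1 << self.bits

    def _step(self, d: int) -> float:
        return (self.highs[d] - self.lows[d]) / (self.levels - 1)

    def encode(self, frame: list[float]) -> list[int]:
        """Map each coefficient to the index of its nearest reconstruction level."""
        codes = []
        for d, x in enumerate(frame):
            k = round((x - self.lows[d]) / self._step(d))
            codes.append(min(max(k, 0), self.levels - 1))  # clamp out-of-range values
        return codes

    def decode(self, codes: list[int]) -> list[float]:
        """Return the reconstruction value for each code."""
        return [self.lows[d] + k * self._step(d) for d, k in enumerate(codes)]
```

At 4 bits per coefficient instead of a 32-bit float, each coefficient shrinks by a factor of 8; for any in-range value, the reconstruction error is at most half a quantization step.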

64. A speech processing method in which speech information is input at a client side, and speech recognition is performed at a server side, said method comprising:
- an encoding step of compression-encoding the speech information;
  
  a transmission step of transmitting the compression-encoded speech information;
  
  a reception step of receiving the compression-encoded speech information;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech information received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information; and
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step.
- Dependent Claims (65, 74, 84, 85)
- - 65. The method according to claim 64, wherein said method further comprises an acoustic analysis step of generating speech parameters by acoustically analyzing the speech information, wherein the speech parameters are compression-encoded in said encoding step, and the compression-encoded speech parameters are transmitted in said transmission step.
  - 74. A storage medium that stores a control program for making a computer implement the method recited in claim 64.
  - 84. The method according to claim 64, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 85. The method according to claim 64, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.

66. A speech processing method comprising:
- a reception step of receiving compression-encoded speech information from a client via a network;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech information received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information; and
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step.
- Dependent Claims (67, 68, 69, 70, 71, 72, 73, 75, 86, 87)
- - 67. The method according to claim 66, wherein the reception step includes a step of receiving scalar-quantized speech parameters.
  - 68. The method according to claim 67, wherein the speech parameters include parameters indicating static and dynamic features.
  - 69. The method according to claim 67, wherein the speech parameters include parameters indicating static features.
  - 70. The method according to claim 69, wherein the speech parameters are decoded in said decoding step and the method further comprises a feature parameter generation step of generating parameters indicating dynamic features using the speech parameters decoded in said decoding step.
  - 71. The method according to claim 70, further comprising a feature parameter encoding step of compression-encoding the parameters generated in said feature parameter generation step, using an encoding method used to compression-encode the speech parameters received in said reception step.
  - 72. The method according to claim 66, wherein said reception step includes a step of receiving compression-encoded speech parameters from a client connected to a network.
  - 73. The method according to claim 66, further comprising a transmission step of transmitting a recognition result, obtained in said speech recognition step, to the client.
  - 75. A storage medium that stores a control program for making a computer implement the method recited in claim 66.
  - 86. The method according to claim 66, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 87. The method according to claim 66, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.
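Claims 52, 60, and 70 let the client send only static features and have the server derive the dynamic features from the decoded static ones; claims 53, 61, and 71 then compression-encode those dynamic features with the same encoding method used for the received parameters. The sketch below assumes the common regression formula for delta coefficients, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with edge frames repeated, and a uniform scalar quantizer; the window size, ranges, and function names are hypothetical.

```python
def deltas(frames: list[list[float]], window: int = 2) -> list[list[float]]:
    """Regression deltas over +/- `window` frames; edge frames are repeated."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                later = frames[min(t + n, n_frames - 1)][d]
                earlier = frames[max(t - n, 0)][d]
                acc += n * (later - earlier)
            row.append(acc / denom)
        out.append(row)
    return out


def quantize(values: list[float], low: float, high: float, bits: int) -> list[int]:
    """Uniform scalar quantizer (assumed to match the one used for static features)."""
    top = (1 << bits) - 1
    step = (high - low) / top
    return [min(max(round((v - low) / step), 0), top) for v in values]


def server_side_features(static: list[list[float]], low: float, high: float,
                         bits: int) -> list[list[int]]:
    """Derive deltas from decoded static features and re-encode them per frame."""
    return [quantize(row, low, high, bits) for row in deltas(static)]
```

For a linear ramp of static values the interior deltas equal the slope, while the repeated edge frames pull the first and last deltas toward zero.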

76. A speech processing program in which speech information is input at a client side, and speech recognition is performed at a server side, said program comprising:
- an encoding step of compression-encoding the speech information;
  
  a transmission step of transmitting the compression-encoded speech information;
  
  a reception step of receiving the compression-encoded speech information;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech information received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech information received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech information; and
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step.
- Dependent Claims (88, 89)
- - 88. The program according to claim 76, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 89. The program according to claim 76, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.

77. A speech processing program comprising:
- a reception step of receiving compression-encoded speech parameters from a client via a network;
  
  a first computation step of computing output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received in said reception step;
  
  a selection step of selecting states of acoustic models using the first likelihood;
  
  a decoding step of decoding the compression-encoded speech parameters received in said reception step;
  
  a second computation step of computing output probabilities of states of acoustic models selected in said selection step, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition step of accomplishing speech recognition using the second likelihood obtained in said second computation step.
- Dependent Claims (90, 91)
- - 90. The program according to claim 77, wherein, in said selection step, states of acoustic models having output probabilities larger than a predetermined value are selected.
  - 91. The program according to claim 77, wherein, in said selection step, states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed in said first computation step are selected.

78. A speech processing system in which speech information is input at a client side, and speech recognition is performed at a server side, said client comprising:
- an acoustic analysis unit adapted to generate speech parameters by acoustically analyzing speech information;
  
  an encoding unit adapted to compression-encode the speech parameters; and
  
  a transmission unit adapted to transmit the compression-encoded speech parameters, and said server comprising:
  
  a reception unit adapted to receive the compression-encoded speech parameters;
  
  a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
  
  a selection unit adapted to select states of acoustic models using the first likelihood;
  
  a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
  
  a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit.
- Dependent Claims (92, 93)
- - 92. The system according to claim 78, wherein said selection unit selects states of acoustic models having output probabilities larger than a predetermined value.
  - 93. The system according to claim 78, wherein said selection unit selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation unit.

79. A speech processing apparatus comprising:
- a reception unit adapted to receive compression-encoded speech parameters from a client via a network;
  
  a first computation unit adapted to compute output probabilities of states of acoustic models, as first likelihood, using the compression-encoded speech parameters received by said reception unit;
  
  a selection unit adapted to select states of acoustic models using the first likelihood;
  
  a decoding unit adapted to decode the compression-encoded speech parameters received by said reception unit;
  
  a second computation unit adapted to compute output probabilities of states of acoustic models selected by said selection unit, as second likelihood, using the decoded speech parameters; and
  
  a speech recognition unit adapted to accomplish speech recognition using the second likelihood obtained by said second computation unit.
- Dependent Claims (94, 95)
- - 94. The apparatus according to claim 79, wherein said selection unit selects states of acoustic models having output probabilities larger than a predetermined value.
  - 95. The apparatus according to claim 79, wherein said selection unit selects states of acoustic models having output probabilities within a predetermined range of which the largest value is a largest output probability computed by said first computation unit.

Current Assignee
Canon Ayutthaya Limited (Canon Inc.)
Original Assignee
Canon Ayutthaya Limited (Canon Inc.)
Inventors
Yamada, Masayuki, Ueyama, Teruhiko, Komori, Yasuhiro, Kushida, Akihiro, Kosaka, Tetsuo

Granted Patent

US 7,058,580 B2
US Class Current

704/231
CPC Class Codes

G10L 15/30 Distributed recognition, e....
