Speech recognition system, speech recognition server, speech recognition client, their control method, and computer readable memory

US 7,099,824 B2
Filed: 11/27/2001
Issued: 08/29/2006
Est. Priority Date: 11/27/2000
Status: Expired due to Fees


Abstract

A client sends three items to a server via a communication module: a user dictionary, formed by storing the pronunciations and notations of target recognition words designated by the user in correspondence with each other; input speech recognition data; and dictionary management data used to determine the recognition field of the recognition dictionary to be used in recognizing the speech recognition data. In the server, a dictionary management unit looks up an identifier table to determine, from among a plurality of kinds of recognition dictionaries, the recognition dictionary corresponding to the dictionary management data received from the client. A speech recognition module recognizes the speech recognition data using at least the determined recognition dictionary. The recognition result is sent to the client via a communication module.
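
The identifier-table lookup described in the abstract can be illustrated with a minimal sketch. Everything named here (`IDENTIFIER_TABLE`, `RECOGNITION_DICTIONARIES`, `select_dictionaries`, and the sample words) is a hypothetical stand-in chosen for illustration; the patent does not specify a data layout.

```python
# Hypothetical identifier table: dictionary management data -> dictionary names.
IDENTIFIER_TABLE = {
    "name": ["person_names"],
    "address": ["place_names", "street_names"],
}

# Hypothetical recognition dictionaries held on the server side.
RECOGNITION_DICTIONARIES = {
    "person_names": {"tanaka", "suzuki"},
    "place_names": {"tokyo", "yokohama"},
    "street_names": {"ginza"},
}


def select_dictionaries(management_data: str) -> dict[str, set[str]]:
    """Return the recognition dictionaries registered for the given data.

    An unknown identifier yields an empty selection rather than an error;
    that fallback is an assumption of this sketch.
    """
    names = IDENTIFIER_TABLE.get(management_data, [])
    return {name: RECOGNITION_DICTIONARIES[name] for name in names}


selected = select_dictionaries("address")
print(sorted(selected))
```

Keeping the table separate from the dictionaries means one dictionary can serve several identifiers without being duplicated.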

11 Claims

1. A client-server speech recognition system comprising:
- a client comprising:
  
  display control means for controlling the display of a speech input window comprising plural input forms;
  
  determining means for determining from among the displayed plural input forms an input form to which speech information is input as a target speech input;
  
  first transmission means for transmitting input form identifying information indicating the input form determined when said determining means determines the input form;
  
  storing means for storing a user dictionary which holds target recognition words and input form identifying information in association with each other;
  
  speech receiving means for receiving the speech information inputted by a speech input module;
  
  second transmission means for transmitting the user dictionary and the speech information to the server; and
  
  inputting means for inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input by said determining means; and
  
  a server comprising:
  
  holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of the input form identifying information and each of the plurality of kinds of recognition dictionaries;
  
  first receiving means for receiving the input form identifying information;
  
  setting means for setting one or more recognition dictionaries corresponding to the received input form identifying information from said holding means by referring to the table;
  
  second receiving means for receiving the user dictionary and the speech information;
  
  speech recognition means for recognizing the speech information using (i) the target recognition words of the user dictionary associated with the input form to which speech is determined by said determining means to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries set by the setting means; and
  
  third transmission means for transmitting the speech recognition result of said speech recognition means to the client.
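
The interplay of the claimed client and server elements can be sketched as a small in-process model. This is an illustrative sketch only, not the patented implementation: the class and method names (`UserEntry`, `Server`, `Client`, `receive_form_id`, `recognize`, `speak_into`) are hypothetical, direct method calls stand in for network transmission, and the "speech information" is a plain text token matched exactly against the vocabulary in place of real acoustic recognition.

```python
from dataclasses import dataclass, field


@dataclass
class UserEntry:
    word: str     # target recognition word
    form_id: str  # input form identifying information


@dataclass
class Server:
    dictionaries: dict[str, set[str]]  # holding means: recognition dictionaries
    table: dict[str, list[str]]        # holding means: form_id -> dictionary names
    active: set[str] = field(default_factory=set)
    form_id: str = ""

    def receive_form_id(self, form_id: str) -> None:
        # First receiving means plus setting means: consult the table.
        self.form_id = form_id
        self.active = set()
        for name in self.table.get(form_id, []):
            self.active |= self.dictionaries[name]

    def recognize(self, user_dictionary: list[UserEntry], speech: str) -> str:
        # Vocabulary = (i) user words tied to this form + (ii) set dictionaries.
        user_words = {e.word for e in user_dictionary if e.form_id == self.form_id}
        vocabulary = user_words | self.active
        # Stand-in for recognition: exact match, empty string on failure.
        return speech if speech in vocabulary else ""


@dataclass
class Client:
    server: Server
    user_dictionary: list[UserEntry]
    forms: dict[str, str]  # form_id -> current field content

    def speak_into(self, form_id: str, speech: str) -> None:
        self.server.receive_form_id(form_id)                          # first transmission
        result = self.server.recognize(self.user_dictionary, speech)  # second transmission
        self.forms[form_id] = result                                  # inputting means


server = Server(dictionaries={"cities": {"tokyo"}}, table={"city": ["cities"]})
client = Client(
    server,
    [UserEntry("shimomaruko", "city"), UserEntry("kushida", "name")],
    {"name": "", "city": ""},
)
client.speak_into("city", "shimomaruko")
print(client.forms)
```

In this model a user word registered for the `name` form is not recognized while the `city` form is the target, which mirrors the claim language restricting the user-dictionary words to those associated with the identified form.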

2. In a client-server speech recognition system, wherein the server has holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of input form identifying information and each of the plurality of kinds of recognition dictionaries, an information processing apparatus acting as a client comprising:
- display control means for controlling the display of a speech input window comprising plural input forms;
  
  determining means for determining from among the displayed plural input forms an input form to which speech information is input as a target speech input;
  
  first transmission means for transmitting to the server input form identifying information indicating the input form determined when said determining means determines the input form;
  
  storing means for storing a user dictionary which holds target recognition words and input form identifying information in association with each other;
  
  speech receiving means for receiving the speech information inputted by a speech input module, using a displayed input form;
  
  second transmission means for transmitting the user dictionary and the speech information to the server; and
  
  inputting means for inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input by said determining means,
  
  wherein the server sets one or more recognition dictionaries corresponding to the received input form identifying information transmitted by said first transmission means from said holding means by referring to the table, and recognizes the received speech information transmitted by said second transmission means using (i) the target recognition words of the received user dictionary transmitted by said second transmission means associated with the input form to which the speech is determined by said determining means to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries, and transmits the speech recognition result to the client.
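
The two client-side transmissions recited above could be serialized in many ways. As one hedged illustration, the sketch below encodes them as JSON messages. The message shapes, the field names (`type`, `form_id`, `user_dictionary`, `speech`), and the helper functions are assumptions made for this example; the claims do not prescribe any wire format.

```python
import base64
import json


def form_id_message(form_id: str) -> str:
    """First transmission: identify the input form selected for speech input."""
    return json.dumps({"type": "form_id", "form_id": form_id})


def speech_message(user_dictionary: list[dict], speech: bytes) -> str:
    """Second transmission: user dictionary plus the captured speech data.

    Raw audio bytes are base64-encoded so that they survive JSON transport.
    """
    return json.dumps({
        "type": "speech",
        "user_dictionary": user_dictionary,
        "speech": base64.b64encode(speech).decode("ascii"),
    })


# Hypothetical user dictionary entry: a target word tied to one input form.
entries = [{"word": "shimomaruko", "form_id": "city"}]
first = form_id_message("city")
second = speech_message(entries, b"\x00\x01\x02")
print(first)
```

Sending the form identifier first lets the server set its recognition dictionaries before the speech data arrives, which matches the order in which the claim recites the two transmissions.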

3. In a client-server speech recognition system for recognizing, by a server, speech input at a client for inputting information to an input form, the client having display control means for controlling the display of a speech input window comprising plural input forms, determining means for determining from among the displayed plural input forms an input form to which speech information is input as a target speech input, and transmission means for transmitting input form identifying information indicating the input form determined when said determining means determines the input form, an information processing apparatus acting as the server comprising:
- holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of input form identifying information and each of the plurality of kinds of recognition dictionaries;
  
  first receiving means for receiving the input form identifying information transmitted by said transmission means;
  
  setting means for setting one or more recognition dictionaries corresponding to the received input form identifying information from said holding means by referring to the table;
  
  second receiving means for receiving from the client speech information and a user dictionary that holds target recognition words and input form identifying information in association with each other; and
  
  speech recognition means for recognizing speech information using (i) the target recognition words of the user dictionary associated with the input form to which the speech inputted by the client is inputted as identified by the received input form identifying information and (ii) one or more recognition dictionaries set by the setting means,
  
  wherein the client inputs a speech recognition result received from the server to the input form to which the speech information is determined to be input by said determining means.

4. A client-server speech recognition system comprising:
- a client comprising:
  
  display control means for controlling the displaying of a speech input window comprising plural input forms;
  
  determining means for determining from among the displayed plural input forms an input form to which speech information is input as a target speech input;
  
  first transmission means for transmitting input form identifying information indicating the input form determined when said determining means determines the input form;
  
  speech receiving means for receiving the speech information inputted by a speech input module;
  
  second transmission means for transmitting (i) a user dictionary that holds target recognition words and input form identifying information in association with each other and (ii) the speech information to the server; and
  
  inputting means for inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input by said determining means; and
  
  a server comprising:
  
  holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of the input form identifying information and each of the plurality of kinds of recognition dictionaries;
  
  first receiving means for receiving the input form identifying information;
  
  setting means for setting one or more recognition dictionaries corresponding to the received input form identifying information from said holding means by referring to the table;
  
  second receiving means for receiving the user dictionary and the speech information;
  
  speech recognition means for recognizing the speech information using (i) the target recognition words of the user dictionary associated with the input form to which speech is determined by said determining means to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries set by the setting means; and
  
  third transmission means for transmitting the speech recognition result of said speech recognition means to the client.

5. A client-server speech recognition apparatus comprising:
- a client comprising:
  
  display control means for controlling the displaying of a speech input window comprising plural input forms;
  
  determining means for determining from among the displayed plural input forms an input form to which speech information is input as a target speech input;
  
  first transmission means for transmitting input form identifying information indicating the input form determined when said determining means determines the input form;
  
  speech receiving means for receiving the speech information inputted by a speech input module;
  
  second transmission means for transmitting (i) a user dictionary that holds target recognition words and input form identifying information in association with each other and (ii) the speech information to the server; and
  
  inputting means for inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input by said determining means; and
  
  a server comprising:
  
  holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of the input form identifying information and each of the plurality of kinds of recognition dictionaries;
  
  first receiving means for receiving the input form identifying information;
  
  setting means for setting one or more recognition dictionaries corresponding to the received input form identifying information from said holding means by referring to the table;
  
  second receiving means for receiving the user dictionary and the speech information;
  
  speech recognition means for recognizing the speech information using (i) the target recognition words of the user dictionary associated with the input form to which speech is determined by said determining means to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries set by the setting means; and
  
  third transmission means for transmitting a speech recognition result of said speech recognition means to the client.

6. In a client-server speech recognition system, wherein the server has holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of input form identifying information and each of the plurality of kinds of recognition dictionaries, a method of controlling an information processing apparatus acting as a client comprising:
- a display control step of controlling the displaying of a speech input window comprising plural input forms;
  
  a determining step of determining from among the displayed plural input forms an input form to which speech information is input as a target speech;
  
  a first transmission step of transmitting to the server input form identifying information indicating the input form determined in said determining step when the input form is determined;
  
  a speech receiving step of receiving the speech information inputted by a speech input module;
  
  a second transmission step of transmitting a user dictionary which holds target recognition words and input form identifying information in association with each other, and the speech information, to the server; and
  
  an inputting step of inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input in said determining step,
  
  wherein the server sets one or more recognition dictionaries corresponding to the received input form identifying information transmitted in said first transmission step from said holding means by referring to the table, and recognizes the received speech information transmitted in said second transmission step using (i) the target recognition words of the received user dictionary transmitted in said second transmission step associated with the input form to which the speech is determined in said determining step to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries, and transmits the speech recognition result to the client.

7. In a client-server speech recognition system for recognizing, by a server, speech input at a client for inputting information to an input form, the client having display control means for controlling the display of a speech input window comprising plural input forms, determining means for determining from among the displayed plural input forms an input form to which speech information is input as a target speech input, and transmission means for transmitting input form identifying information indicating the input form determined when said determining means determines the input form, a method of controlling an information processing apparatus acting as the server comprising:
- a first receiving step of receiving input form identifying information transmitted by said transmission means;
  
  a setting step of setting one or more recognition dictionaries corresponding to the received input form identifying information from a holding means that holds a plurality of kinds of recognition dictionaries by referring to a table managing a correspondence of the input form identifying information and each of the plurality of kinds of recognition dictionaries;
  
  a second receiving step of receiving from the client a user dictionary and speech information, the user dictionary holding target recognition words and input form identifying information in association with each other; and
  
  a speech recognition step of recognizing the speech information using (i) the target recognition words of the user dictionary associated with the input form to which the speech information is input, identified by the received input form identifying information and (ii) one or more recognition dictionaries set in said setting step,
  
  wherein the client inputs a speech recognition result received from the server to the input form to which the speech information is determined to be input by said determining means.

8. In a client-server speech recognition system, wherein the server has holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of input form identifying information and each of the plurality of kinds of recognition dictionaries, a method of performing speech recognition comprising:
- at the client side:
  
  a display control step of controlling the displaying of a speech input window comprising plural input forms;
  
  a determining step of determining from among the displayed plural input forms an input form to which speech information is input as a target speech;
  
  a first transmission step of transmitting input form identifying information indicating the input form determined when said determining step determines the input form;
  
  a speech receiving step of receiving the speech information inputted by a speech input module, using an input form;
  
  a second transmission step of transmitting (i) a user dictionary that holds target recognition words and input form identifying information in association with each other and (ii) the speech information to the server; and
  
  an inputting step of inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input in said determining step; and
  
  at the server side:
  
  a first receiving step of receiving the input form identifying information;
  
  a setting step of setting one or more recognition dictionaries corresponding to the received input form identifying information from said holding means by referring to the table;
  
  a second receiving step of receiving the user dictionary and the speech information;
  
  a speech recognition step of recognizing the speech information using (i) the target recognition words of the user dictionary associated with the input form to which speech is determined in said determining step to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries set by the setting means; and
  
  a third transmission step of transmitting a speech recognition result of said speech recognition step to the client.

9. In a client-server speech recognition system, wherein the server has holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of input form identifying information and each of the plurality of kinds of recognition dictionaries, a computer readable memory that stores program code of control of an information processing apparatus acting as a client comprising:
- program code of a display control step of controlling the displaying of a speech input window comprising plural input forms;
  
  program code of a determining step of determining from among the displayed plural input forms an input form to which speech information is input as a target speech;
  
  program code of a first transmission step of transmitting to the server input form identifying information indicating the input form determined in the determining step when the input form is determined;
  
  program code of a speech receiving step of receiving the speech information inputted by a speech input module, using an input form;
  
  program code of a second transmission step of transmitting a user dictionary which holds target recognition words and input form identifying information in association with each other, and the speech information, to the server; and
  
  program code of an inputting step of inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input in the determining step,
  
  wherein the server sets one or more recognition dictionaries corresponding to the received input form identifying information transmitted in said first transmission step from said holding means by referring to the table, and recognizes the received speech information transmitted in said second transmission step using (i) the target recognition words of the received user dictionary transmitted in said second transmission step associated with the input form to which the speech is determined in said determining step to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries, and transmits the speech recognition result to the client.

10. In a client-server speech recognition system for recognizing, by a server, speech input at a client for inputting information to an input form, the client having display control means for controlling the display of a speech input window comprising plural input forms, determining means for determining from among the displayed plural input forms an input form to which speech information is input as a target speech input, and transmission means for transmitting input form identifying information indicating the input form determined when said determining means determines the input form, a computer readable memory that stores program code of control of an information processing apparatus acting as the server comprising:
- program code of a first receiving step of receiving input form identifying information transmitted by said transmission means;
  
  program code of a setting step of setting one or more recognition dictionaries corresponding to the received input form identifying information from a holding means that holds a plurality of kinds of recognition dictionaries by referring to a table managing a correspondence of the input form identifying information and each of the plurality of kinds of recognition dictionaries;
  
  program code of a second receiving step of receiving from the client a user dictionary and speech information, the user dictionary holding target recognition words and input form identifying information in association with each other; and
  
  program code of a speech recognition step of recognizing the speech information using (i) the target recognition words associated with the input form to which the speech information is input, identified by the received input form identifying information and (ii) one or more recognition dictionaries set in the setting step,
  
  wherein the client inputs a speech recognition result received from the server to the input form to which the speech information is determined to be input by said determining means.

11. In a client-server speech recognition system, wherein the server has holding means for holding a plurality of kinds of recognition dictionaries, and a table managing a correspondence of input form identifying information and each of the plurality of kinds of recognition dictionaries, a computer readable memory that stores program code for performing speech recognition, comprising:
- program code of a display control step of controlling the displaying of a speech input window comprising plural input forms;
  
  program code of a determining step of determining from among the displayed plural input forms an input form to which speech information is input as a target speech;
  
  program code of a first transmission step of transmitting to the server input form identifying information indicating the input form determined when said determining step determines the input form;
  
  program code of a speech receiving step of receiving the speech information inputted by a speech input module;
  
  program code of a second transmission step of transmitting (i) a user dictionary that holds target recognition words and input form identifying information in association with each other and (ii) the speech information to the server;
  
  program code of an inputting step of inputting a speech recognition result received from the server to the input form to which the speech information is determined to be input in said determining step;
  
  program code of a first receiving step of receiving from the client the input form identifying information;
  
  program code of a setting step of setting one or more recognition dictionaries corresponding to the received input form identifying information from said holding means by referring to the table;
  
  program code of a second receiving step of receiving from the client the user dictionary and the speech information;
  
  program code of a speech recognition step of recognizing the speech information using (i) the target recognition words of the user dictionary associated with the input form to which speech is determined in said determining step to have been input that is identified by the received input form identifying information and (ii) the one or more recognition dictionaries set in the setting step; and
  
  program code of a third transmission step of transmitting a speech recognition result of the speech recognition step to the client.

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Kushida, Akihiro; Kosaka, Tetsuo
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Vo, Huyen X.
Application Number: US09/993,570
Publication Number: US 20020065652A1
Time in Patent Office: 1,736 Days
Field of Search: 704/231, 704/270, 704/275, 704/243, 704/255, 704/235, 704/270.1, 704/246, 382/189
US Class Current: 704/231
CPC Class Codes: G10L 15/30 Distributed recognition, e....
