Speech recognition system trained with regional speech characteristics

US 20050144001A1
Filed: 01/07/2005
Published: 06/30/2005
Est. Priority Date: 11/12/1999
Status: Active Grant

Abstract

A speech recognition system uses speech recognition models which are specifically trained and optimized for users residing in a particular geographic area or region. The speech models are trained with samples of word variants expected to be used in a natural language by representative members of a population associated with the geographic region or community of users. The speech recognition system is configured to have a real-time response that imitates a dialogue with a human operator.
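The two-pass training idea in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the `VariantLexicon` class, the phoneme strings, and the sample data are all hypothetical, and a simple count table stands in for a real acoustic model such as an HMM. The point is only that a second training pass on regional samples lets the same model resolve word variants it could not resolve before.

```python
from collections import Counter, defaultdict


class VariantLexicon:
    """Toy stand-in for a speech model: counts which word each pronunciation maps to."""

    def __init__(self):
        # pronunciation string -> Counter of words observed with that pronunciation
        self.counts = defaultdict(Counter)

    def train(self, samples):
        """Accumulate (pronunciation, word) observations; may be called repeatedly."""
        for pronunciation, word in samples:
            self.counts[pronunciation][word] += 1

    def recognize(self, pronunciation):
        """Return the most frequently observed word, or None if the variant is unseen."""
        if pronunciation not in self.counts:
            return None
        return self.counts[pronunciation].most_common(1)[0][0]


# First training operation: general samples (hypothetical data).
general = [("t ah m ey t ow", "tomato"), ("w ao t er", "water")]
# Second training operation: extra samples from speakers in one region (hypothetical data).
regional = [("t ah m aa t ow", "tomato"), ("w ao d ah", "water")]

model = VariantLexicon()
model.train(general)
before = model.recognize("w ao d ah")  # regional variant not yet known
model.train(regional)
after = model.recognize("w ao d ah")   # same variant after the second pass
```

Here `before` and `after` hold the recognition result for the same regional variant before and after the second training pass.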

20 Claims

1. A method of optimizing recognition of a speech utterance from a user with a distributed speech processing system comprising the steps of:
- (a) training one or more speech recognition models for recognizing speech utterances in a first natural language in a first training operation;
  
  wherein said speech recognition models are implemented as part of a speech recognition engine executing on a network server system of the distributed speech processing system;
  
  (b) training said one or more speech recognition models in a second training operation, said second training operation being based on additional samples of speech from a group of persons employing said first natural language and residing in geographic regions served by the distributed speech processing system;
  
  wherein recognition of speech utterances during a speech recognition process is optimized for a geographic region by using one or more speech models which include variants of words to be uttered by users of the distributed client-server system.
- Dependent claims (2-11):
  - 2. The method of claim 1, wherein said speech recognition models are Hidden Markov Models.
  - 3. The method of claim 1, further including a step:
    - recognizing a speech utterance from a user by selecting a speech model from said one or more speech recognition models.
  - 4. The method of claim 3, further including a step:
    - providing a response to said speech utterance over a network connection to a client device employed by the user.
  - 5. The method of claim 4, further including a step:
    - converting said response to audible form using a text-to-speech engine.
  - 6. The method of claim 5, wherein said response is voiced by an interactive electronic agent.
  - 7. The method of claim 3, wherein said response to said speech utterance is provided to the user before said speech utterance is completely recognized.
  - 8. The method of claim 3, further including a step:
    - dynamically switching a grammar to be used by said speech model based on an application being used by the user at the time of said speech utterance.
  - 9. The method of claim 8, further including a step:
    - dynamically switching a dictionary to be used by said speech model based on an application being used by the user at the time of said speech utterance.
  - 10. The method of claim 1, further including a step:
    - configuring a set of speech recognition operations to be performed by the network server system based on computing resources available to such system.
  - 11. The method of claim 3, further including a step:
    - calibrating noise at the client device prior to recognizing the speech utterance.
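Claims 8 and 9 describe switching the grammar and dictionary according to the application in use at the time of the utterance. A minimal sketch of that selection step follows. The application names, the `RecognitionContext` type, and the vocabulary contents are assumptions made for illustration; the claims do not specify any data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionContext:
    grammar: frozenset     # phrases the recognizer will accept
    dictionary: frozenset  # vocabulary considered during decoding


# Hypothetical per-application contexts; names and contents are illustrative only.
CONTEXTS = {
    "bookstore": RecognitionContext(
        grammar=frozenset({"find books by <author>", "what is the price"}),
        dictionary=frozenset({"find", "books", "by", "price", "what", "is", "the"}),
    ),
    "helpdesk": RecognitionContext(
        grammar=frozenset({"reset my password", "talk to an agent"}),
        dictionary=frozenset({"reset", "my", "password", "talk", "to", "an", "agent"}),
    ),
}


def context_for(application: str) -> RecognitionContext:
    """Select the grammar and dictionary for the application active at utterance time."""
    return CONTEXTS[application]


def in_vocabulary(utterance: str, application: str) -> bool:
    """Check whether every word of the utterance is in the active dictionary."""
    ctx = context_for(application)
    return all(word in ctx.dictionary for word in utterance.lower().split())
```

With these tables, the same utterance can be in vocabulary for one application and out of vocabulary for another, which is the effect the dynamic switch is meant to achieve.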

12. A method of optimizing recognition of a speech utterance from a user with a distributed speech processing system comprising the steps of:
- (a) receiving first speech data from a client device in streaming packets through a network interface of a network server system and/or plurality of servers, said first speech data resulting from a first set of speech recognition operations being performed on the speech utterance by the client device;
  
  (b) completing recognition of the speech utterance using software routines executing at the network server system and/or plurality of servers which implement a second set of speech recognition operations;
  
  wherein said software routines at the network server system and/or plurality of servers use one or more speech recognition models that are trained based on speech characteristics of a group of persons residing in geographical regions served by the distributed client-server system;
  
  (c) providing a real-time response to the user based on the speech utterance as well as subsequent speech utterances from the user so that an interactive dialog is conducted by the distributed speech processing system.
- Dependent claims (13-16):
  - 13. The method of claim 12 wherein an electronic agent provides said real-time response, which electronic agent exhibits characteristics that are adjusted by said software routines executing at the network server system and/or plurality of servers based on a type of application interacting with the user.
  - 14. The method of claim 12 wherein an electronic agent provides said real-time response, which electronic agent exhibits characteristics that are adjusted by said software routines executing at the network server system and/or plurality of servers based on an identity of the user.
  - 15. The method of claim 12 wherein an electronic agent presented within a browser of the client system provides said real-time response, which electronic agent responds to user queries presented in speech form and assists the user to navigate and select items from an Internet web page.
  - 16. The method of claim 15 wherein said electronic agent further provides one or more specific suggested queries to the user.
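The split in claim 12 between a first set of operations on the client and a second set on the server, linked by streaming packets, can be sketched as below. This is a toy under stated assumptions: per-frame energy stands in for whatever features the client actually computes, the packet layout (sequence number, count, doubles) is invented for the example, and a threshold label stands in for real recognition.

```python
import struct


def client_first_stage(samples, frame_size=4):
    """First set of operations (assumed): reduce raw samples to per-frame energies."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sum(s * s for s in frame) / len(frame) for frame in frames]


def to_packets(features, per_packet=2):
    """Pack features into sequence-numbered binary packets for streaming."""
    packets = []
    for seq, start in enumerate(range(0, len(features), per_packet)):
        chunk = features[start:start + per_packet]
        header = struct.pack("!HH", seq, len(chunk))  # sequence number, feature count
        packets.append(header + struct.pack(f"!{len(chunk)}d", *chunk))
    return packets


def server_second_stage(packets, threshold=1.0):
    """Second set of operations (assumed): reorder packets, unpack, label each frame."""
    features = []
    for packet in sorted(packets, key=lambda p: struct.unpack("!H", p[:2])[0]):
        _, count = struct.unpack("!HH", packet[:4])
        features.extend(struct.unpack(f"!{count}d", packet[4:]))
    return ["speech" if energy > threshold else "silence" for energy in features]
```

Sequence numbers let the server restore frame order even if packets arrive out of order, which is one plausible reason to number them in a streaming design.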

17. A speech processing system for recognizing a speech utterance from a user comprising:
- (a) a speech recognition engine;
  
  wherein said speech recognition engine executes on a network server system of a distributed speech processing system;
  
  (b) one or more speech recognition models useable by the speech recognition engine for recognizing speech utterances in a first language;
  
  wherein said one or more speech recognition models have been trained to include additional samples of speech from a group of persons employing said first language and residing in geographic regions served by the distributed speech processing system;
  
  further wherein recognition of speech utterances by the speech recognition engine is optimized for a geographic region by using one or more speech models which include variants of words to be uttered by users of the distributed client-server system.
- Dependent claims (18-20):
  - 18. The system of claim 17, further including a natural language engine coupled to the speech recognition engine for recognizing a meaning of said speech utterances.
  - 19. The system of claim 18, further including a database containing query/answer pairs responsive to said speech utterances.
  - 20. The system of claim 17, further including an electronic agent for conducting an interactive dialog with the users.
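Claim 19 adds a database of query/answer pairs. One simple way to picture the lookup is sketched below, assuming the recognized utterance is already available as text. The stored pairs, the `best_answer` helper, and the word-overlap (Jaccard) scoring are all hypothetical choices for the example; the claims do not say how a match is found.

```python
def tokenize(text):
    """Lowercase the text, drop question marks, and return its set of words."""
    return set(text.lower().replace("?", "").split())


# Hypothetical stored query/answer pairs.
QA_PAIRS = [
    ("What are your opening hours?", "We are open 9 to 5."),
    ("Where is the store located?", "At 1 Main Street."),
]


def best_answer(recognized_text, pairs=QA_PAIRS):
    """Return the answer whose stored query overlaps most with the input, or None."""
    query_words = tokenize(recognized_text)

    def score(pair):
        stored = tokenize(pair[0])
        union = query_words | stored
        # Jaccard similarity: shared words divided by all distinct words.
        return len(query_words & stored) / len(union) if union else 0.0

    best = max(pairs, key=score)
    return best[1] if score(best) > 0 else None
```

Returning `None` when nothing overlaps leaves room for an electronic agent to ask a follow-up question instead of giving an unrelated answer.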

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Phoenix Solutions Incorporated (CDC Corporation)
Inventors
Babu, Bandi Ramesh; Gururaj, Pallaki; Morkhandikar, Kishor; Bennett, Ian M.

Granted Patent

US 7,225,125 B2
US Class Current

704/252
CPC Class Codes

G06F 16/24522   Translation of natural lang...

G06F 40/58   Use of machine translation,...

G10L 15/005   Language recognition

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/18   using natural language mode...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 17/22   Interactive procedures; Man...

Y10S 707/99935   Query augmenting and refini...
