Speech recognition system trained with regional speech characteristics

US 7,225,125 B2
Filed: 01/07/2005
Issued: 05/29/2007
Est. Priority Date: 11/12/1999
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

A speech recognition system uses speech recognition models which are specifically trained and optimized for users residing in a particular geographic area or region. The speech models are trained with samples of word variants expected to be used in a natural language by representative members of a population associated with the geographic region or community of users. The speech recognition system is configured to have a real-time response that imitates a dialogue with a human operator.

499 Citations

28 Claims

1. A method of optimizing recognition of a speech utterance from a user with a distributed speech processing system comprising the steps of:
- (a) training one or more speech recognition models for recognizing speech utterances in a first natural language in a first training operation;
  
  wherein said speech recognition models are implemented as part of a speech recognition engine executing on a network server system of the distributed speech processing system;
  
  wherein said first training operation is based on samples of speech from a group of persons employing said first natural language and which are communicated over a network to the distributed speech processing system from geographic regions served by the distributed speech processing system, such that said speech recognition models are derived and constituted at least in part at said network server system;
  
  wherein recognition of speech utterances during a speech recognition process is optimized for a geographic region by using one or more speech models which include variants of words to be uttered by users of the distributed speech processing system;
  
  (b) configuring a set of speech recognition operations to be performed by the network server system based on computing resources available to such system.
- Dependent claims (2 to 12):
  - 2. The method of claim 1, wherein said speech recognition models are Hidden Markov Models.
  - 3. The method of claim 1, further including a step:
    - recognizing a speech utterance from a user by selecting a speech model from said one or more speech recognition models.
  - 4. The method of claim 3, further including a step:
    - providing a response to said speech utterance over a network connection to a client device employed by the user.
  - 5. The method of claim 4, further including a step:
    - converting said response to audible form using a text-to-speech engine.
  - 6. The method of claim 5, wherein said response is voiced by an interactive electronic agent.
  - 7. The method of claim 3, wherein said response to said speech utterance is provided to the user before said speech utterance is completely recognized by a natural language engine.
  - 8. The method of claim 3, further including a step:
    - dynamically switching a grammar to be used by said speech model based on an application being used by the user at the time of said speech utterance.
  - 9. The method of claim 8, further including a step:
    - dynamically switching a dictionary to be used by said speech model based on an application being used by the user at the time of said speech utterance.
  - 10. The method of claim 3, further including a step:
    - calibrating noise at a client device prior to recognizing the speech utterance.
  - 11. The method of claim 1, wherein said first training operation is based on speech data obtained from a plurality of different types of client devices.
  - 12. The method of claim 1, wherein said plurality of different types of client devices have different silence and/or noise characteristics.
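
For illustration only, the selection of a region-specific model (claims 1 and 3) and the resource-based configuration of step (b) might be sketched as follows. This is not taken from the patent specification: `RegionalModel`, `select_model`, `plan_server_operations`, the region keys, the variant tables, and the core-count thresholds are all hypothetical, and word variants are modeled as simple text substitutions rather than as acoustic models trained on regional speech samples.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegionalModel:
    """Hypothetical stand-in for a speech model trained for one region."""
    region: str
    # Maps a regional word variant to a canonical vocabulary entry (assumed data).
    variants: Dict[str, str] = field(default_factory=dict)

    def normalize(self, tokens: List[str]) -> List[str]:
        # Replace known regional variants, leave other tokens unchanged.
        return [self.variants.get(t, t) for t in tokens]


def select_model(models: Dict[str, RegionalModel], region: str, default: str) -> RegionalModel:
    """Pick the model for the user's region, falling back to a default model."""
    return models.get(region, models[default])


def plan_server_operations(free_cpu_cores: int) -> List[str]:
    """Choose which recognition stages the server runs, given available resources.

    The thresholds are arbitrary and only illustrate the idea of step (b).
    """
    ops = ["decode"]
    if free_cpu_cores >= 2:
        ops.append("natural_language_parse")
    if free_cpu_cores >= 4:
        ops.insert(0, "feature_extraction")
    return ops


models = {
    "uk": RegionalModel("uk", {"lorry": "truck", "flat": "apartment"}),
    "generic": RegionalModel("generic"),
}

model = select_model(models, "uk", default="generic")
print(model.normalize(["rent", "a", "lorry"]))   # ['rent', 'a', 'truck']
print(plan_server_operations(free_cpu_cores=2))  # ['decode', 'natural_language_parse']
```

A real system would select among trained acoustic and language models; the dictionary lookup here only shows where a per-region choice could enter the pipeline.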

13. A method of optimizing recognition of a speech utterance from a user with a distributed speech processing system comprising the steps of:
- (a) receiving first speech data from a client device in streaming packets through a network interface of a network server system and/or plurality of servers, said first speech data resulting from a first set of speech recognition operations being performed on the speech utterance by the client device;
  
  (b) completing recognition of the speech utterance using software routines executing at the network server system and/or plurality of servers which implement a second set of speech recognition operations;
  
  wherein said software routines at the network server system and/or plurality of servers use one or more speech recognition models that are trained based on speech characteristics of a group of persons residing in geographical regions served by the distributed speech processing system;
  
  further wherein said speech characteristics from such group of persons are obtained over said network interface such that said speech recognition models are derived and constituted at least in part at said network server system;
  
  (c) presenting an electronic agent within a browser of the client device, which electronic agent responds to user queries presented in speech form and assists the user to navigate and select items from an Internet web page;
  
  wherein said electronic agent further provides one or more specific suggested queries to the user;
  
  (d) providing a real-time response to the user with said electronic agent based on the speech utterance as well as subsequent speech utterances from the user so that an interactive dialog is conducted by the distributed speech processing system.
- Dependent claims (14 to 20):
  - 14. The method of claim 13 wherein said electronic agent exhibits characteristics that are adjusted by said software routines executing at the network server system and/or plurality of servers based on a type of application interacting with the user.
  - 15. The method of claim 14, wherein said response is adjusted for users of the distributed speech recognition system by said electronic agent articulating said response audible form with adjusted voice characteristics including one of at least pitch, volume, speed, intonation, gender, age or personality characteristics.
  - 16. The method of claim 13 wherein said electronic agent exhibits characteristics that are adjusted by said software routines executing at the network server system and/or plurality of servers based on an identity of the user.
  - 17. The method of claim 16, wherein said response is adjusted for users of the distributed speech recognition system by said electronic agent articulating said response audible form with adjusted voice characteristics including one of at least pitch, volume, speed, intonation, gender, age or personality characteristics.
  - 18. The method of claim 13, wherein said first training operation is based on speech data obtained from a plurality of different types of client devices.
  - 19. The method of claim 13, wherein said plurality of different types of client devices have different silence and/or noise characteristics.
  - 20. The method of claim 13 wherein said response is adjusted for users of the distributed speech recognition system by said electronic agent articulating said response audible form with adjusted voice characteristics including one of at least pitch, volume, speed, intonation, gender, age or personality characteristics.
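
The division of work in steps (a) and (b) of claim 13, where the client performs a first set of operations and streams the result in packets for the server to finish, could look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the frame size, the use of frame energy as the client-side feature, the packet layout, and the function names `client_packets` and `server_collect` do not come from the patent.

```python
import struct
from typing import Iterator, List

FRAME = 4  # samples per frame; deliberately tiny for the example


def client_packets(samples: List[int]) -> Iterator[bytes]:
    """Client side: compute one feature per frame and emit it as a packet."""
    for seq, start in enumerate(range(0, len(samples), FRAME)):
        frame = samples[start:start + FRAME]
        energy = sum(s * s for s in frame) / len(frame)
        # Assumed packet layout: uint32 sequence number, float64 feature,
        # network byte order, 12 bytes in total.
        yield struct.pack("!Id", seq, energy)


def server_collect(packets: List[bytes]) -> List[float]:
    """Server side: restore packet order and return the feature stream."""
    decoded = sorted(struct.unpack("!Id", p) for p in packets)
    return [energy for _, energy in decoded]


packets = list(client_packets([0, 0, 0, 0, 3, 4, 3, 4, 1, 1]))
packets.reverse()  # simulate out-of-order arrival
features = server_collect(packets)
print(features)  # [0.0, 12.5, 1.0]
```

The server would then pass `features` to its own recognition routines; that second stage is omitted because the claim does not fix a particular algorithm for it.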

21. A speech processing system for recognizing a speech utterance from a user comprising:
- (a) a speech recognition engine;
  
  wherein said speech recognition engine executes on a network server system of a distributed speech processing system;
  
  (b) one or more speech recognition models useable by the speech recognition engine for recognizing speech utterances in a first language;
  
  wherein said one or more speech recognition models have been trained to include additional samples of speech from a group of persons employing said first language that have provided such additional samples over a network to the distributed speech processing system from geographic regions served by the distributed speech processing system such that said speech recognition models are derived and constituted at least in part at said network server system;
  
  further wherein recognition of speech utterances by the speech recognition engine is optimized for a geographic region by using one or more speech models which include variants of words to be uttered by users of the distributed client-server system;
  
  (c) configuring a set of speech recognition operations to be performed by the network server system for recognizing said speech utterances based on computing resources available to such system.
- Dependent claims (22 to 27):
  - 22. The system of claim 21, further including a natural language engine coupled to the speech recognition engine for recognizing a meaning of said speech utterances.
  - 23. The system of claim 22, further including a database containing query/answer pairs responsive to said speech utterances.
  - 24. The system of claim 21, further including an electronic agent for conducting an interactive dialog with the users.
  - 25. The system of claim 24, wherein said electronic agent articulates a response for users of the distributed speech recognition system in audible form with adjusted voice characteristics including one of at least pitch, volume, speed, intonation, gender, age or personality characteristics.
  - 26. The system of claim 21, wherein said first training operation is based on speech data obtained from a plurality of different types of client devices.
  - 27. The system of claim 21, wherein said plurality of different types of client devices have different silence and/or noise characteristics.
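
Claim 23 refers to a database of query/answer pairs. One simple way such a lookup could work, assuming the utterance has already been converted to text, is to return the answer whose stored query shares the most words with the recognized query. The sketch below uses Jaccard word overlap, which is a substitute chosen for brevity and not the matching method described in the patent; `tokenize`, `best_answer`, and the sample pairs are hypothetical.

```python
from typing import Dict, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text, drop question marks, and split it into a set of words."""
    return set(text.lower().replace("?", "").split())


def best_answer(query: str, pairs: Dict[str, str]) -> Optional[str]:
    """Return the answer whose stored query has the highest Jaccard overlap."""
    q = tokenize(query)
    best, best_score = None, 0.0
    for stored, answer in pairs.items():
        s = tokenize(stored)
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = answer, score
    return best  # None when no stored query shares any word


pairs = {
    "what are your opening hours": "We are open 9 to 5.",
    "where is my order": "Your order has shipped.",
}

print(best_answer("What are the opening hours?", pairs))  # We are open 9 to 5.
print(best_answer("hello", pairs))                        # None
```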

28. A method of optimizing recognition of a speech utterance from a user with a distributed speech processing system comprising the steps of:
- (a) calibrating noise at a client device prior to recognizing speech utterances from such client device;
  
  (b) training one or more speech recognition models for recognizing speech utterances in a first natural language in a first training operation;
  
  wherein said speech recognition models are implemented as part of a speech recognition engine executing on a network server system of the distributed speech processing system;
  
  wherein said first training operation is based on samples of speech from a group of persons employing said first natural language and which are communicated over a network to the distributed speech processing system from geographic regions served by the distributed speech processing system, such that said speech recognition models are derived and constituted at least in part at said network server system;
  
  wherein recognition of speech utterances during a speech recognition process is optimized for a geographic region by using one or more speech models which include variants of words to be uttered by users of the distributed speech processing system;
  
  (c) recognizing a speech utterance from a user by selecting a speech model from said one or more speech recognition models.
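
Step (a) of claim 28 calibrates noise at the client before recognition. A minimal sketch of one common approach, assuming the client can record a short stretch of background sound first, is to measure the root-mean-square level of that recording and treat later frames as speech only when they exceed it by a margin. The function names, the margin parameter, and the sample values are assumptions for illustration and are not drawn from the patent.

```python
import math
from typing import List


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_noise(background: List[float], margin: float = 3.0) -> float:
    """Derive a speech threshold from a recording of background noise."""
    return margin * rms(background)


def is_speech(frame: List[float], threshold: float) -> bool:
    """Treat a frame as speech when its level exceeds the calibrated threshold."""
    return rms(frame) > threshold


threshold = calibrate_noise([1.0, -1.0, 1.0, -1.0], margin=2.0)
print(threshold)                          # 2.0
print(is_speech([3.0, -3.0], threshold))  # True
print(is_speech([0.5, 0.5], threshold))   # False
```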

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Phoenix Solutions Incorporated (CDC Corporation)
Inventors: Bennett, Ian M.; Babu, Bandi Ramesh; Morkhandikar, Kishor; Gururaj, Pallaki
Primary Examiner(s): Lerner, Martin
Application Number: US11/031,207
Publication Number: US 20050144001A1
Time in Patent Office: 872 Days
Field of Search: 704/4, 704/8, 704/9, 704/270, 704/270.1, 704/275, 704/233, 704/243, 704/246, 704/256.2, 707/3, 707/4
US Class Current: 704/233

CPC Class Codes

- G06F 16/24522: Translation of natural lang...
- G06F 40/58: Use of machine translation,...
- G10L 15/005: Language recognition
- G10L 15/142: Hidden Markov Models [HMMs]
- G10L 15/18: using natural language mode...
- G10L 15/183: using context dependencies,...
- G10L 15/22: Procedures used during a sp...
- G10L 15/30: Distributed recognition, e....
- G10L 17/22: Interactive procedures; Man...
- Y10S 707/99935: Query augmenting and refini...
