Multi-language speech recognition system

US 9,190,063 B2
Filed: 10/31/2007
Issued: 11/17/2015
Est. Priority Date: 11/12/1999
Status: Expired due to Term


Abstract

A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different natural languages are used to support and detect a natural language spoken by a user. In some implementations an interactive electronic agent responds in the user's native language to facilitate a real-time, human-like dialog.

406 Citations

25 Claims

1. A method of performing recognition of a speech utterance from a user with a distributed client-server system comprising:
- receiving user speech data from a client device in streaming packets through a network interface of a network server system employing an application layer Internet-based protocol overlaid on transmission control protocol (TCP) such that said streaming packets are processed as they are received, said speech data resulting from a first set of speech recognition operations being performed on the speech utterance by a client device;
  
  recognizing the speech utterance as well as a natural language used in said speech utterance using processing routines executing at said network server system which implement a second set of speech recognition operations, wherein recognizing includes converting the speech utterance into text using a Hidden Markov Modeling technique;
  
  sending text corresponding to the speech utterance to a natural language engine and a database engine;
  
  performing linguistic processing of the text at the natural language engine, wherein linguistic processing of the text includes tokenizing the text, tagging one or more tokens, grouping the tagged tokens and storing one or more noun phrases associated with the text;
  
  transferring the one or more noun phrases to the database engine for construction of an SQL query;
  
  providing a response to the user in a same natural language as was recognized; and
  
  adjusting said second set of speech recognition operations based on an automated evaluation of resources available at the network server system and/or the client device.
  - 2. The method of claim 1, wherein said response is retrieved from a database containing content responsive to user queries concerning a set of topics.
  - 3. The method of claim 2, wherein said response is articulated by an electronic agent in audible form.
  - 4. The method of claim 3, wherein said electronic agent is programmed with a behavior customized for the user.
  - 5. The method of claim 3, wherein said electronic agent is associated with a web compatible page loaded within a browser.
  - 6. The method of claim 1, wherein said speech utterance controls an interaction by the user with a browser on the client device for accessing content that is also associated with a web page.
  - 7. The method of claim 1 further including a step:
    - dynamically switching a grammar at the network server system in response to the speech utterance, which grammar includes words correlated to links selectable within an electronic page presented by a web server to a user and such grammar is switched to another grammar based on activating a link.
  - 8. The method of claim 1 further including a step:
    - processing said query with a natural language routine adapted to consider only a subset of a first set of words and/or phrases in said query, and which further can consider words and/or phrases in said query which are not present in a set of query/answer pairs to determine said response.
  - 9. The method of claim 1, wherein a speech-based query composed of a list of candidate word phrases are concatenated to search for said response.
  - 10. The method of claim 1, wherein a confidence level for identifying said response as processed by a natural language engine can be specified for the speech utterance.
  - 11. The method of claim 1 wherein said first set of speech recognition operations includes detecting silence in said speech utterance.
  - 12. The method of claim 1, wherein the application level Internet-based protocol comprises hypertext transfer protocol (HTTP).
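
The linguistic-processing and query-construction steps recited in claim 1 (tokenizing, tagging, grouping tagged tokens into noun phrases, then building an SQL query) can be illustrated with a minimal sketch. It is not the patented implementation: the toy lexicon, the grouping rule, and the `qa_pairs` table with its `question` and `answer` columns are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy lexicon; a production tagger would be statistical.
LEXICON = {
    "what": "WH", "is": "VERB", "the": "DET", "of": "PREP",
    "price": "NOUN", "green": "ADJ", "tea": "NOUN",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens without punctuation."""
    words = (w.strip("?.,!") for w in text.lower().split())
    return [w for w in words if w]

def tag(tokens):
    """Attach a part-of-speech tag to each token (unknown words default to NOUN)."""
    return [(t, LEXICON.get(t, "NOUN")) for t in tokens]

def group_noun_phrases(tagged):
    """Group runs of ADJ/NOUN tokens that end in a NOUN into noun phrases."""
    phrases, current = [], []

    def flush():
        # Keep the run only if it ends in a noun.
        if current and current[-1][1] == "NOUN":
            phrases.append(" ".join(word for word, _ in current))
        current.clear()

    for word, pos in tagged:
        if pos in ("ADJ", "NOUN"):
            current.append((word, pos))
        else:
            flush()
    flush()
    return phrases

def build_query(noun_phrases):
    """Build a parameterized SQL query that matches any of the noun phrases."""
    if not noun_phrases:
        raise ValueError("at least one noun phrase is required")
    where = " OR ".join("question LIKE ?" for _ in noun_phrases)
    sql = f"SELECT answer FROM qa_pairs WHERE {where}"
    params = [f"%{p}%" for p in noun_phrases]
    return sql, params

if __name__ == "__main__":
    # Assumed in-memory table standing in for the database engine.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE qa_pairs (question TEXT, answer TEXT)")
    conn.execute(
        "INSERT INTO qa_pairs VALUES (?, ?)",
        ("How much is green tea?", "Green tea costs $4."),
    )
    phrases = group_noun_phrases(tag(tokenize("What is the price of green tea?")))
    sql, params = build_query(phrases)
    print(phrases, conn.execute(sql, params).fetchall())
```

Passing the noun phrases as bound parameters rather than splicing them into the SQL string keeps recognized text from being interpreted as SQL; that is a choice of this sketch, not something the claims specify.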

13. A method of performing recognition of a speech utterance from a user with a distributed client-server system comprising:
- receiving user speech data from a client device in streaming packets through a network interface of a network server system employing an application level Internet based protocol overlaid on transmission control protocol (TCP) such that said streaming packets are processed as they are received, said speech data resulting from a first set of speech recognition operations being performed on the speech utterance by a client device;
  
  recognizing the speech utterance as well as a natural language used in said speech utterance using processing routines executing at said network server system which implement a second set of speech recognition operations, wherein recognizing includes converting the speech utterance into text using a Hidden Markov Modeling technique;
  
  sending text corresponding to the speech utterance to a natural language engine and a database engine;
  
  performing linguistic processing of the text at the natural language engine, wherein linguistic processing of the text includes tokenizing the text, tagging one or more tokens, grouping the tagged tokens and storing one or more noun phrases associated with the text;
  
  transferring the one or more noun phrases to the database engine for construction of an SQL query;
  
  providing a response to the user in a same natural language as was recognized;
  
  automatically adjusting said second set of speech recognition operations based on an automated evaluation of resources available at the network server system and/or the client device; and
  
  automatically adjusting said first set of speech recognition operations based on an automated evaluation of resources available at the client device.
  - 14. The method of claim 13, wherein automatically adjusting said second set and said first set are performed on a client device by device basis.
  - 15. The method of claim 13, wherein said response is retrieved from a database containing content responsive to user queries concerning a set of topics.
  - 16. The method of claim 13, wherein said response is articulated by an electronic agent in audible form.
  - 17. The method of claim 16, wherein said electronic agent is programmed with a behavior customized for the user.
  - 18. The method of claim 16, wherein said electronic agent is associated with a web compatible page loaded within a browser executing on the client device.
  - 19. The method of claim 13 further including a step:
    - dynamically switching a grammar at the network server system in response to the speech utterance, which grammar includes words correlated to links selectable within an electronic page presented by a web server to a user and such grammar is switched to another grammar based on activating a link.
  - 20. The method of claim 13 further including a step:
    - processing said query with a natural language routine adapted to consider only a subset of a first set of words and/or phrases in said query, and which further can consider words and/or phrases in said query which are not present in a set of query/answer pairs to determine said response.
  - 21. The method of claim 13, wherein a speech-based query composed of a list of candidate word phrases are concatenated to search for said response.
  - 22. The method of claim 13, wherein said speech utterance controls an interaction by the user with a browser on the client device for accessing content that is also associated with a web page.
  - 23. The method of claim 13, wherein a confidence level for identifying said response as processed by a natural language engine can be specified for the speech utterance.
  - 24. The method of claim 13 wherein said first set of speech recognition operations includes detecting silence in said speech utterance.
  - 25. The method of claim 13, wherein the application level Internet-based protocol comprises hypertext transfer protocol (HTTP).
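
Claim 13 recites automatically adjusting both the client-side (first) and server-side (second) sets of recognition operations based on an evaluation of available resources. One way such an adjustment might look is sketched below. The stage names, the `Resources` fields, the thresholds, and the beam-width values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Resources:
    cpu_free: float   # fraction of CPU currently idle, 0.0 to 1.0 (assumed metric)
    memory_mb: int    # free memory in megabytes (assumed metric)

# Hypothetical ordered stages of a recognition pipeline.
STAGES = [
    "silence_detection",
    "feature_extraction",
    "acoustic_scoring",
    "hmm_decoding",
]

def plan_client_ops(client: Resources) -> list:
    """Pick the first set: how many leading stages the client runs."""
    count = 1  # the client always does at least silence detection
    if client.cpu_free >= 0.25 and client.memory_mb >= 64:
        count = 2
    if client.cpu_free >= 0.5 and client.memory_mb >= 256:
        count = 3
    return STAGES[:count]

def plan_server_ops(client_ops: list, server: Resources):
    """Pick the second set: the remaining stages plus a decoder beam width."""
    remaining = [stage for stage in STAGES if stage not in client_ops]
    # A busy server searches a narrower beam to save CPU (illustrative values).
    beam_width = 200 if server.cpu_free >= 0.5 else 50
    return remaining, beam_width

if __name__ == "__main__":
    client = Resources(cpu_free=0.3, memory_mb=128)
    server = Resources(cpu_free=0.7, memory_mb=8192)
    first_set = plan_client_ops(client)
    second_set, beam = plan_server_ops(first_set, server)
    print(first_set, second_set, beam)
```

Because the plan is computed from each client's own resource report, running it per connection gives the device-by-device behavior described in claim 14.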


Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Bennett, Ian; Babu, Bandi Ramesh; Morkhandikar, Kishor; Gururaj, Pallaki
Primary Examiner(s): Lerner, Martin
Application Number: US11/932,250
Publication Number: US 20080052063A1
Time in Patent Office: 2,939 Days
Field of Search: 704/8, 704/9, 704/255, 704/257, 704/270, 704/270.1, 704/275, 704/277, 707/706
US Class Current: 1/1
CPC Class Codes:
G06F 16/24522   Translation of natural lang...
G06F 40/58   Use of machine translation,...
G10L 15/005   Language recognition
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/18   using natural language mode...
G10L 15/183   using context dependencies,...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 17/22   Interactive procedures; Man...
Y10S 707/99935   Query augmenting and refini...
