Partial speech processing device and method for use in distributed systems

US 7,729,904 B2
Filed: 12/03/2004
Issued: 06/01/2010
Est. Priority Date: 11/12/1999
Status: Expired due to Fees

Abstract

A client device incorporates partial speech recognition for recognizing a spoken query by a user. The full recognition process is distributed over a client/server architecture, so that the amount of partial recognition signal processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, etc. Partially processed speech data from the client device can be streamed to a server for a real-time response. Additional natural language processing operations can also be performed to implement sentence recognition functionality.

20 Claims

1. A portable speech processing device incorporated within a personal digital assistant or cellphone for use in a distributed speech recognition system for processing a speech utterance comprising:
- a first signal processing circuit adapted to generate a first set of speech data values from speech utterance signals associated with the speech utterance, wherein said first set of speech data values have a limited data content and are compressed;
  
  a transmission circuit that formats the first set of speech data values by removing NULL data in the first set of speech data values and inserting a single NULL character to denote end of speech and transmits the first set of speech data values over a communications channel to a second signal processing circuit;
  
  said transmission circuit being adapted to transmit said speech data values over said communications channel in response to a designated button for speech queries being pressed on the portable speech processing device;
  
  wherein the portable speech processing device is configured so that said first set of speech data values can be sent in a data stream over said channel, during periods when silence is not detected, to a server system which includes a second signal processing circuit which can perform a full recognition of text words in the speech utterance as well as a natural language engine for performing a recognition of a meaning of a sentence presented in said text words;
  
  said full recognition being performed subject to a confidence level provided to said server system by the portable speech processing device.
- Dependent claims (2, 3, 4)
  - 2. The portable speech processing device of claim 1, including a routine for permitting a speaker to present said speech utterance simultaneously to one or more different natural language engines.
  - 3. The portable speech processing device of claim 2, wherein said different natural language engines are located on different respective servers.
  - 4. The portable speech processing device of claim 1, wherein the transmission circuit transfers said first set of speech data values with a content in an amount which is determined automatically to reduce latency.
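
The formatting step recited for the transmission circuit in claim 1 (remove NULL data, then insert a single NULL character to denote end of speech) might be sketched as follows. This is an illustration only, not the patented implementation: the claim does not specify a byte-level encoding, so treating the speech data values as a byte string and NULL as the byte 0x00 is an assumption, and the names `format_speech_data` and `split_utterances` are hypothetical.

```python
from typing import List

NULL = b"\x00"  # assumption: NULL is the single byte 0x00


def format_speech_data(values: bytes) -> bytes:
    """Drop every NULL byte from the payload, then append one NULL as the end-of-speech marker."""
    return values.replace(NULL, b"") + NULL


def split_utterances(stream: bytes) -> List[bytes]:
    """Receiver side (hypothetical): split a byte stream at each end-of-speech NULL."""
    return [part for part in stream.split(NULL) if part]


# Example: NULL bytes inside the payload are removed, and exactly one NULL ends the frame.
framed = format_speech_data(b"\x12\x00\x34\x00\x00\x56")
```

Because the payload no longer contains NULL bytes after formatting, the single trailing NULL is an unambiguous delimiter under these assumptions.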

5. A portable speech processing device incorporated within a personal digital assistant or cellphone for use in a distributed speech recognition system for processing a speech utterance, comprising:
- a first signal processing circuit that generates a first set of speech data values from speech utterance signals associated with the speech utterance, the first set of speech data values being compressed and having a limited data content;
  
  a transmission circuit that formats the first set of speech data values by removing NULL data in the first set of speech data values and inserting a single NULL character to denote end of speech and transmits the first set of speech data values over a communications channel to a second signal processing circuit; and
  
  a designated button on the portable speech processing device coupled to the transmission circuit such that when the button is pressed, the first set of speech data values are transmitted;
  
  wherein the portable speech processing device is configured such that the first set of speech data values are sent in a data stream over the channel during periods when silence is not detected to a server system that includes a second signal processing circuit capable of performing a full recognition of text words in the speech utterance and a natural language engine for performing a recognition of a meaning of a sentence presented in the text words.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 6. The portable speech processing device of claim 5, wherein the first set of speech data values are transmitted to the server system using hypertext transfer protocol (HTTP).
  - 7. The portable speech processing device of claim 5, wherein the sentence presented in the text words can include one of a number of predefined sentences recognizable by the natural language engine, such that the sentence is recognized by identifying a candidate set of potential sentences from the number of predefined sentences corresponding to the sentence, and then comparing each entry in the candidate set of potential sentences to the sentence to determine a matching recognized sentence.
  - 8. The portable speech processing device of claim 7, wherein the sentence is compared against the candidate set of potential sentences by examining noun phrases.
  - 9. The portable speech processing device of claim 8, wherein the candidate set of potential sentences are determined in part by a context dictionary loaded by the natural language engine in response to an operating environment presented by the system to a user.
  - 10. The portable speech processing device of claim 5, further comprising a routine for permitting a speaker to present the speech utterance simultaneously to a plurality of different natural language engines.
  - 11. The portable speech processing device of claim 10, wherein the plurality of different natural language engines are located on different respective servers.
  - 12. The portable speech processing device of claim 5, wherein the limited data content of the first set of speech data values is configured based on a processing ability of the portable speech processing device.
  - 13. The portable speech processing device of claim 12, wherein the first set of speech data values are MFCC vector coefficients, and the second signal processing circuit generates MFCC delta coefficients and MFCC acceleration coefficients derived from the MFCC vector coefficients.
  - 14. The portable speech processing device of claim 5, wherein the transmission circuit transfers the first set of speech data values with a content in an amount which is determined automatically to reduce latency.
  - 15. The portable speech processing device of claim 5, further comprising a text to speech engine adapted to provide an articulated response to the speech utterance.
  - 16. The portable speech processing device of claim 5, further comprising a routine for controlling an interactive character agent presented to a user for assisting in handling the speech utterance.
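
Claim 13 splits the feature computation: the device sends MFCC vector coefficients, and the second signal processing circuit derives the delta and acceleration coefficients. A minimal sketch of that server-side derivation is given below. The claim does not state a formula, so the regression-style delta over a window of plus or minus two frames, the edge handling (repeating the first and last frame), and the names `deltas` and `server_features` are all assumptions chosen for illustration.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-style delta over +/- `window` frames; edges reuse the first/last frame."""
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, window + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def server_features(static: List[Frame]) -> List[Frame]:
    """Append delta and acceleration (delta of delta) coefficients to each static MFCC frame."""
    d = deltas(static)
    a = deltas(d)
    return [s + dd + aa for s, dd, aa in zip(static, d, a)]
```

Under this split, only the static coefficients cross the channel; the derived coefficients triple the per-frame feature length on the server without adding to the transmitted data.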

17. A distributed speech recognition system, comprising:
- a portable speech processing device incorporated within a personal digital assistant or cellphone including
  
  a first signal processing circuit that generates a first set of speech data values from speech utterance signals, the first set of speech data values being compressed and having a limited data content;
  
  a transmission circuit that formats the first set of speech data values by removing NULL data in the first set of speech data values and inserting a single NULL character to denote end of speech and transmits the first set of speech data values over a communications channel to a second signal processing circuit, and
  
  a designated button for speech queries on the portable speech processing device coupled to the transmission circuit such that when the button is pressed, the first signal processing circuit and the transmission circuit are caused, respectively, to generate the first speech data values from speech utterance signals associated with a user utterance made immediately after the button is depressed and to transmit the speech data values,
  
  the portable speech processing device being configured such that the first set of speech data values are sent in a data stream over the channel during periods when silence is not detected; and
  
  a server system configured to receive the first set of speech values over the communications channel, the server system having
  
  a second signal processing circuit capable of performing a full recognition of text words in the user utterance, and
  
  a natural language engine for performing a recognition of a meaning of a sentence presented in the text words.
- Dependent claims (18, 19, 20)
  - 18. The distributed speech recognition system of claim 17, wherein the portable device and the server system communicate by an Internet-based protocol.
  - 19. The distributed speech recognition system of claim 17, wherein the first set of speech data values is transmitted using a hypertext transfer protocol (HTTP).
  - 20. The distributed speech recognition system of claim 17, wherein the server system includes a plurality of natural language engines, and the portable device includes a routine for permitting a speaker to present the user utterance simultaneously to one or more of the plurality of natural language engines.
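
The device-side behavior recited in claim 17 (speech data is generated and transmitted once the designated button is pressed, and is streamed only during periods when silence is not detected) might be sketched as below. The claims do not say how silence is detected, so the mean-energy test, the threshold value, and the names `frame_energy` and `non_silent_frames` are assumptions made for this sketch.

```python
from typing import Iterable, Iterator, List


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame (assumed silence measure)."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def non_silent_frames(
    frames: Iterable[List[float]],
    button_pressed: bool,
    threshold: float = 0.01,  # assumption: arbitrary illustrative threshold
) -> Iterator[List[float]]:
    """Yield frames for streaming only if the button was pressed and the frame is not silent."""
    if not button_pressed:
        return
    for frame in frames:
        if frame_energy(frame) > threshold:
            yield frame
```

Writing this as a generator mirrors the "data stream" wording: frames can be handed to the transport (for example, the HTTP transfer of claim 19) as they are produced rather than after the whole utterance is captured.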

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Phoenix Solutions Incorporated (CDC Corporation)
Inventors: Bennett, Ian M.
Primary Examiner(s): Lerner, Martin
Application Number: US11/003,085
Publication Number: US 20050086059A1
Time in Patent Office: 2,006 Days
Field of Search: 704/230, 704/251, 704/255, 704/257, 704/270, 704/270.1, 704/210, 704/215
US Class Current: 704/215
CPC Class Codes:
G06F 16/243   Natural language query form...
G06F 16/24522   Translation of natural lang...
G06F 16/3344   using natural language anal...
G06F 40/216   using statistical methods
G06F 40/237   Lexical tools
G06F 40/30   Semantic analysis
G06F 40/42   Data-driven translation
G06F 40/44   Statistical methods, e.g. p...
G09B 5/04   with audible presentation o...
G09B 7/00   Electrically-operated teach...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/18   using natural language mode...
G10L 15/1815   Semantic context, e.g. disa...
G10L 15/183   using context dependencies,...
G10L 15/22   Procedures used during a sp...
G10L 15/285   Memory allocation or algori...
G10L 15/30   Distributed recognition, e....
H04M 2250/74   with voice recognition means
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99935   Query augmenting and refini...
