Method for processing speech signal features for streaming transport
Abstract
Speech signal information is formatted, processed and transported in accordance with a format adapted for the TCP/IP protocols used on the Internet and other communications networks. NULL characters indicate the end of a voice segment. The method is useful for distributed speech recognition systems, such as a client-server system implemented on an intranet or over the Internet, in which a user issues queries from a computer, PDA, or workstation through a speech input interface.
121 Citations
37 Claims
1. A method of formatting speech data for a distributed speech recognition system comprising the steps of:
(a) capturing speech data uttered by a speaker at a client computing device;
(b) extracting acoustic features from said speech data;
(c) representing said extracted acoustic features by speech symbols including speech vector data;
(d) converting said speech vector data to a byte stream;
wherein a NULL character is included in said byte stream to indicate a termination of speech data from said client computing device. (Dependent claims: 2-12)
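Steps (c)-(d) of claim 1 can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the wire format (little-endian float32 coefficients) and the function name are assumptions.

```python
import struct

NULL = b"\x00"  # terminator indicating end of speech data (claim 1)

def vectors_to_byte_stream(vectors):
    """Pack feature vectors (sequences of floats) into one byte stream,
    appending a NULL character to mark the end of the utterance."""
    out = bytearray()
    for vec in vectors:
        # Assumed wire format: little-endian float32 per coefficient.
        out += struct.pack(f"<{len(vec)}f", *vec)
    out += NULL  # termination of speech data from the client device
    return bytes(out)
```

Note that raw float packing can itself produce 0x00 bytes inside the payload; claim 13 addresses exactly this by removing embedded NULL characters before transmission.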
13. A method of formatting speech data for a speech recognition system comprising the steps of:
capturing speech data at a client computing device, said speech data including query words spoken by a speaker in a speech utterance;
extracting acoustic features including mel frequency cepstral coefficient (MFCC) speech vector data from said speech data on a continuous basis until silence is detected;
converting said speech vector data, while said speech utterance is being spoken by said speaker, to a byte stream suitable for transport across an Internet based network connection;
removing any NULL characters in said byte stream before said speech vector data is transmitted through said Internet based network connection;
adding a NULL character to an end of said byte stream to indicate a termination of speech data from said client computing device. (Dependent claims: 14-19)
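The last two framing steps of claim 13, stripping embedded NULL characters and then appending a single NULL terminator, can be sketched as below. Taken literally, deleting NULL bytes assumes the vector encoding tolerates it (in practice one would encode the vectors so that NULL cannot occur in the payload); the function name is illustrative, not from the patent.

```python
NULL = b"\x00"

def frame_for_transport(vector_bytes):
    """Claim 13 framing: remove any NULL characters from the encoded
    speech vector data, then add one NULL to the end of the byte stream
    so that it unambiguously terminates the utterance."""
    payload = vector_bytes.replace(NULL, b"")  # strip embedded NULLs
    return payload + NULL                      # end-of-speech marker
```

Because the payload is guaranteed NULL-free, the receiver can treat the first NULL it sees as the end of the speech data.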
20. A method of transmitting speech data for a distributed speech recognition system comprising the steps of:
capturing speech data at a client computing device, said speech data including words spoken by a speaker;
extracting acoustic features including speech vector data from said speech data on a continuous basis as said words are spoken;
encoding said speech vector data into a byte stream in a format adapted for transport across an Internet based network connection;
wherein at least one NULL character is added to said byte stream when silence is detected;
transmitting said speech vector data through said Internet based network connection as a stream of bytes for further speech recognition processing by a speech recognition engine located on a server computing device. (Dependent claims: 21-25)
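Claim 20's sender loop might look like the following sketch, which separates chunk generation from the TCP send so the logic can be shown without a live connection. The `encode` and `is_silence` callables are caller-supplied placeholders, not details from the patent.

```python
NULL = b"\x00"

def transmit_chunks(frames, encode, is_silence):
    """Yield the bytes to send for each captured frame: the encoded
    speech vector data, or a single NULL byte when silence is
    detected (claim 20)."""
    for frame in frames:
        yield NULL if is_silence(frame) else encode(frame)

# A sender would stream these chunks over a TCP connection, e.g.:
#   with socket.create_connection((host, port)) as sock:
#       for chunk in transmit_chunks(frames, encode, is_silence):
#           sock.sendall(chunk)
```

Generating chunks continuously as words are spoken, rather than buffering the whole utterance, is what makes the transport "streaming": the server can begin recognition before the speaker finishes.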
26. A method of processing speech data for a distributed speech query recognition system comprising the steps of:
establishing a network connection between a server computing system and a client device suitable for transporting a streaming communication;
receiving a data stream containing speech vector data from the client device, said speech vector data representing acoustic features of speech data and being characterized by a data content insufficient to recognize words;
wherein said data stream includes a NULL character used to identify a silence in speech data from said client device;
further processing said speech vector data at said server computing system to generate additional speech feature related content and identify words in said speech data. (Dependent claims: 27-37)
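On the server side, the NULL markers let claim 26's receiver segment the incoming stream at silences before the remaining feature processing and word identification. A minimal sketch, with the splitting policy (drop empty segments) assumed rather than specified by the claim:

```python
NULL = b"\x00"

def segment_stream(data_stream):
    """Split a received byte stream on NULL characters, which claim 26
    uses to identify silences; returns the non-empty speech segments
    whose vector data the server then processes further."""
    return [seg for seg in data_stream.split(NULL) if seg]
```

Each returned segment holds the encoded speech vector data for one stretch of speech between silences, ready to be decoded and passed to the recognition engine.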
Specification