Method for processing speech signal features for streaming transport

US 7,376,556 B2
Filed: 03/02/2004
Issued: 05/20/2008
Est. Priority Date: 11/12/1999
Status: Expired due to Fees

Abstract

Speech signal information is formatted, processed and transported in accordance with a format adapted for TCP/IP protocols used on the Internet and other communications networks. NULL characters are used for indicating the end of a voice segment. The method is useful for distributed speech recognition systems such as a client-server system, typically implemented on an intranet or over the Internet based on user queries at his/her computer, a PDA, or a workstation using a speech input interface.

23 Claims

1. A method of formatting speech data for a distributed speech recognition system comprising the steps of:
- (a) capturing speech data uttered by a speaker at a client computing device;
  
  (b) extracting acoustic features from said speech data;
  
  (c) representing said extracted acoustic features by speech symbols including speech vector data;
  
  (d) converting said speech vector data to a byte stream;
  
  wherein one or more NULL characters are included in said byte stream to indicate a termination of speech data from said client computing device, each of said one or more NULL characters comprising a plurality of zero value data bits;
  
  further wherein other NULL characters present in said byte stream are removed prior to transmitting said byte stream.
  - 2. The method of claim 1, wherein said client computing device implements one part of the distributed speech recognition system, and a server computing device coupled through a connected network to said client computing device implements a remainder of the distributed speech recognition system which recognizes words in said speech data.
  - 3. The method of claim 1, wherein speech recognition tasks are allocated between said client computing device and said server computing device based on computing resources available to said client computing device and server computing device respectively.
  - 4. The method of claim 3, wherein said client computing device can be configured to perform speech recognition tasks also performed by said server computing device.
  - 5. The method of claim 3 wherein said speech recognition tasks are allocated automatically between said client computing device and said server computing device.
  - 6. The method of claim 3, wherein said speech recognition tasks are allocated between said client computing device and said server computing device based on a latency of a network coupling said devices.
  - 7. The method of claim 6, wherein steps (a) through (c) are performed by a software plug-in module for a browser program, or a library operating on the client device.
  - 8. The method of claim 7 where the library may be dynamically or statically linked.
  - 9. The method of claim 1, wherein said speech vector data is converted to a byte stream continuously while said speaker is talking.
  - 10. The method of claim 1, wherein said byte stream is communicated to a server computing system in accordance with an Internet transport protocol.
  - 11. The method of claim 10, wherein said Internet transport protocol is Hyper Text Transfer Protocol (HTTP).
  - 12. The method of claim 1, further including a step:
    - monitoring said network at the client computing device for a response from a speech recognition system.
  - 13. The method of claim 1, wherein an amount of data contained in said speech vector data is configured in response to a real-time performance requirement set for the speech recognition system during a speech utterance session with said speaker.
  - 14. The method of claim 1 further including a step:
    - detecting if recognition of said speech data has achieved a predetermined confidence level.
  - 15. The method of claim 1, further including a step:
    - providing said speech data to one or more natural language engines situated on different servers.
  - 16. The method of claim 1 further including a step:
    - specifying that a natural language engine should return multiple results for the speech data.
  - 17. The method of claim 1 further including a step:
    - calibrating speech and silence components of said speech data.
  - 18. The method of claim 1 wherein said speech data is transmitted at a rate of at least 100 frames of speech data per second.
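
The formatting steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the ASCII text encoding of each vector, the `;` frame separator, and the function names `encode_frame`, `format_utterance`, and `decode_utterance` are all assumptions made for illustration. The sketch only shows the general idea of stripping NULL bytes from the payload and then appending a single NULL byte as the end-of-speech marker.

```python
TERMINATOR = b"\x00"  # a single byte whose eight bits are all zero


def encode_frame(vector):
    """Serialize one feature vector as ASCII text (assumed encoding)."""
    return (",".join(f"{v:.4f}" for v in vector) + ";").encode("ascii")


def format_utterance(frames):
    """Build the byte stream for one utterance.

    Any NULL byte in the payload is removed first, so the only NULL
    left in the stream is the terminator appended at the end.
    """
    payload = b"".join(encode_frame(f) for f in frames)
    payload = payload.replace(TERMINATOR, b"")
    return payload + TERMINATOR


def decode_utterance(stream):
    """Receiver side: read up to the first NULL and parse the frames."""
    body, _, _ = stream.partition(TERMINATOR)
    text = body.decode("ascii")
    return [[float(x) for x in frame.split(",")]
            for frame in text.split(";") if frame]


if __name__ == "__main__":
    frames = [[1.0, -2.5], [0.0, 3.25]]
    stream = format_utterance(frames)
    print(stream)
    print(decode_utterance(stream))
```

Because the payload is guaranteed to contain no NULL byte, a receiver can treat the first NULL it sees as the end of the speech data without needing a length prefix. A text encoding is used here only because it never produces NULL bytes on its own; a binary encoding would need some other way of keeping zero-valued bytes out of the payload.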

19. A method of formatting speech data for a speech recognition system comprising the steps of:
- capturing speech data at a client computing device, said speech data including query words spoken by a speaker in a speech utterance;
  
  extracting acoustic features including mel frequency cepstral coefficient (MFCC) speech vector data from said speech data on a continuous basis until silence is detected;
  
  converting said speech vector data, while said speech utterance is being spoken by said speaker, to a byte stream suitable for transport across an Internet based network connection;
  
  removing any NULL characters in said byte stream before said speech vector data is transmitted through said Internet based network connection;
  
  adding a NULL character to an end of said byte stream to indicate a termination of speech data from said client computing device.
  - 20. The method of claim 19, wherein said byte stream is compatible with a Hyper Text Transport Protocol (HTTP).
  - 21. The method of claim 19, wherein said speech vector data further includes RMS energy.
  - 22. The method of claim 21, wherein MFCC delta parameters and MFCC acceleration parameters are also computed and included with said speech vector data.
  - 23. The method of claim 19, wherein an amount of data contained in said MFCC speech vector data is configured in response to a real-time performance requirement set for the speech recognition system during a speech utterance session with said speaker.
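
Claims 21 and 22 add RMS energy, MFCC delta parameters, and MFCC acceleration parameters to the speech vector data. The Python sketch below shows one common way such values are derived; it is an assumption, not the method described in the patent. The delta here is a two-frame central difference with the edge frames repeated, the acceleration is the delta of the delta, and the names `rms_energy`, `deltas`, and `augment` are hypothetical. The MFCC values themselves are taken as given input.

```python
import math


def rms_energy(samples):
    """Root-mean-square energy of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def deltas(frames):
    """First-order time derivative of a sequence of feature vectors.

    Uses (next - previous) / 2, repeating the first and last frame
    at the edges. This is an assumed convention for illustration.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def augment(mfcc_frames, energies):
    """Concatenate MFCCs, RMS energy, deltas, and accelerations per frame."""
    d = deltas(mfcc_frames)
    a = deltas(d)  # acceleration = delta of the delta
    return [c + [e] + dv + av
            for c, e, dv, av in zip(mfcc_frames, energies, d, a)]


if __name__ == "__main__":
    mfcc = [[0.0], [2.0], [4.0]]       # one coefficient per frame, toy data
    energy = [rms_energy([1.0, -1.0])] * 3
    for vec in augment(mfcc, energy):
        print(vec)
```

Each output vector has the layout `[MFCCs..., energy, deltas..., accelerations...]`, so its length is three times the number of MFCCs plus one. Dropping the delta or acceleration terms is one way the amount of data per vector could be reduced.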

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Phoenix Solutions Incorporated (CDC Corporation)
Inventors: Bennett, Ian M.
Primary Examiner(s): Lerner, Martin
Application Number: US10/792,678
Publication Number: US 20040249635A1
Time in Patent Office: 1,540 Days
Field of Search: 704/210, 704/215, 704/270, 704/270.1, 704/256.8, 704/231
US Class Current: 704/215
CPC Class Codes

G06F 16/243   Natural language query form...

G06F 16/24522   Translation of natural lang...

G06F 16/3344   using natural language anal...

G06F 40/216   using statistical methods

G06F 40/237   Lexical tools

G06F 40/30   Semantic analysis

G06F 40/42   Data-driven translation

G06F 40/44   Statistical methods, e.g. p...

G09B 5/04   with audible presentation o...

G09B 7/00   Electrically-operated teach...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/285   Memory allocation or algori...

G10L 15/30   Distributed recognition, e....

H04M 2250/74   with voice recognition means

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...
