Method for processing speech signal features for streaming transport
Abstract
Speech signal information is formatted, processed and transported in accordance with a format adapted for the TCP/IP protocols used on the Internet and other communications networks. NULL characters indicate the end of a voice segment. The method is useful for distributed speech recognition systems, such as a client-server system implemented on an intranet or over the Internet, in which a user issues queries from a computer, PDA, or workstation through a speech input interface.
121 Citations
37 Claims
1. A method of formatting speech data for a distributed speech recognition system comprising the steps of:
(a) capturing speech data uttered by a speaker at a client computing device;
(b) extracting acoustic features from said speech data;
(c) representing said extracted acoustic features by speech symbols including speech vector data;
(d) converting said speech vector data to a byte stream;
wherein a NULL character is included in said byte stream to indicate a termination of speech data from said client computing device. (Dependent claims: 2-12)
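Steps (c)-(d) of claim 1 can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the wire format (little-endian float32 coefficients) and the function name are assumptions.

```python
import struct

NULL = b"\x00"  # terminator indicating end of speech data (claim 1)

def vectors_to_byte_stream(vectors):
    """Pack feature vectors (sequences of floats) into one byte stream,
    appending a NULL character to mark the end of the utterance."""
    out = bytearray()
    for vec in vectors:
        # Assumed wire format: little-endian float32 per coefficient.
        out += struct.pack(f"<{len(vec)}f", *vec)
    out += NULL  # termination of speech data from the client device
    return bytes(out)
```

Note that raw float packing can itself produce 0x00 bytes inside the payload; claim 13 addresses exactly this by removing embedded NULL characters before transmission.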
13. A method of formatting speech data for a speech recognition system comprising the steps of:
capturing speech data at a client computing device, said speech data including query words spoken by a speaker in a speech utterance;
extracting acoustic features including mel frequency cepstral coefficient (MFCC) speech vector data from said speech data on a continuous basis until silence is detected;
converting said speech vector data, while said speech utterance is being spoken by said speaker, to a byte stream suitable for transport across an Internet based network connection;
removing any NULL characters in said byte stream before said speech vector data is transmitted through said Internet based network connection;
adding a NULL character to an end of said byte stream to indicate a termination of speech data from said client computing device. (Dependent claims: 14-19)
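The last two framing steps of claim 13, stripping embedded NULL characters and then appending a single NULL terminator, can be sketched as below. Taken literally, deleting NULL bytes assumes the vector encoding tolerates it (in practice one would encode the vectors so that NULL cannot occur in the payload); the function name is illustrative, not from the patent.

```python
NULL = b"\x00"

def frame_for_transport(vector_bytes):
    """Claim 13 framing: remove any NULL characters from the encoded
    speech vector data, then add one NULL to the end of the byte stream
    so that it unambiguously terminates the utterance."""
    payload = vector_bytes.replace(NULL, b"")  # strip embedded NULLs
    return payload + NULL                      # end-of-speech marker
```

Because the payload is guaranteed NULL-free, the receiver can treat the first NULL it sees as the end of the speech data.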
20. A method of transmitting speech data for a distributed speech recognition system comprising the steps of:
capturing speech data at a client computing device, said speech data including words spoken by a speaker;
extracting acoustic features including speech vector data from said speech data on a continuous basis as said words are spoken;
encoding said speech vector data into a byte stream in a format adapted for transport across an Internet based network connection;
wherein at least one NULL character is added to said byte stream when silence is detected;
transmitting said speech vector data through said Internet based network connection as a stream of bytes for further speech recognition processing by a speech recognition engine located on a server computing device. (Dependent claims: 21-25)
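Claim 20's sender loop might look like the following sketch, which separates chunk generation from the TCP send so the logic can be shown without a live connection. The `encode` and `is_silence` callables are caller-supplied placeholders, not details from the patent.

```python
NULL = b"\x00"

def transmit_chunks(frames, encode, is_silence):
    """Yield the bytes to send for each captured frame: the encoded
    speech vector data, or a single NULL byte when silence is
    detected (claim 20)."""
    for frame in frames:
        yield NULL if is_silence(frame) else encode(frame)

# A sender would stream these chunks over a TCP connection, e.g.:
#   with socket.create_connection((host, port)) as sock:
#       for chunk in transmit_chunks(frames, encode, is_silence):
#           sock.sendall(chunk)
```

Generating chunks continuously as words are spoken, rather than buffering the whole utterance, is what makes the transport "streaming": the server can begin recognition before the speaker finishes.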
26. A method of processing speech data for a distributed speech query recognition system comprising the steps of:
establishing a network connection between a server computing system and a client device suitable for transporting a streaming communication;
receiving a data stream containing speech vector data from the client device, said speech vector data representing acoustic features of speech data and being characterized by a data content insufficient to recognize words;
wherein said data stream includes a NULL character used to identify a silence in speech data from said client device;
further processing said speech vector data at said server computing system to generate additional speech feature related content and identify words in said speech data. (Dependent claims: 27-37)
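On the server side, the NULL markers let claim 26's receiver segment the incoming stream at silences before the remaining feature processing and word identification. A minimal sketch, with the splitting policy (drop empty segments) assumed rather than specified by the claim:

```python
NULL = b"\x00"

def segment_stream(data_stream):
    """Split a received byte stream on NULL characters, which claim 26
    uses to identify silences; returns the non-empty speech segments
    whose vector data the server then processes further."""
    return [seg for seg in data_stream.split(NULL) if seg]
```

Each returned segment holds the encoded speech vector data for one stretch of speech between silences, ready to be decoded and passed to the recognition engine.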
Specification