Signal processing method and system utilizing logical speech boundaries
Abstract
A method and system of processing speech information includes segmenting the speech information based upon detection of logical speech boundaries, such as isolated words, prior to compressing and/or transmitting the speech information. In one embodiment, a continuous stream of voice data is analyzed to detect signal segments containing the characteristics of an isolated word, thereby forming frames of speech information. The frames are data compressed to form packets that are transmitted to a remote site. Preferably, the packets include error checking information. In a receive mode, incoming packets are error checked prior to packet decoding. If transmission errors are detected, repairable packets may be corrected. Non-correctable errors cause generation of notice data that are used to notify a listener of the location of lost speech information. Notice data are also generated if the duration between two arriving packets exceeds a preselected threshold.
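The receive-mode behavior described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the packet layout (compressed payload followed by a 4-byte CRC-32 trailer), the use of zlib as the compressor, the `Notice` class, and the 0.5-second value of `GAP_THRESHOLD_S` are all hypothetical choices made for the example. Repair of correctable packets is omitted, because a CRC can detect errors but cannot correct them.

```python
import struct
import zlib

GAP_THRESHOLD_S = 0.5  # hypothetical "preselected threshold" between packets


class Notice:
    """Hypothetical notice data marking where speech information was lost."""

    def __init__(self, position, reason):
        self.position = position  # index in the output where speech is missing
        self.reason = reason      # "corrupt" or "gap"


def make_packet(frame: bytes) -> bytes:
    """Compress one frame and append a CRC-32 of the compressed payload."""
    payload = zlib.compress(frame)
    return payload + struct.pack(">I", zlib.crc32(payload))


def receive(packets):
    """Error-check, then decode, (arrival_time, packet) pairs in arrival order."""
    out = []
    last_arrival = None
    for arrival, packet in packets:
        # A long silence between arrivals is reported to the listener.
        if last_arrival is not None and arrival - last_arrival > GAP_THRESHOLD_S:
            out.append(Notice(len(out), "gap"))
        last_arrival = arrival

        # Error checking happens before any decoding is attempted.
        payload, trailer = packet[:-4], packet[-4:]
        if len(packet) < 4 or struct.unpack(">I", trailer)[0] != zlib.crc32(payload):
            out.append(Notice(len(out), "corrupt"))
            continue
        out.append(zlib.decompress(payload))
    return out
```

A caller would play back the `bytes` entries and render each `Notice` as an audible or visible marker at the indicated position.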
17 Claims
1. A method of processing speech information comprising steps of:
converting analog voice signals representative of sound of a sequence of spoken words into digital voice signals representative of said sound of said sequence of spoken words;
analyzing said digital voice signals representative of said sound of said sequence of spoken words to detect signal segments representative of isolated words within said sequence of spoken words;
segmenting said digital voice signals representative of said sound of said sequence of spoken words at least partially based upon said detection of said signal segments representative of isolated words, thereby forming frames of digital voice signals; and
data compressing said digital voice signals within said frames, said compressed digital voice signals within said frames having phonetic information that substantially preserves individual sounds of said isolated words within said sequence of spoken words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method of processing speech information for real-time voice communications comprising steps of:
generating digital voice signals from analog voice signals in response to a voice input of a sequence of words, said digital voice signals containing phonetic information that is representative of individual sounds of said voice input;
analyzing said digital voice signals to recognize logical speech boundaries relating to said sequence of words;
establishing signal segments of said digital voice signals based upon said logical speech boundaries and a threshold time, including forming said signal segments based upon limiting each signal segment to containing the lesser of data specific to a detected isolated word contained in said voice input that is defined by said logical speech boundaries and data generated during passage of said threshold time;
compressing said digital voice signals within each of said signal segments of said digital voice signals, said compressed digital voice signals within each of said signal segments being in a form to substantially preserve said phonetic information that is representative of said individual sounds of said voice input; and
transmitting said signal segments of said compressed digital voice signals to a remote site.
Dependent claims: 11, 12, 13
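The "lesser of" rule in claim 10 can be illustrated with a short sketch. It is only an approximation under stated assumptions: a run of low-amplitude samples stands in for a recognized logical speech boundary, whereas the claim does not specify how boundaries are recognized. The function name `segment` and the parameters `silence_level`, `min_silence`, and `max_len` (the threshold time expressed in samples) are hypothetical.

```python
from typing import List, Sequence


def segment(samples: Sequence[int],
            silence_level: int = 100,
            min_silence: int = 3,
            max_len: int = 8) -> List[List[int]]:
    """Split samples into segments that end at a pause after speech or at
    max_len samples, whichever comes first."""
    segments: List[List[int]] = []
    current: List[int] = []
    quiet_run = 0  # consecutive low-amplitude samples at the end of `current`
    for s in samples:
        current.append(s)
        quiet_run = quiet_run + 1 if abs(s) < silence_level else 0
        # Assumed boundary: enough trailing quiet, preceded by some speech.
        at_boundary = quiet_run >= min_silence and len(current) > quiet_run
        # Threshold time: cap the segment even if no boundary was found.
        if at_boundary or len(current) >= max_len:
            segments.append(current)
            current = []
            quiet_run = 0
    if current:
        segments.append(current)  # flush any trailing partial segment
    return segments
```

With the defaults, a short word followed by a pause closes its segment at the pause, while a long unbroken utterance is cut every `max_len` samples, so no segment waits on a boundary longer than the threshold time.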
14. A system for processing speech information comprising:
a speech input device for receiving an analog voice input;
a signal generator responsive to said speech input device for forming digital voice signals at an output, said digital voice signals containing phonetic information that is representative of individual sounds of said analog voice input;
speech recognition means coupled to said output of said signal generator for detecting signal segments within said digital voice signals that represent isolated words, said speech recognition means being configured to form said signal segments based upon limiting each signal segment to containing the lesser of data specific to an isolated word contained in said analog voice input and data generated during passage of a threshold time, said speech recognition means maintaining said digital voice signals as containing said phonetic information that is representative of said individual sounds of said analog voice input; and
compression means, connected to said speech recognition means, for compressing said digital voice signals that are within said signal segments while maintaining said digital voice signals to contain said phonetic information that is representative of said individual sounds of said analog voice input.
Dependent claims: 15, 16, 17
Specification