Context dependent phoneme networks for encoding speech information
Abstract
A method and apparatus for generating a context dependent phoneme network as an intermediate step of encoding speech information. The context dependent phoneme network is generated from speech in a phoneme network generator (48) associated with an operating system (44). The context dependent phoneme network is then transmitted to a first application (52).
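As an illustration only, the nodes-and-arcs representation recited in the claims might be modeled as below. This is a minimal sketch, not the patent's implementation: the class names `Arc` and `PhonemeNetwork`, the JSON wire format, the `left-center+right` label notation, and the use of log-likelihoods as scores are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Arc:
    """One phoneme hypothesis spanning two time nodes (hypothetical layout)."""
    start_node: int   # index into PhonemeNetwork.node_times
    end_node: int     # index into PhonemeNetwork.node_times
    phoneme: str      # context dependent label; "left-center+right" is assumed
    score: float      # acoustic score; a log-likelihood is assumed here


@dataclass
class PhonemeNetwork:
    """Nodes carry times, arcs carry scored phonemes between nodes."""
    node_times: List[float]  # time in seconds of each node
    arcs: List[Arc]

    def to_json(self) -> str:
        # Serialize to an intermediate format that carries no vocabulary
        # or language model, only times, phoneme labels, and scores.
        return json.dumps(
            {"node_times": self.node_times,
             "arcs": [asdict(a) for a in self.arcs]}
        )

    @classmethod
    def from_json(cls, payload: str) -> "PhonemeNetwork":
        data = json.loads(payload)
        return cls(data["node_times"], [Arc(**a) for a in data["arcs"]])


# Toy network with two competing hypotheses for the first phoneme.
network = PhonemeNetwork(
    node_times=[0.00, 0.08, 0.21, 0.30],
    arcs=[
        Arc(0, 1, "sil-k+ae", -1.0),
        Arc(0, 1, "sil-g+ae", -2.5),
        Arc(1, 2, "k-ae+t", -0.5),
        Arc(2, 3, "ae-t+sil", -0.8),
    ],
)

payload = network.to_json()                    # what the local side would transmit
restored = PhonemeNetwork.from_json(payload)   # what a remote application would receive
```

Because the payload holds only phoneme hypotheses with times and scores, any receiving application could apply its own vocabulary and language model to it.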
5 Claims
1. A method for encoding speech information comprising:
generating at a local user location, as an intermediate step in speech recognition, a context dependent phoneme network from speech in a phoneme network generator using an acoustic model that adapts to a user's voice, wherein the context dependent phoneme network is a representation of speech input in the form of nodes and arcs, each arc representing a score of a phoneme with start and end times represented by nodes, the phoneme network enabling the speech input to be represented by the nodes and arcs thereby resulting in the speech input being packaged into an intermediate format that is independent of vocabulary, language model, user and physical environment; and
transmitting the context dependent phoneme network to one or more application programs located remotely from the local user, to enable the remote application programs to effect recognition of speech in each application program using a vocabulary and language model selected by the application program, thereby obviating the need for the local user location to perform recognition of speech tasks.
3. A data storage medium comprising instructions and data which, when loaded into a first general purpose microprocessor having an operating system cause the first general purpose microprocessor to comprise:
a phoneme network generator located at a local user location generating a context dependent phoneme network having an output defining the context dependent phoneme network, wherein the context dependent phoneme network enables the speech input to be represented in the form of nodes and arcs, where each arc represents a score of a phoneme with start and end times represented by nodes, thereby resulting in the speech input being packaged in an intermediate format; and
a plurality of application programs located remotely from the local user location adapted to receive the output of the phoneme network generator and extract information needed from the output using vocabulary and language models of the plurality of application programs thereby eliminating information from being extracted at the local user location, the phoneme network generator and the plurality of application programs being independently associated with the operating system.
5. A method for encoding speech information comprising:
generating at a local user location a context dependent phoneme network from speech in a phoneme network generator associated with an operating system, wherein the context dependent phoneme network is a representation of speech input in the form of nodes and arcs, where each arc represents a score of a phoneme with start and end times represented by nodes, thereby packaging the speech input in an intermediate format;
transmitting the context dependent phoneme network to a plurality of applications located remotely from the local user location via the operating system; and
extracting, at the remotely located plurality of applications, information needed from the context dependent phoneme network using vocabulary and language models of the plurality of applications in order to operate the plurality of applications.
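The extracting step, in which each remote application applies its own vocabulary and language model to the same network, could look roughly like the following. This is a hedged sketch under several assumptions that are not in the claims: arcs are plain tuples, phoneme labels are simplified to context independent symbols, scores are log-likelihoods, the language model is a unigram log prior, and the function names `path_score` and `recognize` are hypothetical.

```python
import math

# Arcs as (start_node, end_node, phoneme, log_score); toy values for illustration.
NETWORK_ARCS = [
    (0, 1, "k", -1.0),
    (0, 1, "g", -2.5),
    (1, 2, "ae", -0.5),
    (1, 2, "eh", -1.5),
    (2, 3, "t", -0.8),
    (2, 3, "d", -1.2),
]
FINAL_NODE = 3


def path_score(arcs, phonemes, start=0, final=FINAL_NODE):
    """Best total log score of a start-to-final path spelling `phonemes`, else -inf."""
    best = {start: 0.0}  # node -> best score after the phonemes consumed so far
    for ph in phonemes:
        nxt = {}
        for s, e, label, score in arcs:
            if label == ph and s in best:
                cand = best[s] + score
                if cand > nxt.get(e, -math.inf):
                    nxt[e] = cand
        best = nxt
    return best.get(final, -math.inf)


def recognize(arcs, vocabulary, language_model):
    """Return the word maximizing acoustic path score plus LM log prior, or None."""
    best_word, best_total = None, -math.inf
    for word, phonemes in vocabulary.items():
        total = path_score(arcs, phonemes) + language_model[word]
        if total > best_total:
            best_word, best_total = word, total
    return best_word


# Two applications with different vocabularies read the same network.
VOCAB_A = {"cat": ["k", "ae", "t"], "get": ["g", "eh", "t"]}
LM_A = {"cat": math.log(0.5), "get": math.log(0.5)}

VOCAB_B = {"cad": ["k", "ae", "d"], "gad": ["g", "ae", "d"]}
LM_B = {"cad": math.log(0.5), "gad": math.log(0.5)}

result_a = recognize(NETWORK_ARCS, VOCAB_A, LM_A)
result_b = recognize(NETWORK_ARCS, VOCAB_B, LM_B)
```

In this toy setup the first application selects "cat" and the second selects "cad" from the identical network, which illustrates why the local side does not need to know either vocabulary.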
Specification