Speech recognition for recognizing speaker-independent, continuous speech
Abstract
A speech recognition method and apparatus are provided for converting a voice stream into a digital voice stream representation. A method for performing speech recognition on a voice stream according to a first method embodiment includes the steps of determining one or more candidate transnemes in the voice stream, mapping the one or more candidate transnemes to a transneme table to convert the one or more candidate transnemes to one or more found transnemes, and mapping the one or more found transnemes to a transneme-to-vocabulary database to convert the one or more found transnemes to one or more speech units.
72 Claims
1. A speech recognition device, comprising:
an I/O device for accepting a voice stream;
a frequency domain converter communicating with said I/O device, said frequency domain converter converting said voice stream from a time domain to a frequency domain and generating a plurality of frequency domain outputs;
a frequency domain output storage communicating with said frequency domain converter, said frequency domain output storage comprising at least two frequency spectrum frame storages for storing at least a current frequency spectrum frame and a previous frequency spectrum frame, with a frequency spectrum frame storage of said at least two frequency spectrum frame storages comprising a plurality of frequency bins storing said plurality of frequency domain outputs;
a processor communicating with said plurality of frequency bins;
a memory communicating with said processor;
a frequency spectrum difference storage in said memory, with said frequency spectrum difference storage storing one or more frequency spectrum differences calculated as a difference between said current frequency spectrum frame and said previous frequency spectrum frame;
at least one feature storage in said memory for storing at least one feature extracted from said voice stream;
at least one transneme table in said memory, with said at least one transneme table including a plurality of transneme table entries and with a transneme table entry of said plurality of transneme table entries mapping a predetermined frequency spectrum difference to at least one predetermined transneme of a predetermined verbal language;
at least one mappings storage in said memory, with said at least one mappings storage storing one or more found transnemes;
at least one transneme-to-vocabulary database in said memory, with said at least one transneme-to-vocabulary database mapping a set of one or more found transnemes to at least one speech unit of said predetermined verbal language; and
at least one voice stream representation storage in said memory, with said at least one voice stream representation storage storing a voice stream representation created from said one or more found transnemes;
wherein said speech recognition device calculates a frequency spectrum difference between a current frequency spectrum frame and a previous frequency spectrum frame, maps said frequency spectrum difference to a transneme table, and converts said frequency spectrum difference to a transneme if said frequency spectrum difference is greater than a predetermined difference threshold, and creates a digital voice stream representation of said voice stream from one or more transnemes thus produced.
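Claim 1 recites frame storages that hold a current and a previous frequency spectrum frame, each made of frequency bins, plus a storage for their difference. The Python sketch below shows one minimal way to model that arrangement. It assumes frames arrive as equal-length lists of bin values. The class name `FrameStorage` and its methods are hypothetical and are not taken from the patent.

```python
from collections import deque
from typing import Deque, List, Optional


class FrameStorage:
    """Hypothetical model of claim 1's frequency domain output storage.

    Holds at most two frequency spectrum frames (previous and current),
    each a list of frequency-bin values of fixed length.
    """

    def __init__(self, num_bins: int) -> None:
        self.num_bins = num_bins
        # maxlen=2: appending a third frame silently drops the oldest one.
        self._frames: Deque[List[float]] = deque(maxlen=2)

    def push(self, frame: List[float]) -> None:
        """Store a new current frame; the old current frame becomes previous."""
        if len(frame) != self.num_bins:
            raise ValueError(f"expected {self.num_bins} bins, got {len(frame)}")
        self._frames.append(list(frame))

    @property
    def current(self) -> Optional[List[float]]:
        return self._frames[-1] if self._frames else None

    @property
    def previous(self) -> Optional[List[float]]:
        return self._frames[0] if len(self._frames) == 2 else None

    def difference(self) -> Optional[List[float]]:
        """Bin-wise (current - previous), or None until two frames are stored."""
        if len(self._frames) < 2:
            return None
        previous, current = self._frames
        return [c - p for c, p in zip(current, previous)]


if __name__ == "__main__":
    store = FrameStorage(num_bins=3)
    store.push([1.0, 2.0, 3.0])
    store.push([1.5, 2.0, 2.0])
    print(store.difference())  # [0.5, 0.0, -1.0]
```

A `deque` with `maxlen=2` discards the oldest frame automatically. Each push therefore turns the old current frame into the previous one, which mirrors the two frame storages without any explicit copying between them.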
19. A method for performing speech recognition on a voice stream, comprising the steps of:
determining one or more candidate transnemes in said voice stream;
mapping said one or more candidate transnemes to a transneme table to convert said one or more candidate transnemes to one or more found transnemes; and
mapping said one or more found transnemes to a transneme-to-vocabulary database to convert said one or more found transnemes to one or more speech units.
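The final step of claim 19 maps found transnemes to speech units through a transneme-to-vocabulary database. The claim does not say how a run of transnemes is divided into database entries. The sketch below therefore makes several assumptions:

- The database is a dictionary keyed by tuples of transneme labels.
- Lookup uses greedy longest-match.
- Transnemes that begin no entry are skipped.

The labels (`t1`, `t2`, ...), the words, and the function name are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical transneme-to-vocabulary database: each key is a sequence of
# found-transneme labels, each value a speech unit (here, a word).
VocabularyDB = Dict[Tuple[str, ...], str]


def transnemes_to_speech_units(found: Sequence[str], vocab: VocabularyDB) -> List[str]:
    """Greedy longest-match segmentation of found transnemes into speech units.

    Transnemes that begin no vocabulary entry are skipped (an assumption;
    the claim does not say how unmatched transnemes are handled).
    """
    max_len = max((len(key) for key in vocab), default=0)
    units: List[str] = []
    i = 0
    while i < len(found):
        # Try the longest possible sequence first, then shorter ones.
        for n in range(min(max_len, len(found) - i), 0, -1):
            key = tuple(found[i:i + n])
            if key in vocab:
                units.append(vocab[key])
                i += n
                break
        else:
            i += 1  # no entry starts at this transneme; skip it
    return units


if __name__ == "__main__":
    vocab: VocabularyDB = {("t1", "t2"): "hello", ("t1",): "a", ("t3", "t4", "t5"): "world"}
    found = ["t1", "t2", "t9", "t3", "t4", "t5", "t1"]
    print(transnemes_to_speech_units(found, vocab))  # ['hello', 'world', 'a']
```

Greedy longest-match is only a simple stand-in. A practical recognizer might instead score alternative segmentations and keep the best one.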
28. A method for performing speech recognition on a voice stream, comprising the steps of:
calculating a frequency spectrum difference between a current frequency spectrum frame and a previous frequency spectrum frame, with said current frequency spectrum frame and said previous frequency spectrum frame being in a frequency domain and being separated by a predetermined time interval; and
mapping said frequency spectrum difference to a transneme table to convert said frequency spectrum difference to at least one transneme if said frequency spectrum difference is greater than a predetermined difference threshold;
wherein a digital voice stream representation of said voice stream is created from one or more transnemes thus produced.
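Claim 28 compares the frequency spectrum difference with a threshold and maps it to a transneme table. The claim does not define either operation for a difference vector, so the sketch below assumes:

- The threshold test uses the Euclidean norm.
- The table lookup uses nearest-neighbour matching (Euclidean distance) against predetermined difference vectors.

The table contents and transneme labels are invented for illustration. The code needs Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical transneme table: (predetermined difference vector, transneme label).
TransnemeTable = List[Tuple[Tuple[float, ...], str]]


def spectrum_difference(current: Sequence[float], previous: Sequence[float]) -> List[float]:
    """Bin-wise difference between two equal-length spectrum frames."""
    return [c - p for c, p in zip(current, previous)]


def map_difference(diff: Sequence[float], table: TransnemeTable, threshold: float) -> Optional[str]:
    """Map one difference vector to a transneme, or None if it is too small.

    Assumptions: the threshold test uses the Euclidean norm, and the table
    lookup picks the entry with the nearest predetermined difference vector.
    """
    if math.hypot(*diff) <= threshold:
        return None  # not greater than the threshold: no transneme
    if not table:
        raise ValueError("transneme table is empty")
    _, label = min(table, key=lambda entry: math.dist(entry[0], diff))
    return label


def frames_to_transnemes(frames: Sequence[Sequence[float]], table: TransnemeTable,
                         threshold: float) -> List[str]:
    """Compare each frame with the one before it and collect the transnemes found."""
    found: List[str] = []
    for previous, current in zip(frames, frames[1:]):
        label = map_difference(spectrum_difference(current, previous), table, threshold)
        if label is not None:
            found.append(label)
    return found


if __name__ == "__main__":
    table: TransnemeTable = [((1.0, 0.0), "rise-low"), ((0.0, 1.0), "rise-high"), ((-1.0, -1.0), "fall")]
    frames = [[0.0, 0.0], [0.1, 0.0], [1.1, 0.1], [1.1, 1.2], [0.0, 0.0]]
    print(frames_to_transnemes(frames, table, threshold=0.5))  # ['rise-low', 'rise-high', 'fall']
```

In the example, the first frame pair differs by less than the threshold and produces no transneme. Each later pair produces the label of the nearest table entry. The claim's "predetermined time interval" corresponds here to the spacing between consecutive entries of `frames`.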
46. A method for performing speech recognition on a voice stream, comprising the steps of:
performing a frequency domain transformation on said voice stream upon a predetermined time interval to create a current frequency spectrum frame;
normalizing said current frequency spectrum frame;
calculating a frequency spectrum difference between said current frequency spectrum frame and a previous frequency spectrum frame;
mapping said frequency spectrum difference to a transneme table to convert said frequency spectrum difference to at least one found transneme if said frequency spectrum difference is greater than a predetermined difference threshold; and
creating a digital voice stream representation of said voice stream from one or more found transnemes thus produced.
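Claim 46 adds the front end: a frequency domain transformation once per predetermined time interval, then normalization of each frame before differencing. The claim does not specify the transform size or the normalization method, so the sketch below assumes:

- A naive discrete Fourier transform keeping bins 0 to N/2. A production system would use an FFT.
- A hop of `hop` samples as the time interval.
- Peak normalization, which divides every bin by the largest bin.

All function names are hypothetical. The thresholding and table-mapping steps of the claim would then be applied to each returned difference.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 (a real system would use an FFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def normalize(frame: Sequence[float]) -> List[float]:
    """Assumed normalization: scale so the largest bin is 1.0 (silent frames unchanged)."""
    peak = max(frame)
    return [v / peak for v in frame] if peak > 0 else list(frame)


def spectrum_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """One normalized spectrum frame every `hop` samples (the time interval)."""
    return [
        normalize(magnitude_spectrum(signal[start:start + frame_len]))
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def spectrum_differences(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Bin-wise (current - previous) for each consecutive pair of frames."""
    return [[c - p for c, p in zip(cur, prev)] for prev, cur in zip(frames, frames[1:])]


if __name__ == "__main__":
    # A tone at DFT bin 1 for 8 samples, then a tone at bin 3 for 8 samples.
    signal = [math.sin(2 * math.pi * 1 * t / 8) for t in range(8)] + \
             [math.sin(2 * math.pi * 3 * t / 8) for t in range(8)]
    diffs = spectrum_differences(spectrum_frames(signal, frame_len=8, hop=8))
    print(round(diffs[0][1], 3), round(diffs[0][3], 3))  # -1.0 1.0
```

Normalizing before differencing means a change in loudness alone produces little difference, while a shift of energy between bins (here from bin 1 to bin 3) produces a large one.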
Specification