Representing speech using MIDI
Abstract
A speech encoding system for encoding a digitized speech signal into a standard digital format, such as MIDI. The MIDI speech encoding system includes a memory storing a dictionary comprising a digitized pattern and a corresponding segment ID for each of a plurality of speech segments (i.e., phonemes). A speech analyzer identifies each of the segments in the digitized speech signal based on the dictionary. One or more prosodic parameter detectors measure values of the prosodic parameters of each received digitized speech segment. A MIDI speech encoder converts the segment IDs and the corresponding measured prosodic parameter values into a MIDI speech signal. A MIDI speech decoding system includes a MIDI data decoder and a speech synthesizer for converting the MIDI speech signal to a digitized speech signal.
30 Claims
1. A method of encoding a speech signal into a MIDI compatible format, comprising the steps of:
receiving an analog speech signal, said analog speech signal comprising a plurality of speech segments;
digitizing the analog speech signal;
identifying each of the plurality of speech segments in the received speech signal;
measuring one or more prosodic parameters for each of said identified speech segments; and
converting the speech segment identity and corresponding measured prosodic parameters for each of the identified speech segments into a speech signal having a MIDI compatible format.
Dependent claims: 2-18.

19. A method of generating an analog speech signal based on a speech signal in a MIDI compatible format, said method comprising the steps of:
storing a dictionary comprising: a) a digitized pattern for each of a plurality of speech segments; and b) a corresponding segment ID identifying each of the digitized segment patterns;
receiving a speech signal in a MIDI compatible format;
decoding the received speech signal in the MIDI compatible format;
converting the received speech signal in the MIDI compatible format into a plurality of speech segment IDs and corresponding prosodic parameter values;
selecting speech segment patterns in the dictionary corresponding to the speech segment IDs in the converted received speech signal;
modifying the selected speech segment patterns according to the values of the corresponding prosodic parameters in the converted received speech signal;
outputting the modified segment patterns to generate a digitized speech signal; and
converting the outputted digitized speech signal to an analog format.
Dependent claims: 20-25.

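The select, modify, and output steps of claim 19 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the dictionary maps a segment ID to a (pattern, base note) pair, pitch is changed by naive resampling (which also shortens or lengthens the segment), and amplitude is a linear gain derived from MIDI velocity. The names `note_ratio`, `resample`, and `synthesize` are hypothetical, and the final digital-to-analog conversion is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical dictionary layout: segment ID -> (digitized pattern, MIDI note
# at which that pattern was recorded). The claim only requires patterns and IDs.
Dictionary = Dict[int, Tuple[Sequence[float], int]]


def note_ratio(target_note: int, base_note: int) -> float:
    """Frequency ratio between two MIDI notes in equal temperament."""
    return 2.0 ** ((target_note - base_note) / 12.0)


def resample(pattern: Sequence[float], ratio: float) -> List[float]:
    """Read the pattern at `ratio` times normal speed, interpolating linearly.

    This raises pitch by `ratio` but also changes duration; it stands in for
    a real pitch-modification method, which the claim does not specify.
    """
    if not pattern:
        raise ValueError("empty pattern")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    last = len(pattern) - 1
    n_out = max(1, int(len(pattern) / ratio))
    out = []
    for i in range(n_out):
        pos = i * ratio
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(pattern[lo] * (1.0 - frac) + pattern[hi] * frac)
    return out


def synthesize(events: Sequence[Tuple[int, int, int]],
               dictionary: Dictionary) -> List[float]:
    """Concatenate modified patterns for (segment_id, note, velocity) events."""
    signal: List[float] = []
    for segment_id, note, velocity in events:
        pattern, base_note = dictionary[segment_id]   # select the pattern
        gain = velocity / 127.0                       # assumed linear gain
        shifted = resample(pattern, note_ratio(note, base_note))
        signal.extend(sample * gain for sample in shifted)
    return signal
```

With a one-entry dictionary such as `{1: ([0.0, 1.0, 0.0, -1.0], 60)}`, the event `(1, 60, 127)` reproduces the stored pattern unchanged, because the pitch ratio and the gain are both 1.
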
26. A computer-readable medium having stored thereon a plurality of instructions including instructions which, when executed by a processor, result in:
identifying and analyzing each of a plurality of speech segments in a digitized speech signal;
measuring a plurality of prosodic parameters for each said identified speech segment, said prosodic parameters comprising at least pitch and amplitude;
converting the measured prosodic parameters to corresponding MIDI compatible values relating to prosody, including converting each measured pitch value to a corresponding MIDI note number and converting each measured amplitude value to a corresponding MIDI velocity number; and
generating a MIDI speech signal comprising an identification of each identified speech segment and the corresponding MIDI compatible values relating to prosody.

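The two conversions named in claim 26 can be sketched as follows. The pitch mapping uses the conventional equal-tempered relation in which A4 (440 Hz) is MIDI note 69. The amplitude mapping is an assumption: the claim does not say how amplitude becomes velocity, so a linear scale against an assumed full-scale value is used here. The function names are hypothetical.

```python
import math

A4_HZ = 440.0   # conventional reference frequency
A4_NOTE = 69    # MIDI note number assigned to A4


def pitch_to_midi_note(freq_hz: float) -> int:
    """Nearest MIDI note number (0-127) for a measured pitch in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    note = round(A4_NOTE + 12 * math.log2(freq_hz / A4_HZ))
    return max(0, min(127, note))


def amplitude_to_velocity(amplitude: float, full_scale: float = 1.0) -> int:
    """Assumed linear mapping of a measured amplitude onto velocity 1-127.

    Velocity 0 is avoided because a Note On with velocity 0 is treated as a
    Note Off by MIDI receivers.
    """
    if full_scale <= 0:
        raise ValueError("full_scale must be positive")
    ratio = max(0.0, min(1.0, abs(amplitude) / full_scale))
    return max(1, round(ratio * 127))
```

Rounding to the nearest note discards pitch detail finer than a semitone; whether the patent recovers that detail some other way is not stated in the claims shown here.
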
27. A computer-readable medium having stored thereon a plurality of instructions including instructions which, when executed by a processor, result in:
analyzing a MIDI compatible speech signal, said MIDI compatible speech signal comprising a plurality of speech segment IDs and corresponding MIDI compatible values relating to prosody;
identifying the plurality of speech segment IDs and corresponding MIDI compatible values relating to prosody in the MIDI speech signal;
selecting a digitized speech segment pattern stored in memory corresponding to each of the identified speech segment IDs;
modifying the selected digitized speech segment patterns according to the MIDI compatible values relating to prosody;
outputting the modified speech segment patterns to generate a digitized speech signal.

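The analyzing and identifying steps of claim 27 amount to parsing a MIDI byte stream into segment IDs and prosody values. The claims do not state which MIDI messages carry which field, so the sketch below assumes one possible layout purely for illustration: a Program Change message (status `0xC0`) carries the segment ID, and the following Note On message (status `0x90`) carries pitch as the note number and amplitude as the velocity. The function name `decode_stream` is hypothetical, and running status is not handled.

```python
from typing import List, Tuple

PROGRAM_CHANGE = 0xC0  # one data byte: assumed to hold the segment ID
NOTE_ON = 0x90         # two data bytes: note number, velocity
NOTE_OFF = 0x80        # two data bytes: note number, release velocity


def decode_stream(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (segment_id, note, velocity) tuples from a raw MIDI byte stream."""
    events: List[Tuple[int, int, int]] = []
    current_id = None
    i = 0
    while i < len(data):
        kind = data[i] & 0xF0  # strip the channel nibble
        if kind == PROGRAM_CHANGE:
            if i + 1 >= len(data):
                raise ValueError("truncated Program Change message")
            current_id = data[i + 1]
            i += 2
        elif kind in (NOTE_ON, NOTE_OFF):
            if i + 2 >= len(data):
                raise ValueError("truncated note message")
            note, velocity = data[i + 1], data[i + 2]
            # A Note On with velocity 0 is equivalent to a Note Off.
            if kind == NOTE_ON and velocity > 0:
                if current_id is None:
                    raise ValueError("Note On before any segment ID")
                events.append((current_id, note, velocity))
            i += 3
        else:
            raise ValueError(f"unsupported status byte 0x{data[i]:02X}")
    return events
```

The resulting tuples are what a synthesizer would use to look up each pattern and adjust it.
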
28. An apparatus for encoding an analog speech signal into a MIDI speech signal comprising:
a memory storing a dictionary comprising a digitized pattern and a corresponding segment ID for each of a plurality of speech segments;
an A/D converter having an input adapted for receiving an analog speech signal and providing a digitized speech signal output;
a speech analyzer coupled to said memory and said A/D converter, said speech analyzer adapted to receive a digitized speech signal and identify each of the segments in the digitized speech signal based on said dictionary, said speech analyzer adapted to output the segment ID for each of said identified speech segments;
one or more prosodic parameter detectors coupled to said memory and said speech analyzer, said detectors adapted to measure values of the prosodic parameters of each received digitized speech segment; and
a MIDI speech encoder coupled to said speech analyzer and said prosodic parameter detectors, said MIDI speech encoder adapted to convert a segment ID and the measured values of the corresponding measured prosodic parameters for each of a plurality of speech segments into a MIDI speech signal.

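The role of the MIDI speech encoder in claim 28 can be sketched as packing a segment ID and its prosody values into MIDI channel messages. The message layout is an assumption, since the claim does not name one: here the segment ID travels in a Program Change message and the pitch and amplitude travel in a Note On message. `SpeechSegment` and `encode_segment` are hypothetical names, and segment IDs are assumed to fit in the 7-bit range 0-127.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSegment:
    segment_id: int  # dictionary ID of the identified segment (assumed 0-127)
    note: int        # MIDI note number derived from measured pitch
    velocity: int    # MIDI velocity derived from measured amplitude


def _check_7bit(name: str, value: int) -> None:
    """MIDI data bytes must have the top bit clear."""
    if not 0 <= value <= 127:
        raise ValueError(f"{name} must be in 0..127, got {value}")


def encode_segment(seg: SpeechSegment, channel: int = 0) -> bytes:
    """Encode one segment as Program Change + Note On on the given channel."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    _check_7bit("segment_id", seg.segment_id)
    _check_7bit("note", seg.note)
    _check_7bit("velocity", seg.velocity)
    return bytes([
        0xC0 | channel, seg.segment_id,       # Program Change: segment ID
        0x90 | channel, seg.note, seg.velocity,  # Note On: pitch, amplitude
    ])
```

A dictionary with more than 128 segments would not fit this layout; a wider ID would need a different message type, which this sketch does not attempt.
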
29. An apparatus for generating a speech signal from a MIDI speech signal, said apparatus comprising:
a MIDI data decoder adapted to receive and decode a MIDI speech signal comprising MIDI compatible speech segment IDs and corresponding MIDI compatible values relating to prosody;
a memory adapted to store a dictionary, said dictionary comprising a plurality of speech segment patterns and speech segment IDs for a plurality of speech segments;
a speech synthesizer coupled to the MIDI data decoder and the memory, said speech synthesizer selecting a digitized speech segment pattern stored in the dictionary corresponding to each of the speech segment IDs on the received MIDI compatible speech signal, modifying the selected digitized speech segment patterns according to the MIDI compatible values relating to prosody, and outputting the modified speech segment patterns to generate a digitized speech signal.

30. A computer for encoding a speech signal into a MIDI signal comprising:
a CPU;
an audio input device adapted to receive an analog speech signal and having an output;
an A/D converter having an input coupled to the output of said audio input device and providing a digitized speech signal output, said converter output coupled to said CPU;
a memory coupled to said CPU, said memory storing a dictionary comprising a digitized speech segment pattern and a corresponding segment ID for each of a plurality of speech segments; and
said CPU being adapted to:
identify, using said dictionary, each of a plurality of speech segments in a received digitized speech signal;
measure one or more prosodic parameters for each of the identified segments; and
encode the speech segment ID of each identified speech segment and the corresponding measured prosodic parameters into a MIDI signal.

Specification