Retaining prosody during speech analysis for later playback

US 5,933,805 A
Filed: 12/13/1996
Issued: 08/03/1999
Est. Priority Date: 12/13/1996
Status: Expired due to Term

Abstract

A speech system includes a speech encoding system and a speech decoding system. The speech encoding system includes a speech analyzer for identifying each of the speech segments (i.e., phonemes) in the received digitized speech signal. A pitch detector, a duration detector, and an amplitude detector are each coupled to the memory and the analyzer and detect various prosodic parameters of each received speech segment. A speech encoder generates a data signal that includes the speech segment IDs and the values of the corresponding prosodic parameters. The speech decoding system includes a digital data decoder and a speech synthesizer for generating a speech signal based on the segment IDs and prosodic parameter values.
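
The encode/decode round trip described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `SegmentRecord`, `DataSignal`, `encode`, and `decode` are hypothetical, segment identification is assumed to have already happened, and each voice font is modelled as a plain dictionary from segment ID to a placeholder pattern.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SegmentRecord:
    """One identified speech segment plus its measured prosody (hypothetical layout)."""
    segment_id: str      # e.g. a phoneme label
    pitch_hz: float
    duration_ms: float
    amplitude: float


@dataclass(frozen=True)
class DataSignal:
    """What travels from the encoder to the decoder: font ID plus segment records."""
    output_font_id: str
    segments: Tuple[SegmentRecord, ...]


# Assumption: a voice font maps each segment ID to a stored pattern.
# A string stands in for the digitized pattern here.
VoiceFont = Dict[str, str]


def encode(analyzed: List[Tuple[str, float, float, float]],
           output_font_id: str) -> DataSignal:
    """Bundle (segment_id, pitch, duration, amplitude) tuples with the output font ID."""
    return DataSignal(output_font_id, tuple(SegmentRecord(*a) for a in analyzed))


def decode(signal: DataSignal,
           fonts: Dict[str, VoiceFont]) -> List[Tuple[str, SegmentRecord]]:
    """Look up each segment's pattern in the designated output font.

    Returns (pattern, record) pairs; a synthesizer would then modify each
    pattern according to the record's prosodic values.
    """
    font = fonts[signal.output_font_id]
    return [(font[rec.segment_id], rec) for rec in signal.segments]
```

Because only segment IDs, prosodic values, and a font ID are carried in the signal, the decoder is free to use an output font that differs from the one used for analysis.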

108 Citations

10 Claims

1. A method of communicating speech signals comprising the steps of:
- storing at a first location a plurality of input voice fonts, each input voice font comprising information describing a plurality of speech segments, each speech segment identified by a segment ID;
  
  selecting one of the plurality of input voice fonts;
  
  designating one of a plurality of voice fonts to be used as an output voice font;
  
  receiving an analog speech signal, said analog speech signal comprising a plurality of speech segments;
  
  digitizing the analog speech signal;
  
  identifying each of the plurality of speech segments in the received speech signal;
  
  measuring one or more prosodic parameters for each of said identified segments in relation to the segments of the selected input voice font; and
  
  transmitting a data signal from the first location to a second location, said data signal comprising segment IDs, values of the measured prosodic parameters of the speech segments in the received speech signal, and an output voice font ID identifying the designated output voice font;
  
  storing at the second location a plurality of output voice fonts, each output voice font comprising information describing a plurality of speech segments, each speech segment identified by a segment ID;
  
  receiving the transmitted data signal at the second location;
  
  identifying in said received data signal the segment IDs, the values of the measured prosodic parameters, and the designated output voice font corresponding to the received output voice font ID;
  
  selecting, in the designated output voice font, the information describing a plurality of speech segments corresponding to the received segment IDs;
  
  modifying the selected speech segment information according to the received values of the corresponding prosodic parameters; and
  
  generating a speech signal based on the modified speech segment information.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1 wherein the output voice font is the same as the input voice font.
  - 3. The method of claim 1 wherein the output voice font is different from the input voice font.
  - 4. The method of claim 1 wherein said step of measuring one or more prosodic parameters for each of said segments comprises the steps of:
    - measuring the pitch for each of said segments;
      
      measuring the duration for each of said segments; and
      
      measuring the amplitude for each of said segments.
  - 5. The method of claim 1 wherein said step of receiving an analog speech signal comprises the step of receiving an analog speech signal, said analog speech signal comprising a plurality of phonemes.
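
The three measurements named in claim 4 (pitch, duration, amplitude) can be illustrated with a short sketch. The claims do not say how each value is computed, so everything here is an assumption: pitch is estimated with a plain autocorrelation peak search, amplitude is taken as the RMS level, and the function name `measure_prosody` and its `fmin`/`fmax` search bounds are hypothetical. The claim also measures parameters in relation to the input voice font's segments; this sketch returns absolute values only.

```python
import math
from typing import Sequence, Tuple


def measure_prosody(samples: Sequence[float], sample_rate: int,
                    fmin: float = 50.0, fmax: float = 400.0
                    ) -> Tuple[float, float, float]:
    """Return (pitch_hz, duration_ms, rms_amplitude) for one digitized segment."""
    n = len(samples)
    if n < 2:
        raise ValueError("segment must contain at least two samples")

    # Duration follows directly from the sample count.
    duration_ms = 1000.0 * n / sample_rate

    # Amplitude as root-mean-square level (an assumed definition).
    amplitude = math.sqrt(sum(s * s for s in samples) / n)

    # Pitch: pick the lag with the largest autocorrelation inside the
    # plausible pitch range [fmin, fmax]. This is a stand-in technique.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    pitch_hz = sample_rate / best_lag

    return pitch_hz, duration_ms, amplitude
```

Real speech would need voiced/unvoiced handling and a more robust pitch tracker; the sketch only shows that each segment reduces to a small set of numbers.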

6. An apparatus for encoding speech signals comprising:
- a memory storing a plurality of voice fonts, each said voice font comprising a digitized pattern for each of a plurality of speech segments, each speech segment identified by a segment ID;
  
  an A/D converter adapted to receive an analog speech signal and having an output;
  
  a speech analyzer coupled to said memory and said A/D converter, said speech analyzer adapted to receive a digitized speech signal and identify each of the segments in the digitized speech signal based on a selected one of said voice fonts, said speech analyzer adapted to output the segment ID for each of said identified speech segments;
  
  one or more prosodic parameter detectors coupled to said memory and said speech analyzer, said detectors adapted to measure values of the prosodic parameters of each received digitized speech segment; and
  
  a data encoder coupled to said speech analyzer and adapted to generate a digital data signal for transmission or storage, said digital data signal comprising a segment ID and the measured values of the corresponding measured prosodic parameters for each of the identified speech segments and a voice font ID identifying one of a plurality of output voice fonts for use in regenerating the speech signal.
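
One way to picture the digital data signal produced by the data encoder of claim 6 is as a small binary record stream. The layout below is purely an assumption for illustration (the claim does not specify field widths or ordering): a header holding a 16-bit voice font ID and a 16-bit record count, followed by one record per segment with a 16-bit segment ID and three 32-bit floats for pitch, duration, and amplitude. The names `pack_signal` and `unpack_signal` are hypothetical.

```python
import struct
from typing import List, Tuple

# Assumed little-endian layout, no padding.
HEADER = struct.Struct("<HH")    # voice font ID, number of records
RECORD = struct.Struct("<Hfff")  # segment ID, pitch, duration, amplitude

Record = Tuple[int, float, float, float]


def pack_signal(font_id: int, records: List[Record]) -> bytes:
    """Serialize the output voice font ID and per-segment values into bytes."""
    parts = [HEADER.pack(font_id, len(records))]
    parts.extend(RECORD.pack(*r) for r in records)
    return b"".join(parts)


def unpack_signal(data: bytes) -> Tuple[int, List[Record]]:
    """Recover the voice font ID and the per-segment records from bytes."""
    font_id, count = HEADER.unpack_from(data, 0)
    records = [
        RECORD.unpack_from(data, HEADER.size + k * RECORD.size)
        for k in range(count)
    ]
    return font_id, records
```

Under this assumed layout each segment costs 14 bytes, regardless of how long the segment lasts in the original audio.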

7. A computer for encoding speech signals comprising:
- a CPU;
  
  an audio input device adapted to receive an analog audio or speech signal and having an output;
  
  an A/D converter having an input coupled to the output of said audio input device and an output coupled to said CPU;
  
  a memory coupled to said CPU, said memory storing software and a plurality of voice fonts, each voice font comprising a digitized pattern and a corresponding segment ID for each of a plurality of speech segments; and
  
  said CPU being adapted to:
  
  identify, using a selected one of said voice fonts as an input voice font, each of a plurality of speech segments in a received digitized speech signal;
  
  measure one or more prosodic parameters for each of the identified segments; and
  
  generate a data signal comprising segment IDs and values of the measured prosodic parameters of each of the identified speech segments and a voice font ID designating one of a plurality of voice fonts to be used as an output voice font for use in regenerating the speech signal.
- Dependent claims (8):
  - 8. The computer of claim 7 wherein said audio input device comprises a microphone.

9. An apparatus for decoding speech signals comprising:
- a memory storing a plurality of output voice fonts, each output voice font comprising a digitized pattern for each of a plurality of speech segments, each speech segment identified by a segment ID;
  
  a data decoder coupled to said memory and receiving a digital data stream from a transmission medium, said decoder identifying in the received data stream a voice font ID designating one of a plurality of voice fonts to be used as an output voice font, a segment ID and values of one or more corresponding prosodic parameters for each of the plurality of speech segments in the received data stream;
  
  a speech synthesizer coupled to said memory and said decoder, said synthesizer selecting digitized patterns in the designated output voice font corresponding to the identified segment IDs, modifying the selected digitized patterns according to the values of the corresponding prosodic parameters, and outputting the modified speech patterns to generate a speech signal.
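
The "modifying the selected digitized patterns" step of the synthesizer in claim 9 can be sketched for two of the prosodic parameters. This is an assumption-heavy illustration: the function name `apply_prosody` is hypothetical, duration is matched by naive linear-interpolation resampling, and amplitude is matched by scaling to a target RMS level. Pitch modification is deliberately left out; note that resampling alone also shifts pitch, so a real synthesizer would need a pitch-preserving method.

```python
import math
from typing import List, Sequence


def apply_prosody(pattern: Sequence[float], sample_rate: int,
                  duration_ms: float, amplitude: float) -> List[float]:
    """Stretch a stored pattern to duration_ms and scale it to the target RMS."""
    if not pattern:
        raise ValueError("pattern must not be empty")

    # Number of output samples needed for the requested duration.
    target_len = max(1, round(duration_ms * sample_rate / 1000.0))

    # Linear interpolation between neighbouring stored samples.
    if len(pattern) == 1 or target_len == 1:
        stretched = [float(pattern[0])] * target_len
    else:
        step = (len(pattern) - 1) / (target_len - 1)
        stretched = []
        for k in range(target_len):
            pos = k * step
            i = min(int(pos), len(pattern) - 2)
            frac = pos - i
            stretched.append(pattern[i] * (1.0 - frac) + pattern[i + 1] * frac)

    # Scale so the output RMS equals the received amplitude value.
    rms = math.sqrt(sum(s * s for s in stretched) / len(stretched))
    gain = amplitude / rms if rms > 0 else 0.0
    return [s * gain for s in stretched]
```

Concatenating the modified patterns for successive segment IDs would then give the regenerated speech signal.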

10. A method of speech encoding comprising the steps of:
- selecting one of a plurality of voice fonts to be used as an input voice font;
  
  designating one of a plurality of voice fonts to be used as an output voice font, said output voice font being different from said input voice font;
  
  receiving an analog speech signal, said analog speech signal comprising a plurality of speech segments;
  
  digitizing the analog speech signal;
  
  identifying each of the plurality of speech segments in the received speech signal;
  
  measuring one or more prosodic parameters for each of said identified segments in relation to segments of the selected input voice font;
  
  outputting a data signal comprising a voice font ID identifying the designated output voice font, segment IDs and values of the measured prosodic parameters of the speech segments in the received speech signal;
  
  receiving the data signal; and
  
  generating a speech signal using the designated output voice font based on the segment IDs and the values of the measured prosodic parameters in the data signal.

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Iyengar, Sridhar; Boss, Dale; Dennis, T. Don
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US08/764,961
Time in Patent Office: 963 Days
Field of Search: 704/249, 704/257, 704/207, 704/258, 704/201, 704/223, 704/209
US Class Current: 704/249
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 19/0018   Speech coding using phoneti...
G10L 19/09   Long term prediction, i.e. ...
