Creating speech models

US 5,832,441 A
Filed: 09/16/1996
Issued: 11/03/1998
Est. Priority Date: 09/16/1996
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

Selecting human speech samples for a speech model of human speech is performed. The system presents a graphic representing a human speech sample on a computer display, e.g., an amplitude vs. time graph of the speech sample. Through user input, the system marks a segment of the graphic. The marked segment of the graphic represents a portion of the human speech sample. The system plays the portion of the human speech sample represented by the marked segment back to the user to allow the user to determine its acceptability for inclusion in the speech model. If so indicated by the user, the portion of the human speech sample represented by the marked segment is selected for inclusion in the speech model. The system also analyzes the portion of the human speech sample represented by the marked segment for acoustic properties. These properties are presented to the user in a graphic of the analyzed portion representative of the acoustic properties, e.g., a spectral analysis of the sample graphed as a set of spectral lines. Thus, the user can select the analyzed portion for inclusion in the speech model due to the presence of desired acoustic properties in the analyzed portion.
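
A minimal Python sketch of the marking and analysis steps described above follows. It is illustrative only, not IBM's implementation: it assumes the amplitude vs. time graph maps the whole sample linearly onto `graph_width` pixels, and it uses a plain DFT magnitude spectrum as a stand-in for the "spectral lines". The names `marker_to_sample`, `marked_portion`, and `spectral_lines` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def marker_to_sample(x_pixel: int, graph_width: int, num_samples: int) -> int:
    """Map a horizontal pixel position on the amplitude-vs-time graph to a sample index.

    Assumption: the graph draws the whole sample linearly across graph_width pixels.
    """
    x = min(max(x_pixel, 0), graph_width)  # clamp markers to the drawn area
    return x * num_samples // graph_width


def marked_portion(samples: Sequence[float], marker_a: int, marker_b: int,
                   graph_width: int) -> Sequence[float]:
    """Return the samples between two markers, given in either order (end-exclusive)."""
    left, right = sorted((marker_a, marker_b))
    start = marker_to_sample(left, graph_width, len(samples))
    end = marker_to_sample(right, graph_width, len(samples))
    return samples[start:end]


def spectral_lines(portion: Sequence[float],
                   sample_rate: int) -> List[Tuple[float, float]]:
    """Naive DFT magnitude spectrum: (frequency_hz, magnitude) for bins 0..N/2.

    O(N^2) work; adequate for a short marked segment in a sketch.
    """
    n = len(portion)
    if n == 0:
        return []
    lines = []
    for k in range(n // 2 + 1):
        acc = sum(portion[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        lines.append((k * sample_rate / n, abs(acc) / n))
    return lines
```

Playing the marked portion back would pass the same slice to an audio output API, which this sketch leaves out; a real tool would more likely use an FFT library than the O(N^2) loop.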

40 Citations

21 Claims

1. A method for selecting human speech samples for a speech model of human speech, the speech model including audio data specific to a particular sound in human speech, comprising the steps of:
- presenting a graphic representing a human speech sample in a first area of a user interface on a computer display;
  
  responsive to user input, marking a segment of the graphic, the marked segment of the graphic representing a portion of the human speech sample;
  
  responsive to user input, playing the portion of the human speech sample represented by the marked segment; and
  
  selecting the portion of the human speech sample for inclusion in the speech model, wherein the human speech sample is used for evaluating the accuracy of a later produced human speech sample as the particular sound.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method as recited in claim 1, further comprising the steps of:
    - analyzing the portion of the human speech sample represented by the marked segment for acoustic properties;
      
      presenting a graphic of the analyzed portion representative of the acoustic properties in a second area of the user interface;
      
      wherein the graphic of the analyzed portion depicts different acoustic properties than presented in the marked section.
  - 3. The method as recited in claim 2 wherein the graphic representing the speech sample is an amplitude versus time graph of the speech sample and the graphic of the analyzed portion is a graph of spectral lines of the portion of the speech sample represented by the marked segment.
  - 4. The method as recited in claim 2, further comprising the steps of:
    - searching for an existing speech model;
      
      presenting a graphic of the existing speech model in the second area of the user interface in a different manner than the graphic of the analyzed portion.
  - 5. The method as recited in claim 1, wherein portions of a plurality of speech samples, each portion containing audio data for the particular sound, comprise the speech model.
  - 6. The method as recited in claim 5, further comprising the steps of:
    - storing a first speech sample selected for inclusion in the speech model;
      
      comparing elements of a second speech sample to corresponding elements of the first speech sample; and
      
      storing those elements of the second speech sample which diverge from the elements of the first speech sample by a prescribed amount with the first speech sample.
  - 7. The method as recited in claim 4 wherein the prescribed amount of divergence is an adjustable value through the user interface.
  - 8. The method as recited in claim 1 wherein the speech model is for a phoneme.
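
Claims 5 through 7 describe building a model from several samples while storing, from each later sample, only the elements that diverge from the first sample by a prescribed, user-adjustable amount. The sketch below assumes that an "element" is one frame of acoustic features and that divergence is measured as Euclidean distance; the claims specify neither, and `PhonemeModel`, `frame_distance`, and `divergence_threshold` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: each "element" of a speech sample is one analysis
# frame, i.e. a fixed-length acoustic feature vector.
Frame = Tuple[float, ...]


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PhonemeModel:
    """Hypothetical model for one particular sound, e.g. a phoneme (claim 8)."""

    def __init__(self, phoneme: str, divergence_threshold: float) -> None:
        self.phoneme = phoneme
        # Prescribed amount of divergence; a UI control could adjust it (claim 7).
        self.divergence_threshold = divergence_threshold
        self.reference: List[Frame] = []  # first selected sample, stored in full
        self.variants: List[Frame] = []   # diverging frames kept from later samples

    def add_sample(self, frames: Sequence[Sequence[float]]) -> int:
        """Add a selected sample and return how many of its frames were stored."""
        if not self.reference:
            self.reference = [tuple(f) for f in frames]
            return len(self.reference)
        stored = 0
        # Compare each frame with the corresponding frame of the first sample;
        # frames beyond the reference's length have no counterpart and are skipped.
        for ref, frame in zip(self.reference, frames):
            if frame_distance(ref, frame) > self.divergence_threshold:
                self.variants.append(tuple(frame))
                stored += 1
        return stored
```

Raising `divergence_threshold` compacts the model more aggressively, since fewer frames from later samples clear the bar.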

9. A system including processor, memory, display and input devices for selecting human speech samples for a speech model of human speech, the speech model including audio data specific to a particular sound in human speech comprising:
- means for presenting a graphic representing acoustic values of a speech sample in a first area of a user interface on the display;
  
  means responsive to user input for marking a segment of the graphic, the marked segment of the graphic representing a portion of the speech sample;
  
  means for analyzing the portion of the speech sample represented by the marked segment for acoustic properties different from the acoustic values;
  
  means for presenting a graphic of the analyzed portion representative of the acoustic properties in a second area of the user interface; and
  
  means for selecting the analyzed portion for inclusion in the speech model.
- Dependent Claims (10, 11, 12, 13, 14)
- - 10. The system as recited in claim 9, further comprising means responsive to user input for playing the portion of the speech sample represented by the marked segment.
  - 11. The system as recited in claim 9 further comprising:
    - means for analyzing the speech sample for desired acoustic properties; and
      
      means responsive to identifying desired acoustic properties in the speech sample for marking a segment of the graphic corresponding to the portion of the speech sample with the desired acoustic properties.
  - 12. The system as recited in claim 9 wherein elements from a plurality of speech samples are added to the speech model and are compacted according to a compaction threshold.
  - 13. The system as recited in claim 9 wherein one of the input devices is a microphone and the system further comprises:
    - means for generating a real time graphic of a speech sample as captured from the microphone; and
      
      means for correcting the real time graphic to produce a corrected graphic according to frames which were missing during the generation of the real time graphic.
  - 14. The system as recited in claim 9 wherein one of the input devices is a pointing device and wherein the means for marking the segment of the graphic are two vertical markers which are independently manipulated through pointing device input.
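
Claim 11's automatic marking and claim 14's pair of vertical markers can be illustrated together: an analysis pass picks a segment, and its bounds become the two marker positions. As an assumption, the sketch below treats sustained frame energy above a threshold as the "desired acoustic property"; the claims do not name one. `frame_energies` and `auto_mark` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def auto_mark(samples: Sequence[float], frame_len: int,
              energy_threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices for the two vertical markers.

    Hypothetical criterion: the longest run of frames whose energy exceeds
    energy_threshold. Returns None when no frame qualifies.
    """
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    energies = frame_energies(samples, frame_len)
    # A -inf sentinel guarantees that a run reaching the last frame is closed.
    for idx, energy in enumerate(energies + [float("-inf")]):
        if energy > energy_threshold:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            # Strict '>' keeps the earliest run when two runs tie in length.
            if best is None or idx - run_start > best[1] - best[0]:
                best = (run_start, idx)
            run_start = None
    if best is None:
        return None
    return best[0] * frame_len, best[1] * frame_len
```

A user could then drag either marker independently to refine the automatically chosen bounds.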

15. A computer program product in a computer readable medium for selecting human speech samples for a speech model of human speech, the speech model including audio data specific to a particular sound in human speech, comprising:
- means for presenting a graphic representing acoustic values of a speech sample in a first area of a user interface on the display;
  
  means for analyzing the speech sample for desired acoustic properties;
  
  means for presenting a graphic of an analyzed portion representative of the desired acoustic properties in a second area of the user interface, wherein the desired acoustic properties are different from acoustic values presented in the first area; and
  
  means for including the speech sample in the speech model.
- Dependent Claims (16, 17, 18, 19, 20, 21)
- - 16. The product as recited in claim 15 further comprising means responsive to user input for marking a segment of the graphic, the marked segment of the graphic representing a portion of the speech sample wherein the analyzing means analyzes the portion of the speech sample and the including means includes the portion of the speech sample in the speech model.
  - 17. The product as recited in claim 16, further comprising:
    - means for searching for an existing speech model;
      
      means for presenting a graphic of the existing speech model in the second area of the user interface in a different manner than the graphic of the analyzed portion.
  - 18. The product as recited in claim 16 further comprising means for displaying detected pitch in the speech sample in a different manner from portions of the speech sample where no pitch is detected.
  - 19. The product as recited in claim 15, further comprising means responsive to user input for playing the speech sample.
  - 20. The product as recited in claim 15 further comprising means for compacting a plurality of speech samples in the speech model.
  - 21. The product as recited in claim 15 further comprising means for displaying a graphic of an existing speech model concurrently with the graphics in the first and second areas.
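
Claim 18 distinguishes portions of the sample where pitch is detected from portions where it is not. One common way to make that decision is normalized autocorrelation; the sketch below uses it purely as an assumed stand-in, since the claim does not name a pitch detector. The 60 to 400 Hz search range, the 0.5 voicing threshold, and the names `detect_pitch` and `pitch_track` are all hypothetical.

```python
from typing import List, Optional, Sequence


def detect_pitch(frame: Sequence[float], sample_rate: int,
                 fmin: float = 60.0, fmax: float = 400.0,
                 voicing_threshold: float = 0.5) -> Optional[float]:
    """Estimate the pitch of one frame in Hz, or return None if no pitch is detected.

    Uses autocorrelation normalized by frame energy; the search range
    (fmin..fmax) and voicing_threshold are illustrative assumptions.
    """
    n = len(frame)
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag: Optional[int] = None
    best_r = voicing_threshold
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sample_rate / best_lag


def pitch_track(samples: Sequence[float], sample_rate: int,
                frame_len: int) -> List[Optional[float]]:
    """Per-frame pitch; None marks frames a display would draw as unpitched."""
    return [
        detect_pitch(samples[i:i + frame_len], sample_rate)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
```

A display routine could draw frames with a numeric pitch in one color and `None` frames in another.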

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Pinera, Carlos Victor; Mahaffey, Robert Bruce; Laws, Catherine Keefauver; Aaron, Joseph David; Brunet, Peter Thomas
Primary Examiner(s): Dorvil, Richemond
Application Number: US08/710,148
Time in Patent Office: 778 Days
Field of Search: 704/270, 704/278, 704/260, 704/251, 704/254, 704/243, 704/200, 704/231, 704/267, 704/244, 704/245, 704/275, 704/255, 704/258, 704/271, 704/276, 704/272
US Class Current: 704/276
CPC Class Codes: G10L 21/06 Transformation of speech in...
