Speech synthesis and analysis of dialects

US 5,636,325 A
Filed: 01/05/1994
Issued: 06/03/1997
Est. Priority Date: 11/13/1992
Status: Expired due to Term

Abstract

A set of intonation intervals for a chosen dialect is applied to the intonational contour of a phoneme string derived from a single set of stored linguistic units, e.g., phonemes. Sets of intonational intervals are stored to simulate or recognize different dialects or languages from a single set of stored phonemes. The interval rules preferably use a prosodic analysis of the phoneme string or other cues to apply a given interval to the phoneme string. A second set of interval data is provided for semantic information. The speech system is based on the observation that each dialect and language possesses its own set of musical relationships or intonation intervals. These musical relationships are used by a human listener to identify the particular dialect or language. The speech system may be either a speech synthesis or speech analysis tool or may be a combined speech synthesis/analysis system.
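The abstract describes intonation intervals as musical relationships between pitches. As a rough illustration only, the sketch below measures the interval between two pitch values in equal-tempered semitones and names it. The equal-temperament formula, the function names `semitones` and `name_interval`, and the name table are assumptions made for this example; the abstract does not say how the system measures intervals.

```python
import math

# Interval names indexed by size in semitones (equal temperament).
# This table is an illustrative assumption, not taken from the patent.
INTERVAL_NAMES = {
    0: "unison", 1: "minor second", 2: "major second", 3: "minor third",
    4: "major third", 5: "perfect fourth", 6: "tritone", 7: "perfect fifth",
    8: "minor sixth", 9: "major sixth", 10: "minor seventh",
    11: "major seventh", 12: "octave",
}


def semitones(f1_hz, f2_hz):
    """Signed interval from f1_hz to f2_hz in equal-tempered semitones."""
    return 12.0 * math.log2(f2_hz / f1_hz)


def name_interval(f1_hz, f2_hz):
    """Name the nearest whole-semitone interval, ignoring direction."""
    n = round(abs(semitones(f1_hz, f2_hz)))
    return INTERVAL_NAMES.get(n, f"{n} semitones")


if __name__ == "__main__":
    print(name_interval(220.0, 330.0))
    print(name_interval(440.0, 220.0))
```

A frequency ratio of 3:2 is about 7.02 semitones, so it rounds to a perfect fifth; a ratio of 2:1 or 1:2 is an octave.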

42 Claims

1. A method of operating a speech synthesis system comprising the steps of:
- generating a string of linguistic units containing pitch data by selecting linguistic units from a first memory segment of the system which correspond to characters in a text string and concatenating the selected linguistic units together in a second memory segment of the system;
  
  selecting locations within the pitch data of the string of linguistic units;
  
  retrieving a first set of dialect intervals for a first selected dialect, the first set of dialect intervals selected from a set of melodic intervals as being indicative of the first selected dialect and stored in a dialect table in a third memory segment of the system; and
  
  applying the first set of dialect intervals to the pitch data at the selected locations so that synthesized speech of the first selected dialect is produced.
- Dependent claims (2, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19):
  - 2. The method as recited in claim 1 wherein the applying step comprises changing at least one interval at a selected location in the pitch data to at least one dialect interval of the first set of dialect intervals.
  - 9. The method as recited in claim 1 wherein the dialect table includes sets of dialect intervals for a plurality of dialects.
  - 10. The method as recited in claim 1 wherein the dialect table includes a set of dialect intervals for a first language.
  - 11. The method as recited in claim 9 wherein the sets of dialect intervals are based on the diatonic scale.
  - 12. The method as recited in claim 1 which further comprises the steps of:
    - generating prosody data for the string of linguistic units according to prosody rules of the system; and
      
      altering the pitch data within the string of linguistic units according to the prosody data;
      
      wherein the selected locations are chosen within the altered pitch data.
  - 13. The method as recited in claim 1 which further comprises the steps of:
    - selecting a set of keywords located in the text string; and
      
      locating a set of locations which correspond to the keywords in the string of linguistic units;
      
      wherein the selected locations are selected according to locations in the pitch data which correspond to the locations of the set of keywords in the text string.
  - 14. The method as recited in claim 2 which further comprises the steps of:
    - retrieving a second set of dialect intervals for a second selected dialect, the second set of dialect intervals selected from a set of melodic intervals as being indicative of the second selected dialect stored in the dialect table; and
      
      changing at least one melodic interval at a selected location in the pitch data to one of the second set of dialect intervals to produce synthesized speech of the second selected dialect.
  - 16. The method as recited in claim 1 wherein the first dialect is British English and the first set of dialect intervals comprises an octave, a major seventh and a minor seventh.
  - 17. The method as recited in claim 1 wherein the first dialect is Japanese and the first set of dialect intervals comprises a perfect fifth, a perfect fourth, a major second and a minor second.
  - 18. The method as recited in claim 1 wherein the first dialect is Irish and the first set of dialect intervals comprises a major sixth, a minor sixth and a major third.
  - 19. The method as recited in claim 1 wherein the first dialect is Midwestern English and the first set of dialect intervals comprises a perfect fifth, a major third, a perfect fourth and a minor third.
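The synthesis steps in claims 1 and 2, together with the interval sets named in claims 16 to 19, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the pitch data is modeled as a list of frequencies in Hz (one per linguistic unit), intervals are expressed in equal-tempered semitones, and "applying" is taken to mean snapping the interval at each selected location to the nearest dialect interval while keeping its direction. The names `DIALECT_TABLE` and `apply_dialect` are hypothetical.

```python
import math

# Dialect table built from the interval sets listed in claims 16-19,
# expressed in semitones (octave = 12, major seventh = 11, and so on).
DIALECT_TABLE = {
    "British English": [12, 11, 10],
    "Japanese": [7, 5, 2, 1],
    "Irish": [9, 8, 4],
    "Midwestern English": [7, 4, 5, 3],
}


def apply_dialect(pitch_hz, locations, dialect):
    """Return a copy of pitch_hz with dialect intervals applied.

    Each index i in `locations` (i >= 1) marks the interval between
    unit i-1 and unit i. That interval is replaced by the closest
    interval in the dialect's set, preserving rise or fall.
    Snapping to the nearest interval is an assumption for this sketch.
    """
    intervals = DIALECT_TABLE[dialect]
    out = list(pitch_hz)
    for i in locations:
        prev = out[i - 1]
        current = 12.0 * math.log2(out[i] / prev)
        direction = 1.0 if current >= 0 else -1.0
        target = min(intervals, key=lambda s: abs(s - abs(current)))
        out[i] = prev * 2.0 ** (direction * target / 12.0)
    return out


if __name__ == "__main__":
    contour = [200.0, 290.0, 200.0]
    print(apply_dialect(contour, [1], "Midwestern English"))
    print(apply_dialect(contour, [1], "British English"))
```

Only the pitch at the selected location is changed, so the interval to the following unit shifts as a side effect. A fuller sketch might transpose the rest of the contour instead; the claims do not settle that choice.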

3. A method of operating a speech recognition system comprising the steps of:
- providing a digitized speech sample of human speech;
  
  selecting a set of melodic intervals in the digitized speech sample;
  
  retrieving a first set of dialect intervals for a first selected dialect, the first set of dialect intervals being melodic intervals which are indicative of the first selected dialect and stored in a dialect table; and
  
  comparing the set of melodic intervals to the first set of dialect intervals to determine whether the digitized speech sample is from human speech of the first selected dialect.
- Dependent claims (4, 5, 6, 7, 8, 15):
  - 4. The method as recited in claim 3 which further comprises the step of sending a message to the user interface of the system if there is a match between the set of melodic intervals and the first set of dialect intervals.
  - 5. The method as recited in claim 3 which further comprises the steps of:
    - retrieving a second set of dialect intervals for a second selected dialect;
      
      comparing the set of melodic intervals to the second set of dialect intervals to determine whether the digitized speech sample is from human speech of the second selected dialect; and
      
      sending a message to a user interface of the system indicating that there is a match between the set of melodic intervals and the second set of dialect intervals.
  - 6. The method as recited in claim 3 wherein the selecting step comprises identifying a melodic interval in the digitized speech sample which exceeds a predetermined threshold as a melodic interval in the set of melodic intervals.
  - 7. The method as recited in claim 3 which further comprises the steps of:
    - comparing the digitized speech sample with a code book which contains stored speech samples corresponding to phonemes to generate a string of phonemes corresponding to the digitized speech sample; and
      
      comparing the digitized speech sample to pitch data in the string of phonemes to select the set of melodic intervals.
  - 8. The method as recited in claim 3 wherein the selecting step comprises the steps of:
    - analyzing the digitized speech sample to generate prosodic data; and
      
      selecting the set of melodic intervals according to the prosodic data.
  - 15. The method as recited in claim 5 which further comprises the steps of:
    - determining a probability of match for the first and second selected dialects; and
      
      sending a message to a user interface indicating the probability that the string of linguistic units represents speech of the first or second dialect.
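The recognition steps in claims 3, 6 and 15 can be illustrated with a short sketch. It assumes the digitized speech sample has already been reduced to a pitch track in Hz, keeps only intervals larger than a threshold (claim 6), and scores each dialect by the fraction of selected intervals found in its set. That fraction is a stand-in for the "probability of match" in claim 15; the claims do not define how the probability is computed. `DIALECT_TABLE`, `select_intervals`, `match_scores`, and the 1.5-semitone default threshold are all hypothetical, and the table reuses the interval sets named in the synthesis claims.

```python
import math

# Interval sets in semitones, taken from the dialects named in claims 16-19.
DIALECT_TABLE = {
    "British English": [12, 11, 10],
    "Japanese": [7, 5, 2, 1],
    "Irish": [9, 8, 4],
    "Midwestern English": [7, 4, 5, 3],
}


def select_intervals(pitch_hz, threshold=1.5):
    """Return rounded interval sizes (semitones) that exceed the threshold."""
    selected = []
    for a, b in zip(pitch_hz, pitch_hz[1:]):
        size = abs(12.0 * math.log2(b / a))
        if size > threshold:
            selected.append(round(size))
    return selected


def match_scores(pitch_hz, threshold=1.5):
    """Score each dialect by the share of selected intervals in its set.

    This simple ratio is an assumed scoring rule for illustration only.
    """
    found = select_intervals(pitch_hz, threshold)
    scores = {}
    for dialect, intervals in DIALECT_TABLE.items():
        hits = sum(1 for s in found if s in intervals)
        scores[dialect] = hits / len(found) if found else 0.0
    return scores


if __name__ == "__main__":
    # Build a pitch track: up a major sixth, down a minor sixth,
    # a half-semitone wobble (below threshold), then up a major third.
    track = [200.0]
    for step in (9, -8, 0.5, 4):
        track.append(track[-1] * 2.0 ** (step / 12.0))
    scores = match_scores(track)
    print(max(scores, key=scores.get), scores)
```

A message to the user interface (claims 4 and 5) would then report the best-scoring dialect, or the scores themselves.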

20. A computer program product on a computer readable medium for speech synthesis, the computer program product executable in a computer system comprising:
- program code means for generating a string of linguistic units containing pitch data by selecting linguistic units from a first memory segment of the system which correspond to characters in a text string and concatenating the selected linguistic units together in a second memory segment of the system;
  
  program code means for selecting locations within the pitch data of the string of linguistic units;
  
  program code means for retrieving a first set of dialect intervals for a first selected dialect, the first set of dialect intervals selected from a set of melodic intervals as being indicative of the first selected dialect stored in a dialect table in a third memory segment of the system; and
  
  program code means for applying the first set of dialect intervals to the set of melodic intervals.
- Dependent claims (21, 27):
  - 21. The product as recited in claim 20 wherein the applying means changes at least one melodic interval at a selected location in the pitch data to at least one dialect interval of the first set of dialect intervals.
  - 27. The product as recited in claim 21 wherein the identifying means comprises:
    - program code means for generating prosody data for the string of linguistic units according to prosody rules of the system; and
      
      program code means for altering the pitch data within the string of linguistic units according to the prosody data;
      
      wherein the selected locations are chosen within the altered pitch data.

22. A computer program product in a computer readable medium for speech recognition, the computer program product executable in a computer system, comprising:
- program code means for providing a digitized speech sample of human speech;
  
  program code means for selecting a set of melodic intervals in the digitized speech sample;
  
  program code means for retrieving a first set of dialect intervals for a first selected dialect, the first set of dialect intervals being melodic intervals which are indicative of the first selected dialect and stored in a dialect table in a third memory segment of the system; and
  
  program code means for comparing the set of melodic intervals to the first set of dialect intervals to determine whether the digitized speech sample is from speech of the first selected dialect.
- Dependent claims (23, 24, 25, 26):
  - 23. The product as recited in claim 22 which further comprises program code means for sending a message to a user interface of the system if there is a match between the set of melodic intervals and the first set of dialect intervals.
  - 24. The product as recited in claim 22 which further comprises:
    - program code means for retrieving a second set of dialect intervals for a second selected dialect;
      
      program code means for comparing the set of melodic intervals to the second set of dialect intervals to determine whether the digitized speech sample is from human speech of the second selected dialect; and
      
      program code means for sending a message to a user interface of the system indicating that there is a match between the set of melodic intervals and the second set of dialect intervals.
  - 25. The product as recited in claim 22 which further comprises:
    - program code means for comparing the digitized speech sample with a code book which contains stored speech samples corresponding to phonemes to generate a string of phonemes corresponding to the digitized speech sample; and
      
      program code means for comparing the digitized speech sample to pitch data in the string of phonemes to select the set of melodic intervals.
  - 26. The product as recited in claim 22 wherein the selecting means comprises:
    - program code means for analyzing the digitized speech sample to generate prosodic data; and
      
      program code means for selecting the melodic intervals according to the prosodic data.

28. A speech synthesis system comprising:
- a memory for storing sets of instructions to perform speech processing and speech data;
  
  a processor coupled to the memory for executing the sets of instructions;
  
  means for generating a string of linguistic units containing pitch data by selecting dialect neutral linguistic units from a first memory segment of the system which correspond to characters in a text string and concatenating the selected linguistic units together in a second memory segment of the system;
  
  means for selecting locations within the pitch data of the string of linguistic units;
  
  means for retrieving a first set of dialect intervals for a first selected dialect, the first set of dialect intervals selected from a set of melodic intervals as being indicative of the first selected dialect and stored in a dialect table in a third memory; and
  
  means for applying the first set of dialect intervals to the pitch data at the selected locations so that synthesized speech of the first selected dialect is produced.
- Dependent claims (29, 36, 37, 38, 39, 40, 41, 42):
  - 29. The system as recited in claim 28 wherein the applying means changes at least one melodic interval at a selected location in the pitch data to at least one dialect interval of the first set of dialect intervals.
  - 36. The system as recited in claim 28 wherein the dialect table includes sets of dialect intervals for a plurality of dialects.
  - 37. The system as recited in claim 28 wherein the dialect table includes a set of dialect intervals for a first language.
  - 38. The system as recited in claim 29 wherein the identifying means comprises:
    - means for generating prosody data for the string of linguistic units according to prosody rules of the system; and
      
      means for altering the pitch data within the string of linguistic units according to the prosody data;
      
      wherein the selected locations are chosen within the altered pitch data.
  - 39. The system as recited in claim 28 wherein the first dialect is British English and the first set of dialect intervals comprises an octave, a major seventh and a minor seventh.
  - 40. The system as recited in claim 28 wherein the first dialect is Japanese and the first set of dialect intervals comprises a perfect fifth, a perfect fourth, a major second and a minor second.
  - 41. The system as recited in claim 28 wherein the first dialect is Irish and the first set of dialect intervals comprises a major sixth, a minor sixth and a major third.
  - 42. The system as recited in claim 28 wherein the first dialect is Midwestern English and the first set of dialect intervals comprises a perfect fifth, a major third, a perfect fourth and a minor third.

30. A speech recognition system comprising:
- a memory for storing sets of instructions to perform speech processing and speech data;
  
  a processor coupled to the memory for executing the sets of instructions;
  
  means for providing a digitized speech sample of human speech;
  
  means for selecting a set of melodic intervals in the digitized speech sample;
  
  means for retrieving a first set of dialect intervals for a first selected dialect, the first set of dialect intervals being melodic intervals which are indicative of the first selected dialect and stored in a dialect table; and
  
  means for comparing the set of melodic intervals to the first set of dialect intervals to determine whether the digitized speech sample is from human speech of the first selected dialect.
- Dependent claims (31, 32, 33, 34, 35):
  - 31. The system as recited in claim 30 which further comprises means for sending a message to a user interface of the system if there is a match between the set of melodic intervals and the first set of dialect intervals.
  - 32. The system as recited in claim 30 which further comprises:
    - means for retrieving a second set of dialect intervals for a second selected dialect;
      
      means for comparing the set of melodic intervals to the second set of dialect intervals to determine whether the digitized speech sample is from human speech of the second selected dialect; and
      
      means for sending a message to a user interface of the system indicating that there is a match between the set of melodic intervals and the second set of dialect intervals.
  - 33. The system as recited in claim 30 wherein the selecting means identifies a melodic interval in the digitized speech sample which exceeds a predetermined threshold as a melodic interval in the set of melodic intervals.
  - 34. The system as recited in claim 30 wherein the selecting means comprises:
    - means for comparing the digitized speech sample with a code book which contains stored speech samples corresponding to phonemes to generate a string of phonemes corresponding to the digitized speech sample; and
      
      means for comparing the digitized speech sample to pitch data in the string of phonemes to select the set of melodic intervals.
  - 35. The system as recited in claim 30 wherein the identifying means comprises:
    - means for analyzing the digitized speech sample to generate prosodic data; and
      
      means for selecting the set of melodic intervals according to the prosodic data.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Farrett, Peter W.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Mattson, Robert
Application Number: US08/176,819
Time in Patent Office: 1,245 Days
Field of Search: 381/29-53, 395/2, 395/2.2, 395/2.4, 395/2.55, 395/2.6, 395/2.67, 395/2.69, 395/2.76, 395/2.77, 360/135
US Class Current: 704/258
CPC Class Codes:
G10L 13/10 Prosody rules derived from ...
G10L 15/005 Language recognition
