Source-dependent text-to-speech system

US 8,005,677 B2
Filed: 05/09/2003
Issued: 08/23/2011
Est. Priority Date: 05/09/2003
Status: Active Grant

Abstract

A method of generating speech from text messages includes determining a speech feature vector for a voice associated with a source of a text message, and comparing the speech feature vector to speaker models. The method also includes selecting one of the speaker models as a preferred match for the voice based on the comparison, and generating speech from the text message based on the selected speaker model.
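The matching step in the abstract can be illustrated with a minimal sketch. It assumes each speaker model is summarized by a fixed-length reference vector and that "preferred match" means smallest Euclidean distance; the patent text here does not fix a metric, and the names `select_speaker_model`, `voice_a`, `voice_b`, and `voice_c` are hypothetical.

```python
import math


def select_speaker_model(feature_vector, speaker_models):
    """Return the name of the speaker model closest to feature_vector.

    speaker_models maps a model name to a reference vector of the same
    length as feature_vector (an assumption made for this sketch).
    """
    def distance(a, b):
        # Plain Euclidean distance between two equal-length vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(
        speaker_models,
        key=lambda name: distance(feature_vector, speaker_models[name]),
    )


# Hypothetical stock voices that are unrelated to any message source.
speaker_models = {
    "voice_a": [0.9, 0.1, 0.3],
    "voice_b": [0.2, 0.8, 0.5],
    "voice_c": [0.4, 0.4, 0.9],
}

preferred = select_speaker_model([0.25, 0.75, 0.45], speaker_models)
print(preferred)  # voice_b
```

Any other similarity measure could be substituted for the distance function without changing the selection logic.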

34 Claims

1. A method of generating speech from text messages, comprising:
- determining a speech feature vector for a voice associated with a source of a first text message;
  
  comparing the speech feature vector to a plurality of speaker models, wherein the plurality of speaker models are unrelated to the source of the first text message;
  
  based on the comparison, selecting one of the speaker models as a preferred match for the voice;
  
  associating the selected speaker model with the source of the first text message;
  
  if the speech feature vector cannot be determined, selecting one of the speaker models as a default selection;
  
  generating speech from the text message based on the selected speaker model; and
  
  automatically generating speech from subsequent text messages received from the source of the first text message, based on the selected speaker model.
- - 2. The method of claim 1, wherein the step of determining comprises:
    - receiving a sample of the voice; and
      
      analyzing the sample to determine the speech feature vector for the voice.
  - 3. The method of claim 1, wherein the step of determining comprises:
    - requesting an endpoint that is the source of the text message to provide the speech feature vector; and
      
      receiving the speech feature vector from the endpoint.
  - 4. The method of claim 1, wherein the step of generating comprises communicating a command to generate the speech to a text-to-speech server, the command comprising the selected speaker model, wherein the text-to-speech server generates the speech based on the selected speaker model.
  - 5. The method of claim 1, wherein:
    - the speech feature vector comprises a feature vector for a Gaussian mixture model; and
      
      the step of comparing comprises comparing a first Gaussian mixture model associated with the speech feature vector with a plurality of second Gaussian mixture models, each second Gaussian mixture model associated with at least one of the speaker models.
  - 6. The method of claim 1, further comprising:
    - generating a plurality of model voice samples; and
      
      analyzing the model voice samples to determine the speaker model for each model voice sample.
  - 7. The method of claim 6, wherein the model voice samples are generated based on a text sample associated with the voice sample.
  - 8. The method of claim 1, wherein the steps of the method are implemented by an endpoint in a communication network.
  - 9. The method of claim 1, wherein the steps of the method are implemented in a voice match server in a communication network.
  - 10. The method of claim 1, wherein:
    - the steps of the method are implemented in a unified messaging system; and
      
      the speech feature vector is associated with a user that provided the text message in a user profile.
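The flow of claim 1 (match, associate with the source, fall back to a default, reuse for later messages) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `SourceVoiceSelector` and `generate_speech` are hypothetical names, matching uses Euclidean distance as a stand-in, and the sketch also stores the default selection against the source, which the claim language does not spell out.

```python
import math


class SourceVoiceSelector:
    """Remembers which speaker model was chosen for each message source."""

    def __init__(self, speaker_models, default_model):
        self.speaker_models = speaker_models  # model name -> reference vector
        self.default_model = default_model    # used when no vector is available
        self.model_by_source = {}             # source -> selected model name

    def model_for(self, source, feature_vector=None):
        # Subsequent messages from a known source reuse the stored selection.
        if source in self.model_by_source:
            return self.model_by_source[source]
        if feature_vector is None:
            # The speech feature vector could not be determined.
            selected = self.default_model
        else:
            selected = min(
                self.speaker_models,
                key=lambda name: math.dist(feature_vector, self.speaker_models[name]),
            )
        # Assumption: default selections are stored as well as matched ones.
        self.model_by_source[source] = selected
        return selected


def generate_speech(text, speaker_model):
    # Placeholder for a real text-to-speech call; returns a tagged string.
    return f"[{speaker_model}] {text}"


selector = SourceVoiceSelector(
    {"voice_a": [0.9, 0.1], "voice_b": [0.2, 0.8]}, default_model="voice_a"
)
first = selector.model_for("alice@example.com", [0.25, 0.75])
later = selector.model_for("alice@example.com")   # no vector needed this time
fallback = selector.model_for("bob@example.com")  # no vector was ever available
print(generate_speech("See you at noon.", later))
```

Keeping the association in a dictionary keyed by source is what lets later messages skip the comparison entirely.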

11. A voice match server, comprising:
- an interface operable to:
  
  receive a speech feature vector for a voice associated with a source of a first text message; and
  
  communicate a command to a text-to-speech server instructing the text-to-speech server to generate speech from the text message based on a selected speaker model; and
  
  a processor operable to:
  
  compare the speech feature vector to a plurality of speaker models, wherein the plurality of speaker models are unrelated to the source of the text message;
  
  select one of the speaker models as a preferred match for the voice based on the comparison;
  
  associate the selected speaker model with the source of the first text message; and
  
  select one of the speaker models as a default selection if the interface does not receive the speech feature vector; and
  
  the interface further operable to communicate a command to a text-to-speech server instructing the text-to-speech server to automatically generate speech from subsequent text messages received from the source of the first text message, based on the selected speaker model.
- - 12. The server of claim 11, further comprising a memory operable to store the plurality of speaker models.
  - 13. The server of claim 11, wherein:
    - the interface is further operable to cause the text-to-speech server to generate a plurality of model voice samples; and
      
      the speaker models are determined based on analysis of the model voice samples.
  - 14. The server of claim 13, wherein the model voice samples are generated based on a text sample associated with the voice sample.
  - 15. The server of claim 11, wherein:
    - the interface is further operable to communicate a request for the speech feature vector to an endpoint that is the source of the text message; and
      
      the interface receives the speech feature vector from the endpoint.
  - 16. The server of claim 11, wherein:
    - the speech feature vector comprises a feature vector for a Gaussian mixture model; and
      
      the step of comparing comprises comparing a first Gaussian mixture model associated with the speech feature vector to a plurality of second Gaussian mixture models, each second Gaussian mixture model associated with at least one of the speaker models.
  - 17. The server of claim 11, wherein:
    - the server is part of a unified messaging system; and
      
      the speech feature vector is associated with a user that provided the text message in a user profile.
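Claims 5 and 16 recite a comparison between Gaussian mixture models. One common way to score a voice against candidate GMMs, shown here only as a hedged illustration, is to compute the average log-likelihood of the voice's feature frames under each candidate and keep the highest. Note that this scores frames against each model rather than comparing two GMMs directly as the claim wording describes. The diagonal-covariance representation and the names `gmm_log_likelihood` and `best_gmm_match` are assumptions of this sketch.

```python
import math


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))


def best_gmm_match(frames, model_gmms):
    """Return the model name whose GMM gives the highest average log-likelihood."""
    def average_score(gmm):
        return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)

    return max(model_gmms, key=lambda name: average_score(model_gmms[name]))


# Hypothetical two-component GMMs for two stock voices.
model_gmms = {
    "voice_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "voice_b": [(0.6, [5.0, 5.0], [1.0, 1.0]), (0.4, [6.0, 4.0], [1.0, 1.0])],
}
frames = [[5.1, 4.9], [5.8, 4.2]]
best = best_gmm_match(frames, model_gmms)
print(best)  # voice_b
```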

18. An endpoint, comprising:
- a first interface operable to receive a first text message from a source; and
  
  a processor operable to:
  
  determine a speech feature vector for a voice associated with a source of the text message;
  
  compare the speech feature vector to a plurality of speaker models, wherein the plurality of speaker models are unrelated to the source of the first text message;
  
  select one of the speaker models as a preferred match for the voice based on the comparison;
  
  associate the selected speaker model with the source of the first text message;
  
  select one of the speaker models as a default selection if the processor cannot determine the speech feature vector;
  
  generate speech from the text message based on the selected speaker model; and
  
  automatically generate speech from subsequent text messages received from the source of the first text message, based on the selected speaker model; and
  
  a second interface operable to output the generated speech to a user.
- - 19. The endpoint of claim 18, wherein the first interface is further operable to:
    - communicate a request for the speech feature vector to the source of the text message; and
      
      receive the speech feature vector in response to the request.
  - 20. The endpoint of claim 18, wherein:
    - the first interface is further operable to receive a voice sample from the source of the text message; and
      
      the processor is further operable to analyze the voice sample to determine the speech feature vector.
  - 21. The endpoint of claim 18, wherein:
    - the first interface is further operable to receive speech from the source of the text message;
      
      the second interface is further operable to output the received speech; and
      
      the processor is further operable to analyze the received speech to determine the speech feature vector.

22. A system, comprising:
- a voice match server operable to:
  
  compare a speech feature vector, for a voice associated with a source of a first text message, to a plurality of speaker models, wherein the plurality of speaker models are unrelated to the source of the first text message; and
  
  select one of the speaker models as a preferred match for the voice based on the comparison;
  
  associate the selected speaker model with the source of the first text message;
  
  select one of the speaker models as a default selection if the speech feature vector cannot be determined; and
  
  a text-to-speech server operable to generate speech from the text message based on the selected speaker model; and
  
  the text-to-speech server further operable to automatically generate speech from subsequent text messages received from the source of the first text message, based on the selected speaker model.
- - 23. The system of claim 22, further comprising a speech feature vector server operable to:
    - receive speech; and
      
      determine an associated speech feature vector based on the speech, wherein the speech feature vector compared by the voice match server is received from the speech feature vector server.
  - 24. The system of claim 22, wherein the voice match server is further operable to receive the speaker models from the speech feature vector server.
  - 25. The system of claim 24, wherein:
    - the voice match server is further operable to cause the text-to-speech server to generate a plurality of model voice samples; and
      
      the speech feature vector server is further operable to analyze the voice samples to determine the speaker models.
  - 26. The system of claim 22, wherein:
    - the text-to-speech server is one of a plurality of text-to-speech servers, each text-to-speech server operable to generate speech using a different speaker model; and
      
      the voice match server is further operable to select one of the text-to-speech servers to generate speech based on which text-to-speech server uses the selected speaker model.
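Claim 26 describes routing to whichever text-to-speech server uses the selected speaker model. A minimal sketch of that lookup follows, assuming the voice match server holds a simple mapping from server address to the one speaker model that server uses. The addresses, the command layout, and the function names are hypothetical and not taken from the patent.

```python
def select_tts_server(selected_model, tts_servers):
    """Return the address of the server that uses selected_model, or None.

    tts_servers maps a server address to the speaker model that server uses.
    """
    for address, model in tts_servers.items():
        if model == selected_model:
            return address
    return None


def build_tts_command(text, selected_model, tts_servers):
    """Assemble a hypothetical command for the chosen text-to-speech server."""
    address = select_tts_server(selected_model, tts_servers)
    if address is None:
        raise LookupError(f"no text-to-speech server uses {selected_model!r}")
    return {"server": address, "speaker_model": selected_model, "text": text}


# Hypothetical deployment: one server per speaker model.
tts_servers = {
    "tts-1.example.net": "voice_a",
    "tts-2.example.net": "voice_b",
}
command = build_tts_command("Meeting moved to 3 pm.", "voice_b", tts_servers)
print(command["server"])  # tts-2.example.net
```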

27. Software embodied in a non-transitory tangible computer-readable medium, operable to perform the steps of:
- determining a speech feature vector for a voice associated with a source of a first text message;
  
  comparing the speech feature vector to a plurality of speaker models, wherein the plurality of speaker models are unrelated to the source of the first text message;
  
  based on the comparison, selecting one of the speaker models as a preferred match for the voice;
  
  associating the selected speaker model with the source of the first text message;
  
  selecting one of the speaker models as a default selection if the speech feature vector cannot be determined;
  
  generating speech from the text message based on the selected speaker model; and
  
  automatically generating speech from subsequent text messages received from the source of the first text message, based on the selected speaker model.
- - 28. The software of claim 27, wherein the step of determining comprises:
    - receiving a sample of the voice; and
      
      analyzing the sample to determine the speech feature vector for the voice.
  - 29. The software of claim 27, wherein the step of determining comprises:
    - requesting an endpoint that is the source of the text message to provide the speech feature vector; and
      
      receiving the speech feature vector from the endpoint.
  - 30. The software of claim 27, further operable to perform the steps of:
    - generating a plurality of model voice samples; and
      
      analyzing the model voice samples to determine the speaker model for each model voice sample.

31. A system, comprising:
- means for determining a speech feature vector for a voice associated with a source of a first text message;
  
  means for comparing the speech feature vector to a plurality of speaker models, wherein the plurality of speaker models are unrelated to the source of the first text message;
  
  means for selecting one of the speaker models as a preferred match for the voice based on the comparison;
  
  means for associating the selected speaker model with the source of the first text message;
  
  means for selecting one of the speaker models as a default selection if the speech feature vector cannot be determined;
  
  means for generating speech from the text message based on the selected speaker model; and
  
  means for automatically generating speech from subsequent text messages received from the source of the first text message, based on the selected speaker model.
- - 32. The system of claim 31, wherein the means for determining comprise:
    - means for receiving a sample of the voice; and
      
      means for analyzing the sample to determine the speech feature vector for the voice.
  - 33. The system of claim 31, wherein the means for determining comprise:
    - means for requesting an endpoint that is the source of the text message to provide the speech feature vector; and
      
      means for receiving the speech feature vector from the endpoint.
  - 34. The system of claim 31, further comprising:
    - means for generating a plurality of model voice samples; and
      
      means for analyzing the model voice samples to determine the speaker model for each model voice sample.

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Cutaia, Nicholas J.
Primary Examiner(s): Opsasnick; Michael N
Application Number: US10/434,683
Publication Number: US 20040225501A1
Time in Patent Office: 3,028 Days
Field of Search: 704/260
US Class Current: 704/260
CPC Class Codes:
G10L 13/033 Voice editing, e.g. manipul...
G10L 13/047 Architecture of speech synt...
