Speech processing apparatus and method

US 20030144841A1
Filed: 01/25/2002
Published: 07/31/2003
Est. Priority Date: 01/25/2002
Status: Active Grant

Abstract

A computer system is provided including a control module 20 and a data collection module 22, which generate user interfaces enabling a user to identify a vocabulary and a number of speakers from whom utterances are to be obtained. The data collection module 22 then co-ordinates the collection of utterance data for the words in the vocabulary from these speakers and stores the data in a speaker database 24. When a satisfactory set of utterances has been collected, the utterances are passed to a model generation module 25, which generates a speech model using the utterances. The speech model is stored by the model generation module 25 in a model database 26. The generated model stored within the model database 26 can then be tested using a testing module 27 and other utterances stored within the speaker database 24. If the performance of the model is unsatisfactory, further or different utterances can be used to generate new models for storage within the model database 26. When a speech model is determined to be satisfactory, the control module 20 can invoke the output module 28 to output a copy of the model.

25 Claims

1. Apparatus for generating and testing speech models, said apparatus comprising:
- a data collection unit operable to collect and store utterance data indicative of the pronunciation of one or more words by one or more speakers;
  
  a speech model generation unit operable to generate speech models of words, utterances of which have been collected by said data collection unit; and
  
  a testing unit operable to test the accuracy of the matching of utterances collected by said data collection unit to speech models generated by said speech model generation unit and to generate a visual display of the results of said testing by said testing unit.
  - 2. Apparatus in accordance with claim 1, wherein said data collection unit comprises:
    - a vocabulary database operable to store word identifiers indicative of one or more words;
      
      a speaker database operable to store speaker identifiers indicative of speakers from whom utterance data is to be collected; and
      
      a co-ordination unit operable;
      
      to generate a first user interface to enable user input of speaker identifiers for storage in said speaker database;
      
      to generate a second user interface to enable user input of word identifiers for storage in said vocabulary database; and
      
      to generate a third user interface operable to generate a series of prompts to prompt the utterance of words corresponding to word identifiers stored in said vocabulary database by speakers identified by speaker identifiers stored in said speaker database and to synchronise said series of prompts with the collection of utterance data indicative of pronunciation of words.
  - 3. Apparatus in accordance with claim 2, wherein said series of prompts generated by said third user interface comprises a generation of a series of visual instructions to speakers identified by speaker identifiers in said speaker database to pronounce words identified by word identifiers stored in said word database.
  - 4. Apparatus in accordance with claim 3, wherein said third user interface is operable to generate a series of prompts comprising user instructions to stay quiet immediately preceding and succeeding instructions to pronounce a word identified by a word identifier, wherein said collection of utterance data is performed whilst all of said instructions are displayed.
  - 5. Apparatus in accordance with claim 2, wherein said third user interface is operable to display a waveform indicative of collected utterance data whilst said utterance data is being collected.
  - 6. Apparatus in accordance with claim 2, wherein said data collection unit is operable subsequent to the collection of an item of utterance data to generate a user interface to display a waveform corresponding to said collected utterance data and to permit user deletion of stored utterance data displayed by said data collection unit.
  - 7. Apparatus in accordance with claim 2, wherein said data collection unit is operable subsequent to the collection of an item of utterance data to output audio data corresponding to said collected utterance data and to permit user deletion of stored utterance data output by said data collection unit.
  - 8. Apparatus in accordance with claim 2, wherein said data collection unit further comprises a selection unit operable to generate a user interface enabling user selection of speaker identifiers stored in said speaker database and word identifiers stored in said vocabulary database wherein said co-ordination unit is operable to generate a third user interface to generate a series of prompts to prompt the utterance of words corresponding to selected word identifiers by speakers corresponding to selected speaker identifiers selected utilizing said selection unit.
  - 9. Apparatus in accordance with claim 2, wherein said co-ordination unit is operable to generate said prompts to prompt the utterance of words by identified speakers a number of times for each of said words and speakers wherein said number of prompts is determined by the number of items of utterance data stored by said data collection unit associated with said words and said speakers.
  - 10. Apparatus in accordance with claim 1, wherein said speech model generation unit comprises a selector for selecting utterance data wherein said speech model generation unit is operable to generate speech models utilizing said utterances selected by said selector.
  - 11. Apparatus in accordance with claim 10, wherein said selector comprises:
    - a user interface enabling a user to identify words and speakers associated with utterance data stored by said data collection unit, wherein said speech model generation unit is operable to generate speech models of words utilizing said utterance data associated with said identified speakers and words.
  - 12. Apparatus in accordance with claim 11, wherein said speech model generation unit further comprises a data store operable to store constraint data wherein said speech model generation unit is operable to generate speech models of words where said identification of words and speakers made utilizing said selector fulfills the requirements defined by said constraint data.
  - 13. Apparatus in accordance with claim 12, wherein said speakers are each associated with gender data and wherein said constraint data comprises data identifying a relationship that the gender data of said identified speakers must fulfil.
  - 14. Apparatus in accordance with claim 12, wherein said constraint data comprises data indicative of a number of utterances wherein said speech model generation unit is operable to generate speech models of words when said data collection unit has stored utterance data associated with said identified speakers selected by said selector corresponding to the number of repetitions identified by said constraint data of said words selected by said selector.
  - 15. Apparatus in accordance with claim 1, wherein said testing unit is operable to generate a user interface to enable a user to identify speech models generated by said speech generation unit and to select utterance data stored by said data collection unit and to test said identified models utilizing said selected utterances.
  - 16. Apparatus in accordance with claim 15, wherein said testing unit is operable to generate a user interface enabling a user to identify sets of utterances collected by said data collection unit corresponding to utterances indicative of the pronunciation of different words by different speakers, said testing unit being responsive to the selection of said sets of utterance data to test speech models generated by said speech model generation unit utilizing said selected sets of utterance data.
  - 17. Apparatus in accordance with claim 16, wherein said testing unit is operable to enable user selection of sets of utterances comprising utterance data collected from the speakers from whom utterance data was utilized by said speech generation unit to generate said speech models being tested.
  - 18. Apparatus in accordance with claim 17, wherein said testing unit is operable to enable user selection of sets of utterances comprising utterance data collected from speakers, utterance data from whom was not utilized by said speech generation unit to generate said speech models being tested.
  - 19. A storage medium having computer implementable instructions stored thereon for generating within a programmable computer apparatus in accordance with any of claims 1 to 18.
  - 20. A storage medium in accordance with claim 19, comprising a disk.
  - 21. A disk in accordance with claim 20 comprising a magnetic, optical or magneto-optical disk.
  - 22. A storage medium in accordance with claim 19 comprising an electrical signal in a communications network.
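The constraint check described in claims 12 to 14 can be illustrated with a minimal Python sketch. The claims do not say how constraint data is represented or which gender relationship is required, so the `Constraints` fields, the "both genders represented" rule, and the function name `selection_is_valid` are assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str  # speaker identifier
    word: str     # word identifier


@dataclass(frozen=True)
class Constraints:
    # Hypothetical representation of the "constraint data" of claim 12.
    min_repetitions: int        # utterances needed per (speaker, word) pair
    require_both_genders: bool  # one possible gender relationship (assumed)


def selection_is_valid(utterances, speakers, words, genders, constraints):
    """Return True if the selected speakers and words satisfy the constraints."""
    if constraints.require_both_genders:
        selected = {genders[s] for s in speakers}
        if not {"F", "M"} <= selected:
            return False
    # Count stored utterances for every (speaker, word) combination.
    counts = Counter((u.speaker, u.word) for u in utterances)
    return all(
        counts[(s, w)] >= constraints.min_repetitions
        for s in speakers
        for w in words
    )
```

In this sketch, model generation would only proceed when `selection_is_valid` returns True; otherwise the user would need to change the selection or collect more utterances.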

23. A method of collecting utterance data comprising the steps of:
- displaying a first user interface to enable user input of speaker identifiers and storing said speaker identifiers in a speaker database;
  
  displaying a second user interface to enable user input of word identifiers and storing said word identifiers in a vocabulary database;
  
  displaying a series of prompts to prompt the utterance of words corresponding to word identifiers stored in said vocabulary database by speakers identified by speaker identifiers stored in said speaker database; and
  
  synchronising the collection of utterance data indicative of the pronunciation of words with said series of prompts.
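As a rough illustration of the prompting and synchronisation steps of claim 23, the Python sketch below generates a prompt sequence for every speaker and word and pairs each prompt with the audio captured while it is shown. The quiet/speak/quiet pattern follows dependent claim 4; the function names, the prompt wording, and the `record` callback are hypothetical and are not taken from the patent.

```python
from itertools import product


def prompt_schedule(speakers, words, repetitions=1):
    """Yield (speaker, word, instruction) tuples in display order."""
    for speaker, word in product(speakers, words):
        for _ in range(repetitions):
            # A quiet period precedes and follows each spoken word.
            yield speaker, word, "Stay quiet"
            yield speaker, word, f"Say: {word}"
            yield speaker, word, "Stay quiet"


def collect(speakers, words, record, repetitions=1):
    """Pair every prompt with the audio captured while it was displayed.

    `record` is a hypothetical callback that captures audio for the
    duration of one displayed instruction and returns it.
    """
    database = []
    for speaker, word, instruction in prompt_schedule(speakers, words, repetitions):
        audio = record(instruction)
        database.append(
            {"speaker": speaker, "word": word,
             "instruction": instruction, "audio": audio}
        )
    return database
```

Because capture is driven by the same loop that issues the prompts, each stored item is labelled with the speaker and word it belongs to without any separate alignment step.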

24. Apparatus for collecting utterance data indicative of the pronunciation of one or more words by one or more speakers, the apparatus comprising:
- a data collection unit operable to collect and store utterance data indicative of the pronunciation of one or more words by one or more speakers;
  
  a vocabulary database operable to store word identifiers indicative of one or more words;
  
  a speaker database operable to store speaker identifiers indicative of speakers from whom utterance data is to be collected; and
  
  a co-ordination unit, said co-ordination unit being operable;
  
  to generate a first user interface to enable user input of speaker identifiers for storage in said speaker database;
  
  to generate a second user interface to enable user input of word identifiers for storage in said vocabulary database; and
  
  to generate a third user interface operable to generate a series of prompts to prompt the utterance of words corresponding to word identifiers stored in said vocabulary database by speakers identified by speaker identifiers stored in said speaker database and to synchronise said series of prompts with the collection of utterance data indicative of pronunciation of words.

25. A method of generating speech models comprising the steps of:
- providing a computer system operable to collect utterance data, to generate speech models utilising said collected utterance data and to test the accuracy of matching utterances to said generated speech models;
  
  collecting data indicative of the pronunciation of one or more words by one or more speakers utilising said apparatus;
  
  generating speech models utilizing said collected utterances;
  
  determining whether said accuracy of said generated models is satisfactory by testing said models utilizing said apparatus; and
  
  outputting speech models determined to be satisfactory in said determination step.
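The generate, test and output steps of claim 25 can be sketched as a simple loop in Python. The claim does not define how "satisfactory" is decided, so the numeric `threshold`, the `train` and `recognise` callbacks, and the function name are assumptions for illustration only; in the described system the decision may instead rest with a user reviewing displayed test results.

```python
def build_satisfactory_model(training_sets, test_utterances, train, recognise,
                             threshold=0.9):
    """Try successive sets of utterances until a model tests well enough.

    training_sets   -- iterable of utterance collections to train from
    test_utterances -- non-empty list of (word, audio) pairs used for testing
    train           -- hypothetical callback: utterances -> model
    recognise       -- hypothetical callback: (model, audio) -> word
    """
    for utterances in training_sets:
        model = train(utterances)
        # Accuracy is the fraction of test utterances matched to the right word.
        correct = sum(
            recognise(model, audio) == word for word, audio in test_utterances
        )
        accuracy = correct / len(test_utterances)
        if accuracy >= threshold:
            return model, accuracy  # satisfactory: ready to be output
    return None, None  # no candidate model met the threshold
```

Each pass through the loop corresponds to generating a new model from further or different utterances after an unsatisfactory test result.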

Current Assignee: Canon Europa N.V. (Canon Inc.)
Original Assignee: Canon Europa N.V. (Canon Inc.)
Inventors: Shao, Yuan

Granted Patent: US 7,054,817 B2
Field of Search
US Class Current: 704/251
CPC Class Codes

G10L 15/06 Creation of reference templ...

G10L 15/22 Procedures used during a sp...
