System and method for time aligning speech

US 5,333,275 A
Filed: 06/23/1992
Issued: 07/26/1994
Est. Priority Date: 06/23/1992
Status: Expired due to Term

Abstract

A method and system are provided for time aligning speech. Speech data is input representing speech signals from a speaker. An orthographic transcription is input including a plurality of words transcribed from the speech signals. A sentence model is generated indicating a selected order of the words in response to the orthographic transcription. In response to the orthographic transcription, word models are generated associated with respective ones of the words. The orthographic transcription is aligned with the speech data in response to the sentence model, to the word models and to the speech data.
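
The flow described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `PRONUNCIATIONS` dictionary, the function names, the use of one phone label per frame as the "speech data", and a simple 0/1 mismatch cost standing in for real acoustic scoring. It is not the patented implementation, which the claims tie to phonetic Hidden Markov Models.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation dictionary, formed independently of the speech data.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "hi": ["HH", "AY"],
    "there": ["DH", "EH", "R"],
}


def build_state_chain(words: List[str]) -> List[Tuple[int, str]]:
    """Sentence model (word order) plus word models (phone sequences),
    flattened into a left-to-right chain of (word_index, phone) states."""
    chain = []
    for word_index, word in enumerate(words):
        for phone in PRONUNCIATIONS[word]:
            chain.append((word_index, phone))
    return chain


def align(frames: List[str], words: List[str]) -> List[Tuple[str, int, int]]:
    """Return (word, first_frame, last_frame) for each transcribed word."""
    chain = build_state_chain(words)
    n_frames, n_states = len(frames), len(chain)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")

    inf = float("inf")
    cost = [[inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    cost[0][0] = 0 if frames[0] == chain[0][1] else 1

    # Dynamic programming: each frame either stays in a state or advances by one.
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = cost[t - 1][s]
            move = cost[t - 1][s - 1] if s > 0 else inf
            best, prev = (move, s - 1) if move < stay else (stay, s)
            local = 0 if frames[t] == chain[s][1] else 1
            cost[t][s] = best + local
            back[t][s] = prev

    # Backtrace from the final state to recover the state for every frame.
    state = n_states - 1
    path = [state]
    for t in range(n_frames - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()

    # Collapse the per-frame states into one time span per word.
    spans: List[List[int]] = []
    for t, s in enumerate(path):
        word_index = chain[s][0]
        if spans and spans[-1][0] == word_index:
            spans[-1][2] = t
        else:
            spans.append([word_index, t, t])
    return [(words[w], start, end) for w, start, end in spans]


frames = ["HH", "HH", "AY", "DH", "EH", "EH", "R"]
print(align(frames, ["hi", "there"]))
```

In this sketch the transcription fixes the order of the states, so the search only decides where each word begins and ends in the frame sequence.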

46 Claims

1. A system for time aligning speech, comprising:
- a data interface for inputting speech data representing speech signals from a speaker;
  
  circuitry for inputting an orthographic transcription including a plurality of words transcribed from said speech signals;
  
  circuitry coupled to said inputting circuitry for generating a sentence model indicating a selected order of said words in response to said orthographic transcription;
  
  circuitry coupled to said inputting circuitry for generating word models in response to said orthographic transcription, said word models being associated with respective ones of said words and being generated from pronunciation representations formed independent of said speech data; and
  
  circuitry coupled to said sentence model generating circuitry, to said word model generating circuitry and to said inputting circuitry, for aligning said orthographic transcription with said speech data in response to said sentence model, to said word models and to said speech data.
  - 2. The system of claim 1 wherein said data interface inputs said speech data from a speech signal source and wherein said speech signal source includes a publicly switched telephone network.
  - 3. The system of claim 2 wherein said speech signal source includes a T1 line from a publicly switched telephone network.
  - 4. The system of claim 1 wherein said data interface inputs said speech data from a speech signal source and wherein said speech signal source includes a wireless communications device.
  - 5. The system of claim 4 wherein said speech signal source includes a wireless communications device of an air traffic control system.
  - 6. The system of claim 1 and further comprising circuitry coupled between said inputting circuitry and said word model generating circuitry for generating a list of selected words in said orthographic transcription, and wherein said word model generating circuitry is operable to generate said word models associated with respective ones of said listed words.
  - 7. The system of claim 1 wherein said word model generating circuitry is operable to generate said word models each indicating a plurality of phonetic Hidden Markov Models.
  - 8. The system of claim 1 wherein said aligning circuitry is operable to time align said orthographic transcription with said speech data.

9. A system for time aligning speech, comprising:
- a data interface for inputting speech data representing speech signals from multiple interlocutors;
  
  circuitry for inputting an orthographic transcription including a plurality of words transcribed from said speech signals;
  
  circuitry coupled to said inputting circuitry for generating a sentence model indicating a selected order of said words in response to said orthographic transcription;
  
  circuitry coupled to said inputting circuitry for generating word models in response to said orthographic transcription, said word models being associated with respective ones of said words; and
  
  circuitry coupled to said sentence model generating circuitry, to said word model generating circuitry and to said inputting circuitry, for aligning said orthographic transcription with said speech data in response to said sentence model, to said word models and to said speech data.
  - 10. The system of claim 9 wherein said orthographic transcription indicates first and second ones of said words being simultaneous.
  - 11. The system of claim 10 wherein said sentence model generating circuitry is operable to generate said sentence model indicating said first and second words being simultaneous.
  - 12. The system of claim 11 wherein said aligning circuitry is operable to align said orthographic transcription with said speech data in response to said sentence model indicating said first and second words being simultaneous.
  - 13. The system of claim 12 wherein said aligning circuitry is operable to align said orthographic transcription with said speech data, such that a selected one of said first and second words is aligned with said speech data.
  - 14. The system of claim 9 wherein said speech data represent speech signals from multiple interlocutors in conversation.
  - 15. The system of claim 9 wherein said data interface is operable to:
    - input first speech data from a first channel, said first speech data representing speech signals from an associated first one of said interlocutors; and
      
      input second speech data from a second channel, said second speech data representing speech signals from an associated second one of said interlocutors.
  - 16. The system of claim 15 and further comprising circuitry for combining said first and second speech data, wherein said aligning circuitry is operable to align said orthographic transcription with said speech data in response to said combined speech data.
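
Claims 15 and 16 describe inputting speech data on two channels and combining it before alignment. A minimal sketch of one possible combining step follows. The claims do not say how the data are combined, so the function name `combine_channels`, the integer sample representation, the sample-wise averaging rule, and the zero padding of the shorter channel are all illustrative assumptions.

```python
from itertools import zip_longest
from typing import List


def combine_channels(first: List[int], second: List[int]) -> List[int]:
    """Average two channels sample by sample (assumes equal sample rates).

    The shorter channel is padded with silence (zeros) so that no samples
    from the longer channel are dropped.
    """
    return [(a + b) // 2 for a, b in zip_longest(first, second, fillvalue=0)]


print(combine_channels([100, 200, -50], [300, 0]))
```

Averaging rather than summing keeps the combined samples within the original sample range.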

17. A system for time aligning speech, comprising:
- a data interface for inputting speech data representing unscripted speech signals from a speaker;
  
  circuitry for inputting an orthographic transcription including a plurality of words transcribed from said unscripted speech signals;
  
  circuitry coupled to said inputting circuitry for generating a sentence model indicating a selected order of said words in response to said orthographic transcription;
  
  circuitry coupled to said inputting circuitry for generating word models in response to said orthographic transcription, said word models being associated with respective ones of said words; and
  
  circuitry coupled to said sentence model generating circuitry, to said word model generating circuitry and to said inputting circuitry, for aligning said orthographic transcription with said speech data in response to said sentence model, to said word models and to said speech data.
  - 18. The system of claim 17 and further comprising circuitry coupled between said inputting circuitry and said sentence model generating circuitry for normalizing said orthographic transcription, wherein said sentence model generating circuitry is operable to generate said sentence model in response to said normalized orthographic transcription.
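
Claim 18 adds a normalization step between transcription input and sentence model generation. The sketch below shows what such a step might look like. The specific rules (lowercasing, removing punctuation other than apostrophes, splitting on whitespace) and the name `normalize_transcription` are assumptions for illustration; the claim does not define them.

```python
import re
from typing import List


def normalize_transcription(text: str) -> List[str]:
    """Reduce a raw orthographic transcription to a clean word sequence."""
    text = text.lower()
    # Replace anything that is not a letter, apostrophe, or space with a space.
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


print(normalize_transcription("Hello, there! It's OK."))
```

The resulting word list is the kind of input from which an ordered sentence model could then be built.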

19. A method of time aligning speech using process circuitry performing the following steps comprising:
- inputting speech data representing speech signals from a speaker;
  
  inputting an orthographic transcription including a plurality of words transcribed from said speech signals;
  
  generating a sentence model indicating a selected order of said words in response to said orthographic transcription;
  
  in response to said orthographic transcription, generating word models from pronunciation representations formed independent of said speech data, said word models being associated with respective ones of said words; and
  
  aligning said orthographic transcription with said speech data in response to said sentence model, to said word models and to said speech data.
  - 20. The method of claim 19 wherein said step of inputting speech data comprises the step of inputting said speech data from a speech signal source.
  - 21. The method of claim 20 wherein said step of inputting speech data comprises the step of inputting said speech data from a publicly switched telephone network.
  - 22. The method of claim 21 wherein said step of inputting speech data comprises the step of inputting said speech data over a T1 line from a publicly switched telephone network.
  - 23. The method of claim 20 wherein said step of inputting speech data comprises the step of inputting said speech data from a wireless communications device.
  - 24. The method of claim 23 wherein said step of inputting speech data comprises the step of inputting said speech data from a wireless communications device of an air traffic control system.
  - 25. The method of claim 19 wherein said step of inputting speech data comprises the step of converting said speech signals into said speech data.
  - 26. The method of claim 19 and further comprising the step of generating a list of selected words in said orthographic transcription, such that said word models are associated with respective ones of said listed words.
  - 27. The method of claim 19 wherein said step of generating said plurality of word models comprises the step of generating said plurality of word models each indicating a plurality of phonetic Hidden Markov Models.
  - 28. The method of claim 19 wherein said step of aligning comprises the step of time aligning said orthographic transcription with said speech data.

29. A method of time aligning speech using process circuitry performing the following steps comprising:
- inputting speech data representing speech signals from multiple interlocutors;
  
  inputting an orthographic transcription including a plurality of words transcribed from said speech signals;
  
  generating a sentence model indicating a selected order of said words in response to said orthographic transcription;
  
  in response to said orthographic transcription, generating word models associated with respective ones of said words; and
  
  aligning said orthographic transcription with said speech data in response to said sentence model, to said word models and to said speech data.
  - 30. The method of claim 29 wherein said step of inputting said orthographic transcription comprises the step of inputting said orthographic transcription indicating first and second ones of said words being simultaneous.
  - 31. The method of claim 30 wherein said step of generating said sentence model comprises the step of generating said sentence model indicating said first and second words being simultaneous.
  - 32. The method of claim 31 wherein said step of aligning comprises the step of aligning said orthographic transcription with said speech data in response to said sentence model indicating said first and second words being simultaneous.
  - 33. The method of claim 32 wherein said step of aligning comprises the step of aligning said orthographic transcription with said speech data, such that a selected one of said first and second words is aligned with said speech data.
  - 34. The method of claim 29 wherein said step of inputting speech data comprises the step of inputting speech data representing speech signals from multiple interlocutors in conversation.
  - 35. The method of claim 29 wherein said step of inputting speech data comprises the steps of:
    - inputting first speech data from a first channel, said first speech data representing speech signals from an associated first one of said interlocutors; and
      
      inputting second speech data from a second channel, said second speech data representing speech signals from an associated second one of said interlocutors.
  - 36. The method of claim 35 and further comprising the step of combining said first and second speech data such that said orthographic transcription is aligned with said speech data in response to said combined speech data.

37. A method of time aligning speech using process circuitry performing the following steps comprising:
- inputting speech data representing unscripted speech signals from a speaker;
  
  inputting an orthographic transcription including a plurality of words transcribed from said unscripted speech signals;
  
  generating a sentence model indicating a selected order of said words in response to said orthographic transcription;
  
  in response to said orthographic transcription, generating word models associated with respective ones of said words; and
  
  aligning said orthographic transcription with said speech data in response to said sentence model, to said word models and to said speech data.
  - 38. The method of claim 37 and further comprising the step of normalizing said orthographic transcription, such that said sentence model is generated in response to said normalized orthographic transcription.

39. A system for time aligning speech, comprising:
- a data interface for inputting speech data representing unscripted speech signals from multiple interlocutors;
  
  circuitry for inputting an orthographic transcription including a plurality of words transcribed from said unscripted speech signals;
  
  circuitry coupled to said inputting circuitry for generating a sentence model indicating a selected order of said words in response to said orthographic transcription;
  
  circuitry coupled to said inputting circuitry for generating word models in response to said orthographic transcription, said word models being associated with respective ones of said words and being generated from pronunciation representations formed independent of said speech data; and
  
  circuitry coupled to said sentence model generating circuitry, to said word model generating circuitry and to said inputting circuitry, for aligning said orthographic transcription with said speech data in response to said sentence model, to said word models and to said speech data.
  - 40. The system of claim 39 and further comprising circuitry coupled between said inputting circuitry and said sentence model generating circuitry for normalizing said orthographic transcription, wherein said sentence model generating circuitry is operable to generate said sentence model in response to said normalized orthographic transcription.
  - 41. The system of claim 39 wherein said data interface is operable to:
    - input first speech data from a first channel, said first speech data representing speech signals from an associated first one of said interlocutors; and
      
      input second speech data from a second channel, said second speech data representing speech signals from an associated second one of said interlocutors.
  - 42. The system of claim 41 and further comprising circuitry for combining said first and second speech data, wherein said aligning circuitry is operable to align said orthographic transcription with said speech data in response to said combined speech data.
  - 43. The system of claim 39 wherein said orthographic transcription indicates first and second ones of said words being simultaneous.
  - 44. The system of claim 43 wherein said sentence model generating circuitry is operable to generate said sentence model indicating said first and second words being simultaneous.
  - 45. The system of claim 44 wherein said aligning circuitry is operable to align said orthographic transcription with said speech data in response to said sentence model indicating said first and second words being simultaneous.
  - 46. The system of claim 45 wherein said aligning circuitry is operable to align said orthographic transcription with said speech data, such that a selected one of said first and second words is aligned with said speech data.

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Hemphill, Charles T.; Fisher, Thomas D.; Wheatley, Barbara J.; Doddington, George R.
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/903,033
Time in Patent Office: 763 Days
Field of Search: 381/29-53, 395/2.52, 395/2.4
US Class Current: 704/243
CPC Class Codes:
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/18   using natural language mode...
G10L 15/193   Formal grammars, e.g. finit...
G10L 2015/088   Word spotting
G10L 25/78   Detection of presence or ab...
