Displaying text of speech in synchronization with the speech

US 20050203750A1
Filed: 03/11/2005
Published: 09/15/2005
Est. Priority Date: 03/12/2004
Status: Abandoned Application

Abstract

Displays a character string representing content of speech in synchronization with reproduction of the speech. An apparatus includes: a unit for obtaining scenario data representing the speech; a unit for dividing textual data resulting from recognition of the speech to generate pieces of recognition data; a unit for detecting in the scenario data a character matching each character contained in each piece of recognition data for which no matching character string has been detected, thereby detecting in the scenario data a character string that matches the piece of recognition data; and a unit for setting the display timing of each of the character strings contained in the scenario data to the timing at which speech recognized as the piece of recognition data that matches the character string is reproduced.
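The two-stage matching summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the names `Piece`, `find_exact`, `find_by_characters` and `set_display_timings` are hypothetical, the character-level fallback is approximated with Python's `difflib.SequenceMatcher` over a sliding window, and the `min_ratio` threshold of 0.5 is an arbitrary choice.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Piece:
    """One piece of recognition data (hypothetical structure)."""
    text: str     # text recognized for one segment of speech
    start: float  # time in seconds at which that segment is reproduced


def find_exact(scenario: str, piece: str, cursor: int) -> Optional[Tuple[int, int]]:
    """Character-string detection: exact occurrence at or after `cursor`."""
    i = scenario.find(piece, cursor)
    return (i, i + len(piece)) if i >= 0 else None


def find_by_characters(scenario: str, piece: str, cursor: int,
                       min_ratio: float = 0.5) -> Optional[Tuple[int, int]]:
    """Character-level fallback: slide a window of len(piece) over the
    scenario and keep the window that shares the most characters."""
    best, best_ratio = None, min_ratio
    n = len(piece)
    for i in range(cursor, max(cursor, len(scenario) - n) + 1):
        ratio = SequenceMatcher(None, scenario[i:i + n], piece).ratio()
        if ratio > best_ratio:
            best, best_ratio = (i, i + n), ratio
    return best


def set_display_timings(scenario: str, pieces: List[Piece]) -> List[Tuple[float, str]]:
    """Assign each matched scenario string the start time of its piece."""
    cursor, timings = 0, []
    for p in pieces:
        span = (find_exact(scenario, p.text, cursor)
                or find_by_characters(scenario, p.text, cursor))
        if span is None:
            continue  # no match at either level; this sketch skips the piece
        timings.append((p.start, scenario[span[0]:span[1]]))
        cursor = span[1]
    return timings


scenario = "the quick brown fox jumps over the lazy dog"
pieces = [Piece("the quick", 0.0), Piece("brawn fox", 1.2), Piece("jumps over", 2.5)]
timings = set_display_timings(scenario, pieces)
```

Here the misrecognized piece "brawn fox" has no exact match, so the fallback selects the scenario text "brown fox" and displays the scenario's wording at 1.2 seconds.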

29 Citations

20 Claims

1. ) A setting apparatus comprising setting means for setting a timing of displaying text of speech in synchronization with reproduction of said speech, the text of said speech being predetermined, said setting means comprising:
- a scenario data obtaining unit for obtaining scenario data representing content of said speech;
  
  a speech recognition unit for dividing textual data resulting from recognition of said speech being reproduced to generate a plurality of pieces of recognition data;
  
  a character string detection unit for detecting in said scenario data a character string that matches each of said plurality of pieces of recognition data;
  
  a character detection unit for detecting a character string that matches the recognition data from said scenario data by detecting a character contained in the recognition data for each recognition data with which said character string detection unit has detected no matching character string; and
  
  a display setting unit for setting the display timing of displaying each of the character strings contained in said scenario data to the timing at which speech recognized as a piece of recognition data that matches said character string is reproduced.
  - 2. ) The setting apparatus according to claim 1, further comprising a phoneme detection unit for detecting in a phonetic representation of said scenario data a phoneme that matches a phoneme contained in each character in said recognition data for which no matching character has been detected by said character detection unit, wherein said character detection unit detects in said scenario data, as a character that matches a character for which a matching phoneme has been found in said recognition data by said phoneme detection unit, a character containing said phoneme.
  - 3. ) The setting apparatus according to claim 2, further comprising a phoneticizing unit for generating a plurality of candidate phonetic representations of said scenario data, wherein said phoneme detection unit detects, in any of said plurality of candidate phonetic representations generated by said phoneticizing unit, a phoneme that matches the phoneme contained in a phonetic representation of a character in said recognition data for which no matching character is found in said scenario data by said character detection unit.
  - 4. ) The setting apparatus according to claim 3, wherein:
    - said phoneticizing unit generates each of said plurality of candidate phonetic representations in said scenario data along with information indicating the likelihood that said scenario data is sounded out in accordance with the candidate phonetic representation; and
      
      said phoneme detection unit compares a phoneme contained in a phonetic representation of a character contained in said recognition data with said plurality of candidate phonetic representations in descending order of likelihood of being sounded out.
  - 5. ) The setting apparatus according to claim 1, further comprising a reliability calculating unit for calculating reliability which represents the likelihood that each of said plurality of pieces of recognition data matches one character string, wherein:
    - said character string detection unit determines that said character string detection unit cannot detect any character string that matches a piece of recognition data having a reliability lower than a predetermined reference reliability if said character string detection unit cannot detect the character string that matches the character string following said low-reliability data.
  - 6. ) The setting apparatus according to claim 1, further comprising a reliability calculating unit for calculating reliability which represents a likelihood that each of said plurality of pieces of recognition data matches one character string, wherein:
    - said display setting unit makes a setting that, if the reliability associated with a character string to be displayed first in two successive character strings among said plurality of character strings in said scenario data is higher than the reliability associated with the next character string to be displayed in said two successive character strings, causes a concatenated character string including said character string to be displayed first and said next character string appended to said first character string to be displayed at a time point at which said first character string should be displayed.
  - 7. ) The setting apparatus according to claim 6, wherein said reliability calculating unit produces a higher reliability for a piece of recognition data for which a matching character string has been detected by said character string detection unit than the reliability of a piece of recognition data for which a matching character string has been detected by said character detection unit.
  - 8. ) The setting apparatus according to claim 6, further comprising a phoneme detection unit for detecting in a phonetic representation of said scenario data a phoneme that matches a phoneme contained in a character in said recognition data for which no matching character has been detected by said character detection unit, wherein said character detection unit detects in said scenario data, as a character that matches a character in said recognition data for which a matching phoneme has been detected by said phoneme detection unit, a character containing said phoneme;
    - and said reliability calculating unit produces a lower reliability for a piece of recognition data containing a character for which a matching phoneme has been detected by said phoneme detection unit than the reliability of a piece of recognition data containing a character for which no matching phoneme has been detected by said phoneme detection unit but for which a matching character has been detected by said character detection unit.
  - 9. ) The setting apparatus according to claim 1, wherein said speech recognition unit further generates a speech recognition certainty factor indicating the possibility that each of said plurality of pieces of recognition data resulting from speech recognition matches the content of speech being reproduced;
    - and said character string detection unit finds a character string that matches a piece of recognition data having a higher speech recognition certainty factor prior to finding a piece of recognition data having a lower speech recognition certainty factor and, if said character string detection unit detects a first character string that matches a first piece of said recognition data and a second character string that matches a second piece of said recognition data, detects a character string following said first character string and preceding said second character string as a character string that matches the piece of recognition data following said first piece of recognition data and preceding the second piece of recognition data.
  - 10. ) The setting apparatus according to claim 1, wherein said display setting unit makes a setting that causes a piece of recognition data for which no matching character string has been detected in said scenario data by said character string detection unit to be displayed during reproduction of speech recognized as said piece of recognition data through speech reproduction.
  - 20. ) A computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing function of a setting apparatus, the computer readable program code means in said computer program product comprising computer readable program code means for causing a computer to effect the functions of claim 1.
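
The phonetic fallback of claims 2 to 4 can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `find_phonetic_match`, the toy table of candidate readings, and the phone strings and likelihood values in it are all hypothetical. The sketch only shows the ordering idea of claim 4, in which candidate phonetic representations of the scenario data are compared in descending order of likelihood.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate readings for scenario tokens: (reading, likelihood).
SCENARIO_CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "read": [("R IY D", 0.6), ("R EH D", 0.4)],
    "lead": [("L IY D", 0.7), ("L EH D", 0.3)],
}


def find_phonetic_match(recognized_reading: str,
                        candidates: Dict[str, List[Tuple[str, float]]]) -> Optional[str]:
    """Return the scenario token whose candidate reading equals the
    recognized reading, trying candidates from most to least likely."""
    ranked = sorted(
        ((likelihood, token, reading)
         for token, readings in candidates.items()
         for reading, likelihood in readings),
        key=lambda item: -item[0],
    )
    for _likelihood, token, reading in ranked:
        if reading == recognized_reading:
            return token
    return None


# The recognizer produced "red" (reading "R EH D"); no character of "red"
# fully matches the scenario, but the second reading of "read" does.
matched = find_phonetic_match("R EH D", SCENARIO_CANDIDATES)
```

A match found this way would, per claim 8, carry a lower reliability than a match found at the character level.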

11. ) A setting apparatus for setting the timing of displaying text of speech in synchronization with reproduction of said speech, the text of said speech being predetermined, said setting apparatus comprising:
- a reliability obtaining unit for obtaining, in connection with each of a plurality of character strings contained in scenario data representing the content of said speech being reproduced, a time point at which said character string should be displayed and reliability indicating the likelihood that speech representing said character string is reproduced at said time point; and
  
  a display setting unit for making a setting that, if the reliability associated with a character string to be displayed first in two successive character strings among said plurality of character strings is higher than the reliability associated with the next character string to be displayed in said two successive character strings, causes a concatenated character string including said character string to be displayed first and said next character string appended to said first character string to be displayed at a time point at which said first character string should be displayed.
  - 12. ) The setting apparatus according to claim 11, wherein said display setting unit makes a setting that, if the reliability associated with said character string to be displayed first is higher than the reliability associated with the succeeding character string that follows said character string to be displayed subsequently, causes a concatenated character string consisting of said concatenated character string and said succeeding character string appended to said concatenated character string to be displayed at a time point at which said character string to be displayed first should be displayed.
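
The reliability-based concatenation of claims 11 and 12 can be sketched as a single greedy pass. This is one reading of the claim language under stated assumptions: the `Caption` structure, the function name `merge_by_reliability` and the `sep` separator are hypothetical, and the reliability values are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Caption:
    """A scenario character string with its timing (hypothetical structure)."""
    time: float         # time point at which the string should be displayed
    text: str           # character string from the scenario data
    reliability: float  # likelihood that the string is spoken at `time`


def merge_by_reliability(captions: List[Caption], sep: str = " ") -> List[Tuple[float, str]]:
    """While the first string of a group is more reliable than the string
    that follows, append that string and show the group at the first time."""
    merged = []
    i = 0
    while i < len(captions):
        head = captions[i]
        text = head.text
        j = i + 1
        # Claim 11 covers the first append; claim 12 extends it to later strings.
        while j < len(captions) and head.reliability > captions[j].reliability:
            text += sep + captions[j].text
            j += 1
        merged.append((head.time, text))
        i = j
    return merged


captions = [
    Caption(0.0, "good", 0.9),
    Caption(0.8, "morning", 0.4),
    Caption(1.5, "everyone", 0.6),
    Caption(2.4, "welcome", 0.95),
]
schedule = merge_by_reliability(captions)
```

In this example the two low-reliability strings are shown early, together with the reliable string "good", rather than at their own uncertain time points.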

13. ) A program that causes a computer to function as a setting apparatus for setting the timing of displaying text of speech in synchronization with reproduction of said speech, the text of said speech being predetermined, said program causing said computer to function as:
- a scenario data obtaining unit for obtaining scenario data representing the content of said speech;
  
  a speech recognition unit for dividing textual data resulting from recognition of said speech being reproduced to generate a plurality of pieces of recognition data;
  
  a character string detection unit for detecting in said scenario data a character string that matches each of said plurality of pieces of recognition data;
  
  a character detection unit for detecting a character string that matches the recognition data from said scenario data by detecting the character contained in the recognition data for each recognition data with which said character string detection unit has detected no matching character string; and
  
  a display setting unit for setting the display timing of displaying each of character strings contained in said scenario data to the timing at which speech recognized as the piece of recognition data that matches said character string is reproduced.
  - 15. ) A recording medium on which the program according to claim 13 or 14 is recorded.

14. ) A program that causes a computer to function as a setting apparatus for setting the timing of displaying text of speech in synchronization with reproduction of said speech, the text of said speech being predetermined, said program causing said computer to function as:
- a reliability obtaining unit for obtaining in combination with each of a plurality of character strings contained in scenario data representing the content of said speech being reproduced, a time point at which said character string should be displayed and reliability indicating the likelihood that speech representing said character string is reproduced at said time point; and
  
  a display setting unit for making a setting that, if the reliability associated with a character string to be displayed first in two successive character strings among said plurality of character strings is higher than the reliability associated with the next character string to be displayed in said two successive character strings, causes a concatenated character string consisting of said character string to be displayed first and said next character string appended to said first character string to be displayed at a time point at which said first character string should be displayed.

16. ) A method for setting the timing of displaying text of speech in synchronization with reproduction of said text of speech, the text of said speech being predetermined, said method using a computer to perform:
- a scenario data obtaining step of obtaining scenario data representing the content of said speech;
  
  a speech recognition step of dividing textual data resulting from recognition of said speech being reproduced to generate a plurality of pieces of recognition data;
  
  a character string detecting step of detecting in said scenario data a character string that matches each of said plurality of pieces of recognition data;
  
  a character detection step of detecting a character string that matches the recognition data from said scenario data by detecting the character contained in the recognition data for each recognition data with which said character string detecting step has detected no matching character string; and
  
  a display setting step of setting the display timing of displaying each of character strings contained in said scenario data to the timing at which speech recognized as the piece of recognition data that matches said character string is reproduced.

17. ) A method comprising setting the timing of displaying text of speech in synchronization with reproduction of said text of speech, the text of said speech being predetermined, said method using a computer to perform:
- a reliability obtaining step of obtaining in connection with each of a plurality of character strings contained in scenario data representing the content of said speech being reproduced, a time point at which said character string should be displayed and reliability indicating the likelihood that speech representing said character string is reproduced at said time point; and
  
  a display setting step of making a setting that, if the reliability associated with a character string to be displayed first in two successive character strings among said plurality of character strings is higher than the reliability associated with the next character string to be displayed in said two successive character strings, causes a concatenated character string consisting of said character string to be displayed first and said next character string appended to said first character string to be displayed at a time point at which said first character string should be displayed.
  - 18. ) An article of manufacture comprising a computer usable medium having computer readable program code means embodied therein for causing setting of the timing of displaying text of speech in synchronization with reproduction of said text of speech, the computer readable program code means in said article of manufacture comprising computer readable program code means for causing a computer to effect the steps of claim 17.
  - 19. ) A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for setting the timing of displaying text of speech in synchronization with reproduction of said text of speech, said method steps comprising the steps of claim 17.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Miyamoto, Kohtaroh; Shoji, Midori
Application Number: US11/077,586
Publication Number: US 20050203750A1
US Class Current: 704/276
CPC Class Codes: G10L 15/26 Speech to text systems G10L...
