Subtitle generation and retrieval combining document with speech recognition
First Claim
1. An apparatus for recognizing speech in a presentation to generate a subtitle corresponding to the speech, said apparatus comprising:
a text extraction unit that receives presentation text and its attributes from a presentation document, and stores said text and attributes in a text attribute database on a page-by-page basis, wherein the attributes comprise a title, character size, character underlining, or boldface character;
a morphological analysis unit that morphologically analyzes the presentation text stored in the text attribute database, decomposes said presentation text into words, and stores the words in a word attribute database;
a common keyword generation unit that extracts the words and their attributes from the word attribute database, determines whether or not a word has been successfully extracted, and, if it is determined that the word extraction is successful, initializes the attribute weights of the words, extracts the attribute weights from an attribute weight database, and sums them, extracts keywords that are found in the presentation document and assigns weights to the keywords, then selects, as an additional keyword to add to the keyword database, any word that has been determined, based on time and attribute weight, to represent a high level of importance among the words contained in the presentation;
a dictionary registration unit that adds the keywords registered in a keyword database to a dictionary database that is consulted at the time of speech recognition;
a voice recognition unit that recognizes the speech in the presentation in consultation with the dictionary database by:
acquiring, at every moment, a correspondence between the lapse of time from the start of the presentation and the result of voice recognition, and storing the correspondence between the time and the result of voice recognition in a subtitle database;
a page-time recording unit that detects page-changing events and stores them as timestamps in a page-time database;
a common keyword regeneration unit that initializes the keyword database, extracts a word, an attribute of the word, and information about the page where the word appeared from the word attribute database, and further assigns weight depending on the number of times the keyword appeared in the speech of the presentation;
a display control unit that reads a correspondence between the time and the result of speech recognition from the subtitle database, and displays said correspondence on a subtitle candidate display region, causes keywords stored in the keyword database, presentation text stored in the text attribute database, and a master subtitle stored in a master subtitle database to cooperate together for display as a subtitle to the presentation, and accesses the page-time database and specifies the page corresponding to the result of voice recognition on the basis of the time information;
a display unit comprising:
the subtitle candidate display region, a common keyword list display region, a presentation text display region, and a master subtitle display region;
a speaker note generation unit that generates speaker notes from subtitles stored in the subtitle database and embeds them in presentation documents;
the text attribute database;
the word attribute database that stores the words obtained as a result of the decomposition performed by the morphological analysis unit, and their attributes;
the attribute weight database that stores presentation word attributes and their assigned weights;
the keyword database that stores the weighted words as keywords;
the dictionary database;
the subtitle database that stores, together with the time, the result of speech recognition as the subtitle;
the page-time database that records the time at which a page is turned and the time at which the next page is turned, and calculates the weight of the keywords in the page based on the duration during which the page in question is displayed in the presentation, when it is determined that extraction of the word has been successful; and
a master subtitle database that stores master subtitles on a page-by-page basis.
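The common keyword generation unit above sums per-word attribute weights and promotes high-scoring words to keywords. A minimal Python sketch of that scoring follows; the attribute names, weight values, and importance threshold are all hypothetical, since the claim does not specify them.

```python
# A minimal sketch of the common keyword generation step: per-attribute
# weights are summed for each word, and words whose total weight clears a
# threshold become keywords. Attribute names, weight values, and the
# threshold are all hypothetical; the patent does not specify them.

ATTRIBUTE_WEIGHTS = {      # stands in for the attribute weight database
    "title": 3.0,
    "large_font": 2.0,
    "underline": 1.5,
    "bold": 1.5,
}

def generate_keywords(word_attributes, threshold=2.0):
    """word_attributes maps each word to the list of attributes it carries."""
    keywords = {}
    for word, attrs in word_attributes.items():
        weight = sum(ATTRIBUTE_WEIGHTS.get(a, 0.0) for a in attrs)
        if weight >= threshold:        # "high level of importance"
            keywords[word] = weight    # stands in for the keyword database
    return keywords

words = {"neural": ["title", "bold"], "the": [], "network": ["large_font"]}
print(generate_keywords(words))   # → {'neural': 4.5, 'network': 2.0}
```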
8 Assignments
0 Petitions
Abstract
Subtitle generation methods and apparatus are provided that recognize voice in a presentation to generate subtitles thereof, together with retrieval apparatus for retrieving character strings by use of the subtitles. An apparatus of the present invention includes: an extraction unit for extracting text from presentation documents; an analysis unit for morphologically analyzing the text to decompose it into words; a generation unit for generating common keywords by assigning weights to words; a registration unit for adding the common keywords to a voice recognition dictionary; a recognition unit for recognizing voice in a presentation; a record unit for recording the correspondence between page and time by detecting page-switching events; a regeneration unit for regenerating common keywords by further referring to the correspondence between page and time; a control unit for controlling the display of subtitles, common keywords, text, and master subtitles; and a note generation unit for generating speaker notes from subtitles.
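The correspondence between page and time that the abstract describes amounts to a lookup from a recognition timestamp to the page on screen at that moment. The sketch below illustrates one way to do this; the class and method names are assumptions, not taken from the patent.

```python
# An illustrative sketch of the page-time correspondence: page-switch
# events are recorded as timestamps, and a recognition result's timestamp
# is later mapped back to the page that was on screen. Class and method
# names are assumptions, not taken from the patent.
import bisect

class PageTimeDatabase:
    def __init__(self):
        self.times = []   # timestamps of page-change events, in order
        self.pages = []   # page shown from the matching timestamp onward

    def record_page_change(self, time_s, page):
        # assumes events arrive in chronological order
        self.times.append(time_s)
        self.pages.append(page)

    def page_at(self, time_s):
        # find the last page change at or before time_s
        i = bisect.bisect_right(self.times, time_s) - 1
        return self.pages[i] if i >= 0 else None

db = PageTimeDatabase()
db.record_page_change(0.0, 1)
db.record_page_change(42.5, 2)
print(db.page_at(50.0))   # → 2
```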
15 Claims
1. An apparatus for recognizing speech in a presentation to generate a subtitle corresponding to the speech, as set forth above under First Claim. (Dependent claims: 2, 3, 4, 5)
6. A method of causing a computer to combine processing of a document having a plurality of pages with processing of speech generated with reference to the document, comprising the steps of:
receiving presentation text and its attributes from the document;
storing the presentation text and the attributes on a page-by-page basis, wherein said attributes comprise a title, character size, character underlining, or boldface character;
decomposing the presentation text into words;
storing the words in a word attribute database;
extracting the words and their attributes from the word attribute database;
accessing a keyword database;
extracting the keywords that are common in the document;
assigning weight to the keywords depending on the number of times each keyword appeared in the speech of the presentation and on their attributes;
recognizing the speech in the presentation in consultation with a dictionary database by:
acquiring, at every moment, a correspondence between the lapse of time from the start of the presentation and the result of speech recognition, and storing the correspondence between the time and the result of speech recognition in a subtitle database;
accessing a page-time database that records the time at which a page is turned and the time at which the next page is turned, and calculates the weight of the keywords in the page based on the duration during which the page in question is displayed in the presentation;
specifying the page corresponding to the result of voice recognition on the basis of the time information;
wherein the computer determines, among subtitles obtained by recognizing the speech, a specific subtitle obtained by recognizing speech generated with reference to a specific page of the document by:
deriving a correspondence between the time and the result of speech recognition from the subtitle database, and displaying it on a subtitle candidate display region; and
causing keywords stored in the keyword database, presentation text stored in the text attribute database, and a master subtitle stored in a master subtitle database to cooperate together for display; and
generating speaker notes from subtitles stored in the subtitle database.
(Dependent claims: 7, 8, 9, 10, 11)
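The method above weights a keyword both by how long its page was displayed and by how often it occurred in the recognized speech, but the claim gives no combining formula. The sketch below assumes a simple linear combination purely for illustration; the function name and scale factors are hypothetical.

```python
# The claim states that a keyword's weight depends on how long its page was
# displayed and how often it occurred in the recognized speech, but gives no
# formula. This sketch assumes a simple linear combination purely for
# illustration.

def regenerate_weight(base_weight, page_duration_s, spoken_count,
                      duration_scale=0.25, spoken_scale=1.0):
    # scale the attribute-based weight by display duration, then credit
    # each spoken occurrence of the keyword
    return (base_weight * (1.0 + duration_scale * page_duration_s)
            + spoken_scale * spoken_count)

# base weight 2.0, page shown for 30 s, keyword spoken 4 times:
print(regenerate_weight(2.0, 30.0, 4))   # → 21.0
```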
12. A program product stored on a computer readable medium comprising program code that, when executed, allows a computer to:
receive presentation text and its attributes from a document, wherein said attributes comprise a title, character size, character underlining, or boldface character;
store the presentation text and the attributes on a page-by-page basis;
decompose the presentation text into words;
access a word attribute database;
extract the decomposed words and the assigned weights for their corresponding attributes;
access a keyword database;
extract the keywords that are found in the document, along with an assigned weight for each keyword, wherein the weight is based on the number of times the keyword appeared in the speech of the presentation and on the keyword database;
recognize speech in a presentation in consultation with a dictionary database by:
acquiring, at every moment, a correspondence between the lapse of time from the start of the presentation and the result of speech recognition, and storing the correspondence between the time and the result of voice recognition in a subtitle database;
access a page-time database that records the time at which a page is turned and the time at which the next page is turned, and calculates the weight of the keywords in the page based on the duration during which the page in question is displayed in the presentation;
read the correspondence between the time and the result of speech recognition from the subtitle database, and display it on a subtitle candidate display region; and
cause keywords stored in the keyword database, presentation text stored in the text attribute database, and a master subtitle stored in a master subtitle database to cooperate together for display; and
generate speaker notes from subtitles stored in the subtitle database.
(Dependent claims: 13, 14, 15)
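The final step, generating speaker notes from stored subtitles, can be sketched by grouping timed subtitles under the page that was on screen when each was spoken, yielding one note per page. All names and the grouping rule below are illustrative assumptions, not the patent's implementation.

```python
# A hypothetical sketch of speaker-note generation: timed subtitles are
# grouped under the page that was on screen when each was spoken, giving
# one note per page. Names and the grouping rule are illustrative.

def generate_speaker_notes(subtitles, page_changes):
    """subtitles: list of (time_s, text); page_changes: chronologically
    sorted list of (time_s, page). Returns {page: note text}."""
    notes = {}
    for time_s, text in subtitles:
        page = None
        for change_time, p in page_changes:
            if change_time <= time_s:
                page = p        # latest page change not after this subtitle
            else:
                break
        if page is not None:
            notes.setdefault(page, []).append(text)
    return {page: " ".join(parts) for page, parts in notes.items()}

subs = [(1.0, "welcome"), (5.0, "to the demo"), (45.0, "next topic")]
changes = [(0.0, 1), (40.0, 2)]
print(generate_speaker_notes(subs, changes))
# → {1: 'welcome to the demo', 2: 'next topic'}
```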
Specification