Subtitle generation and retrieval combining document with speech recognition
First Claim
1. An apparatus for recognizing speech in a presentation to generate a subtitle corresponding to the speech, said apparatus comprising:
a text extraction unit that receives presentation text and its attributes from a presentation document, and stores said text and attributes in a text attribute database on a page-by-page basis, wherein the attributes comprise a title, character size, character underlining, or boldface character;
a morphological analysis unit that morphologically analyzes the presentation text stored in the text attribute database, decomposes said presentation text into words, and stores the words in a word attribute database;
a common keyword generation unit that extracts the words and their attributes from the word attribute database, determines whether or not a word has been successfully extracted, and, if it is determined that the word extraction is successful, initializes the attribute weights of the words, extracts the attribute weights from an attribute weight database, and sums them, extracts keywords that are found in the presentation document and assigns weights to the keywords, then selects, as an additional keyword to add to the keyword database, any word that has been determined, based on time and attribute weight, to represent a high level of importance among the words contained in the presentation;
a dictionary registration unit that adds the keywords registered in a keyword database to a dictionary database that is consulted at the time of speech recognition;
a voice recognition unit that recognizes the speech in the presentation in consultation with the dictionary database by:
acquiring, at every moment, a correspondence between the lapse of time from the start of the presentation and the result of voice recognition, and storing the correspondence between the time and the result of voice recognition in a subtitle database;
a page-time recording unit that detects page-changing events and stores them as timestamps in a page-time database;
a common keyword regeneration unit that initializes the keyword database, extracts a word, an attribute of the word, and information about the page where the word appeared from the word attribute database, and further assigns weight depending on the number of times the keyword appeared in the speech of the presentation;
a display control unit that reads a correspondence between the time and the result of speech recognition from the subtitle database, and displays said correspondence on a subtitle candidate display region, causes keywords stored in the keyword database, presentation text stored in the text attribute database, and a master subtitle stored in a master subtitle database to cooperate together for display as a subtitle to the presentation, and accesses the page-time database and specifies the page corresponding to the result of voice recognition on the basis of the time information;
a display unit comprising:
the subtitle candidate display region, a common keyword list display region, a presentation text display region, and a master subtitle display region;
a speaker note generation unit that generates speaker notes from subtitles stored in the subtitle database and embeds them in presentation documents;
the text attribute database;
the word attribute database that stores the words obtained as a result of the decomposition performed by the morphological analysis unit, and their attributes;
the attribute weight database that stores presentation word attributes and their assigned weights;
the keyword database that stores the weighted words as keywords;
the dictionary database;
the subtitle database that stores, together with the time, the result of speech recognition as the subtitle;
the page-time database that records the time at which a page is turned and the time at which the next page is turned, and calculates the weight of the keywords in the page based on the duration during which the page in question is displayed in the presentation, when it is determined that extraction of the word has been successful; and
a master subtitle database that stores master subtitles on a page-by-page basis.
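The common keyword generation unit above sums per-word attribute weights and promotes high-scoring words to keywords. A minimal Python sketch of that scoring follows; the attribute names, weight values, and importance threshold are all hypothetical, since the claim does not specify them.

```python
# A minimal sketch of the common keyword generation step: per-attribute
# weights are summed for each word, and words whose total weight clears a
# threshold become keywords. Attribute names, weight values, and the
# threshold are all hypothetical; the patent does not specify them.

ATTRIBUTE_WEIGHTS = {      # stands in for the attribute weight database
    "title": 3.0,
    "large_font": 2.0,
    "underline": 1.5,
    "bold": 1.5,
}

def generate_keywords(word_attributes, threshold=2.0):
    """word_attributes maps each word to the list of attributes it carries."""
    keywords = {}
    for word, attrs in word_attributes.items():
        weight = sum(ATTRIBUTE_WEIGHTS.get(a, 0.0) for a in attrs)
        if weight >= threshold:        # "high level of importance"
            keywords[word] = weight    # stands in for the keyword database
    return keywords

words = {"neural": ["title", "bold"], "the": [], "network": ["large_font"]}
print(generate_keywords(words))   # → {'neural': 4.5, 'network': 2.0}
```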
8 Assignments
0 Petitions
Abstract
Subtitle generation methods and apparatus are provided that recognize voice in a presentation to generate subtitles thereof, together with retrieval apparatus for retrieving character strings by use of the subtitles. An apparatus of the present invention includes: an extraction unit for extracting text from presentation documents; an analysis unit for morphologically analyzing the text to decompose it into words; a generation unit for generating common keywords by assigning weights to words; a registration unit for adding the common keywords to a voice recognition dictionary; a recognition unit for recognizing voice in a presentation; a record unit for recording the correspondence between page and time by detecting page-switching events; a regeneration unit for regenerating common keywords by further referring to the correspondence between page and time; a control unit for controlling the display of subtitles, common keywords, text, and master subtitles; and a note generation unit for generating speaker notes from subtitles.
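The correspondence between page and time that the abstract describes amounts to a lookup from a recognition timestamp to the page on screen at that moment. The sketch below illustrates one way to do this; the class and method names are assumptions, not taken from the patent.

```python
# An illustrative sketch of the page-time correspondence: page-switch
# events are recorded as timestamps, and a recognition result's timestamp
# is later mapped back to the page that was on screen. Class and method
# names are assumptions, not taken from the patent.
import bisect

class PageTimeDatabase:
    def __init__(self):
        self.times = []   # timestamps of page-change events, in order
        self.pages = []   # page shown from the matching timestamp onward

    def record_page_change(self, time_s, page):
        # assumes events arrive in chronological order
        self.times.append(time_s)
        self.pages.append(page)

    def page_at(self, time_s):
        # find the last page change at or before time_s
        i = bisect.bisect_right(self.times, time_s) - 1
        return self.pages[i] if i >= 0 else None

db = PageTimeDatabase()
db.record_page_change(0.0, 1)
db.record_page_change(42.5, 2)
print(db.page_at(50.0))   # → 2
```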
15 Claims
1. An apparatus for recognizing speech in a presentation to generate a subtitle corresponding to the speech, as set forth above under First Claim. (Dependent claims: 2, 3, 4, 5)
6. A method of causing a computer to combine processing of a document having a plurality of pages with processing of speech generated with reference to the document, comprising the steps of:
receiving presentation text and its attributes from the document;
storing the presentation text and the attributes on a page-by-page basis, wherein said attributes comprise a title, character size, character underlining, or boldface character;
decomposing the presentation text into words;
storing the words in a word attribute database;
extracting the words and their attributes from the word attribute database;
accessing a keyword database;
extracting the keywords that are common in the document;
assigning weight to the keywords depending on the number of times each keyword appeared in the speech of the presentation and on their attributes;
recognizing the speech in the presentation in consultation with a dictionary database by:
acquiring, at every moment, a correspondence between the lapse of time from the start of the presentation and the result of speech recognition, and storing the correspondence between the time and the result of speech recognition in a subtitle database;
accessing a page-time database that records the time at which a page is turned and the time at which the next page is turned, and calculates the weight of the keywords in the page based on the duration during which the page in question is displayed in the presentation;
specifying the page corresponding to the result of voice recognition on the basis of the time information;
wherein the computer determines, among subtitles obtained by recognizing the speech, a specific subtitle obtained by recognizing speech generated with reference to a specific page of the document by:
deriving a correspondence between the time and the result of speech recognition from the subtitle database, and displaying it on a subtitle candidate display region; and
causing keywords stored in the keyword database, presentation text stored in the text attribute database, and a master subtitle stored in a master subtitle database to cooperate together for display; and
generating speaker notes from subtitles stored in the subtitle database.
(Dependent claims: 7, 8, 9, 10, 11)
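The method above weights a keyword both by how long its page was displayed and by how often it occurred in the recognized speech, but the claim gives no combining formula. The sketch below assumes a simple linear combination purely for illustration; the function name and scale factors are hypothetical.

```python
# The claim states that a keyword's weight depends on how long its page was
# displayed and how often it occurred in the recognized speech, but gives no
# formula. This sketch assumes a simple linear combination purely for
# illustration.

def regenerate_weight(base_weight, page_duration_s, spoken_count,
                      duration_scale=0.25, spoken_scale=1.0):
    # scale the attribute-based weight by display duration, then credit
    # each spoken occurrence of the keyword
    return (base_weight * (1.0 + duration_scale * page_duration_s)
            + spoken_scale * spoken_count)

# base weight 2.0, page shown for 30 s, keyword spoken 4 times:
print(regenerate_weight(2.0, 30.0, 4))   # → 21.0
```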
12. A program product stored on a computer readable medium comprising program code that, when executed, allows a computer to:
receive presentation text and its attributes from a document, wherein said attributes comprise a title, character size, character underlining, or boldface character;
store the presentation text and the attributes on a page-by-page basis;
decompose the presentation text into words;
access a word attribute database;
extract the decomposed words and the assigned weights for their corresponding attributes;
access a keyword database;
extract the keywords that are found in the document, along with an assigned weight for each keyword, wherein the weight is based on the number of times the keyword appeared in the speech of the presentation and on the keyword database;
recognize speech in a presentation in consultation with a dictionary database by:
acquiring, at every moment, a correspondence between the lapse of time from the start of the presentation and the result of speech recognition, and storing the correspondence between the time and the result of voice recognition in a subtitle database;
access a page-time database that records the time at which a page is turned and the time at which the next page is turned, and calculates the weight of the keywords in the page based on the duration during which the page in question is displayed in the presentation;
read the correspondence between the time and the result of speech recognition from the subtitle database, and display it on a subtitle candidate display region; and
cause keywords stored in the keyword database, presentation text stored in the text attribute database, and a master subtitle stored in a master subtitle database to cooperate together for display; and
generate speaker notes from subtitles stored in the subtitle database.
(Dependent claims: 13, 14, 15)
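The final step, generating speaker notes from stored subtitles, can be sketched by grouping timed subtitles under the page that was on screen when each was spoken, yielding one note per page. All names and the grouping rule below are illustrative assumptions, not the patent's implementation.

```python
# A hypothetical sketch of speaker-note generation: timed subtitles are
# grouped under the page that was on screen when each was spoken, giving
# one note per page. Names and the grouping rule are illustrative.

def generate_speaker_notes(subtitles, page_changes):
    """subtitles: list of (time_s, text); page_changes: chronologically
    sorted list of (time_s, page). Returns {page: note text}."""
    notes = {}
    for time_s, text in subtitles:
        page = None
        for change_time, p in page_changes:
            if change_time <= time_s:
                page = p        # latest page change not after this subtitle
            else:
                break
        if page is not None:
            notes.setdefault(page, []).append(text)
    return {page: " ".join(parts) for page, parts in notes.items()}

subs = [(1.0, "welcome"), (5.0, "to the demo"), (45.0, "next topic")]
changes = [(0.0, 1), (40.0, 2)]
print(generate_speaker_notes(subs, changes))
# → {1: 'welcome to the demo', 2: 'next topic'}
```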
Specification