Subtitle generation and retrieval combining document processing with voice processing
First Claim
1. An apparatus for retrieving a character string, said apparatus comprising:
- a central processing unit configured for:
recognizing speech in a presentation to produce subtitles;
initializing a keyword database;
extracting keywords from document data used in the presentation;
adding the extracted keywords to a voice recognition dictionary; and
assigning word attribute weights to the keywords extracted from the presentation in accordance with the word attributes of said keywords, wherein said word attributes describe how the keywords appear in the document data;
recording a time that the page is turned and a time when a next page is turned to determine time spent on the page;
calculating a weight of the keywords on a page based on a combination of:
a duration during which said page is displayed in the presentation when it is determined that the extraction of the keyword has been successful; and
the word attribute weights assigned to the keywords, the word attributes comprising at least one of:
a title, character size, character underlining, and boldface character;
storage for storing:
the subtitles;
the keywords;
associated information of the subtitles and the keywords;
the word attributes read from a word attribute database; and
the time spent on the page when it is determined that the extraction of the keywords has been successful, wherein the time spent on the page is stored as a timestamp;
the word attribute database storing the word attributes; and
the central processing unit used as a retrieval device for retrieving, by use of the associated information, the character string from text data composed of the subtitles and the keywords.
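The keyword weighting recited in claim 1 can be illustrated with a minimal Python sketch. The claim says only that the weight is based on "a combination of" the page display duration and the word attribute weights; the product used here, the numeric attribute weights, and the names `PageVisit`, `ATTRIBUTE_WEIGHTS`, `attribute_weight`, and `keyword_weights` are all assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights: the claim names these attributes but gives no values.
ATTRIBUTE_WEIGHTS = {"title": 3.0, "large": 2.0, "underline": 1.5, "bold": 1.5}


@dataclass
class PageVisit:
    page: int
    shown_at: float   # time the page was turned to, in seconds
    hidden_at: float  # time the next page was turned to, in seconds

    @property
    def duration(self) -> float:
        """Time spent on the page, derived from the two page-turn times."""
        return self.hidden_at - self.shown_at


def attribute_weight(attributes: List[str]) -> float:
    """Base weight of 1.0 plus the weight of each attribute the keyword carries."""
    return 1.0 + sum(ATTRIBUTE_WEIGHTS.get(a, 0.0) for a in attributes)


def keyword_weights(visit: PageVisit, keywords: Dict[str, List[str]]) -> Dict[str, float]:
    """Combine display duration and attribute weight (assumed here: a product)."""
    return {
        word: visit.duration * attribute_weight(attrs)
        for word, attrs in keywords.items()
    }


visit = PageVisit(page=2, shown_at=60.0, hidden_at=180.0)
weights = keyword_weights(visit, {"subtitle": ["title", "bold"], "dictionary": []})
print(weights)
```

In this sketch a keyword that appears in a bold title on a page shown for two minutes outweighs a plain-text keyword on the same page. Other combinations, such as a weighted sum, would fit the claim language equally well.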
Abstract
An apparatus for retrieving a character string includes storage for storing first text data obtained by recognizing a voice in a presentation, second text data extracted from document data used in the presentation, and associated information of the first text data and the second text data. The apparatus also includes a retrieval unit for retrieving, by use of the associated information, the character string from text data composed of the first text data and the second text data.
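As a rough illustration of the retrieval described in the abstract, the following Python sketch searches both the recognized subtitles (first text data) and the document keywords (second text data). It assumes that the associated information is simply the page number shared by a subtitle and a keyword; the abstract does not say what form the associated information takes. The names `Subtitle`, `Keyword`, and `retrieve` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtitle:
    start: float  # seconds from the start of the presentation
    text: str     # first text data: recognized speech
    page: int     # assumed associated information: page on display


@dataclass
class Keyword:
    text: str  # second text data: extracted from the document
    page: int  # page the keyword was extracted from


def retrieve(query: str, subtitles: List[Subtitle], keywords: List[Keyword]) -> List[Subtitle]:
    """Return subtitles that contain the query, or that were spoken on a
    page whose document keywords contain it."""
    q = query.lower()
    pages = {k.page for k in keywords if q in k.text.lower()}
    return [s for s in subtitles if q in s.text.lower() or s.page in pages]


subtitles = [
    Subtitle(0.0, "welcome everyone", 1),
    Subtitle(65.0, "we add terms to the dictionary", 2),
    Subtitle(70.0, "this improves accuracy", 2),
    Subtitle(200.0, "any questions", 3),
]
keywords = [Keyword("Voice recognition dictionary", 2), Keyword("Summary", 3)]

hits = retrieve("dictionary", subtitles, keywords)
print([s.start for s in hits])
```

A query that matches only a document keyword, such as "summary" here, still returns the subtitles spoken while that page was displayed. This is the practical benefit of searching the combined text data rather than the subtitles alone.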
18 Citations
14 Claims
1. An apparatus for retrieving a character string, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A non-transitory storage device comprising a program product which allows a computer to realize:
- using a processor device configured to perform:
recognizing speech in a presentation to produce subtitles;
extracting keywords from document data used in the presentation;
adding the extracted keywords to a voice recognition dictionary; and
assigning word attribute weights to the keywords extracted from the presentation based on word attributes read from a word attribute database, wherein said word attributes comprise at least one of:
a title, character size, character underlining, and boldface character;
recording a time that the page is turned and a time when a next page is turned to determine time spent on the page;
storing the time spent as a timestamp;
calculating a weight of the keywords on a page based on a combination of:
a duration during which said page is displayed in the presentation when it is determined that the extraction of the keyword has been successful; and
the word attribute weights assigned to the keywords;
a function of determining, among the subtitles obtained by recognizing speech generated with reference to a predetermined document, a specific subtitle obtained by recognizing the speech generated with reference to a specific page of the document; and
a function of storing a correspondence between the specific subtitle and the specific page.
Dependent claims: 13, 14.
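The last two functions of claim 12, determining which subtitles belong to a specific page and storing that correspondence, might be sketched in Python as follows. The sketch assumes the determination is made by comparing each subtitle's timestamp with the recorded page-turn times; the claim does not state how the determination is made. The names `page_at` and `build_correspondence` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def page_at(turn_times: List[float], t: float) -> int:
    """Return the 1-based page on display at time t.

    turn_times[i] is the time page i + 1 was turned to, in ascending order.
    Times before the first page turn are attributed to page 1.
    """
    return max(1, bisect_right(turn_times, t))


def build_correspondence(
    turn_times: List[float], subtitles: List[Tuple[float, str]]
) -> Dict[int, List[str]]:
    """Map each page number to the subtitles spoken while it was displayed."""
    by_page: Dict[int, List[str]] = defaultdict(list)
    for t, text in subtitles:
        by_page[page_at(turn_times, t)].append(text)
    return dict(by_page)


turn_times = [0.0, 60.0, 180.0]  # pages 1, 2, 3 first shown at these times
subtitles = [
    (5.0, "welcome everyone"),
    (60.0, "we add terms to the dictionary"),
    (75.0, "this improves accuracy"),
    (200.0, "any questions"),
]
correspondence = build_correspondence(turn_times, subtitles)
print(correspondence)
```

A binary search over the sorted page-turn times keeps each lookup logarithmic in the number of pages. A subtitle stamped exactly at a page turn is attributed to the newly shown page.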
Specification