Transcription support system and transcription support method
Abstract
According to one embodiment, a transcription support system supports transcription work to convert voice data to text. The system includes a first storage unit configured to store therein the voice data; a playback unit configured to play back the voice data; a second storage unit configured to store therein voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, for which the voice positional information is indicative of a temporal position in the voice data and corresponds to the character string; a text creating unit that creates the text in response to an operation input of a user; and an estimation unit configured to estimate already-transcribed voice positional information indicative of a position at which the creation of the text is completed in the voice data based on the voice indices.
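As an illustration of the voice indices described above, the following minimal sketch pairs each recognized character string with its temporal position in the voice data. The `VoiceIndex` class, the `build_voice_indices` helper, and the millisecond start/end fields are assumptions made for this example; the abstract only requires that each string be associated with voice positional information.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class VoiceIndex:
    """One entry of the voice indices (hypothetical layout)."""
    text: str      # character string obtained from voice recognition
    start_ms: int  # assumed: where the string starts in the voice data
    end_ms: int    # assumed: where the string ends in the voice data


def build_voice_indices(
    recognized: Iterable[Tuple[str, int, int]]
) -> List[VoiceIndex]:
    """Turn (string, start_ms, end_ms) recognition results into index entries."""
    return [VoiceIndex(text, start, end) for text, start, end in recognized]


# Made-up recognition output, for illustration only.
voice_indices = build_voice_indices([
    ("good", 0, 300),
    ("morning", 300, 800),
    ("everyone", 800, 1400),
])
```

Storing both a start and an end position is a design choice of this sketch: the end position is the natural value to report as the point up to which a matched string has been transcribed.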
10 Claims
1. A text processing device comprising:
a memory having computer executable components stored therein; and
a processing circuit communicatively coupled to the memory, the processing circuit configured to
  generate voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, the voice positional information indicative of a temporal position in voice data and corresponding to the character string;
  create text in response to an operation input of a user; and
  when determining that a last character string of the text does not match any of the character strings included in the voice indices and further determining that any of the character strings other than the last character string of the text matches any of the character strings included in the voice indices,
    retrieve, from the voice indices, the voice positional information corresponding to a basing character string indicative of a character string closest to the last character string among the character strings matched with any of the character strings included in the voice indices,
    estimate a first playback time indicative of a time necessary to play back mismatched character strings indicative of the character strings from the character string next to the basing character string to the last character string among the character strings constituting the text,
    estimate already-transcribed voice positional information from the voice positional information corresponding to the basing character string and the first playback time, the already-transcribed voice positional information indicative of a temporal position at which the creation of the text is completed in the voice data,
    set the temporal position indicated by the estimated already-transcribed voice positional information as a playback starting position, and
a playback circuit configured to play back the voice data based on the already-transcribed voice positional information at the first playback time.
Dependent claims: 2, 3, 4
5. A text processing device comprising:
a memory having computer executable components stored therein; and
a processing circuit communicatively coupled to the memory, the processing circuit configured to
  generate voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, the voice positional information indicative of a temporal position in voice data and corresponding to the character string;
  create text in response to an operation input of a user until a punctuation is input; and
  when determining that a last character string of the text does not match any of the character strings included in the voice indices and further determining that any of the character strings other than the last character string of the text matches any of the character strings included in the voice indices,
    retrieve, from the voice indices, the voice positional information corresponding to a basing character string indicative of a character string closest to the last character string among the character strings matched with any of the character strings included in the voice indices,
    estimate a first playback time indicative of a time necessary to play back mismatched character strings indicative of the character strings from the character string next to the basing character string to the last character string among the character strings constituting the text,
    estimate already-transcribed voice positional information from the voice positional information corresponding to the basing character string and the first playback time,
    set the temporal position indicated by the estimated already-transcribed voice positional information as a playback starting position, and
a playback circuit configured to play back the voice data based on the already-transcribed voice positional information at the first playback time.
Dependent claims: 6
7. A text processing method comprising:
generating voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, the voice positional information indicative of a temporal position in voice data and corresponding to the character string;
creating text in response to an operation input of a user; and
when it is determined that a last character string of the text does not match any of character strings that are included in the voice indices and when it is further determined that any of the character strings other than the last character string of the text matches any of the character strings included in the voice indices,
  retrieving, from the voice indices, the voice positional information corresponding to a basing character string indicative of a character string closest to the last character string among the character strings matched with any of the character strings included in the voice indices,
  first estimating a first playback time indicative of a time necessary to play back mismatched character strings indicative of the character strings from the character string next to the basing character string to the last character string among the character strings constituting the text,
  second estimating already-transcribed voice positional information from the voice positional information corresponding to the basing character string and the first playback time, the already-transcribed voice positional information indicative of a temporal position at which the creation of the text is completed in the voice data,
  setting the temporal position indicated by the estimated already-transcribed voice positional information as a playback starting position, and
  playing back the voice data based on the already-transcribed voice positional information.
Dependent claims: 8, 9, 10
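The estimation branch recited in the independent claims can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name, the dictionary-based index (which ignores repeated strings), and the fixed `ms_per_char` speaking rate used to estimate the first playback time are all assumptions, since the claims do not state how that time is computed.

```python
from typing import Dict, List, Optional


def estimate_transcribed_position(
    typed: List[str],
    index: Dict[str, int],
    ms_per_char: int = 150,
) -> Optional[int]:
    """Estimate the position (ms) in the voice data up to which text exists.

    typed: character strings making up the text typed so far.
    index: hypothetical voice index mapping a recognized string to the
           position (ms) at which that string ends in the voice data.
    ms_per_char: assumed time needed to play back one character.
    """
    if not typed:
        return None
    last = len(typed) - 1
    if typed[last] in index:
        # The last string matches directly; this case lies outside the
        # claimed branch, and returning its position is an assumption.
        return index[typed[last]]
    # Walk backwards to find the basing string: the matched string
    # closest to the last string of the text.
    for i in range(last - 1, -1, -1):
        if typed[i] in index:
            mismatched = typed[i + 1:]
            # First playback time: time needed to play back the
            # mismatched strings, under the assumed speaking rate.
            first_playback_time = sum(len(s) for s in mismatched) * ms_per_char
            return index[typed[i]] + first_playback_time
    return None  # no string of the text matches the voice indices


# Made-up index and a typed text whose last string is misspelled.
index = {"good": 300, "morning": 800, "everyone": 1400}
position = estimate_transcribed_position(["good", "morning", "evryone"], index)
print(position)  # 800 + 7 * 150 = 1850
```

In this example the basing string is "morning", the only mismatched string is "evryone", and the returned value would serve as the playback starting position.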
Specification