Information processing apparatus, information processing method and computer program product
First Claim
1. An information processing apparatus for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, the apparatus comprising:
a storage unit configured to store therein voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in voice data and corresponding to the character string, wherein the automatic voice recognition process is performed on an audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript;
a display for displaying the transcript of the audio file to a transcribing user;
an input interface for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options; and
a processor configured to:
detect playback section information in the voice data, the playback section information indicating temporal information from a start position instructed by the transcribing user to a stop position instructed by the transcribing user by correcting the transcript generated from performing the automatic voice recognition on the audio file via manual keyboard input to input interface;
receive text from the transcribing user during the transcribing operation, the transcribing user being a user who performs a transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
phonetically transcribe the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
specify, as search targets, character strings whose associated voice positional information is included in the playback section information among the character strings included in the voice indices;
retrieve one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
display, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
receive a selection of one of the input candidates from the transcribing user as a replacement character string; and
replace the retrieved character string in the transcript with the replacement character string.
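The processor steps recited above (specify search targets, retrieve by reading, display candidates, receive a selection) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `VoiceIndex`, `in_section`, `retrieve`, and `pick_replacement` are hypothetical, the romanized readings are made-up sample data, prefix matching is an assumed interpretation of "matching", and returning a lone candidate without asking the user is likewise an assumption, since the claim only addresses the case of more than one retrieved string.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class VoiceIndex:
    """One voice index: a recognized string plus one or two timestamps."""
    text: str                    # character string from the recognition transcript
    reading: str                 # phonetic reading (hypothetical romanization)
    start: float                 # first timestamp, in seconds
    end: Optional[float] = None  # optional second timestamp, in seconds


def in_section(entry: VoiceIndex, section: Tuple[float, float]) -> bool:
    """True if the entry's timestamp(s) lie inside the played-back section."""
    lo, hi = section
    last = entry.start if entry.end is None else entry.end
    return lo <= entry.start and last <= hi


def retrieve(indices: List[VoiceIndex], section: Tuple[float, float],
             typed_reading: str) -> List[VoiceIndex]:
    """Restrict to the playback section, then match on the typed reading."""
    targets = [e for e in indices if in_section(e, section)]
    # Assumption: "matching" is treated as a prefix match on the reading.
    return [e for e in targets if e.reading.startswith(typed_reading)]


def pick_replacement(candidates: List[VoiceIndex],
                     choose: Callable[[List[VoiceIndex]], int]) -> Optional[str]:
    """Return the replacement string; ask the user only if there are several."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0].text  # assumption: a lone candidate needs no prompt
    return candidates[choose(candidates)].text


# Made-up sample index; the last entry lies outside the played-back section.
INDICES = [
    VoiceIndex("毎度", "maido", 0.0, 0.5),
    VoiceIndex("東京", "toukyou", 1.0, 1.6),
    VoiceIndex("登校", "toukou", 2.0, 2.5),
    VoiceIndex("投稿", "toukou", 6.0, 6.5),
]

candidates = retrieve(INDICES, (0.0, 3.0), "tou")
chosen = pick_replacement(candidates, lambda cands: 1)  # user picks the second option
```

Restricting the search to the played-back section is what keeps a string such as the last sample entry, which shares a reading with an earlier one but has not yet been heard, out of the candidate list.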
Abstract
According to an embodiment, an information processing apparatus includes a storage unit, a detector, an acquisition unit, and a search unit. The storage unit is configured to store therein voice indices, each of which associates a character string included in voice text data obtained from a voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string. The acquisition unit acquires reading information being at least a part of a character string representing a reading of a phrase to be transcribed from the voice data played back. The search unit specifies, as search targets, character strings whose associated voice positional information is included in the played-back section information among the character strings included in the voice indices, and retrieves a character string including the reading represented by the reading information from among the specified character strings.
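The abstract relies on "played-back section information" without saying how it might be kept. One possible bookkeeping scheme, offered only as a sketch under stated assumptions, stores the played-back sections as merged, non-overlapping time intervals and tests whether the one or two timestamps of a voice index fall inside any of them. The names `Section`, `add_section`, and `was_played` are hypothetical, and the interval-merging approach is an assumption rather than something the document describes.

```python
from typing import List, Optional, Tuple

Section = Tuple[float, float]  # (start, stop) of one played-back span, in seconds


def add_section(sections: List[Section], start: float, stop: float) -> List[Section]:
    """Return the sections with (start, stop) merged in, sorted and non-overlapping."""
    merged: List[Section] = []
    for s, e in sorted(sections + [(start, stop)]):
        if merged and s <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def was_played(sections: List[Section], t_start: float,
               t_end: Optional[float] = None) -> bool:
    """True if the timestamp (or timestamp pair) lies inside one played section."""
    t_end = t_start if t_end is None else t_end
    return any(s <= t_start and t_end <= e for s, e in sections)


# Made-up playback history: two overlapping spans, then a later separate one.
sections: List[Section] = []
sections = add_section(sections, 0.0, 5.0)
sections = add_section(sections, 4.0, 8.0)
sections = add_section(sections, 20.0, 25.0)
```

Keeping the intervals merged means a character string is a search target as soon as its timestamps fall inside any span the user has heard at least once, regardless of how many separate playback commands produced that span.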
10 Claims
1. An information processing apparatus for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, the apparatus comprising:
a storage unit configured to store therein voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in voice data and corresponding to the character string, wherein the automatic voice recognition process is performed on an audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript;
a display for displaying the transcript of the audio file to a transcribing user;
an input interface for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options; and
a processor configured to:
detect playback section information in the voice data, the playback section information indicating temporal information from a start position instructed by the transcribing user to a stop position instructed by the transcribing user by correcting the transcript generated from performing the automatic voice recognition on the audio file via manual keyboard input to input interface;
receive text from the transcribing user during the transcribing operation, the transcribing user being a user who performs a transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
phonetically transcribe the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
specify, as search targets, character strings whose associated voice positional information is included in the playback section information among the character strings included in the voice indices;
retrieve one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
display, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
receive a selection of one of the input candidates from the transcribing user as a replacement character string; and
replace the retrieved character string in the transcript with the replacement character string.
Dependent claims: 2, 3, 4
5. An information processing method for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, the method comprising:
detecting playback section information in voice data, the playback section information indicating temporal information from a start position instructed by a transcribing user to a stop position instructed by the transcribing user, the transcribing user being a user who performs a transcribing operation by correcting a transcript generated from performing an automatic voice recognition on an audio file via manual keyboard input to an input interface, wherein the input interface is for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options;
receiving text from the transcribing user during the transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
phonetically transcribing the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
specifying, among character strings included in voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string, character strings whose associated voice positional information is included in the playback section information as search targets, wherein the automatic voice recognition process is performed on the audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript; and
retrieving one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
displaying, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
receiving a selection of one of the input candidates from the transcribing user as a replacement character string; and
replacing the retrieved character string in the transcript with the replacement character string.
6. A computer program product comprising a non-transitory computer-readable medium including programmed instructions for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, wherein the instructions, when executed by a computer, cause the computer to execute:
detecting playback section information in voice data, the playback section information indicating temporal information from a start position instructed by the transcribing user to a stop position instructed by the transcribing user, the transcribing user being a user who performs a transcribing operation by correcting a transcript generated from performing an automatic voice recognition on an audio file via manual keyboard input to an input interface, wherein the input interface is for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options;
receiving text from the transcribing user during the transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
phonetically transcribing the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
specifying, among character strings included in voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string, character strings whose associated voice positional information is included in the playback section information as search targets, wherein the automatic voice recognition process is performed on the audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript; and
retrieving one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
displaying, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
receiving a selection of one of the input candidates from the transcribing user as a replacement character string; and
replacing the retrieved character string in the transcript with the replacement character string.
7. An information processing apparatus for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, the apparatus comprising:
a storage unit configured to store therein voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in voice data and corresponding to the character string, wherein the automatic voice recognition process is performed on an audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript;
a display for displaying the transcript of the audio file to a transcribing user;
an input interface for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options;
a detecting circuit configured to detect playback section information in the voice data, the playback section information indicating temporal information from a start position instructed by the transcribing user to a stop position instructed by the transcribing user, the transcribing user being a user who performs a transcribing operation by correcting the transcript generated from performing the automatic voice recognition on the audio file via manual keyboard input to input interface;
a text receiving circuit configured to receive text from the transcribing user during the transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
an acquiring circuit configured to phonetically transcribe the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
a searching circuit configured to specify, as search targets, character strings whose associated voice positional information is included in the playback section information among the character strings included in the voice indices, and retrieve one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
a displaying circuit configured to display, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
a selection receiving circuit configured to receive a selection of one of the input candidates from the transcribing user as a replacement character string; and
a replacing circuit configured to replace the retrieved character string in the transcript with the replacement character string.
Dependent claims: 8, 9, 10
Specification