Information processing apparatus, information processing method and computer program product

US 9,798,804 B2
Filed: 06/26/2012
Issued: 10/24/2017
Est. Priority Date: 09/26/2011
Status: Active Grant

Abstract

According to an embodiment, an information processing apparatus includes a storage unit, a detector, an acquisition unit, and a search unit. The storage unit is configured to store therein voice indices, each of which associates a character string included in voice text data obtained from a voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string. The acquisition unit acquires reading information being at least a part of a character string representing a reading of a phrase to be transcribed from the voice data played back. The search unit specifies, as search targets, character strings whose associated voice positional information is included in the played-back section information among the character strings included in the voice indices, and retrieves a character string including the reading represented by the reading information from among the specified character strings.
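The storage and search units described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `VoiceIndex` class, the `search_candidates` function, the romanized readings, and the use of prefix matching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VoiceIndex:
    text: str     # character string taken from the recognition result
    reading: str  # its reading (hypothetical romanized form)
    start: float  # start timestamp in the voice data, in seconds
    end: float    # end timestamp in the voice data, in seconds


def search_candidates(
    indices: List[VoiceIndex],
    played: Tuple[float, float],
    reading_prefix: str,
) -> List[str]:
    """Return strings in the played-back section whose reading starts with the prefix."""
    played_start, played_end = played
    # Step 1: keep only entries whose timestamps lie inside the played-back section.
    targets = [v for v in indices if played_start <= v.start and v.end <= played_end]
    # Step 2: match readings against the input (prefix matching is an assumption).
    hits: List[str] = []
    for v in targets:
        if v.reading.startswith(reading_prefix) and v.text not in hits:
            hits.append(v.text)
    return hits


indices = [
    VoiceIndex("Toshiba", "toshiba", 0.0, 0.8),
    VoiceIndex("Tokyo", "tokyo", 1.0, 1.6),
    VoiceIndex("total", "total", 5.0, 5.5),  # not yet played back
]
print(search_candidates(indices, (0.0, 2.0), "to"))  # ['Toshiba', 'Tokyo']
```

Restricting the search to the played-back section keeps "total" out of the candidates even though its reading matches, because the user has not yet heard that part of the voice data.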

10 Claims

1. An information processing apparatus for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, the apparatus comprising:
- a storage unit configured to store therein voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in voice data and corresponding to the character string, wherein the automatic voice recognition process is performed on an audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript;
  
  a display for displaying the transcript of the audio file to a transcribing user;
  
  an input interface for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options; and
  
  a processor configured to:
  
  detect playback section information in the voice data, the playback section information indicating temporal information from a start position instructed by the transcribing user to a stop position instructed by the transcribing user by correcting the transcript generated from performing the automatic voice recognition on the audio file via manual keyboard input to the input interface;
  
  receive text from the transcribing user during the transcribing operation, the transcribing user being a user who performs a transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
  
  phonetically transcribe the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
  
  specify, as search targets, character strings whose associated voice positional information is included in the playback section information among the character strings included in the voice indices;
  
  retrieve one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
  
  display, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
  
  receive a selection of one of the input candidates from the transcribing user as a replacement character string; and
  
  replace the retrieved character string in the transcript with the replacement character string.

2. The apparatus according to claim 1, wherein the voice text data has a lattice structure that is a network structure in which recognition candidates are connected.

3. The apparatus according to claim 1, further comprising a dictionary storage unit configured to store therein a dictionary in which a plurality of character strings is preregistered, wherein the processor is further configured to retrieve a character string including the reading information from among the character strings registered in the dictionary storage unit.

4. The apparatus according to claim 1, wherein the processor is further configured to play back the voice data.
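
The lattice structure of claim 2 and the dictionary of claim 3 can be pictured with a short Python sketch. The `LatticeNode` layout, the function names, the romanized readings, and prefix matching are hypothetical choices for illustration and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LatticeNode:
    text: str     # one recognition candidate
    reading: str  # its reading (hypothetical romanized form)
    start: float  # start timestamp, in seconds
    end: float    # end timestamp, in seconds
    next_ids: List[int] = field(default_factory=list)  # connected candidates


def lattice_candidates(
    lattice: Dict[int, LatticeNode],
    played_start: float,
    played_end: float,
    prefix: str,
) -> List[str]:
    """Search every lattice node, not only the best path, inside the played section."""
    hits: List[str] = []
    for node_id in sorted(lattice):
        node = lattice[node_id]
        inside = played_start <= node.start and node.end <= played_end
        if inside and node.reading.startswith(prefix) and node.text not in hits:
            hits.append(node.text)
    return hits


def with_dictionary(hits: List[str], dictionary: Dict[str, str], prefix: str) -> List[str]:
    """Append preregistered dictionary strings whose reading also matches."""
    merged = list(hits)
    for text, reading in dictionary.items():
        if reading.startswith(prefix) and text not in merged:
            merged.append(text)
    return merged


# Two competing candidates ("write" / "right") cover the same time span.
lattice = {
    0: LatticeNode("I", "ai", 0.0, 0.3, [1, 2]),
    1: LatticeNode("write", "rait", 0.3, 0.6, [3]),
    2: LatticeNode("right", "rait", 0.3, 0.6, [3]),
    3: LatticeNode("now", "nau", 0.6, 0.9),
}
dictionary = {"rite": "rait", "rate": "reit"}  # text -> reading

hits = lattice_candidates(lattice, 0.0, 1.0, "rai")
print(with_dictionary(hits, dictionary, "rai"))  # ['write', 'right', 'rite']
```

Because the lattice keeps alternatives that the recognizer did not rank first, the correct word can still appear as an input candidate when the top recognition result is wrong.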

5. An information processing method for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, the method comprising:
- detecting playback section information in voice data, the playback section information indicating temporal information from a start position instructed by a transcribing user to a stop position instructed by the transcribing user, the transcribing user being a user who performs a transcribing operation by correcting a transcript generated from performing an automatic voice recognition on an audio file via manual keyboard input to an input interface, wherein the input interface is for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options;
  
  receiving text from the transcribing user during the transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
  
  phonetically transcribing the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
  
  specifying, among character strings included in voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string, character strings whose associated voice positional information is included in the playback section information as search targets, wherein the automatic voice recognition process is performed on the audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript; and
  
  retrieving one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
  
  displaying, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
  
  receiving a selection of one of the input candidates from the transcribing user as a replacement character string; and
  
  replacing the retrieved character string in the transcript with the replacement character string.
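
The detecting and replacing steps of the method can be sketched in Python as follows. This is only one possible reading of the claim: the `PlayedSections` class, its interval merging, and the token-list transcript are assumptions introduced for illustration.

```python
from typing import List, Tuple


class PlayedSections:
    """Tracks which parts of the voice data have been played back at least once."""

    def __init__(self) -> None:
        self.intervals: List[Tuple[float, float]] = []

    def record(self, start: float, stop: float) -> None:
        """Add one playback from a start position to a stop position, merging overlaps."""
        merged: List[Tuple[float, float]] = []
        for s, e in sorted(self.intervals + [(start, stop)]):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.intervals = merged

    def covers(self, timestamps: Tuple[float, ...]) -> bool:
        """True if all of an entry's timestamps (one or two) fall in one played interval."""
        return any(all(s <= t <= e for t in timestamps) for s, e in self.intervals)


def replace_token(tokens: List[str], position: int, replacement: str) -> List[str]:
    """Return a copy of the transcript with the string at `position` replaced."""
    updated = list(tokens)
    updated[position] = replacement
    return updated


played = PlayedSections()
played.record(0.0, 5.0)
played.record(3.0, 8.0)    # overlaps the first playback; merged into (0.0, 8.0)
played.record(20.0, 25.0)

print(played.intervals)               # [(0.0, 8.0), (20.0, 25.0)]
print(played.covers((6.0, 7.5)))      # True: start and end timestamps both played
print(played.covers((21.0,)))         # True: a single timestamp is also accepted
print(played.covers((7.5, 9.0)))      # False: the end lies outside the played part
print(replace_token(["I", "right", "letters"], 1, "write"))  # ['I', 'write', 'letters']
```

The `covers` check accepts a tuple of one or two timestamps, mirroring the claim language that the voice positional information is one or two timestamps per character string.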

6. A computer program product comprising a non-transitory computer-readable medium including programmed instructions for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, wherein the instructions, when executed by a computer, cause the computer to execute:
- detecting playback section information in voice data, the playback section information indicating temporal information from a start position instructed by the transcribing user to a stop position instructed by the transcribing user, the transcribing user being a user who performs a transcribing operation by correcting a transcript generated from performing an automatic voice recognition on an audio file via manual keyboard input to an input interface, wherein the input interface is for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options;
  
  receiving text from the transcribing user during the transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
  
  phonetically transcribing the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
  
  specifying, among character strings included in voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string, character strings whose associated voice positional information is included in the playback section information as search targets, wherein the automatic voice recognition process is performed on the audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript; and
  
  retrieving one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
  
  displaying, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
  
  receiving a selection of one of the input candidates from the transcribing user as a replacement character string; and
  
  replacing the retrieved character string in the transcript with the replacement character string.
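
The step of generating reading information from the text typed by the transcribing user can be illustrated with a toy Python sketch. The `LEXICON` table, its romanized readings, and the fallback rule are hypothetical; the claims do not specify how the phonetic transcription is performed.

```python
from typing import Dict

# Hypothetical pronunciation table mapping typed words to readings.
LEXICON: Dict[str, str] = {
    "write": "rait",
    "right": "rait",
    "tokyo": "tookyoo",
}


def reading_from_input(typed: str, lexicon: Dict[str, str] = LEXICON) -> str:
    """Derive reading information from the (possibly partial) typed text."""
    word = typed.strip().lower()
    if word in lexicon:
        # A complete, known word: use its registered reading.
        return lexicon[word]
    # Assumption: unknown or partial input is already phonetic (for example,
    # kana typed before conversion), so it is used as the reading unchanged.
    return word


print(reading_from_input(" Write "))  # rait
print(reading_from_input("to"))       # to
```

The returned reading would then be matched against the readings stored with the search targets, so that even an initial portion of a phrase can retrieve input candidates.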

7. An information processing apparatus for displaying a transcript generated by an automatic voice recognition process to a user for correction of the automatically generated transcripts by the user, the apparatus comprising:
- a storage unit configured to store therein voice indices, each of which associates a character string included in voice text data obtained from an automatic voice recognition process with voice positional information, the voice positional information indicating a temporal position in voice data and corresponding to the character string, wherein the automatic voice recognition process is performed on an audio file to obtain a transcript of the audio file, wherein the voice text data is the transcript, and wherein the voice positional information is one or two timestamps associated with each character string of the transcript;
  
  a display for displaying the transcript of the audio file to a transcribing user;
  
  an input interface for receiving, from the transcribing user, selection of a portion of the audio file for playback, text input via a keyboard, and selection input for selecting one of a plurality of displayed options;
  
  a detecting circuit configured to detect playback section information in the voice data, the playback section information indicating temporal information from a start position instructed by the transcribing user to a stop position instructed by the transcribing user, the transcribing user being a user who performs a transcribing operation by correcting the transcript generated from performing the automatic voice recognition on the audio file via manual keyboard input to the input interface;
  
  a text receiving circuit configured to receive text from the transcribing user during the transcribing operation, the transcribing operation being an operation in which the transcribing user inputs a text corresponding to the voice data being played back while listening to the voice data being played back, the playback section information being information indicating a part of the voice data, the part being a section for which playback has been completed or a section that has already been played back at least once;
  
  an acquiring circuit configured to phonetically transcribe the character string or an initial portion of the character string to acquire reading information that is at least a part of a character string of a phrase to be transcribed from the voice data that has been played back, wherein the reading information is a phonetic pronunciation of at least part of the character string, and wherein the reading information is generated from the text input by the transcribing user in accordance with the transcribing operation by the transcribing user;
  
  a searching circuit configured to specify, as search targets, character strings whose associated voice positional information is included in the playback section information among the character strings included in the voice indices, and retrieve one or more character strings including a reading information matching the reading information input by the transcribing user, from among the specified character strings;
  
  a displaying circuit configured to display, as input candidates to the transcribing user, when the character string retrieved from the search targets includes more than one retrieved character string, the character strings having reading information matching the reading information input by the transcribing user and also having voice positional information included in the voice positional information of the playback section;
  
  a selection receiving circuit configured to receive a selection of one of the input candidates from the transcribing user as a replacement character string; and
  
  a replacing circuit configured to replace the retrieved character string in the transcript with the replacement character string.

8. The apparatus according to claim 7, wherein the voice text data has a lattice structure that is a network structure in which recognition candidates are connected.

9. The apparatus according to claim 7, further comprising a dictionary storage unit configured to store therein a dictionary in which a plurality of character strings is preregistered, wherein the searching circuit retrieves a character string including the reading information from among the character strings registered in the dictionary storage unit.

10. The apparatus according to claim 7, further comprising a playback circuit configured to play back the voice data.


Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation); Toshiba Digital Solutions Corporation (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Shimogori, Nobuhiro; Ikeda, Tomoo; Ueno, Kouji; Nishiyama, Osamu; Suzuki, Hirokazu; Nagao, Manabu
Primary Examiner(s): Sirjani, Fariba
Application Number: US13/533,091
Publication Number: US 20130080163A1
Time in Patent Office: 1,946 Days
Field of Search: 704235
CPC Class Codes:
G06F 16/683 using metadata automaticall...
G10L 15/26 Speech to text systems G10L...
