Transcription support system and transcription support method

US 10,304,457 B2
Filed: 03/15/2012
Issued: 05/28/2019
Est. Priority Date: 07/26/2011
Status: Expired due to Fees

Abstract

According to one embodiment, a transcription support system supports transcription work to convert voice data to text. The system includes a first storage unit configured to store therein the voice data; a playback unit configured to play back the voice data; a second storage unit configured to store therein voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, for which the voice positional information is indicative of a temporal position in the voice data and corresponds to the character string; a text creating unit that creates the text in response to an operation input of a user; and an estimation unit configured to estimate already-transcribed voice positional information indicative of a position at which the creation of the text is completed in the voice data based on the voice indices.
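
As a rough illustration of the voice indices described in the abstract, the sketch below pairs each recognized character string with its temporal position in the voice data. The names `VoiceIndex` and `build_voice_indices`, the millisecond unit, and the sample values are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceIndex:
    """One recognized character string and where it sits in the voice data."""
    text: str      # character string from the voice recognition process
    start_ms: int  # temporal position where the string begins (assumed unit: ms)
    end_ms: int    # temporal position where the string ends (assumed unit: ms)


def build_voice_indices(recognized):
    """Turn (text, start_ms, end_ms) tuples from a recognizer into voice indices."""
    return [VoiceIndex(text, start, end) for text, start, end in recognized]


# Hypothetical recognizer output, not taken from the patent.
indices = build_voice_indices([
    ("the", 0, 200),
    ("meeting", 200, 700),
    ("starts", 700, 1200),
])
```

Storing both a start and an end time is a design choice of this sketch; the abstract only requires that each string be associated with positional information.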

10 Claims

1. A text processing device comprising:
- a memory having computer executable components stored therein; and
  
  a processing circuit communicatively coupled to the memory, the processing circuit configured to

  generate voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, the voice positional information indicative of a temporal position in voice data and corresponding to the character string;

  create text in response to an operation input of a user; and

  when determining that a last character string of the text does not match any of the character strings included in the voice indices and further determining that any of the character strings other than the last character string of the text matches any of the character strings included in the voice indices,

    retrieve, from the voice indices, the voice positional information corresponding to a basing character string indicative of a character string closest to the last character string among the character strings matched with any of the character strings included in the voice indices,

    estimate a first playback time indicative of a time necessary to play back mismatched character strings indicative of the character strings from the character string next to the basing character string to the last character string among the character strings constituting the text,

    estimate already-transcribed voice positional information from the voice positional information corresponding to the basing character string and the first playback time, the already-transcribed voice positional information indicative of a temporal position at which the creation of the text is completed in the voice data,

    set the temporal position indicated by the estimated already-transcribed voice positional information as a playback starting position, and

  a playback circuit configured to play back the voice data based on the already-transcribed voice positional information at the first playback time.
- Dependent claims (2, 3, 4)
  - 2. The device according to claim 1, wherein a unit of each of the character strings constituting the created text is a morpheme.
  - 3. The device according to claim 1, wherein the processing circuit estimates the already-transcribed voice positional information by using a predetermined phoneme duration time.
  - 4. The device according to claim 3, wherein the processing circuit estimates the first playback time based on the predetermined phoneme duration time, and estimates voice positional information the first playback time ahead of the voice positional information corresponding to the basing character string as the already-transcribed voice positional information.
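
The estimation recited in claims 1, 3, and 4 can be sketched roughly as follows. This is a minimal illustration under several assumptions, none of which come from the patent: the voice indices are simplified to a dictionary from each recognized string to its end position in milliseconds, one letter is treated as one phoneme, the fixed value `PHONEME_DURATION_MS` stands in for the predetermined phoneme duration time, and the branch that handles a matching last string is added only for completeness. All function and variable names are hypothetical.

```python
# Assumption: one fixed duration per phoneme, standing in for the
# "predetermined phoneme duration time" of claims 3 and 4.
PHONEME_DURATION_MS = 100


def count_phonemes(word):
    """Crude stand-in for a grapheme-to-phoneme step: one phoneme per letter."""
    return sum(1 for ch in word if ch.isalpha())


def estimate_transcribed_position(text_words, voice_indices):
    """Estimate where in the voice data the typed text ends.

    text_words: character strings of the text typed so far, in order.
    voice_indices: dict mapping a recognized string to its end position in ms.
    Returns the estimated position in ms, or None if no string matches.
    """
    if not text_words:
        return None
    last = text_words[-1]
    if last in voice_indices:
        # Assumed extra case: the last string matched, so use its position.
        return voice_indices[last]

    # Walk backwards to find the basing string: the matched string
    # closest to the last string of the text.
    for i in range(len(text_words) - 2, -1, -1):
        if text_words[i] in voice_indices:
            base_position = voice_indices[text_words[i]]
            mismatched = text_words[i + 1:]
            # First playback time: time needed to play back the mismatched strings.
            phonemes = sum(count_phonemes(w) for w in mismatched)
            first_playback_ms = phonemes * PHONEME_DURATION_MS
            # Claim 4: the position lies the first playback time ahead of the base.
            return base_position + first_playback_ms
    return None


# Hypothetical data: "startz" and "att" were typed but never recognized.
indices = {"the": 200, "meeting": 700, "starts": 1200}
position = estimate_transcribed_position(["the", "meeting", "startz", "att"], indices)
# Base "meeting" at 700 ms plus 9 phonemes * 100 ms = 1600 ms.
```

A real system would need to handle a string that occurs more than once in the voice data, which a plain dictionary cannot represent.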

5. A text processing device comprising:
- a memory having computer executable components stored therein; and
  
  a processing circuit communicatively coupled to the memory, the processing circuit configured to

  generate voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, the voice positional information indicative of a temporal position in voice data and corresponding to the character string;

  create text in response to an operation input of a user until a punctuation is input; and

  when determining that a last character string of the text does not match any of the character strings included in the voice indices and further determining that any of the character strings other than the last character string of the text matches any of the character strings included in the voice indices,

    retrieve, from the voice indices, the voice positional information corresponding to a basing character string indicative of a character string closest to the last character string among the character strings matched with any of the character strings included in the voice indices,

    estimate a first playback time indicative of a time necessary to play back mismatched character strings indicative of the character strings from the character string next to the basing character string to the last character string among the character strings constituting the text,

    estimate already-transcribed voice positional information from the voice positional information corresponding to the basing character string and the first playback time,

    set the temporal position indicated by the estimated already-transcribed voice positional information as a playback starting position, and

  a playback circuit configured to play back the voice data based on the already-transcribed voice positional information at the first playback time.
- Dependent claims (6)
  - 6. The device according to claim 5, wherein a unit of each of the character strings constituting the created text is a morpheme.
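
Claim 5 differs from claim 1 mainly in that text is created until a punctuation mark is input. One way to picture that trigger is the sketch below, which buffers typed characters and hands back the text once punctuation arrives. The `PUNCTUATION` set and the function name `collect_until_punctuation` are assumptions; the claim does not enumerate which marks count.

```python
# Assumed set of marks; the claim does not list them.
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def collect_until_punctuation(keystrokes):
    """Accumulate typed characters until a punctuation mark arrives.

    Returns (text, complete), where complete is True only if a
    punctuation mark ended the text.
    """
    buffer = []
    for ch in keystrokes:
        buffer.append(ch)
        if ch in PUNCTUATION:
            return "".join(buffer), True
    return "".join(buffer), False


text, complete = collect_until_punctuation("hello world. more text")
```

In this sketch the position estimate would run only when `complete` is true, so the matching step sees a finished unit of text.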

7. A text processing method comprising:
- generating voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, the voice positional information indicative of a temporal position in voice data and corresponding to the character string;
  
  creating text in response to an operation input of a user; and
  
  when it is determined that a last character string of the text does not match any of character strings that are included in the voice indices and when it is further determined that any of the character strings other than the last character string of the text matches any of the character strings included in the voice indices,

    retrieving, from the voice indices, the voice positional information corresponding to a basing character string indicative of a character string closest to the last character string among the character strings matched with any of the character strings included in the voice indices,

    first estimating a first playback time indicative of a time necessary to play back mismatched character strings indicative of the character strings from the character string next to the basing character string to the last character string among the character strings constituting the text,

    second estimating already-transcribed voice positional information from the voice positional information corresponding to the basing character string and the first playback time, the already-transcribed voice positional information indicative of a temporal position at which the creation of the text is completed in the voice data,

    setting the temporal position indicated by the estimated already-transcribed voice positional information as a playback starting position, and

    playing back the voice data based on the already-transcribed voice positional information.
- Dependent claims (8, 9, 10)
  - 8. The method according to claim 7, wherein the creating includes creating the text in accordance with an input of the user who listens to the voice data.
  - 9. The method according to claim 7, wherein a unit of each of the character strings constituting the created text is a morpheme.
  - 10. The method according to claim 7, wherein the second estimating includes estimating the already-transcribed voice positional information by using a predetermined phoneme duration time.
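
The final steps of claim 7, setting the estimated position as the playback starting position and playing back from there, might look like the following for raw audio samples held in memory. The function name `samples_from_position`, the list-of-samples representation, and the sample rate are assumptions for illustration; the claims do not specify how the voice data is stored.

```python
def samples_from_position(samples, sample_rate_hz, start_ms):
    """Return the audio samples from the playback starting position onward.

    samples: sequence of audio samples (assumed in-memory representation).
    sample_rate_hz: samples per second.
    start_ms: estimated already-transcribed position in milliseconds.
    """
    start_index = start_ms * sample_rate_hz // 1000
    # Clamp so an estimate outside the recording cannot produce a bad slice.
    start_index = max(0, min(len(samples), start_index))
    return samples[start_index:]


# Hypothetical two-second recording at 8 kHz, resumed at 500 ms.
recording = list(range(16000))
remaining = samples_from_position(recording, 8000, 500)
```

Clamping matters here because the position is an estimate and can overshoot the end of the recording.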

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Suzuki, Hirokazu; Shimogori, Nobuhiro; Ikeda, Tomoo; Ueno, Kouji; Nishiyama, Osamu; Nagao, Manabu
Primary Examiner(s): Baker, Matthew H
Application Number: US13/420,827
Publication Number: US 20130030805A1
Time in Patent Office: 2,630 Days
Field of Search: 704/211, 704/231, 704/235, 704/260, 704/270, 704/276
CPC Class Codes: G10L 15/26 Speech to text systems G10L...
