Apparatus for synchronously processing text data and voice data

US 9,679,566 B2
Filed: 06/29/2015
Issued: 06/13/2017
Est. Priority Date: 06/30/2014
Status: Active Grant

First Claim

1. An apparatus for synchronously processing text data and voice data, comprising:

a storing unit that stores text data constituted by a plurality of phrases and voice data of the text data; and

a central processing unit (CPU) which performs:

dividing the text data stored in the storing unit into the phrases and storing the divided text data, with identifiers which respectively correspond to the divided text data and indicate the division order, in the storing unit;

phonemically converting the divided text data, phrase by phrase, to obtain text data phoneme conversion values and storing the text data phoneme conversion values, which respectively correspond to the phrases, in the storing unit;

calculating accumulated values of the text data phoneme conversion value of each phrase of the divided text data by calculating percentage of the text data phoneme conversion accumulated value TN of each of the phrases to the text data phoneme conversion accumulated value TN of the final phrase of the text data TD, to the second decimal point and by multiplying the percentage of the text data phoneme conversion accumulated value TN of each of the divided text data DTD by 100 and storing the accumulated values, which respectively correspond to the phrases of the divided text data, in the storing unit;

extracting a silent segment, from the voice data, on the basis of a predetermined silent segment decision datum, dividing the voice data in the extracted silent segment, and storing the divided voice data, with identifiers which respectively correspond to the divided voice data and indicate the division order, in the storing unit;

phonemically converting the divided voice data, which have been divided division range by division range, to obtain voice data phoneme conversion values and storing the voice data phoneme conversion values, which respectively correspond to the division ranges, in the storing unit;

calculating accumulated values of the voice data phoneme conversion value of each division range of the divided voice data by calculating percentage of the calculated voice data phoneme conversion accumulated values SN to a total value of the calculated voice data phoneme conversion accumulated values SN, to the second decimal point and multiplying the percentage by 100 and storing the accumulated values, which respectively correspond to the division ranges of the divided voice data, in the storing unit;

extracting the nearest approximate values of the voice data phoneme accumulated values with respect to the text data phoneme conversion accumulated values corresponding to the phrases of the divided text data, and producing phrase corresponding data, in which the voice data phoneme conversion accumulated values respectively corresponding to the phrases of the divided text data are associated with identifiers indicating playback order of the phrases of the divided text data; and

outputting the corresponding phrases of the text data and the divided voice data, which correspond to each other, on the basis of the phrase corresponding data.
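
The accumulated-value and nearest-match steps of claim 1 can be illustrated with a short Python sketch. It is not the patented implementation: it assumes that a "phoneme conversion value" is simply a phoneme count per phrase or per voice division range, and the function names `accumulated_percentages` and `match_phrases` are hypothetical. Each running total is divided by the final total, rounded to the second decimal place, and multiplied by 100, and each phrase is then paired with the nearest voice-side accumulated value.

```python
from itertools import accumulate


def accumulated_percentages(values):
    """Running totals as a share of the final total.

    Each ratio is rounded to two decimals and multiplied by 100,
    following the wording of claim 1. `values` are assumed to be
    per-phrase (or per-division-range) phoneme counts.
    """
    totals = list(accumulate(values))
    final = totals[-1]
    # The outer round() removes float noise such as 56.99999999999999.
    return [round(round(t / final, 2) * 100) for t in totals]


def match_phrases(text_acc, voice_acc):
    """Pair each phrase identifier (1-based, in playback order)
    with the nearest voice-side accumulated value."""
    pairs = []
    for phrase_id, t in enumerate(text_acc, start=1):
        nearest = min(voice_acc, key=lambda s: abs(s - t))
        pairs.append((phrase_id, nearest))
    return pairs


# Illustrative phoneme counts only; not taken from the patent.
text_acc = accumulated_percentages([5, 7, 8])
voice_acc = accumulated_percentages([6, 6, 9])
phrase_corresponding_data = match_phrases(text_acc, voice_acc)
```

Normalising both sides to the same 0 to 100 scale is what lets text phrases and voice segments be compared even though their raw phoneme totals differ.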

Abstract

The apparatus for synchronously processing text data and voice data comprises: a storing unit for storing text data and voice data; a text data dividing section for dividing the text data; a text data phoneme converting section for phonemically converting the divided text data; a text data phoneme conversion accumulated value calculating section for calculating accumulated values of text data phoneme conversion values; a voice data dividing section for dividing the voice data; a reading data phoneme converting section for phonemically converting the divided voice data; a voice data phoneme conversion accumulated value calculating section for calculating accumulated values of voice data phoneme conversion values; a phrase corresponding data producing section for producing phrase corresponding data; and an output section for synchronously outputting the text data and the divided voice data.

6 Claims

- - 2. The apparatus according to claim 1, further comprising:
    - detecting existence of duplicate association of the voice data phoneme conversion accumulated values in the phrase corresponding data; and
      
      resetting the phrase corresponding data so as to eliminate the duplicate association of the voice data phoneme conversion accumulated values in the phrase corresponding data, wherein the CPU defines all of the divided voice data, whose voice data phoneme conversion accumulated values are duplicately-associated, as resetting segment data when the duplicate association of the voice data phoneme conversion accumulated values is detected;
      
      extracting a second silent segment, from the resetting segment data, on the basis of a second silent segment decision datum whose condition is more restricted than that of the silent segment decision datum;
      
      producing second divided voice data, which are obtained by dividing the resetting segment data on the basis of a result of extracting the second silent segment;
      
      calculating second phoneme conversion values, which are obtained by phonemically converting the divided segments of the second divided voice data, and calculating a voice data phoneme conversion accumulated value of the resetting segment data, which is accumulated, in division order, in the resetting segment data;
      
      extracting the nearest approximate values of the voice data phoneme conversion accumulated values of the resetting segment data with respect to the text data phoneme conversion accumulated values corresponding to the phrases of the divided text data in the resetting segment data, and making the extracted values correspond to the phrases of the divided text data in the resetting segment data;
      
      producing phrase corresponding data of the resetting segment, in which accumulated values of the divided voice data phoneme conversion in the resetting segment data are respectively corresponded to the phrases of the divided text data in the resetting segment data; and
      
      producing corrected phrase corresponding data by integrating the phrase corresponding data with the phrase corresponding data of the resetting segment on the basis of the identifiers in the divided text data.
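
The duplicate detection and second silent-segment extraction of claim 2 can be sketched as follows. This is a hedged illustration under stated assumptions, not the patented method: the "silent segment decision datum" is modelled as an amplitude threshold plus a minimum run of quiet samples, and the "more restricted" second datum is interpreted here as a shorter minimum run, so that more split points are found. The names `find_duplicates` and `split_on_silence` are hypothetical.

```python
from collections import defaultdict


def find_duplicates(pairs):
    """Return {voice_value: [phrase_ids]} for every voice-side
    accumulated value associated with more than one phrase."""
    by_value = defaultdict(list)
    for phrase_id, voice_value in pairs:
        by_value[voice_value].append(phrase_id)
    return {v: ids for v, ids in by_value.items() if len(ids) > 1}


def split_on_silence(samples, amp_threshold, min_silence_len):
    """Return (start, end) index ranges of voiced audio, splitting
    wherever at least `min_silence_len` consecutive samples have an
    absolute amplitude at or below `amp_threshold`."""
    segments = []
    start = None      # start index of the current voiced range
    quiet_run = 0     # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) <= amp_threshold:
            quiet_run += 1
            if start is not None and quiet_run >= min_silence_len:
                # Close the range at the first sample of the quiet run.
                segments.append((start, i - quiet_run + 1))
                start = None
        else:
            if start is None:
                start = i
            quiet_run = 0
    if start is not None:
        segments.append((start, len(samples) - quiet_run))
    return segments


# Illustrative data only: two phrases share the voice value 30.
pairs = [(1, 30), (2, 30), (3, 100)]
duplicates = find_duplicates(pairs)

# Toy waveform: a short pause at index 2 and a longer one at 5..7.
samples = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1]
first_pass = split_on_silence(samples, amp_threshold=0, min_silence_len=3)
second_pass = split_on_silence(samples, amp_threshold=0, min_silence_len=1)
```

In this toy example the first pass ignores the short pause, while the second pass splits on it, which is the effect the resetting step relies on to separate phrases that were merged into one voice segment.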
  - 3. The apparatus according to claim 2, wherein, when the CPU detects the duplicate association of the voice data phoneme conversion accumulated values of the resetting segment data in the corrected phrase corresponding data and wherein, the CPU performs:
    - producing forcible processing object data, which include the voice data phoneme conversion accumulated value of the resetting segment data from which the duplicate association has been detected by the CPU, the second divided voice data being corresponded thereto and the divided text data;
      
      calculating a total value of the text data phoneme conversion values of the divided text data of the forcible processing object data, and calculating a ratio of the text data phoneme conversion values of the divided text data of the forcible processing object data to the total value;
      
      forming forcibly-divided segments in the second voice data according to the calculated ratio of the text data phoneme conversion values, and calculating voice data phoneme conversion accumulated values of the resetting segments in the forcibly-divided segments;
      
      extracting voice data phoneme conversion accumulated values of the forcible process object data, each of which is the nearest to the text data phoneme conversion accumulated values of the phrases of the divided text data in the forcible process object data, and making the extracted values correspond to the phrases of the divided text data in the forcible process object data;
      
      producing phrase corresponding data of the forcible process object data, in which the voice data phoneme conversion accumulated values in the forcible process object data are respectively associated with the phrases of the divided text data in the forcible process object data; and
      
      producing forcibly-corrected phrase corresponding data by integrating the phrase corresponding data, the phrase corresponding data in the resetting segments and the phrase corresponding data in the forcible process object data on the basis of the identifiers in the divided text data.
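
The forcible division of claim 3 splits a voice segment in proportion to the text-side phoneme conversion values of the phrases that still share it. A minimal sketch, assuming the segment is described by sample indices and the phoneme conversion values are phoneme counts; the function name `force_split` is hypothetical and the rounding of boundaries is an illustrative choice, not something the claim specifies.

```python
def force_split(start, end, phoneme_values):
    """Divide the range [start, end) into consecutive sub-ranges whose
    lengths follow the ratio of `phoneme_values` to their total."""
    total = sum(phoneme_values)
    length = end - start
    bounds = []
    running = 0
    seg_start = start
    for value in phoneme_values:
        running += value
        # Boundary placed at the accumulated share of the total.
        seg_end = start + round(length * running / total)
        bounds.append((seg_start, seg_end))
        seg_start = seg_end
    return bounds


# Illustrative values only: three phrases with 2, 3 and 5 phonemes
# sharing one voice segment of 100 samples.
forced = force_split(0, 100, [2, 3, 5])
```

Because the boundaries come from accumulated shares rather than per-phrase lengths, the sub-ranges always tile the original segment without gaps or overlap.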
  - 4. The apparatus according to claim 1, wherein the CPU converts the voice data into text data once by voice recognition processing, and phonemically converts the text data of the voice data.
  - 5. The apparatus according to claim 2, wherein the CPU converts the voice data into text data once by voice recognition processing, and phonemically converts the text data of the voice data.
  - 6. The apparatus according to claim 3, wherein the CPU converts the voice data into text data once by voice recognition processing, and phonemically converts the text data of the voice data.

Current Assignee: Shinano Kenshi Kabushiki Kaisha
Original Assignee: Shinano Kenshi Kabushiki Kaisha
Inventors: Kodaira, Tomoki; Nishizawa, Tatsuo
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kim, Jonathan C
Application Number: US14/753,429
Publication Number: US 20150379996A1
Time in Patent Office: 715 Days
Field of Search: None

CPC Class Codes

G10L 15/26   Speech to text systems G10L...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 25/48   specially adapted for parti...

G10L 25/87   Detection of discrete point...
