
SYNCHRONIZATION OF AN INPUT TEXT OF A SPEECH WITH A RECORDING OF THE SPEECH

  • US 20090006087A1
  • Filed: 06/25/2008
  • Published: 01/01/2009
  • Est. Priority Date: 06/28/2007
  • Status: Active Grant
First Claim

1. A method for synchronizing words in an input text of a speech with a continuous recording of the speech, said method implemented by execution of instructions by a processor of a computer system, said instructions being stored on computer readable storage media of the computer system, said method comprising:

    generating a first dictionary stored in a first dictionary database of the computer system, said first dictionary comprising the words in the input text and associated first pronunciation speech data;

    receiving input speech data encompassing the speech and being structured as a waveform obtained from the continuous recording of the speech spoken by a speaker reading the speech;

    performing a first speech recognition of the input speech data, by comparing the input speech data with the first pronunciation speech data in the first dictionary, to generate a first recognition text comprising recognized words of the input text;

    determining, from comparing the input text with the first recognition text, first erroneous recognition text comprising words of the input text erroneously recognized during performing the first speech recognition and not matching respective words of the first recognition text;

    performing a second speech recognition of a first portion of the input speech data, corresponding to the first erroneous recognition text, to generate a second recognition text comprising recognized words of the first portion of the input speech data;

    determining, from comparing the second recognition text with the first erroneous recognition text, second erroneous recognition text comprising words of the first erroneous recognition text differing from the words of second recognition text;

    generating synthetic speech data corresponding to the second erroneous recognition text;

    determining a second portion of the input speech data to which each word of the synthetic speech data corresponds;

    computing, from the second portion of the input speech data to which each word of the synthetic speech data corresponds, ratio data comprising a ratio of a pronunciation time in the input speech data of each word of the second erroneous recognition text to a pronunciation time in the input speech data of each other word of the second erroneous recognition text;

    determining, through use of the computed ratio data, a first association between each word of the second erroneous recognition text and a time to reproduce each portion of the input speech data corresponding to said each word of the second erroneous recognition text; and

    recording the first association in a recording medium of the computer system and/or displaying the first association on a display device of the computer system.
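
Two of the steps recited above, the comparison of the input text with a recognition text and the ratio-based assignment of reproduction times, can be illustrated with a minimal sketch. It is not the claimed implementation: the function names `find_misrecognized` and `allocate_times` are hypothetical, the word-level diff uses Python's standard `difflib`, and the sketch assumes that per-word durations of the synthetic speech serve as the pronunciation-time ratios and that the start and end times of the unrecognized portion of the recording are already known.

```python
from difflib import SequenceMatcher


def find_misrecognized(input_words, recognized_words):
    """Return (start, end) index ranges of input_words with no matching recognized word."""
    matcher = SequenceMatcher(a=input_words, b=recognized_words, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'replace' and 'delete' cover input words the recognizer got wrong or missed.
        if tag in ("replace", "delete") and i2 > i1:
            spans.append((i1, i2))
    return spans


def allocate_times(words, synth_durations, span_start, span_end):
    """Split [span_start, span_end] among words in proportion to synth_durations.

    Assumption: the synthetic-speech duration of each word stands in for its
    share of pronunciation time in the real recording.
    """
    if len(words) != len(synth_durations):
        raise ValueError("one duration is required per word")
    total = sum(synth_durations)
    if total <= 0:
        raise ValueError("durations must sum to a positive value")
    span = span_end - span_start
    timings = []
    cursor = span_start
    for word, duration in zip(words, synth_durations):
        length = span * duration / total
        timings.append((word, cursor, cursor + length))
        cursor += length
    return timings


if __name__ == "__main__":
    text = "the quick brown fox jumps".split()
    recognized = "the quick crown box jumps".split()
    for start, end in find_misrecognized(text, recognized):
        # Hypothetical values: synthetic durations and the recording span in seconds.
        print(allocate_times(text[start:end], [0.75, 0.25], 2.0, 4.0))
```

Only the ratios between the durations matter here, so the synthetic speech may be faster or slower than the speaker without changing the result.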
