Systems and methods for automatic repair of speech recognition engine output using a sliding window mechanism
First Claim
1. A processor implemented method comprising:
receiving a text output from a general purpose automatic speech recognition (GP-ASR) engine;
identifying in real-time, a current environment of a speaker associated with the text output based on one or more environment variables; and
automatically identifying one or more erroneously recognized terms in the text output and selectively correcting the output of the GP-ASR by replacing the one or more erroneously recognized terms in the text output with one or more best-fit terms by performing multi-stage correction steps on the received text output, using a pre-determined threshold for a fitness function, wherein the pre-determined threshold is a function of environment and is determined based on the identified current environment, wherein the fitness function is a function to determine candidate terms for replacement of the one or more erroneously recognized terms in the text output, and wherein the multi-stage correction steps comprise:
a first stage of domain ontology based correction, wherein at least one of a phonetic-level match and an edit-distance-level-match is performed to obtain a matching stage output, wherein the matching stage output comprises matching of a first set of candidate terms from domain ontology with the text output based on a sliding window mechanism which consists of a predetermined number of words of the text output;
a second stage of contextual correction of the matching stage output, wherein at least one of the contextual phonetic-level match and the edit-distance-level-match is performed to obtain a mapping stage output, mapping the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism; and
a third stage of linguistic correction of the mapping stage output, wherein semantic and linguistic repair rules based on the identified current environment are applied on the mapping stage output to obtain a domain specific repaired output corresponding to the output of the GP-ASR.
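The first stage recited above (edit-distance matching of ontology terms against a sliding window of words, gated by an environment-dependent fitness threshold) can be sketched roughly as follows. This is an illustrative sketch only, not the claimed implementation: the names `THRESHOLDS`, `fitness` and `repair`, the environment labels "quiet" and "noisy", the threshold values, and the use of normalized Levenshtein similarity as the fitness function are all assumptions made for the example. The sketch also lowercases its input for simplicity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical environment-dependent thresholds (illustrative values only).
THRESHOLDS: Dict[str, float] = {"quiet": 0.80, "noisy": 0.60}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fitness(window_text: str, term: str) -> float:
    """Assumed fitness: normalized similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(window_text), len(term)) or 1
    return 1.0 - edit_distance(window_text, term) / longest


def repair(text: str, ontology: List[str], environment: str,
           max_window: int = 3) -> str:
    """Greedy left-to-right repair over windows of up to max_window words."""
    threshold = THRESHOLDS[environment]  # threshold depends on the environment
    words = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        best: Optional[Tuple[float, int, str]] = None  # (score, size, term)
        for size in range(min(max_window, len(words) - i), 0, -1):
            window_text = " ".join(words[i:i + size])
            for term in ontology:
                score = fitness(window_text, term)
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, size, term)
        if best is None:
            out.append(words[i])  # no candidate is fit enough: keep the word
            i += 1
        else:
            out.append(best[2])   # replace the whole window with the best fit
            i += best[1]
    return " ".join(out)


print(repair("please check my cavings account balance",
             ["savings account", "balance"], "quiet"))
```

With the stricter "quiet" threshold, only the two-word window "cavings account" is close enough to the ontology term "savings account" to be replaced. A lower threshold, such as the assumed "noisy" value, admits looser matches and can absorb neighbouring words into the replacement, which is one reason to tie the threshold to the environment.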
Abstract
The text output of speech recognition engines tends to be erroneous when the spoken data contains domain-specific terms. The present disclosure facilitates automatic correction of errors in speech-to-text conversion using abstractions of evolutionary development and artificial development. The words in a speech recognition engine's text output are treated as a set of injured genes in a biological cell. The injured genes are repaired to form genotypes, which are in turn repaired into phenotypes, through a series of repair steps based on matching, mapping and linguistic repair, using a fitness criterion. A basic genetic-level repair applies a phonetic MATCHING function together with a FITNESS function to select the best among the matching genes. A second genetic-level repair applies a contextual MAPPING function to repair the remaining ‘injured’ genes of the speech recognition engine output. Finally, a genotype-to-phenotype repair applies the linguistic and semantic rules of the domain.
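As a rough illustration of the basic genetic-level repair described in the abstract, the sketch below pairs a phonetic MATCHING step with a FITNESS step that picks the best of the matching candidates. Everything concrete here is an assumption made for the example: the simplified Soundex-style key stands in for whatever phonetic representation the disclosure uses, the `difflib.SequenceMatcher` ratio stands in for the fitness function, and the names `phonetic_key` and `best_fit`, the threshold of 0.7, and the sample ontology are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional

# Letter-to-digit table used by Soundex-style phonetic codes.
_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for c in letters}


def phonetic_key(word: str) -> str:
    """Simplified Soundex-style key: first letter plus up to three digits."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0].upper()
    last = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != last:
            key += code
        if ch not in "hw":  # h and w do not separate repeated codes
            last = code
    return (key + "000")[:4]


def best_fit(injured: str, ontology: List[str],
             threshold: float = 0.7) -> Optional[str]:
    """Return the fittest phonetically matching term, or None if none is fit."""
    key = phonetic_key(injured)
    # MATCHING: keep ontology terms that share the injured word's phonetic key.
    matches = [t for t in ontology if phonetic_key(t) == key]
    # FITNESS: score each match and keep those at or above the threshold.
    scored = [(SequenceMatcher(None, injured.lower(), t.lower()).ratio(), t)
              for t in matches]
    scored = [s for s in scored if s[0] >= threshold]
    return max(scored)[1] if scored else None


print(best_fit("mortage", ["mortgage", "mortgagee", "merchant"]))
```

Here "mortgage" and "mortgagee" both share the phonetic key of the injured word "mortage", and the fitness score selects "mortgage" as the closer of the two. A word with no phonetic match in the ontology is left unrepaired at this stage (the function returns `None`), which leaves it for the later contextual mapping step.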
17 Claims
1. A processor implemented method comprising:
receiving a text output from a general purpose automatic speech recognition (GP-ASR) engine;
identifying in real-time, a current environment of a speaker associated with the text output based on one or more environment variables; and
automatically identifying one or more erroneously recognized terms in the text output and selectively correcting the output of the GP-ASR by replacing the one or more erroneously recognized terms in the text output with one or more best-fit terms by performing multi-stage correction steps on the received text output, using a pre-determined threshold for a fitness function, wherein the pre-determined threshold is a function of environment and is determined based on the identified current environment, wherein the fitness function is a function to determine candidate terms for replacement of the one or more erroneously recognized terms in the text output, and wherein the multi-stage correction steps comprise:
a first stage of domain ontology based correction, wherein at least one of a phonetic-level match and an edit-distance-level-match is performed to obtain a matching stage output, wherein the matching stage output comprises matching of a first set of candidate terms from domain ontology with the text output based on a sliding window mechanism which consists of a predetermined number of words of the text output;
a second stage of contextual correction of the matching stage output, wherein at least one of the contextual phonetic-level match and the edit-distance-level-match is performed to obtain a mapping stage output, mapping the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism; and
a third stage of linguistic correction of the mapping stage output, wherein semantic and linguistic repair rules based on the identified current environment are applied on the mapping stage output to obtain a domain specific repaired output corresponding to the output of the GP-ASR.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to:
receive a text output from a general purpose automatic speech recognition (GP-ASR) engine;
identify in real-time, a current environment of a speaker associated with the text output based on one or more environment variables associated therewith; the identified current environment being classified as at least one of a broad classification and a granular classification; and
automatically identify one or more erroneously recognized terms in the text output and selectively correct the output of the GP-ASR by replacing the one or more erroneously recognized terms in the text output with one or more best-fit terms by performing multi-stage correction steps on the received text output, using a pre-determined threshold for a fitness function, wherein the pre-determined threshold is a function of environment and is determined based on the identified current environment, wherein the fitness function is a function to determine candidate terms for replacement of the one or more erroneously recognized terms in the text output, and wherein the multi-stage correction steps comprise:
a first stage of domain ontology based correction, wherein at least one of a phonetic-level match and an edit-distance-level-match is performed to obtain a matching stage output, wherein the matching stage output comprises matching of a first set of candidate terms from domain ontology with the text output based on a sliding window mechanism which consists of a predetermined number of words of the text output;
a second stage of contextual correction of the matching stage output, wherein at least one of the contextual phonetic-level match and the edit-distance-level-match is performed to obtain a mapping stage output, mapping the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism; and
a third stage of linguistic correction of the mapping stage output, wherein semantic and linguistic repair rules based on the identified current environment are applied on the mapping stage output to obtain a domain specific repaired output corresponding to the output of the GP-ASR.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive a text output from a general purpose automatic speech recognition (GP-ASR) engine;
identify in real-time, a current environment of a speaker associated with the text output based on one or more environment variables; and
automatically identify one or more erroneously recognized terms in the text output and selectively correct the output of the GP-ASR by replacing the one or more erroneously recognized terms in the text output with one or more best-fit terms by performing multi-stage correction steps on the received text output, using a pre-determined threshold for a fitness function, wherein the pre-determined threshold is a function of environment and is determined based on the identified current environment, wherein the fitness function is a function to determine candidate terms for replacement of the one or more erroneously recognized terms in the text output, and wherein the multi-stage correction steps comprise:
a first stage of domain ontology based correction, wherein at least one of a phonetic-level match and an edit-distance-level-match is performed to obtain a matching stage output, wherein the matching stage output comprises matching of a first set of candidate terms from domain ontology with the text output based on a sliding window mechanism which consists of a predetermined number of words of the text output;
a second stage of contextual correction of the matching stage output, wherein at least one of the contextual phonetic-level match and the edit-distance-level-match is performed to obtain a mapping stage output, mapping the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism; and
a third stage of linguistic correction of the mapping stage output, wherein semantic and linguistic repair rules based on the identified current environment are applied on the mapping stage output to obtain a domain specific repaired output corresponding to the output of the GP-ASR.
Specification