Systems and methods for automatic repair of speech recognition engine output using a sliding window mechanism

US 10,410,622 B2
Filed: 07/13/2017
Issued: 09/10/2019
Est. Priority Date: 07/13/2016
Status: Active Grant

Abstract

Text output of speech recognition engines tends to be erroneous when the spoken data contains domain-specific terms. The present disclosure facilitates automatic correction of errors in speech-to-text conversion using abstractions of evolutionary development and artificial development. The words in a speech recognition engine's text output are treated as a set of injured genes in a biological cell that need repair. The genes are repaired to form genotypes, which are in turn repaired to phenotypes through a series of steps (matching, mapping and linguistic repair) governed by a fitness criterion. A basic genetic-level repair uses a phonetic MATCHING function together with a FITNESS function to select the best among the matching genes. A second genetic-level repair uses a contextual MAPPING function to repair the remaining ‘injured’ genes of the speech recognition engine output. Finally, a genotype-to-phenotype repair applies the linguistic and semantic rules of the domain.
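
The pairing of a phonetic MATCHING function with a FITNESS function described in the abstract might look roughly like the sketch below. It is only an illustration: the `phonetic_key` heuristic, the use of `difflib.SequenceMatcher` as the fitness score, and the names `fitness` and `best_fit` are assumptions made for this example and are not taken from the patent.

```python
from difflib import SequenceMatcher


def phonetic_key(word: str) -> str:
    """Crude phonetic key (an assumed stand-in for a real phonetic encoder).

    Keeps the first letter, drops later vowels and h/w/y, and skips a
    consonant that equals the last character already in the key.
    """
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiouhwy":
            continue
        if ch != key[-1]:
            key += ch
    return key


def fitness(term: str, candidate: str) -> float:
    """Similarity of the two phonetic keys in [0, 1]; higher is fitter."""
    return SequenceMatcher(None, phonetic_key(term), phonetic_key(candidate)).ratio()


def best_fit(term, candidates, threshold):
    """Return the fittest candidate, or None if none reaches the threshold."""
    scored = [(fitness(term, c), c) for c in candidates]
    score, winner = max(scored, default=(0.0, None))
    return winner if score >= threshold else None


# Hypothetical domain terms; "morgage" stands for a misrecognized word.
print(best_fit("morgage", ["mortgage", "message", "interest"], threshold=0.8))
```

Returning `None` when no candidate clears the threshold leaves the original word untouched, which mirrors the "selective" correction the claims describe.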

17 Claims

1. A processor implemented method comprising:
- receiving a text output from a general purpose automatic speech recognition (GP-ASR) engine;
  
  identifying in real-time, a current environment of a speaker associated with the text output based on one or more environment variables; and
  
  automatically identifying one or more erroneously recognized terms in the text output and selectively correcting the output of the GP-ASR by replacing the one or more erroneously recognized terms in the text output with one or more best-fit terms by performing multi-stage correction steps on the received text output, using a pre-determined threshold for a fitness function, wherein the pre-determined threshold is a function of environment and is determined based on the identified current environment, wherein the fitness function is a function to determine candidate terms for replacement of the one or more erroneously recognized terms in the text output, and wherein the multi-stage correction steps comprise:
  
  a first stage of domain ontology based correction, wherein at least one of a phonetic-level match and an edit-distance-level-match is performed to obtain a matching stage output, wherein the matching stage output comprises matching of a first set of candidate terms from domain ontology with the text output based on a sliding window mechanism which consists of a predetermined number of words of the text output;
  
  a second stage of contextual correction of the matching stage output, wherein at least one of the contextual phonetic-level match and the edit-distance-level-match is performed to obtain a mapping stage output, mapping the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism; and
  
  a third stage of linguistic correction of the mapping stage output, wherein semantic and linguistic repair rules based on the identified current environment are applied on the mapping stage output to obtain a domain specific repaired output corresponding to the output of the GP-ASR.
  - 2. The processor implemented method of claim 1, further comprising:
    - receiving speech input associated with the speaker and information pertaining to the identified current environment; and
      
      deriving the one or more environment variables from the received speech input and the information pertaining to the current environment.
  - 3. The processor implemented method of claim 2, wherein the one or more environment variables are based on one or more of: location information associated with the speaker; movement information associated with the speaker; olfactory information; proximity information; and image information.
  - 4. The processor implemented method of claim 1, wherein the identified current environment is classified as one of a broad classification and a granular classification.
  - 5. The processor implemented method of claim 1, wherein the pre-determined threshold is empirical.
  - 6. The processor implemented method of claim 1, wherein the first stage through third stage are iteratively repeated based on the real-time current environment.
  - 7. The processor implemented method of claim 1, wherein the fitness function for domain ontology based correction is a weighted cost function that matches the text output with a first set of candidate terms from the domain ontology based on a sliding window mechanism and replaces the one or more erroneously recognized terms in the text output with one or more best-fit terms from the first set of candidate terms from the domain ontology based on the pre-determined threshold associated with the identified current environment.
  - 8. The processor implemented method of claim 1, wherein the fitness function for contextual correction is a weighted cost function that maps the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism and replaces the one or more erroneously recognized terms in the matching stage output with one or more best-fit terms from the second set of candidate terms from the domain ontology based on the pre-determined threshold associated with the identified current environment.
  - 9. The processor implemented method of claim 1, wherein the fitness function for the linguistic correction is a function of the mapping stage output and a set of semantic and linguistic repair rules.
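
The first-stage correction of claims 1 and 7 (edit-distance matching of a sliding window of words against domain ontology terms, accepted only under an environment-dependent threshold) can be sketched as follows. This is a minimal illustration under stated assumptions: the `THRESHOLDS` values, the environment names, the length-normalized edit distance used in place of the weighted cost function, and the choice to try every window size up to `window` are all hypothetical and not specified in the claims.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical, empirically chosen maximum cost per environment.
THRESHOLDS = {"quiet_office": 0.20, "moving_vehicle": 0.40}


def repair(text, ontology_terms, environment, window=2):
    """Replace word windows that are close enough to an ontology term."""
    words = text.lower().split()
    threshold = THRESHOLDS[environment]
    out, i = [], 0
    while i < len(words):
        best = None  # (cost, window size, replacement term)
        for size in range(min(window, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + size])
            for term in ontology_terms:
                cost = edit_distance(chunk, term) / max(len(chunk), len(term))
                if cost <= threshold and (best is None or cost < best[0]):
                    best = (cost, size, term)
        if best:
            out.append(best[2])
            i += best[1]
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


terms = ["mortgage", "savings account"]
sentence = "i want to pay my more gage today"
print(repair(sentence, terms, "moving_vehicle"))
print(repair(sentence, terms, "quiet_office"))
```

In this sketch the window "more gage" has a normalized cost of 2/9 against "mortgage", so it is repaired under the looser `moving_vehicle` threshold and left unchanged under the stricter `quiet_office` one, which illustrates why the threshold is tied to the environment.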

10. A system comprising:
- one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to:
  
  receive a text output from a general purpose automatic speech recognition (GP-ASR) engine;
  
  identify in real-time, a current environment of a speaker associated with the text output based on one or more environment variables associated therewith;
  
  the identified current environment being classified as at least one of a broad classification and a granular classification; and
  
  automatically identify one or more erroneously recognized terms in the text output and selectively correct the output of the GP-ASR by replacing the one or more erroneously recognized terms in the text output with one or more best-fit terms by performing multi-stage correction steps on the received text output, using a pre-determined threshold for a fitness function, wherein the pre-determined threshold is a function of environment and is determined based on the identified current environment, wherein the fitness function is a function to determine candidate terms for replacement of the one or more erroneously recognized terms in the text output, and wherein the multi-stage correction steps comprise:
  
  a first stage of domain ontology based correction, wherein at least one of a phonetic-level match and an edit-distance-level-match is performed to obtain a matching stage output, wherein the matching stage output comprises matching of a first set of candidate terms from domain ontology with the text output based on a sliding window mechanism which consists of a predetermined number of words of the text output;
  
  a second stage of contextual correction of the matching stage output, wherein at least one of the contextual phonetic-level match and the edit-distance-level-match is performed to obtain a mapping stage output, mapping the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism; and
  
  a third stage of linguistic correction of the mapping stage output, wherein semantic and linguistic repair rules based on the identified current environment are applied on the mapping stage output to obtain a domain specific repaired output corresponding to the output of the GP-ASR.
  - 11. The system of claim 10, wherein the one or more hardware processors are further configured to:
    - receive speech input associated with the speaker and information pertaining to the current environment associated therewith; and
      
      derive the one or more environment variables from the received speech input and the information pertaining to the current environment.
  - 12. The system of claim 10, wherein the one or more environment variables are based on one or more of: (i) location information associated with the speaker; (ii) movement information associated with the speaker; (iii) olfactory information; (iv) proximity information; and (v) image information.
  - 13. The system of claim 10, wherein the pre-determined threshold is empirical.
  - 14. The system of claim 10, wherein the one or more hardware processors are further configured to perform the domain ontology based correction, wherein the fitness function is a weighted cost function that matches the text output with a first set of candidate terms from the domain ontology based on a sliding window mechanism and replaces the one or more erroneously recognized terms in the text output with one or more best-fit terms from the first set of candidate terms from the domain ontology based on the pre-determined threshold associated with the identified current environment.
  - 15. The system of claim 10, wherein the one or more hardware processors are further configured to perform the contextual correction, wherein the fitness function is a weighted cost function that maps the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism and replaces the one or more erroneously recognized terms in the matching stage output with one or more best-fit terms from the second set of candidate terms from the domain ontology based on the pre-determined threshold associated with the identified current environment.
  - 16. The system of claim 10, wherein the one or more hardware processors are further configured to perform the linguistic correction, wherein the fitness function is a function of the mapping stage output and a set of semantic and linguistic repair rules.
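
Claims 10 to 13 tie the fitness threshold to an environment that is derived from environment variables and classified at a broad or a granular level. One way to picture that lookup is sketched below. Everything concrete here is an assumption made for illustration: the `EnvironmentVariables` fields, the 5 km/h cut-off, the label names and the threshold tables do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EnvironmentVariables:
    """Hypothetical subset of the environment variables listed in claim 12."""
    location: Optional[str] = None  # location information, e.g. "bank_branch"
    speed_kmh: float = 0.0          # movement information


def classify(env: EnvironmentVariables) -> Tuple[str, str]:
    """Return a (broad, granular) pair of environment labels."""
    broad = "mobile" if env.speed_kmh > 5 else "stationary"
    granular = env.location or "unknown"
    return broad, granular


# Hypothetical empirical thresholds; the granular table takes precedence.
GRANULAR_THRESHOLDS = {"bank_branch": 0.25, "call_centre": 0.30}
BROAD_THRESHOLDS = {"stationary": 0.20, "mobile": 0.40}


def threshold_for(env: EnvironmentVariables) -> float:
    """Pick the granular threshold if one is known, else the broad one."""
    broad, granular = classify(env)
    return GRANULAR_THRESHOLDS.get(granular, BROAD_THRESHOLDS[broad])


print(threshold_for(EnvironmentVariables(location="bank_branch")))
print(threshold_for(EnvironmentVariables(speed_kmh=60.0)))
```

Falling back from the granular to the broad table is a design choice of this sketch; it keeps a usable threshold available when only coarse environment information can be derived.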

17. A computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
- receive a text output from a general purpose automatic speech recognition (GP-ASR) engine;
  
  identify in real-time, a current environment of a speaker associated with the text output based on one or more environment variables; and
  
  automatically identify one or more erroneously recognized terms in the text output and selectively correct the output of the GP-ASR by replacing the one or more erroneously recognized terms in the text output with one or more best-fit terms by performing multi-stage correction steps on the received text output, using a pre-determined threshold for a fitness function, wherein the pre-determined threshold is a function of environment and is determined based on the identified current environment, wherein the fitness function is a function to determine candidate terms for replacement of the one or more erroneously recognized terms in the text output, and wherein the multi-stage correction steps comprise:
  
  a first stage of domain ontology based correction, wherein at least one of a phonetic-level match and an edit-distance-level-match is performed to obtain a matching stage output, wherein the matching stage output comprises matching of a first set of candidate terms from domain ontology with the text output based on a sliding window mechanism which consists of a predetermined number of words of the text output;
  
  a second stage of contextual correction of the matching stage output, wherein at least one of the contextual phonetic-level match and the edit-distance-level-match is performed to obtain a mapping stage output, mapping the matching stage output with a second set of candidate terms from the domain ontology based on a sliding window mechanism; and
  
  a third stage of linguistic correction of the mapping stage output, wherein semantic and linguistic repair rules based on the identified current environment are applied on the mapping stage output to obtain a domain specific repaired output corresponding to the output of the GP-ASR.
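
The third stage recited above applies linguistic and semantic repair rules selected by the identified environment. A minimal sketch of such a rule pass is given below, assuming the rules can be expressed as regular-expression substitutions. The `RULES` table, the "banking" environment name and the two example rules are hypothetical and serve only to show the mechanism.

```python
import re

# Hypothetical repair rules keyed by environment: (pattern, replacement).
RULES = {
    "banking": [
        (r"\ba (?=[aeiou])", "an "),  # article agreement before a vowel
        (r"\b(\w+) \1\b", r"\1"),     # collapse an immediately repeated word
    ],
}


def linguistic_repair(text: str, environment: str) -> str:
    """Apply each rule for the environment in order; unknown environments are a no-op."""
    for pattern, replacement in RULES.get(environment, []):
        text = re.sub(pattern, replacement, text)
    return text


print(linguistic_repair("i want to open a account account today", "banking"))
```

Rules are applied in list order, so a later rule sees the output of an earlier one; in a real system the ordering and the rule set would have to be chosen per domain.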

Current Assignee: AA R&D LLC; TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Original Assignee: TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Inventors: Anantaram, Chandrasekhar; Kopparapu, Sunil Kumar; Patel, Chiragkumar Rameshbhai; Mittal, Aditya
Primary Examiner(s): Kazeminezhad, Farzad
Application Number: US15/649,010
Publication Number: US 20180018960A1
Time in Patent Office: 789 Days
Field of Search

704257
US Class Current
CPC Class Codes

G06F 40/232   Orthographic correction, e....

G06F 40/30   Semantic analysis

G10L 15/01   Assessment or evaluation of...

G10L 15/183   using context dependencies,...

G10L 15/20   Speech recognition techniqu...

G10L 15/24   Speech recognition using no...

G10L 15/26   Speech to text systems G10L...

G10L 21/00   Speech or voice signal proc...
