Speech-to-text engine customization

US 10,832,680 B2
Filed: 11/27/2018
Issued: 11/10/2020
Est. Priority Date: 11/27/2018
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Systems, methods, and computer-readable media are described for automatically identifying potential errors in the text output of a domain-agnostic speech-to-text engine and generating text snippets that contain words representative of the potential errors and other words in the neighborhoods of such words for context. In this manner, a substantially reduced amount of text (i.e., the text snippets) can be reviewed for errors in the speech-to-text conversion rather than the entire text output, thereby significantly reducing the burden associated with error identification in the text output.
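
As a rough illustration of the snippet idea described in the abstract, the sketch below keeps only a small window of neighboring words around each flagged word, so a reviewer reads short excerpts instead of the whole transcript. The function name `make_snippets`, the `radius` parameter, and the sample transcript are hypothetical and are not taken from the patent.

```python
def make_snippets(tokens, flagged, radius=3):
    """Return (start, end, text) excerpts around every flagged word.

    tokens  -- the speech-to-text output split into words
    flagged -- set of lowercase words suspected to be recognition errors
    radius  -- context words kept on each side (an assumed setting)
    """
    spans = []
    for i, word in enumerate(tokens):
        if word.lower() not in flagged:
            continue
        start, end = max(0, i - radius), min(len(tokens), i + radius + 1)
        # Merge with the previous span when the two windows overlap or touch.
        if spans and start <= spans[-1][1]:
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return [(s, e, " ".join(tokens[s:e])) for s, e in spans]


# Hypothetical transcript in which "cube control" may be a misheard "kubectl".
transcript = ("please restart the cube control pod and then check "
              "that the kubectl client can reach the cluster").split()
snippets = make_snippets(transcript, {"cube", "kubectl"}, radius=2)
```

Here the reviewer would see two five-word excerpts rather than the full seventeen-word transcript; overlapping windows are merged so the same words are not shown twice.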

24 Citations

20 Claims

1. A computer-implemented method for automated identification of one or more potential errors in a text output of a speech-to-text engine, the method comprising:
- receiving, using a processor, the text output of the speech-to-text engine;
  
  determining, using the processor, a first vector representation of a first word in the text output;
  
  determining, using the processor, a second vector representation of a second word in the text output;
  
  determining, using the processor, that the first vector representation and the second vector representation satisfy a similarity threshold;
  
  determining that the first word and the second word form a synonym cluster based at least in part on determining that the first vector representation and the second vector representation satisfy the similarity threshold, wherein the synonym cluster is indicative of a potential error in the text output; and
  
  generating a text snippet from the text output, wherein the text snippet comprises at least the first word and the second word.
  - 2. The computer-implemented method of claim 1, wherein the first word and the second word forming a synonym cluster indicates that respective probabilities that one or more other words are in a first neighborhood of the first word are approximately equal to respective probabilities that the one or more other words are in a second neighborhood of the second word.
  - 3. The computer-implemented method of claim 1, further comprising learning the first vector representation of the first word, wherein learning the first vector representation comprises:
    - performing matrix multiplication of the first vector representation with a hidden matrix to obtain a matrix product;
      
      determining, for each other word in the text output that is within a co-occurrence window of the first word, a respective backpropagation error between the matrix product and a respective vector encoding representation of the each other word; and
      
      adjusting one or more parameters of the first vector representation until each respective backpropagation error satisfies a threshold value.
  - 4. The computer-implemented method of claim 1, further comprising determining that the synonym cluster is indicative of an error in the text output by determining that the synonym cluster fails to satisfy the similarity threshold with respect to a public dataset.
  - 5. The computer-implemented method of claim 1, further comprising learning the first vector representation of the first word, wherein learning the first vector representation comprises:
    - generating an information matrix V having dimensions W×W, wherein W is a size of a vocabulary associated with the speech-to-text engine, and wherein each entry of the information matrix V is a conditional probability that a respective word in the vocabulary appears in a respective neighborhood of another respective word in the vocabulary; and
      
      applying a singular value decomposition technique to approximate V as UAX^T, wherein UA is a W×N matrix approximation of V and X^T is a N×W hidden matrix, wherein a particular row of UA represents the first vector representation of the first word.
  - 6. The computer-implemented method of claim 5, wherein learning the first vector representation of the first word further comprises:
    - determining an upper bound on a reconstruction error between the information matrix V and the approximation UAX^T;
      
      determining that the upper bound does not satisfy a threshold value;
      
      increasing the dimensionality N; and
      
      re-applying the singular value decomposition technique to approximate V as UAX^T.
  - 7. The computer-implemented method of claim 1, wherein determining that the first vector representation and the second vector representation satisfy a similarity threshold comprises:
    - determining a similarity metric between the first vector representation and the second vector representation; and
      
      determining that the similarity metric satisfies a threshold value.
  - 8. The computer-implemented method of claim 1, wherein it is determined that the first word is an erroneous output of the speech-to-text engine, and wherein the speech-to-text engine is customized to correctly recognize the first word.
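
The steps of claims 1 and 7 can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: cosine similarity is assumed as the similarity metric, the 0.9 threshold is arbitrary, the toy vectors are hand-made rather than learned, and the helper names `cosine`, `synonym_clusters`, and `snippet_for` are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity, used here as the (assumed) similarity metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def synonym_clusters(vectors, threshold=0.9):
    """Return word pairs whose vectors meet the similarity threshold."""
    return [(w1, w2) for w1, w2 in combinations(sorted(vectors), 2)
            if cosine(vectors[w1], vectors[w2]) >= threshold]


def snippet_for(tokens, pair, radius=2):
    """Join a context window around the first occurrence of each word."""
    windows = []
    for word in pair:
        i = tokens.index(word)
        windows.append(" ".join(tokens[max(0, i - radius):i + radius + 1]))
    return " ... ".join(windows)


# Hand-made toy vectors; real ones would be learned from the transcript.
vectors = {
    "cube": [0.90, 0.10, 0.00],
    "kubectl": [0.88, 0.12, 0.02],
    "cluster": [0.10, 0.90, 0.20],
    "restart": [0.00, 0.20, 0.90],
}
tokens = ("please restart the cube control pod and then check "
          "that the kubectl client can reach the cluster").split()

clusters = synonym_clusters(vectors)
snippets = [snippet_for(tokens, pair) for pair in clusters]
```

With these toy values only "cube" and "kubectl" land in the same cluster, and the generated snippet contains both words with a little context around each, which is the material a reviewer would inspect for a possible recognition error.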

9. A system for automated identification of one or more potential errors in a text output of a speech-to-text engine, the system comprising:
- at least one processor; and
  
  at least one memory storing computer-executable instructions, wherein the at least one processor is configured to access the at least one memory and execute the computer-executable instructions to:
  
  receive the text output of the speech-to-text engine;
  
  determine a first vector representation of a first word in the text output;
  
  determine a second vector representation of a second word in the text output;
  
  determine that the first vector representation and the second vector representation satisfy a similarity threshold;
  
  determine that the first word and the second word form a synonym cluster based at least in part on determining that the first vector representation and the second vector representation satisfy the similarity threshold, wherein the synonym cluster is indicative of a potential error in the text output; and
  
  generate a text snippet from the text output, wherein the text snippet comprises at least the first word and the second word.
  - 10. The system of claim 9, wherein the first word and the second word forming a synonym cluster indicates that respective probabilities that one or more other words are in a first neighborhood of the first word are approximately equal to respective probabilities that the one or more other words are in a second neighborhood of the second word.
  - 11. The system of claim 9, wherein the at least one processor is further configured to execute the computer-executable instructions to learn the first vector representation of the first word, and wherein the at least one processor is configured to learn the first vector representation by executing the computer-executable instructions to:
    - perform matrix multiplication of the first vector representation with a hidden matrix to obtain a matrix product;
      
      determine, for each other word in the text output that is within a co-occurrence window of the first word, a respective backpropagation error between the matrix product and a respective vector encoding representation of the each other word; and
      
      adjust one or more parameters of the first vector representation until each respective backpropagation error satisfies a threshold value.
  - 12. The system of claim 11, wherein the at least one processor is further configured to execute the computer-executable instructions to learn the hidden matrix, wherein the at least one processor is configured to learn the hidden matrix by executing the computer-executable instructions to adjust one or more parameters of the hidden matrix until each respective backpropagation error satisfies a threshold value.
  - 13. The system of claim 9, wherein the at least one processor is further configured to learn the first vector representation of the first word, wherein the at least one processor is configured to learn the first vector representation by executing the computer-executable instructions to:
    - generate an information matrix V having dimensions W×W, wherein W is a size of a vocabulary associated with the speech-to-text engine, and wherein each entry of the information matrix V is a conditional probability that a respective word in the vocabulary appears in a respective neighborhood of another respective word in the vocabulary; and
      
      apply a singular value decomposition technique to approximate V as UAX^T, wherein UA is a W×N matrix approximation of V and X^T is a N×W hidden matrix, wherein a particular row of UA represents the first vector representation of the first word.
  - 14. The system of claim 13, wherein the at least one processor is further configured to learn the first vector representation of the first word by executing the computer-executable instructions to:
    - determine an upper bound on a reconstruction error between the information matrix V and the approximation UAX^T;
      
      determine that the upper bound does not satisfy a threshold value;
      
      increase the dimensionality N; and
      
      re-apply the singular value decomposition technique to approximate V as UAX^T.
  - 15. The system of claim 9, wherein the at least one processor is configured to determine that the first vector representation and the second vector representation satisfy the similarity threshold by executing the computer-executable instructions to:
    - determine a similarity metric between the first vector representation and the second vector representation; and
      
      determine that the similarity metric satisfies a threshold value.
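
The learning procedure of claims 11 and 12 resembles a skip-gram style update, and one possible reading is sketched below in plain Python. Several details are assumptions that the claims do not specify: the matrix product is passed through a softmax, the "vector encoding representation" of a context word is taken to be a one-hot vector, the error is the difference between the two, and training stops after a fixed number of epochs if the error threshold `tol` is never met. All names and hyperparameters are hypothetical.

```python
import math
import random


def train_vectors(tokens, dim=4, window=1, lr=0.1, epochs=200, tol=0.05, seed=0):
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    rng = random.Random(seed)
    # E holds one row vector per word; H is the dim x size hidden matrix.
    E = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    H = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(dim)]
    # (center, context) index pairs inside the co-occurrence window.
    pairs = [(index[tokens[i]], index[tokens[j]])
             for i in range(len(tokens))
             for j in range(max(0, i - window), min(len(tokens), i + window + 1))
             if j != i]
    losses = []
    for _ in range(epochs):
        total, worst = 0.0, 0.0
        for c, o in pairs:
            # Matrix product of the word vector with the hidden matrix.
            scores = [sum(E[c][k] * H[k][j] for k in range(dim)) for j in range(size)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            # Error against the one-hot encoding of the context word.
            err = [p - (1.0 if j == o else 0.0) for j, p in enumerate(probs)]
            total += -math.log(probs[o])
            worst = max(worst, max(abs(e) for e in err))
            grad_e = [sum(H[k][j] * err[j] for j in range(size)) for k in range(dim)]
            for k in range(dim):
                for j in range(size):
                    H[k][j] -= lr * E[c][k] * err[j]   # adjust hidden matrix
                E[c][k] -= lr * grad_e[k]              # adjust word vector
        losses.append(total / len(pairs))
        if worst < tol:  # every error in this epoch met the threshold
            break
    return vocab, E, H, losses


corpus = ("restart the kubectl client then restart the cube control client "
          "then check the kubectl client").split()
vocab, E, H, losses = train_vectors(corpus)
```

Because a word usually has several different neighbors, the per-pair error rarely reaches a tight threshold on real text, which is why the sketch also caps the number of epochs. Each row of `E` then serves as the vector representation of the corresponding word in `vocab`.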

16. A computer program product for automated identification of one or more potential errors in a text output of a speech-to-text engine, the computer program product comprising a storage medium readable by a processing circuit, the storage medium storing instructions executable by the processing circuit to cause the processing circuit to perform a method comprising:
- receiving the text output of the speech-to-text engine;
  
  determining a first vector representation of a first word in the text output;
  
  determining a second vector representation of a second word in the text output;
  
  determining that the first vector representation and the second vector representation satisfy a similarity threshold;
  
  determining that the first word and the second word form a synonym cluster based at least in part on determining that the first vector representation and the second vector representation satisfy the similarity threshold, wherein the synonym cluster is indicative of a potential error in the text output; and
  
  generating a text snippet from the text output, wherein the text snippet comprises at least the first word and the second word.
  - 17. The computer program product of claim 16, wherein the first word and the second word forming a synonym cluster indicates that respective probabilities that one or more other words are in a first neighborhood of the first word are approximately equal to respective probabilities that the one or more other words are in a second neighborhood of the second word.
  - 18. The computer program product of claim 16, the method further comprising learning the first vector representation of the first word, wherein learning the first vector representation comprises:
    - performing matrix multiplication of the first vector representation with a hidden matrix to obtain a matrix product;
      
      determining, for each other word in the text output that is within a co-occurrence window of the first word, a respective backpropagation error between the matrix product and a respective vector encoding representation of the each other word; and
      
      adjusting one or more parameters of the first vector representation until each respective backpropagation error satisfies a threshold value.
  - 19. The computer program product of claim 18, the method further comprising learning the hidden matrix by adjusting one or more parameters of the hidden matrix until each respective backpropagation error satisfies a threshold value.
  - 20. The computer program product of claim 16, the method further comprising learning the first vector representation of the first word, wherein learning the first vector representation comprises:
    - generating an information matrix V having dimensions W×W, wherein W is a size of a vocabulary associated with the speech-to-text engine, and wherein each entry of the information matrix V is a conditional probability that a respective word in the vocabulary appears in a respective neighborhood of another respective word in the vocabulary; and
      
      applying a singular value decomposition technique to approximate V as UAX^T, wherein UA is a W×N matrix approximation of V and X^T is a N×W hidden matrix, wherein a particular row of UA represents the first vector representation of the first word.
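
The SVD-based learning of claim 20, together with the rank-increase loop of claims 6 and 14, might look roughly like the NumPy sketch below. It is an illustration under stated assumptions: neighborhoods are symmetric windows of one word, the exact Frobenius-norm reconstruction error stands in for the claimed upper bound, and the function names, the tolerance, and the toy corpus are hypothetical.

```python
import numpy as np


def cooccurrence_probabilities(tokens, window=1):
    """Build V, where V[i, j] estimates P(word j is a neighbor | word i)."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[index[word], index[tokens[j]]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row; rows without neighbors stay all zero.
    V = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return vocab, V


def svd_embeddings(V, n):
    """Rank-n factors: UA is W x n (word vectors), Xt is n x W (hidden matrix)."""
    U, s, Xt = np.linalg.svd(V, full_matrices=False)
    return U[:, :n] * s[:n], Xt[:n, :]


def embed_until_accurate(V, tol, n=1):
    """Increase the dimensionality n until the reconstruction error is within tol."""
    while True:
        UA, Xt = svd_embeddings(V, n)
        error = np.linalg.norm(V - UA @ Xt)  # Frobenius norm of the residual
        if error <= tol or n >= V.shape[0]:
            return UA, Xt, n, error
        n += 1


corpus = ("restart the kubectl client then restart the cube control client "
          "then check the kubectl client").split()
vocab, V = cooccurrence_probabilities(corpus)
UA, Xt, n, error = embed_until_accurate(V, tol=0.1)
```

Row `i` of `UA` is the vector for `vocab[i]`. Starting from a small `n` and growing it keeps the vectors as compact as the chosen tolerance allows, at the cost of recomputing the factorization on each pass; a production version would more likely compute the decomposition once and pick `n` from the singular values.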

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Ganti, Raghu Kiran; Srirangamsridharan, Shreeranjani; Srivatsa, Mudhakar; Agrawal, Dakshi
Primary Examiner(s): Singh, Satwant K
Application Number: US16/201,447
Publication Number: US 20200168226A1
Time in Patent Office: 714 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G06F 40/20   Natural language analysis s...

G06F 40/279   Recognition of textual enti...

G10L 15/18   using natural language mode...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
