SYSTEM AND METHOD FOR OPEN SPEECH RECOGNITION

US 20120084086A1
Filed: 09/30/2010
Published: 04/05/2012
Est. Priority Date: 09/30/2010
Status: Active Grant

Abstract

Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination.
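The flow summarized in the abstract can be illustrated with a minimal sketch. Everything named here (`Segment`, `select_candidates`, `generate_text`, the sample domains and scores) is hypothetical and not taken from the patent. The sketch assumes the recognizer outputs are already aligned to the same segment boundaries, and it stands in a plain highest-confidence pick plus concatenation for the machine-learning combination the claims recite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One recognized segment and its confidence score (hypothetical type)."""
    text: str
    confidence: float


def select_candidates(outputs: dict[str, list[Segment]]) -> list[tuple[str, Segment]]:
    """For each segment position, keep the highest-confidence segment.

    Assumes every recognizer output has the same number of segments,
    aligned to the same boundaries. Ties go to the first domain listed.
    """
    if not outputs:
        return []
    lengths = {len(segments) for segments in outputs.values()}
    if len(lengths) != 1:
        raise ValueError("recognizer outputs must have the same number of segments")
    selected = []
    for i in range(lengths.pop()):
        domain = max(outputs, key=lambda d: outputs[d][i].confidence)
        selected.append((domain, outputs[domain][i]))
    return selected


def generate_text(selected: list[tuple[str, Segment]]) -> str:
    """Concatenate the selected segments (a stand-in for a learned combiner)."""
    return " ".join(segment.text for _, segment in selected)


# Hypothetical outputs from two domain-specific recognizers for one utterance.
outputs = {
    "travel": [Segment("book a flight", 0.91), Segment("and check my ballads", 0.40)],
    "banking": [Segment("look a fright", 0.35), Segment("and check my balance", 0.88)],
}
selected = select_candidates(outputs)
text = generate_text(selected)
print(text)
```

Note that the selection never consults the domain of the utterance, only the per-segment scores, which is the property the abstract emphasizes.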

20 Claims

1. A method of speech recognition, the method comprising:
- recognizing received speech with a plurality of domain-specific speech recognizers comprising at least two domain-specific speech recognizers from different domains, to yield respective speech recognition outputs;
  
  determining at least one speech recognition confidence score for the respective speech recognition outputs, wherein each of the at least one speech recognition confidence score corresponds to a different segment of the respective speech recognition outputs;
  
  selecting speech recognition candidates from segments of the speech recognition outputs based on the at least one speech recognition confidence score for the respective speech recognition outputs;
  
  combining, via a machine-learning algorithm, the speech recognition candidates to yield a combination of the speech recognition candidates; and
  
  generating text based on the combination.
  - 2. The method of claim 1, wherein the speech recognition candidates are selected without knowledge of the domain of the received speech.
  - 3. The method of claim 1, wherein at least one of the different domains comprises travel, banking, or business.
  - 4. The method of claim 1, wherein the machine-learning algorithm comprises a mixture of experts from different domains, wherein said mixture of experts comprises at least one of the following:
    - local business search, web search, SMS, question/answering, video search, broadcast news, and voicemail to text.
  - 5. The method of claim 4, wherein selecting the speech recognition candidates further comprises comparing experts in the mixture of experts to select the best speech recognition candidates.
  - 6. The method of claim 1, wherein selecting the speech recognition candidates further comprises selecting a speech recognition candidate having a highest confidence score.
  - 7. The method of claim 1, wherein selecting the speech recognition candidates further comprises:
    - dividing the received speech into substrings; and
      
      selecting a best speech recognition candidate for each substring.
  - 8. The method of claim 1, further comprising mixing the speech recognition candidates.
  - 9. The method of claim 1, further comprising creating a mesh of the speech recognition candidates.
  - 10. The method of claim 1, wherein a speech recognition candidate comprises at least one of a lattice, confidence scores, and speech recognition metadata.
  - 11. The method of claim 1, further comprising:
    - collecting usage statistics based on the speech recognition candidates; and
      
      training parameters associated with the collection of domain-specific speech recognizers based on the usage statistics.
  - 12. The method of claim 1, further comprising:
    - collecting usage statistics based on the speech recognition candidates; and
      
      training the machine-learning algorithm based on the usage statistics.
  - 13. The method of claim 12, wherein training parameters are based on at least one of a lattice combination and a neural network graph that learns from an edit distance between the speech recognition candidates and a correct recognition candidate.
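Claim 13 refers to an edit distance between the speech recognition candidates and a correct recognition candidate. The claims do not say how that distance is computed, so the following is only a sketch under an assumption: a word-level Levenshtein distance, with `word_edit_distance` as a hypothetical helper name. Such a value could serve as a training signal, but how the patented system uses it is not specified here.

```python
def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Word-level Levenshtein distance between two transcripts.

    Counts the minimum number of word insertions, deletions, and
    substitutions needed to turn the hypothesis into the reference.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] holds the distance between the hypothesis words seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution_cost = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,                      # delete a hypothesis word
                curr[j - 1] + 1,                  # insert a reference word
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical candidates scored against a hypothetical correct transcript.
reference = "check my balance"
candidates = {"travel": "check my ballads", "banking": "check my balance"}
distances = {domain: word_edit_distance(hyp, reference)
             for domain, hyp in candidates.items()}
print(distances)
```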

14. A system for open domain speech recognition, the system comprising:
- a processor;
  
  a first module configured to control the processor to recognize received speech with a plurality of domain-specific speech recognizers comprising at least two domain-specific speech recognizers from different domains, to yield respective speech recognition outputs;
  
  a second module configured to control the processor to determine at least one speech recognition confidence score for the respective speech recognition outputs, wherein each of the at least one speech recognition confidence score corresponds to a different segment of the respective speech recognition outputs;
  
  a third module configured to control the processor to select speech recognition candidates from segments of the speech recognition outputs based on the at least one speech recognition confidence score for the respective speech recognition outputs;
  
  a fourth module configured to control the processor to combine, via a machine-learning algorithm, the speech recognition candidates to yield a combination of the speech recognition candidates;
  
  a fifth module configured to control the processor to generate text based on the combination;
  
  a sixth module configured to control the processor to collect usage statistics based on the speech recognition candidates and train parameters associated with the plurality of domain-specific speech recognizers based on the usage statistics; and
  
  a seventh module configured to control the processor to collect usage statistics based on the speech recognition candidates and train the machine-learning algorithm based on the usage statistics.
  - 15. The system of claim 14, wherein the speech recognition candidates are selected without knowledge of the domain of the received speech.
  - 16. The system of claim 14, wherein training parameters for the machine-learning algorithm are based on at least one of a lattice combination and a neural network graph that learns from an edit distance between the speech recognition candidates and a correct recognition candidate.
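The sixth and seventh modules of claim 14 collect usage statistics and train on them, but the claims leave open which statistics are gathered and how training proceeds. As one hypothetical reading only, the sketch below counts how often each domain's segment was selected and turns those counts into per-domain weights with additive smoothing. The names `collect_usage` and `domain_weights`, the `smoothing` parameter, and the sample history are all assumptions made for illustration.

```python
from collections import Counter


def collect_usage(selection_history: list[list[str]]) -> Counter:
    """Count how often each domain's segment was selected across utterances."""
    counts = Counter()
    for selected_domains in selection_history:
        counts.update(selected_domains)
    return counts


def domain_weights(counts: Counter, domains: list[str],
                   smoothing: float = 1.0) -> dict[str, float]:
    """Turn selection counts into weights that sum to 1.

    Additive smoothing keeps a domain that has not been selected yet
    from receiving a weight of exactly zero.
    """
    total = sum(counts[d] + smoothing for d in domains)
    return {d: (counts[d] + smoothing) / total for d in domains}


# Hypothetical history: the winning domain for each segment of three utterances.
history = [["travel", "banking"], ["banking", "banking"], ["travel"]]
counts = collect_usage(history)
weights = domain_weights(counts, ["travel", "banking", "web search"])
print(weights)
```

Weights of this kind could, for example, break ties between recognizers with similar confidence scores. That use is a design choice of the sketch, not something the claims state.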

17. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to perform automatic speech recognition, the instructions comprising:
- recognizing received speech with a plurality of domain-specific speech recognizers comprising at least two domain-specific speech recognizers from different domains, to yield respective speech recognition outputs;
  
  determining at least one speech recognition confidence score for the respective speech recognition outputs, wherein each of the at least one speech recognition confidence score corresponds to a different segment of the respective speech recognition outputs;
  
  selecting speech recognition candidates from segments of the speech recognition outputs based on the at least one speech recognition confidence score for the respective speech recognition outputs;
  
  combining, via a machine-learning algorithm, the speech recognition candidates to yield a combination of the speech recognition candidates; and
  
  generating text based on the combination.
  - 18. The instructions of claim 17, wherein the plurality of domain-specific speech recognizers comprises at least two speech recognizers from different domains.
  - 19. The instructions of claim 17, wherein the speech recognition candidates are selected without knowledge of the domain of the received speech.
  - 20. The instructions of claim 17, wherein training parameters for the machine-learning algorithm are based on at least one of a lattice combination and a neural network graph that learns from an edit distance between the speech recognition candidates and a correct recognition candidate.

Current Assignee: Interactions, LLC
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Gilbert, Mazin; Bangalore, Srinivas; Haffner, Patrick; Bell, Robert

Granted Patent: US 8,812,321 B2
US Class Current: 704/235
CPC Class Codes:
- G10L 15/063   Training
- G10L 15/26   Speech to text systems G10L...
- G10L 15/32   Multiple recognisers used i...
- G10L 2015/0638   Interactive procedures
