Parse information encoding in a finite state transducer

US 8,972,243 B1
Filed: 11/20/2012
Issued: 03/03/2015
Est. Priority Date: 11/20/2012
Status: Active Grant

Abstract

In automatic speech recognition, certain parsing information, such as rules and tags, may be embedded into a finite state transducer (FST) to produce FST output that includes speech recognition results along with codes indicating parsing results of the recognized speech. The codes in the FST output may be formatted using a markup language, such as XML or JSON, for processing by a later application. The FST may be constructed according to a grammar defining the parsing information. The codes for inclusion in the FST output may be embedded into arcs of the FST and then included in the FST output when the speech recognition engine traverses the arcs of the FST.
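The idea in the abstract can be illustrated with a minimal sketch: a toy FST whose arcs carry output labels, some of which are parsing codes rather than recognized words. Everything named here (the `ARCS` table, the `traverse` helper, the `play_music` rule and the `genre` tag) is hypothetical and chosen only for illustration; the patent does not prescribe this representation. The sketch also assumes the input is already a word sequence and that a state with a code-only arc has no other arcs.

```python
# Arcs are (input_word, output_label, next_state). An input of None marks an
# arc that consumes no word and only emits a code. All labels are hypothetical.
ARCS = {
    0: [(None, '<rule id="play_music">', 1)],   # rule beginning
    1: [("play", "play", 2)],
    2: [("jazz", "jazz", 3), ("rock", "rock", 4)],
    3: [(None, "<tag>genre=jazz</tag>", 5)],    # tag
    4: [(None, "<tag>genre=rock</tag>", 5)],    # tag
    5: [(None, "</rule>", 6)],                  # rule end
}
FINAL_STATES = {6}


def traverse(words):
    """Walk the toy FST over `words`; return the output string or None."""
    state, output, pending = 0, [], list(words)
    while True:
        arcs = ARCS.get(state, [])
        code_arcs = [arc for arc in arcs if arc[0] is None]
        if code_arcs:
            # Code-only arc: emit its label without consuming a word.
            _, label, state = code_arcs[0]
            output.append(label)
            continue
        if not pending:
            break
        word = pending.pop(0)
        word_arcs = [arc for arc in arcs if arc[0] == word]
        if not word_arcs:
            return None  # the grammar does not accept this word here
        _, label, state = word_arcs[0]
        output.append(label)
    return " ".join(output) if state in FINAL_STATES else None


print(traverse(["play", "jazz"]))
```

Because the codes sit on the arcs themselves, the output string arrives already marked up, and no separate parsing pass over the recognized words is needed in this sketch.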

21 Claims

1. A method of performing speech recognition, the method comprising:
- creating a first finite state transducer (FST) using a speech recognition grammar, wherein a first arc of the first FST comprises a first semantic identifier and a second arc of the FST comprises a second semantic identifier;
- obtaining a second FST, wherein the second FST is for transducing speech recognition feature vectors to words;
- creating a third FST by composing the first FST and the second FST;
- receiving audio data comprising speech;
- performing speech recognition on the received audio data using the third FST to produce speech recognition results, wherein the speech recognition results comprise the first semantic identifier and the second semantic identifier; and
- processing the speech recognition results with an application, wherein the application processes the first semantic identifier and the second semantic identifier.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1, wherein the application comprises one of a music playing application, a reminder application, a calendar application, or a communication application.
  - 3. The method of claim 1, wherein the first semantic identifier comprises one of a rule beginning, a rule end, or a tag.
  - 4. The method of claim 1, wherein the first semantic identifier comprises notation in Extensible Markup Language or JavaScript Object Notation.
  - 5. The method of claim 1, wherein creating the first FST using the speech recognition grammar comprises:
    - creating a first sub-FST from a first rule and creating a second sub-FST from a second rule; and
    - creating the first FST using the first sub-FST and the second sub-FST.
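The composition step recited in claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the `compose` and `transduce` helpers, the dict-based FST layout, and the `<command>` and `<genre>` labels are all hypothetical, short phone-like symbols stand in for speech recognition feature vectors, and the second machine is assumed to have no epsilon input labels, which avoids the epsilon filtering a general composition needs.

```python
from collections import deque

EPS = ""  # empty label: the arc emits nothing


def compose(a, b):
    """Compose a (x -> y) with b (y -> z) into a single x -> z transducer.

    Each FST is a dict with 'start', 'finals' (a set) and
    'arcs': {state: [(in_label, out_label, next_state), ...]}.
    Assumes b has no epsilon input labels.
    """
    start = (a["start"], b["start"])
    arcs, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        if qa in a["finals"] and qb in b["finals"]:
            finals.add((qa, qb))
        out = arcs.setdefault((qa, qb), [])
        for a_in, a_out, a_next in a["arcs"].get(qa, []):
            if a_out == EPS:
                moves = [(EPS, qb)]  # a emits nothing, so b stays in place
            else:
                moves = [(b_out, b_next)
                         for b_in, b_out, b_next in b["arcs"].get(qb, [])
                         if b_in == a_out]
            for b_out, b_next in moves:
                nxt = (a_next, b_next)
                out.append((a_in, b_out, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}


def transduce(fst, inputs):
    """Follow the first matching arc per symbol (the toy FSTs are deterministic)."""
    state, out = fst["start"], []
    for sym in inputs:
        matches = [(o, n) for i, o, n in fst["arcs"].get(state, []) if i == sym]
        if not matches:
            return None
        label, state = matches[0]
        if label != EPS:
            out.append(label)
    return " ".join(out) if state in fst["finals"] else None


# Stand-in for the "second FST": phone-like symbols to words.
UNITS_TO_WORDS = {"start": 0, "finals": {0}, "arcs": {
    0: [("p", EPS, 1), ("j", EPS, 3)],
    1: [("l", EPS, 2)], 2: [("ey", "play", 0)],
    3: [("ae", EPS, 4)], 4: [("z", "jazz", 0)],
}}
# Stand-in for the "first FST": grammar arcs carrying semantic identifiers.
GRAMMAR = {"start": 0, "finals": {2}, "arcs": {
    0: [("play", "<command>play", 1)],
    1: [("jazz", "<genre>jazz</genre></command>", 2)],
}}

combined = compose(UNITS_TO_WORDS, GRAMMAR)
print(transduce(combined, ["p", "l", "ey", "j", "ae", "z"]))
```

In this sketch the semantic identifiers survive composition because they live on the output side of the grammar arcs, so the combined machine emits them directly from the low-level input symbols.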

6. A method, comprising:
- receiving audio data comprising speech;
- obtaining a speech recognition finite state transducer (FST), wherein a first arc of the speech recognition FST comprises text and a first semantic identifier and a second arc of the speech recognition FST comprises a second semantic identifier;
- performing speech recognition on the received audio data using the speech recognition FST to produce speech recognition results output from the speech recognition FST; and
- wherein the speech recognition results comprise an output string, the output string including the text, the first semantic identifier and the second semantic identifier.
- Dependent claims (7, 8, 9, 10, 11, 12, 13):
  - 7. The method of claim 6, wherein the first semantic identifier corresponds to hierarchical designation information corresponding to the speech.
  - 8. The method of claim 6, further comprising processing the speech recognition results by an application, wherein the application processes the first semantic identifier and the second semantic identifier.
  - 9. The method of claim 6, wherein the first semantic identifier comprises notation in Extensible Markup Language or JavaScript Object Notation.
  - 10. The method of claim 6, wherein the first semantic identifier comprises one of a rule beginning, a rule ending, or a tag.
  - 11. The method of claim 6, wherein the speech recognition results comprise a top-N list of output string hypotheses.
  - 12. The method of claim 6, wherein the FST was created by composing a first FST for transducing feature vectors to hidden Markov model states, a second FST for transducing the hidden Markov model states to speech units in context, a third FST for transducing the speech units in context to words, and a fourth FST representing a grammar.
  - 13. The method of claim 6, wherein performing speech recognition comprises dynamically composing the FST with a second FST.

14. A computing device, comprising:
- at least one processor;
- a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the processor:
  - to receive audio data comprising speech;
  - to obtain a speech recognition finite state transducer (FST), wherein a first arc of the speech recognition FST comprises text and a first semantic identifier and a second arc of the speech recognition FST comprises a second semantic identifier;
  - to perform speech recognition on the received audio data using the speech recognition FST to produce speech recognition results output from the speech recognition FST; and
  - wherein the speech recognition results comprise an output string, the output string including the text, the first semantic identifier and the second semantic identifier.
- Dependent claims (15, 16, 17, 18, 19, 20, 21):
  - 15. The computing device of claim 14, wherein the first semantic identifier corresponds to hierarchical designation information corresponding to the speech.
  - 16. The computing device of claim 14, wherein the processor is further configured to process the speech recognition results by an application, wherein the application processes the first semantic identifier and the second semantic identifier.
  - 17. The computing device of claim 14, wherein the first semantic identifier comprises notation in Extensible Markup Language or JavaScript Object Notation.
  - 18. The computing device of claim 14, wherein the first semantic identifier comprises one of a rule beginning, a rule ending, or a tag.
  - 19. The computing device of claim 14, wherein the speech recognition results comprise a top-N list of hypotheses.
  - 20. The computing device of claim 14, wherein the FST was created by composing a first FST for transducing feature vectors to hidden Markov model states, a second FST for transducing the hidden Markov model states to speech units in context, a third FST for transducing the speech units in context to words, and a fourth FST representing a grammar.
  - 21. The computing device of claim 14, wherein the processor configured to process speech recognition comprises the processor configured to dynamically compose the FST with a second FST.
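Several claims describe an application consuming a top-N list of output strings whose semantic identifiers use XML notation. A minimal sketch of such a consumer might look like the following. The `interpret` function, the `NBEST` data, the `rule` and `tag` element names, the `key=value` tag convention, and the assumption that a higher score is better are all hypothetical and are not taken from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical top-N list: (score, output string), higher score assumed better.
NBEST = [
    (-14.1, '<rule id="play_music">play rock<tag>genre=rock</tag></rule>'),
    (-12.5, '<rule id="play_music">play jazz<tag>genre=jazz</tag></rule>'),
]


def interpret(nbest):
    """Return (rule id, slot dict) for the best hypothesis that parses as XML."""
    for _score, output in sorted(nbest, key=lambda hyp: hyp[0], reverse=True):
        try:
            root = ET.fromstring(output)
        except ET.ParseError:
            continue  # skip malformed hypotheses and try the next best one
        slots = {}
        for tag in root.iter("tag"):
            # Assumed convention: each tag body is a single key=value pair.
            if tag.text and "=" in tag.text:
                key, value = tag.text.split("=", 1)
                slots[key] = value
        return root.get("id"), slots
    return None


print(interpret(NBEST))
```

Since the markup is already present in the recognizer output, the application in this sketch only needs a standard XML parser to recover the rule and its slots.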

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Strom, Nikko; Ramakrishnan, Karthik
Primary Examiner(s): Godbold, Douglas
Application Number: US13/681,503
Time in Patent Office: 833 Days
Field of Search: 704/1, 704/9, 704/10, 704/275, 704/275.231
US Class Current: 704/9
CPC Class Codes:
G10L 15/1815 Semantic context, e.g. disa...
G10L 15/193 Formal grammars, e.g. finit...
