Parse information encoding in a finite state transducer
Abstract
In automatic speech recognition, certain parsing information, such as rules and tags, may be embedded into a finite state transducer (FST) to produce FST output that includes speech recognition results along with codes indicating parsing results of the recognized speech. The codes in the FST output may be formatted using a markup language, such as XML or JSON, for processing by a later application. The FST may be constructed according to a grammar defining the parsing information. The codes for inclusion in the FST output may be embedded into arcs of the FST and then included in the FST output when the speech recognition engine traverses the arcs of the FST.
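The mechanism described in the abstract, codes stored on arcs and emitted when those arcs are traversed, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the patent's implementation: the `Arc` type, the `transduce` function, the toy "play <artist>" grammar, and the XML tag names are all hypothetical, and the search assumes the FST has no cycles of empty-input arcs.

```python
from typing import NamedTuple, Optional, Sequence


class Arc(NamedTuple):
    src: int   # source state
    word: str  # input label; "" consumes no input (epsilon)
    out: str   # output label emitted on traversal: text, a markup code, or both
    dst: int   # destination state


# Hypothetical grammar "play <artist>", with markup codes embedded in the arcs.
ARCS = [
    Arc(0, "", "<command>", 1),
    Arc(1, "play", "<action>play</action>", 2),
    Arc(2, "", "<artist>", 3),
    Arc(3, "mozart", "mozart", 4),
    Arc(3, "bach", "bach", 4),
    Arc(4, "", "</artist></command>", 5),
]
FINAL_STATES = {5}


def transduce(words: Sequence[str]) -> Optional[str]:
    """Return the output string of the first accepting path, or None."""

    def walk(state: int, pos: int, out: list) -> Optional[list]:
        if pos == len(words) and state in FINAL_STATES:
            return out
        for arc in ARCS:
            if arc.src != state:
                continue
            if arc.word == "":
                # Epsilon arc: emit its output without consuming a word.
                found = walk(arc.dst, pos, out + [arc.out])
            elif pos < len(words) and arc.word == words[pos]:
                # Matching arc: consume one word and emit the arc's output.
                found = walk(arc.dst, pos + 1, out + [arc.out])
            else:
                continue
            if found is not None:
                return found
        return None

    path = walk(0, 0, [])
    return None if path is None else "".join(path)


print(transduce(["play", "bach"]))
```

In this sketch the recognized words and the parse markup come out of the same traversal, so a later application can read the tagged string directly instead of re-parsing plain text.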
21 Claims
1. A method of performing speech recognition, the method comprising:
creating a first finite state transducer (FST) using a speech recognition grammar, wherein a first arc of the first FST comprises a first semantic identifier and a second arc of the FST comprises a second semantic identifier;
obtaining a second FST, wherein the second FST is for transducing speech recognition feature vectors to words;
creating a third FST by composing the first FST and the second FST;
receiving audio data comprising speech;
performing speech recognition on the received audio data using the third FST to produce speech recognition results, wherein the speech recognition results comprise the first semantic identifier and the second semantic identifier; and
processing the speech recognition results with an application, wherein the application processes the first semantic identifier and the second semantic identifier.
(Dependent claims: 2, 3, 4, 5)
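The composition step recited in claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: it handles only FSTs without empty (epsilon) labels, it stands in for feature vectors with one hypothetical symbol per word (`u_play`, `u_bach`, and so on), and the dictionary layout and the `compose` and `run` helpers are invented for the example.

```python
from collections import deque

# An FST here is a dict with "start", "finals", and "arcs" = [(src, inp, out, dst)].

# Stand-in for the second FST: maps hypothetical acoustic symbols to words.
ACOUSTIC_TO_WORDS = {
    "start": 0,
    "finals": {0},
    "arcs": [
        (0, "u_play", "play", 0),
        (0, "u_bach", "bach", 0),
        (0, "u_mozart", "mozart", 0),
        (0, "u_stop", "stop", 0),
    ],
}

# Stand-in for the first FST: a grammar whose arcs carry semantic identifiers.
GRAMMAR = {
    "start": 0,
    "finals": {2},
    "arcs": [
        (0, "play", "<action>play</action>", 1),
        (1, "bach", "<artist>bach</artist>", 2),
        (1, "mozart", "<artist>mozart</artist>", 2),
    ],
}


def compose(a, b):
    """Epsilon-free composition: a maps x to y, b maps y to z, result maps x to z."""
    start = (a["start"], b["start"])
    arcs, finals = [], set()
    seen, queue = {start}, deque([start])
    while queue:
        qa, qb = queue.popleft()
        if qa in a["finals"] and qb in b["finals"]:
            finals.add((qa, qb))
        for src_a, x, y, dst_a in a["arcs"]:
            if src_a != qa:
                continue
            for src_b, y_in, z, dst_b in b["arcs"]:
                # Pair arcs whose middle label (the word) matches.
                if src_b != qb or y_in != y:
                    continue
                nxt = (dst_a, dst_b)
                arcs.append(((qa, qb), x, z, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}


def run(fst, symbols):
    """Follow the first matching arc per symbol; return outputs or None."""
    state, out = fst["start"], []
    for sym in symbols:
        for src, inp, o, dst in fst["arcs"]:
            if src == state and inp == sym:
                state = dst
                out.append(o)
                break
        else:
            return None
    return out if state in fst["finals"] else None


THIRD = compose(ACOUSTIC_TO_WORDS, GRAMMAR)
print(run(THIRD, ["u_play", "u_bach"]))
```

The composed FST reads the acoustic-level symbols and emits the grammar's tagged output directly, and it keeps only the paths the grammar allows, so `u_stop` is rejected even though the second FST alone would map it to a word.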
6. A method, comprising:
receiving audio data comprising speech;
obtaining a speech recognition finite state transducer (FST), wherein a first arc of the speech recognition FST comprises text and a first semantic identifier and a second arc of the speech recognition FST comprises a second semantic identifier;
performing speech recognition on the received audio data using the speech recognition FST to produce speech recognition results output from the speech recognition FST; and
wherein the speech recognition results comprise an output string, the output string including the text, the first semantic identifier and the second semantic identifier.
(Dependent claims: 7, 8, 9, 10, 11, 12, 13)
14. A computing device, comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the processor:
to receive audio data comprising speech;
to obtain a speech recognition finite state transducer (FST), wherein a first arc of the speech recognition FST comprises text and a first semantic identifier and a second arc of the speech recognition FST comprises a second semantic identifier;
to perform speech recognition on the received audio data using the speech recognition FST to produce speech recognition results output from the speech recognition FST; and
wherein the speech recognition results comprise an output string, the output string including the text, the first semantic identifier and the second semantic identifier.
(Dependent claims: 15, 16, 17, 18, 19, 20, 21)
Specification