Method and system of automatic speech recognition with dynamic vocabularies
Abstract
A system, article, and method of automatic speech recognition with dynamic vocabularies is described herein.
25 Claims
1. A computer-implemented method of automatic speech recognition, comprising:
- obtaining, via at least one acoustic signal receiving unit, audio data including human speech;
- generating, via a decoder, a static vocabulary weighted finite state transducer (WFST) having nodes connected by arcs to propagate at least one token through the static vocabulary WFST and at least one dynamic vocabulary trigger marker at at least one of the arcs;
- propagating, via the decoder, a token through at least one dynamic vocabulary WFST upon the at least one token reaching the trigger marker;
- propagating, via the decoder, a token through at least one grammar WFST having at least one dynamic vocabulary class marker that indicates a type of dynamic vocabulary and is associated with the dynamic vocabulary of at least one of the dynamic vocabulary WFSTs with a propagating token;
- providing, via the decoder, a hypothetical word or phrase based at least in part on the obtained human speech and depending, at least in part, on the WFSTs and comprising terms in the static vocabulary, dynamic vocabulary, or both vocabularies;
- determining, via an interpretation engine, user intent based at least in part on output from the decoder based at least in part on the hypothetical word or phrase; and
- initiating, via the interpretation engine, a response or action based at least in part on the determined user intent, the initiated response or action being implemented via speech output from a speaker component, via visual output from display component, and/or via other action from one or more end devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
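The token propagation recited in claim 1 can be illustrated with a minimal sketch. It is not the patented decoder: it assumes the input is already a sequence of discrete labels (no acoustic scoring), uses a simple best-cost-per-position rule, and every name in it (`Wfst`, `Token`, `decode`, the `$CONTACT` trigger marker, the sample arcs and weights) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# An arc is a tuple: (input_label, output_word, weight, next_state).

@dataclass(frozen=True)
class Wfst:
    arcs: dict          # state -> list of outgoing arcs
    start: int
    finals: frozenset   # set of final states

@dataclass(frozen=True)
class Token:
    state: int              # static-WFST state (the return state while inside a dynamic WFST)
    sub: Optional[tuple]    # (trigger marker, state inside that dynamic WFST) or None
    cost: float
    words: tuple

def _keep_best(best, tok):
    """Keep only the cheapest token per position; report whether tok was kept."""
    key = (tok.state, tok.sub)
    if key not in best or tok.cost < best[key].cost:
        best[key] = tok
        return True
    return False

def closure(static, dynamic, tokens):
    """Follow trigger-marker arcs into dynamic WFSTs and return from their final states."""
    best, work = {}, []
    for t in tokens:
        if _keep_best(best, t):
            work.append(t)
    while work:
        t = work.pop()
        if t.sub is None:
            for inp, _out, w, nxt in static.arcs.get(t.state, []):
                if inp in dynamic:  # the arc carries a trigger marker
                    new = Token(nxt, (inp, dynamic[inp].start), t.cost + w, t.words)
                    if _keep_best(best, new):
                        work.append(new)
        else:
            name, s = t.sub
            if s in dynamic[name].finals:  # leave the dynamic WFST
                new = Token(t.state, None, t.cost, t.words)
                if _keep_best(best, new):
                    work.append(new)
    return list(best.values())

def advance(static, dynamic, tokens, label):
    """Consume one input label in whichever WFST each token currently occupies."""
    best = {}
    for t in tokens:
        wfst, s = (static, t.state) if t.sub is None else (dynamic[t.sub[0]], t.sub[1])
        for inp, out, w, nxt in wfst.arcs.get(s, []):
            if inp != label:
                continue
            words = t.words + ((out,) if out else ())
            if t.sub is None:
                _keep_best(best, Token(nxt, None, t.cost + w, words))
            else:
                _keep_best(best, Token(t.state, (t.sub[0], nxt), t.cost + w, words))
    return list(best.values())

def decode(static, dynamic, labels):
    """Return the lowest-cost word sequence ending in a static final state, or None."""
    tokens = closure(static, dynamic, [Token(static.start, None, 0.0, ())])
    for label in labels:
        tokens = closure(static, dynamic, advance(static, dynamic, tokens, label))
    done = [t for t in tokens if t.sub is None and t.state in static.finals]
    return min(done, key=lambda t: t.cost).words if done else None

# Hypothetical static vocabulary: "call" followed by "home" or by a contact name.
STATIC = Wfst(
    arcs={0: [("call", "call", 1.0, 1)],
          1: [("home", "home", 0.5, 2), ("$CONTACT", "", 0.0, 2)]},
    start=0, finals=frozenset({2}))

# Hypothetical dynamic vocabulary, built separately from the static WFST.
DYNAMIC = {"$CONTACT": Wfst(
    arcs={0: [("anna", "Anna", 0.2, 1), ("bob", "Bob", 0.3, 1)]},
    start=0, finals=frozenset({1}))}

print(decode(STATIC, DYNAMIC, ["call", "anna"]))
```

In this sketch the contact list lives only in `DYNAMIC`, so replacing that entry changes which names can be recognized without rebuilding `STATIC`. That is one plausible reading of why the trigger marker is kept on an arc of the static WFST.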
16. A computer-implemented system of automatic speech recognition comprising:
- at least one acoustic signal receiving unit to obtain audio data including human speech;
- at least one processor communicatively connected to the acoustic signal receiving unit;
- at least one memory communicatively coupled to the at least one processor; and
- a WFST decoder operated by the at least one processor and to;
  - generate a static vocabulary weighted finite state transducer (WFST) having nodes connected by arcs to propagate at least one token through the static vocabulary WFST and at least one dynamic vocabulary trigger marker at at least one of the arcs;
  - propagate a token through at least one dynamic vocabulary WFST upon the at least one token reaching the trigger marker;
  - propagate a token through at least one grammar WFST having at least one dynamic vocabulary class marker that indicates a type of dynamic vocabulary and is associated with the dynamic vocabulary of at least one of the dynamic vocabulary WFSTs with a propagating token;
  - provide a hypothetical word or phrase based at least in part on the obtained human speech and depending, at least in part, on the WFSTs and comprising terms in the static vocabulary, dynamic vocabulary, or both vocabularies; and
- an interpretation engine to;
  - determine user intent based at least in part on output from the decoder based at least in part on the hypothetical word or phrase; and
  - initiate a response or action based at least in part on the determined user intent, the initiated response or action being implemented via speech output from a speaker component, via visual output from display component, and/or via other action from one or more end devices.
Dependent claims: 17, 18, 19, 20, 21, 22, 23
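The hand-off from the decoder to the interpretation engine in claim 16 can be sketched as follows. This is a simplified stand-in, not the claimed implementation: the grammar WFST is flattened into a list of fixed-length paths, class markers match single words only, and all names (`DYNAMIC_VOCAB`, `GRAMMAR`, `interpret`, `respond`, the `<CONTACT>` and `<SONG>` markers, the intent labels) are hypothetical.

```python
# Hypothetical dynamic vocabularies, keyed by the class marker that names their type.
DYNAMIC_VOCAB = {
    "<CONTACT>": {"Anna", "Bob"},
    "<SONG>": {"Yesterday"},
}

# Hypothetical grammar, flattened to paths of literal words and class markers,
# each paired with an intent label (an assumption made for this sketch).
GRAMMAR = [
    (("call", "<CONTACT>"), "place_call"),
    (("play", "<SONG>"), "play_music"),
]

def interpret(hypothesis):
    """Return (intent, slots) for the first grammar path the hypothesis matches, else None."""
    for pattern, intent in GRAMMAR:
        if len(pattern) != len(hypothesis):
            continue
        slots = {}
        for symbol, word in zip(pattern, hypothesis):
            if symbol in DYNAMIC_VOCAB:
                # A class marker accepts only words from its own dynamic vocabulary.
                if word not in DYNAMIC_VOCAB[symbol]:
                    break
                slots[symbol] = word
            elif symbol != word:
                break
        else:
            return intent, slots
    return None

def respond(hypothesis):
    """Turn a decoder hypothesis into a text response (stand-in for speech or display output)."""
    result = interpret(hypothesis)
    if result is None:
        return "Sorry, I did not understand that."
    intent, slots = result
    if intent == "place_call":
        return "Calling " + slots["<CONTACT>"]
    return "Playing " + slots["<SONG>"]

print(respond(("call", "Anna")))
```

Because the class marker records which dynamic vocabulary supplied a word, the same word sequence shape ("call X", "play X") can be routed to different intents depending on the type of X, which is the role the claim assigns to the class marker in this simplified reading.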
24. At least one non-transitory computer readable medium comprising a plurality of instructions that in response to being executed on a computing device, causes the computing device perform automatic speech recognition, comprising:
- obtain audio data including human speech;
- generate, via a decoder, a static vocabulary weighted finite state transducer (WFST) having nodes connected by arcs to propagate at least one token through the static vocabulary WFST and at least one dynamic vocabulary trigger marker at at least one of the arcs;
- propagate, via the decoder, a token through at least one dynamic vocabulary WFST upon the at least one token reaching the trigger marker;
- propagate, via the decoder, a token through at least one grammar WFST having at least one dynamic vocabulary class marker that indicates a type of dynamic vocabulary and is associated with the dynamic vocabulary of at least one of the dynamic vocabulary WFSTs with a propagating token; and
- provide, via the decoder, a hypothetical word or phrase based at least in part on the obtained human speech and depending, at least in part, on the WFSTs and comprising terms in the static vocabulary, dynamic vocabulary, or both vocabularies;
- determine, via an interpretation engine, user intent based at least in part on output from the decoder based at least in part on the hypothetical word or phrase; and
- initiate, via the interpretation engine, a response or action based at least in part on the determined user intent, the initiated response or action being implemented via speech output from a speaker component, via visual output from display component, and/or via other action from one or more end devices.
Dependent claims: 25
Specification