Systems and methods for hands-free voice control and voice search
First Claim
1. A method comprising:
- receiving, in a processor, an acoustic input signal; and
- processing the acoustic input signal with a plurality of acoustic recognition processes to recognize a predetermined target sound within the acoustic input signal, the acoustic input signal being temporally divided into a plurality of frames, the plurality of acoustic recognition processes being implemented using a single Viterbi search of a plurality of states corresponding to acoustic units of an acoustic model of the predetermined target sound, the plurality of states including initial states and final states of the acoustic model, wherein the initial states are reset on each frame if a score for the initial state on a previous frame is below a first threshold, said score is calculated for a plurality of said states on each frame, the calculating comprising increasing the score by a per frame offset so that scores of different durations are comparable, and a result is generated when a score of a final state increases above a second threshold.
Abstract
In one embodiment, the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.
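The relationship between "a plurality of recognition processes started at different time points" and configuring the initial state on each time step can be illustrated with a small sketch. It is not taken from the specification: the two-function layout, the left-to-right model topology (self-loop or advance by one state), and the names `viterbi_from`, `single_pass`, and `ll` (per-frame, per-state log-likelihoods) are all assumptions made for illustration. Under those assumptions, one pass that lets a fresh path enter the initial state on every frame yields the same final-state scores as running a separate recognizer from every possible start frame and taking the best.

```python
NEG_INF = float("-inf")


def viterbi_from(ll, start):
    """Final-state score per frame for ONE recognizer started at frame `start`.

    `ll[t][s]` is a hypothetical log-likelihood of state `s` on frame `t`.
    The model is assumed to be left-to-right: stay in a state or advance by one.
    """
    n_states = len(ll[0])
    prev = [NEG_INF] * n_states
    finals = [NEG_INF] * len(ll)
    for t in range(start, len(ll)):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            if t == start:
                # A recognizer begins in the initial state with score 0.
                best = 0.0 if s == 0 else NEG_INF
            else:
                advance = prev[s - 1] if s > 0 else NEG_INF
                best = max(prev[s], advance)
            cur[s] = best + ll[t][s]
        prev = cur
        finals[t] = cur[-1]
    return finals


def single_pass(ll):
    """One Viterbi pass in which a fresh path may enter state 0 on every frame."""
    n_states = len(ll[0])
    prev = [NEG_INF] * n_states
    finals = []
    for frame in ll:
        cur = []
        for s in range(n_states):
            # For the initial state, the "advance" option is a brand-new start.
            advance = prev[s - 1] if s > 0 else 0.0
            cur.append(max(prev[s], advance) + frame[s])
        prev = cur
        finals.append(cur[-1])
    return finals


# Made-up scores for a two-state model over four frames.
ll = [[-1.0, -5.0], [-2.0, -1.0], [-1.0, -3.0], [-4.0, -0.5]]
many = [max(viterbi_from(ll, s)[t] for s in range(len(ll))) for t in range(len(ll))]
one = single_pass(ll)
```

Here `many` takes the best score over all separately started recognizers, while `one` comes from a single search; the single pass does the same work without keeping one recognizer alive per start time.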
21 Claims
1. A method comprising:
- receiving, in a processor, an acoustic input signal; and
- processing the acoustic input signal with a plurality of acoustic recognition processes to recognize a predetermined target sound within the acoustic input signal, the acoustic input signal being temporally divided into a plurality of frames, the plurality of acoustic recognition processes being implemented using a single Viterbi search of a plurality of states corresponding to acoustic units of an acoustic model of the predetermined target sound, the plurality of states including initial states and final states of the acoustic model, wherein the initial states are reset on each frame if a score for the initial state on a previous frame is below a first threshold, said score is calculated for a plurality of said states on each frame, the calculating comprising increasing the score by a per frame offset so that scores of different durations are comparable, and a result is generated when a score of a final state increases above a second threshold.
Dependent claims: 2, 3, 4, 5, 6, 7
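The three conditions recited in claim 1 (conditional reset of the initial state, a per-frame offset, and a final-state threshold) can be sketched as follows. This is one possible reading of the claim language, not the implementation described in the specification. The function name `spot`, its parameters, the left-to-right topology, and all numeric values are hypothetical and chosen only so the behavior is easy to follow.

```python
NEG_INF = float("-inf")


def spot(frames_ll, offset, reset_threshold, detect_threshold):
    """Return the frame indices at which a detection result is generated.

    frames_ll[t][s]  : assumed log-likelihood of state s on frame t
    offset           : per-frame offset added to every score, so that long and
                       short paths accumulate comparable totals
    reset_threshold  : the "first threshold" (initial-state reset)
    detect_threshold : the "second threshold" (final-state detection)
    """
    n_states = len(frames_ll[0])
    prev = [NEG_INF] * n_states
    prev_final = NEG_INF
    hits = []
    for t, frame in enumerate(frames_ll):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            advance = prev[s - 1] if s > 0 else NEG_INF
            cur[s] = max(prev[s], advance) + frame[s] + offset
        # Reset: if the initial state scored below the first threshold on the
        # previous frame, restart it from a score of zero on this frame.
        if prev[0] < reset_threshold:
            cur[0] = frame[0] + offset
        # Result: generated only when the final-state score rises above the
        # second threshold, not on every frame it stays above it.
        if cur[-1] > detect_threshold and prev_final <= detect_threshold:
            hits.append(t)
        prev, prev_final = cur, cur[-1]
    return hits


# Made-up two-state example: two noise frames, two frames matching state 0,
# two frames matching state 1, then one more noise frame.
noise = [-5.0, -5.0]
first_half = [-1.0, -5.0]
second_half = [-5.0, -1.0]
frames = [noise, noise, first_half, first_half, second_half, second_half, noise]
hits = spot(frames, offset=3.0, reset_threshold=0.0, detect_threshold=5.0)
```

With an offset of 3.0, a well-matched frame in this toy data contributes +2 to a path and a poorly matched frame contributes -2, so a path's total reflects match quality rather than simply how many frames it spans.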
8. A non-transitory computer readable storage medium having stored thereon program code executable by a processor, said program code comprising:
- code that causes the processor to receive an acoustic input signal; and
- code that causes the processor to process the acoustic input signal with a plurality of acoustic recognition processes to recognize a predetermined target sound within the acoustic input signal, the acoustic input signal being temporally divided into a plurality of frames, the plurality of acoustic recognition processes being implemented using a single Viterbi search of a plurality of states corresponding to acoustic units of an acoustic model of the predetermined target sound, the plurality of states including initial states and final states of the acoustic model, wherein the initial states are reset on each frame if a score for the initial state on a previous frame is below a first threshold, said score is calculated for a plurality of said states on each frame, the calculating comprising increasing the score by a per frame offset so that scores of different durations are comparable, and a result is generated when a score of a final state increases above a second threshold.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer system comprising:
- a processor; and
- a non-transitory computer readable medium having stored thereon instructions that, when executed by the processor, cause the processor to:
  - receive an acoustic input signal; and
  - process the acoustic input signal with a plurality of acoustic recognition processes to recognize a predetermined target sound within the acoustic input signal, the acoustic input signal being temporally divided into a plurality of frames, the plurality of acoustic recognition processes being implemented using a single Viterbi search of a plurality of states corresponding to acoustic units of an acoustic model of the predetermined target sound, the plurality of states including initial states and final states of the acoustic model, wherein the initial states are reset on each frame if a score for the initial state on a previous frame is below a first threshold, said score is calculated for a plurality of said states on each frame, the calculating comprising increasing the score by a per frame offset so that scores of different durations are comparable, and a result is generated when a score of a final state increases above a second threshold.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification