Voice activation
Abstract
A circuit and a method are given to realize a very flexible voice activation system using a modular building block approach. The system is adaptively tailored to handle the relevant, case-specific operational characteristics that describe most of the acoustically differing environments found in the field of speech recognition. Included are determinations of “Noise estimation” and “Speech estimation” values, done effectively without Fast Fourier Transform (FFT) methods or zero crossing algorithms, solely by analyzing the modulation properties of the human voice. Said circuit and method are designed to be implemented with a very economical number of components and can be realized with modern integrated circuit technologies.
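The abstract does not state how the noise and speech values are estimated, only that neither an FFT nor zero crossing counting is used. As a rough illustration of what a purely time-domain estimate might look like, the following sketch tracks frame energy with two one-pole followers. The function names, the coefficients and the rise/fall scheme are all assumptions made for this example and are not taken from the patent.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in samples) / len(samples)


def track(previous, energy, rise, fall):
    """One-pole follower with separate rise and fall coefficients (0..1)."""
    coeff = rise if energy > previous else fall
    return previous + coeff * (energy - previous)


def estimate(frames, noise=0.0, speech=0.0):
    """Return (noise, speech) energy estimates after all frames.

    Assumed scheme, not the patented one:
    - noise rises slowly and falls quickly, so it follows the energy floor;
    - speech rises quickly and falls slowly, so it follows the energy peaks
      produced by the syllable-rate modulation of a voice.
    """
    for frame in frames:
        e = frame_energy(frame)
        noise = track(noise, e, rise=0.01, fall=0.5)
        speech = track(speech, e, rise=0.5, fall=0.05)
    return noise, speech
```

Feeding alternating quiet and loud frames leaves the noise estimate near the quiet level while the speech estimate stays near the loud level, which is the kind of separation a later comparator stage could use.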
54 Claims
1. A system for a tailorable and adaptable implementation of a voice activation function capable of a practical application of multiple voice activation algorithms, receiving an audio input signal and furnishing a trigger impulse as output signal, comprising:
an analog audio signal pick-up sensor;
an analog/digital converting means digitizing said audio signal and thus transforming said audio signal into a digital signal, then named ‘Digital Audio Input Signal’;
a modular assembly of multiple voice activation algorithm specific circuits made up of building block modules containing processing means for amplitude and energy values of said ‘Digital Audio Input Signal’ as well as and especially for Noise and Speech estimation calculations, intermediate storing means, comparing means, connecting means and means for selecting and operating said voice activation algorithms; and
a means generating said trigger impulse.
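Claim 1 names an analog/digital converting means without fixing its resolution or input range. The sketch below shows one possible way to produce a 'Digital Audio Input Signal' from normalized analog samples. The 16-bit default, the [-1.0, 1.0] input range and the name `digitize` are assumptions chosen for illustration only.

```python
def digitize(analog_samples, bits=16):
    """Quantize analog samples in [-1.0, 1.0] to signed integers.

    Out-of-range inputs are clipped, as a saturating converter would do.
    Resolution and range are assumptions; the claim does not specify them.
    """
    full_scale = 2 ** (bits - 1) - 1
    digital = []
    for x in analog_samples:
        x = max(-1.0, min(1.0, x))          # clip to the converter range
        digital.append(round(x * full_scale))
    return digital
```

The resulting integer amplitudes are what the processing modules of the later claims would operate on.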

21. A circuit, realizing a voice activation system capable of implementing multiple voice activation algorithms and being composed of four levels of building block modules as well as connection means, receiving an audio input signal and furnishing a trigger impulse as output signal, comprising:
an input terminal as entry for said audio input signal into a first level of modules;
a first level of modules consisting of a set of processing modules including modules for signal amplitude preparation, energy calculation and especially noise and speech estimation;
a second level of modules consisting of a set of intermediate storage modules for threshold and signal values;
a multipurpose connection means in order to transfer said audio input signal to said first level modules and to appropriately connect said first level modules to each other and to said second level of modules;
a third level of modules consisting of comparator modules;
a fourth level of modules as trigger generating means including additional configuration, setup and logic modules; and
an output terminal for said IRQ signal as said output signal in form of said trigger impulse.
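The four module levels of claim 21 can be pictured as a small software model. This is a minimal sketch, assuming plain Python dictionaries stand in for the storage registers and a logical AND stands in for the trigger logic. Noise and speech estimation are left out for brevity, and every name here (`Storage`, `level1_process`, `run_frame` and so on) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Second level: intermediate storage for signal values and thresholds."""
    values: dict = field(default_factory=dict)
    thresholds: dict = field(default_factory=dict)


def level1_process(samples):
    """First level: amplitude preparation and energy calculation."""
    amplitude = max(abs(s) for s in samples)
    energy = sum(s * s for s in samples) / len(samples)
    return {"amplitude": amplitude, "energy": energy}


def level3_compare(storage):
    """Third level: one comparator per stored value/threshold pair."""
    return {name: storage.values[name] > storage.thresholds[name]
            for name in storage.thresholds}


def level4_trigger(flags):
    """Fourth level: raise the IRQ when every comparator fires (assumed AND)."""
    return all(flags.values())


def run_frame(samples, storage):
    """Pass one frame through all four levels and return the IRQ state."""
    storage.values.update(level1_process(samples))  # connection means
    return level4_trigger(level3_compare(storage))
```

For example, with `Storage(thresholds={"amplitude": 100, "energy": 1000})`, a frame `[200, -200]` raises the trigger while `[5, -5]` does not.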

32. A circuit for a tailorable voice activation system evaluating an audio input signal and generating a voice activation trigger signal as output and capable of implementing multiple voice activation algorithms thus realizing a very flexible and adaptable voice activation function, built in form of a multilevel structure of building block modules, comprising:
two terminal pins for input and output externally connecting said audio input signal named ‘Digital Audio Input Signal’ and said voice activation trigger signal named ‘Interrupt ReQuest (IRQ) signal’ to said circuit;
a means for processing said ‘Digital Audio Input Signal’ directly as signal amplitude variable and thus generally designated as processing means;
a means for processing derivatives of said signal amplitude variable such as energy, noise and speech signal variables and thus generally designated also as processing means;
a means for intermediately storing the resulting values from said processing means of signal variables and thus generally designated as intermediate storing means;
a means for intermediately storing threshold values for said amplitude, energy, noise and speech signals and thus generally also designated as intermediate storing means;
a means for comparing said intermediately stored and respectively correlated signal variable and threshold values and thus designated as comparing means;
a means for generating a triggering impulse which is signalling a recognized event for said wanted voice activation function and thus designated as trigger impulse generating means;
a means for connecting said ‘Digital Audio Input Signal’ via said input terminal pin and also for connecting said derivative signals thereof to and between said processing means;
a means for connecting said intermediately stored and respectively correlated signal values to said comparing means;
a means for configuring said means for connecting, processing, storing, comparing and trigger impulse generating and thus designated as configuring means; and
a means for setting-up said storage and said comparing means for their corresponding threshold values and also setting-up an IRQ value or a boolean combination of IRQ values for said trigger impulse generating means and thus designated as set-up means, named “IRQ Status/Config” module, and therefore also connected to the pertaining modules in order to furnish said voice activation function.
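Claim 32 lets the set-up means select "an IRQ value or a boolean combination of IRQ values" for the trigger. One way to picture such a configuration is a list of bit masks evaluated as an OR of ANDs. The flag names, the mask encoding and the function `irq_logic` are assumptions for this sketch; the claim does not define a register layout.

```python
from enum import IntFlag


class Cmp(IntFlag):
    """Hypothetical status bits, one per comparator output."""
    AMPLITUDE = 1
    ENERGY = 2
    NOISE = 4
    SPEECH = 8


def irq_logic(status, terms):
    """Evaluate a configured boolean combination of comparator outputs.

    status: Cmp bits of the comparators that currently fire.
    terms:  list of Cmp masks; the IRQ is raised if, for at least one
            mask, all of its bits are set in status (OR of ANDs).
    """
    return any((status & term) == term for term in terms)
```

With `terms = [Cmp.SPEECH | Cmp.NOISE, Cmp.AMPLITUDE]` the trigger means "(speech AND noise) OR amplitude", so `irq_logic(Cmp.SPEECH, terms)` is false while `irq_logic(Cmp.AMPLITUDE | Cmp.ENERGY, terms)` is true.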

49. A method for a general tailorable and adaptable voice activation circuits system capable of implementing multiple diverse voice activation algorithms with an input terminal for an audio input signal and an output terminal for a generated voice activation trigger signal and being composed of four levels of building block modules together with two levels of connection layers, altogether being dynamically set-up, configured and operated within the framework of a flexible timing schedule, comprising:
providing as processing means four first level modules named “Amplitude Processing” block, “Energy Processing” block, “Noise Processing” block and “Speech Processing” block, which act on its input signal named ‘Digital Audio Input Signal’ either directly or indirectly, i.e. either on its amplitude value as input variable or on processed derivatives thereof, i.e. on energy, noise and speech values as processing variables;
providing as storing means four pairs of second level modules designated as value and threshold storing blocks or units respectively, namely for intermediate storage of pairs of amplitude, signal energy, noise energy and speech energy values in each case, named “Amplitude Threshold” and “Amplitude Value”, “Energy Threshold” and “Energy Value”, “Noise Threshold” and “Noise Energy Value”, as well as “Speech Threshold” and “Speech Energy Value”;
providing as comparing means within a third level of modules four comparator blocks, named “Amplitude Comparator”, “Energy Comparator”, “Noise Comparator”, and “Speech Comparator”;
providing as triggering means and fourth module level an “IRQ Logic” block together with its “IRQ Status/Config” block, delivering an IRQ output signal for voice activation;
providing also a “First Interconnection Layer” within and between said first level modules for processing said ‘Digital Audio Input Signal’ values from its amplitude, energy, noise and speech variables and said second level modules, whereby said amplitude value of said ‘Digital Audio Input Signal’ may be fed into said “Amplitude Processing” block, and/or into said “Energy Processing” block, and/or into said “Noise Processing” block and/or into said “Speech Processing” block, thus receiving from each other already processed values as possible input and/or control signals separately or in parallel and whereby finally from all said processing the resulting variables with their calculated and/or estimated values are fed into said respective second level storing units, named “Amplitude Value”, “Signal Energy Value”, “Noise Energy Value”, and “Speech Energy Value”;
providing further a “Second Interconnection Layer” between said second and third level of modules for storing and comparing said processed values of said amplitude, energy, noise (SNR) and speech variables, whereby always the corresponding values of threshold and variable result pairs are fed into their respective comparator blocks located within said third level of modules and whereby said comparator blocks may also receive via an extra input additional control signals from others of said second level modules;
providing an extra “Config” block for setting-up and configuring all necessary threshold values and operating states for said blocks within all four levels of modules according to said voice activation algorithm to be actually implemented;
connecting the output of each of said comparators in module level three to said fourth level “IRQ Logic” block as inputs;
establishing a recursively adapting and iteratively looping and timing schedule as operating scheme for said tailorable voice activation circuits system capable of implementing multiple diverse voice activation algorithms and thus being able to be continuously adapted for its optimum operation;
initializing with pre-set operating states and pre-set threshold values a start-up operating cycle of said operating scheme for said voice activation circuit;
starting said operating scheme for said adaptable voice activation circuits system by feeding said ‘Digital Audio Input Signal’ as sampled digital amplitude values into the circuit, namely said “First Interconnection Layer”, for further processing e.g. by calculating said signal energy, and/or by estimating said noise energy and/or said speech energy;
deciding upon said voice activation algorithm to be chosen for actual implementation with the help of crucial variable values such as said amplitude value from said audio signal input variable and also said already calculated and estimated signal energy, noise energy and speech energy values as processing variables critical and crucial for said voice activation algorithm and in conjunction with some sort of a decision table, leading to optimum choices for said voice activation algorithms;
setting-up the operating function of said “First Interconnection Layer” element appropriately with the help of said “Status/Config” block considering the requirements of said voice activation algorithm to be actually implemented for the connections within and between said first and second level modules;
setting-up the operating function of said “Second Interconnection Layer” element appropriately with the help of said “Status/Config” block considering the requirements of said voice activation algorithm to be actually implemented for the connections within and between said second and third level modules;
configuring said necessary operating states e.g. in internal modules each with specific registers by algorithm defining values corresponding to said actually chosen voice activation algorithm for future operations;
setting-up the operating function of said “IRQ Logic” block appropriately with the help of said “IRQ Status/Config” block considering said voice activation algorithm to be actually implemented;
processing continuously within said “Energy Processing” block e.g. said “Signal Energy Value” calculation, acting on said input signal named ‘Digital Audio Input Signal’;
processing continuously within said “Noise Processing” block e.g. said “Noise Energy Value” estimation, which depends on its input signal, e.g. said already formerly calculated “Signal Energy Value”;
processing continuously within said “Speech Estimation” block e.g. said “Speech Energy Value”, which depends on its input signal, e.g. said already formerly calculated “Signal Energy Value”;
storing within its corresponding storing units located within module level two the results of said preceding “Amplitude Processing”, “Energy Processing”, “Noise Processing” and “Speech Processing” operations, namely said “Amplitude Value”, “Signal Energy Value”, “Noise Energy Value”, and “Speech Energy Value” all taken directly or indirectly from said ‘Digital Audio Input Signal’;
setting-up within said storing units said respective threshold values named “Amplitude Threshold”, “Energy Threshold”, “Noise Threshold” and “Speech Threshold” corresponding to said actually chosen voice activation algorithm for future comparing operations;
comparing with the help of said “Amplitude Comparator”, “Energy Comparator”, “Noise Comparator”, and “Speech Comparator” said “Amplitude Threshold” and “Amplitude Value”, said “Energy Threshold” and “Signal Energy Value”, said “Noise Threshold” and “Noise Energy Value”, as well as said “Speech Threshold” and “Speech Energy Value”;
evaluating the outcome of the former comparing operations within said “IRQ Logic” block with respect to said earlier set-up operating function;
generating, depending on said “IRQ Logic” evaluation in the case where applicable a trigger event as IRQ impulse signalling a recognized speech element for said voice activation; and
re-starting again said once established operating scheme for said voice activation circuits system from said starting point above and continue its looping schedule.
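The looping schedule of claim 49 (decide on an algorithm, configure thresholds, compare, evaluate the IRQ logic, restart) can be outlined in a few lines. This is only a sketch under stated assumptions: the two algorithm names, their thresholds, the single-rule decision table and the AND combination are invented for illustration, and the per-frame energy, noise and speech values are taken as already computed.

```python
# Hypothetical algorithm table: thresholds plus the comparators that the
# IRQ logic combines (here with AND) for each selectable algorithm.
ALGORITHMS = {
    "energy_only": {
        "thresholds": {"energy": 1000.0},
        "use": ("energy",),
    },
    "speech_and_energy": {
        "thresholds": {"energy": 1000.0, "speech": 5000.0},
        "use": ("energy", "speech"),
    },
}


def choose_algorithm(noise_energy, quiet_limit=100.0):
    """Decision table with a single assumed rule on the noise estimate."""
    return "energy_only" if noise_energy < quiet_limit else "speech_and_energy"


def operate(frames_of_values):
    """Run the loop over dicts holding 'energy', 'noise' and 'speech'.

    Yields (algorithm name, irq) once per pass of the schedule.
    """
    for values in frames_of_values:
        name = choose_algorithm(values["noise"])          # deciding
        cfg = ALGORITHMS[name]                            # set-up, configuring
        flags = {k: values[k] > cfg["thresholds"][k]      # comparing
                 for k in cfg["use"]}
        yield name, all(flags.values())                   # IRQ logic, restart
```

In a quiet environment the loop selects the simpler energy test; once the noise estimate rises, it switches to the algorithm that also requires the speech comparator to fire.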

52. A method for a tailorable and adaptable voice activation circuits system capable of implementing multiple diverse voice activation algorithms with an input terminal for an audio input signal and an output terminal for a generated voice activation trigger signal and being composed of four levels of building block modules together with two sets of connections, altogether being set-up, configured and operated within the framework of a timing schedule, comprising:
providing as processing means three first level modules named “Energy Calculation” block, “Noise Estimation” block and “Speech Estimation” block, which act on its input signal named ‘Digital Audio Input Signal’ directly, i.e. on its amplitude value as input variable and also on processed derivatives thereof, i.e. on energy, noise and speech values as processing variables;
providing as storing means four pairs of second level modules designated as value and threshold storing blocks or units respectively, namely for intermediate storage of pairs of amplitude, signal energy, noise energy and speech energy values in each case, named “Amplitude Threshold” and “Amplitude Value”, “Energy Threshold” and “Energy Value”, “SNR Threshold” and “Noise Energy Value”, as well as “Speech Threshold” and “Speech Energy Value”;
providing as comparing means within a third level of modules four comparator blocks, named “Amplitude Comparator”, “Energy Comparator”, “Noise (SNR) Comparator”, and “Speech Comparator”;
providing as triggering means and fourth module level an “IRQ Logic” block together with its “IRQ Status/Config” block, delivering an IRQ output signal for voice activation;
providing also a first set of interconnections within and between said first level modules for processing said ‘Digital Audio Input Signal’ values from its amplitude, energy, noise (SNR) and speech variables and said second level modules, whereby said amplitude value of said ‘Digital Audio Input Signal’ is fed into said “Energy Calculation” block and in turn both estimation blocks, for “Noise Estimation” and for “Speech Estimation” namely, receive from it said therein calculated signal energy value in parallel and whereby finally from all said resulting variables their calculated and estimated values are fed into said respective second level storing units, named “Amplitude Value”, “Energy Value”, “Noise Energy Value”, and “Speech Energy Value”;
providing further a second set of interconnections between said second and third level of modules for storing and comparing said processed values from said amplitude, energy, noise (SNR) and speech variables, whereby always the corresponding values of threshold and variable result pairs are fed into their respective comparator blocks and only said “Noise (SNR) Comparator” block receives via an extra input from said “Speech Energy Value” block said speech energy value as additional control signal;
providing an extra “Config” block for setting-up and configuring all necessary threshold values and operating states for said blocks within all four levels of modules according to said voice activation algorithm to be actually implemented;
connecting the output of each of said comparators in module level three to said fourth level “IRQ Logic” block as inputs;
establishing a recursively adapting and iteratively looping and timing schedule as operating scheme for said tailorable voice activation circuits system capable of implementing multiple diverse voice activation algorithms and thus being able to be continuously adapted for its optimum operation;
initializing with pre-set operating states and pre-set threshold values a start-up operating cycle of said operating scheme for said voice activation circuit;
starting said operating scheme for said adaptable voice activation circuits system by feeding said ‘Digital Audio Input Signal’ as sampled digital amplitude values into the circuit, by calculating said signal energy, and estimating said noise energy (SNR) and said speech energy;
deciding upon said voice activation algorithm to be chosen for actual implementation with the help of crucial variable values such as said amplitude value from said audio signal input variable and also said already calculated and estimated signal energy, noise energy and speech energy values as processing variables critical and crucial for said voice activation algorithm and in conjunction with some sort of a decision table, leading to optimum choices for said voice activation algorithms;
configuring said necessary operating states e.g. in internal modules each with specific registers by algorithm defining values corresponding to said actually chosen voice activation algorithm for future operations;
setting-up the operating function of said “IRQ Logic” block appropriately with the help of said “IRQ Status/Config” block considering said voice activation algorithm to be actually implemented;
calculating continuously within said “Energy Calculation” block said “Energy Value”, acting on said input signal named ‘Digital Audio Input Signal’;
estimating continuously within said “Noise Estimation” block said “Noise Energy Value”, which depends on its input signal, namely said already formerly calculated “Energy Value”;
estimating continuously within said “Speech Estimation” block said “Speech Energy Value”, which depends on its input signal, namely said already formerly calculated “Energy Value”;
storing within its corresponding storing units located within module level two the results of said preceding “Energy Calculation”, “Noise Estimation” and “Speech Estimation” operations, namely said “Energy Value”, “Noise Energy Value”, and “Speech Energy Value” as well as said “Amplitude Value” taken directly from said ‘Digital Audio Input Signal’;
setting-up within said storing units said respective threshold values named “Amplitude Threshold”, “Energy Threshold”, “SNR Threshold” and “Speech Threshold” corresponding to said actually chosen voice activation algorithm for future comparing operations;
comparing with the help of said “Amplitude Comparator”, “Energy Comparator”, “Noise (SNR) Comparator”, and “Speech Comparator” said “Amplitude Threshold” and “Amplitude Value”, said “Energy Threshold” and “Energy Value”, said “SNR Threshold” and “Noise Energy Value”, as well as said “Speech Threshold” and “Speech Energy Value”;
evaluating the outcome of the former comparing operations within said “IRQ Logic” block with respect to said earlier set-up operating function;
generating, depending on said “IRQ Logic” evaluation in the case where applicable a trigger event as IRQ impulse signalling a recognized speech element for said voice activation; and
re-starting again said once established operating scheme for said voice activation circuits system from said starting point above and continue its looping schedule.
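Claim 52 fixes one particular wiring: the energy value feeds both estimation blocks in parallel, and the “Noise (SNR) Comparator” takes the speech energy value as an extra input. The sketch below models that data flow. How the comparator actually uses the extra input is not stated in the claim, so the ratio test, the smoothing coefficients and all function names are assumptions.

```python
def energy_calculation(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)


def smooth(previous, energy, coeff):
    """Simple exponential smoothing used for both estimators (assumed)."""
    return previous + coeff * (energy - previous)


def snr_comparator(noise_energy, speech_energy, snr_threshold):
    """Fire when speech energy exceeds noise energy by the threshold factor.

    Written as a multiplication so that no division is needed.
    """
    return speech_energy > snr_threshold * noise_energy


def step(samples, state, snr_threshold=4.0):
    """Process one frame; state holds the 'noise' and 'speech' estimates."""
    energy = energy_calculation(samples)
    # Both estimation blocks receive the same energy value in parallel.
    state["noise"] = smooth(state["noise"], energy, 0.05)
    state["speech"] = smooth(state["speech"], energy, 0.5)
    return snr_comparator(state["noise"], state["speech"], snr_threshold)
```

Starting from equal estimates, a sudden loud frame moves the fast speech estimate well ahead of the slow noise estimate, so the comparator fires; a frame at the background level leaves both estimates unchanged and the comparator stays low.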
Specification