AUTOMATIC SPEECH RECOGNITION WITH DETECTION OF AT LEAST ONE CONTEXTUAL ELEMENT, AND APPLICATION MANAGEMENT AND MAINTENANCE OF AIRCRAFT
Abstract
An automatic speech recognition device with detection of at least one contextual element, and its application to aircraft flying and maintenance, are provided. The automatic speech recognition device comprises a unit for acquiring an audio signal, a device for detecting the state of at least one contextual element, and a language decoder for determining an oral instruction corresponding to the audio signal. The language decoder comprises at least one acoustic model defining an acoustic probability law and at least two syntax models each defining a syntax probability law. The language decoder also comprises an oral instruction construction algorithm implementing the acoustic model and a plurality of active syntax models taken from among the syntax models, a contextualization processor to select, based on the state of the or each contextual element detected by the detection device, at least one syntax model from among the plurality of active syntax models, and a processor for determining the oral instruction corresponding to the audio signal.
31 Claims
-
1-15. (canceled)
-
16. An automatic speech recognition device comprising:
an acquisition unit for acquiring an audio signal,
a forming member for forming the audio signal, to divide the audio signal into frames,
a detection device to detect the state of at least one contextual element, and
a language decoder for determining an oral instruction corresponding to the audio signal, the language decoder comprising:
    at least one acoustic model defining an acoustic probability law for calculating, for each phoneme of a sequence of phonemes, an acoustic probability of that phoneme and a corresponding frame of the audio signal matching;
    at least two syntax models defining a syntax probability law for calculating, for each phoneme of a sequence of phonemes analyzed using said acoustic model, a syntax probability of that phoneme following the phoneme or group of phonemes preceding said phoneme in the sequence of phonemes;
    an oral instruction construction algorithm implementing the acoustic model and a plurality of active syntax models from among the syntax models to build, for each active syntax model, a candidate sequence of phonemes associated with said active syntax model so that the product of the acoustic and syntax probabilities of the different phonemes making up said candidate sequence of phonemes is maximal;
    a contextualization processor to select, based on the state of the or each contextual element detected by the detection device, at least one syntax model selected from among the plurality of active syntax models; and
    a determination processor for determining the oral instruction corresponding to the audio signal, to define the candidate sequence of phonemes associated with the selected syntax model or, if several syntax models are selected, the sequence of phonemes, from among the candidate sequences of phonemes associated with the selected acoustic models, for which the product of the acoustic and syntax probabilities of different phonemes making up said sequence of phonemes is maximal, as constituting the oral instruction corresponding to the audio signal.
- View Dependent Claims (17, 18, 19, 20, 21, 22, 23)
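The construction step recited in claim 16 (building, for one active syntax model, the candidate phoneme sequence whose product of acoustic and syntax probabilities is maximal) can be illustrated with a small dynamic-programming sketch. This is not taken from the patent: the function name `build_candidate`, the dictionary layout of the two models, the choice of a Viterbi-style search, and the simplification of exactly one phoneme per frame with a syntax law conditioned only on the preceding phoneme are all assumptions made for illustration.

```python
import math


def _log(p):
    # Work in log space so that products of probabilities become sums.
    return math.log(p) if p > 0 else float("-inf")


def build_candidate(frames, phonemes, acoustic, syntax):
    """Return (sequence, log_score) maximizing the product of acoustic
    and syntax probabilities over all phoneme sequences.

    Hypothetical data layout (an assumption, not the patent's):
      acoustic[(phoneme, frame)] -> probability that frame matches phoneme
      syntax[(previous, phoneme)] -> probability that phoneme follows previous
                                     (previous is None for the first phoneme)
    """
    # best[p] holds the best (log score, sequence) ending in phoneme p.
    best = {
        p: (_log(syntax.get((None, p), 0.0))
            + _log(acoustic.get((p, frames[0]), 0.0)), [p])
        for p in phonemes
    }
    for frame in frames[1:]:
        new_best = {}
        for p in phonemes:
            emit = _log(acoustic.get((p, frame), 0.0))
            # Pick the predecessor q that gives the highest cumulative score.
            new_best[p] = max(
                ((best[q][0] + _log(syntax.get((q, p), 0.0)) + emit,
                  best[q][1] + [p]) for q in phonemes),
                key=lambda item: item[0],
            )
        best = new_best
    score, sequence = max(best.values(), key=lambda item: item[0])
    return sequence, score


if __name__ == "__main__":
    acoustic = {("a", 0): 0.9, ("b", 0): 0.1, ("a", 1): 0.2, ("b", 1): 0.8}
    syntax = {(None, "a"): 0.5, (None, "b"): 0.5,
              ("a", "a"): 0.1, ("a", "b"): 0.9,
              ("b", "a"): 0.5, ("b", "b"): 0.5}
    print(build_candidate([0, 1], ["a", "b"], acoustic, syntax))
```

Running such a routine once per active syntax model, with the same acoustic model each time, would yield one candidate sequence and one score per model, which is the input the contextualization and determination processors then work on.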
-
24. An automatic speech recognition method comprising:
determining an oral instruction corresponding to an audio signal, the determining of the oral instruction being implemented by an automatic speech recognition device comprising:
    at least one acoustic model defining an acoustic probability law for calculating, for each phoneme of a sequence of phonemes, an acoustic probability of that phoneme and a corresponding frame of the audio signal matching,
    at least two syntax models defining a syntax probability law for calculating, for each phoneme of a sequence of phonemes analyzed using said acoustic model, a syntax probability of that phoneme following the phoneme or group of phonemes preceding said phoneme in the sequence of phonemes,
acquiring the audio signal,
detecting the status of at least one contextual element,
activating a plurality of syntax models forming active syntax models,
forming the audio signal, said forming comprising dividing the audio signal into frames,
building, for each active syntax model, using the acoustic model and said active syntax model, a candidate sequence of phonemes associated with said active syntax model so that the product of the acoustic and syntax probabilities of the different phonemes making up said candidate sequence of phonemes is maximal,
selecting, based on the state of the detected contextual element, at least one syntax model from among the active syntax models, and
defining the candidate sequence of phonemes associated with the selected syntax model or, if several syntax models are selected, the sequence of phonemes, from among the candidate sequences of phonemes associated with the selected syntax models, for which the product of the acoustic and syntax probabilities of different phonemes making up said sequence of phonemes is maximal, as constituting the oral instruction corresponding to the audio signal.
- View Dependent Claims (25, 26, 27, 28, 29, 30, 31)
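The last two steps of claim 24 (selecting syntax models from the detected context, then keeping the highest-scoring candidate among the selected models) can be sketched as follows. The sketch is illustrative only: the function name `determine_instruction`, the model names, the context values "left" and "right", and the use of one boolean predicate per syntax model to express the selection rule are hypothetical and do not come from the claims.

```python
def determine_instruction(candidates, context_state, is_relevant):
    """Pick the oral instruction from per-model candidate sequences.

    Hypothetical data layout (an assumption, not the patent's):
      candidates[model]  -> (phoneme_sequence, score), one per active model,
                            where score is the product of acoustic and syntax
                            probabilities (or its logarithm)
      is_relevant[model] -> function(context_state) -> bool, True when the
                            model should be selected for that context
    """
    # Selection step: keep only models that match the detected context.
    selected = [m for m in candidates if is_relevant[m](context_state)]
    if not selected:
        return None  # no syntax model selected, so no instruction is defined
    # Determination step: among the selected models, keep the candidate
    # sequence whose score is maximal.
    best_model = max(selected, key=lambda m: candidates[m][1])
    return candidates[best_model][0]


if __name__ == "__main__":
    candidates = {
        "radio": (["r", "a"], 0.2),
        "navigation": (["n", "a"], 0.5),
        "engine": (["e", "n"], 0.9),
    }
    is_relevant = {
        "radio": lambda state: state == "left",
        "navigation": lambda state: state == "left",
        "engine": lambda state: state == "right",
    }
    print(determine_instruction(candidates, "left", is_relevant))
```

In this example the "engine" candidate has the highest score overall but is ignored when the context is "left", because its syntax model is not selected. That reflects the ordering of the claimed steps, in which candidates are built for every active model before the context narrows the choice.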
Specification