MULTIMODAL UNIFICATION OF ARTICULATION FOR DEVICE INTERFACING
First Claim
1. A system for a multimodal unification of articulation, comprising:
- a voice signal modality receiving a voice signal;
- a control signal modality receiving an input from a user while the voice signal is being inputted, the control signal modality generating a control signal from the input, the input selected from predetermined inputs to help decipher ambiguities arising from syllable boundary, word boundary, homonym, prosody, or intonation; and
- a multimodal integration system receiving and integrating the voice signal and the control signal, the multimodal integration system comprising an inference engine to delimit a context of a spoken utterance of the voice signal by discretizing the voice signal into phonetic frames, the inference engine analyzing the discretized voice signal integrated with the control signal to output a recognition result.
Abstract
A system for a multimodal unification of articulation includes a voice signal modality to receive a voice signal, and a control signal modality that receives an input from a user and generates a control signal from that input, the input being selected from predetermined inputs directly corresponding to phonetic information. The system also includes a multimodal integration system to receive and integrate the voice signal and the control signal. The multimodal integration system delimits a context of a spoken utterance of the voice signal by using the control signal to preprocess the voice signal and discretize it into phonetic frames. A voice recognizer analyzes the voice signal integrated with the control signal to output a voice recognition result. This paradigm helps overcome constraints found in interfacing with mobile devices, and the context information facilitates handling of commands in the application environment.
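As a hedged illustration (not the patented implementation), the homonym disambiguation described in the abstract can be sketched in Python: a control input received concurrently with the voice signal, here a hypothetical part-of-speech tag, selects among candidate words that share a single phonetic form.

```python
# Illustrative sketch only: the phonetic key, the tag vocabulary, and the
# lookup table are assumptions, not the claimed system.

# Candidate spellings that a recognizer cannot separate by sound alone.
HOMONYMS = {
    "rait": {"noun": "rite", "verb": "write", "adjective": "right"},
}

def recognize(phonetic_form, control_tag):
    """Resolve a homonym using the user's concurrent control input.

    Falls back to the raw phonetic form when no candidate matches.
    """
    candidates = HOMONYMS.get(phonetic_form, {})
    return candidates.get(control_tag, phonetic_form)

print(recognize("rait", "verb"))
```

The point of the sketch is only that the control signal carries information the acoustic channel lacks, so integration reduces the recognizer's candidate set.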
30 Claims
1. A system for a multimodal unification of articulation, comprising:
- a voice signal modality receiving a voice signal;
- a control signal modality receiving an input from a user while the voice signal is being inputted, the control signal modality generating a control signal from the input, the input selected from predetermined inputs to help decipher ambiguities arising from syllable boundary, word boundary, homonym, prosody, or intonation; and
- a multimodal integration system receiving and integrating the voice signal and the control signal, the multimodal integration system comprising an inference engine to delimit a context of a spoken utterance of the voice signal by discretizing the voice signal into phonetic frames, the inference engine analyzing the discretized voice signal integrated with the control signal to output a recognition result.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
25. A method for performing a multimodal unification of articulation, comprising:
- receiving a voice signal;
- receiving an input from a user while the voice signal is being received, the input selected from predetermined inputs directly corresponding to phonetic information;
- generating a control signal from the input from the user so that the control signal carries phonetic information of the voice signal;
- integrating the voice signal and the control signal;
- discretizing the voice signal into phonetic frames to delimit a context of a spoken utterance of the voice signal; and
- analyzing the discretized voice signal integrated with the control signal to output a recognition result.
- View Dependent Claims (26, 27, 28, 29, 30)
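The steps of the method claim can be sketched as a minimal Python pipeline. This is an assumption-laden illustration, not the claimed implementation: the frame size, the timing model, and the `ControlEvent` structure are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ControlEvent:
    time: float  # seconds into the utterance when the user pressed the input
    info: str    # phonetic hint carried by the control signal, e.g. "word-boundary"

def discretize(samples, frame_size):
    """Split the raw voice signal into fixed-size phonetic frames."""
    return [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]

def integrate(frames, events, frame_duration):
    """Attach each control event to the frame that was active when it occurred."""
    tagged = {}
    for e in events:
        idx = int(e.time // frame_duration)
        if 0 <= idx < len(frames):
            tagged.setdefault(idx, []).append(e.info)
    return tagged

def analyze(frames, tagged):
    """Toy stand-in for recognition: report frames and which ones carry hints."""
    return {"num_frames": len(frames), "hinted_frames": sorted(tagged)}

# Usage: 100 stand-in audio samples, 10 samples per frame, one control event
# arriving 0.25 s into the utterance while frames last 0.1 s each.
samples = list(range(100))
frames = discretize(samples, 10)
events = [ControlEvent(0.25, "word-boundary")]
tagged = integrate(frames, events, frame_duration=0.1)
print(analyze(frames, tagged))
```

The sketch follows the claim's ordering: discretization delimits the utterance into frames, and analysis then operates on frames enriched by the time-aligned control signal.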
Specification