Electronic facial tracking and detection system and method and apparatus for automated speech recognition
First Claim
1. An apparatus for producing output indicating words spoken by a human speaker, said apparatus comprising:
- means for detecting sounds, converting said sounds to electrical signals, analyzing said signals to detect for words, and then producing an electrical acoustic output signal representing at least one spoken word;
- means for scanning said speaker's face and producing electrical image signals, each said signal representing an image in a sequence of video images of said speaker;
- means, responsive to each said image signal, for tracking said speaker's mouth by tracking said speaker's nostrils;
- means for analyzing portions of said image signals, said portions defined by said means for tracking, to produce a video output signal representing at least one visual manifestation of at least one spoken word;
- means for receiving and correlating said acoustic output signal and said video output signal to produce said output.
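The final clause correlates the acoustic and video output signals into a single result. As one minimal sketch of that step, assuming each recognizer independently scores candidate words, the combined output can be the word with the best weighted joint score (the names `fuse`, `acoustic_scores`, `visual_scores`, and the weight `ALPHA` are illustrative, not from the patent):

```python
# Hypothetical audio-visual fusion: each recognizer scores candidate
# words; the combined output is the word with the best weighted sum.
ALPHA = 0.7  # weight given to the acoustic channel (illustrative)

def fuse(acoustic_scores, visual_scores):
    """Combine per-word scores from the two recognizers."""
    words = set(acoustic_scores) & set(visual_scores)
    return max(
        words,
        key=lambda w: ALPHA * acoustic_scores[w] + (1 - ALPHA) * visual_scores[w],
    )

acoustic = {"yes": 0.6, "no": 0.4}
visual = {"yes": 0.3, "no": 0.9}
print(fuse(acoustic, visual))  # the strong visual evidence outweighs the audio
```

In this toy run the acoustic channel slightly favors "yes" but the much stronger visual evidence for "no" wins, which is the accuracy benefit the abstract attributes to combining the two modalities.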
Abstract
The apparatus includes circuitry for obtaining a video image of an individual's face, circuitry for electronically locating and tracking a first feature, such as the nostrils, of the facial image for use as reference coordinates and circuitry responsive to the reference coordinates for locating and tracking a second facial feature, such as the mouth, of the facial image with respect to the first feature. By tracking the location of the nostrils, the apparatus can follow the movement of the mouth, and thereby automatically recognize speech. In a preferred embodiment, the video image is grayscale encoded and the raster lines are smoothed to eliminate noise. The transitions between gray levels of the smoothed image are encoded and the resulting transition code is used to form a contour map of the image from which region parameters are computed which can be compared against stored speech templates to recognize speech. In the preferred embodiment, acoustic speech recognition is combined with visual speech recognition to improve accuracy.
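The first stages of the video pipeline described above can be sketched as follows, assuming an 8-bit grayscale raster line: smooth the line, quantize it to a few gray levels, and record the positions where the level changes (the "transition code"). The function names, the 3-sample smoother, and the 4-level quantization are illustrative assumptions, not values taken from the patent:

```python
def smooth_line(line):
    """3-sample moving average to suppress raster noise (illustrative)."""
    out = list(line)
    for i in range(1, len(line) - 1):
        out[i] = (line[i - 1] + line[i] + line[i + 1]) // 3
    return out

def quantize(line, levels=4):
    """Map 0..255 pixel values onto a small number of gray levels."""
    step = 256 // levels
    return [min(p // step, levels - 1) for p in line]

def transitions(line):
    """Encode (column, new_level) wherever the gray level changes."""
    coded, prev = [], None
    for x, level in enumerate(line):
        if level != prev:
            coded.append((x, level))
            prev = level
    return coded

raster = [10, 12, 11, 200, 210, 205, 30, 28, 27]
print(transitions(quantize(smooth_line(raster))))
```

Stacking the transition codes of successive raster lines is what would let contours of constant gray level be traced across the image, from which the region parameters mentioned in the abstract could be computed.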
94 Claims
1. An apparatus for producing output indicating words spoken by a human speaker, said apparatus comprising:
- means for detecting sounds, converting said sounds to electrical signals, analyzing said signals to detect for words, and then producing an electrical acoustic output signal representing at least one spoken word;
- means for scanning said speaker's face and producing electrical image signals, each said signal representing an image in a sequence of video images of said speaker;
- means, responsive to each said image signal, for tracking said speaker's mouth by tracking said speaker's nostrils;
- means for analyzing portions of said image signals, said portions defined by said means for tracking, to produce a video output signal representing at least one visual manifestation of at least one spoken word;
- means for receiving and correlating said acoustic output signal and said video output signal to produce said output.

View Dependent Claims (2, 3, 4, 5, 6)
7. An apparatus for producing output indicating words spoken by a human speaker, said apparatus comprising:
- an acoustic speech recognizer in operative combination with a microphone, for detecting sounds, converting said sounds to electrical signals, analyzing said electrical signals to detect for words, and then producing an electrical acoustic output signal representing an audio manifestation of spoken speech;
- means for scanning said speaker's face and producing electrical image signals, each said signal representing an image in a sequence of video images of said speaker;
- means for analyzing said image signals to produce respective location signals in response to a detection of said speaker's nostrils;
- means, activatable in response to one of said location signals, for defining a portion of one of said image signals for mouth analysis, said portion representing a region on said speaker's face located at a position determined by said speaker's nostrils;
- means for analyzing portions of said video image signals, responsive to said means for defining, to produce a video output signal, said video output signal representing a visual manifestation of speech;
- means for receiving and correlating said acoustic output signal and said video output signal to produce said output.

View Dependent Claims (8, 9, 10)
11. An apparatus for electronically detecting a speaker's facial features, the apparatus comprising:
- means for producing electric signals corresponding to a sequence of video images of the speaker; and
- means, responsive to one of the signals corresponding to one image in the sequence, for finding the speaker's nostrils and then using the nostrils to define a region of the image for analyzing the speaker's mouth.

View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47)
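Claim 11's core idea, using the nostrils to define the mouth analysis region, can be sketched as below, assuming the two nostril centers have already been located in image coordinates. The nostril midpoint gives a reference position and the inter-nostril distance gives a scale; the specific offsets and multipliers are illustrative guesses, not values from the patent:

```python
# Hypothetical nostril-referenced mouth window: the nostril midpoint
# anchors the region and the nostril spacing sets its size.
def mouth_region(left_nostril, right_nostril):
    """Return (x0, y0, x1, y1) of a mouth analysis window."""
    lx, ly = left_nostril
    rx, ry = right_nostril
    cx = (lx + rx) / 2        # midpoint between the nostrils
    cy = (ly + ry) / 2
    spacing = rx - lx         # inter-nostril distance sets the scale
    half_w = 1.5 * spacing    # mouth is wider than the nostril span
    top = cy + 0.5 * spacing  # mouth lies below the nostrils
    return (cx - half_w, top, cx + half_w, top + 2 * spacing)

print(mouth_region((100, 80), (120, 80)))
```

Because the window is derived from the nostrils in every frame, it follows the mouth as the head moves, which is why the claims track the nostrils rather than the mouth directly.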
48. A method for producing output indicating words spoken by a human speaker, said method comprising steps of:
- detecting sounds, converting said sounds to electrical signals, analyzing said signals to detect for words, and then producing an electrical acoustic output signal representing at least one spoken word;
- scanning said speaker's face and producing electrical image signals, each said signal representing an image in a sequence of video images of said speaker;
- tracking said speaker's mouth by tracking said speaker's nostrils;
- analyzing portions of said image signals, said portions defined by said means for tracking, to produce a video output signal representing at least one visual manifestation of at least one spoken word;
- receiving and correlating said acoustic output signal and said video output signal to produce said output.

View Dependent Claims (49, 50, 51, 52, 53)
54. A method for producing output indicating words spoken by a human speaker, said method comprising steps of:
- detecting sounds by means of an acoustic speech recognizer in operative combination with a microphone, converting said sounds to electrical signals, analyzing said electrical signals to detect for words, and then producing an electrical acoustic output signal representing an audio manifestation of spoken speech;
- scanning said speaker's face and producing electrical image signals, each said signal representing an image in a sequence of video images of said speaker;
- analyzing said image signals to produce respective location signals in response to a detection of said speaker's nostrils;
- defining a portion of one of said image signals for mouth analysis in response to one of said location signals, said portion representing a region on said speaker's face located at a position determined by said speaker's nostrils;
- analyzing portions of said video image signals, responsive to said step of defining, to produce a video output signal, said video output signal representing a visual manifestation of speech;
- receiving and correlating said acoustic output signal and said video output signal to produce said output.

View Dependent Claims (55, 56, 57)
58. A method for electronically detecting a speaker's facial features, the method comprising steps of:
- producing electric signals corresponding to a sequence of video images of the speaker; and
- finding the speaker's nostrils in one image in the sequence and then using the nostrils to automatically define a region of the image for analyzing the speaker's mouth.

View Dependent Claims (59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94)
Specification