Method and apparatus for recognizing speech by lip reading

US 10,204,626 B2
Filed: 05/10/2018
Issued: 02/12/2019
Est. Priority Date: 11/26/2014
Status: Active Grant

Abstract

A dictation device includes: an audio input device configured to receive a voice utterance including a plurality of words; a video input device configured to receive video of lip motion during the voice utterance; a memory portion; a controller configured according to instructions in the memory portion to generate first data packets including an audio stream representative of the voice utterance and a video stream representative of the lip motion; and a transceiver for sending the first data packets to a server end device and receiving second data packets including combined dictation based upon the audio stream and the video stream from the server end device. In the combined dictation, first dictation generated based upon the audio stream has been corrected by second dictation generated based upon the video stream.
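The correction step described in the abstract can be illustrated with a minimal sketch. It assumes the two dictations are already aligned word for word and that some upstream check flags which audio-derived words need correcting; the function name `combine_dictation`, the `needs_correction` predicate, and the sample words are hypothetical and are not taken from the patent.

```python
from typing import Callable, List


def combine_dictation(
    first: List[str],
    second: List[str],
    needs_correction: Callable[[int, str], bool],
) -> List[str]:
    """Replace flagged audio-derived words with the video-derived word
    at the same position (word alignment is an assumption of this sketch)."""
    if len(first) != len(second):
        raise ValueError("this sketch assumes word-aligned dictations")
    return [
        video_word if needs_correction(i, audio_word) else audio_word
        for i, (audio_word, video_word) in enumerate(zip(first, second))
    ]


# Hypothetical example: the audio recognizer misheard one short word.
first_dictation = ["turn", "on", "the", "fan"]    # from the audio stream
second_dictation = ["turn", "off", "the", "fan"]  # from the lip-motion video
flagged = {1}  # word indices flagged by an assumed upstream check

combined = combine_dictation(
    first_dictation, second_dictation, lambda i, _word: i in flagged
)
print(" ".join(combined))  # turn off the fan
```

In the device described above this combination would happen on the server end device, with only the combined dictation returned in the second data packets.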

16 Claims

1. A vehicle component control device comprising:
- an audio input device configured to receive a voice utterance including a plurality of words;
  
  a video input device configured to receive video of lip motion of a user;
  
  a memory portion;
  
  a controller configured according to instructions in the memory portion to generate first data packets including an audio stream representative of the voice utterance and a video stream representative of the lip motion; and
  
  a transceiver for sending the first data packets to a remote apparatus and receiving second data packets including machine code for controlling a vehicle component,
  
  wherein the machine code is assigned from configured dictation generated based upon the audio stream and the video stream from the remote apparatus,
  
  wherein in the configured dictation, at least one word in first dictation generated based upon the audio stream which has a predetermined characteristic has been corrected by a feature signal parameter sequence based upon the video stream,
  
  wherein the first data packets further include geographical data and the remote apparatus determines a location associated with the vehicle component control device based on the geographical data,
  
  wherein the machine code is assigned from the first dictation or the configured dictation based upon the location associated with the vehicle component control device.
- Dependent claims (2, 3):
  - 2. The vehicle component control device of claim 1, wherein the geographical data includes global positioning system (GPS) data.
  - 3. The vehicle component control device of claim 1, wherein a location where the machine code is assigned from the configured dictation is manually set by the user.
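One way to read the location-dependent selection in claims 1 to 3 is sketched below. It assumes the user has registered circular zones in which the lip-corrected (configured) dictation should be used, and that a lookup table maps recognized phrases to machine codes. The `Zone` type, the `COMMAND_TABLE` contents, and the code values are hypothetical illustrations, not details from the patent.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Zone:
    """A user-defined circular area (hypothetical representation)."""
    lat: float
    lon: float
    radius_km: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS coordinates in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def in_any_zone(lat: float, lon: float, zones: List[Zone]) -> bool:
    return any(haversine_km(lat, lon, z.lat, z.lon) <= z.radius_km for z in zones)


# Hypothetical phrase-to-machine-code table.
COMMAND_TABLE: Dict[str, int] = {"wipers on": 0x10, "wipers off": 0x11}


def assign_machine_code(
    first: str, configured: str, lat: float, lon: float, zones: List[Zone]
) -> Optional[int]:
    """Use the configured dictation inside a user-set zone, else the first dictation."""
    chosen = configured if in_any_zone(lat, lon, zones) else first
    return COMMAND_TABLE.get(chosen)


zones = [Zone(lat=35.0, lon=135.0, radius_km=5.0)]
inside = assign_machine_code("wipers on", "wipers off", 35.01, 135.0, zones)
outside = assign_machine_code("wipers on", "wipers off", 36.0, 135.0, zones)
print(inside, outside)  # 17 16
```

The claims do not say how a location is represented or compared; circular zones with a haversine distance are only one plausible choice.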

4. A vehicle component control device comprising:
- an audio input device that receives an audio signal representing a voice utterance;
  
  a video input device that receives a video signal representative of movement of a user;
  
  a controller configured according to instructions stored in a memory, the controller configured to;
  
  generate first dictation based on the audio signal;
  
  generate a feature signal parameter sequence based on the video signal;
  
  generate configured dictation based on the first dictation and the feature signal parameter sequence;
  
  determine a location associated with the vehicle component control device based on geographical data associated with the vehicle component control device; and
  
  assign machine code for controlling a vehicle component based on the first dictation or the configured dictation based upon the location associated with the vehicle component control device.
- Dependent claims (5, 6, 7, 8, 9, 10):
  - 5. The vehicle component control device of claim 4, wherein the video input device is disabled when a signal to brightness ratio is below a predetermined threshold.
  - 6. The vehicle component control device of claim 4, wherein the controller is further configured to combine the audio signal and the video signal into a Moving Picture Experts Group (MPEG) stream according to an MPEG format, and use synchronization data of the MPEG stream to determine a portion of the video signal that corresponds to a portion of the audio signal when generating the configured dictation.
  - 7. The vehicle component control device of claim 4, wherein:
    - the controller is configured to refer to a first set of conversion criteria when generating the first dictation; and
      
      the first set of conversion criteria includes pre-registered data representing a value associated with a user voice.
  - 8. The vehicle component control device of claim 4, wherein the configured dictation is generated based on a predetermined criteria which includes pre-registered data representing a user voice.
  - 9. The vehicle component control device of claim 4, wherein a predetermined setting is assigned by the user.
  - 10. The vehicle components control device of claim 4, wherein the video input device generates the feature signal parameter sequence based upon the video stream by:
    - extracting a sequence of image frames from a predetermined portion of the video stream;
      
      generating a local binary pattern (LBP) from a series of images;
      
      matching the LBP to a feature signal vector stored in the memory;
      
      determining a probability for each of a plurality of candidate prototype words generating the feature signal vector;
      
      selecting a candidate prototype word of the plurality of candidate prototype words of highest probability to be the feature signal parameter sequence.
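The steps listed in claim 10 can be sketched roughly as follows. This is a simplified illustration, not the patented method: it assumes grayscale frames given as nested lists, uses the basic 8-neighbour LBP histogram averaged over the frames as the feature vector, and turns distances to stored prototype vectors into probabilities with a softmax. The prototype words and tiny test frames are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = List[List[int]]  # grayscale pixel values, row-major

# Clockwise neighbour offsets, starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(frame: Frame) -> List[float]:
    """Normalized histogram of 8-bit LBP codes over the interior pixels."""
    hist = [0] * 256
    for r in range(1, len(frame) - 1):
        for c in range(1, len(frame[0]) - 1):
            center = frame[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(_OFFSETS):
                if frame[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def sequence_feature(frames: Sequence[Frame]) -> List[float]:
    """Average the per-frame LBP histograms into one feature vector."""
    if not frames:
        raise ValueError("at least one frame is required")
    hists = [lbp_histogram(f) for f in frames]
    return [sum(col) / len(hists) for col in zip(*hists)]


def word_probabilities(
    feature: List[float], prototypes: Dict[str, List[float]]
) -> Dict[str, float]:
    """Softmax over negative Euclidean distances (an assumed scoring rule)."""
    scores = {w: -math.dist(feature, p) for w, p in prototypes.items()}
    peak = max(scores.values())
    exps = {w: math.exp(s - peak) for w, s in scores.items()}
    norm = sum(exps.values())
    return {w: e / norm for w, e in exps.items()}


def best_word(frames: Sequence[Frame], prototypes: Dict[str, List[float]]) -> str:
    probs = word_probabilities(sequence_feature(frames), prototypes)
    return max(probs, key=probs.get)


# Hypothetical 4x4 frames standing in for cropped mouth images.
flat = [[5] * 4 for _ in range(4)]
bump = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
prototypes = {"on": sequence_feature([flat]), "off": sequence_feature([bump])}

print(best_word([flat, flat], prototypes))  # on
```

A production system would more likely use a trained sequence model over the frames; the averaging and softmax here only make the "match, score, select highest" flow concrete.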

11. A vehicle component control device comprising:
- an audio input device configured to receive a voice utterance including a plurality of words;
  
  a video input device configured to receive video of lip motion of a user;
  
  a controller configured to generate first data packets including an audio stream representative of the voice utterance and a video stream representative of the lip motion; and
  
  a transceiver for sending the first data packets to a remote apparatus and receiving second data packets including machine code for controlling a vehicle component, the machine code assigned from configured dictation generated based upon the audio stream and the video stream from the remote apparatus,
  
  wherein in the configured dictation, at least one word in first dictation generated based upon the audio stream which has a predetermined characteristic has been corrected by a feature signal parameter sequence based upon the video stream,
  
  wherein the vehicle component control device configured to receive global position data (GPS) data to determine a location of the vehicle components control device,
  
  wherein the machine code is assigned from the first dictation or the configured dictation based upon the location associated with the vehicle component control device.
- Dependent claims (12, 13, 14, 15, 16):
  - 12. The vehicle component control device of claim 11, further comprising a memory portion for storing pre-registered data representing a user voice.
  - 13. The vehicle component control device of claim 11, further comprising a memory portion storing an instruction for performing an audio based speech recognition algorithm to convert the audio stream into the first dictation.
  - 14. The vehicle component control device of claim 11, wherein the vehicle component controlled by the vehicle component control device includes one of an air conditioner, a radio, a vehicle navigation system and/or a windshield wiper.
  - 15. The vehicle component control device of claim 11, wherein the location is determined by GPS data included in the first data packets.
  - 16. The vehicle component control device of claim 11, wherein the predetermined characteristic further includes the at least one word has four or less syllables and is less than a predetermined length or time duration.
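The "predetermined characteristic" of claim 16 could be checked along these lines. The sketch assumes each recognized word carries a measured duration, estimates syllables with a crude vowel-group heuristic for English text, and uses an arbitrary 0.5 second limit. The `TimedWord` type, the heuristic, and the thresholds are assumptions for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """A recognized word with its spoken duration (hypothetical structure)."""
    text: str
    duration_s: float


def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (heuristic, English only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def has_predetermined_characteristic(
    word: TimedWord, max_syllables: int = 4, max_duration_s: float = 0.5
) -> bool:
    """True when the word is short enough to be a candidate for lip-based correction."""
    return (
        estimate_syllables(word.text) <= max_syllables
        and word.duration_s < max_duration_s
    )


print(has_predetermined_characteristic(TimedWord("on", 0.2)))          # True
print(has_predetermined_characteristic(TimedWord("navigation", 0.9)))  # False
```

Short words are the ones an audio recognizer tends to confuse most easily, which may be why the claim singles them out for correction.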

Current Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Inventors: Takayanagi, Yuichiro; Kusaka, Masashi
Primary Examiner(s): Le, Thuykhanh
Application Number: US15/976,834
Publication Number: US 20180261222A1
Time in Patent Office: 278 Days
Field of Search: None
CPC Class Codes:
G10L 15/25   using position of the lips,...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
