Method and apparatus for recognizing speech by lip reading

US 10,424,301 B2
Filed: 09/10/2018
Issued: 09/24/2019
Est. Priority Date: 11/26/2014
Status: Active Grant

Abstract

A dictation device includes: an audio input device configured to receive a voice utterance including a plurality of words; a video input device configured to receive video of lip motion during the voice utterance; a memory portion; a controller configured according to instructions in the memory portion to generate first data packets including an audio stream representative of the voice utterance and a video stream representative of the lip motion; and a transceiver for sending the first data packets to a server end device and receiving second data packets including combined dictation based upon the audio stream and the video stream from the server end device. In the combined dictation, first dictation generated based upon the audio stream has been corrected by second dictation generated based upon the video stream.
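
The correction step described in the abstract can be illustrated with a minimal sketch. It assumes the server end device produces word-aligned audio and video dictations with per-word confidence scores; the `Word` type, the `combine_dictation` function, and the `min_audio_conf` threshold are hypothetical names introduced here for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical recognizer confidence in [0, 1]


def combine_dictation(audio_words: List[Word],
                      video_words: List[Word],
                      min_audio_conf: float = 0.6) -> str:
    """Correct low-confidence audio words with the aligned lip-reading word.

    Assumes both sequences are already aligned word by word; real systems
    would need an explicit alignment step.
    """
    combined = []
    for i, audio in enumerate(audio_words):
        video = video_words[i] if i < len(video_words) else None
        use_video = (
            video is not None
            and audio.confidence < min_audio_conf
            and video.confidence > audio.confidence
        )
        combined.append(video.text if use_video else audio.text)
    return " ".join(combined)
```

Here the audio result is kept unless it is both below the threshold and less confident than the video result, so the video stream only corrects the audio stream rather than replacing it.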

32 Citations

19 Claims

1. A dictation device comprising:
- an audio input device that receives an audio signal representing the voice utterance;
  
  a video input device that receives a video signal representative of movement of a user; and
  
  a controller configured according to instructions stored in a memory to:
  
  generate first dictation based on the audio signal;
  
  generate a feature signal parameter sequence based on the video signal;
  
  generate configured dictation based on the first dictation and the feature signal parameter sequence;
  
  determine a location associated with the dictation device; and
  
  assign machine code for controlling a home appliance based on the first dictation or the configured dictation based upon the location associated with the dictation device.
- - 2. The dictation device of claim 1, wherein the first dictation or the configured dictation is prioritized based upon a predetermined setting set by the user.
  - 3. The dictation device of claim 1, wherein the controller is further configured to not include the video signal in first data packets when a signal to brightness ratio associated with the video signal is below a predetermined threshold.
  - 4. The dictation device of claim 1, wherein the controller is further configured to combine the audio signal and the video signal into a Moving Picture Experts Group (MPEG) stream according to an MPEG format, and use synchronization data of the MPEG stream to determine a portion of the video signal that corresponds to a portion of the audio signal when generating the configured dictation.
  - 5. The dictation device of claim 1, wherein:
    - the controller is configured to refer to a first set of conversion criteria when generating the first dictation; and
      
      the first set of conversion criteria includes pre-registered data representing a value associated with a user voice.
  - 6. The dictation device of claim 1, wherein the configured dictation is generated based on a predetermined criterion which includes pre-registered data representing a user voice.
  - 7. The dictation device of claim 1, wherein the video input device generates the feature signal parameter sequence based upon the video signal by:
    - extracting a sequence of image frames from a predetermined portion of the video signal;
      
      generating a local binary pattern (LBP) from a series of images;
      
      matching the LBP to a feature signal vector stored in the memory;
      
      determining a probability for each of a plurality of candidate prototype words generating the feature signal vector; and
      
      selecting a candidate prototype word of the plurality of candidate prototype words of highest probability to be the configured dictation.
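
The steps recited in claim 7 can be sketched roughly as follows. This is a simplified illustration, not the patented method: it assumes grayscale frames given as lists of lists, averages per-frame 8-neighbour LBP histograms into one feature vector, and turns Euclidean distances to stored prototype vectors into probabilities with a softmax. All function names and the `prototypes` mapping are hypothetical.

```python
import math
from typing import Dict, List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities

# Clockwise 8-neighbour offsets starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Frame, r: int, c: int) -> int:
    """8-bit local binary pattern of the interior pixel (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Frame) -> List[float]:
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def sequence_feature(frames: List[Frame]) -> List[float]:
    """Average the per-frame histograms (an assumed way to pool frames)."""
    hists = [lbp_histogram(f) for f in frames]
    return [sum(h[i] for h in hists) / len(hists) for i in range(256)]


def word_probabilities(feature: List[float],
                       prototypes: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative distances to each candidate prototype word."""
    scores = {w: -math.dist(feature, p) for w, p in prototypes.items()}
    top = max(scores.values())
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    norm = sum(exps.values())
    return {w: e / norm for w, e in exps.items()}


def select_word(frames: List[Frame], prototypes: Dict[str, List[float]]) -> str:
    """Pick the candidate prototype word with the highest probability."""
    probs = word_probabilities(sequence_feature(frames), prototypes)
    return max(probs, key=probs.get)
```

The distance-plus-softmax scoring stands in for whatever probability model an actual implementation uses; the claim itself only requires that a probability be determined for each candidate prototype word.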

8. A dictation device comprising:
- an audio input device that receives an audio signal representing a voice utterance;
  
  a video input device that receives a video signal representative of movement of a user; and
  
  a controller configured according to instructions stored in a memory to:
  
  generate first dictation based on the audio signal;
  
  generate a feature signal parameter sequence based on the video signal;
  
  generate configured dictation based on the first dictation and the feature signal parameter sequence;
  
  determine a location associated with the dictation device based on positional data obtained by the dictation device; and
  
  assign machine code for controlling a vehicle component based on the first dictation or the configured dictation based upon the location associated with the dictation device.
- - 9. The dictation device of claim 8, wherein the controller is further configured to not include the video signal in first data packets when a signal to brightness ratio associated with the video signal is below a predetermined threshold.
  - 10. The dictation device of claim 8, wherein the controller is further configured to stop a performance of the video input device when a signal to brightness ratio associated with the video signal is below a predetermined threshold.
  - 11. The dictation device of claim 8, wherein the positional data includes global position data (GPS).
  - 12. The dictation device of claim 8, further comprising pre-registered data representing a user voice stored in the memory.
  - 13. The dictation device of claim 8, further comprising a memory portion storing an instruction for performing an audio based speech recognition algorithm to convert the audio signal into a first dictation data.
  - 14. The dictation device of claim 8, wherein the vehicle component controlled by the dictation device includes one of an air conditioner, a radio, a vehicle navigation system and a windshield wiper.
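
The location-dependent assignment of machine code in the independent claims might look like the sketch below. The command tables, the numeric codes, and the `location_context` labels are all invented for illustration; the claims do not say how positional data such as GPS is mapped to a context, so the context is simply passed in.

```python
from typing import Dict, Optional

# Hypothetical per-location command tables; the codes are placeholders.
COMMAND_TABLES: Dict[str, Dict[str, int]] = {
    "vehicle": {"wipers on": 0x21, "radio on": 0x31},
    "home": {"lights on": 0x11, "radio on": 0x12},
}


def assign_machine_code(first_dictation: str,
                        configured_dictation: str,
                        location_context: str,
                        prefer_configured: bool = True) -> Optional[int]:
    """Look up a machine code for the dictation in the table for this location.

    `prefer_configured` models a user setting that prioritizes one dictation
    over the other; the lower-priority dictation is used as a fallback.
    """
    table = COMMAND_TABLES.get(location_context, {})
    if prefer_configured:
        candidates = [configured_dictation, first_dictation]
    else:
        candidates = [first_dictation, configured_dictation]
    for text in candidates:
        code = table.get(text.strip().lower())
        if code is not None:
            return code
    return None
```

With these tables, the same phrase "radio on" resolves to a different code in the vehicle than at home, which is the point of making the assignment depend on location.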

15. A dictation device comprising:
- an audio input device that receives an audio signal representing a voice utterance;
  
  a video input device that receives a video signal representative of movement of a user; and
  
  a controller configured according to instructions stored in a memory to:
  
  generate first dictation based on the audio signal;
  
  generate a feature signal parameter sequence based on the video signal;
  
  generate configured dictation based on the first dictation and the feature signal parameter sequence;
  
  determine a location associated with the dictation device based on geographical data obtained by the dictation device; and
  
  assign machine code for controlling an external device based on the first dictation or the configured dictation based upon the location associated with the dictation device determined based on the geographical data obtained by the dictation device.
- - 16. The dictation device of claim 15, wherein the controller is further configured to not include the video signal in first data packets when a signal to brightness ratio associated with the video signal is below a predetermined threshold.
  - 17. The dictation device of claim 15, wherein the controller is further configured to stop a performance of the video input device when a signal to brightness ratio associated with the video signal is below a predetermined threshold.
  - 18. The dictation device of claim 15, wherein the external device controlled by the dictation device includes one of an air conditioner, a radio, a vehicle navigation system and a windshield wiper.
  - 19. The dictation device of claim 15, wherein the external device controlled by the dictation device is a home appliance.
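
The threshold behaviour in claims 3, 9, and 16 (leaving the video signal out of the first data packets) can be sketched as a simple gate. The packet layout and the function name are assumptions, and because this record does not define how the signal to brightness ratio is computed, the ratio is taken as an already-measured input.

```python
from typing import Dict


def build_first_data_packet(audio_chunk: bytes,
                            video_chunk: bytes,
                            signal_to_brightness_ratio: float,
                            threshold: float) -> Dict[str, bytes]:
    """Always include audio; include video only when the ratio meets the threshold."""
    packet = {"audio": audio_chunk}
    if signal_to_brightness_ratio >= threshold:
        packet["video"] = video_chunk
    return packet
```

A receiver of such a packet would fall back to audio-only dictation whenever the `"video"` key is absent.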

Current Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Inventors: Takayanagi, Yuichiro; Kusaka, Masashi
Primary Examiner(s): Le, Thuykhanh
Application Number: US16/126,410
Publication Number: US 20190027148A1
Time in Patent Office: 379 Days
Field of Search: None
CPC Class Codes:
G10L 15/25   using position of the lips,...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
