METHOD AND APPARATUS FOR USING IMAGE DATA TO AID VOICE RECOGNITION

US 20140350924A1
Filed: 01/27/2014
Published: 11/27/2014
Est. Priority Date: 05/24/2013
Status: Active Grant

First Claim

1. A method performed by a device for using image data to aid in voice recognition, the method comprising:

capturing image data of a vicinity of the device; and

adjusting, based on the image data, a set of parameters for voice recognition performed by the device.

Abstract

A device performs a method for using image data to aid voice recognition. The method includes the device capturing image data of a vicinity of the device and adjusting, based on the image data, a set of parameters for voice recognition performed by the device. The set of parameters includes, but is not limited to: a trigger threshold of a trigger for voice recognition; a set of beamforming parameters; a database for voice recognition; and/or an algorithm for voice recognition, wherein the algorithm can include using noise suppression or using acoustic beamforming.
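
As a rough illustration of the parameter set listed in the abstract, the Python sketch below models the four kinds of parameters as fields of one record and adjusts them from a description of the scene. The names `RecognitionParams`, `Scene`, and `adjust_params`, the environment labels, and every numeric value are hypothetical; the patent does not specify a data layout or a policy.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RecognitionParams:
    """Hypothetical container for the four parameter kinds named in the abstract."""
    trigger_threshold: float = 0.80             # confidence needed to fire the trigger
    beam_direction_deg: Optional[float] = None  # None means no beam is steered
    database: str = "general"                   # which recognition database to use
    algorithm: str = "plain"                    # "plain", "noise_suppression", "beamforming"


@dataclass(frozen=True)
class Scene:
    """Hypothetical summary of what image analysis found near the device."""
    environment: str                            # e.g. "vehicle", "office", "unknown"
    speaker_azimuth_deg: Optional[float] = None


def adjust_params(params: RecognitionParams, scene: Scene) -> RecognitionParams:
    """Return a new parameter set adjusted for the observed scene."""
    if scene.environment == "vehicle":
        # Assumed policy: a car cabin is noisy, so switch database and suppress noise.
        params = replace(params, database="vehicle", algorithm="noise_suppression")
    if scene.speaker_azimuth_deg is not None:
        # Assumed policy: a located speaker takes priority, so steer a beam at them.
        params = replace(
            params,
            algorithm="beamforming",
            beam_direction_deg=scene.speaker_azimuth_deg,
        )
    return params
```

Returning a new record rather than mutating the old one keeps each adjustment easy to compare against the previous state, which is one possible design choice among many.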

246 Citations

23 Claims

1. A method performed by a device for using image data to aid in voice recognition, the method comprising:
- capturing image data of a vicinity of the device; and
  
  adjusting, based on the image data, a set of parameters for voice recognition performed by the device.
- - 2. The method of claim 1, wherein the set of parameters comprises at least one of:
    - a database for voice recognition; or
      
      an algorithm for voice recognition, wherein the algorithm includes at least one of:
      
      using noise suppression; or
      
      using acoustic beamforming.
  - 3. The method of claim 1, wherein the set of parameters comprises a set of beamforming parameters, and wherein adjusting the set of parameters comprises setting, based on a position of an individual captured in the image data, at least one of:
    - a direction of a microphone beamform; or
      
      a width of a microphone beamform.
  - 4. The method of claim 1 further comprising detecting lip movement from the captured image data, wherein adjusting the set of parameters based on the image data comprises at least one of:
    - adjusting the set of parameters based on the lip movement; or
      
      adjusting the set of parameters based on an individual identified as a speaker from the detected lip movement.
  - 5. The method of claim 4 wherein adjusting the set of parameters based on the lip movement comprises at least one of:
    - adjusting a microphone beamform based on the lip movement; or
      
      adjusting voice activity detection based on the lip movement.
  - 6. The method of claim 1 further comprising determining, from the image data, that the device is within a particular type of environment, wherein the set of parameters is adjusted based on the type of environment.
  - 7. The method of claim 6, wherein the set of parameters comprises a trigger threshold of a trigger for voice recognition and the type of environment is an interior of a motor vehicle, wherein the trigger threshold is adjusted to make the trigger less discriminating when the device is within the interior of a motor vehicle relative to when the device is within another type of environment.
  - 8. The method of claim 7 further comprising detecting, from the image data, a number of persons within the interior of the motor vehicle, wherein the trigger is made less discriminating upon detecting that there is only one person within the motor vehicle relative to detecting that there are multiple people within the motor vehicle.
  - 9. The method of claim 1 further comprising detecting, from the image data, a set of individuals in the vicinity of the device, wherein the set of parameters for voice recognition is adjusted based on the set of individuals.
  - 10. The method of claim 9, wherein the set of parameters comprises a trigger threshold of a trigger for voice recognition, wherein the trigger threshold is adjusted to make the trigger less discriminating when the detected set of individuals contains only a single person relative to when the set of individuals contains multiple persons.
  - 11. The method of claim 9, wherein the set of parameters comprises a trigger threshold of a trigger for voice recognition, further comprising identifying at least one authorized person within the set of individuals to trigger the voice recognition, wherein the trigger threshold is adjusted to make the trigger less discriminating when all persons of the set of individuals are identified as authorized persons relative to when fewer than all of the persons of the set of individuals are identified as authorized persons.
  - 12. The method of claim 9, wherein the set of parameters comprises a trigger threshold of a trigger for voice recognition, further comprising detecting that a person within the set of individuals is gazing at the device, wherein the trigger threshold is adjusted to make the trigger less discriminating when the person is gazing at the device relative to when no one within the set of individuals is detected gazing at the device.
  - 13. The method of claim 12 further comprising determining whether the person gazing at the device is an authorized person to trigger the voice recognition, wherein the trigger is made less discriminating only when the person gazing at the device is an authorized person.
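
The trigger-threshold adjustments in claims 7 through 13 can be sketched as a single function. This is only an illustration under stated assumptions: the trigger is assumed to fire when a recognizer confidence score meets the threshold, so "less discriminating" is modeled as a lower threshold. The `Person` record, the function name, and the `base`, `step`, and `floor` values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    """Hypothetical per-person result of image analysis."""
    authorized: bool = False
    gazing_at_device: bool = False


def trigger_threshold(
    persons: List[Person],
    in_vehicle: bool,
    base: float = 0.80,
    step: float = 0.05,
    floor: float = 0.50,
    require_authorized_gaze: bool = False,
) -> float:
    """Lower the threshold once for each condition the claims describe."""
    threshold = base
    if in_vehicle:
        threshold -= step                  # claim 7: interior of a motor vehicle
    if len(persons) == 1:
        threshold -= step                  # claims 8 and 10: only one person present
    if persons and all(p.authorized for p in persons):
        threshold -= step                  # claim 11: everyone present is authorized
    gazers = [p for p in persons if p.gazing_at_device]
    if require_authorized_gaze:
        # claim 13 variant: only an authorized gazer relaxes the trigger
        gazers = [p for p in gazers if p.authorized]
    if gazers:
        threshold -= step                  # claim 12: someone is gazing at the device
    return max(floor, round(threshold, 2))
```

For example, a lone driver who is neither authorized nor looking at the device would get two reductions (vehicle and single person) from the assumed base value.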

14. A method performed by a device for using image data to aid in voice recognition, the method comprising:
- capturing image data;
  
  receiving first voice data spoken into the device from a first individual and second voice data spoken into the device from a second individual;
  
  associating the first voice data to the first individual and the second voice data to the second individual using the image data;
  
  translating, using a voice recognition process, the first voice data into a first written passage within a document and the second voice data into a second written passage within the document;
  
  associating the first written passage with the first individual using a first annotation within the document that identifies the first individual; and
  
  associating the second written passage with the second individual using a second annotation within the document that identifies the second individual.
- - 16. The method of claim 14, wherein the first and second individuals are included within a first group, which has a first set of privileges, the method further comprises:
    - receiving third voice data spoken into the device from a third individual;
      
      associating the third voice data to the third individual using the image data; and
      
      determining that the third individual is included within a second group, which has a second set of privileges.
  - 17. The method of claim 16, wherein the first set of privileges includes a first level of access to the document, and the second set of privileges includes a second level of access to the document, which prevents translation of the third voice data into a corresponding written passage for inclusion within the document.
  - 18. The method of claim 14 further comprising determining from the image data an end of the first voice data for translation into the first written passage and an end of the second voice data for translation into the second written passage.

15. The method of claim 14, wherein the first annotation comprises a first name, and the second annotation comprises a second name.
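
One way to picture claims 14 through 18 is a small transcript builder. The sketch below assumes, purely for illustration, that image analysis yields a per-frame lip-movement flag for each visible person and that each stretch of voice data is aligned to video frame indices; the speaker is taken to be whoever moved their lips in the most frames. `Segment`, `attribute_speaker`, `build_document`, and the `transcribe` callable are hypothetical names, and the patent does not prescribe this association method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Segment:
    """One stretch of voice data, aligned to video frame indices (hypothetical)."""
    start: int   # first video frame covered by the voice data
    end: int     # one past the last frame; claim 18 would derive this from image data
    audio: str   # stand-in for the raw audio samples


def attribute_speaker(segment: Segment, lip_activity: Dict[str, List[bool]]) -> str:
    """Return the person whose lips moved in the most frames of the segment.

    Assumes lip_activity is non-empty; ties go to the first name in the dict.
    """
    counts = {
        name: sum(frames[segment.start:segment.end])
        for name, frames in lip_activity.items()
    }
    return max(counts, key=counts.get)


def build_document(
    segments: List[Segment],
    lip_activity: Dict[str, List[bool]],
    contributors: Set[str],
    transcribe: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (name annotation, written passage) pairs in spoken order."""
    document: List[Tuple[str, str]] = []
    for segment in segments:
        speaker = attribute_speaker(segment, lip_activity)
        if speaker not in contributors:
            # Claim 17: this group's access level prevents translation entirely.
            continue
        document.append((speaker, transcribe(segment.audio)))
    return document
```

Passing the recognizer in as `transcribe` keeps the sketch independent of any particular speech-to-text engine, and it makes visible that voice data from a person outside the contributing group is never translated.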

19. A device configured for using image data to aid in voice recognition, the device comprising:
- a set of cameras configured for capturing image data;
  
  at least one acoustic transducer configured for receiving voice data;
  
  a voice recognition module for processing the received voice data; and
  
  a processor configured for:
  
  detecting a set of individuals within the image data;
  
  determining from the image data whether at least one person within the set of individuals is gazing at the device; and
  
  adapting processing by the voice recognition module of the voice data based on whether the at least one individual is gazing at the device.
- - 20. The device of claim 19, wherein the processor is further configured for determining whether the at least one individual gazing at the device includes an authorized user of the device, wherein the processor is configured for adapting processing by the voice recognition module by activating the voice recognition module to process voice data received into the at least one acoustic transducer only if the at least one user is determined to include an authorized user.
  - 21. The device of claim 19, wherein, when at least one person within the set of individuals is determined to be gazing at the device, the processor is configured for adapting processing by the voice recognition module by favoring voice data received from the at least one person gazing at the device relative to voice data received from other persons.
  - 22. The device of claim 21, wherein the processor being configured for favoring voice data received from the at least one person gazing at the device comprises the processor being configured for at least one of:
    - determining a direction to the at least one person and using beam forming to favor voice data received from that direction over voice data received from other directions; or
      
      determining a distance to the at least one person and using gradient discrimination to favor voice data received from that distance over voice data received from other distances.
  - 23. The device of claim 19, wherein the processor is further configured for determining that a first person of the set of individuals is a member of a first group with a first set of access privileges to the device and determining that a second person of the set of individuals is a member of a second group with a second set of access privileges to the device that is different from the first set of access privileges, wherein the processor being configured for adapting processing by the voice recognition module comprises the processor being configured for accepting a voice command from the first person but not the second person.
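
Claims 3 and 22 both depend on turning a person's position in the image into a direction for the microphone beam. The sketch below shows one possible conversion, assuming an ideal pinhole camera whose optical axis coincides with the microphone array's forward direction. The `Face` record, the function names, the `margin` factor, and the rule of preferring the largest gazing face are all assumptions made for illustration, not details from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Face:
    """Hypothetical face detection result, in pixel units."""
    center_x: float   # horizontal position of the face centre
    width_px: float   # width of the face bounding box
    gazing: bool      # output of an assumed gaze detector


def focal_length_px(image_width: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels from the horizontal field of view."""
    return (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)


def azimuth_deg(face: Face, image_width: int, hfov_deg: float) -> float:
    """Angle of the face off the optical axis; positive is right of centre."""
    f = focal_length_px(image_width, hfov_deg)
    return math.degrees(math.atan2(face.center_x - image_width / 2, f))


def beam_width_deg(face: Face, image_width: int, hfov_deg: float,
                   margin: float = 2.0) -> float:
    """Approximate angle subtended by the face, widened by a safety margin."""
    f = focal_length_px(image_width, hfov_deg)
    return margin * 2 * math.degrees(math.atan2(face.width_px / 2, f))


def steer_to_gazer(faces: List[Face], image_width: int,
                   hfov_deg: float) -> Optional[Tuple[float, float]]:
    """Return (direction, width) for the beam, or None if nobody is gazing."""
    gazers = [face for face in faces if face.gazing]
    if not gazers:
        return None
    # Assumed tie-break: the largest face is probably the nearest person.
    target = max(gazers, key=lambda face: face.width_px)
    return (azimuth_deg(target, image_width, hfov_deg),
            beam_width_deg(target, image_width, hfov_deg))
```

The width estimate is exact only for a face near the image centre; a real device would also need calibration between the camera and the microphone array.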

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola Mobility LLC (Lenovo Group Ltd.)
Inventors: Zurek, Robert A.; Schuster, Adrian M.; Wu, Jincheng; Shau, Fu-Lin

Granted Patent: US 9,747,900 B2
Field of Search
US Class Current

704/231
CPC Class Codes

B60N 2/002   Seats provided with an occu...

G06F 3/013   Eye tracking input arrangem...

G06V 20/59   inside of a vehicle, e.g. r...

G06V 40/166   using acquisition arrangements

G06V 40/18   Eye characteristics, e.g. o...

G06V 40/19   Sensors therefor

G06V 40/20   Movements or behaviour, e.g...

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 15/25   using position of the lips,...

G10L 15/26   Speech to text systems G10L...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/0208   Noise filtering

G10L 25/78   Detection of presence or ab...

H04R 2430/20   Processing of the output si...

H04R 2460/07   Use of position data from w...

H04R 2499/11   Transducers incorporated or...
