Multisensory speech detection

US 10,714,120 B2
Filed: 06/25/2018
Issued: 07/14/2020
Est. Priority Date: 11/10/2008
Status: Active Grant

Abstract

A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
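The chain described in the abstract (orientation, then operating mode, then speech detection parameters) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the mode names, the 45-degree cutoff, the parameter fields, and their values do not come from the patent text above, and the orientation is assumed to be derived from a single accelerometer gravity reading.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    # Hypothetical mode names; the abstract only says "operating mode".
    HANDSET = "handset"    # device held upright, e.g. against the head
    HANDHELD = "handheld"  # device held roughly flat in front of the user


@dataclass(frozen=True)
class SpeechDetectionParams:
    start_on_pose: bool     # begin detection when the pose is entered
    end_on_pose_exit: bool  # end detection when the pose is left
    end_silence_s: float    # trailing silence that ends detection


# Illustrative values only; not taken from the patent.
PARAMS = {
    Mode.HANDSET: SpeechDetectionParams(True, True, 0.5),
    Mode.HANDHELD: SpeechDetectionParams(False, False, 1.5),
}


def pitch_degrees(ax, ay, az):
    """Angle between the device's long (y) axis and the horizontal plane."""
    return math.degrees(math.atan2(ay, math.hypot(ax, az)))


def operating_mode(ax, ay, az, upright_deg=45.0):
    """Map a gravity reading (m/s^2) to an operating mode."""
    if abs(pitch_degrees(ax, ay, az)) >= upright_deg:
        return Mode.HANDSET
    return Mode.HANDHELD


def speech_detection_params(ax, ay, az):
    """Look up when speech detection should begin or end for this orientation."""
    return PARAMS[operating_mode(ax, ay, az)]
```

A lookup table keeps the mode-to-parameter mapping in one place, so supporting another mode would only require a new enum member and a new table entry.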

92 Citations

28 Claims

1. A method comprising:
- detecting, by data processing hardware of a mobile device, movement of the mobile device from a first pose to a second pose, the second pose corresponding to the mobile device in a talking pose proximate to a part of a user of the mobile device;
  
  in response to detecting the movement of the mobile device from the first pose to the second pose:
  
  initiating, by the data processing hardware, execution of an audio recording process using a microphone of the mobile device; and
  
  notifying, by the data processing hardware, the user of the mobile device when execution of the audio recording process starts by:
  
  generating a visual notification that indicates to the user when execution of the audio recording process starts; and
  
  displaying the visual notification on a user interface of the mobile device, wherein the visual notification comprises a microphone graphic;
  
  receiving, at the data processing hardware, a speech utterance of the user captured by the microphone during execution of the audio recording process; and
  
  generating, by the data processing hardware, a transcription of the speech utterance captured by the microphone during the audio recording process.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14):
  - 2. The method of claim 1, wherein receiving the speech utterance of the user comprises:
    - receiving audio input data captured by the microphone during execution of the audio recording process;
      
      determining whether the audio input data captured by the microphone exceeds a speech energy threshold; and
      
      when the audio input data captured by the microphone exceeds the speech energy threshold, detecting that the audio input data includes the speech utterance of the user.
  - 3. The method of claim 1, further comprising, after initiating execution of the audio recording process:
    - detecting, by the data processing hardware, movement of the mobile device away from the second pose; and
      
      in response to detecting the movement of the mobile device away from the second pose, ceasing, by the data processing hardware, execution of the audio recording process.
  - 4. The method of claim 3, wherein detecting movement of the mobile device away from the second pose comprises detecting that the mobile device is beyond a predetermined distance from the part of the user.
  - 5. The method of claim 4, wherein detecting that the mobile device is beyond the predetermined distance comprises determining that a distance between a proximity sensor of the mobile device and the part of the user exceeds the predetermined distance.
  - 6. The method of claim 1, further comprising, in response to initiating execution of the audio recording process:
    - determining, by the data processing hardware, a speech energy threshold for comparing to the speech utterance of the user received during execution of the audio recording process; and
      
      ceasing, by the data processing hardware, execution of the audio recording process when an energy of the speech utterance of the user received during the audio recording process is less than the speech energy threshold.
  - 7. The method of claim 1, wherein generating the transcription of the received audio data comprises:
    - determining when execution of the audio recording process has ceased; and
      
      generating the transcription of the received audio data when execution of the audio recording process has ceased.
  - 8. The method of claim 1, wherein detecting movement of the mobile device from the first pose to the second pose comprises detecting a proximity of the mobile device is less than a predetermined distance away from the part of the user.
  - 9. The method of claim 1, wherein detecting movement of the mobile device from the first pose to the second pose comprises:
    - detecting an instantaneous acceleration of the mobile device has exceeded an instantaneous acceleration threshold; and
      
      detecting a proximity of the mobile device is less than a predetermined distance away from the part of the user.
  - 10. The method of claim 9, further comprising, after detecting the instantaneous acceleration for the mobile device has exceeded the instantaneous acceleration threshold, activating, by the data processing hardware, a proximity sensor of the mobile device, the proximity sensor configured to detect a distance between the mobile device and an object.
  - 11. The method of claim 1, further comprising, in response to receiving the speech utterance, displaying a representation of the speech utterance of the user being recorded while the user is speaking during execution of the audio recording process.
  - 12. The method of claim 11, wherein displaying the representation of the speech utterance of the user being recorded comprises displaying a waveform graphic.
  - 13. The method of claim 1, further comprising, in response to initiating execution of the audio recording process:
    - generating, by the data processing hardware, an audio notification that indicates to the user that the audio recording process is executing; and
      
      outputting, by the data processing hardware, the audio notification through an audio output device of the mobile device.
  - 14. The method of claim 1, further comprising, in response to receiving the speech utterance of the user captured by the microphone during execution of the audio recording process:
    - generating, by the data processing hardware, a visual notification that indicates detection of the speech utterance of the user; and
      
      displaying, by the data processing hardware, the visual notification on a user interface of the mobile device.
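The pose-driven recording lifecycle in claims 1, 3, 9, and 10 can be pictured as a small state machine. The sketch below is one possible reading, not the patented implementation: the class name, the event strings, the threshold values, and the choice to switch the proximity sensor off again after recording stops are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PoseRecorder:
    """Hypothetical controller that starts and stops recording from sensor events."""
    accel_threshold: float = 12.0  # m/s^2, illustrative value
    near_distance: float = 5.0     # cm, illustrative "predetermined distance"
    proximity_active: bool = False
    recording: bool = False
    events: list = field(default_factory=list)

    def on_acceleration(self, magnitude):
        # Claim 10: the proximity sensor is activated only after the
        # instantaneous acceleration exceeds its threshold.
        if magnitude > self.accel_threshold:
            self.proximity_active = True

    def on_proximity(self, distance):
        if not self.proximity_active:
            return  # readings are ignored until the sensor has been activated
        if not self.recording and distance < self.near_distance:
            # Claims 1 and 9: device is now in the talking pose.
            self.recording = True
            self.events.append("start_recording")
            self.events.append("show_microphone_graphic")
        elif self.recording and distance > self.near_distance:
            # Claims 3 to 5: device moved beyond the predetermined distance.
            self.recording = False
            self.proximity_active = False  # assumption: sensor is switched off again
            self.events.append("stop_recording")
```

A short usage example: calling `on_acceleration(15.0)` followed by `on_proximity(2.0)` starts recording and queues the microphone graphic, and a later `on_proximity(20.0)` stops it. Gating the proximity sensor behind the acceleration check reflects the ordering in claim 10; transcription (the final step of claim 1) is left out of the sketch.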

15. A mobile device comprising:
- data processing hardware; and
  
  memory hardware in communication with the data processing hardware and storing instructions that when executed, cause the data processing hardware to perform operations comprising:
  
  detecting movement of the mobile device from a first pose to a second pose, the second pose corresponding to the mobile device in a talking pose proximate to a part of a user of the mobile device;
  
  in response to detecting the movement of the mobile device from the first pose to the second pose:
  
  initiating execution of an audio recording process using a microphone of the mobile device;
  
  notifying the user of the mobile device when execution of the audio recording process starts by:
  
  generating a visual notification that indicates to the user when execution of the audio recording process starts; and
  
  displaying the visual notification on a user interface of the mobile device, wherein the visual notification comprises a microphone graphic;
  
  receiving a speech utterance of the user captured by the microphone during execution of the audio recording process; and
  
  generating a transcription of the speech utterance captured by the microphone during the audio recording process.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28):
  - 16. The mobile device of claim 15, wherein receiving the speech utterance of the user comprises:
    - receiving audio input data captured by the microphone during execution of the audio recording process;
      
      determining whether the audio input data captured by the microphone exceeds a speech energy threshold; and
      
      when the audio input data captured by the microphone exceeds the speech energy threshold, detecting that the audio input data includes the speech utterance of the user.
  - 17. The mobile device of claim 15, wherein the operations further comprise, after initiating execution of the audio recording process:
    - detecting movement of the mobile device away from the second pose; and
      
      in response to detecting the movement of the mobile device away from the second pose, ceasing execution of the audio recording process.
  - 18. The mobile device of claim 17, wherein detecting movement of the mobile device away from the second pose comprises detecting that the mobile device is beyond a predetermined distance from the part of the user.
  - 19. The mobile device of claim 17, wherein detecting that the mobile device is beyond the predetermined distance comprises determining that a distance between a proximity sensor of the mobile device and the part of the user exceeds the predetermined distance.
  - 20. The mobile device of claim 15, wherein the operations further comprise, in response to initiating execution of the audio recording process:
    - determining a speech energy threshold for comparing to the speech utterance of the user received during execution of the audio recording process; and
      
      ceasing execution of the audio recording process when an energy of the speech utterance of the user received during the audio recording process is less than the speech energy threshold.
  - 21. The mobile device of claim 15, wherein generating the transcription of the received audio data comprises:
    - determining when execution of the audio recording process has ceased; and
      
      generating the transcription of the received audio data when execution of the audio recording process has ceased.
  - 22. The mobile device of claim 15, wherein detecting movement of the mobile device from the first pose to the second pose comprises detecting a proximity of the mobile device is less than a predetermined distance away from the part of the user.
  - 23. The mobile device of claim 15, wherein detecting movement of the mobile device from the first pose to the second pose comprises:
    - detecting an instantaneous acceleration of the mobile device has exceeded an instantaneous acceleration threshold; and
      
      detecting a proximity of the mobile device is less than a predetermined distance away from the part of the user.
  - 24. The mobile device of claim 23, wherein the operations further comprise, after detecting the instantaneous acceleration for the mobile device has exceeded the instantaneous acceleration threshold, activating a proximity sensor of the mobile device, the proximity sensor configured to detect a distance between the mobile device and the part of the user.
  - 25. The mobile device of claim 15, further comprising, in response to receiving the speech utterance, displaying a representation of the speech utterance of the user being recorded while the user is speaking during execution of the audio recording process.
  - 26. The mobile device of claim 25, wherein displaying the representation of the speech utterance of the user being recorded comprises displaying a waveform graphic.
  - 27. The mobile device of claim 15, wherein the operations further comprise, in response to initiating execution of the audio recording process:
    - generating an audio notification that indicates to the user that the audio recording process is executing; and
      
      outputting the audio notification through an audio output device of the mobile device.
  - 28. The mobile device of claim 15, wherein the operations further comprise, in response to receiving the speech utterance of the user captured by the microphone during execution of the audio recording process:
    - generating a visual notification that indicates detection of the speech utterance of the user; and
      
      displaying the visual notification on a user interface of the mobile device.
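The speech energy threshold in claims 2, 6, 16, and 20 can be illustrated with a simple frame-based endpointer. The claims do not say how energy is computed or how the threshold is chosen, so this sketch assumes mean squared amplitude per frame and a caller-supplied threshold; the function names are hypothetical.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def endpoint(frames, threshold):
    """Return (start, end) frame indices of the first run whose energy
    exceeds the threshold, with end exclusive, or None if no frame does."""
    start = None
    for i, frame in enumerate(frames):
        above = frame_energy(frame) > threshold
        if start is None and above:
            start = i          # claims 2/16: energy above threshold means speech
        elif start is not None and not above:
            return (start, i)  # claims 6/20: energy below threshold ends recording
    if start is not None:
        return (start, len(frames))
    return None
```

This version stops at the first sub-threshold frame. A practical detector would likely require several consecutive quiet frames before ending, so that brief pauses between words do not cut the utterance short; that refinement is omitted here to keep the sketch short.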

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Burke, Dave; Lebeau, Michael J.; Gianno, Konrad; Kristjansson, Trausti T.; Jitkoff, John Nicholas; Senior, Andrew W.
Primary Examiner(s)
Chawan, Vijay B

Application Number

US16/017,580
Publication Number

US 20180308510A1
Time in Patent Office

750 Days
Field of Search

704/270, 704/233, 704/231, 704/235, 704/275, 704/270.1, 345/156, 345/158, 345 1401
US Class Current
CPC Class Codes

G06F 3/0346   with detection of the devic...

G06F 3/167   Audio in a user interface, ...

G10L 15/10   using distance or distortio...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 25/21   the extracted parameters be...

G10L 25/78   Detection of presence or ab...

H04M 1/72454   according to context-relate...

H04M 2250/12   including a sensor for meas...

H04M 2250/74   with voice recognition mean...

H04R 1/08   Mouthpieces; Microphones; A...

H04W 4/026   using orientation informati...
