Multisensory speech detection
First Claim
1. A computer-implemented method comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
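The claimed flow (a pose change observed while audio streams in selects endpointing parameters, which then decide where the captured audio ends) can be sketched in Python. All pose names, function names, and thresholds below are hypothetical illustrations, not values from the patent:

```python
from dataclasses import dataclass

@dataclass
class EndpointingParams:
    # Number of consecutive low-energy frames required before the
    # utterance is considered finished.
    trailing_silence_frames: int

def params_for_pose_change(first_pose: str, second_pose: str) -> EndpointingParams:
    # Moving the device away from the mouth suggests the user has finished
    # speaking, so a shorter trailing silence suffices. The pose names and
    # frame counts are illustrative assumptions.
    if (first_pose, second_pose) == ("to-mouth", "away"):
        return EndpointingParams(trailing_silence_frames=2)
    return EndpointingParams(trailing_silence_frames=5)

def endpoint(frame_energies: list[float], params: EndpointingParams,
             silence_threshold: float = 0.1) -> list[float]:
    # Truncate the audio at the first run of trailing silence that reaches
    # the configured length.
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy < silence_threshold:
            silent_run += 1
            if silent_run >= params.trailing_silence_frames:
                return frame_energies[: i - silent_run + 1]
        else:
            silent_run = 0
    return frame_energies

# Frame energies for speech followed by silence.
frames = [0.8, 0.9, 0.7, 0.05, 0.04, 0.03, 0.02, 0.01]
params = params_for_pose_change("to-mouth", "away")
print(len(endpoint(frames, params)))  # 3: the silent tail is cut off early
```

A transcription step would then run only on the endpointed frames; the point of the claim is that the pose change, not the audio alone, selects the endpointing parameters.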
2 Assignments
0 Petitions
Abstract
A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
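The abstract's pipeline (device orientation determines an operating mode, and the mode determines when speech detection begins and ends) could look like the following minimal sketch. The mode names, angle ranges, and parameter values are assumptions made for the example, not taken from the patent:

```python
def operating_mode(pitch_deg: float) -> str:
    # Tilted up toward the face: assume a phone-to-ear "telephone" pose.
    if 40 <= pitch_deg <= 90:
        return "telephone"
    # Held roughly flat in front of the user: assume a "walkie-talkie" pose.
    if -10 <= pitch_deg <= 10:
        return "walkie-talkie"
    return "idle"

def speech_detection_params(mode: str) -> dict:
    # Per-mode rules for when speech detection begins and ends.
    return {
        "telephone": {"start": "on_pose_detected", "end": "on_pose_change"},
        "walkie-talkie": {"start": "on_button_press", "end": "on_button_release"},
        "idle": {"start": "never", "end": "never"},
    }[mode]

mode = operating_mode(pitch_deg=60.0)
print(mode, speech_detection_params(mode)["start"])  # telephone on_pose_detected
```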
95 Citations
21 Claims
1. A computer-implemented method comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
(Dependent claims: 2, 3, 4, 5, 6, 7, 21)
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
(Dependent claims: 16, 17, 18, 19, 20)
Specification