Multisensory speech detection

US 8,862,474 B2
Filed: 09/14/2012
Issued: 10/14/2014
Est. Priority Date: 11/10/2008
Status: Active Grant

Abstract

A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
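As a rough illustration of the flow the abstract describes (orientation, then operating mode, then speech detection parameters), the following Python sketch may help. The pose names come from the claims; the `SensorReading` fields, the 45-degree cutoff, and every parameter value are hypothetical and chosen only for illustration. Classifying by pitch and proximity alone is a deliberate simplification of whatever sensor fusion a real device would use.

```python
from dataclasses import dataclass
from enum import Enum


class Pose(Enum):
    TELEPHONE = "telephone"          # held up to the user's ear
    WALKIE_TALKIE = "walkie_talkie"  # held in front of the user's face
    PDA = "pda"                      # held away from the user's body


@dataclass(frozen=True)
class SensorReading:
    # Hypothetical sensor summary; the field names are assumptions.
    pitch_deg: float   # tilt from horizontal, e.g. derived from an accelerometer
    near_object: bool  # proximity sensor reports something close to the device


@dataclass(frozen=True)
class DetectionParams:
    # Illustrative values only, not taken from the patent.
    energy_threshold: float     # minimum frame energy counted as speech
    end_silence_frames: int     # quiet frames required before speech is considered ended
    needs_explicit_start: bool  # whether a button press or gesture must start recording


# One parameter set per predetermined pose (assumed numbers).
PARAMS = {
    Pose.TELEPHONE: DetectionParams(0.60, 30, False),
    Pose.WALKIE_TALKIE: DetectionParams(0.30, 40, False),
    Pose.PDA: DetectionParams(0.10, 50, True),
}


def classify_pose(reading: SensorReading) -> Pose:
    """Map a sensor reading to one of the predetermined poses (assumed cutoffs)."""
    if reading.pitch_deg >= 45 and reading.near_object:
        return Pose.TELEPHONE
    if reading.pitch_deg >= 45:
        return Pose.WALKIE_TALKIE
    return Pose.PDA


def params_for(reading: SensorReading) -> DetectionParams:
    """Select the pose-specific speech detection parameters."""
    return PARAMS[classify_pose(reading)]
```

Keeping the pose-to-parameters mapping in a plain lookup table means a pose change only swaps which `DetectionParams` instance the detector consults; the detection logic itself does not need to branch on pose.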

20 Claims

1. A computer-implemented method comprising:
- identifying, by a mobile computing device, a first pose with which the mobile computing device is being held by a user, the first pose being identified from among a plurality of predetermined poses;
  
  initiating, after identifying the first pose, a first audio recording process using a microphone of the mobile computing device;
  
  selecting, based on the first pose identified by the mobile computing device, a first set of one or more parameters from among a plurality of parameters, wherein the first set of one or more parameters define settings that are specific to the first pose and that differentiate between, at least, speech and background noise when the mobile computing device is being held in the first pose;
  
  detecting that the user has started speaking during the first audio recording process;
  
  determining, after the user has started speaking during the first audio recording process and using the first set of one or more parameters, whether the user has stopped speaking during the first audio recording process;
  
  stopping, based on the determining whether the user has stopped speaking during the first audio recording process, the first audio recording process;
  
  identifying, by the mobile computing device, a second pose with which the mobile computing device is being held by the user, the second pose being identified from among the plurality of predetermined poses, and the second pose being distinct from the first pose;
  
  initiating, after identifying the second pose, a second audio recording process using the microphone of the mobile computing device;
  
  selecting, based on the second pose identified by the mobile computing device, a second set of one or more parameters from among the plurality of parameters, wherein the second set of one or more parameters define settings that are specific to the second pose and that differentiate between, at least, speech and background noise when the mobile computing device is being held in the second pose;
  
  detecting that the user has started speaking during the second audio recording process;
  
  determining, after the user has started speaking during the second audio recording process and using the second set of one or more parameters, whether the user has stopped speaking during the second audio recording process; and
  
  stopping, based on the determining whether the user has stopped speaking during the second audio recording process, the second audio recording process.
  - 2. The computer-implemented method of claim 1, wherein the first set of one or more parameters comprise a first set of one or more speech energy thresholds for audio signals received by the microphone, and the second set of one or more parameters comprise a second set of one or more speech energy thresholds for audio signals received by the microphone, the second set of one or more speech energy thresholds having at least one speech energy threshold that is not contained in the first set of one or more speech energy thresholds and the first set of one or more speech energy thresholds having at least one speech energy threshold that is not contained in the second set of one or more speech energy thresholds.
  - 3. The computer-implemented method of claim 2, wherein the speech energy thresholds included in the first and second sets of one or more speech energy thresholds are inversely proportional to a distance from the mobile computing device to the user's mouth when the mobile computing device is held by the user in the pose.
  - 4. The computer-implemented method of claim 1, wherein the first pose that is identified by the mobile computing device comprises a telephone pose that indicates that the mobile computing device is being held up to the user's ear.
  - 5. The computer-implemented method of claim 4, wherein the second pose that is identified by the mobile computing device comprises a walkie-talkie pose that indicates that the mobile computing device is being held in front of and within a threshold distance of the user's face, and without the mobile device being held up to the user's ear.
  - 6. The computer-implemented method of claim 4, wherein the second pose that is identified by the mobile computing device comprises a personal digital assistant (PDA) pose that indicates that the mobile computing device is being held at least a threshold distance away from the user's body.
  - 7. The computer-implemented method of claim 6, further comprising:
    - before initiating the second audio recording process and based on the second pose being the PDA pose, determining whether the user has provided input indicating an intention for the second audio recording process to start; and
      
      wherein the second audio recording process is initiated in response to the user being determined to have provided the input.
  - 8. The computer-implemented method of claim 7, wherein the input comprises a particular button on the mobile computing device having been pressed.
  - 9. The computer-implemented method of claim 7, wherein the input comprises a particular gesture from among a plurality of gestures being detected by the mobile computing device.
  - 10. The computer-implemented method of claim 1, further comprising:
    - in response to initiating the first audio recording process, outputting an indication on the mobile computing device that indicates that the first audio recording process has started.
  - 11. The computer-implemented method of claim 10, further comprising:
    - in response to stopping the first audio recording process, outputting another indication on the mobile computing device that indicates that the first audio recording process has stopped.
  - 12. The computer-implemented method of claim 1, wherein the first pose is identified based on information provided by one or more sensors of the mobile computing device, the information indicating one or more of:
    - a detected movement of the mobile computing device, proximity of the mobile computing device to another physical object, and an angle at which the mobile computing device is being held.
  - 13. The computer-implemented method of claim 12, wherein the one or more sensors include one or more of:
    - an accelerometer, a proximity sensor, and a camera.
  - 14. The computer-implemented method of claim 1, further comprising:
    - after stopping the first audio recording process, causing audio signals recorded during the first audio recording process to be converted to text; and
      
      outputting, by the mobile computing device, information based on the converted text.
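
The pose-specific endpointing recited in claims 1 to 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the per-frame energy representation, the constant `k`, the pose distances in `POSE_DISTANCE_M`, and the `hangover` count of quiet frames are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical typical mouth-to-device distances, in metres, for each pose.
POSE_DISTANCE_M = {"telephone": 0.05, "walkie_talkie": 0.15, "pda": 0.40}


def energy_threshold(pose: str, k: float = 0.03) -> float:
    """Speech energy threshold inversely proportional to distance; k is assumed."""
    return k / POSE_DISTANCE_M[pose]


def find_utterance(
    energies: Iterable[float], threshold: float, hangover: int = 3
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, stop_frame) for the first utterance, or None.

    Speech is taken to start at the first frame whose energy reaches the
    threshold. Recording stops at the frame that completes a run of
    `hangover` consecutive frames below the threshold.
    """
    start = None
    quiet = 0
    for i, energy in enumerate(energies):
        if start is None:
            if energy >= threshold:
                start = i  # user has started speaking
        elif energy < threshold:
            quiet += 1
            if quiet >= hangover:
                return start, i  # user has stopped speaking
        else:
            quiet = 0  # speech resumed, reset the silence counter
    return None  # never started, or had not stopped when the audio ran out


# The same frame energies evaluated with two pose-specific thresholds.
frames = [0.05, 0.1, 0.7, 0.8, 0.3, 0.7, 0.1, 0.1, 0.1, 0.1]
print(find_utterance(frames, energy_threshold("telephone")))
print(find_utterance(frames, energy_threshold("pda")))
```

With these assumed numbers, the telephone threshold treats the low-energy frames as background noise and finds an endpoint, while the much lower PDA threshold treats every frame after the first as speech and finds none. That difference is the reason a single fixed threshold would behave poorly across poses.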

15. A mobile computing device comprising:
- one or more computer processors;
  
  a pose identifier that is programmed to identify a first pose with which the mobile computing device is being held by a user and a second pose with which the mobile device is being held by the user, the first and second poses being identified from among a plurality of predetermined poses, and the first pose being distinct from the second pose;
  
  a microphone that is programmed to initiate a first audio recording process after identification of the first pose and a second audio recording process after identification of the second pose;
  
  a speech detector that is programmed i) to select a first set of one or more parameters from among a plurality of parameters based on the first pose, wherein the first set of one or more parameters define settings that are specific to the first pose to differentiate between, at least, speech and background noise when the mobile computing device is being held in the first pose, ii) to detect, using the first set of one or more parameters, that the user has started speaking during the first audio recording process, iii) to select a second set of one or more parameters from among the plurality of parameters based on the second pose, wherein the second set of one or more parameters define settings that are specific to the second pose to differentiate between, at least, speech and background noise when the mobile computing device is being held in the second pose, and iv) to detect, using the second set of one or more parameters, that the user has started speaking during the second audio recording process; and
  
  a speech endpointer that is programmed i) to determine, after the user has started speaking and using the first set of one or more parameters, whether the user has stopped speaking during the first audio recording process, ii) to stop the first audio recording process based on determining that the user has stopped speaking during the first audio recording process, iii) to determine, after the user has started speaking and using the second set of one or more parameters, whether the user has stopped speaking during the second audio recording process, and iv) to stop the second audio recording process based on determining that the user has stopped speaking during the second audio recording process.
  - 16. The mobile computing device of claim 15, wherein the first set of one or more parameters comprise a first set of one or more speech energy thresholds for audio signals received by the microphone, and the second set of one or more parameters comprise a second set of one or more speech energy thresholds for audio signals received by the microphone, the second set of one or more speech energy thresholds having at least one speech energy threshold that is not contained in the first set of one or more speech energy thresholds and the first set of one or more speech energy thresholds having at least one speech energy threshold that is not contained in the second set of one or more speech energy thresholds.
  - 17. The mobile computing device of claim 16, wherein the speech energy thresholds included in the first and second sets of one or more speech energy thresholds are inversely proportional to a distance from the mobile computing device to the user's mouth when the mobile computing device is held by the user in the pose.
  - 18. The mobile computing device of claim 15, wherein the first pose that is identified by the mobile computing device comprises a telephone pose that indicates that the mobile computing device is being held up to the user's ear.
  - 19. The mobile computing device of claim 18, wherein the second pose that is identified by the mobile computing device comprises a walkie-talkie pose that indicates that the mobile computing device is being held in front of and within a threshold distance of the user's face, and without the mobile device being held up to the user's ear.

20. A computer program product embodied in a computer readable storage device storing instructions that, when executed, cause one or more computing devices to perform operations comprising:
- identifying a first pose with which the mobile computing device is being held by a user, the first pose being identified from among a plurality of predetermined poses;
  
  initiating, after identifying the first pose, a first audio recording process using a microphone of the mobile computing device;
  
  selecting, based on the first pose identified by the mobile computing device, a first set of one or more parameters from among a plurality of parameters, wherein the first set of one or more parameters define settings that are specific to the first pose and that differentiate between, at least, speech and background noise when the mobile computing device is being held in the first pose;
  
  detecting that the user has started speaking during the first audio recording process;
  
  determining, after the user has started speaking during the first audio recording process and using the first set of one or more parameters, whether the user has stopped speaking during the first audio recording process;
  
  stopping, based on the determining whether the user has stopped speaking during the first audio recording process, the first audio recording process;
  
  identifying a second pose with which the mobile computing device is being held by the user, the second pose being identified from among the plurality of predetermined poses, and the second pose being distinct from the first pose;
  
  initiating, after identifying the second pose, a second audio recording process using the microphone of the mobile computing device;
  
  selecting, based on the second pose identified by the mobile computing device, a second set of one or more parameters from among the plurality of parameters, wherein the second set of one or more parameters define settings that are specific to the second pose and that differentiate between, at least, speech and background noise when the mobile computing device is being held in the second pose;
  
  detecting that the user has started speaking during the second audio recording process;
  
  determining, after the user has started speaking during the second audio recording process and using the second set of one or more parameters, whether the user has stopped speaking during the second audio recording process; and
  
  stopping, based on the determining whether the user has stopped speaking during the second audio recording process, the second audio recording process.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Burke, Dave; LeBeau, Michael J.; Gianno, Konrad; Kristjansson, Trausti; Jitkoff, John Nicholas; Senior, Andrew W.
Primary Examiner(s)
Breene, John
Assistant Examiner(s)
Bailey, Corey M.

Application Number
US13/618,720
Publication Number
US 20130013315A1
Time in Patent Office
760 Days
Field of Search
704/233, 700/67, 700/69, 700/70, 700/83, 700/84, 700/85
US Class Current
704/270
CPC Class Codes
G06F 3/0346: with detection of the devic...
G06F 3/167: Audio in a user interface, ...
G10L 15/10: using distance or distortio...
G10L 15/22: Procedures used during a sp...
G10L 15/26: Speech to text systems G10L...
G10L 17/00: Speaker identification or v...
G10L 25/21: the extracted parameters be...
G10L 25/78: Detection of presence or ab...
H04M 1/72454: according to context-relate...
H04M 2250/12: including a sensor for meas...
H04M 2250/74: with voice recognition mean...
H04R 1/08: Mouthpieces; Microphones; A...
H04W 4/026: using orientation informati...
