Multisensory speech detection
First Claim
1. A computer-implemented method comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
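The claimed flow (a pose change observed while audio streams in selects endpointing parameters, which then decide where the captured audio ends) can be sketched in Python. All pose names, function names, and thresholds below are hypothetical illustrations, not values from the patent:

```python
from dataclasses import dataclass

@dataclass
class EndpointingParams:
    # Number of consecutive low-energy frames required before the
    # utterance is considered finished.
    trailing_silence_frames: int

def params_for_pose_change(first_pose: str, second_pose: str) -> EndpointingParams:
    # Moving the device away from the mouth suggests the user has finished
    # speaking, so a shorter trailing silence suffices. The pose names and
    # frame counts are illustrative assumptions.
    if (first_pose, second_pose) == ("to-mouth", "away"):
        return EndpointingParams(trailing_silence_frames=2)
    return EndpointingParams(trailing_silence_frames=5)

def endpoint(frame_energies: list[float], params: EndpointingParams,
             silence_threshold: float = 0.1) -> list[float]:
    # Truncate the audio at the first run of trailing silence that reaches
    # the configured length.
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy < silence_threshold:
            silent_run += 1
            if silent_run >= params.trailing_silence_frames:
                return frame_energies[: i - silent_run + 1]
        else:
            silent_run = 0
    return frame_energies

# Frame energies for speech followed by silence.
frames = [0.8, 0.9, 0.7, 0.05, 0.04, 0.03, 0.02, 0.01]
params = params_for_pose_change("to-mouth", "away")
print(len(endpoint(frames, params)))  # 3: the silent tail is cut off early
```

A transcription step would then run only on the endpointed frames; the point of the claim is that the pose change, not the audio alone, selects the endpointing parameters.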
2 Assignments
0 Petitions
Abstract
A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
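The abstract's pipeline (device orientation determines an operating mode, and the mode determines when speech detection begins and ends) could look like the following minimal sketch. The mode names, angle ranges, and parameter values are assumptions made for the example, not taken from the patent:

```python
def operating_mode(pitch_deg: float) -> str:
    # Tilted up toward the face: assume a phone-to-ear "telephone" pose.
    if 40 <= pitch_deg <= 90:
        return "telephone"
    # Held roughly flat in front of the user: assume a "walkie-talkie" pose.
    if -10 <= pitch_deg <= 10:
        return "walkie-talkie"
    return "idle"

def speech_detection_params(mode: str) -> dict:
    # Per-mode rules for when speech detection begins and ends.
    return {
        "telephone": {"start": "on_pose_detected", "end": "on_pose_change"},
        "walkie-talkie": {"start": "on_button_press", "end": "on_button_release"},
        "idle": {"start": "never", "end": "never"},
    }[mode]

mode = operating_mode(pitch_deg=60.0)
print(mode, speech_detection_params(mode)["start"])  # telephone on_pose_detected
```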
95 Citations
21 Claims
1. A computer-implemented method comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
(Dependent claims: 2, 3, 4, 5, 6, 7, 21)
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, by a given mobile device, audio data corresponding to a user utterance;
while receiving the audio data corresponding to the user utterance, determining, by the given mobile device, that the given mobile device has changed position from a first pose to a second pose;
in response to determining that the given mobile device has changed position from the first pose to the second pose, determining endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose;
using the endpointing parameters for endpointing audio data received by a mobile device changing from the first pose to the second pose, endpointing the received audio data;
generating, by an automated speech recognizer, a transcription of the endpointed audio data; and
providing, for output by the given mobile device, the transcription.
(Dependent claims: 16, 17, 18, 19, 20)
Specification