ROBUST END-POINTING OF SPEECH SIGNALS USING SPEAKER RECOGNITION
First Claim
1. A method for identifying a start-point or an end-point of a spoken user request, the method comprising:
at an electronic device:
receiving a stream of audio comprising the spoken user request;
determining a first likelihood that the stream of audio comprises user speech;
determining a second likelihood that the stream of audio comprises user speech spoken by an authorized user of the electronic device; and
identifying the start-point or the end-point of the spoken user request based at least in part on the first likelihood and the second likelihood.
Abstract
Systems and processes for robust end-pointing of speech signals using speaker recognition are provided. In one example process, a stream of audio having a spoken user request can be received. A first likelihood that the stream of audio includes user speech can be determined. A second likelihood that the stream of audio includes user speech spoken by an authorized user can be determined. A start-point or an end-point of the spoken user request can be determined based at least in part on the first likelihood and the second likelihood.
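The process summarized above can be sketched in a few lines. This is only an illustrative sketch, not the patented implementation: the per-frame likelihood lists, the weighted-average combination, the `weight` and `threshold` parameters, and the function names `combine` and `find_endpoints` are all assumptions introduced for this example. The source only states that the start-point or end-point is based at least in part on both likelihoods.

```python
from typing import Optional, Sequence, Tuple


def combine(first: float, second: float, weight: float = 0.5) -> float:
    """Blend the two likelihoods with a weighted average (an assumed rule)."""
    return (1.0 - weight) * first + weight * second


def find_endpoints(
    first_likelihoods: Sequence[float],
    second_likelihoods: Sequence[float],
    threshold: float = 0.5,
    weight: float = 0.5,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (start_frame, end_frame) of the spoken request, or (None, None).

    first_likelihoods[i]  -- likelihood that frame i contains any user speech
    second_likelihoods[i] -- likelihood that frame i contains speech from an
                             authorized user
    """
    if len(first_likelihoods) != len(second_likelihoods):
        raise ValueError("likelihood sequences must have the same length")

    start: Optional[int] = None
    end: Optional[int] = None
    for i, (p, q) in enumerate(zip(first_likelihoods, second_likelihoods)):
        if combine(p, q, weight) >= threshold:
            if start is None:
                start = i  # first frame whose combined score passes the threshold
            end = i        # last such frame seen so far
    return start, end


# Frames 1 and 4 contain speech from someone other than the authorized user:
# the first likelihood is high there, but the second is low.
first = [0.1, 0.9, 0.9, 0.9, 0.8, 0.1]
second = [0.0, 0.0, 0.9, 0.8, 0.1, 0.0]
endpoints = find_endpoints(first, second)
```

In this example, frames with speech-like audio but a low authorized-user likelihood fall below the combined threshold, so they are excluded from the detected request.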
24 Claims
-
1. A method for identifying a start-point or an end-point of a spoken user request, the method comprising:
at an electronic device:
receiving a stream of audio comprising the spoken user request;
determining a first likelihood that the stream of audio comprises user speech;
determining a second likelihood that the stream of audio comprises user speech spoken by an authorized user of the electronic device; and
identifying the start-point or the end-point of the spoken user request based at least in part on the first likelihood and the second likelihood.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
-
9. A method for identifying a start-point or an end-point of a spoken user request, the method comprising:
at an electronic device:
receiving a stream of audio comprising the spoken user request;
determining a first likelihood that the stream of audio comprises user speech based at least in part on an energy level of the stream of audio;
in response to the energy level exceeding a threshold energy level for longer than a threshold duration, performing speaker authentication on the stream of audio to determine a second likelihood that the stream of audio comprises speech spoken by an authorized user of the electronic device; and
identifying the start-point or the end-point of the spoken user request based at least in part on the first likelihood and the second likelihood.
Dependent claims: 10, 11, 12, 13
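The gating step in claim 9 (run speaker authentication only once the energy level has stayed above a threshold for longer than a threshold duration) could look roughly like the sketch below. It assumes audio arrives as fixed-length frames of samples, measures duration in frames, and uses mean squared amplitude as the energy level. The `authenticate` callable is a hypothetical stand-in for a speaker-authentication model, and all names and parameters are illustrative rather than taken from the patent.

```python
from typing import Callable, List, Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (an assumed energy measure)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def gated_second_likelihoods(
    frames: Sequence[Sequence[float]],
    energy_threshold: float,
    min_frames: int,
    authenticate: Callable[[Sequence[float]], float],
) -> List[Optional[float]]:
    """Run `authenticate` only after energy has exceeded the threshold
    for more than `min_frames` consecutive frames.

    Returns one entry per frame: the second likelihood, or None when
    authentication was not triggered for that frame.
    """
    results: List[Optional[float]] = []
    run = 0  # consecutive frames above the energy threshold
    for frame in frames:
        run = run + 1 if frame_energy(frame) > energy_threshold else 0
        results.append(authenticate(frame) if run > min_frames else None)
    return results


quiet = [0.0, 0.0]
loud = [1.0, 1.0]
frames = [quiet, loud, loud, loud, quiet, loud]
# Hypothetical authenticator that always reports 0.9 for demonstration.
likelihoods = gated_second_likelihoods(frames, 0.5, 2, lambda frame: 0.9)
```

Only the third consecutive loud frame triggers authentication here; the short burst at the end does not last long enough.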
-
14. A method for identifying a start-point or an end-point of a spoken user request, the method comprising:
at an electronic device:
receiving a signal to begin recording an audio input, wherein the audio input comprises the spoken user request;
determining a baseline energy level of the audio input based on an energy level of a first portion of the audio input;
determining a first likelihood that the audio input comprises user speech based on an energy level of a second portion of the audio input;
in response to the baseline energy level exceeding a threshold energy level, performing speaker authentication on the second portion of the audio input to determine a second likelihood that the audio input comprises speech spoken by an authorized user of the electronic device; and
identifying the start-point or the end-point of the spoken user request based at least in part on the first likelihood and the second likelihood.
Dependent claims: 15, 16, 17, 18
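As a rough illustration of claim 14, the sketch below estimates a baseline energy from a first portion of the audio, derives a first likelihood from the energy of a second portion, and invokes speaker authentication only when the baseline exceeds a threshold. The energy measure, the `energy / (energy + baseline)` likelihood formula, the `authenticate` callable, and every name here are assumptions made for this example. The claim itself does not specify how the first likelihood is computed from energy.

```python
from typing import Callable, Optional, Sequence, Tuple


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a run of samples (an assumed energy measure)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def analyze_portions(
    first_portion: Sequence[float],
    second_portion: Sequence[float],
    baseline_threshold: float,
    authenticate: Callable[[Sequence[float]], float],
) -> Tuple[float, float, Optional[float]]:
    """Return (baseline_energy, first_likelihood, second_likelihood).

    second_likelihood is None when the baseline energy does not exceed
    the threshold, so speaker authentication is skipped.
    """
    baseline = mean_energy(first_portion)
    energy = mean_energy(second_portion)

    # Assumed mapping: energy well above the baseline pushes the value toward 1.
    total = energy + baseline
    first_likelihood = energy / total if total > 0.0 else 0.0

    second_likelihood: Optional[float] = None
    if baseline > baseline_threshold:
        second_likelihood = authenticate(second_portion)
    return baseline, first_likelihood, second_likelihood


background = [0.1, -0.1]  # first portion, recorded right after the start signal
request = [0.3, -0.3]     # second portion, louder than the baseline
# Hypothetical authenticator returning a fixed score for demonstration.
noisy = analyze_portions(background, request, 0.005, lambda samples: 0.8)
calm = analyze_portions(background, request, 0.05, lambda samples: 0.8)
```

With the lower threshold the baseline counts as elevated and authentication runs; with the higher threshold it is skipped and only the energy-based likelihood is available.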
-
19. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to:
receive a stream of audio comprising the spoken user request;
determine a first likelihood that the stream of audio comprises user speech;
determine a second likelihood that the stream of audio comprises user speech spoken by an authorized user; and
identify the start-point or the end-point of the spoken user request based at least in part on the first likelihood and the second likelihood.
Dependent claims: 20, 21
-
22. An electronic device comprising:
one or more processors;
memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving a stream of audio comprising the spoken user request;
determining a first likelihood that the stream of audio comprises user speech;
determining a second likelihood that the stream of audio comprises user speech spoken by an authorized user; and
identifying the start-point or the end-point of the spoken user request based at least in part on the first likelihood and the second likelihood.
Dependent claims: 23, 24
-
Specification