Voice trigger for a digital assistant

US 10,199,051 B2
Filed: 02/07/2014
Issued: 02/05/2019
Est. Priority Date: 02/07/2013
Status: Active Grant

Abstract

A method for operating a voice trigger is provided. In some implementations, the method is performed at an electronic device including one or more processors and memory storing instructions for execution by the one or more processors. The method includes receiving a sound input. The sound input may correspond to a spoken word or phrase, or a portion thereof. The method includes determining whether at least a portion of the sound input corresponds to a predetermined type of sound, such as a human voice. The method includes, upon a determination that at least a portion of the sound input corresponds to the predetermined type, determining whether the sound input includes predetermined content, such as a predetermined trigger word or phrase. The method also includes, upon a determination that the sound input includes the predetermined content, initiating a speech-based service, such as a voice-based digital assistant.
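The two checks described in the abstract can be sketched as a short gate in front of the service. This is a minimal illustration only: the class name `VoiceTrigger` and the three callables are hypothetical, and the record does not say how either check is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VoiceTrigger:
    # All three callables are hypothetical placeholders supplied by the caller.
    is_target_sound_type: Callable[[Sequence[float]], bool]  # e.g. "is this a human voice?"
    has_trigger_content: Callable[[Sequence[float]], bool]   # e.g. "does it contain the trigger phrase?"
    start_service: Callable[[], None]                        # e.g. launch the digital assistant

    def handle(self, sound_input: Sequence[float]) -> bool:
        """Return True if the speech-based service was initiated."""
        # Step 1: does the input correspond to the predetermined type of sound?
        if not self.is_target_sound_type(sound_input):
            return False
        # Step 2: only then check for the predetermined content.
        if not self.has_trigger_content(sound_input):
            return False
        # Step 3: both checks passed, so initiate the service.
        self.start_service()
        return True
```

The content check runs only after the type check succeeds, which mirrors the "upon a determination" ordering in the abstract.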

2949 Citations

48 Claims

1. A method for operating a voice trigger, performed at an electronic device including one or more processors and memory storing instructions for execution by the one or more processors, the method comprising:
- determining, based on comparing an amount of light detected on at least a front surface of the electronic device to a threshold amount of light, whether to operate the voice trigger in a standby mode or in a listening mode;
- in accordance with a determination to operate the voice trigger in the listening mode:
  - receiving a sound input;
  - generating an input representation of the sound input, wherein the input representation represents audio signatures of the sound input;
  - determining whether at least a portion of the sound input corresponds to a predetermined type of sound;
  - upon a determination that at least a portion of the sound input corresponds to the predetermined type, determining whether the sound input includes predetermined content based on comparing of the input representation of the sound input to one or more reference representations;
  - upon a determination that the sound input includes the predetermined content, generating a control signal comprising instructions to initiate a speech-based service; and
  - initiating the speech-based service based on the control signal; and
- in accordance with a determination to operate the voice trigger in the standby mode, forgoing initiating the speech-based service based on received sound input.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16):
  - 2. The method of claim 1, wherein said determining whether the sound input corresponds to a predetermined type of sound is performed by a first sound detector, wherein said determining whether the sound input includes predetermined content is performed by a second sound detector, and wherein the first sound detector consumes less power while operating than the second sound detector.
  - 3. The method of claim 2, wherein the second sound detector is initiated in response to a determination by the first sound detector that the sound input corresponds to the predetermined type.
  - 4. The method of claim 2, wherein the second sound detector is operated for at least a predetermined amount of time after the determination by the first sound detector that the sound input corresponds to the predetermined type.
  - 5. The method of claim 1, wherein the predetermined type is a human voice and the predetermined content is one or more words.
  - 6. The method of claim 1, wherein the predetermined content is one or more predetermined phonemes.
  - 7. The method of claim 6, wherein the one or more predetermined phonemes constitute at least one word.
  - 8. The method of claim 1, further comprising, prior to determining whether the sound input corresponds to the predetermined type of sound, determining whether the sound input satisfies a predetermined condition.
  - 9. The method of claim 8, wherein the predetermined condition is an amplitude threshold.
  - 10. The method of claim 8, wherein said determining whether the sound input satisfies a predetermined condition is performed by a third sound detector, wherein the third sound detector consumes less power while operating than a first sound detector, the first sound detector being configured to determine whether the sound input corresponds to the predetermined type of sound.
  - 11. The method of claim 1, further comprising:
    - storing at least a portion of the sound input in memory; and
    - providing the portion of the sound input to the speech-based service once the speech-based service is initiated.
  - 12. The method of claim 1, further comprising determining whether the sound input corresponds to a voice of a particular user.
  - 13. The method of claim 12, wherein the speech-based service is initiated upon a determination that the sound input includes the predetermined content and that the sound input corresponds to the voice of the particular user.
  - 14. The method of claim 13, wherein the speech-based service is initiated in a limited access mode upon a determination that the sound input includes the predetermined content and that the sound input does not correspond to the voice of the particular user.
  - 15. The method of claim 13, further comprising, upon a determination that the sound input corresponds to the voice of the particular user, outputting a voice prompt including a name of the particular user.
  - 16. The method of claim 1, further comprising:
    - determining whether the electronic device is in a predetermined orientation; and
    - upon a determination that the electronic device is in the predetermined orientation, activating a predetermined mode of the voice trigger.
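The flow of claim 1 can be sketched as follows. Several details are assumptions made for illustration and are not stated in the claim: that low light selects standby mode (the claim only requires a comparison against a threshold), that representations are plain numeric vectors, and that they are compared by cosine similarity against a hypothetical `match_threshold`. All function and parameter names are invented for this sketch.

```python
import math
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"
    LISTENING = "listening"


def select_mode(front_light: float, light_threshold: float) -> Mode:
    # Assumption: low light (e.g. device covered or face down) means standby.
    return Mode.LISTENING if front_light >= light_threshold else Mode.STANDBY


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def run_voice_trigger(front_light, light_threshold, sound_input,
                      represent, is_target_type, references,
                      match_threshold=0.9):
    """Return a control signal (a dict) or None if the service is not initiated."""
    if select_mode(front_light, light_threshold) is Mode.STANDBY:
        return None  # standby: forgo initiating the service
    representation = represent(sound_input)  # input representation of the sound
    if not is_target_type(sound_input):      # predetermined type of sound?
        return None
    # Predetermined content: compare against one or more reference representations.
    if any(cosine_similarity(representation, ref) >= match_threshold
           for ref in references):
        return {"action": "initiate_speech_service"}
    return None
```

A caller would act on the returned control signal to start the service; returning `None` covers both the standby branch and a failed check.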

17. A non-transitory computer-readable storage medium, storing one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for:
- determining, based on comparing an amount of light detected on at least a front surface of the electronic device to a threshold amount of light, whether to operate a voice trigger in a standby mode or in a listening mode;
- in accordance with a determination to operate the voice trigger in the listening mode:
  - receiving a sound input;
  - generating an input representation of the sound input, wherein the input representation represents audio signatures of the sound input;
  - determining whether at least a portion of the sound input corresponds to a predetermined type of sound;
  - upon a determination that at least a portion of the sound input corresponds to the predetermined type, determining whether the sound input includes predetermined content based on comparing of the input representation of the sound input to one or more reference representations;
  - upon a determination that the sound input includes the predetermined content, generating a control signal comprising instructions to initiate a speech-based service; and
  - initiating the speech-based service based on the control signal; and
- in accordance with a determination to operate the voice trigger in the standby mode, forgoing initiating the speech-based service based on received sound input.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34):
  - 20. The non-transitory computer-readable storage medium of claim 17, wherein said determining whether the sound input corresponds to a predetermined type of sound is performed by a first sound detector, wherein said determining whether the sound input includes predetermined content is performed by a second sound detector, and wherein the first sound detector consumes less power while operating than the second sound detector.
  - 21. The non-transitory computer-readable storage medium of claim 20, wherein the second sound detector is initiated in response to a determination by the first sound detector that the sound input corresponds to the predetermined type.
  - 22. The non-transitory computer-readable storage medium of claim 20, wherein the second sound detector is operated for at least a predetermined amount of time after the determination by the first sound detector that the sound input corresponds to the predetermined type.
  - 23. The non-transitory computer-readable storage medium of claim 17, wherein the predetermined type is a human voice and the predetermined content is one or more words.
  - 24. The non-transitory computer-readable storage medium of claim 17, wherein the predetermined content is one or more predetermined phonemes.
  - 25. The non-transitory computer-readable storage medium of claim 24, wherein the one or more predetermined phonemes constitute at least one word.
  - 26. The non-transitory computer-readable storage medium of claim 17, the one or more programs further including instructions for, prior to determining whether the sound input corresponds to the predetermined type of sound, determining whether the sound input satisfies a predetermined condition.
  - 27. The non-transitory computer-readable storage medium of claim 26, wherein the predetermined condition is an amplitude threshold.
  - 28. The non-transitory computer-readable storage medium of claim 26, wherein said determining whether the sound input satisfies a predetermined condition is performed by a third sound detector, wherein the third sound detector consumes less power while operating than a first sound detector, the first sound detector being configured to determine whether the sound input corresponds to the predetermined type of sound.
  - 29. The non-transitory computer-readable storage medium of claim 17, the one or more programs further including instructions for:
    - storing at least a portion of the sound input in memory; and
    - providing the portion of the sound input to the speech-based service once the speech-based service is initiated.
  - 30. The non-transitory computer-readable storage medium of claim 17, the one or more programs further including instructions for determining whether the sound input corresponds to a voice of a particular user.
  - 31. The non-transitory computer-readable storage medium of claim 30, wherein the speech-based service is initiated upon a determination that the sound input includes the predetermined content and that the sound input corresponds to the voice of the particular user.
  - 32. The non-transitory computer-readable storage medium of claim 31, wherein the speech-based service is initiated in a limited access mode upon a determination that the sound input includes the predetermined content and that the sound input does not correspond to the voice of the particular user.
  - 33. The non-transitory computer-readable storage medium of claim 31, wherein the one or more programs include further instructions for, upon a determination that the sound input corresponds to the voice of the particular user, outputting a voice prompt including a name of the particular user.
  - 34. The non-transitory computer-readable storage medium of claim 17, wherein the one or more programs include further instructions for:
    - determining whether the electronic device is in a predetermined orientation; and
    - upon a determination that the electronic device is in the predetermined orientation, activating a predetermined mode of the voice trigger.
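Claims 20 to 22 and 26 to 28 describe detectors ordered by power consumption: a third detector checks a predetermined condition (an amplitude threshold in claim 27), a first detector checks the sound type, and a second detector checks the content. A minimal sketch of that ordering is below. The function name, the peak-amplitude test, and the two callables are illustrative assumptions; the claims do not specify how any detector is built.

```python
def run_cascade(samples, amplitude_threshold, is_target_type, has_trigger_content):
    """Run the detectors cheapest-first.

    Returns (triggered, stages_run) so a caller can see how far the
    input progressed before being rejected.
    """
    stages_run = ["amplitude"]
    # Third detector (lowest power): assumed here to be a simple peak check.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < amplitude_threshold:
        return False, stages_run

    stages_run.append("type")
    # First detector: does the input correspond to the predetermined type?
    if not is_target_type(samples):
        return False, stages_run

    stages_run.append("content")
    # Second detector (highest power): only reached if the cheaper stages pass.
    return bool(has_trigger_content(samples)), stages_run
```

Because each stage returns early on failure, the more expensive detectors are invoked only for input that has already passed the cheaper ones.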

18. An electronic device, comprising:
- a sound receiving unit configured to receive sound input; and
- a processing unit coupled to the sound receiving unit, the processing unit configured to:
  - determine, based on comparing an amount of light detected on at least a front surface of the electronic device to a threshold amount of light, whether to operate a voice trigger in a standby mode or in a listening mode;
  - in accordance with a determination to operate the voice trigger in the listening mode:
    - generate an input representation of the sound input, wherein the input representation represents audio signatures of the sound input;
    - determine whether at least a portion of the sound input corresponds to a predetermined type of sound;
    - upon a determination that at least a portion of the sound input corresponds to the predetermined type, determine whether the sound input includes predetermined content based on comparing of the input representation of the sound input to one or more reference representations;
    - upon a determination that the sound input includes the predetermined content, generate a control signal comprising instructions to initiate a speech-based service; and
    - initiate the speech-based service based on the control signal; and
  - in accordance with a determination to operate the voice trigger in the standby mode, forgo initiating the speech-based service based on received sound input.
- Dependent claims (19, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48):
  - 19. The electronic device of claim 18, the processing unit further configured to, prior to determining whether the sound input corresponds to the predetermined type of sound, determine whether the sound input satisfies a predetermined condition.
  - 35. The electronic device of claim 19, wherein the predetermined condition is an amplitude threshold.
  - 36. The electronic device of claim 19, wherein said determining whether the sound input satisfies a predetermined condition is performed by a third sound detector, wherein the third sound detector consumes less power while operating than a first sound detector, the first sound detector being configured to determine whether the sound input corresponds to the predetermined type of sound.
  - 37. The electronic device of claim 18, wherein said determining whether the sound input corresponds to a predetermined type of sound is performed by a first sound detector, wherein said determining whether the sound input includes predetermined content is performed by a second sound detector, and wherein the first sound detector consumes less power while operating than the second sound detector.
  - 38. The electronic device of claim 37, wherein the second sound detector is initiated in response to a determination by the first sound detector that the sound input corresponds to the predetermined type.
  - 39. The electronic device of claim 37, wherein the second sound detector is operated for at least a predetermined amount of time after the determination by the first sound detector that the sound input corresponds to the predetermined type.
  - 40. The electronic device of claim 18, wherein the predetermined type is a human voice and the predetermined content is one or more words.
  - 41. The electronic device of claim 18, wherein the predetermined content is one or more predetermined phonemes.
  - 42. The electronic device of claim 41, wherein the one or more predetermined phonemes constitute at least one word.
  - 43. The electronic device of claim 18, the processing unit further configured to:
    - store at least a portion of the sound input in memory; and
    - provide the portion of the sound input to the speech-based service once the speech-based service is initiated.
  - 44. The electronic device of claim 18, the processing unit further configured to determine whether the sound input corresponds to a voice of a particular user.
  - 45. The electronic device of claim 44, wherein the speech-based service is initiated upon a determination that the sound input includes the predetermined content and that the sound input corresponds to the voice of the particular user.
  - 46. The electronic device of claim 45, wherein the speech-based service is initiated in a limited access mode upon a determination that the sound input includes the predetermined content and that the sound input does not correspond to the voice of the particular user.
  - 47. The electronic device of claim 45, the processing unit further configured to:
    - upon a determination that the sound input corresponds to the voice of the particular user, output a voice prompt including a name of the particular user.
  - 48. The electronic device of claim 18, the processing unit further configured to:
    - determine whether the electronic device is in a predetermined orientation; and
    - upon a determination that the electronic device is in the predetermined orientation, activate a predetermined mode of the voice trigger.
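Claims 44 to 47 add a speaker check on top of the content check: full initiation when the voice matches a particular user, a limited access mode when it does not, and a voice prompt that includes the user's name. One way to read that decision logic is sketched below. The enum, the function names, and the prompt wording are hypothetical; the claims do not define what the limited access mode permits or how the voice is matched.

```python
from enum import Enum


class Access(Enum):
    NONE = "none"        # service not initiated
    FULL = "full"        # trigger content present and voice matches the user
    LIMITED = "limited"  # trigger content present, voice does not match


def decide_access(has_trigger_content: bool, matches_particular_user: bool) -> Access:
    # Without the predetermined content, nothing is initiated at all.
    if not has_trigger_content:
        return Access.NONE
    return Access.FULL if matches_particular_user else Access.LIMITED


def voice_prompt(user_name: str, access: Access) -> str:
    # Hypothetical wording; the claim only requires that the prompt
    # include the name of the particular user when the voice matches.
    if access is Access.FULL:
        return f"Yes, {user_name}?"
    return "Yes?" if access is Access.LIMITED else ""
```

For example, `decide_access(True, False)` yields `Access.LIMITED`, corresponding to the case in claim 46 where the trigger content is present but the speaker is not the particular user.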

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Binder, Justin; Post, Samuel D.; Tackin, Onur; Gruber, Thomas R.
Primary Examiner(s): Hang, Vu B
Application Number: US14/175,864
Publication Number: US 20140222436A1
Time in Patent Office: 1,824 Days

CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/02   Feature extraction for spee...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 17/24   the user being prompted to ...

G10L 2015/223   Execution procedure of a sp...

G10L 21/16   Transforming into a non-vis...

G10L 25/51   for comparison or discrimin...

G10L 25/84   for discriminating voice fr...
