Method for Voice Activation of a Software Agent from Standby Mode

US 20140214429A1
Filed: 01/10/2014
Published: 07/31/2014
Est. Priority Date: 01/25/2013
Status: Abandoned Application

Abstract

A method for voice activation of a software agent from a standby mode. In one embodiment, an audio recording (2) is buffered in an audio buffer (6) and at the same time, the audio recording is input to a secondary voice recognition process (7) which is economical in terms of energy and has an increased false positive rate. When a keyword is recognized, a primary voice recognition process (8) is activated from an inactive state, which converts the audio buffer to text and inputs it to a dialog system (9) which analyzes as to whether there is a relevant question made by the user. If this is the case, the user gets an acoustic reply (3), and if this is not the case, the dialog system and the primary voice recognition process immediately return to the inactive state and transfer the control to the secondary voice recognition process.
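The control flow in the abstract can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the class name `StandbyListener`, the three callables, and the use of strings as stand-ins for audio frames are all assumptions made for the example. It also simplifies by transcribing only the buffered content (no subsequent live transmission) and by clearing the buffer after each hand-off.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class StandbyListener:
    """Two-stage voice activation sketch. All names here are hypothetical."""

    def __init__(
        self,
        keyword_spotter: Callable[[str], bool],      # stands in for secondary process (7)
        transcribe: Callable[[List[str]], str],      # stands in for primary process (8)
        dialog: Callable[[str], Optional[str]],      # stands in for dialog system (9)
        buffer_frames: int = 50,
    ) -> None:
        # A bounded deque drops the oldest frame automatically, so the
        # buffer always holds only the most recent past.
        self.buffer: Deque[str] = deque(maxlen=buffer_frames)
        self.keyword_spotter = keyword_spotter
        self.transcribe = transcribe
        self.dialog = dialog
        self.primary_activations = 0

    def feed(self, frame: str) -> Optional[str]:
        """Buffer one frame; wake the primary recognizer only on a keyword hit."""
        self.buffer.append(frame)
        if not self.keyword_spotter(frame):
            return None  # primary recognizer and dialog system stay inactive

        # Keyword (or false positive) detected: activate the expensive stage.
        self.primary_activations += 1
        text = self.transcribe(list(self.buffer))
        reply = self.dialog(text)  # None means "no relevant content"

        # Assumption of this sketch: the buffer is emptied after hand-off.
        self.buffer.clear()
        return reply  # control returns to the keyword spotter either way
```

When `dialog` returns `None`, the call models the false-positive path: the cheap keyword spotter fired, the primary recognizer found nothing relevant, and the system falls back to standby without contacting the user.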

362 Citations

20 Claims

1. A method for voice activation of a software agent, in particular of a personal assistant system from a standby mode, comprising:
- providing a microphone (2), an output device (3, 4), an audio buffer (6), and a hardware infrastructure which is able to execute a primary voice recognition process (8), a secondary voice recognition process (7) and a dialog system (9),
- continually buffering an audio recording (11) picked up by said microphone (2) in said audio buffer (6), so that said audio buffer (6) always contains the audio recording (11) of the most recent past, and
- inputting said audio recording (11) picked up by said microphone (2) to said secondary voice recognition process (7), which, on recognizing a keyword (18) or a phrase from a previously defined keyword- and phrase-catalog starts or activates (12) from an inactive state said primary voice recognition process (8) which converts the entire or most recent content (21, 17) of said audio buffer (6) as well as the subsequent live transmission (22) to text (13) and inputs this text (13) to said dialog system (9) which likewise starts or is activated (20) from an inactive state and analyzes the content of said text (13) as to whether it contains a question, a message or a request made by the user to said software agent, in which case, if it is answered in the affirmative, said dialog system (9) triggers an appropriate action or generates an appropriate reply (14) and contacts the user via said output device (3, 4) and otherwise, if said text (13) does not contain any relevant or any evaluable content, said dialog system (9) and at the latest then also said primary voice recognition process (8) return to the inactive state or terminate and again return the control to said secondary voice recognition process (7),
- whereby the interplay between said secondary voice recognition process (7) and said primary voice recognition process (8) helps to maximize the idle time of said primary voice recognition process (8) while the user still can ask said software agent complex questions in standby mode and he gets instant and final replies or actions without further interposed interaction steps such that the user has the impression that said software agent listens with the same attention in the standby mode as during regular operation.
  - 2. The method of claim 1, further comprising scanning said audio buffer (6) backwards, beginning at the position in time of the recognized keyword (18) or phrase until a period is found which can be interpreted as a speech pause (16), the most recent content (17) of said audio buffer (6), beginning at the position with the recognized speech pause (16), being handed over to said primary voice recognition process (8).
  - 3. The method of claim 2 wherein said primary voice recognition process (8) remains in the inactive state, if no speech pause (16) is found in said audio buffer (6) in a range beginning at said position in time of the recognized keyword (18) or phrase up to the oldest entries.
  - 4. The method of claim 1 wherein after activation (12) via a keyword (18) or phrase, said primary voice recognition process (8) is executed with high priority and completed after a short time (23, 24), whereby said audio buffer (6) is promptly empty in order to again process live audio data (22) as soon as possible, which minimizes the time the user has to wait for the reply (14) or action.
  - 5. The method of claim 1 wherein said secondary voice recognition process (7) has an increased false positive rate on recognition of keywords (18) and/or phrases, whereby said secondary voice recognition process (7) can be implemented in an especially energy-saving design, correcting every false positive error of said secondary voice recognition process (7) by said primary voice recognition process (8).
  - 6. The method of claim 1 wherein said secondary voice recognition process (7) a) runs as a software on a processor operating with low power consumption, or b) is executed on a digital signal processor, which is optimized for low power consumption, or c) is implemented as a FPGA or ASIC, which is optimized for low power consumption, or d) is implemented as a hardware circuit (25), which is optimized for low power consumption.
  - 7. The method of claim 1 wherein said primary voice recognition process (8) and said secondary voice recognition process (7) run on the same single core or multi-core processor (27), the secondary voice recognition process (7) running in a resource-saving mode of operation, in particular, with low power consumption.
  - 8. The method of claim 1 wherein said primary voice recognition process (8) and said dialog system (9) run on an external server (28) or on a server network, the entire or the most recent content (21, 17) of said audio buffer (6) being transferred via a network (29) and/or radio network to said server (28) or server network.
  - 9. The method of claim 8, further comprising switching said software agent to an anticipatory standby mode as soon as the presence of the user is detected by means of a sensor, while the entire or the most recent content (21, 17) of said audio buffer (6) and/or the live transmission (22) of said audio recording (11) is continually transferred via said network (29) to said external server (28) or server network and buffered there, whereby, in case of voice activation (12) said primary voice recognition process (8) can access the buffered audio recording (11) almost latency-free.
  - 10. The method of claim 9 wherein said sensor is a user interface for user input and/or an acceleration- and/or position-sensor measuring movement or changes in position and/or a light sensor measuring changes in the brightness and/or a satellite navigation sensor measuring changes in position and/or a camera for face recognition, whereby by means of said sensor the user's activity is monitored and hence the user's presence is detected.
  - 11. The method of claim 1, further comprising intensifying the monitoring of said audio recording (11) for keywords (18) and/or phrases by said secondary voice recognition process (7) as soon as the presence of the user is detected by means of a sensor, whereby said software agent switches to an anticipatory standby mode and is prepared for user input.
  - 12. The method of claim 11 wherein said sensor is a user interface for user input and/or an acceleration- and/or position-sensor measuring movement or changes in position and/or a light sensor measuring changes in the brightness and/or a satellite navigation sensor measuring changes in position and/or a camera for face recognition, whereby by means of said sensor the user's activity is monitored and hence the user's presence is detected.
  - 13. The method of claim 1 wherein said keyword- and phrase-catalog can be modified, expanded and/or reduced by the user by means of a user interface (4).
  - 14. The method of claim 1 wherein said keyword- and phrase-catalog contains question words, questioning phrases, requests and/or commands.
  - 15. The method of claim 1 wherein said keyword- and phrase-catalog contains nouns relating to topics on which information is available in the database of said dialog system.
  - 16. The method of claim 1 wherein said keyword- and phrase-catalog contains product names, nicknames and/or generic terms.
  - 17. The method of claim 1, further comprising outputting an optical, acoustic and/or haptic signal to the user by means of an output device (3, 4) as soon as a keyword (18) or a phrase is recognized by said secondary voice recognition process (7).
  - 18. The method of claim 17, further comprising outputting a further distinguishable optical, acoustic and/or haptic signal to the user by means of said output device (3, 4) in case said audio buffer (6) converted by said primary voice recognition process (8) and/or said text (13) analyzed by said dialog system (9) does not contain any relevant or any evaluable content.
  - 19. The method of claim 1 wherein said primary voice recognition process (8) can distinguish different speakers by their voice by means of an acoustic model, and wherein said secondary voice recognition process (7) cannot distinguish different speakers, whereby said secondary voice recognition process (7) triggers the execution of said primary voice recognition process (8) as soon as a keyword (18) or a phrase from any speaker is detected by said secondary voice recognition process (7), said primary voice recognition process (8) establishing from the speaker's voice whether he/she is entitled to utilize said software agent by means of said acoustic model and if there is no entitlement, said primary voice recognition process (8) is terminating or returning to the inactive state, and again passing on the control to said secondary voice recognition process (7).
  - 20. The method of claim 1 wherein in case said dialog system (9) is not competent for a question, message or request in said audio recording (11), converted to text (13) by said primary voice recognition process (8), said dialog system (9) stores the context and/or the topic and/or the keywords (18) or phrases on a storage means so that the stored information is taken into consideration on one of the subsequent reactivations of said dialog system (9).
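
Claims 2 and 3 describe scanning the audio buffer backwards from the recognized keyword until a speech pause is found, and keeping the primary recognizer inactive when no pause exists. A minimal sketch of such a scan is shown below. The function name, the per-frame energy representation, the threshold, and the minimum pause length are all assumptions chosen for illustration; the claims do not specify how a pause is detected. The sketch returns the index of the first frame after the pause, which is one possible reading of where the handed-over content begins.

```python
from typing import Optional, Sequence


def find_utterance_start(
    energies: Sequence[float],
    keyword_index: int,
    silence_threshold: float = 0.1,   # hypothetical energy level treated as silence
    min_pause_frames: int = 3,        # hypothetical minimum pause length in frames
) -> Optional[int]:
    """Scan backwards from the keyword for a run of quiet frames.

    Returns the index of the first frame after the nearest pause, or None
    if no pause is found between the keyword and the oldest buffer entry.
    """
    if not 0 <= keyword_index < len(energies):
        raise ValueError("keyword_index must point inside the buffer")

    quiet_run = 0
    for i in range(keyword_index, -1, -1):
        if energies[i] < silence_threshold:
            quiet_run += 1
            if quiet_run >= min_pause_frames:
                # The quiet run covers frames i .. i + quiet_run - 1.
                return i + quiet_run
        else:
            quiet_run = 0  # speech interrupts the run, start counting again
    return None  # no pause found: the primary recognizer would stay inactive
```

With a buffer whose frame energies are `[0.5, 0.6, 0.0, 0.0, 0.0, 0.7, 0.8, 0.9]` and the keyword at index 7, the scan finds the three quiet frames at indices 2 to 4 and returns 5, so frames 5 onward would be handed to the primary recognizer.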

Current Assignee: Inodyn Newmedia GmbH
Original Assignee: Inodyn Newmedia GmbH
Inventors: Pantel, Lothar
Application Number: US14/152,780
Publication Number: US 20140214429A1
US Class Current: 704/275
CPC Class Codes:
G06F 1/3206   Monitoring of events, devic...
G06F 3/167   Audio in a user interface, ...
G10L 13/00   Speech synthesis; Text to s...
G10L 15/22   Procedures used during a sp...
G10L 15/32   Multiple recognisers used i...
G10L 2015/088   Word spotting
G10L 21/16   Transforming into a non-vis...
H04W 52/0225   using monitoring of externa...
Y02D 30/70   in wireless communication n...
