Architecture for multi-domain utterance processing

US 9,070,366 B1
Filed: 12/19/2012
Issued: 06/30/2015
Est. Priority Date: 12/19/2012
Status: Active Grant

Abstract

Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
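
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the `Interpretation` type, the keyword-based `music_nlu` and `shopping_nlu` functions, and the score values are all hypothetical stand-ins, assumed here only to show several domain-specific modules scoring the same transcription and the highest-scoring interpretation being chosen.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Interpretation:
    domain: str
    intent: str
    score: float  # higher means a better match to the requested action


# Hypothetical keyword-based stand-ins for per-domain NLU modules.
def music_nlu(text: str) -> list[Interpretation]:
    score = 0.9 if "play" in text.lower().split() else 0.1
    return [Interpretation("music", "play_music", score)]


def shopping_nlu(text: str) -> list[Interpretation]:
    score = 0.8 if "buy" in text.lower().split() else 0.2
    return [Interpretation("shopping", "purchase_item", score)]


DomainModule = Callable[[str], list[Interpretation]]


def select_interpretation(
    text: str, modules: Sequence[DomainModule] = (music_nlu, shopping_nlu)
) -> Interpretation:
    # Every domain module sees the same transcription; the candidates are
    # pooled and the one with the highest score is selected.
    candidates = [interp for module in modules for interp in module(text)]
    return max(candidates, key=lambda interp: interp.score)


if __name__ == "__main__":
    print(select_interpretation("play some jazz"))
    print(select_interpretation("buy a jazz album"))
```

In this sketch the scores from different domains are compared directly, which assumes they are on a common scale; a real system would likely need some form of score normalization across domains.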

347 Citations

29 Claims

1. A system comprising:
- a computer-readable memory storing executable instructions; and
  
  one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
  
  receive data regarding an utterance of a user;
  
  generate a transcription of the utterance using automatic speech recognition;
  
  process the transcription with a first natural language understanding (“NLU”) module to produce a first plurality of interpretations of a requested action in the transcription, wherein the first NLU module is associated with a first domain of actions, and wherein at least a first interpretation of the first plurality of interpretations is associated with a first score indicative of whether the first interpretation corresponds to the requested action in the transcription;
  
  process the transcription with a second NLU module to produce a second plurality of interpretations of the requested action in the transcription, wherein the second NLU module is associated with a second domain of actions, and wherein at least a second interpretation of the second plurality of interpretations is associated with a second score indicative of whether the second interpretation corresponds to the requested action in the transcription;
  
  select, from the first plurality of interpretations or the second plurality of interpretations, a selected interpretation based at least in part on a score associated with the selected interpretation, wherein the score corresponds to one of the first score or the second score; and
  
  generate a response based at least partly on the selected interpretation.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - process the transcription with the first NLU module and the second NLU module in parallel.
  - 3. The system of claim 1, wherein the one or more processors are programmed by the executable instructions to process the transcription in the first NLU module by performing named entity recognition on the transcription to recognize at least one named entity in the transcription, wherein the at least one named entity is associated with an actionable intent of the user in making the utterance.
  - 4. The system of claim 1, wherein the score associated with the selected interpretation is determined based at least partly on previously received data regarding a previous utterance.
  - 5. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - determine a hint based at least partly on previously received data regarding a previous utterance, wherein the score associated with the selected interpretation is determined based at least partly on the hint.
  - 6. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - determine a hint based at least partly on a previously generated response, wherein the score associated with the selected interpretation is determined based at least partly on the hint.
  - 7. The system of claim 1, wherein the first domain comprises one of:
    - phone dialing, shopping, getting directions, playing music, or performing a search.
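
Claims 2 and 4 through 6 describe running the domain modules in parallel and letting a hint influence the score. The following is a rough sketch under stated assumptions: the `Scored` tuple, the two toy modules, the `hint_domain` parameter, and the additive `boost` are all hypothetical, and a thread pool is only one of several ways to run modules concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple, Optional, Sequence


class Scored(NamedTuple):
    domain: str
    intent: str
    score: float


# Hypothetical domain modules; real modules would do far more than keyword checks.
def phone_nlu(text: str) -> list[Scored]:
    score = 0.9 if "call" in text.lower().split() else 0.5
    return [Scored("phone", "dial_contact", score)]


def directions_nlu(text: str) -> list[Scored]:
    score = 0.9 if "directions" in text.lower().split() else 0.55
    return [Scored("directions", "navigate_to", score)]


def interpret_in_parallel(
    text: str,
    modules: Sequence[Callable[[str], list[Scored]]],
    hint_domain: Optional[str] = None,  # assumed to come from a previous utterance or response
    boost: float = 0.2,  # arbitrary illustrative value
) -> Scored:
    # Submit the same transcription to every domain module concurrently.
    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        per_domain = list(pool.map(lambda module: module(text), modules))
    candidates = [c for results in per_domain for c in results]
    # A hint raises the score of interpretations from the hinted domain.
    if hint_domain is not None:
        candidates = [
            c._replace(score=c.score + boost) if c.domain == hint_domain else c
            for c in candidates
        ]
    return max(candidates, key=lambda c: c.score)


if __name__ == "__main__":
    modules = [phone_nlu, directions_nlu]
    print(interpret_in_parallel("pizza place downtown", modules))
    print(interpret_in_parallel("pizza place downtown", modules, hint_domain="phone"))
```

For the ambiguous request "pizza place downtown", the sketch picks the directions domain without a hint and the phone domain when the hint says the previous interaction concerned a phone call.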

8. A computer-implemented method comprising:
- under control of one or more computing devices configured with specific computer-executable instructions, receiving text corresponding to a request of a user;
  
  processing the text in a first natural language understanding (“NLU”) module to generate a first interpretation of the transcription, and in a second NLU module to generate a second interpretation of the transcription, wherein first NLU module is associated with a first domain, and the second NLU module is associated with a second domain, wherein the first interpretation is associated with a first score indicative of whether the first interpretation corresponds to an action requested by the user, and wherein the second interpretation is associated with a second score indicative of whether the second interpretation corresponds to the action requested by the user;
  
  selecting the first interpretation based at least partly on the first score and the second score; and
  
  generating a response based at least partly on the first interpretation.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18):
  - 9. The computer-implemented method of claim 8, wherein processing the transcription in the first NLU module is performed in parallel with processing the transcription in the second NLU module.
  - 10. The computer-implemented method of claim 8, wherein processing the transcription in the second NLU module is initiated before completion of processing by the first NLU module.
  - 11. The computer-implemented method of claim 8, wherein processing the text in at least one of the first NLU module and the second NLU module is based at least partly on a previous response.
  - 12. The computer-implemented method of claim 8, further comprising determining a hint based at least partly on a previously received text, wherein processing the text in the first and second NLU modules is based at least partly on the hint.
  - 13. The computer-implemented method of claim 8, further comprising determining a hint based at least partly on a previous response, wherein processing the text in the first and second NLU modules is based at least partly on the hint.
  - 14. The computer-implemented method of claim 8, wherein processing the text in the first NLU module comprises performing named entity recognition on the text to recognize one or more named entities, and wherein the one or more named entities are associated with an actionable intent of the user in making the request.
  - 15. The computer-implemented method of claim 8, wherein processing the text in the first NLU module comprises generating a plurality of interpretations, the plurality of interpretations comprising the first interpretation.
  - 16. The computer-implemented method of claim 8, wherein the text is generated by an automatic speech recognition module based at least partly on a user utterance.
  - 17. The computer-implemented method of claim 8, further comprising ranking the first interpretation and the second interpretation based at least partly the first score and the second score.
  - 18. The computer-implemented method of claim 8, wherein generating the response comprises one of:
    - generating a text-to-speech presentation, generating an executable command, initiating a data stream to the client device, or performing an action.
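
Claims 17 and 18 cover ranking the interpretations by score and then generating one of several kinds of response. A small sketch of that flow follows; the `Interpretation` type, the `RESPONSE_KIND` mapping from domain to response kind, and the dictionary-shaped response are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    domain: str
    intent: str
    score: float


def rank_interpretations(interps: list[Interpretation]) -> list[Interpretation]:
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(interps, key=lambda i: i.score, reverse=True)


# Hypothetical mapping from a domain to the kind of response it produces.
RESPONSE_KIND = {
    "music": "data_stream",
    "phone": "executable_command",
}


def generate_response(top: Interpretation) -> dict:
    # Domains without a specific mapping fall back to a spoken reply.
    kind = RESPONSE_KIND.get(top.domain, "text_to_speech")
    return {"kind": kind, "intent": top.intent}


if __name__ == "__main__":
    ranked = rank_interpretations(
        [
            Interpretation("shopping", "purchase_item", 0.3),
            Interpretation("music", "play_music", 0.8),
        ]
    )
    print(ranked[0])
    print(generate_response(ranked[0]))
```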

19. One or more non-transitory computer readable media comprising executable code that, when executed, cause one or more computing devices to perform a process comprising:
- receiving text corresponding to a request of a user;
  
  processing the text in a first natural language understanding (“NLU”) module to generate a first interpretation of the transcription, and in a second NLU module to generate a second interpretation of the transcription, wherein first NLU module is associated with a first domain, and the second NLU module is associated with a second domain, and wherein the first interpretation is associated with a first score indicative of whether the first interpretation corresponds to an action requested by the user, and the second interpretation is associated with a second score indicative of whether the second interpretation corresponds to the action requested by the user;
  
  selecting the first interpretation based at least partly on the first score and the second score; and
  
  generating a response based at least partly on the first interpretation.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29):
  - 20. The one or more non-transitory computer readable media of claim 19, wherein processing the transcription in the first NLU module is performed in parallel with processing the transcription in the second NLU modules in parallel.
  - 21. The one or more non-transitory computer readable media of claim 19, wherein processing of the transcription in the second NLU module is initiated before completion of processing by the first NLU module.
  - 22. The one or more non-transitory computer readable media of claim 19, initiating wherein processing the text in at least one of the first NLU module and the second NLU module is based at least partly on a previous response.
  - 23. The one or more non-transitory computer readable media of claim 19, the process further comprising determining a hint based at least partly on a previously received text, wherein processing the text in at least one of the first NLU module and the second NLU module is based at least partly on the hint.
  - 24. The one or more non-transitory computer readable media of claim 19, the process further comprising determining a hint based at least partly on a previous response, wherein processing the text in at least one of the first NLU module and the second NLU module is based at least partly on the hint.
  - 25. The one or more non-transitory computer readable media of claim 19, wherein processing the text in the first NLU module comprises performing named entity recognition on the text to recognize one or more named entities, and wherein the one or more named entities are associated with an actionable intent of the user in making the request.
  - 26. The one or more non-transitory computer readable media of claim 19, wherein processing the text in the first NLU module comprises generating a plurality of interpretations, the plurality of interpretations comprising the first interpretation.
  - 27. The one or more non-transitory computer readable media of claim 19, wherein the text is generated by an automatic speech recognition module based at least partly on a user utterance.
  - 28. The one or more non-transitory computer readable media of claim 19, the process further comprising ranking the first interpretation and the second interpretation based at least partly the first score and the second score.
  - 29. The one or more non-transitory computer readable media of claim 19, wherein generating the response comprises one of:
    - generating a text-to-speech presentation, generating an executable command, initiating a data stream to the client device, or performing an action.
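
Claims 3, 14, and 25 recite named entity recognition inside a domain module. The sketch below uses a simple gazetteer lookup as a stand-in; the claims do not specify a recognition technique, and the `GAZETTEER` contents, the entity labels, and the scoring rule are all hypothetical.

```python
# Hypothetical gazetteer for a music domain: phrase -> entity label.
GAZETTEER = {
    "miles davis": "ARTIST",
    "kind of blue": "ALBUM",
}


def recognize_entities(text: str) -> list[tuple[str, str]]:
    """Return (surface form, label) pairs in order of appearance."""
    lowered = text.lower()
    found = []
    for phrase, label in GAZETTEER.items():
        start = lowered.find(phrase)
        if start != -1:
            # Keep the original casing of the matched span.
            found.append((start, text[start:start + len(phrase)], label))
    found.sort()  # order by position in the text
    return [(surface, label) for _, surface, label in found]


def music_interpretation(text: str) -> dict:
    entities = recognize_entities(text)
    # Assumed scoring rule: each recognized entity makes the music intent
    # more plausible, capped at 1.0.
    score = min(1.0, 0.4 + 0.3 * len(entities))
    return {
        "domain": "music",
        "intent": "play_music",
        "entities": entities,
        "score": score,
    }


if __name__ == "__main__":
    print(music_interpretation("Play Kind of Blue by Miles Davis"))
```

Tying the score to recognized entities is one plausible way for entities to be "associated with an actionable intent"; other designs could use the entities only to fill slots in the interpretation.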

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Mathias, Lambert; Kiss, Imre Attila; Thomas, Ryan Paul; Shi, Ying; Deramat, Frederic Johan Georges
Primary Examiner(s)
PULLIAS, JESSE SCOTT

Application Number

US13/720,909
Time in Patent Office

923 Days
Field of Search

704/9, 704/231-257, 704/270-275
US Class Current

1/1
CPC Class Codes

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/295   Named entity recognition

G06F 40/35   Discourse or dialogue repre...

G06F 40/40   Processing or translation o...

G06F 40/56   Natural language generation

G10L 13/08   Text analysis or generation...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
