Evaluating transcriptions with a semantic parser

US 8,868,409 B1
Filed: 01/16/2014
Issued: 10/21/2014
Est. Priority Date: 01/16/2014
Status: Active Grant

Abstract

In some implementations, audio data for an utterance is provided over a network. At a client device and over the network, information is received that indicates candidate transcriptions for the utterance and semantic information for the candidate transcriptions. A semantic parser is used at the client device to evaluate each of at least a plurality of the candidate transcriptions. One of the candidate transcriptions is selected based on at least the received semantic information and the output of the semantic parser for the plurality of candidate transcriptions that are evaluated.
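
The selection step described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `Candidate` type, the `server_score` field (a single number standing in for the received semantic information), the `toy_parser` function, and the linear weighting are all hypothetical assumptions, since the abstract does not say how the two sources are combined.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """One candidate transcription plus a score received with it (hypothetical)."""
    text: str
    server_score: float  # assumed stand-in for the received semantic information


def select_transcription(
    candidates: Sequence[Candidate],
    local_parser: Callable[[str], float],
    server_weight: float = 0.5,
) -> Candidate:
    """Pick the candidate with the best blend of received and on-device scores."""
    if not candidates:
        raise ValueError("no candidate transcriptions received")

    def combined(c: Candidate) -> float:
        # Assumed combination rule: a simple weighted sum of the two scores.
        local_score = local_parser(c.text)
        return server_weight * c.server_score + (1.0 - server_weight) * local_score

    return max(candidates, key=combined)


def toy_parser(text: str) -> float:
    """Hypothetical on-device parser: 1.0 if the text starts with a command verb."""
    words = text.split()
    return 1.0 if words and words[0] in {"call", "text", "open"} else 0.0
```

With the candidates `Candidate("cold jim", 0.6)` and `Candidate("call jim", 0.5)`, the on-device score outweighs the slightly higher received score of the first candidate, so `select_transcription` returns the "call jim" candidate.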

25 Claims

1. A method performed by data processing apparatus, the method comprising:
- providing, over a network, audio data for an utterance;
  
  receiving, at a client device and over the network, information that indicates (i) candidate transcriptions for the utterance and (ii) semantic information for the candidate transcriptions;
  
  using a semantic parser at the client device to evaluate each of at least a plurality of the candidate transcriptions; and
  
  selecting one of the plurality of the candidate transcriptions based on at least (i) the received semantic information and (ii) the output of the semantic parser at the client device for the plurality of candidate transcriptions that are evaluated.
  - 2. The method of claim 1, wherein receiving information that indicates semantic information for the candidate transcription comprises receiving information indicating first semantic scores for one or more terms in the candidate transcriptions;
    - wherein using the semantic parser at the client device comprises determining second semantic scores for one or more terms in the candidate transcriptions; and
      
      wherein selecting one of the plurality of candidate transcriptions comprises selecting one of the plurality of candidate transcriptions based on the first semantic scores and the second semantic scores.
  - 3. The method of claim 2, wherein the first semantic scores and the second semantic scores indicate likelihoods that terms belong to one or more semantic categories.
  - 4. The method of claim 3, wherein the first semantic scores indicate likelihoods that terms belong to first semantic categories;
    - and wherein the second semantic scores indicate likelihoods that terms belong to second semantic categories, including one or more semantic categories not included in the first semantic categories.
  - 5. The method of claim 2, wherein determining the second semantic scores comprises determining the second semantic scores at the client device using information that is available at the client device and is not available to a device that generated the first semantic scores.
  - 6. The method of claim 1, wherein using a semantic parser to evaluate each of at least a plurality of the candidate transcriptions comprises receiving a semantic parser confidence score for each of the plurality of candidate transcriptions;
    - and wherein selecting one of the plurality of the candidate transcriptions comprises selecting one of the plurality of candidate transcriptions based on the semantic parser confidence scores.
  - 7. The method of claim 1, wherein selecting one of the plurality of the candidate transcriptions comprises:
    - determining, based on output of the semantic parser for the plurality of candidate transcriptions that are evaluated, that one of the plurality of candidate transcriptions represents a voice action; and
      
      selecting the candidate transcription determined to represent a voice action.
  - 8. The method of claim 7, further comprising:
    - determining a type of action indicated by the selected candidate transcription;
      
      determining likely semantic categories for words in the selected candidate transcription; and
      
      performing an action according to the determined type of action and the likely semantic categories.
  - 9. The method of claim 1, wherein receiving the information that indicates multiple candidate transcriptions for the utterance comprises receiving information indicating a list of candidate transcriptions.
  - 10. The method of claim 1, wherein receiving the information that indicates multiple candidate transcriptions for the utterance comprises receiving a speech recognition lattice indicating multiple possible transcriptions for the utterance.
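
Claims 2 through 4 describe per-term scores from two sources, where the second source may cover semantic categories the first does not. One way to picture that is the sketch below. It is an assumption-laden illustration, not the claimed method: the claims do not specify a combination rule, so taking the higher likelihood per category and averaging the best likelihood per term are arbitrary choices, and the names `merge_term_scores` and `candidate_score` are hypothetical.

```python
from typing import Dict, List

CategoryScores = Dict[str, float]       # category name -> likelihood in [0, 1]
TermScores = Dict[str, CategoryScores]  # term -> its category likelihoods


def merge_term_scores(first: TermScores, second: TermScores) -> TermScores:
    """Union the categories for each term, keeping the higher likelihood on overlap."""
    merged: TermScores = {}
    for term in first.keys() | second.keys():
        categories = dict(first.get(term, {}))
        for category, likelihood in second.get(term, {}).items():
            categories[category] = max(categories.get(category, 0.0), likelihood)
        merged[term] = categories
    return merged


def candidate_score(terms: List[str], scores: TermScores) -> float:
    """Average, over the terms, of each term's best category likelihood."""
    if not terms:
        return 0.0
    best = [max(scores.get(term, {}).values(), default=0.0) for term in terms]
    return sum(best) / len(terms)
```

For example, if the first scores are `{"jim": {"person": 0.4}}` and the second scores are `{"jim": {"contact": 0.9}}`, the merged entry for "jim" carries both categories, including the "contact" category that only the second source knows about.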

11. A non-transitory computer storage medium encoded with instructions that, when executed by a user device, cause the user device to perform operations comprising:
- providing, over a network, audio data for an utterance;
  
  receiving, at the user device and over the network, information that indicates (i) candidate transcriptions for the utterance and (ii) semantic information for the candidate transcriptions;
  
  using a semantic parser at the user device to evaluate each of at least a plurality of the candidate transcriptions; and
  
  selecting one of the plurality of the candidate transcriptions based on at least (i) the received semantic information and (ii) the output of the semantic parser at the user device for the plurality of candidate transcriptions that are evaluated.
  - 12. The non-transitory computer storage medium of claim 11, wherein receiving information that indicates semantic information for the candidate transcription comprises receiving information indicating first semantic scores for one or more terms in the candidate transcriptions;
    - wherein using the semantic parser at the user device comprises determining second semantic scores for one or more terms in the candidate transcriptions; and
      
      wherein selecting one of the plurality of candidate transcriptions comprises selecting one of the plurality of candidate transcriptions based on the first semantic scores and the second semantic scores.
  - 13. The non-transitory computer storage medium of claim 12, wherein the first semantic scores and the second semantic scores indicate likelihoods that terms belong to one or more semantic categories.
  - 14. The non-transitory computer storage medium of claim 13, wherein the first semantic scores indicate likelihoods that terms belong to first semantic categories;
    - and wherein the second semantic scores indicate likelihoods that terms belong to second semantic categories, including one or more semantic categories not included in the first semantic categories.
  - 15. The non-transitory computer storage medium of claim 12, wherein determining the second semantic scores comprises determining the second semantic scores at the user device using information that is available at the user device and is not available to a device that generated the first semantic scores.

16. A system comprising:
- a user device and one or more storage devices storing instructions that are operable, when executed by the user device, to cause the user device to perform operations comprising:
  
  providing, over a network, audio data for an utterance;
  
  receiving, at the user device and over the network, information that indicates (i) candidate transcriptions for the utterance and (ii) semantic information for the candidate transcriptions;
  
  using a semantic parser at the user device to evaluate each of at least a plurality of the candidate transcriptions; and
  
  selecting one of the plurality of the candidate transcriptions based on at least (i) the received semantic information and (ii) the output of the semantic parser at the user device for the plurality of candidate transcriptions that are evaluated.
  - 17. The system of claim 16, wherein receiving information that indicates semantic information for the candidate transcription comprises receiving information indicating first semantic scores for one or more terms in the candidate transcriptions;
    - wherein using the semantic parser at the user device comprises determining second semantic scores for one or more terms in the candidate transcriptions; and
      
      wherein selecting one of the plurality of candidate transcriptions comprises selecting one of the plurality of candidate transcriptions based on the first semantic scores and the second semantic scores.
  - 18. The system of claim 17, wherein the first semantic scores and the second semantic scores indicate likelihoods that terms belong to one or more semantic categories.
  - 19. The system of claim 18, wherein the first semantic scores indicate likelihoods that terms belong to first semantic categories;
    - and wherein the second semantic scores indicate likelihoods that terms belong to second semantic categories, including one or more semantic categories not included in the first semantic categories.
  - 20. The system of claim 17, wherein determining the second semantic scores comprises determining the second semantic scores at the user device using information that is available at the user device and is not available to a device that generated the first semantic scores.

21. A method performed by data processing apparatus, the method comprising:
- providing, from a client device to a server system over a network, audio data for an utterance;
  
  receiving, at the client device and from the server system over the network, information that indicates (i) candidate transcriptions for the utterance and (ii) semantic information comprising output of a first semantic parser for at least one of the candidate transcriptions;
  
  using a second semantic parser at the client device to evaluate each of at least a plurality of the candidate transcriptions; and
  
  selecting one of the plurality of the candidate transcriptions based on at least (i) the received semantic information comprising output of the first semantic parser for at least one of the candidate transcriptions and (ii) the output of the second semantic parser at the client device for the plurality of candidate transcriptions that are evaluated.
  - 22. The method of claim 21, wherein using the second semantic parser at the client device to evaluate each of at least a plurality of the candidate transcriptions comprises accessing information that is available to the client device and is not available to the server system.
  - 23. The method of claim 22, wherein accessing the information that is available to the client device and is not available to the server system comprises accessing calendar information, contact list information, or application list information that is not available to the server system.
  - 24. The method of claim 22, wherein accessing the information that is available to the client device and is not available to the server system comprises accessing information stored at the client device.
  - 25. The method of claim 22, wherein accessing the information that is available to the client device and is not available to the server system comprises accessing information that is remotely accessible by the client device and is not available to the server system.
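
Claims 22 through 25 turn on the second parser consulting information the server system cannot see, such as a contact list. A minimal sketch of that idea follows, under stated assumptions: the `ContactAwareParser` class and its word-matching rule are hypothetical and far simpler than a real semantic parser, and the contact list is passed in directly rather than read from any device API.

```python
from typing import Iterable, Set


class ContactAwareParser:
    """Hypothetical on-device parser that scores text against a local contact list."""

    def __init__(self, contacts: Iterable[str]) -> None:
        # Lowercase once so matching is case-insensitive.
        self._contacts: Set[str] = {name.lower() for name in contacts}

    def score(self, transcription: str) -> float:
        """Return the fraction of words that match a contact name (0.0 if empty)."""
        words = transcription.lower().split()
        if not words:
            return 0.0
        hits = sum(1 for word in words if word in self._contacts)
        return hits / len(words)
```

Given a contact list containing "Aisha", this parser scores "call aisha" above "call asia", a distinction a server without access to the contact list would have less basis for making.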

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Mengibar, Pedro J. Moreno; Biadsy, Fadi; Casado, Diego Melendo
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US14/157,020
Time in Patent Office: 278 Days
Field of Search: 704/235, 704/246, 704/250, 704/251, 704/252, 704/9, 704/10
US Class Current: 704/9
CPC Class Codes:
G06F 40/30   Semantic analysis
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
