Iteratively learning coreference embeddings of noun phrases using feature representations that include distributed word representations of the noun phrases

US 9,514,098 B1
Filed: 12/26/2013
Issued: 12/06/2016
Est. Priority Date: 12/09/2013
Status: Active Grant

Abstract

Methods and apparatus related to determining coreference resolution using distributed word representations. Distributed word representations, indicative of syntactic and semantic features, may be identified for one or more noun phrases. For each of the one or more noun phrases, a referring feature representation and an antecedent feature representation may be determined, where the referring feature representation includes the distributed word representation, and the antecedent feature representation includes the distributed word representation augmented by one or more antecedent features. In some implementations the referring feature representation may be augmented by one or more referring features. Coreference embeddings of the referring and antecedent feature representations of the one or more noun phrases may be learned. Distance measures between two noun phrases may be determined based on the coreference embeddings.
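
The two feature representations described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy three-dimensional word vectors, the averaging rule used to combine the words of a noun phrase, and the function names. The only points taken from the text are that the referring representation includes the distributed word representation, and that the antecedent representation is that representation augmented by antecedent features (here, a parse tree distance), so the two vectors differ in length.

```python
from typing import Dict, List

# Hypothetical toy word vectors; real distributed word representations
# would come from a separately trained model.
WORD_VECTORS: Dict[str, List[float]] = {
    "barack": [0.9, 0.1, 0.0],
    "obama": [0.8, 0.2, 0.1],
    "he": [0.7, 0.0, 0.2],
}


def phrase_vector(phrase: str) -> List[float]:
    """Average the word vectors of a noun phrase (an assumed composition rule).

    Raises KeyError for words missing from the toy vocabulary.
    """
    vectors = [WORD_VECTORS[word] for word in phrase.lower().split()]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def referring_representation(phrase: str) -> List[float]:
    """m-dimensional vector: the distributed word representation alone."""
    return phrase_vector(phrase)


def antecedent_representation(phrase: str, parse_tree_distance: int) -> List[float]:
    """n-dimensional vector: the word representation plus one antecedent feature."""
    return phrase_vector(phrase) + [float(parse_tree_distance)]
```

With these toy vectors the referring representation has m = 3 entries and the antecedent representation has n = 4, which is why a later embedding step into a common space is needed before the two can be compared.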

18 Claims

1. A computer implemented method useful for modifying a search query issued by a client device, comprising:
- identifying, by one or more computer systems, distributed word representations for a plurality of noun phrases, the distributed word representations indicative of syntactic and semantic features of the noun phrases;
  
  determining, by one or more of the computer systems for each of one or more of the noun phrases and based on labeled data, at least one training pair of a referring feature representation and an antecedent feature representation, wherein:
  
  the referring feature representation for the at least one training pair for a given noun phrase of the one or more noun phrases includes the distributed word representation for the given noun phrase, and
  
  the antecedent feature representation for the at least one training pair for the given noun phrase includes the distributed word representation for the given noun phrase augmented by one or more antecedent features, wherein the one or more antecedent features include a parse tree distance for the given noun phrase as a candidate antecedent noun phrase in the labeled data, the parse tree distance being a parse tree based distance between the given noun phrase as the candidate antecedent noun phrase and a corresponding referring noun phrase;
  
  wherein the referring feature representations are m-dimensional space vectors, the antecedent feature representations are n-dimensional space vectors, and wherein the m-dimensional space vectors vary in length from the n-dimensional space vectors;
  
  learning, by one or more of the computer systems, coreference embeddings of the referring and antecedent feature representations of the noun phrases, the learning comprising iteratively embedding the m-dimensional space vectors and the n-dimensional space vectors into a common k-dimensional space;
  
  identifying, by one or more of the computer systems after the learning of the coreference embeddings, a first text segment and a second text segment associated with the first text segment, wherein the second text segment is a search query issued by a client device of a user;
  
  identifying, by one or more of the computer systems in the first text segment, an occurrence of one or more candidate antecedent noun phrases;
  
  identifying, by one or more of the computer systems in the second text segment, an occurrence of the given noun phrase;
  
  determining, by one or more of the computer systems for the given noun phrase, distance measures, in the common k-dimensional space, between the given noun phrase and the one or more candidate antecedent noun phrases based on inner products of the coreference embeddings in the common k-dimensional space;
  
  determining, by one or more of the computer systems, for a candidate noun phrase of the candidate antecedent noun phrases, a score for the candidate noun phrase as an antecedent for the given noun phrase based on the distance measure between the given noun phrase and the candidate noun phrase;
  
  selecting, by one or more of the computer systems, the candidate noun phrase as the antecedent for the given noun phrase based on the determined score;
  
  modifying, by one or more of the computer systems, the search query issued by the client device, wherein modifying the search query comprises replacing the given noun phrase with the selected candidate noun phrase in response to selecting the candidate noun phrase as the antecedent for the given noun phrase; and
  
  providing, by one or more of the computer systems in response to the search query issued by the client device, search results that are responsive to the modified query that replaces the given noun phrase with the selected candidate noun phrase.
- 2. The method of claim 1, wherein the one or more antecedent features further include one or more additional features indicative of one or more of a type of entity, a type of mention, number of words in the noun phrase, and a gender associated with the noun phrase.
- 3. The method of claim 1, wherein the referring feature representation is augmented with at least one referring feature, wherein the at least one referring feature is indicative of semantic features of the given noun phrase as a referring noun phrase.
- 4. The method of claim 3, wherein the referring feature is indicative of one or more of a type of entity, a type of mention, number of words in the noun phrase, and a gender associated with the noun phrase.
- 5. The method of claim 1, wherein the antecedent feature representation for the given noun phrase includes the referring feature representation for the given noun phrase augmented by the one or more antecedent features.
- 6. The method of claim 1, wherein identifying the distributed word representations for each of one or more noun phrases further includes:
  - identifying a language for the one or more noun phrases; and
  - determining the distributed word representations based on the language.
- 7. The method of claim 1, wherein learning the coreference embeddings is based on optimizing a loss function, the loss function indicative of a number of incorrect candidate antecedent noun phrases for the given noun phrase.
- 8. The method of claim 1, wherein the first text segment is a prior search query issued by the client device of the user prior to the search query.
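
The learning step of claim 1 (iteratively embedding m-dimensional and n-dimensional vectors into a common k-dimensional space) and the loss of claim 7 can be sketched as follows. This is a rough illustration under stated assumptions, not the patented procedure: it assumes two linear maps `A` (k by m) and `B` (k by n), scores a pair by the inner product of the embedded vectors, and swaps in a margin ranking (hinge) loss with plain stochastic gradient descent as a stand-in for a loss "indicative of a number of incorrect candidate antecedent noun phrases". All names, hyperparameters, and the toy data layout are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
# One labeled example: (referring vector, correct antecedent, incorrect antecedents).
Example = Tuple[Vector, Vector, List[Vector]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def score(A: Matrix, B: Matrix, referring: Vector, antecedent: Vector) -> float:
    """Inner product of the two vectors after embedding into the common k-dim space."""
    return dot(matvec(A, referring), matvec(B, antecedent))


def train(examples: List[Example], m: int, n: int, k: int,
          epochs: int = 200, lr: float = 0.05, margin: float = 1.0,
          seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Learn A (k x m) and B (k x n) with an assumed margin ranking loss."""
    rng = random.Random(seed)
    A = [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(k)]
    B = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(k)]
    for _ in range(epochs):
        for referring, pos, negatives in examples:
            for neg in negatives:
                loss = margin - score(A, B, referring, pos) + score(A, B, referring, neg)
                if loss <= 0:
                    continue  # this incorrect candidate is already ranked low enough
                Ar = matvec(A, referring)
                Bpos, Bneg = matvec(B, pos), matvec(B, neg)
                for i in range(k):
                    # Gradient descent on loss = margin - s(r, pos) + s(r, neg).
                    for j in range(m):
                        A[i][j] += lr * (Bpos[i] - Bneg[i]) * referring[j]
                    for j in range(n):
                        B[i][j] += lr * Ar[i] * (pos[j] - neg[j])
    return A, B
```

In this sketch each update only fires for incorrect candidates that still violate the margin, so the amount of training work shrinks as fewer wrong antecedents outrank the correct one.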

9. A system useful for modifying a search query issued by a client device, the system including memory and one or more processors operable to execute instructions stored in the memory, comprising instructions to:
- identify distributed word representations for one or more noun phrases, the distributed word representations indicative of syntactic and semantic features of the one or more noun phrases;
  
  determine, for each of the one or more noun phrases and based on labeled data, at least one training pair of a referring feature representation and an antecedent feature representation, wherein:
  
  the referring feature representation for the at least one training pair for a given noun phrase of the one or more noun phrases includes the distributed word representation for the given noun phrase, and
  
  the antecedent feature representation for the at least one training pair for the given noun phrase includes the distributed word representation for the given noun phrase augmented by one or more antecedent features, wherein the one or more antecedent features include a parse tree distance for the given noun phrase as a candidate antecedent noun phrase in the labeled data, the parse tree distance being a parse tree based distance between the given noun phrase as the candidate antecedent noun phrase and a corresponding referring noun phrase;
  
  wherein the referring feature representations are m-dimensional space vectors, the antecedent feature representations are n-dimensional space vectors, and wherein the m-dimensional space vectors vary in length from the n-dimensional space vectors;
  
  learn coreference embeddings of the referring and antecedent feature representations of the one or more noun phrases based on iteratively embedding the m-dimensional space vectors and the n-dimensional space vectors into a common k-dimensional space;
  
  identify, after the learning of the coreference embeddings, a first text segment and a second text segment associated with the first text segment, wherein the second text segment is a search query issued by a client device of a user;
  
  identify, in the first text segment, an occurrence of one or more candidate antecedent noun phrases;
  
  identify, in the second text segment, an occurrence of the given noun phrase;
  
  determine, for the given noun phrase, distance measures, in the common k-dimensional space, between the given noun phrase and the one or more candidate antecedent noun phrases based on inner products of the coreference embeddings in the common k-dimensional space;
  
  determine, for a candidate noun phrase of the candidate antecedent noun phrases, a score for the candidate noun phrase as an antecedent for the given noun phrase based on the distance measure between the given noun phrase and the candidate noun phrase;
  
  select the candidate noun phrase as the antecedent for the given noun phrase based on the determined score;
  
  modify the search query issued by the client device, wherein modifying the search query comprises replacing the given noun phrase with the selected candidate noun phrase in response to selecting the candidate noun phrase as the antecedent for the given noun phrase; and
  
  provide, in response to the search query issued by the client device, search results that are responsive to a modified query that replaces the given noun phrase with the selected candidate noun phrase.
- 10. The system of claim 9, wherein the one or more antecedent features further include one or more additional features indicative of one or more of a type of entity, a type of mention, number of words in the noun phrase, and a gender associated with the noun phrase.
- 11. The system of claim 9, wherein the referring feature representation is augmented with at least one referring feature, wherein the at least one referring feature is indicative of semantic features of the given noun phrase as a referring noun phrase.
- 12. The system of claim 11, wherein the referring feature is indicative of one or more of a type of entity, a type of mention, number of words in the noun phrase, and a gender associated with the noun phrase.
- 13. The system of claim 9, wherein the antecedent feature representation for the given noun phrase includes the referring feature representation for the given noun phrase augmented by the one or more antecedent features.
- 14. The system of claim 9, wherein the instructions to identify the distributed word representations for each of one or more noun phrases further include instructions to:
  - identify a language for the one or more noun phrases; and
  - determine the distributed word representations based on the language.
- 15. The system of claim 9, wherein the instructions to learn the coreference embeddings are based on instructions to optimize a loss function, the loss function indicative of a number of incorrect candidate antecedent noun phrases for the given noun phrase.
- 16. The system of claim 9, wherein the first text segment is a prior search query issued by the client device of the user prior to the search query.
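
The query-time steps recited in claim 9 (scoring candidate antecedents by inner products in the common k-dimensional space, selecting one, and replacing the referring noun phrase in the search query) might look like the sketch below. It assumes the embeddings have already been computed, treats a larger inner product as a better score, and uses made-up two-dimensional vectors; the function names and example phrases are hypothetical.

```python
import re
from typing import Dict, List


def inner_product(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def select_antecedent(referring_embedding: List[float],
                      candidate_embeddings: Dict[str, List[float]]) -> str:
    """Return the candidate phrase whose embedding scores highest.

    Assumption: a larger inner product means a more likely antecedent.
    """
    return max(
        candidate_embeddings,
        key=lambda phrase: inner_product(referring_embedding, candidate_embeddings[phrase]),
    )


def rewrite_query(query: str, referring_phrase: str, antecedent: str) -> str:
    """Replace the first whole-word occurrence of the referring phrase."""
    pattern = rf"\b{re.escape(referring_phrase)}\b"
    # A callable replacement avoids backslash interpretation in the antecedent text.
    return re.sub(pattern, lambda _match: antecedent, query, count=1)


# Hypothetical k = 2 embeddings for phrases found in a prior query (first text
# segment) and in the current query (second text segment).
candidates = {
    "barack obama": [0.9, 0.3],
    "the white house": [-0.5, 0.8],
}
he_embedding = [1.0, 0.2]

best = select_antecedent(he_embedding, candidates)
modified_query = rewrite_query("how old is he", "he", best)
```

The modified query, rather than the original one, would then be passed to whatever search backend produces the results.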

17. A non-transitory computer readable storage medium storing computer instructions executable by a processor, including instructions that are useful for modifying a search query issued by a client device and that are to:
- identify distributed word representations for one or more noun phrases, the distributed word representations indicative of syntactic and semantic features of the one or more noun phrases;
  
  determine, for each of the one or more noun phrases and based on labeled data, at least one training pair of a referring feature representation and an antecedent feature representation, wherein:
  
  the referring feature representation for the at least one training pair for a given noun phrase of the one or more noun phrases includes the distributed word representation for the given noun phrase, and
  
  the antecedent feature representation for the at least one training pair for the given noun phrase includes the distributed word representation for the given noun phrase augmented by one or more antecedent features, wherein the one or more antecedent features include a parse tree distance for the given noun phrase as a candidate antecedent noun phrase in the labeled data, the parse tree distance being a parse tree based distance between the given noun phrase as the candidate antecedent noun phrase and a corresponding referring noun phrase;
  
  wherein the referring feature representations are m-dimensional space vectors, the antecedent feature representations are n-dimensional space vectors, and wherein the m-dimensional space vectors vary in length from the n-dimensional space vectors;
  
  learn coreference embeddings of the referring and antecedent feature representations of the one or more noun phrases based on iteratively embedding the m-dimensional space vectors and the n-dimensional space vectors into a common k-dimensional space;
  
  identify, after the learning of the coreference embeddings, a first text segment and a second text segment associated with the first text segment, wherein the second text segment is a search query issued by a client device of a user;
  
  identify, in the first text segment, an occurrence of one or more candidate antecedent noun phrases;
  
  identify, in the second text segment, an occurrence of the given noun phrase;
  
  determine, for the given noun phrase, distance measures, in the common k-dimensional space, between the given noun phrase and the one or more candidate antecedent noun phrases based on inner products of the coreference embeddings in the common k-dimensional space;
  
  determine, for a candidate noun phrase of the candidate antecedent noun phrases, a score for the candidate noun phrase as an antecedent for the given noun phrase based on the distance measure between the given noun phrase and the candidate noun phrase;
  
  select the candidate noun phrase as the antecedent for the given noun phrase based on the determined score;
  
  modify the search query issued by the client device, wherein modifying the search query comprises replacing the given noun phrase with the selected candidate noun phrase in response to selecting the candidate noun phrase as the antecedent for the given noun phrase; and
  
  provide, in response to the search query issued by the client device, search results that are responsive to a modified query that replaces the given noun phrase with the selected candidate noun phrase.
- 18. The non-transitory computer readable storage medium of claim 17, wherein the computer instructions further include instructions to:
  - identify a language for the one or more noun phrases; and
  - determine the distributed word representations based on the language.
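
The language-dependent lookup in claims 6, 14, and 18 can be pictured with a small sketch. The claims do not say how the language is identified, so the vocabulary-overlap heuristic below is purely an assumption, as are the per-language tables, the vector values, and the function names.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language tables of distributed word representations.
EMBEDDINGS_BY_LANGUAGE: Dict[str, Dict[str, List[float]]] = {
    "en": {"president": [0.90, 0.10], "he": [0.70, 0.20]},
    "es": {"presidente": [0.88, 0.12], "él": [0.69, 0.21]},
}


def identify_language(phrase: str) -> str:
    """Pick the language whose vocabulary covers the most words (assumed heuristic)."""
    words = phrase.lower().split()
    hits = {
        lang: sum(1 for word in words if word in table)
        for lang, table in EMBEDDINGS_BY_LANGUAGE.items()
    }
    best = max(hits, key=lambda lang: hits[lang])
    if hits[best] == 0:
        raise ValueError(f"no known language covers: {phrase!r}")
    return best


def word_representations(phrase: str) -> Tuple[str, List[List[float]]]:
    """Identify the language first, then look up vectors in that language's table."""
    lang = identify_language(phrase)
    table = EMBEDDINGS_BY_LANGUAGE[lang]
    # Words missing from the table (for example articles) are skipped in this sketch.
    return lang, [table[word] for word in phrase.lower().split() if word in table]
```

A production system would more plausibly use a dedicated language identifier; the point of the sketch is only that the choice of representation table follows from the identified language.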

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Liu, Jingyi; Pereira, Fernando Carlos das Neves; Al-Rfou, Rami; Subramanya, Amarnag; Chen, Kai; Ponte, Jay
Primary Examiner(s): Goddard, Tammy Paige
Assistant Examiner(s): Yehl, Walter
Application Number: US14/141,182
Time in Patent Office: 1,076 Days
Field of Search
US Class Current: 1/1
CPC Class Codes:
G06F 40/10   Text processing natural lan...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/295   Named entity recognition
