Cross-lingual discriminative learning of sequence models with posterior regularization
Abstract
A computer-implemented method can include obtaining (i) an aligned bi-text for a source language and a target language, and (ii) a supervised sequence model for the source language. The method can include labeling a source side of the aligned bi-text using the supervised sequence model and projecting labels from the labeled source side to a target side of the aligned bi-text to obtain a labeled target side of the aligned bi-text. The method can include filtering the labeled target side based on a task of a natural language processing (NLP) system configured to utilize a sequence model for the target language to obtain a filtered target side of the aligned bi-text. The method can also include training the sequence model for the target language using posterior regularization with soft constraints on the filtered target side to obtain a trained sequence model for the target language.
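The label-projection step summarized in the abstract can be sketched minimally: source-side named-entity tags are copied to aligned target tokens. The function name, tag strings, and alignment representation below are illustrative assumptions, not part of the patent.

```python
def project_labels(source_tags, alignment, target_len):
    """Project source-side NE tags to target tokens via word alignments.

    source_tags: one tag per source token (e.g. "PER", "LOC", "O").
    alignment:   list of (src_idx, tgt_idx) aligned token pairs.
    target_len:  number of target tokens.
    Unaligned target tokens receive the outside tag "O".
    """
    target_tags = ["O"] * target_len
    for src_idx, tgt_idx in alignment:
        target_tags[tgt_idx] = source_tags[src_idx]
    return target_tags

# Illustrative pair: a 4-token source sentence aligned to a 4-token
# target sentence with tokens 1 and 2 swapped in order.
src_tags = ["PER", "O", "O", "LOC"]
align = [(0, 0), (1, 2), (2, 1), (3, 3)]
print(project_labels(src_tags, align, 4))  # ['PER', 'O', 'O', 'LOC']
```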
14 Claims
1. A computer-implemented method, comprising:
obtaining, at a computing device having one or more processors, (i) an aligned bi-text for a source language and a target language, the aligned bi-text comprising a plurality of source-target sentence pairs, and (ii) a supervised sequence model for the source language;
labeling, at the computing device, each word of a source side of the aligned bi-text using the supervised sequence model to obtain a labeled source side of the aligned bi-text;
projecting, at the computing device, labels from the labeled source side to a target side of the aligned bi-text to obtain a labeled target side of the aligned bi-text, wherein each label of the labeled source and target sides of the aligned bi-text is a named entity type tag for a particular word;
filtering, at the computing device, the labeled target side of the aligned bi-text for the target language to obtain a filtered target side of the aligned bi-text for training a sequence model for the target language for a named entity segmentation system, wherein the filtering comprises discarding any particular source-target sentence pair when (i) a threshold amount of tokens of the particular source-target sentence pair are unaligned or (ii) a source named entity of the particular source-target sentence pair is not aligned with a target sentence token;
training, at the computing device, the sequence model for the target language using posterior regularization with soft constraints on the filtered target side to learn a set of parameters for the target language;
obtaining, at the computing device, a trained sequence model for the target language using the set of parameters for the target language, the trained sequence model being configured to model a probability distribution over possible labels for text in the target language;
receiving, at the computing device, an input text in the target language;
analyzing, at the computing device, the input text using the trained sequence model for the target language; and
generating, at the computing device, an output based on the analyzing of the input text using the trained sequence model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
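The filtering limb of claim 1 discards a sentence pair when too many tokens are unaligned or a source named entity has no aligned target token. The sketch below illustrates that rule; the threshold value, data shapes, and function name are assumptions for illustration and are not fixed by the claim.

```python
def keep_pair(source_tags, alignment, n_source, n_target, max_unaligned=0.3):
    """Return True if a sentence pair survives the claim-1 style filter."""
    aligned_src = {s for s, _ in alignment}
    aligned_tgt = {t for _, t in alignment}
    # (i) discard when a threshold fraction of tokens is unaligned
    unaligned = (n_source - len(aligned_src)) + (n_target - len(aligned_tgt))
    if unaligned / (n_source + n_target) >= max_unaligned:
        return False
    # (ii) discard when a source named-entity token has no aligned target token
    for i, tag in enumerate(source_tags):
        if tag != "O" and i not in aligned_src:
            return False
    return True

# A fully aligned pair is kept; a pair whose entity token is unaligned is not.
print(keep_pair(["PER", "O"], [(0, 0), (1, 1)], 2, 2))  # True
print(keep_pair(["PER", "O", "O", "O"], [(1, 0), (2, 1), (3, 2), (3, 3)], 4, 4))  # False
```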
9. A computing device comprising:
a non-transitory computer-readable medium having a set of instructions stored thereon; and
one or more processors configured to execute the set of instructions, which causes the computing device to perform operations comprising:
obtaining (i) an aligned bi-text for a source language and a target language, the aligned bi-text comprising a plurality of source-target sentence pairs, and (ii) a supervised sequence model for the source language;
labeling each word of a source side of the aligned bi-text using the supervised sequence model to obtain a labeled source side of the aligned bi-text;
projecting labels from the labeled source side to a target side of the aligned bi-text to obtain a labeled target side of the aligned bi-text, wherein each label of the labeled source and target sides of the aligned bi-text is a named entity type tag for a particular word;
filtering the labeled target side to obtain a filtered target side of the aligned bi-text for training a sequence model for the target language for a named entity segmentation system, wherein the filtering comprises discarding any particular source-target sentence pair when (i) the particular source-target sentence pair comprises a named entity having a confidence level less than a confidence threshold or (ii) the particular source-target sentence pair comprises no named entities;
training the sequence model for the target language using posterior regularization with soft constraints on the filtered target side to learn a set of parameters for the target language;
using the set of parameters for the target language, obtaining a trained sequence model for the target language, the trained sequence model being configured to model a probability distribution over possible labels for text in the target language;
receiving an input text in the target language;
analyzing the input text using the trained sequence model for the target language; and
generating an output based on the analyzing of the input text using the trained sequence model.
Dependent claims: 10, 11, 12.
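Claim 9 states a different filtering rule than claim 1: a pair is discarded when any named entity falls below a confidence threshold, or when the pair contains no named entities at all. A minimal sketch, assuming per-entity confidence scores from the source-side tagger; the function name and default threshold are illustrative assumptions.

```python
def keep_pair_by_confidence(entity_confidences, threshold=0.5):
    """Claim-9 style filter: keep a sentence pair only if it contains at
    least one named entity and every entity clears the confidence threshold.

    entity_confidences: tagger confidences, one per source named entity.
    """
    if not entity_confidences:  # (ii) pair comprises no named entities
        return False
    # (i) discard if any entity's confidence is below the threshold
    return all(c >= threshold for c in entity_confidences)

print(keep_pair_by_confidence([]))          # False: no entities
print(keep_pair_by_confidence([0.9, 0.4]))  # False: low-confidence entity
print(keep_pair_by_confidence([0.9, 0.8]))  # True
```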
13. A non-transitory, computer-readable medium having instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising:
obtaining (i) an aligned bi-text for a source language and a target language, the aligned bi-text comprising a plurality of source-target sentence pairs, and (ii) a supervised sequence model for the source language, the source language being a resource-rich language having greater than an amount of labeled training data required to train the supervised sequence model, the target language being a resource-poor language having less than an amount of labeled training data required to train the sequence model for the target language;
labeling a source side of the aligned bi-text using the supervised sequence model to obtain a labeled source side of the aligned bi-text;
projecting labels from the labeled source side to a target side of the aligned bi-text to obtain a labeled target side of the aligned bi-text, wherein every label in the labeled source and target sides of the aligned bi-text is a named entity type tag for a particular word;
filtering the labeled target side to obtain a filtered target side of the aligned bi-text for training a sequence model for the target language for a named entity segmentation system, wherein the filtering comprises discarding any particular source-target sentence pair when (i) a threshold amount of tokens of the particular source-target sentence pair are unaligned or (ii) a source named entity of the particular source-target sentence pair is not aligned with a target sentence token;
training the sequence model for the target language using posterior regularization with soft constraints on the filtered target side to learn a set of parameters for the target language;
obtaining a trained sequence model for the target language using the set of parameters for the target language, the trained sequence model being configured to model a probability distribution over possible labels for text in the target language;
receiving an input text in the target language;
analyzing the input text using the trained sequence model for the target language; and
generating an output based on the analyzing of the input text using the trained sequence model.
Dependent claims: 14.
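The training limb recited in each independent claim uses posterior regularization: the model's label posteriors are softly pushed toward constraints derived from the projected labels rather than treating projections as hard gold labels. The single-token reweighting below is a minimal illustration of that idea only, not the patent's training procedure; the closed form q(y) ∝ p(y)·exp(−strength·[y ≠ projected]), the function name, and the tag set are assumptions.

```python
import math

def constrain_posterior(p, projected_label, strength=1.0):
    """Minimal posterior-regularization sketch for one token: re-weight the
    label posterior p (dict: label -> probability) toward the projected
    label with a soft exponentiated penalty, then renormalize. With
    strength=0 the posterior is unchanged; larger strength pulls more
    probability mass onto the projected label without forcing it to 1.
    """
    q = {y: pr * math.exp(0.0 if y == projected_label else -strength)
         for y, pr in p.items()}
    z = sum(q.values())
    return {y: v / z for y, v in q.items()}

p = {"PER": 0.4, "O": 0.6}
q = constrain_posterior(p, "PER", strength=2.0)
# Probability mass shifts toward the projected tag "PER",
# but the alternative tag keeps nonzero probability (a soft constraint).
print(q["PER"] > p["PER"], q["O"] > 0.0)  # True True
```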
Specification