Adapter for allowing both online and offline training of a text to text system

US 7,624,020 B2
Filed: 09/09/2005
Issued: 11/24/2009
Est. Priority Date: 09/09/2005
Status: Active Grant




Abstract

An adapter for text to text training. A main corpus is used for training, and a domain specific corpus is used to adapt the main corpus according to the training information in the domain specific corpus. The adaptation is carried out using a technique that may be faster than the main training. The parameter set from the main training is adapted using the domain specific part.

24 Claims

1. A computer implemented method, comprising:
- first carrying out a first generic training using at least one corpus of language information based at least in part on Internet information, using a first generic training operation to obtain a first generic parameter set;
  
  second carrying out a second domain specific training using a fast train module associated with a domain specific corpus, said fast train module including a second domain specific training operation which operates faster than said first generic training operation, and which is less accurate than said first generic training operation, to obtain a second domain specific parameter set;
  
  merging said first generic parameter set and said second domain specific parameter set into a merged parameter set, and using said merged parameter set for a text to text operation, wherein said merging comprises a weighted merge between said first generic parameter set and said second domain specific parameter set; and
  
  using said second domain specific parameter set to adapt said first generic parameter set to carry out said text to text operation, wherein said using comprises using partial information from the first generic training and partial information from the second domain specific training, forming an original table and an override table, and using both said original table and said override table as part of said text to text operation.
  - 2. A computer implemented method as in claim 1, wherein said text to text operation is a translation between first and second languages.
  - 3. A computer implemented method as in claim 1, wherein said merging comprises an adaptive training merge between said first generic parameter set and said second domain specific parameter set.
  - 4. A computer implemented method as in claim 1, wherein said weighted merge is sensitive to frequency of specified terms in the corpus.
  - 5. A computer implemented method as in claim 1, wherein said second domain specific training operation uses parameters from said first generic training operation.
  - 6. A computer implemented method as in claim 5, wherein said second domain specific training operation uses a basic seed probability from the first generic training operation.
  - 7. A computer implemented method as in claim 1, wherein said merging uses an adaptive merging.
  - 8. A computer implemented method as in claim 7, wherein said adaptive merging uses a merge which is proportional to a frequency of a specified term in a training database.
  - 9. A computer implemented method as in claim 1, wherein said merging comprises adding indications of counts.
  - 10. A computer implemented method as in claim 1, wherein said merging comprises adding information that represents counts related to alignment.
  - 11. A computer implemented method as in claim 1, wherein said first carrying out is carried out at a first location, and said second carrying out is carried out at a second location, different than said first location.
  - 12. A computer implemented method as in claim 1, wherein the override table includes precomputed versions of specified formulas.
  - 13. A computer implemented method as in claim 1, wherein the partial information includes probabilities.
  - 14. A computer implemented method as in claim 1, wherein the partial information includes counts.
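
The merging described in claims 1, 4, 8, 9 and 10 can be illustrated with a minimal sketch. The claims do not give formulas, so everything here is an assumption made for illustration: the function names, the smoothing constant `k`, the weight formula `n / (n + k)`, and the representation of a parameter set as a dictionary keyed by (source term, target term) pairs are all hypothetical. The first function interpolates two probability tables with a weight that grows with how often a term occurs in the domain corpus; the second merges by adding counts and renormalising.

```python
from collections import Counter


def frequency_weight(domain_freq, k=5.0):
    """Hypothetical weight: 0 for a term unseen in the domain corpus,
    approaching 1 as the term becomes more frequent there."""
    return domain_freq / (domain_freq + k)


def weighted_merge(generic, domain, domain_term_freq, k=5.0):
    """Interpolate two probability tables keyed by (source, target) pairs.

    The weight given to the domain table depends on the frequency of the
    source term in the domain corpus (an assumed reading of claims 4 and 8).
    """
    merged = {}
    for key in generic.keys() | domain.keys():
        source_term = key[0]
        w = frequency_weight(domain_term_freq.get(source_term, 0), k)
        merged[key] = (1.0 - w) * generic.get(key, 0.0) + w * domain.get(key, 0.0)
    return merged


def merge_counts(generic_pair_counts, domain_pair_counts):
    """Merge by adding alignment counts, then renormalise per source term."""
    total = Counter(generic_pair_counts)
    total.update(domain_pair_counts)  # adds counts key by key
    per_source = Counter()
    for (source_term, _target), count in total.items():
        per_source[source_term] += count
    return {pair: count / per_source[pair[0]] for pair, count in total.items()}
```

A source term that appears only in the domain table is not renormalised by `weighted_merge` in this sketch; a fuller version would need to decide how to handle that case.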

15. An apparatus, comprising:
- a first training computer at a first location, carrying out a first generic training using at least one corpus of information based at least in part on Internet information, using a first generic training operation to obtain a first generic parameter set; and
  
  a second training computer, at a second location, different than the first location, carrying out a second domain specific training using a fast train module associated with a domain specific corpus that has different information than said at least one corpus, said fast train module including a second domain specific training operation which operates faster than said first generic training operation, and which is less accurate than said first generic training operation, to obtain a second domain specific parameter set, and using said first generic parameter set and said second domain specific parameter set together for a text to text operation,
  
  wherein said second training computer also operates to merge said first generic parameter set and said second domain specific parameter set into a merged parameter set, to use said merged parameter set for said text to text operation, and to carry out a weighted merge between said first generic parameter set and said second domain specific parameter set, and
  
  wherein said second training computer uses partial information from the first generic training and partial information from the second domain specific training, forms an original table and an override table, and uses both said original table and said override table as part of said text to text operation.
  - 16. An apparatus as in claim 15, wherein said text to text operation is a translation between first and second languages.
  - 17. An apparatus as in claim 15, wherein said second training computer carries out an adaptive training merge between said first generic parameter set and said second domain specific parameter set.
  - 18. An apparatus as in claim 15, wherein said override table represents information which is present in both the at least one corpus and the domain specific corpus.
  - 19. An apparatus as in claim 15, wherein the override table includes precomputed versions of specified formulas.
  - 20. An apparatus as in claim 15, wherein the partial information includes probabilities.
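
One way to picture the original table and override table of claims 15, 18 and 19 is a two-level lookup. The sketch below is only an illustration under stated assumptions: the names `build_override` and `OverrideLookup`, the fixed interpolation `weight`, and the dictionary representation are hypothetical and are not taken from the patent. The override table holds precomputed merged values for entries present in both the generic and the domain specific tables; lookups consult it first and fall back to the original table.

```python
def build_override(original, domain, weight=0.5):
    """Precompute merged values only for entries present in both tables.

    `weight` is an assumed fixed interpolation weight for the domain table.
    """
    return {
        key: (1.0 - weight) * original[key] + weight * domain[key]
        for key in original.keys() & domain.keys()
    }


class OverrideLookup:
    """Consult the override table first, then fall back to the original table."""

    def __init__(self, original, override):
        self.original = original  # parameters from the generic training
        self.override = override  # entries adapted using the domain corpus

    def prob(self, source, target, default=0.0):
        key = (source, target)
        if key in self.override:
            return self.override[key]
        return self.original.get(key, default)
```

Keeping the original table untouched and layering a small override table on top means the large generic table never has to be rewritten when a new domain corpus arrives; whether the patented system is organised this way is not stated in the claims.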

21. An apparatus, comprising:
- a training part including at least one computer, which carries out a first generic training for a text to text operation using at least one corpus of training information based at least in part on Internet information, to obtain a first generic parameter set and at a different time than first generic training, carrying out a second domain specific training using a fast train module associated with a domain specific corpus that has different information than said at least one corpus, said fast train module including a second domain specific training operation which operates faster than said first generic training operation, and which is less accurate than said first generic training operation, to obtain a second domain specific parameter set and using said second domain specific parameter set to adapt said first generic parameter set to create an adapted parameter set, and to use the adapted parameter set for a text to text operation,
  
  wherein said at least one training computer merges said first generic parameter set and said second domain specific parameter set into a merged parameter set, and uses said merged parameter set for said text to text operation, and carries out a weighted merge between said first generic parameter set and said second domain specific parameter set, and
  
  wherein said at least one training computer uses partial information from the first generic training and partial information from the second domain specific training, forms an original table and an override table, and uses both said original table and said override table as part of said text to text operation.
  - 22. An apparatus as in claim 21, wherein said text to text operation is a translation between first and second languages.
  - 23. An apparatus as in claim 21, wherein said training computer carries out an adaptive training merge between the first generic parameter set and said second domain specific parameter set.
  - 24. An apparatus as in claim 21, wherein said weighted merge is sensitive to frequency of specified terms in the corpus.


Current Assignee: SDL Inc (RWS Holdings Plc)
Original Assignee: Language Weaver, Inc. (RWS Holdings Plc)
Inventors: Yamada, Kenji; Langmead, Greg; Knight, Kevin
Primary Examiner(s): DeCady; Albert
Assistant Examiner(s): Stevens; Thomas H
Application Number: US11/223,823
Publication Number: US 20070094169A1
Time in Patent Office: 1,537 Days
Field of Search: 706/15, 706/10, 704/8, 704/237, 704/256.2, 704/258, 704/277
US Class Current: 704/277
CPC Class Codes: G06F 40/42 Data-driven translation
