Task parallelization in a text-to-text system

US 7,389,222 B1
Filed: 04/26/2006
Issued: 06/17/2008
Est. Priority Date: 08/02/2005
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Parallelization of word alignment for a text-to-text operation. The training data is divided into multiple groups, and training is carried out of each group on separate processors. Different techniques can be carried out to increase the speed of the processing. The hookups can be done only once for all of multiple different iterations. Moreover, parallel operations can apply only to the counts, since this may be the most time-consuming part.
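
The abstract's remark that parallel operations can apply only to the counts can be made concrete with a short worked formula. This is a sketch that assumes the counts are the standard expected counts of IBM Model 1 (claim 7 names a "model 1 algorithm", but this page does not spell out the formula). The symbols $p$, $c_g$, $E$, $F$ and $G$ are notation introduced here, not taken from the patent.

```latex
% Expected count for target word e and source word f in group g,
% computed under the shared table p(e | f) from the previous iteration.
% (E, F) ranges over the sentence pairs assigned to group g.
c_g(e, f) = \sum_{(E, F) \in g} \;
            \sum_{j \,:\, E_j = e} \;
            \sum_{i \,:\, F_i = f}
            \frac{p(e \mid f)}{\sum_{k} p(e \mid F_k)}

% The counts are plain sums over sentence pairs, so the G groups can be
% processed independently and their counts added afterwards.
c(e, f) = \sum_{g=1}^{G} c_g(e, f)

% Renormalizing the summed counts yields the single updated table.
p_{\mathrm{new}}(e \mid f) = \frac{c(e, f)}{\sum_{e'} c(e', f)}
```

Because each group only reads the shared table while counting, the table needs to be updated once per iteration, after all groups have reported.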

124 Citations

20 Claims

1. A method comprising:
- dividing a corpus of information among multiple work units and carrying out a text-to text operation in each of said work units; and
  
  maintaining a single parameter table for all the work carried out in all the work units, wherein said parameter table is a probability table with probabilities of word to word translation.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. A method as in claim 1, wherein said text-to-text determination operation is a word alignment.
  - 3. A method as in claim 1, wherein said text to text operation that is carried out in each of said work units forms a table of counts based on probabilities of hookups for word to word pairing.
  - 4. A method as in claim 1, wherein said text to text operation is carried out in multiple computing iterations.
  - 5. A method as in claim 4, wherein at least one subsequent iteration uses a parameter table from a previous iteration.
  - 6. A method as in claim 4, wherein said multiple computing iterations include a first iteration which computes word-to-word hookup information, and a subsequent iteration which uses said hookup information from the first iteration.
  - 7. A method as in claim 1, wherein said text-to-text operation uses a model 1 algorithm.
  - 8. A method as in claim 1, wherein said dividing comprises a random division of information.
  - 9. A method as in claim 1, wherein said dividing comprises sorting information, and selecting units of information based on said sorting.
  - 10. A method as in claim 1, further comprising monitoring said each of said work units to detect work units that are requiring longer calculation times than other work units.
  - 11. A method as in claim 1, wherein said carrying out a determination comprises doing an initialization, and subsequently doing multiple iterations beyond said initialization.
  - 12. A method as in claim 11, further comprising carrying out alignment after said iterations.
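
Claims 8 to 10 above name three operational choices: random division, division based on sorting, and monitoring for slow work units. The following is a minimal Python sketch of what those could look like. The function names, the sort key (product of the two sentence lengths, which is what a Model 1 pass over a pair costs), the round-robin dealing, and the "twice the median" threshold are all assumptions made for illustration; the claims do not specify them.

```python
import random
import statistics

def split_random(pairs, n_units, seed=0):
    """Random division (claim 8): shuffle a copy, then deal round-robin."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_units] for i in range(n_units)]

def split_sorted(pairs, n_units):
    """Division based on sorting (claim 9): sort by estimated cost, then deal
    round-robin so every unit gets a mix of expensive and cheap pairs."""
    ordered = sorted(pairs, key=lambda p: len(p[0]) * len(p[1]), reverse=True)
    return [ordered[i::n_units] for i in range(n_units)]

def find_stragglers(elapsed_seconds, factor=2.0):
    """Monitoring (claim 10): flag units whose running time exceeds
    `factor` times the median running time of all units."""
    median = statistics.median(elapsed_seconds.values())
    return sorted(u for u, s in elapsed_seconds.items() if s > factor * median)

# Toy sentence pairs: (source tokens, target tokens).
pairs = [
    ("a b c d".split(), "w x y z".split()),
    ("a b".split(), "w x".split()),
    ("a".split(), "w".split()),
    ("a b c".split(), "w x y".split()),
]
random_units = split_random(pairs, 2)
sorted_units = split_sorted(pairs, 2)
stragglers = find_stragglers({"unit0": 1.0, "unit1": 1.2, "unit2": 5.0})
```

What to do with a flagged unit (re-split it, reassign it, or simply wait) is left open here, since claim 10 only covers detection.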

13. A computer system, comprising:
- a master computer, connected to a corpus of training information about text-to-text operations, having a plurality of work unit computers, having separate processors from said master computer, and said master computer running a routine that maintains a table of information related to training based on said corpus, a routine that provides separated portions of said corpus and said work unit computers, and accumulates information indicative of training each of said work unit computers and maintains said table of information, wherein said table of information includes a probability of word to word translation.
- Dependent claims (14, 15, 16, 17)
  - 14. A computer system as in claim 13, wherein said master computer also processes at least one of said separated portions of said corpus.
  - 15. A computer system as in claim 13, wherein said training in said working unit computers comprises a word alignment operation.
  - 16. A computer system as in claim 13, wherein said training in said work unit computers comprises multiple computing iterations based on the same data.
  - 17. A computer system as in claim 13, further comprising using a parameter table from a previous iteration in a subsequent iteration.
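
The master/work-unit arrangement of claims 13 to 17 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: threads stand in for the separate work unit computers, the `Master` class and `work_unit` function are hypothetical names, the counting step is plain IBM Model 1 without a NULL word, and an empty table is treated as a uniform starting point.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def work_unit(portion, table):
    """Runs on a work unit: expected word-pair counts for one portion,
    computed under the master's current table (read-only here)."""
    counts = defaultdict(float)
    for src, tgt in portion:
        for t in tgt:
            # Missing entries default to 1.0, i.e. a uniform initial table.
            z = sum(table.get((s, t), 1.0) for s in src)
            for s in src:
                counts[(s, t)] += table.get((s, t), 1.0) / z
    return counts

class Master:
    def __init__(self, portions):
        self.portions = portions  # same split reused in every iteration
        self.table = {}           # the single table of P(target | source)

    def iterate(self, pool):
        # Hand every portion except the first to the work units ...
        futures = [pool.submit(work_unit, p, self.table)
                   for p in self.portions[1:]]
        # ... and process the first portion on the master itself (claim 14).
        results = [work_unit(self.portions[0], self.table)]
        results += [f.result() for f in futures]
        # Accumulate all counts, then renormalize per source word.
        total, per_source = defaultdict(float), defaultdict(float)
        for counts in results:
            for (s, t), c in counts.items():
                total[(s, t)] += c
                per_source[s] += c
        self.table = {(s, t): c / per_source[s]
                      for (s, t), c in total.items()}

    def train(self, iterations, workers=2):
        # Each iteration starts from the previous iteration's table (claim 17).
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for _ in range(iterations):
                self.iterate(pool)
        return self.table

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
master = Master([corpus[:1], corpus[1:2], corpus[2:]])
final_table = master.train(iterations=5)
```

On this toy corpus the probability of "book" given "buch" rises with each iteration. The table is replaced only after all work units have reported, so no locking is needed while they read it.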

18. A method, comprising:
- dividing a training corpus into at least a plurality of groups;
  
  carrying out a training operation for a text to text application substantially simultaneously on each of said plurality of groups, using separate processors for each of said groups and using a single table of information indicative of word probabilities, for each of said groups, and using said training operation to update said single probability table based on training information obtained from each of said groups, wherein said single probability table comprises probabilities of word to word translations.
- Dependent claims (19, 20)
  - 19. A method as in claim 18, wherein said training operation is a word alignment.
  - 20. A method as in claim 18, wherein said training operation is a computation of counts.

Current Assignee: SDL PLC (RWS Holdings Plc)
Original Assignee: Language Weaver, Inc. (RWS Holdings Plc)
Inventors: Langmead, Greg; Marcu, Daniel; Yamada, Kenji; Knight, Kevin
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): Albertalli, Brian Louis
Application Number: US11/412,307
Time in Patent Office: 783 Days
Field of Search: None
US Class Current: 704/2
CPC Class Codes: G06F 40/45 Example-based machine trans...
