Identifying documents which form translated pairs, within a document collection
Abstract
A training system for a text-to-text application. The training system finds groups of documents and automatically identifies which documents within those groups are similar. The automatically identified documents can then be used to train the text-to-text application. The comparison uses reduced-size versions of the documents in order to minimize the amount of processing.
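The flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the reduced-size versions are built or compared, so everything specific here is an assumption made for illustration: the reduction keeps only the most frequent longer words, the comparison is Jaccard overlap, and the names `reduce_document`, `jaccard`, `similar_pairs`, `keep`, and `threshold` are hypothetical.

```python
import re
from collections import Counter


def reduce_document(text, keep=5):
    """Reduced-size version of a document (assumed scheme, not from the patent):
    the `keep` most frequent words that are longer than three letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(word for word, _ in ranked[:keep])


def jaccard(a, b):
    """Jaccard overlap of two word sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(docs, threshold=0.5, keep=5):
    """Compare only the reduced versions and return the pairs of document
    names whose overlap reaches `threshold`."""
    reduced = {name: reduce_document(text, keep) for name, text in docs.items()}
    names = sorted(reduced)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(reduced[first], reduced[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Illustrative data only.
docs = {
    "a": "The bank raised interest rates; interest rates rose at the bank.",
    "b": "Interest rates at the bank were raised by the bank.",
    "c": "The football team won the match after a late goal.",
}
candidate_pairs = similar_pairs(docs)
```

Because each document is cut down to a handful of words before any comparison happens, the pairwise step works on small sets rather than full texts, which is the processing saving the abstract refers to. The pairs in `candidate_pairs` would then be passed on as training material.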
116 Citations
27 Claims
1. A method, comprising:
obtaining a group of documents;
determining reduced size versions of said documents; and
comparing said reduced size versions, to determine documents that represent similar information; and
using said documents that represent similar information for training for a text-to-text application.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
a database, including a group of documents;
a processor that determines reduced size versions of said documents and compares said reduced size versions, to determine documents within the group that represent similar information; and
a text to text application module, using said documents that represent similar information for training for a text-to-text application.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A method, comprising:
obtaining a first group of documents in a first language, and a second group of documents in a second language;
carrying out a rough translation of said documents in said second language, to form a third group of translated documents, that have been translated to said first language;
determining reduced size versions of said first and third groups of documents; and
comparing said reduced size versions, to determine documents that represent similar information.
Dependent claims: 20, 21, 22
23. A method, comprising:
obtaining a group of documents that includes documents that are in at least a first language and a second language;
determining reduced size versions of at least some of said documents; and
comparing said reduced size versions, to determine a first document in said first language that represents similar information to a second document in a second language.
Dependent claims: 24, 25, 26, 27
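The cross-language steps of claims 19 and 23 can be sketched in Python as follows. This is only an illustration under stated assumptions: the claims do not define "rough translation" or the reduction, so the word-for-word glossary lookup, the distinct-long-words reduction, the overlap score, and all names (`GLOSSARY`, `rough_translate`, `reduce_document`, `overlap`, `pair_documents`, `threshold`) are hypothetical.

```python
import re

# Hypothetical toy Spanish-to-English glossary; a real system would need a
# far larger lexicon or a proper translation component.
GLOSSARY = {
    "el": "the", "los": "the", "banco": "bank", "sube": "raises",
    "tipos": "rates", "equipo": "team", "gana": "wins", "partido": "match",
}


def rough_translate(text, glossary):
    """Word-for-word gloss; words missing from the glossary pass through."""
    return " ".join(glossary.get(w, w) for w in re.findall(r"\w+", text.lower()))


def reduce_document(text, min_len=4):
    """Reduced-size version (assumed scheme): distinct words of at least
    `min_len` letters."""
    return frozenset(w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len)


def overlap(a, b):
    """Jaccard overlap of two word sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_documents(first_group, second_group, glossary, threshold=0.3):
    """Map each second-language document to its best first-language match."""
    # Third group: second-language documents roughly translated.
    third_group = {n: rough_translate(t, glossary) for n, t in second_group.items()}
    reduced_first = {n: reduce_document(t) for n, t in first_group.items()}
    reduced_third = {n: reduce_document(t) for n, t in third_group.items()}
    pairs = {}
    for name, reduced in reduced_third.items():
        best = max(reduced_first,
                   key=lambda n: overlap(reduced, reduced_first[n]),
                   default=None)
        if best is not None and overlap(reduced, reduced_first[best]) >= threshold:
            pairs[name] = best
    return pairs


# Illustrative data only.
first_group = {"en1": "The bank raises interest rates",
               "en2": "The team wins the match"}
second_group = {"es1": "El banco sube los tipos",
                "es2": "El equipo gana el partido"}
translated_pairs = pair_documents(first_group, second_group, GLOSSARY)
```

The translation only has to be good enough for the reduced versions to overlap, which is why a crude gloss is used in this sketch; the `threshold` filters out second-language documents that have no plausible counterpart.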
Specification