Optimization of text-based training set selection for language processing modules
Abstract
A device and a method provide for selection of a database from a corpus using an optimization function. The method includes defining a size of a database, calculating a distance using a distance function for each pair in a set of pairs, and executing an optimization function using the distance to select each entry saved in the database until the number of saved entries equals the size of the database. Each pair in the set of pairs includes either two entries selected from the corpus, or one entry selected from a set of previously selected entries and another entry selected from the remaining portion of the corpus. The distance function may be a Levenshtein distance function or a generalized Levenshtein distance function.
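The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say which optimization function is used, so the sketch assumes a greedy "farthest-first" rule that seeds the database with the most distant pair of corpus entries and then repeatedly adds the remaining entry whose smallest distance to the already selected entries is largest. The function names `levenshtein` and `select_database`, and the `ins_cost`, `del_cost`, and `sub_cost` parameters (one simple way to generalize the Levenshtein distance), are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def levenshtein(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Edit distance between sequences a and b.

    With unit costs this is the plain Levenshtein distance. Non-unit costs
    give one simple weighted variant (an assumption, not the patent's
    definition of a generalized Levenshtein distance).
    """
    # prev[j] holds the distance between the empty prefix of a and b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                           # delete ca
                curr[j - 1] + ins_cost,                       # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def select_database(corpus, size, distance=levenshtein):
    """Greedily select `size` entries that are far apart from each other.

    Assumed optimization rule: maximize the minimum distance to the entries
    selected so far (farthest-first selection).
    """
    entries = list(dict.fromkeys(corpus))  # drop exact duplicates, keep order
    if size <= 0:
        return []
    if size >= len(entries):
        return entries
    if size == 1:
        return entries[:1]

    # Seed with the pair of corpus entries that has the largest distance.
    first, second = max(combinations(entries, 2), key=lambda p: distance(*p))
    selected = [first, second]
    remaining = [e for e in entries if e not in (first, second)]

    # Each later step compares previously selected entries with the
    # remaining portion of the corpus, until the defined size is reached.
    while len(selected) < size:
        best = max(remaining,
                   key=lambda e: min(distance(e, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    corpus = ["cat", "cart", "dog", "elephant", "cats"]
    print(select_database(corpus, 3))
```

The seeding step computes a distance for every pair of corpus entries, which is quadratic in the corpus size. For a large corpus, a random or fixed seed entry would be a cheaper starting point, at the cost of a less diverse first pair.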
32 Claims
1. A method of selecting a database from a corpus, the method comprising:
- defining a size of a database;
- calculating a coefficient for at least one pair in a set of pairs; and
- executing a function to select each entry to be saved in the database until a number of entries of the database equals the size of the database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer program product for selecting a database from a corpus, the computer code configured to:
- define a size of a database;
- calculate a coefficient for at least one pair in a set of pairs; and
- execute a function to select each entry to be saved in the database until a number of entries of the database equals the size of the database.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A device configured for selecting a database from a corpus, the device comprising a processor and computer code stored into memory, the computer code configured to:
- define a size of a database;
- calculate a coefficient for at least one pair in a set of pairs; and
- execute a function to select each entry to be saved in the database until a number of entries of the database equals the size of the database.
Dependent claims: 18, 19, 20, 21, 22
23. A system for processing linguistic inputs to determine an output, the system comprising:
- defining a size of a database;
- calculating a coefficient for at least one pair in a set of pairs; and
- executing a function to select each entry to be saved in the database until a number of entries of the database equals the size of the database.
Dependent claims: 24, 25, 26, 27
28. A module configured for selecting a database from a corpus, the module configured to:
- define a size of a database;
- calculate a coefficient for at least one pair in a set of pairs; and
- execute a function to select each entry to be saved in the database until a number of entries of the database equals the size of the database.
Dependent claims: 29, 30, 31, 32
Specification