Methods, devices and systems for data augmentation to improve fraud detection
First Claim
1. A computer-implemented method for modifying an original electronic text document of a corpus of electronic text documents, comprising:
receiving the original electronic text document in a computer having a memory;
repeatedly translating the received original electronic text document, using at least one machine translation engine, wherein each translated electronic text document is used as a basis for a subsequent translation into another language;
re-translating a last-translated electronic text document back into an original language of the original electronic text document;
transforming the re-translated electronic text document by selecting at least one word therein and substituting a respective synonym for each selected word to generate a synonym-replaced electronic text document;
transforming the synonym-replaced electronic text document by selecting at least one word therein and substituting a respective misspelled word for each selected word to generate a modified electronic text document;
computing a similarity measure between the original electronic text document and the modified electronic text document;
determining whether the computed similarity measure is at least as great as a predetermined similarity threshold; and
storing the modified electronic text document in the memory if the computed similarity measure is greater than or equal to the predetermined similarity threshold and not storing the modified electronic text document in the memory if the computed similarity measure is less than the predetermined similarity threshold.
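The transformation steps of claim 1 (chained translation, back-translation, synonym substitution, misspelling injection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the engine signature `(text, source_lang, target_lang)`, the toy dictionary engine `toy_engine` and its tables, the synonym table, and adjacent-character transposition as the misspelling model are all hypothetical, and tokenization is plain whitespace splitting. The similarity check and conditional storage are not covered here.

```python
import random
from typing import Callable, Dict, List

# Hypothetical engine signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]

# Toy word-for-word "engine" tables for the demo only (hypothetical data).
TOY_TABLES = {
    ("en", "de"): {"send": "senden", "the": "die", "payment": "zahlung", "today": "heute"},
    ("de", "en"): {"senden": "send", "die": "the", "zahlung": "payment", "heute": "now"},
}


def toy_engine(text: str, src: str, tgt: str) -> str:
    """Stand-in for a real machine translation engine: word-by-word table lookup."""
    table = TOY_TABLES[(src, tgt)]
    return " ".join(table.get(w, w) for w in text.split())


def chain_translate(text: str, origin: str, pivots: List[str], engine: Translator) -> str:
    """Translate through each pivot language in turn, then back into the origin language."""
    current = origin
    for lang in pivots:
        text = engine(text, current, lang)  # each output is the input of the next translation
        current = lang
    return engine(text, current, origin)  # re-translate into the original language


def substitute_synonyms(text: str, synonyms: Dict[str, List[str]], k: int,
                        rng: random.Random) -> str:
    """Replace up to k words that appear in the (hypothetical) synonym table."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in synonyms]
    for i in rng.sample(candidates, min(k, len(candidates))):
        words[i] = rng.choice(synonyms[words[i].lower()])
    return " ".join(words)


def _swappable(word: str) -> List[int]:
    # Positions i where transposing word[i] and word[i + 1] actually changes the word.
    return [i for i in range(len(word) - 1) if word[i] != word[i + 1]]


def misspell(word: str, rng: random.Random) -> str:
    """Transpose two adjacent, differing characters (one simple typo model)."""
    positions = _swappable(word)
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def inject_misspellings(text: str, k: int, rng: random.Random) -> str:
    """Replace up to k randomly selected words with misspelled variants."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if _swappable(w)]
    for i in rng.sample(candidates, min(k, len(candidates))):
        words[i] = misspell(words[i], rng)
    return " ".join(words)


def modify_document(text: str, origin: str, pivots: List[str], engine: Translator,
                    synonyms: Dict[str, List[str]], k_syn: int = 1, k_typo: int = 1,
                    seed: int = 0) -> str:
    """Back-translate, then substitute synonyms, then inject misspellings."""
    rng = random.Random(seed)
    back = chain_translate(text, origin, pivots, engine)
    replaced = substitute_synonyms(back, synonyms, k_syn, rng)
    return inject_misspellings(replaced, k_typo, rng)


print(chain_translate("please send the payment today", "en", ["de"], toy_engine))
# please send the payment now
```

The round trip through German rewrites "today" as "now", which is the kind of paraphrase back-translation is meant to produce; the synonym and misspelling passes then perturb the text further before any similarity filtering.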
5 Assignments
0 Petitions
Abstract
A computer-implemented method of generating an augmented electronic text document comprises establishing a directed multigraph in which each vertex is associated with a separate language and is connected to at least one other vertex by an oriented edge indicating that a machine translation engine can translate, with acceptable performance, between the languages associated with the vertices connected by that edge. The directed multigraph is then traversed starting at a predetermined origin vertex associated with the original language of the original electronic text document: an adjacent vertex pointed to by an oriented edge leaving the origin vertex is randomly selected, and a machine translation engine translates the original electronic text document from the original language into the language associated with the selected (intermediate) vertex. The traversal then continues, as allowed by the oriented edges, from the intermediate vertex to successive next-adjacent connected vertices, each time machine-translating the previously translated electronic text document into the language associated with a randomly selected next-adjacent vertex. This continues until the predetermined origin vertex is selected, at which point the previously translated electronic text document is re-translated into the original language and designated as the augmented electronic text document.
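The traversal described in the abstract can be sketched as a random walk over a directed multigraph. This is a minimal sketch under stated assumptions: the class `TranslationMultigraph`, the function `random_round_trip`, the engine names, and the tagging stand-in engines are hypothetical; choosing a successor vertex uniformly and then one of its parallel edges uniformly is one reading of "randomly selecting an adjacent vertex"; and the `max_hops` cap is an added safeguard that the abstract does not mention.

```python
import random
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

# Hypothetical engine signature: (text, source_lang, target_lang) -> translated text.
Engine = Callable[[str, str, str], str]


class TranslationMultigraph:
    """Directed multigraph: vertices are languages; each oriented edge names one engine
    assumed to translate src -> dst with acceptable performance."""

    def __init__(self) -> None:
        # successors[src][dst] lists every engine (parallel edge) from src to dst.
        self.successors: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(
            lambda: defaultdict(list))

    def add_edge(self, src: str, dst: str, engine_name: str) -> None:
        self.successors[src][dst].append(engine_name)


def random_round_trip(text: str, graph: TranslationMultigraph, origin: str,
                      engines: Dict[str, Engine], rng: random.Random,
                      max_hops: int = 50) -> Tuple[str, List[str]]:
    """Randomly walk from `origin`, translating at every hop, until the walk selects
    `origin` again; the final translation is the augmented document."""
    current, path = origin, [origin]
    for _ in range(max_hops):
        neighbours = list(graph.successors.get(current, {}))
        if not neighbours:
            raise ValueError(f"no outgoing edge from {current!r}")
        nxt = rng.choice(neighbours)                              # adjacent vertex, uniformly
        engine_name = rng.choice(graph.successors[current][nxt])  # one of its parallel edges
        text = engines[engine_name](text, current, nxt)
        current = nxt
        path.append(current)
        if current == origin:  # back in the original language: stop
            return text, path
    raise RuntimeError(f"walk did not return to {origin!r} within {max_hops} hops")


# Demo with tagging stand-in engines that record each hop instead of translating.
engines = {
    "engine_a": lambda t, s, d: f"{t} [A:{s}->{d}]",
    "engine_b": lambda t, s, d: f"{t} [B:{s}->{d}]",
}
g = TranslationMultigraph()
for src, dst, eng in [("en", "de", "engine_a"), ("en", "de", "engine_b"),
                      ("de", "en", "engine_a"), ("en", "fr", "engine_a"),
                      ("fr", "de", "engine_b"), ("de", "fr", "engine_a"),
                      ("fr", "en", "engine_b")]:
    g.add_edge(src, dst, eng)

augmented, path = random_round_trip("original text", g, "en", engines, random.Random(7))
print(path)
print(augmented)
```

Keeping a list of engines per vertex pair lets several engines cover the same language pair. Because the successor vertex is chosen before the engine, a language pair served by many engines is not visited more often than one served by a single engine.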
60 Citations
16 Claims
9. A computing device comprising:
at least one processor;
at least one data storage device coupled to the at least one processor;
a network interface coupled to the at least one processor and to a computer network;
a plurality of processes spawned by the at least one processor to modify an original electronic text document of a corpus of electronic text documents, the processes including processing logic for:
repeatedly translating the original electronic text document, using at least one machine translation engine, wherein each translated electronic text document is used as a basis for a subsequent translation into another language;
re-translating a last-translated electronic text document back into an original language of the original electronic text document;
transforming the re-translated electronic text document by selecting at least one word therein and substituting a respective synonym for each selected word to generate a synonym-replaced electronic text document;
transforming the synonym-replaced electronic text document by selecting at least one word therein and substituting a respective misspelled word for each selected word to generate a modified electronic text document;
computing a similarity measure between the original electronic text document and the modified electronic text document;
determining whether the computed similarity measure is at least as great as a predetermined similarity threshold; and
storing the modified electronic text document in the data storage device if the computed similarity measure is greater than or equal to the predetermined similarity threshold, and discarding the modified electronic text document without storing it in the data storage device if the computed similarity measure is less than the predetermined similarity threshold.
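The similarity gate at the end of claim 9 can be sketched as follows. The claim does not say which similarity measure is used; cosine similarity over bag-of-words term-frequency vectors is an assumption chosen for illustration, the function names are hypothetical, and the Python list `store` is a stand-in for the data storage device.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def store_if_similar(original: str, modified: str, threshold: float,
                     store: List[str]) -> bool:
    """Keep `modified` only if its similarity to `original` meets the threshold."""
    if cosine_similarity(original, modified) >= threshold:
        store.append(modified)
        return True
    return False  # discarded: never written to the store


store: List[str] = []
original = "please send the payment today"
modified = "please transfer the paymnet today"
print(cosine_similarity(original, modified))  # 0.6
store_if_similar(original, modified, 0.5, store)  # stored
store_if_similar(original, modified, 0.8, store)  # discarded
print(store)  # ['please transfer the paymnet today']
```

Because the comparison is `>=`, a document whose similarity exactly equals the threshold is kept, matching the claim's "greater than or equal to" condition.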
Specification