DOMAIN SPECIFIC REPRESENTATION OF DOCUMENT TEXT FOR ACCELERATED NATURAL LANGUAGE PROCESSING
Abstract
Provided are techniques for a domain specific representation of document text for accelerated natural language processing. A document is selected from a set of documents to be analyzed. A character stream from the document is converted into a token stream based on tokenization rules. Irrelevant tokens are removed from the token stream. The tokens remaining in the token stream are converted into an integer domain representation based on a domain specific ontology dictionary. The integer domain representation is stored to a Graphics Processing Unit (GPU) processing queue of each of one or more GPUs. Then, a result set is received from the one or more GPUs.
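The CPU-side steps described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the tokenization rule, the ontology dictionary entries, the irrelevant-token list, the use of ID 0 for terms missing from the dictionary, and the in-memory queues standing in for GPU processing queues are all assumptions made for illustration only.

```python
import re
from queue import Queue

# Hypothetical tokenization rule: each run of letters or digits is one token.
TOKEN_RULE = re.compile(r"[A-Za-z0-9]+")

# Hypothetical domain specific ontology dictionary (term -> integer ID).
ONTOLOGY = {"patient": 1, "fever": 2, "aspirin": 3, "dose": 4}

# Hypothetical set of tokens treated as irrelevant for this domain.
IRRELEVANT = {"the", "a", "of", "was", "given", "for"}


def tokenize(text):
    """Convert a character stream into a lowercase token stream."""
    return [m.group(0).lower() for m in TOKEN_RULE.finditer(text)]


def remove_irrelevant(tokens):
    """Drop tokens that carry no meaning for the analysis."""
    return [t for t in tokens if t not in IRRELEVANT]


def to_integer_domain(tokens, ontology, unknown_id=0):
    """Map each remaining token to its integer ID.

    Mapping unknown terms to unknown_id is an assumption of this sketch.
    """
    return [ontology.get(t, unknown_id) for t in tokens]


def enqueue_for_gpus(encoded, gpu_queues):
    """Store the integer representation on the queue of each GPU.

    Plain in-memory queues stand in for real GPU processing queues.
    """
    for q in gpu_queues:
        q.put(encoded)


document = "The patient was given a dose of aspirin for the fever."
encoded = to_integer_domain(remove_irrelevant(tokenize(document)), ONTOLOGY)
gpu_queues = [Queue(), Queue()]  # one stand-in queue per GPU
enqueue_for_gpus(encoded, gpu_queues)
print(encoded)  # [1, 4, 3, 2]
```

Working on fixed-width integers rather than variable-length strings is what makes the representation convenient to hand to a GPU: comparisons reduce to integer equality, and the data packs into contiguous arrays. Receiving the result set from the GPUs is omitted here because the abstract does not describe its format.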
13 Citations
30 Claims
1-10. (canceled)
11. A computer program product, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by one of a Central Processing Unit (CPU) processor and at least one Graphics Processing Unit (GPU) processor to perform:
selecting a document from a set of documents to be analyzed;
converting a character stream from the document into a token stream based on tokenization rules;
removing irrelevant tokens from the token stream;
converting tokens remaining in the token stream into an integer domain representation based on a domain specific ontology dictionary;
storing the integer domain representation to a Graphics Processing Unit (GPU) processing queue of each of one or more GPUs; and
receiving a result set from the one or more GPUs.
Dependent Claims: 12, 13, 14, 15, 21, 22, 23, 24, 25
16. A computer system, comprising:
one or more Central Processing Unit (CPU) processors and Graphics Processing Unit (GPU) processors, one or more computer-readable memories and one or more computer-readable, tangible storage devices; and
program instructions, stored on at least one of the one or more computer-readable, tangible storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to perform:
selecting a document from a set of documents to be analyzed;
converting a character stream from the document into a token stream based on tokenization rules;
removing irrelevant tokens from the token stream;
converting tokens remaining in the token stream into an integer domain representation based on a domain specific ontology dictionary;
storing the integer domain representation to a Graphics Processing Unit (GPU) processing queue of each of one or more GPUs; and
receiving a result set from the one or more GPUs.
Dependent Claims: 17, 18, 19, 20, 26, 27, 28, 29, 30
Specification