Domain specific representation of document text for accelerated natural language processing

  • US 9,898,447 B2
  • Filed: 06/22/2015
  • Issued: 02/20/2018
  • Est. Priority Date: 06/22/2015
  • Status: Active Grant
First Claim

1. A computer program product, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by one of a Central Processing Unit (CPU) processor and at least one Graphics Processing Unit (GPU) processor to perform:

    selecting a document from a set of documents to be analyzed;

    converting a character stream from the document into a token stream based on tokenization rules;

    removing irrelevant tokens from the token stream;

    converting tokens remaining in the token stream into an integer domain representation based on a domain specific ontology dictionary, wherein each of the tokens is mapped to an integer based on mappings in the domain specific ontology dictionary;

    storing the integer domain representation to a Graphics Processing Unit (GPU) processing queue of each of one or more GPUs;

    receiving a result set from the one or more GPUs, wherein the result set includes tuples of

         1) a specific pattern and

         2) an offset from a beginning of the integer domain representation, and wherein the result set is generated using a compiled super Regular Expression (REGEX) that is compiled using the domain specific ontology dictionary;

    storing the result set into an index for use in processing a search query; and

    persisting the integer domain representation once for the document for processing of the document with a different compiled super REGEX.
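
The steps recited in the claim can be illustrated with a small CPU-only sketch. It is not the patented implementation. The ontology dictionary, the irrelevant-token list, the reserved ID 0 for unknown tokens, and every function name below are hypothetical. The GPU processing queue is not modeled, and the "compiled super REGEX" is approximated by encoding each integer as one private-use Unicode character and compiling a single alternation of named groups with Python's re module, so that a match offset equals an offset into the integer sequence.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical domain specific ontology dictionary: token -> integer.
ONTOLOGY: Dict[str, int] = {"patient": 1, "chest": 2, "pain": 3, "fever": 4}
UNKNOWN = 0  # assumption: tokens missing from the dictionary share one reserved ID
IRRELEVANT = {"the", "a", "of", "and", "with"}  # assumption: a simple stop list
BASE = 0xE000  # start of the Unicode private-use area, one character per integer


def tokenize(text: str) -> List[str]:
    """Character stream -> token stream (toy rule: lowercase alphanumeric runs)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_integer_domain(text: str) -> List[int]:
    """Drop irrelevant tokens, then map each remaining token to its integer."""
    kept = [t for t in tokenize(text) if t not in IRRELEVANT]
    return [ONTOLOGY.get(t, UNKNOWN) for t in kept]


def encode(ids: List[int]) -> str:
    """Encode integers as single characters so the re module can scan them."""
    return "".join(chr(BASE + i) for i in ids)


def compile_super_regex(patterns: Dict[str, List[str]]) -> "re.Pattern":
    """Combine all named patterns into one regex, using the ontology for IDs."""
    alternatives = []
    for name, words in patterns.items():
        body = re.escape(encode([ONTOLOGY[w] for w in words]))
        alternatives.append(f"(?P<{name}>{body})")
    return re.compile("|".join(alternatives))


def find_patterns(ids: List[int], super_regex: "re.Pattern") -> List[Tuple[str, int]]:
    """Return (pattern name, offset into the integer representation) tuples."""
    return [(m.lastgroup, m.start()) for m in super_regex.finditer(encode(ids))]


if __name__ == "__main__":
    # The integer representation is computed once and kept for later reuse.
    ids = to_integer_domain("The patient reports chest pain and a fever.")
    symptoms = compile_super_regex({"chest_pain": ["chest", "pain"], "fever": ["fever"]})
    print(find_patterns(ids, symptoms))  # [('chest_pain', 2), ('fever', 4)]

    # A different compiled regex runs over the same stored representation.
    people = compile_super_regex({"patient": ["patient"]})
    print(find_patterns(ids, people))  # [('patient', 0)]
```

In this sketch the result tuples would be the entries written to a search index, and the second call shows why the integer representation only needs to be produced once per document. Note that finditer reports non-overlapping matches only; a real matcher may need to report overlapping patterns as well.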
