
Identifying glossary terms from natural language text documents

  • US 9,460,078 B2
  • Filed: 11/27/2013
  • Issued: 10/04/2016
  • Est. Priority Date: 12/06/2012
  • Status: Active Grant
First Claim

1. A device, comprising:

  • one or more processors to:

    receive, using an input component, a request to process text of a document to identify glossary terms included in the text;

    determine, using the one or more processors and based on the request, a plurality of sections of the text to process; and

    process a first section, of the plurality of sections, in parallel with a second section, of the plurality of sections, to identify the glossary terms included in the text, when processing the first section in parallel with the second section, the one or more processors are, for each of the first section and the second section, to:

    determine a linguistic unit analysis technique based on a file format of a file that includes the text;

    perform, using the linguistic unit analysis technique, a linguistic unit analysis on a linguistic unit, included in the text, to generate a plurality of ambiguous linguistic units from the linguistic unit, 

    the one or more processors, when performing the linguistic unit analysis on the linguistic unit to generate the plurality of ambiguous linguistic units, being to:

    perform at least one of:

    a coordinating conjunction analysis that generates the plurality of ambiguous linguistic units from the linguistic unit when the linguistic unit includes a coordinating conjunction, 

    an adjectival modifier analysis that generates the plurality of ambiguous linguistic units from the linguistic unit when the linguistic unit includes an adjective, or 

    a headword analysis that generates the plurality of ambiguous linguistic units from the linguistic unit when the linguistic unit includes an abstract noun;

    resolve the plurality of ambiguous linguistic units to generate a set of potential glossary terms that includes a subset of the plurality of ambiguous linguistic units;

    perform a glossary term analysis on the set of potential glossary terms to generate a set of glossary terms that includes a subset of the set of potential glossary terms;

    identify a set of included terms, of the set of potential glossary terms, that are included in the set of glossary terms;

    identify a set of excluded terms, of the set of potential glossary terms, that are excluded from the set of glossary terms;

    determine a semantic relatedness score between at least one excluded term, of the set of excluded terms, and at least one included term, of the set of included terms;

    selectively add the at least one excluded term to the set of glossary terms to form a final set of glossary terms based on the semantic relatedness score; and

    output, using an output component, the final set of glossary terms for the document for presentation via a user interface.
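
The steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`expand_coordination`, `relatedness`, `process_section`, `final_glossary`), the word-overlap (Jaccard) score standing in for semantic relatedness, the 0.3 threshold, and the caller-supplied `is_glossary_term` predicate are all assumptions made for illustration. Only the coordinating conjunction analysis is shown, and the resolution step is simplified to keeping every generated reading.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_coordination(unit):
    """Generate candidate readings of a unit that contains 'and' / 'or'.

    Assumes the last word after the conjunction is a head noun that may be
    shared by both conjuncts (a simplification for illustration).
    """
    words = unit.split()
    for conj in ("and", "or"):
        if conj in words:
            i = words.index(conj)
            left, right = words[:i], words[i + 1:]
            if left and right:
                head = right[-1]
                return [
                    " ".join(left + [head]),  # head shared with the left conjunct
                    " ".join(right),          # right conjunct as written
                    " ".join(left),           # left conjunct standing alone
                ]
    return [unit]


def relatedness(a, b):
    """Word-overlap (Jaccard) score, a stand-in for a semantic relatedness score."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


def process_section(units):
    """Collect candidate terms from every linguistic unit in one section."""
    candidates = []
    for unit in units:
        candidates.extend(expand_coordination(unit))
    return candidates


def final_glossary(sections, is_glossary_term, threshold=0.3):
    """Process sections in parallel, then re-admit related excluded terms."""
    with ThreadPoolExecutor() as pool:
        per_section = list(pool.map(process_section, sections))
    potential = sorted({term for sec in per_section for term in sec})
    included = [t for t in potential if is_glossary_term(t)]
    excluded = [t for t in potential if not is_glossary_term(t)]
    # Selectively add excluded terms that are close enough to an included term.
    rescued = [
        e for e in excluded
        if any(relatedness(e, i) >= threshold for i in included)
    ]
    return included + rescued


sections = [["input and output component"], ["glossary term"]]
result = final_glossary(sections, lambda t: t == "output component")
print(result)
```

In this example "input component" fails the hypothetical glossary term analysis but shares the word "component" with the included term "output component", so its overlap score of 1/3 clears the assumed threshold and it is added to the final set.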
