IDENTIFYING GLOSSARY TERMS FROM NATURAL LANGUAGE TEXT DOCUMENTS
Abstract
A device may obtain text to be analyzed to identify glossary terms. The device may analyze a linguistic unit to generate multiple linguistic units related to the linguistic unit. The device may analyze the multiple linguistic units to generate potential glossary terms. The device may perform a glossary term analysis on the potential glossary terms to generate glossary terms that include a subset of the potential glossary terms. The device may identify included terms that are included in the glossary terms. The device may identify excluded terms that are excluded from the glossary terms. The device may determine a semantic relatedness score between at least one excluded term and at least one included term. Based on the semantic relatedness score, the device may selectively add the at least one excluded term to the glossary terms to form a final set of glossary terms, and may output the final set of glossary terms.
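The final stage summarized above (split the potential glossary terms into included and excluded terms, score each excluded term against the included terms, and add back the related ones) can be sketched in Python. This is a minimal sketch under assumptions the source does not state: the function names, the 0.5 threshold, comparing each excluded term against its best-matching included term, and the word-overlap (Jaccard) score used as a stand-in for a real semantic relatedness measure are all hypothetical.

```python
from typing import Callable, Iterable


def finalize_glossary(
    potential_terms: Iterable[str],
    glossary_terms: set[str],
    relatedness: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off; the source gives no value
) -> set[str]:
    """Add back excluded terms whose best relatedness to an included term meets `threshold`."""
    potential = set(potential_terms)
    included = potential & glossary_terms  # potential terms the glossary analysis kept
    excluded = potential - glossary_terms  # potential terms it dropped
    final = set(glossary_terms)
    for term in excluded:
        # Compare against every included term and keep the strongest match.
        best = max((relatedness(term, inc) for inc in included), default=0.0)
        if best >= threshold:
            final.add(term)
    return final


def token_jaccard(a: str, b: str) -> float:
    """Toy stand-in for a semantic score: Jaccard overlap of the two word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


potential = ["glossary term", "semantic score", "glossary term analysis", "device"]
glossary = {"glossary term", "semantic score"}
print(sorted(finalize_glossary(potential, glossary, token_jaccard)))
# ['glossary term', 'glossary term analysis', 'semantic score']
```

Passing the scoring function as a parameter keeps the selection logic independent of how relatedness is measured, so the toy Jaccard score can be swapped for a distributional or embedding-based score without touching `finalize_glossary`.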
20 Claims
1. A device, comprising:
one or more processors to:
obtain text of a document to be analyzed to identify glossary terms included in the text;
perform a linguistic unit analysis on a linguistic unit, included in the text, to generate a plurality of ambiguous linguistic units from the linguistic unit;
resolve the plurality of ambiguous linguistic units to generate a set of potential glossary terms that includes a subset of the plurality of ambiguous linguistic units;
perform a glossary term analysis on the set of potential glossary terms to generate a set of glossary terms that includes a subset of the set of potential glossary terms;
identify a set of included terms, of the set of potential glossary terms, that are included in the set of glossary terms;
identify a set of excluded terms, of the set of potential glossary terms, that are excluded from the set of glossary terms;
determine a semantic relatedness score between at least one excluded term, of the set of excluded terms, and at least one included term, of the set of included terms;
selectively add the excluded linguistic term to the set of glossary terms to form a final set of glossary terms based on the semantic relatedness score; and
output the final set of glossary terms for the document.
Dependent claims: 2, 3, 4, 5, 6.
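Claim 1 does not say how a linguistic unit yields "ambiguous linguistic units" or how they are resolved. One plausible reading, assumed here and not taken from the source, treats the competing spans of a noun phrase as the ambiguous units (is the term "natural language text document", "text document", or just "document"?) and resolves them by keeping only spans that recur in the text. The helper names and the `min_count` parameter are hypothetical.

```python
import re


def candidate_spans(noun_phrase: str) -> list[str]:
    """Generate the competing right-anchored spans of a noun phrase.

    Each span ends at the head noun, e.g. 'natural language text document'
    yields the full phrase, 'language text document', 'text document',
    and 'document'.
    """
    words = noun_phrase.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def resolve_spans(spans: list[str], text: str, min_count: int = 2) -> list[str]:
    """Keep spans that occur as whole-word matches at least `min_count` times."""
    kept = []
    for span in spans:
        pattern = r"\b" + re.escape(span) + r"\b"
        if len(re.findall(pattern, text, flags=re.IGNORECASE)) >= min_count:
            kept.append(span)
    return kept


text = (
    "Each text document is parsed. "
    "The natural language text document is long. "
    "A text document may be short."
)
spans = candidate_spans("natural language text document")
print(resolve_spans(spans, text))  # ['text document', 'document']
```

The surviving spans play the role of potential glossary terms; a later glossary term analysis would still be needed to decide whether a generic head noun such as "document" belongs in the glossary.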
7. A computer-readable medium storing instructions, the instructions comprising:
one or more instructions that, when executed by one or more processors, cause the one or more processors to:
obtain text to be analyzed to identify glossary terms included in the text;
perform a linguistic unit analysis on a linguistic unit, included in the text, to generate a plurality of linguistic units related to the linguistic unit;
analyze the plurality of linguistic units to generate a set of potential glossary terms that includes a subset of the plurality of linguistic units;
perform a glossary term analysis on the set of potential glossary terms to generate a set of glossary terms that includes a subset of the set of potential glossary terms;
identify a set of included terms, of the set of potential glossary terms, that are included in the set of glossary terms;
identify a set of excluded terms, of the set of potential glossary terms, that are excluded from the set of glossary terms;
determine a semantic relatedness score between at least one excluded term, of the set of excluded terms, and at least one included term, of the set of included terms;
selectively add the excluded linguistic term to the set of glossary terms to form a final set of glossary terms based on the semantic relatedness score; and
output the final set of glossary terms.
Dependent claims: 8, 9, 10, 11, 12, 13.
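The glossary term analysis recited in claim 7 is not specified further in the claim. Purely as an illustration, the sketch below treats it as a filter that drops generic vocabulary and keeps potential terms that recur at least `min_count` times in the text; the stoplist, the threshold, and the function name are hypothetical stand-ins, not the patented analysis.

```python
import re

# Hypothetical stoplist of words too generic to be glossary entries.
GENERIC_TERMS = {"device", "method", "system"}


def glossary_term_analysis(potential_terms: list[str], text: str, min_count: int = 2) -> set[str]:
    """Return the subset of potential terms that are non-generic and recur in the text."""
    glossary = set()
    for term in potential_terms:
        if term.lower() in GENERIC_TERMS:
            continue  # drop generic vocabulary outright
        hits = re.findall(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        if len(hits) >= min_count:
            glossary.add(term)
    return glossary


text = (
    "The device computes a relatedness score. "
    "The relatedness score ranks each glossary term. "
    "A glossary term may be excluded. "
    "The device outputs results."
)
potential = ["device", "relatedness score", "glossary term", "results"]
print(sorted(glossary_term_analysis(potential, text)))  # ['glossary term', 'relatedness score']
```

Because the output is always a subset of the input, this filter satisfies the claim's requirement that the set of glossary terms include a subset of the set of potential glossary terms; the terms it rejects ("device", "results") form the excluded set.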
14. A method, comprising:
obtaining, by a device, text to be analyzed to identify glossary terms included in the text;
performing, by the device, a linguistic unit analysis on a linguistic unit, included in the text, to generate a plurality of ambiguous linguistic units from the linguistic unit;
analyzing, by the device, the plurality of ambiguous linguistic units to generate a set of potential glossary terms that includes a subset of the plurality of ambiguous linguistic units;
performing, by the device, a glossary term analysis on the set of potential glossary terms to generate a set of glossary terms that includes a subset of the set of potential glossary terms;
identifying, by the device, a set of included terms, of the set of potential glossary terms, that are included in the set of glossary terms;
identifying, by the device, a set of excluded terms, of the set of potential glossary terms, that are excluded from the set of glossary terms;
determining, by the device, a semantic relatedness score between an excluded term, of the set of excluded terms, and an included term, of the set of included terms;
selectively adding, by the device, the excluded linguistic term to the set of glossary terms to form a final set of glossary terms based on the semantic relatedness score; and
outputting, by the device, the final set of glossary terms.
Dependent claims: 15, 16, 17, 18, 19, 20.
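Claim 14 leaves the form of the semantic relatedness score open. A common distributional choice, assumed here rather than taken from the source, is the cosine similarity of context vectors: each term is represented by the words that share a sentence with it, and two terms score high when they appear in similar contexts. The stopword list, the sentence-level context window, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "is", "and"}


def context_vector(term: str, sentences: list[str]) -> Counter:
    """Count non-stopword words in every sentence that mentions `term`.

    The term's own words are excluded so that two terms are not judged
    related merely because they share a word.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    own_words = set(term.lower().split())
    vec: Counter = Counter()
    for sentence in sentences:
        if pattern.search(sentence):
            vec.update(
                w for w in re.findall(r"[a-z]+", sentence.lower())
                if w not in STOPWORDS and w not in own_words
            )
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_relatedness(excluded: str, included: str, sentences: list[str]) -> float:
    """Score an excluded term against an included term by context similarity."""
    return cosine(context_vector(excluded, sentences), context_vector(included, sentences))


sentences = [
    "The sensor measures pressure.",
    "The gauge measures pressure.",
    "The invoice lists the price.",
]
print(round(semantic_relatedness("gauge", "sensor", sentences), 3))    # 1.0
print(round(semantic_relatedness("invoice", "sensor", sentences), 3))  # 0.0
```

Sentence co-occurrence counts are sparse within a single document, so a practical system might substitute pretrained word embeddings; under this design that change would be confined to `context_vector`.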
Specification