Generating a domain ontology using word embeddings
First Claim
1. A device, comprising:
- one or more processors to:
generate a set of distributed word vectors from a list of terms determined from a text using a vector model associated with generating the set of distributed word vectors, the set of distributed word vectors representing a plurality of real numbers for each term in the list of terms;
determine a quantity of term clusters, to be generated to form an ontology of terms in the text, based on the set of distributed word vectors and using a statistical technique;
generate term clusters, representing concepts of the ontology of terms, based on the quantity of term clusters and using a recursive divisive clustering technique;
perform a frequency analysis for terms included in the ontology of terms;
determine non-hierarchical relationships or attributes for relationships between the terms included in the ontology of terms based on the frequency analysis; and
output the term clusters, and data identifying the non-hierarchical relationships or attributes for relationships, to permit another device to analyze a set of documents using the term clusters.
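The clustering step of the claim can be illustrated with a minimal sketch of one recursive divisive technique, bisecting 2-means. The claim does not name a specific algorithm, so this choice is an assumption, as are the function names (`bisect`, `divisive_clusters`), the deterministic seeding rule, and the toy two-dimensional vectors standing in for distributed word vectors. The quantity of clusters `k` is taken as given here; the statistical technique that would determine it is not shown.

```python
# Hypothetical toy "word vectors": a few real numbers per term (illustrative only).
TOY_VECTORS = {
    "invoice": (0.9, 0.1), "payment": (1.0, 0.0), "refund": (0.8, 0.2),
    "server": (0.0, 1.0), "router": (0.1, 0.9),
    "doctor": (-1.0, -0.8), "nurse": (-0.9, -0.9),
}


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(terms, vectors):
    """Sum of squared distances from each term vector to the cluster centroid."""
    c = centroid([vectors[t] for t in terms])
    return sum(sq_dist(vectors[t], c) for t in terms)


def bisect(terms, vectors, max_iter=20):
    """Split one cluster in two with a small, deterministically seeded 2-means loop."""
    c = centroid([vectors[t] for t in terms])
    # Assumed seeding rule: the term farthest from the centroid, then the term farthest from it.
    seed_a = max(terms, key=lambda t: sq_dist(vectors[t], c))
    seed_b = max(terms, key=lambda t: sq_dist(vectors[t], vectors[seed_a]))
    centers = [vectors[seed_a], vectors[seed_b]]
    left, right = [], []
    for _ in range(max_iter):
        new_left = [t for t in terms
                    if sq_dist(vectors[t], centers[0]) <= sq_dist(vectors[t], centers[1])]
        new_right = [t for t in terms if t not in new_left]
        # Stop on a degenerate split or when the assignment no longer changes.
        if not new_left or not new_right or (new_left == left and new_right == right):
            break
        left, right = new_left, new_right
        centers = [centroid([vectors[t] for t in left]),
                   centroid([vectors[t] for t in right])]
    return left, right


def divisive_clusters(vectors, k):
    """Start from one cluster of all terms and keep splitting the loosest one until k exist."""
    clusters = [sorted(vectors)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        target = max(candidates, key=lambda c: sse(c, vectors))
        left, right = bisect(target, vectors)
        if not left or not right:
            break  # the loosest cluster cannot be split any further
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters
```

Calling `divisive_clusters(TOY_VECTORS, 3)` on the toy data separates the billing, networking, and medical terms into three clusters, each of which would stand for one concept of the ontology.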
Abstract
A device may receive a text, from a text source, in association with a request to generate an ontology for the text. The device may generate a set of word vectors from a list of terms determined from the text. The device may determine a quantity of term clusters to be generated to form the ontology based on the set of word vectors. The device may generate term clusters based on the quantity of term clusters, attributes, and/or non-hierarchical relationships. The term clusters may be associated with concepts of the ontology. The device may provide the term clusters for display via a user interface associated with a device.
18 Citations
20 Claims
-
1. A device, comprising:
one or more processors to:
generate a set of distributed word vectors from a list of terms determined from a text using a vector model associated with generating the set of distributed word vectors, the set of distributed word vectors representing a plurality of real numbers for each term in the list of terms;
determine a quantity of term clusters, to be generated to form an ontology of terms in the text, based on the set of distributed word vectors and using a statistical technique;
generate term clusters, representing concepts of the ontology of terms, based on the quantity of term clusters and using a recursive divisive clustering technique;
perform a frequency analysis for terms included in the ontology of terms;
determine non-hierarchical relationships or attributes for relationships between the terms included in the ontology of terms based on the frequency analysis; and
output the term clusters, and data identifying the non-hierarchical relationships or attributes for relationships, to permit another device to analyze a set of documents using the term clusters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
-
9. A non-transitory computer-readable medium storing instructions, the instructions comprising:
one or more instructions that, when executed by one or more processors, cause the one or more processors to:
receive a text, from a text source, in association with a request to generate an ontology for the text;
generate a set of distributed word vectors from a list of terms determined from the text, the set of distributed word vectors representing a plurality of real numbers for each term in the list of terms;
determine a quantity of term clusters to be generated to form the ontology based on the set of distributed word vectors;
generate term clusters based on the quantity of term clusters and using a recursive divisive clustering technique, the term clusters being associated with concepts of the ontology;
perform a frequency analysis for terms included in the ontology;
determine non-hierarchical relationships or attributes for relationships between the terms included in the ontology based on the frequency analysis; and
provide the term clusters, and data identifying the non-hierarchical relationships or attributes for relationships, for display via a user interface associated with a device.
Dependent claims: 10, 11, 12, 13, 14, 15
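The frequency analysis and relationship steps can be illustrated with a short sketch. The claim does not say what is counted or how a relationship is derived from the counts, so the approach below is an assumption: it counts how often two ontology terms appear in the same sentence and treats pairs at or above a threshold as candidate non-hierarchical relationships. The function names, the `min_count` parameter, and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(text, terms):
    """Count, per pair of ontology terms, the sentences in which both terms appear."""
    terms = set(terms)
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        present = sorted(terms & words)           # sorted so each pair has one canonical order
        counts.update(combinations(present, 2))   # every unordered pair found in this sentence
    return counts


def candidate_relationships(text, terms, min_count=2):
    """Assumed rule: a pair seen together in at least min_count sentences is a candidate."""
    counts = cooccurrence_counts(text, terms)
    return sorted(pair for pair, n in counts.items() if n >= min_count)


# Hypothetical sample text and ontology terms (lowercase, single words).
SAMPLE_TEXT = (
    "The doctor examines the patient. The doctor treats the patient. "
    "The nurse assists the doctor. The patient pays the invoice."
)
SAMPLE_TERMS = ["doctor", "patient", "nurse", "invoice"]
```

On the sample text, `candidate_relationships(SAMPLE_TEXT, SAMPLE_TERMS)` keeps only the doctor and patient pair, since it is the only pair that shares two sentences. A fuller treatment would also need to label the relationship, for example from the verbs that occur between the two terms.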
-
16. A method, comprising:
generating, by a device, a set of distributed word vectors from a list of terms determined from a text, the set of distributed word vectors representing a plurality of real numbers for each term in the list of terms;
determining, by the device, a quantity of term clusters, to be generated to form an ontology of terms in the text, based on the set of distributed word vectors;
generating, by the device, term clusters based on the quantity of term clusters and using a recursive divisive clustering technique;
determining, by the device, term sub-clusters associated with the term clusters;
generating, by the device, a hierarchy of term clusters for the ontology of terms based on the term clusters and the term sub-clusters;
performing, by the device, a frequency analysis for terms included in the ontology of terms;
determining, by the device, non-hierarchical relationships or attributes for relationships between the terms included in the ontology of terms based on the frequency analysis; and
providing, by the device, the term clusters, data identifying the non-hierarchical relationships or attributes for relationships, and the term sub-clusters to permit processing of another text.
Dependent claims: 17, 18, 19, 20
Specification