Generating distributed word embeddings using structured information
First Claim
1. A method for generating a vector representation of a set of natural language text in a natural language processing system, the method comprising:
receiving, by the natural language processing system, a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes metadata and corresponding contextual information indicating a relationship between the metadata and the first set of natural language text;
determining, by the natural language processing system, a substitute set of natural language text, wherein the substitute set of natural language text includes the first set of natural language text, the metadata, and the corresponding contextual information indicating the relationship between the metadata and the first set of natural language text;
generating, by the natural language processing system, a first vector representation of the substitute set of natural language text; and
comparing, by the natural language processing system, the generated first vector representation to a second vector representation to determine, in the natural language processing system, an amount of similarity between the first set of natural language text and a second set of natural language text represented by the second vector representation, wherein:
the first set of natural language text is a verb,
the contextual information corresponding to the metadata includes a dependency parse tree,
the dependency parse tree includes a root node and a plurality of nodes that depend from the root node,
the root node represents the first set of natural language text,
the plurality of nodes that depend from the root node represent context features of the first set of natural language text, and
the generating of the first vector representation of the substitute set of natural language text includes adding vector representations for the context features represented by the plurality of nodes that depend from the root node.
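The wherein clauses of claim 1 can be illustrated with a small sketch. Everything below is an assumption for illustration, not the patent's implementation. The `Node` class, the `relation_word` feature format (e.g. `nsubj_dog`), and the hash-seeded random vectors that stand in for trained embeddings are all hypothetical. The sketch uses only the nodes that depend directly on the root and does not add the root's own vector. The claim requires only "an amount of similarity" without naming a measure, so cosine similarity is an assumed choice here.

```python
import hashlib
from dataclasses import dataclass, field

import numpy as np

DIM = 64  # hypothetical embedding size


@dataclass
class Node:
    """A dependency-parse node: a word, its relation to its head, and its dependents."""
    word: str
    relation: str = "root"
    children: list["Node"] = field(default_factory=list)


def feature_vector(feature: str, dim: int = DIM) -> np.ndarray:
    """Stand-in for a trained embedding lookup (assumption): a deterministic
    pseudo-random vector seeded from the SHA-256 hash of the feature string."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(seed).standard_normal(dim)


def context_features(root: Node) -> list[str]:
    """One feature per node depending directly on the root, e.g. 'nsubj_dog'."""
    return [f"{child.relation}_{child.word}" for child in root.children]


def verb_vector(root: Node) -> np.ndarray:
    """Add the vectors of the root's context features (root's own vector not included)."""
    feats = context_features(root)
    if not feats:
        return np.zeros(DIM)
    return np.sum([feature_vector(f) for f in feats], axis=0)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, used here as the assumed similarity measure."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy parses: two verbs that share their dependents and one that does not.
eat = Node("eat", children=[Node("dog", "nsubj"), Node("food", "dobj")])
devour = Node("devour", children=[Node("dog", "nsubj"), Node("food", "dobj")])
drive = Node("drive", children=[Node("man", "nsubj"), Node("car", "dobj")])

same_context = cosine(verb_vector(eat), verb_vector(devour))  # 1.0 up to rounding
diff_context = cosine(verb_vector(eat), verb_vector(drive))
```

In this toy setup, "eat" and "devour" have identical dependents, so their summed vectors coincide even though the verbs differ. This shows how a representation built from dependency contexts scores verbs by the company they keep.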
Abstract
A computer program that uses structured information, such as syntactic and semantic information, as context for representing words and/or phrases as vectors, by performing the following steps: (i) receiving a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes metadata and corresponding contextual information indicating a relationship between the metadata and the first set of natural language text; and (ii) generating a first vector representation for the first set of natural language text utilizing the metadata and its corresponding contextual information.
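Steps (i) and (ii) of the abstract can be sketched as follows. This is an assumption-laden illustration rather than the described program. The `StructuredInfo` schema, the `relation:metadata` token format, the toy lookup table, and the use of averaging to combine token vectors are all hypothetical choices. The sketch pairs the input text with relation-tagged metadata and builds one vector from the result.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StructuredInfo:
    """One metadata item plus its relationship to the input text (hypothetical schema)."""
    metadata: str  # e.g. another word from the same sentence
    relation: str  # e.g. a syntactic or semantic relation label


def substitute_tokens(text: str, infos: list[StructuredInfo]) -> list[str]:
    """The original text followed by relation-tagged metadata tokens such as
    'nsubj:athlete' (the token format is an assumption)."""
    return [text] + [f"{info.relation}:{info.metadata}" for info in infos]


def embed(tokens: list[str], table: dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Average the vectors of tokens found in `table`; unknown tokens are skipped.
    Averaging is an assumed combiner, not one named in the abstract."""
    vecs = [table[t] for t in tokens if t in table]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


# Toy 3-dimensional lookup table standing in for trained embeddings (hypothetical values).
table = {
    "run": np.array([1.0, 0.0, 0.0]),
    "nsubj:athlete": np.array([0.0, 1.0, 0.0]),
    "nsubj:program": np.array([0.0, 0.0, 1.0]),
}

tokens_a = substitute_tokens("run", [StructuredInfo("athlete", "nsubj")])
tokens_b = substitute_tokens("run", [StructuredInfo("program", "nsubj")])
vec_a = embed(tokens_a, table, dim=3)  # [0.5, 0.5, 0.0]
vec_b = embed(tokens_b, table, dim=3)  # [0.5, 0.0, 0.5]
```

The same surface word "run" receives different vectors once its structured context differs. This is the effect of using syntactic or semantic information as context.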
6 Claims
Claim 1: see First Claim above. Dependent claims: 2, 3, 4, 5, 6.
Specification