Generating feature vectors from RDF graphs
First Claim
1. A method of preparing feature vectors suitable for machine learning or machine classification from a Resource Description Framework graph of a document relevant to a topic of interest, the method comprising:
- receiving a set of identified key-attributes that are node names of interest, root-attributes of interest, additional-attributes of interest, and connection information identifying connected nodes to search for at least some of the additional-attributes of interest, the additional-attributes of interest determined from information external to the document;
generating a plurality of responsive node-feature vectors for the document represented as the Resource Description Framework graph, including:
collecting the root-attributes of interest from a root node of the document;
querying for and receiving responsive nodes in the Resource Description Framework graph that include the key-attributes;
for each responsive node, creating a responsive node-feature vector, wherein the responsive node-feature vector includes:
from the root node, at least some of the collected root-attributes;
from the responsive node, the additional-attributes of interest present in the responsive node; and
as directed by the connection information, from nodes connected to the responsive node by a single edge, the additional-attributes of interest present in the connected nodes; and
applying the plurality of responsive node-feature vectors as a training set for the machine learning or the machine classification to produce computer instructions configured to determine that another document is relevant to the topic of interest.
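The generating steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the RDF graph is stood in for by a plain list of (subject, predicate, object) triples, a node counts as "responsive" when its hypothetical `type` value is one of the key-attributes, and the connection information is modeled as a mapping from an edge name to the attributes to collect from the node at the other end of that single edge. All names (`TRIPLES`, `build_index`, `node_feature_vectors`, the `root.` and `edge.` key prefixes) and the sample data are assumptions made for illustration.

```python
# Hypothetical stand-in for an RDF graph: (subject, predicate, object) triples.
TRIPLES = [
    ("doc1", "title", "Acme hires new CEO"),
    ("doc1", "published", "2015-06-01"),
    ("doc1", "mentions", "event1"),
    ("event1", "type", "PersonCareer"),
    ("event1", "careertype", "executive"),
    ("event1", "person", "person1"),
    ("event1", "company", "company1"),
    ("person1", "name", "Jane Roe"),
    ("company1", "name", "Acme"),
    ("company1", "ticker", "ACME"),
]


def build_index(triples):
    """Map each subject to {predicate: [objects]} for quick lookup."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return index


def node_feature_vectors(triples, root, key_attributes, root_attributes,
                         additional_attributes, connections):
    """Build one feature vector (a dict) per responsive node."""
    index = build_index(triples)

    # Collect the root-attributes of interest from the root node.
    root_node = index.get(root, {})
    root_features = {f"root.{a}": root_node[a][0]
                     for a in root_attributes if a in root_node}

    vectors = []
    for node, attrs in index.items():
        # Assumption: a node is responsive when its "type" is a key-attribute.
        if not any(t in key_attributes for t in attrs.get("type", [])):
            continue
        vector = dict(root_features)
        # Additional-attributes present on the responsive node itself.
        for a in additional_attributes:
            if a in attrs:
                vector[a] = attrs[a][0]
        # Additional-attributes on nodes one outgoing edge away, as directed
        # by the connection information.
        for edge, wanted in connections.items():
            for neighbor in attrs.get(edge, []):
                neighbor_attrs = index.get(neighbor, {})
                for a in wanted:
                    if a in neighbor_attrs:
                        vector[f"{edge}.{a}"] = neighbor_attrs[a][0]
        vectors.append(vector)
    return vectors


vectors = node_feature_vectors(
    TRIPLES,
    root="doc1",
    key_attributes={"PersonCareer"},
    root_attributes=["title", "published"],
    additional_attributes=["careertype"],
    connections={"person": ["name"], "company": ["name", "ticker"]},
)
```

With the sample triples, only `event1` is responsive, so `vectors` holds a single dict that combines the two root attributes, the node's own `careertype`, and the `name` and `ticker` values reached over the `person` and `company` edges. A production system would more likely issue a graph query (for example SPARQL) against a real RDF store than scan an in-memory list.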
Abstract
The disclosed technology includes systems and methods for generating feature vectors from Resource Description Framework (RDF) graphs. Machine learning tasks frequently operate on vectors of features, while available systems for parsing documents often produce RDF graphs. Once a set of features of interest has been established, the disclosed systems and methods generate feature vectors from the RDF graphs of the documents. In one example setting, a machine learning system can use the generated feature vectors to determine how interesting a news article might be, or to learn information of interest about a specific subject reported in multiple articles. In another example setting, feature vectors generated from a resume database can be used to identify viable interview candidates for a particular job opening.
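Because many learners expect fixed-length numeric input, attribute-style feature vectors usually need an encoding step before training. The sketch below shows one common option, one-hot encoding over a shared vocabulary of (attribute, value) pairs. The source does not specify any encoding; the function name `one_hot_encode` and the sample vectors are hypothetical and serve only as illustration.

```python
def one_hot_encode(vectors):
    """Turn attribute dicts into fixed-length 0/1 rows over a shared vocabulary."""
    # Every distinct (attribute, value) pair becomes one column.
    vocabulary = sorted({(k, v) for vec in vectors for k, v in vec.items()})
    position = {pair: i for i, pair in enumerate(vocabulary)}
    rows = []
    for vec in vectors:
        row = [0] * len(vocabulary)
        for pair in vec.items():
            row[position[pair]] = 1
        rows.append(row)
    return vocabulary, rows


# Hypothetical feature vectors, e.g. one per responsive node.
example_vectors = [
    {"careertype": "executive", "company.name": "Acme"},
    {"careertype": "engineer", "company.name": "Acme"},
]
vocabulary, rows = one_hot_encode(example_vectors)
```

Here `vocabulary` has three columns and each row marks which pairs its vector contains, so the rows can be paired with labels and passed to a classifier of choice. The sketch assumes attribute values are strings; numeric or free-text values would call for a different encoding.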
25 Claims
1. A method of preparing feature vectors suitable for machine learning or machine classification from a Resource Description Framework graph of a document relevant to a topic of interest, the method comprising:
- receiving a set of identified key-attributes that are node names of interest, root-attributes of interest, additional-attributes of interest, and connection information identifying connected nodes to search for at least some of the additional-attributes of interest, the additional-attributes of interest determined from information external to the document;
generating a plurality of responsive node-feature vectors for the document represented as the Resource Description Framework graph, including:
collecting the root-attributes of interest from a root node of the document;
querying for and receiving responsive nodes in the Resource Description Framework graph that include the key-attributes;
for each responsive node, creating a responsive node-feature vector, wherein the responsive node-feature vector includes:
from the root node, at least some of the collected root-attributes;
from the responsive node, the additional-attributes of interest present in the responsive node; and
as directed by the connection information, from nodes connected to the responsive node by a single edge, the additional-attributes of interest present in the connected nodes; and
applying the plurality of responsive node-feature vectors as a training set for the machine learning or the machine classification to produce computer instructions configured to determine that another document is relevant to the topic of interest.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system that prepares feature vectors suitable for machine learning or machine classification from a Resource Description Framework graph of a document relevant to a topic of interest, the system including:
- a processor, memory coupled to the processor, and computer instructions loaded into the memory that, when executed, cause the processor to implement a process that includes:
receipt of a set of identified key-attributes that are node names of interest, root-attributes of interest, additional-attributes of interest, and connection information identifying connected nodes to search for at least some of the additional-attributes of interest, the additional-attributes of interest determined from information external to the document;
generation of a plurality of responsive node-feature vectors for the document represented as the Resource Description Framework graph, including:
collection of the root-attributes of interest from a root node of the document;
a query for and receipt of responsive nodes in the Resource Description Framework graph that include the key-attributes;
for each responsive node, creating a responsive node-feature vector, wherein the responsive node-feature vector includes:
from the root node, at least some of the collected root-attributes;
from the responsive node, the additional-attributes of interest present in the responsive node; and
as directed by the connection information, from nodes connected to the responsive node by a single edge, the additional-attributes of interest present in the connected nodes; and
application of the plurality of responsive node-feature vectors as a training set for the machine learning or the machine classification to produce computer instructions configured to determine that another document is relevant to the topic of interest.
Dependent claims: 14, 15, 16, 17, 18, 19
20. A tangible non-transitory computer readable storage medium loaded with initial computer instructions that, when executed, cause a computer system to prepare feature vectors suitable for machine learning or machine classification from a Resource Description Framework graph of a document relevant to a topic of interest, the initial computer instructions including instructions for:
- receipt of a set of identified key-attributes that are node names of interest, root-attributes of interest, additional-attributes of interest, and connection information identifying connected nodes to search for at least some of the additional-attributes of interest, the additional-attributes of interest determined from information external to the document;
generation of a plurality of responsive node-feature vectors for the document represented as the Resource Description Framework graph, including:
collection of the root-attributes of interest from a root node of the document;
a query for and receipt of responsive nodes in the Resource Description Framework graph that include the key-attributes;
for each responsive node, creation of a responsive node-feature vector, wherein the responsive node-feature vector includes:
from the root node, at least some of the collected root-attributes;
from the responsive node, the additional-attributes of interest present in the responsive node; and
as directed by the connection information, from nodes connected to the responsive node by a single edge, the additional-attributes of interest present in the connected nodes; and
application of the plurality of responsive node-feature vectors as a training set for the machine learning or the machine classification to produce computer instructions configured to determine that another document is relevant to the topic of interest.
Dependent claims: 21, 22, 23, 24, 25
Specification