Computer-Implemented Systems and Methods for Taxonomy Development
First Claim
1. A computer-implemented method for generating a set of classifiers, comprising:
determining, using one or more data processors, one or more locations of instances of a topic term in a collection of documents;
identifying, using the one or more data processors, one or more topic term phrases by parsing words in the collection of documents, wherein a topic term phrase includes one or more words that appear within a topic threshold distance of a topic term;
identifying, using the one or more data processors, one or more sentiment terms within a topic term phrase;
identifying, using the one or more data processors, one or more candidate classifiers by parsing words in the one or more topic term phrases, wherein a candidate classifier is a word that appears within a sentiment threshold distance of a sentiment term;
generating, using the one or more data processors, a colocation matrix including a plurality of rows, wherein a candidate classifier is associated with a row, and wherein the colocation matrix is generated using the locations of the candidate classifiers as they appear within the collection of documents;
identifying, using the one or more data processors, a seed row, wherein the seed row is selected from among the plurality of rows, and wherein the seed row is associated with a particular attribute;
determining, using the one or more data processors, distance metrics by comparing rows of the colocation matrix to the seed row; and
generating, using the one or more data processors, a set of classifiers for the particular attribute, wherein classifiers in the set of classifiers are selected using the distance metrics.
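The first four steps of the claim (locating the topic term, forming topic term phrases, finding sentiment terms, and collecting candidate classifiers) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the whitespace tokenization, the symmetric window, and the exclusion of the topic and sentiment terms from the candidates are all assumptions made for illustration.

```python
from typing import List, Set


def find_locations(tokens: List[str], term: str) -> List[int]:
    """Return the token indices at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def window(tokens: List[str], center: int, distance: int) -> List[str]:
    """Return the tokens within `distance` positions of `center`, inclusive."""
    start = max(0, center - distance)
    return tokens[start:center + distance + 1]


def candidate_classifiers(tokens: List[str], topic_term: str,
                          sentiment_terms: Set[str],
                          topic_distance: int,
                          sentiment_distance: int) -> Set[str]:
    """Collect words near a sentiment term inside each topic term phrase.

    Assumption: a topic term phrase is the symmetric token window around
    each topic term occurrence, and the topic and sentiment terms
    themselves are not counted as candidates.
    """
    candidates: Set[str] = set()
    for loc in find_locations(tokens, topic_term):
        # Topic term phrase: words within the topic threshold distance.
        phrase = window(tokens, loc, topic_distance)
        for j, word in enumerate(phrase):
            if word not in sentiment_terms:
                continue
            # Candidates: words within the sentiment threshold distance.
            for near in window(phrase, j, sentiment_distance):
                if near not in sentiment_terms and near != topic_term:
                    candidates.add(near)
    return candidates


# Hypothetical example input.
tokens = "the hotel staff was friendly but the room was dirty".split()
found = candidate_classifiers(tokens, "hotel", {"friendly", "dirty"}, 4, 2)
```

With these inputs, the only topic term phrase is "the hotel staff was friendly but", and the words within two positions of "friendly" become the candidates. A practical system would likely filter stop words such as "was" and "but"; the claim does not address that detail.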
Abstract
Systems and methods are provided for generating a set of classifiers. A location is determined for each instance of a topic term in a collection of documents. One or more topic term phrases are identified, along with one or more sentiment terms within each topic term phrase. Candidate classifiers are identified by parsing words in the one or more topic term phrases, and a colocation matrix is generated. A seed row of the colocation matrix associated with a particular attribute is identified, and distance metrics are determined by comparing each row of the colocation matrix to the seed row. A set of classifiers is generated for the particular attribute, where classifiers in the set are selected using the distance metrics.
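The colocation matrix and seed-row comparison can be sketched in Python as follows. This is only one possible reading. The source does not specify how matrix entries are computed or which distance metric is used, so the windowed co-occurrence counts, the cosine distance, the `max_distance` cutoff, and all function names here are assumptions.

```python
import math
from typing import Dict, List


def colocation_matrix(documents: List[List[str]], candidates: List[str],
                      window: int) -> Dict[str, List[int]]:
    """Build one row per candidate classifier.

    Assumption: entry (i, j) counts how often candidate j appears within
    `window` tokens of candidate i across the document collection.
    """
    index = {word: k for k, word in enumerate(candidates)}
    matrix = {word: [0] * len(candidates) for word in candidates}
    for tokens in documents:
        for i, word in enumerate(tokens):
            if word not in index:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                other = tokens[j]
                if j != i and other in index:
                    matrix[word][index[other]] += 1
    return matrix


def cosine_distance(a: List[int], b: List[int]) -> float:
    """Return 1 - cosine similarity; all-zero rows get the maximum distance of 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def classifiers_for_attribute(matrix: Dict[str, List[int]], seed: str,
                              max_distance: float) -> List[str]:
    """Select the candidates whose rows lie within `max_distance` of the seed row."""
    seed_row = matrix[seed]
    return sorted(word for word, row in matrix.items()
                  if cosine_distance(row, seed_row) <= max_distance)


# Hypothetical example input.
documents = [["staff", "service"], ["reception", "service"], ["room", "bathroom"]]
candidates = ["staff", "reception", "service", "room", "bathroom"]
matrix = colocation_matrix(documents, candidates, window=1)
selected = classifiers_for_attribute(matrix, seed="staff", max_distance=0.5)
```

In this toy collection, "staff" and "reception" both co-occur only with "service", so their rows are identical and both are selected for the attribute seeded by "staff". The rows for "room" and "bathroom" share no co-occurrences with the seed row and are excluded.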
25 Claims
1. A computer-implemented method for generating a set of classifiers, comprising:
determining, using one or more data processors, one or more locations of instances of a topic term in a collection of documents;
identifying, using the one or more data processors, one or more topic term phrases by parsing words in the collection of documents, wherein a topic term phrase includes one or more words that appear within a topic threshold distance of a topic term;
identifying, using the one or more data processors, one or more sentiment terms within a topic term phrase;
identifying, using the one or more data processors, one or more candidate classifiers by parsing words in the one or more topic term phrases, wherein a candidate classifier is a word that appears within a sentiment threshold distance of a sentiment term;
generating, using the one or more data processors, a colocation matrix including a plurality of rows, wherein a candidate classifier is associated with a row, and wherein the colocation matrix is generated using the locations of the candidate classifiers as they appear within the collection of documents;
identifying, using the one or more data processors, a seed row, wherein the seed row is selected from among the plurality of rows, and wherein the seed row is associated with a particular attribute;
determining, using the one or more data processors, distance metrics by comparing rows of the colocation matrix to the seed row; and
generating, using the one or more data processors, a set of classifiers for the particular attribute, wherein classifiers in the set of classifiers are selected using the distance metrics.
Dependent claims: 2-23
24. A computer-implemented system for generating a set of classifiers, comprising:
one or more data processors;
a computer-readable medium encoded with instructions for commanding the one or more data processors to execute steps including:
determining one or more locations of instances of a topic term in a collection of documents;
identifying one or more topic term phrases by parsing words in the collection of documents, wherein a topic term phrase includes one or more words that appear within a topic threshold distance of a topic term;
identifying one or more sentiment terms within a topic term phrase;
identifying one or more candidate classifiers by parsing words in the one or more topic term phrases, wherein a candidate classifier is a word that appears within a sentiment threshold distance of a sentiment term;
generating a colocation matrix including a plurality of rows, wherein a candidate classifier is associated with a row, and wherein the colocation matrix is generated using the locations of the candidate classifiers as they appear within the collection of documents;
identifying a seed row, wherein the seed row is selected from among the plurality of rows, and wherein the seed row is associated with a particular attribute;
determining distance metrics by comparing rows of the colocation matrix to the seed row; and
generating a set of classifiers for the particular attribute, wherein classifiers in the set of classifiers are selected using the distance metrics.
25. A computer-readable medium encoded with instructions for commanding one or more data processors to execute a method for generating a set of classifiers, the method comprising:
determining one or more locations of instances of a topic term in a collection of documents;
identifying one or more topic term phrases by parsing words in the collection of documents, wherein a topic term phrase includes one or more words that appear within a topic threshold distance of a topic term;
identifying one or more sentiment terms within a topic term phrase;
identifying one or more candidate classifiers by parsing words in the one or more topic term phrases, wherein a candidate classifier is a word that appears within a sentiment threshold distance of a sentiment term;
generating a colocation matrix including a plurality of rows, wherein a candidate classifier is associated with a row, and wherein the colocation matrix is generated using the locations of the candidate classifiers as they appear within the collection of documents;
identifying a seed row, wherein the seed row is selected from among the plurality of rows, and wherein the seed row is associated with a particular attribute;
determining distance metrics by comparing rows of the colocation matrix to the seed row; and
generating a set of classifiers for the particular attribute, wherein classifiers in the set of classifiers are selected using the distance metrics.
Specification