COMPUTER-IMPLEMENTED SYSTEMS AND METHODS FOR TAXONOMY DEVELOPMENT
First Claim
1. A system, comprising:
one or more processors;
one or more non-transitory computer readable storage mediums containing instructions to cause the one or more processors to perform operations including:
identifying a term within a document;
determining a pre-defined threshold distance;
identifying a plurality of additional terms in the document, wherein the plurality of additional terms are located within the pre-defined threshold distance of the term;
calculating a distance between the term and an additional term of the plurality of additional terms;
determining a corresponding weight for the calculated distance, wherein determining the corresponding weight uses a proximity weighting scheme;
calculating a score for the additional term using the calculated distance and the corresponding weight;
generating a colocation matrix including a plurality of rows, wherein the colocation matrix is generated using the term, the plurality of additional terms, and the score; and
determining a classifier for the document using the colocation matrix.
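The windowing, distance, weighting, and scoring steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not fix a particular proximity weighting scheme or scoring formula, so everything specific here is an assumption: the function names `proximity_weight` and `score_neighbors` are hypothetical, distance is measured in token positions, the weight decays linearly with distance, and the score of an additional term is the sum of the weights of its occurrences inside the window.

```python
from collections import defaultdict


def proximity_weight(distance: int, threshold: int) -> float:
    """Assumed linear-decay scheme: an adjacent term weighs 1.0 and a term
    exactly at the threshold distance weighs 1/threshold."""
    return (threshold - distance + 1) / threshold


def score_neighbors(tokens: list[str], index: int, threshold: int) -> dict[str, float]:
    """Score every additional term within `threshold` positions of tokens[index]."""
    scores: dict[str, float] = defaultdict(float)
    lo = max(0, index - threshold)
    hi = min(len(tokens), index + threshold + 1)
    for j in range(lo, hi):
        if j == index:
            continue  # skip the term itself
        distance = abs(j - index)                      # calculated distance
        weight = proximity_weight(distance, threshold)  # corresponding weight
        scores[tokens[j]] += weight                    # assumed scoring rule
    return dict(scores)


tokens = "the cat sat on the mat".split()
print(score_neighbors(tokens, index=2, threshold=2))
# {'the': 1.0, 'cat': 1.0, 'on': 1.0}
```

In this example the term is "sat" and the threshold is 2, so "mat" falls outside the window, while the two occurrences of "the" (each at distance 2, weight 0.5) accumulate to the same score as a single adjacent term.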
Abstract
Systems and methods are provided for generating a set of classifiers. A term is identified within a document and a pre-defined threshold distance is determined. A plurality of additional terms in the document are identified, the additional terms being located within the pre-defined threshold distance of the term. A distance between the term and an additional term of the plurality of additional terms is calculated. A corresponding weight for the calculated distance is determined using a proximity weighting scheme. A score for the additional term is calculated using the calculated distance and the corresponding weight. A colocation matrix is generated and a classifier is determined using the colocation matrix.
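As a rough illustration of the colocation matrix mentioned in the abstract, the following Python sketch builds one row per distinct term, with each cell holding the accumulated proximity score of the column term relative to the row term. The layout (rows and columns both indexed by the sorted vocabulary), the function name `colocation_matrix`, and the linear-decay weight are assumptions made for the example; the abstract itself does not specify them.

```python
def colocation_matrix(tokens: list[str], threshold: int) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, matrix) where matrix[r][c] is the accumulated
    score of vocabulary[c] found within `threshold` positions of vocabulary[r]."""
    vocab = sorted(set(tokens))
    position = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for i, term in enumerate(tokens):
        lo = max(0, i - threshold)
        hi = min(len(tokens), i + threshold + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            distance = abs(j - i)
            # Assumed linear-decay proximity weight: 1.0 when adjacent,
            # 1/threshold at the edge of the window.
            weight = (threshold - distance + 1) / threshold
            matrix[position[term]][position[tokens[j]]] += weight
    return vocab, matrix


vocab, matrix = colocation_matrix(["a", "b", "c"], threshold=2)
for term, row in zip(vocab, matrix):
    print(term, row)
```

Because token distance is symmetric, the matrix produced under these assumptions is symmetric as well; a directional weighting scheme would not have that property.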
20 Claims
12. A computer program product, tangibly embodied in a non-transitory machine readable storage medium, including instructions operable to cause a data processing apparatus to:
identify a term within a document;
determine a pre-defined threshold distance;
identify a plurality of additional terms in the document, wherein the plurality of additional terms are located within the pre-defined threshold distance of the term;
calculate a distance between the term and an additional term of the plurality of additional terms;
determine a corresponding weight for the calculated distance, wherein determining the corresponding weight uses a proximity weighting scheme;
calculate a score for the additional term using the calculated distance and the corresponding weight;
generate a colocation matrix including a plurality of rows, wherein the colocation matrix is generated using the term, the plurality of additional terms, and the score; and
determine a classifier for the document using the colocation matrix.
20. A computer-implemented method comprising:
identifying a term within a document;
determining a pre-defined threshold distance;
identifying a plurality of additional terms in the document, wherein the plurality of additional terms are located within the pre-defined threshold distance of the term;
calculating a distance between the term and an additional term of the plurality of additional terms;
determining a corresponding weight for the calculated distance, wherein determining the corresponding weight uses a proximity weighting scheme;
calculating a score for the additional term using the calculated distance and the corresponding weight;
generating a colocation matrix including a plurality of rows, wherein the colocation matrix is generated using the term, the plurality of additional terms, and the score; and
determining a classifier for the document using the colocation matrix.
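The claims do not state how the classifier is derived from the colocation matrix. One plausible reading, offered only as a sketch, is to rank the entries in the row of a chosen term and keep the highest-scoring additional terms as candidate classifier terms. The function name `top_related_terms`, the dictionary-of-rows representation, and the sample scores below are all hypothetical.

```python
def top_related_terms(matrix: dict[str, dict[str, float]], seed: str, k: int) -> list[str]:
    """Return the k terms with the highest scores in the seed term's row.
    Ties are broken alphabetically so the result is deterministic."""
    row = matrix.get(seed, {})
    ranked = sorted(row.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical scores for illustration only; not taken from the patent.
example_matrix = {
    "battery": {"life": 2.5, "charge": 1.5, "the": 0.5},
}
print(top_related_terms(example_matrix, "battery", k=2))
# ['life', 'charge']
```

Other selection rules, such as thresholding on score or comparing whole rows between terms, would fit the claim language equally well.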
Specification