Selecting tags for a document by analyzing paragraphs of the document
Abstract
In one embodiment, assigning tags to a document includes accessing the document, where the document comprises text units that include words. The following is performed for each text unit: a subset of words of a text unit is selected as candidate tags, relatedness is established among the candidate tags, and certain candidate tags are selected according to the established relatedness to yield a candidate tag set for the text unit. Relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets is determined. At least one candidate tag is assigned to the document according to the determined relatedness.
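The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the source does not specify a ranking technique or a relatedness measure, so the sketch assumes term-frequency ranking and a Jaccard-style co-occurrence score across text units. All function names, the stopword list, and the parameters `k`, `threshold`, and `n_tags` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; the source does not define one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with", "by", "that", "it", "as"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(unit, k=3):
    """Rank words by term frequency (an assumed ranking technique); keep the top k."""
    counts = Counter(tokenize(unit))
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def make_relatedness(units):
    """Build an assumed relatedness measure: Jaccard overlap of the text units
    in which two words occur."""
    vocab_sets = [set(tokenize(u)) for u in units]

    def relatedness(a, b):
        both = sum(1 for s in vocab_sets if a in s and b in s)
        either = sum(1 for s in vocab_sets if a in s or b in s)
        return both / either if either else 0.0

    return relatedness


def candidate_tags(keywords, relatedness, threshold=0.45):
    """Keep keywords whose mean relatedness to the unit's other keywords
    reaches the threshold."""
    tags = []
    for word in keywords:
        others = [o for o in keywords if o != word]
        if not others:
            tags.append(word)
            continue
        mean = sum(relatedness(word, o) for o in others) / len(others)
        if mean >= threshold:
            tags.append(word)
    return tags


def assign_tags(units, k=3, threshold=0.45, n_tags=2):
    """Score each candidate tag against the candidate tags of the other text
    units and return the n_tags best-scoring tags for the document."""
    relatedness = make_relatedness(units)
    tag_sets = [candidate_tags(top_keywords(u, k), relatedness, threshold)
                for u in units]
    scores = {}
    for i, tags in enumerate(tag_sets):
        # Candidate tags drawn from every other text unit's tag set.
        others = sorted({t for j, s in enumerate(tag_sets) if j != i for t in s})
        for tag in tags:
            score = (sum(relatedness(tag, o) for o in others) / len(others)
                     if others else 0.0)
            scores[tag] = max(score, scores.get(tag, 0.0))
    return sorted(scores, key=lambda t: (-scores[t], t))[:n_tags]


if __name__ == "__main__":
    paragraphs = [
        "Trees and forest: trees grow in the forest soil.",
        "Forest trees shelter birds; the forest trees store carbon.",
        "Birds nest in forest trees.",
    ]
    print(assign_tags(paragraphs))
```

Here each paragraph plays the role of a text unit. Treating a candidate tag that also appears in another unit's tag set as fully related to itself is a design choice of this sketch; it favors tags that recur across paragraphs.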
63 Citations
18 Claims
1. A computer-implemented method comprising:
- accessing a document stored in one or more tangible media, the document comprising a plurality of text units, a text unit comprising a plurality of words, the plurality of words comprising a plurality of keywords;
- performing the following for each text unit using a processor:
  - ranking the plurality of words of the each text unit according to a ranking technique;
  - selecting one or more highly ranked words as the keywords of the each text unit;
  - establishing relatedness among the keywords of each text unit; and
  - selecting one or more keywords according to the established relatedness as one or more candidate tags to yield a candidate tag set for the each text unit;
- using the processor, determining relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets; and
- using the processor, assigning at least one candidate tag to the document according to the determined relatedness.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. One or more non-transitory computer-readable tangible media encoding software operable when executed to:
- access a document stored in one or more tangible media, the document comprising a plurality of text units, a text unit comprising a plurality of words, the plurality of words comprising a plurality of keywords;
- perform the following for each text unit:
  - rank the plurality of words of the each text unit according to a ranking technique;
  - select one or more highly ranked words as the keywords of the each text unit;
  - establish relatedness among the keywords of each text unit; and
  - select one or more keywords according to the established relatedness as one or more candidate tags to yield a candidate tag set for the each text unit;
- determine relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets; and
- assign at least one candidate tag to the document according to the determined relatedness.

Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification