Selecting Tags For A Document By Analyzing Paragraphs Of The Document
First Claim
1. A method comprising:
accessing a document stored in one or more tangible media, the document comprising a plurality of text units, a text unit comprising a plurality of words, the plurality of words comprising a plurality of keywords;
performing the following for each text unit:
establishing relatedness among the keywords of each text unit; and
selecting one or more keywords according to the established relatedness as one or more candidate tags to yield a candidate tag set for the each text unit; and
determining relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets; and
assigning at least one candidate tag to the document according to the determined relatedness.
Abstract
In one embodiment, assigning tags to a document includes accessing the document, where the document comprises text units that include words. The following is performed for each text unit: a subset of words of a text unit is selected as candidate tags, relatedness is established among the candidate tags, and certain candidate tags are selected according to the established relatedness to yield a candidate tag set for the text unit. Relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets is determined. At least one candidate tag is assigned to the document according to the determined relatedness.
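The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: this section does not say how relatedness is computed, so the sketch assumes a Jaccard co-occurrence measure over text units (the share of text units containing either word that contain both). The stopword list, the function names, and the cut-offs `k` (candidate tags per text unit) and `n` (tags assigned to the document) are likewise hypothetical.

```python
import re

# Assumption: a keyword is any word that is not in this small stopword list.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "with"}


def keywords(text_unit):
    """Return the distinct keywords of one text unit, lowercased and sorted."""
    words = re.findall(r"[a-z]+", text_unit.lower())
    return sorted({w for w in words if w not in STOPWORDS})


def make_relatedness(units_keywords):
    """Build an assumed relatedness measure: Jaccard co-occurrence over text units."""
    occurs = {}
    for idx, kws in enumerate(units_keywords):
        for kw in kws:
            occurs.setdefault(kw, set()).add(idx)

    def relatedness(a, b):
        # Text units containing both words, divided by those containing either.
        return len(occurs[a] & occurs[b]) / len(occurs[a] | occurs[b])

    return relatedness


def mean_relatedness(word, others, relatedness):
    """Average relatedness of a word to a list of other words (0.0 if empty)."""
    if not others:
        return 0.0
    return sum(relatedness(word, o) for o in others) / len(others)


def candidate_tag_set(kws, relatedness, k):
    """Select the k keywords most related to the other keywords of their text unit."""
    def score(w):
        return mean_relatedness(w, [o for o in kws if o != w], relatedness)

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(kws, key=lambda w: (-score(w), w))[:k]


def assign_tags(text_units, k=3, n=2):
    """Assign up to n tags to a document given as a list of text units."""
    units_keywords = [keywords(u) for u in text_units]
    relatedness = make_relatedness(units_keywords)
    candidate_sets = [candidate_tag_set(kws, relatedness, k) for kws in units_keywords]

    # Score each candidate tag against the candidate tags of the other sets.
    scores = {}
    for i, cset in enumerate(candidate_sets):
        others = [t for j, other in enumerate(candidate_sets) if j != i for t in other]
        for tag in cset:
            s = mean_relatedness(tag, others, relatedness)
            scores[tag] = max(scores.get(tag, 0.0), s)

    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


if __name__ == "__main__":
    paragraphs = [
        "The tree grows in the forest",
        "A forest is full of tree",
        "The tree and the forest burn",
    ]
    print(assign_tags(paragraphs, k=2, n=2))
```

Scoring each candidate tag against the other candidate tag sets means a word that is prominent in only one text unit is less likely to become a document tag than a word that recurs across text units.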
21 Claims
1. A method comprising:
accessing a document stored in one or more tangible media, the document comprising a plurality of text units, a text unit comprising a plurality of words, the plurality of words comprising a plurality of keywords;
performing the following for each text unit:
establishing relatedness among the keywords of each text unit; and
selecting one or more keywords according to the established relatedness as one or more candidate tags to yield a candidate tag set for the each text unit; and
determining relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets; and
assigning at least one candidate tag to the document according to the determined relatedness.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. One or more computer-readable tangible media encoding software operable when executed to:
access a document stored in one or more tangible media, the document comprising a plurality of text units, a text unit comprising a plurality of words, the plurality of words comprising a plurality of keywords;
perform the following for each text unit:
establishing relatedness among the keywords of each text unit; and
selecting one or more keywords according to the established relatedness as one or more candidate tags to yield a candidate tag set for the each text unit; and
determine relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets; and
assign at least one candidate tag to the document according to the determined relatedness.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system comprising:
means for accessing a document stored in one or more tangible media, the document comprising a plurality of text units, a text unit comprising a plurality of words, the plurality of words comprising a plurality of keywords;
means for performing the following for each text unit:
establishing relatedness among the keywords of each text unit; and
selecting one or more keywords according to the established relatedness as one or more candidate tags to yield a candidate tag set for the each text unit; and
means for determining relatedness between the candidate tags of each candidate tag set and the candidate tags of other candidate tag sets; and
means for assigning at least one candidate tag to the document according to the determined relatedness.
Specification