Tagging text snippets
First Claim
1. A method of tagging a set of text snippets with a tag organized in a taxonomy, the method comprising:
- receiving, by a processor, the set of text snippets and the tag of a set of tags, wherein the tag of the set of tags comprises a set of words;
parsing, by the processor, the set of words in the tag into a parse tree with a root node and one or more levels to determine one or more types of phrases belonging to one or more parts of speech;
determining by the processor, a frequency of one or more words present in the tag of the set of tags, and a headword from the one or more words present in the tag of the set of tags;
assigning, by the processor, a numeric weight to the one or more words of the tag based on the frequency, parse tree, headword of the set of words associated with the tag, and the parts of speech category of words, wherein the numeric weight indicates relative importance of the one or more words in the tag with respect to other words present in other tags;
determining, by the processor, correspondences between the one or more words of the tag and words present in the set of text snippets, wherein the determining of the correspondences results in identification of a same word or a similar meaning word present in the tag;
computing, by the processor, a belief factor for the tag by applying a certainty factor algebra (CFA) based upon the numeric weight of the same word and the similar meaning word; and
identifying, dynamically a text, from the set of text snippets, for which a feedback is sought from a user, and wherein the text is identified using an active learning approach;
receiving, dynamically a feedback about the tag assigned to the text snippet from the user, wherein the feedback received is used for historical learning by updating a knowledge base referred for tagging the set of text snippets and future set of text snippets using the set of tags present in the taxonomy; and
updating the dynamic feedback in the knowledge base using the active learning approach assigned to the set of text snippets, wherein the active learning approach optimizes an amount of feedback received from the user, wherein the knowledge base comprises a word sense-importance database and a lexical resource, and wherein the word sense-importance database discriminates between one or more different meanings of a word and a relative importance of the set of words associated with a tag in context.
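The weighting, correspondence, and belief-factor steps recited above can be illustrated with a minimal sketch. The claim does not give formulas, so everything concrete here is an assumption made for illustration: the tag list, the `SYNONYMS` table standing in for the lexical resource, the inverse-frequency weighting with a headword bonus, the choice of the last word as headword (instead of a real parse tree), and the synonym discount. The combination rule is the standard MYCIN-style certainty factor rule for two non-negative factors; the patent's own algebra may differ.

```python
from collections import Counter

# Hypothetical taxonomy tags and a toy synonym table (stand-in for the lexical resource).
TAGS = ["network printer failure", "printer paper jam", "password reset request"]
SYNONYMS = {"failure": {"fault", "breakdown"}, "request": {"ask"}, "jam": {"stuck"}}


def word_weights(tag, all_tags, headword_bonus=0.3):
    """Weight each word of a tag: rarer across tags means heavier; the headword gets a bonus.

    Assumption: the headword is the last word of the tag, which replaces the
    parse-tree and part-of-speech analysis described in the claim.
    """
    freq = Counter(w for t in all_tags for w in set(t.split()))
    words = tag.split()
    head = words[-1]
    weights = {}
    for w in words:
        base = 0.6 / freq[w]  # illustrative inverse-frequency weight
        weights[w] = min(1.0, base + (headword_bonus if w == head else 0.0))
    return weights


def combine_cf(a, b):
    """MYCIN-style combination of two non-negative certainty factors."""
    return a + b * (1.0 - a)


def belief_factor(tag, snippet, all_tags, synonym_discount=0.8):
    """Accumulate evidence from exact and similar-meaning word matches."""
    tokens = set(snippet.lower().split())
    belief = 0.0
    for word, weight in word_weights(tag, all_tags).items():
        if word in tokens:
            evidence = weight  # same word
        elif SYNONYMS.get(word, set()) & tokens:
            evidence = weight * synonym_discount  # similar-meaning word
        else:
            continue
        belief = combine_cf(belief, evidence)
    return belief


if __name__ == "__main__":
    snippet = "the office printer has a fault again"
    for tag in TAGS:
        print(tag, round(belief_factor(tag, snippet, TAGS), 3))
```

Because the combination rule is commutative and associative for non-negative factors, the order in which matched words are processed does not change the resulting belief factor.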
Abstract
The present subject matter discloses a system and method for tagging a set of text snippets with a set of tags. The system receives a set of text snippets and a set of tags as input. Each tag comprises a set of words, and each word is assigned a numeric weight based on the frequency of the word and the headword of the set of words. Same words and similar-meaning words are then determined between the tag and the text snippets. A belief factor is computed for the tag by applying certainty factor algebra to the numeric weights assigned to the same words and the similar-meaning words. The tag is assigned to the text snippet based on a comparison of the belief factor with a threshold. Feedback is then received about the tagging, and based on that feedback the knowledge base of the system may be updated for future tagging.
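The threshold comparison and feedback-driven update described in the abstract might look like the following sketch. The threshold value, the dictionary used as a knowledge base, the default weight of 0.5, and the fixed-step update rule are all hypothetical; the abstract states only that a comparison with a threshold takes place and that the knowledge base may be updated from feedback.

```python
def assign_tags(beliefs, threshold=0.5):
    """Return the tags whose belief factor meets the threshold, strongest first.

    `beliefs` maps tag -> belief factor for one text snippet.
    """
    chosen = [(tag, b) for tag, b in beliefs.items() if b >= threshold]
    return [tag for tag, _ in sorted(chosen, key=lambda pair: pair[1], reverse=True)]


def apply_feedback(knowledge_base, tag, matched_words, accepted, step=0.1):
    """Nudge stored word weights up when the user accepts a tag, down when rejected.

    `knowledge_base` maps (tag, word) -> weight; weights are clamped to [0, 1].
    Both the structure and the update rule are illustrative assumptions.
    """
    delta = step if accepted else -step
    for word in matched_words:
        key = (tag, word)
        old = knowledge_base.get(key, 0.5)
        knowledge_base[key] = max(0.0, min(1.0, old + delta))
    return knowledge_base


if __name__ == "__main__":
    beliefs = {"network printer failure": 0.8, "printer paper jam": 0.3}
    print(assign_tags(beliefs))
    kb = {}
    apply_feedback(kb, "printer paper jam", ["printer"], accepted=False)
    print(kb)
```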
16 Citations
8 Claims
5. A system for tagging a set of text snippets with a tag organized in a taxonomy, the system comprises:
a processor;
a memory coupled to the processor, wherein the processor executes a plurality of modules stored in the memory, and wherein the plurality of modules comprises:
a receiving module to receive the set of text snippets and the tag of a set of tags, wherein the tag of the set of tags comprises a set of words;
a feature extractor module to parse the set of words in the tag into a parse tree with a root node and one or more levels to determine one or more types of phrases belonging to one or more parts of speech; determine a frequency of one or more words present in the tag of the set of tags, determine a headword from the one or more words present in the tag of the set of tags, and assign a numeric weight to the one or more words of the tag based on the frequency, parse tree, headword of the set of words associated with the tag, and the parts of speech category of words, wherein the numeric weight indicates relative importance of the one or more words in the tag with respect to other words present in other tags;
a determining module to determine correspondences between the one or more words of the tag and words present in the set of text snippets, wherein the determining of the correspondences results in identification of a same word or a similar meaning word present in the tag;
a belief factor computing module to compute a belief factor for the tag by applying a certainty factor algebra (CFA) based upon the numeric weight of the same word and the similar meaning word; and
a feedback module to receive feedback about the tag assigned to the text snippet from a user, wherein the feedback received is used for historical learning by updating a knowledge base referred for tagging the set of text snippets and future set of text snippets using the set of tags present in the taxonomy, wherein the feedback module updates the dynamic feedback in the knowledge base using the active learning approach assigned to the set of text snippets, wherein the active learning approach optimizes an amount of feedback received from the user, wherein the knowledge base comprises a word sense-importance database and a lexical resource, and wherein the word sense-importance database discriminates between one or more different meanings of a word and a relative importance of the set of words associated with a tag in context.
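The claims do not say which active learning strategy the feedback module uses to limit the amount of feedback requested from the user. One common choice, swapped in here purely as an assumption, is uncertainty sampling: ask about the snippets whose strongest belief factor lies closest to the decision threshold, up to a fixed budget. The function name, `threshold`, and `budget` are hypothetical.

```python
def select_for_feedback(snippet_beliefs, threshold=0.5, budget=2):
    """Pick up to `budget` snippet ids whose top belief factor is closest to the threshold.

    `snippet_beliefs` maps snippet id -> {tag: belief factor}. Snippets near the
    threshold are the ones the tagger is least sure about, so user feedback on
    them is assumed to be the most informative.
    """
    def margin(item):
        _, beliefs = item
        top = max(beliefs.values(), default=0.0)
        return abs(top - threshold)

    ranked = sorted(snippet_beliefs.items(), key=margin)
    return [snippet_id for snippet_id, _ in ranked[:budget]]


if __name__ == "__main__":
    beliefs = {
        "s1": {"network printer failure": 0.95},
        "s2": {"network printer failure": 0.55, "printer paper jam": 0.10},
        "s3": {"password reset request": 0.05},
        "s4": {"printer paper jam": 0.42},
    }
    print(select_for_feedback(beliefs))
```

Snippets tagged with very high or very low belief are skipped, which keeps the number of questions put to the user small.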
Specification