Tagging text snippets

US 10,331,768 B2
Filed: 09/20/2016
Issued: 06/25/2019
Est. Priority Date: 09/21/2015
Status: Active Grant

First Claim

1. A method of tagging a set of text snippets with a tag organized in a taxonomy, the method comprising:

receiving, by a processor, the set of text snippets and the tag of a set of tags, wherein the tag of the set of tags comprises a set of words;

parsing, by the processor, the set of words in the tag into a parse tree with a root node and one or more levels to determine one or more types of phrases belonging to one or more parts of speech;
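
The parsing step can be pictured with a small constituency tree. The claim does not name a parser, so the sketch below assumes the tree already exists. It is built by hand for a hypothetical tag, "delay in delivery", and the sketch only shows how the phrase types at each level and the (word, part-of-speech) leaves might be read off it. `Node`, `phrases_by_level`, and `leaves` are illustrative names, not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: a phrase label (e.g. NP) or, on a leaf, a POS tag (e.g. NN)."""
    label: str
    children: list[Node] = field(default_factory=list)
    word: str | None = None  # set only on leaf (part-of-speech) nodes


def phrases_by_level(root: Node) -> dict[int, list[str]]:
    """Collect the phrase labels (non-leaf nodes) found at each depth of the tree."""
    levels: dict[int, list[str]] = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.word is None:
            levels.setdefault(depth, []).append(node.label)
        # Push children in reverse so siblings are visited left to right.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return levels


def leaves(node: Node) -> list[tuple[str, str]]:
    """Return the (word, POS tag) pairs of the tree in left-to-right order."""
    if node.word is not None:
        return [(node.word, node.label)]
    pairs: list[tuple[str, str]] = []
    for child in node.children:
        pairs.extend(leaves(child))
    return pairs


# Hand-built parse of the hypothetical tag "delay in delivery":
# (ROOT (NP (NN delay) (PP (IN in) (NP (NN delivery)))))
tree = Node("ROOT", [
    Node("NP", [
        Node("NN", word="delay"),
        Node("PP", [
            Node("IN", word="in"),
            Node("NP", [Node("NN", word="delivery")]),
        ]),
    ]),
])

print(phrases_by_level(tree))  # {0: ['ROOT'], 1: ['NP'], 2: ['PP'], 3: ['NP']}
print(leaves(tree))            # [('delay', 'NN'), ('in', 'IN'), ('delivery', 'NN')]
```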

determining, by the processor, a frequency of one or more words present in the tag of the set of tags, and a headword from the one or more words present in the tag of the set of tags;
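
One simple reading of this step is that "frequency" counts how many tags in the taxonomy contain a word, and that the headword is the head noun of the tag phrase. The sketch below assumes both. Its head rule (the last noun before the first preposition) is a simplification chosen for illustration, and the `TAGS` taxonomy is hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical taxonomy: each tag is stored as its (word, POS tag) sequence.
TAGS = {
    "delay in delivery": [("delay", "NN"), ("in", "IN"), ("delivery", "NN")],
    "quality of product": [("quality", "NN"), ("of", "IN"), ("product", "NN")],
    "late delivery": [("late", "JJ"), ("delivery", "NN")],
}


def word_frequencies(tags: dict[str, list[tuple[str, str]]]) -> Counter:
    """Count how many tags contain each word (a tag counts a word at most once)."""
    freq: Counter = Counter()
    for words in tags.values():
        freq.update({word for word, _pos in words})
    return freq


def headword(words: list[tuple[str, str]]) -> str | None:
    """Simplified head rule (an assumption): the last noun before the first
    preposition, or the last noun overall if the tag has no preposition."""
    head = None
    for word, pos in words:
        if pos == "IN":
            break
        if pos.startswith("NN"):
            head = word
    return head


freq = word_frequencies(TAGS)
print(freq["delivery"])  # 2
print({tag: headword(words) for tag, words in TAGS.items()})
# {'delay in delivery': 'delay', 'quality of product': 'quality', 'late delivery': 'delivery'}
```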

assigning, by the processor, a numeric weight to the one or more words of the tag based on the frequency, parse tree, headword of the set of words associated with the tag, and the parts of speech category of words, wherein the numeric weight indicates relative importance of the one or more words in the tag with respect to other words present in other tags;
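
The claim lists the inputs to the numeric weight (frequency, parse tree, headword, part of speech) but not how they are combined. The sketch below assumes one plausible combination. Rarer words, nouns, words nearer the root of the tree, and the headword all score higher. The scores are then normalised into [0, 1] so they can feed a certainty-factor combination later. The values in `POS_FACTOR`, the `HEAD_BOOST` multiplier, and the depth penalty are all hypothetical.

```python
# Hypothetical per-POS importance (keyed on the first two letters of the POS tag);
# function words such as prepositions fall back to a small default.
POS_FACTOR = {"NN": 1.0, "VB": 0.8, "JJ": 0.6}
DEFAULT_POS_FACTOR = 0.1
HEAD_BOOST = 2.0  # hypothetical multiplier for the headword


def word_weights(words, head, freq):
    """Assign each word of one tag a weight in (0, 1].

    words: (word, POS tag, depth in the parse tree) triples for the tag.
    head:  the tag's headword.
    freq:  number of tags in the taxonomy that contain each word.
    """
    raw = {}
    for word, pos, depth in words:
        score = POS_FACTOR.get(pos[:2], DEFAULT_POS_FACTOR)
        score /= freq[word]   # words shared by many tags discriminate less
        score /= 1 + depth    # more deeply embedded words matter less
        if word == head:
            score *= HEAD_BOOST
        raw[word] = score
    top = max(raw.values())
    # Normalise so the strongest word gets 1.0; CFA expects values in [0, 1].
    return {word: score / top for word, score in raw.items()}


weights = word_weights(
    [("delay", "NN", 2), ("in", "IN", 3), ("delivery", "NN", 4)],
    head="delay",
    freq={"delay": 1, "in": 1, "delivery": 2},
)
print({w: round(x, 4) for w, x in weights.items()})
# {'delay': 1.0, 'in': 0.0375, 'delivery': 0.15}
```

Normalising against the strongest word keeps every weight comparable across tags of different lengths. Other normalisations (e.g. dividing by the sum) would also fit the claim's wording.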

determining, by the processor, correspondences between the one or more words of the tag and words present in the set of text snippets, wherein the determining of the correspondences results in identification of a same word or a similar meaning word present in the tag;
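
Finding correspondences requires some notion of "similar meaning". The claim ties this to a lexical resource in the knowledge base. The sketch below substitutes a tiny hand-written synonym table (`SIMILAR`, hypothetical) for that resource and uses a naive lowercase tokenizer. A real system would more likely consult a thesaurus such as WordNet and handle inflected forms.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the lexical resource: word -> similar-meaning words.
SIMILAR = {
    "delay": {"late", "lag", "wait"},
    "delivery": {"shipment", "shipping"},
}


def tokenize(text: str) -> set[str]:
    """Naive lowercase word tokenizer (no stemming or lemmatisation)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def correspondences(tag_words, snippet):
    """Split a tag's words into exact matches and similar-meaning matches.

    Returns two dicts mapping each matched tag word to the snippet word it matched.
    """
    tokens = tokenize(snippet)
    same, similar = {}, {}
    for word in tag_words:
        if word in tokens:
            same[word] = word
        else:
            hits = SIMILAR.get(word, set()) & tokens
            if hits:
                similar[word] = min(hits)  # pick one match deterministically
    return same, similar


print(correspondences(["delay", "in", "delivery"], "The delivery was late again."))
# ({'delivery': 'delivery'}, {'delay': 'late'})
```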

computing, by the processor, a belief factor for the tag by applying a certainty factor algebra (CFA) based upon the numeric weight of the same word and the similar meaning word; and
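
The belief factor step can be sketched with the standard MYCIN-style rule for combining two positive certainty factors, CF(a, b) = a + b − a·b, which is the form claims 3 and 7 appear to describe. Because the rule equals 1 − (1 − a)(1 − b), it is commutative and associative, so it can be folded over any number of matched words and the result always stays in [0, 1]. Discounting synonym matches by `SIMILAR_DISCOUNT` is an assumption of this sketch, not something the claim specifies.

```python
from functools import reduce

SIMILAR_DISCOUNT = 0.8  # hypothetical: a synonym match counts less than an exact match


def cfa_combine(a: float, b: float) -> float:
    """MYCIN-style combination of two positive certainty factors: a + b - a*b."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("certainty factors must lie in [0, 1]")
    return a + b - a * b


def tag_belief(weights, same, similar):
    """Fold CFA over the weights of matched words; a tag with no matches gets 0."""
    evidence = [weights[w] for w in same]
    evidence += [weights[w] * SIMILAR_DISCOUNT for w in similar]
    return reduce(cfa_combine, evidence, 0.0)


belief = tag_belief({"delay": 0.9, "delivery": 0.5}, same=["delivery"], similar=["delay"])
print(round(belief, 4))  # 0.86
```

Here an exact match on "delivery" (weight 0.5) and a synonym match on "delay" (weight 0.9, discounted to 0.72) combine to 0.5 + 0.72 − 0.36 = 0.86.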

identifying, dynamically a text, from the set of text snippets, for which a feedback is sought from a user, and wherein the text is identified using an active learning approach;
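
The claim requires that texts needing user feedback be chosen by "an active learning approach" but does not name one. Uncertainty sampling is a common choice and is substituted here as an assumption. The sketch asks about the snippets whose belief factor for some tag lies closest to the assignment threshold, where a user's answer is most likely to change the outcome. The threshold of 0.5 and the budget of two questions are hypothetical defaults.

```python
def select_for_feedback(beliefs, threshold=0.5, budget=2):
    """Uncertainty sampling: return the ids of the `budget` snippets whose belief
    factor for some tag lies closest to `threshold` (the least confident decisions).

    beliefs: {snippet_id: {tag: belief_factor}}
    """
    def margin(item):
        _snippet_id, tag_beliefs = item
        # Snippets with no candidate tags sort last.
        return min((abs(b - threshold) for b in tag_beliefs.values()),
                   default=float("inf"))

    ranked = sorted(beliefs.items(), key=margin)
    return [snippet_id for snippet_id, _ in ranked[:budget]]


beliefs = {
    "s1": {"late delivery": 0.95, "billing": 0.05},  # confident either way
    "s2": {"late delivery": 0.52, "billing": 0.10},  # borderline on late delivery
    "s3": {"late delivery": 0.40, "billing": 0.70},
}
print(select_for_feedback(beliefs))  # ['s2', 's3']
```

Limiting questions to the most borderline snippets is one way to "optimize an amount of feedback", as claim 1 puts it. The `budget` parameter caps how often the user is interrupted.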

receiving, dynamically a feedback about the tag assigned to the text snippet from the user, wherein the feedback received is used for historical learning by updating a knowledge base referred for tagging the set of text snippets and future set of text snippets using the set of tags present in the taxonomy; and

updating the dynamic feedback in the knowledge base using the active learning approach assigned to the set of text snippets, wherein the active learning approach optimizes an amount of feedback received from the user, wherein the knowledge base comprises a word sense-importance database and a lexical resource, and wherein the word sense-importance database discriminates between one or more different meanings of a word and a relative importance of the set of words associated with a tag in context.
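
The knowledge base is described as a word sense-importance database plus a lexical resource. One minimal way to model that, assumed here, is a table keyed by (tag, word). Each entry stores which sense of the word is meant under that tag and how important the word is, so the same surface word can carry different meanings and weights under different tags. The `KnowledgeBase` class and its example entries are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical knowledge base: a lexical resource plus a word sense-importance table."""
    # Lexical resource: word -> similar-meaning words (across all senses).
    lexicon: dict[str, set[str]] = field(default_factory=dict)
    # Word sense-importance database: (tag, word) -> (sense label, importance in [0, 1]).
    sense_importance: dict[tuple[str, str], tuple[str, float]] = field(default_factory=dict)

    def sense(self, tag: str, word: str) -> str | None:
        """Which meaning of `word` applies in the context of `tag`, if known."""
        entry = self.sense_importance.get((tag, word))
        return entry[0] if entry else None

    def importance(self, tag: str, word: str, default: float = 0.0) -> float:
        """Relative importance of `word` within `tag`, or `default` if unknown."""
        entry = self.sense_importance.get((tag, word))
        return entry[1] if entry else default

    def update(self, tag: str, word: str, sense: str, importance: float) -> None:
        """Record or overwrite a word's sense and importance for one tag (clamped to [0, 1])."""
        self.sense_importance[(tag, word)] = (sense, max(0.0, min(1.0, importance)))


kb = KnowledgeBase(lexicon={"bank": {"lender", "shore"}})
kb.update("loan request", "bank", "financial institution", 0.7)
kb.update("flooding", "bank", "riverside", 0.4)
print(kb.sense("loan request", "bank"), "|", kb.sense("flooding", "bank"))
# financial institution | riverside
```

The lexicon lists synonyms for every sense of "bank". The sense table then decides which of those senses, and therefore which synonyms, are relevant for a given tag.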

1 Assignment

0 Petitions

Abstract

The present subject matter discloses a system and method for tagging a set of text snippets with a set of tags. The system receives a set of text snippets and a set of tags as input. Each tag comprises a set of words, and each word in that set is assigned a numeric weight based on the frequency of the word and the headword of the set of words. Words that are the same, or similar in meaning, are then identified between the tag and the text snippets. A belief factor is computed for the tag by applying certainty factor algebra to the numeric weights assigned to the same words and the similar-meaning words. The tag is assigned to a text snippet based on a comparison of the belief factor with a threshold. Feedback about the tagging is then received, and based on that feedback the knowledge base of the system may be updated for future tagging.
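
The abstract adds a step that the first claim leaves implicit: a tag is assigned when its belief factor clears a threshold. The sketch below assumes that belief factors have already been computed per snippet and tag, that a snippet may receive several tags or none, and that 0.6 is an arbitrary threshold. None of these choices come from the patent.

```python
def assign_tags(beliefs, threshold=0.6):
    """Map each snippet to the tags whose belief factor reaches `threshold`,
    strongest first; a snippet may receive several tags or none.

    beliefs: {snippet_id: {tag: belief_factor}}
    """
    assigned = {}
    for snippet_id, tag_beliefs in beliefs.items():
        passing = [(b, tag) for tag, b in tag_beliefs.items() if b >= threshold]
        assigned[snippet_id] = [tag for _b, tag in sorted(passing, reverse=True)]
    return assigned


print(assign_tags({
    "s1": {"late delivery": 0.86, "billing": 0.20},
    "s2": {"late delivery": 0.62, "billing": 0.75},
    "s3": {"billing": 0.10},
}))
# {'s1': ['late delivery'], 's2': ['billing', 'late delivery'], 's3': []}
```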

16 Citations

8 Claims

2. The method of claim 1, wherein the set of text snippets represents at least one of answers received in response to an open-ended question, text data corresponding to social networks and mobile applications, clinical free text data, diagnosis reports, request or complaints of end-users filed through online sources, and other text data.

3. The method of claim 1, wherein the certainty factor algebra (CFA) is, $\mathrm{CFA} = [(\alpha_1 + \alpha_2) - (\alpha_1 \cdot \alpha_2)]$, wherein $\alpha_1$ and $\alpha_2$ are at least one of a belief factor and the numeric weight assigned for the same word and the similar meaning word present in the tag.

4. The method of claim 1, wherein the feedback received comprises a positive feedback and a negative feedback, and wherein: on basis of the positive feedback, a reinforce token is assigned to the belief factor, and wherein the reinforce token indicates an appropriateness of each tag of the set of tags, and on basis of the negative feedback, the knowledge base is updated using a corrective action suggested by the user in the negative feedback.
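
Claim 4 splits feedback into positive and negative cases. The sketch below is one hypothetical reading of it. A positive answer adds a reinforce token to the confirmed tag. A negative answer applies the user's corrective action, modelled here as replacement weights for some of the tag's words, where a weight of 0 drops the word. `FeedbackStore` and the shape of `corrections` are assumptions for illustration.

```python
from collections import defaultdict


class FeedbackStore:
    """Hypothetical knowledge-base fragment updated from user feedback (claim 4)."""

    def __init__(self, weights):
        # weights: {tag: {word: numeric weight}}; copied so the caller's dict is untouched.
        self.weights = {tag: dict(ws) for tag, ws in weights.items()}
        # Reinforce tokens per tag: how often users confirmed an assignment of it.
        self.reinforce_tokens = defaultdict(int)

    def positive(self, tag):
        """Positive feedback: add a reinforce token for the confirmed tag."""
        self.reinforce_tokens[tag] += 1

    def negative(self, tag, corrections):
        """Negative feedback: apply the user's corrective action, modelled here as
        replacement word weights for the tag (a weight <= 0 removes the word)."""
        tag_weights = self.weights.setdefault(tag, {})
        for word, new_weight in corrections.items():
            if new_weight <= 0:
                tag_weights.pop(word, None)
            else:
                tag_weights[word] = min(1.0, new_weight)


store = FeedbackStore({"late delivery": {"late": 0.6, "delivery": 0.5, "in": 0.05}})
store.positive("late delivery")
store.negative("late delivery", {"in": 0, "delivery": 0.8})
print(store.reinforce_tokens["late delivery"], store.weights["late delivery"])
# 1 {'late': 0.6, 'delivery': 0.8}
```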

5. A system for tagging a set of text snippets with a tag organized in a taxonomy, the system comprises:

a processor; a memory coupled to the processor, wherein the processor executes a plurality of modules stored in the memory and wherein the plurality of modules comprising:

a receiving module to receive the set of text snippets and the tag of a set of tags, wherein the tag of the set of tags comprises a set of words;

a feature extractor module to parse the set of words in the tag into a parse tree with a root node and one or more levels to determine one or more types of phrases belonging to one or more parts of speech;

determine a frequency of one or more words present in the tag of the set of tags, determine a headword from the one or more words present in the tag of the set of tags, and assign a numeric weight to the one or more words of the tag based on the frequency, parse tree, headword of the set of words associated with the tag, and the parts of speech category of words, wherein the numeric weight indicates relative importance of the one or more words in the tag with respect to other words present in other tags;

a determining module to determine correspondences between the one or more words of the tag and words present in the set of text snippets, wherein the determining of the correspondences result in identification of a same word or a similar meaning word present in the tag;

a belief factor computing module to compute a belief factor for the tag by applying a certainty factor algebra (CFA) based upon the numeric weight of the same word and the similar meaning word; and

a feedback module to receive feedback about the tag assigned to the text snippet from a user, wherein the feedback received is used for historical learning by updating a knowledge base referred for tagging the set of text snippets and future set of text snippets using the set of tags present in the taxonomy, wherein the feedback module updates the dynamic feedback in the knowledge base using the active learning approach assigned to the set of text snippets, wherein the active learning approach optimizes an amount of feedback received from the user, wherein the knowledge base comprises a word sense-importance database and a lexical resource, and wherein the word sense-importance database discriminates between one or more different meanings of a word and a relative importance of the set of words associated with a tag in context.
6. The system of claim 5, wherein the set of text snippets represents at least one of answers received in response to an open-ended question, text data corresponding to social networks and mobile applications, clinical free text data, diagnosis reports, request or complaints of end-users filed through online sources, and other text data.

7. The system of claim 5, wherein the certainty factor algebra (CFA) is, $\mathrm{CFA} = [(\alpha_1 + \alpha_2) - (\alpha_1 \cdot \alpha_2)]$, wherein $\alpha_1$ and $\alpha_2$ are at least one of a belief factor and the numeric weight assigned for the same word and the similar meaning word present in the tag.

8. The system of claim 5, wherein the feedback received comprises a positive feedback and a negative feedback, and wherein: on basis of the positive feedback, a reinforce token is assigned to the belief factor, and wherein the reinforce token indicates an appropriateness of each tag of the set of tags, and on basis of the negative feedback, the knowledge base is updated using a corrective action suggested by the user in the negative feedback.
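
The module structure of claim 5 can be sketched as a single class whose methods stand in for the receiving, determining, belief factor computing, and feedback modules. To keep the sketch short, it assumes the feature extractor's output is supplied up front as per-tag word weights, does not discount synonym matches, and has the feedback module only log answers. All names and data here are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TaggingSystem:
    """Hypothetical wiring of the claim 5 modules; each method stands in for one module."""
    tag_weights: dict  # feature extractor output, assumed precomputed: {tag: {word: weight}}
    lexicon: dict      # lexical resource: {word: set of similar-meaning words}
    threshold: float = 0.6
    feedback_log: list = field(default_factory=list)

    def receive(self, snippets):
        """Receiving module: accept the snippets, dropping blank ones."""
        return [s for s in snippets if s.strip()]

    def determine(self, tag, snippet):
        """Determining module: exact and similar-meaning matches of the tag's words."""
        tokens = set(re.findall(r"[a-z]+", snippet.lower()))
        words = self.tag_weights[tag]
        same = [w for w in words if w in tokens]
        similar = [w for w in words
                   if w not in tokens and self.lexicon.get(w, set()) & tokens]
        return same, similar

    def belief(self, tag, same, similar):
        """Belief factor computing module: fold a + b - a*b over matched word weights."""
        b = 0.0
        for w in same + similar:
            x = self.tag_weights[tag][w]
            b = b + x - b * x
        return b

    def run(self, snippets):
        """Tag each received snippet with every tag whose belief reaches the threshold."""
        return {
            s: [t for t in self.tag_weights
                if self.belief(t, *self.determine(t, s)) >= self.threshold]
            for s in self.receive(snippets)
        }

    def feedback(self, snippet, tag, correct):
        """Feedback module (stub): record the user's verdict for later KB updates."""
        self.feedback_log.append((snippet, tag, correct))


system = TaggingSystem(
    tag_weights={"late delivery": {"late": 0.6, "delivery": 0.5},
                 "billing": {"invoice": 0.7, "charge": 0.4}},
    lexicon={"late": {"delayed"}, "delivery": {"shipment"}},
)
print(system.run(["The shipment was delayed.", "   ", "Wrong charge on my invoice"]))
# {'The shipment was delayed.': ['late delivery'], 'Wrong charge on my invoice': ['billing']}
system.feedback("Wrong charge on my invoice", "billing", correct=True)
```

Keeping the modules as methods on one object mirrors the claim's "plurality of modules stored in the memory". In a larger system, each module would more likely be a separate component behind its own interface.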

Current Assignee
TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Original Assignee
TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Inventors
Patil, Sangameshwar Suryakant, Palshikar, Girish Keshav, Shrivastava, Apoorv
Primary Examiner(s)
Leland, III, Edwin S

Application Number

US15/271,116
Publication Number

US 20170083484A1
Time in Patent Office

1,008 Days
Field of Search

704/9
US Class Current
CPC Class Codes

G06F 16/3326   using relevance feedback fr...

G06F 16/334   Query execution G06F16/335 ...

G06F 16/686   using information manually ...

G06F 40/117   Tagging; Marking up details...

G06F 40/205   Parsing

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/247   Thesauruses; Synonyms

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/30   Semantic analysis
