Extracting actionable information from emails
First Claim
1. A method for improving efficiency of a computing device used in extracting actionable information from a message, comprising:
receiving a hierarchically structured message having text content;
parsing the message;
identifying one or more keywords from a dictionary in the parsed message;
separating the message into nodes;
generating node scores for the nodes, wherein the node scores are generated by:
incrementing a given node score based on the given node containing at least one keyword; and
adding the node scores of direct child nodes of a particular parent node to a node score associated with the particular parent node;
identifying an area of interest based at least in part on the node scores;
correlating the area of interest to one or more sub-templates;
identifying a template based on the one or more sub-templates; and
extracting actionable information from the message based on the identified template.
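The node-scoring steps recited in the claim above can be illustrated with a minimal sketch. It assumes a simple tree type and whitespace tokenization; the names `Node`, `score_tree`, and the sample keyword set are hypothetical and do not come from the patent.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: its own text plus direct children."""
    text: str = ""
    children: list[Node] = field(default_factory=list)
    score: int = 0


def score_tree(node: Node, keywords: set[str]) -> int:
    """Increment the score if the node's own text contains a keyword,
    then add the scores of its direct child nodes."""
    words = {w.strip(".,:;").lower() for w in node.text.split()}
    node.score = 1 if words & keywords else 0
    for child in node.children:
        node.score += score_tree(child, keywords)
    return node.score


# Assumed dictionary for a travel domain (illustrative only).
KEYWORDS = {"flight", "departure", "confirmation"}
leaf_a = Node("Flight AB123")
leaf_b = Node("Departure 10:45")
row = Node(children=[leaf_a, leaf_b])
footer = Node("Unsubscribe here")
root = Node(children=[row, footer])
score_tree(root, KEYWORDS)
```

In this sketch each child's score already includes its own descendants, so adding only direct children to the parent propagates keyword counts up the whole tree.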
Abstract
Systems and methods are provided for extracting actionable information from emails in a completely unsupervised manner, with no need for the data to be labeled (i.e., the systems and methods do not need a human to identify unlabeled or relabeled emails). Changes in email structure are incorporated automatically by learning new templates through the novel concept of sub-templates. Minor variations in email structure are absorbed without introducing new templates: email templates are computed as permutations of multiple sub-templates in the email, which allows the systems and methods to handle such variations efficiently. These systems and methods are extendable to any domain that uses structured emails, and they improve the efficiency of systems that receive and act on information contained in emails.
19 Claims
9. A system for improving efficiency of a computing device in extracting actionable information from a message, comprising:
a parser, operable to receive a hierarchically structured message having text content and operable to break the message into nodes based on the hierarchical structure of the message, wherein the nodes are organized according to a tree structure corresponding to the hierarchical structure;
a domain dictionary, in communication with the parser;
a template library, in communication with the parser;
wherein the parser is further operable to identify keywords in nodes of the message matching entries in the domain dictionary and assign node scores to each of the nodes based on keyword presence;
wherein the parser is further operable to identify an area of interest in the message, the area of interest comprising a parent node and a child node of the parent node, based on the child node having a highest node score that is furthest from a root of the tree structure;
wherein the parser is further operable to identify a template for the message from the template library, wherein the template library is built based on identifying tree structures and node scores from one or more areas of interest identified in previous messages; and
wherein the parser is further operable to extract actionable information from the area of interest based on the template.
14. A computer readable storage device including instructions, which when executed by a processor are operable to:
define a plurality of nodes of an email message, the plurality of nodes arranged in a tree structure based on a structure of the email message;
parse the email message according to a domain dictionary to identify keywords from the domain dictionary included in leaf nodes in the tree structure;
increment a node score for each leaf node that includes at least one keyword;
combine node scores of each child node of the tree structure at a parent node;
identify a node in the tree structure having a highest node score;
define the node having the highest node score and child nodes of the node having the highest node score as a core region in the email message;
identify one or more sub-templates having tree structures and node scores matching tree structures and node scores of one or more portions of the core region;
identify an ur-template that includes the one or more sub-templates; and
extract actionable information from the core region based on the ur-template.
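One way to picture the ur-template step in claim 14 is as a lookup for a template whose set of sub-templates includes every sub-template matched in the core region. The sketch below is an assumption for illustration only: the function name `find_ur_template`, the dictionary layout, the sample template names, and the "prefer the smallest covering template" rule are all hypothetical and are not stated in the claim.

```python
from __future__ import annotations


def find_ur_template(
    matched: list[str], ur_templates: dict[str, set[str]]
) -> str | None:
    """Return the name of an ur-template that includes every matched
    sub-template, or None when no known ur-template covers them."""
    needed = set(matched)
    candidates = [
        name for name, subs in ur_templates.items() if needed <= subs
    ]
    # Assumed tie-break: prefer the smallest covering ur-template.
    return min(candidates, key=lambda n: len(ur_templates[n]), default=None)


# Hypothetical library of ur-templates keyed by name.
UR_TEMPLATES = {
    "flight_itinerary": {"header", "flight_row", "price"},
    "hotel_booking": {"header", "room_row", "price"},
}
flight = find_ur_template(["header", "flight_row"], UR_TEMPLATES)
unknown = find_ur_template(["car_row"], UR_TEMPLATES)
```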
18. A method for improving efficiency of a computing device used in extracting actionable information from a message, comprising:
receiving a hierarchically structured message having text content;
parsing the message;
identifying one or more keywords from a dictionary in the parsed message;
separating the message into nodes;
generating node scores for the nodes;
identifying an area of interest based at least in part on the node scores, wherein the identified area of interest comprises a given node having a highest node score, wherein when the highest node score is shared by multiple nodes, a node of the multiple nodes sharing the highest node score located furthest from a root node is selected as the node with the highest node score;
correlating the area of interest to one or more sub-templates;
identifying a template based on the one or more sub-templates; and
extracting actionable information from the message based on the identified template.
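The tie-breaking rule in claim 18 (highest node score, and among equal scores the node furthest from the root) can be sketched as a comparison on the pair (score, depth). The type `ScoredNode`, the function `pick_area_of_interest`, and the sample tree are hypothetical; scores are assumed to have been computed already.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ScoredNode:
    """Hypothetical node carrying a precomputed node score."""
    name: str
    score: int
    children: list[ScoredNode] = field(default_factory=list)


def pick_area_of_interest(root: ScoredNode) -> ScoredNode:
    """Pick the node with the highest score; when several nodes share
    that score, pick the one furthest from the root."""
    best, best_key = root, (root.score, 0)
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        key = (node.score, depth)  # score first, then depth as tie-break
        if key > best_key:
            best, best_key = node, key
        stack.extend((child, depth + 1) for child in node.children)
    return best


# The root and the table share the top score because scores sum upward.
table = ScoredNode("table", 2, [ScoredNode("row1", 1), ScoredNode("row2", 1)])
page = ScoredNode("root", 2, [table, ScoredNode("footer", 0)])
chosen = pick_area_of_interest(page)
```

Because parent scores include child scores, the root always ties or beats its descendants; preferring depth on ties is what narrows the selection to the tightest subtree that still contains all the keywords.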
19. A method for improving efficiency of a computing device used in extracting actionable information from a message, comprising:
receiving a hierarchically structured message having text content;
parsing the message;
identifying one or more keywords from a dictionary in the parsed message;
separating the message into nodes;
generating node scores for the nodes;
identifying an area of interest based at least in part on the node scores;
correlating the area of interest to one or more sub-templates, wherein correlating the area of interest to the one or more sub-templates further comprises:
determining whether portions of the area of interest match one or more existing sub-templates;
in response to determining that a given portion of the area of interest matches a given existing sub-template, selecting the given sub-template; and
in response to determining that the given portion of the area of interest does not match the one or more existing sub-templates, saving the given portion as a new sub-template;
identifying a template based on the one or more sub-templates; and
extracting actionable information from the message based on the identified template.
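The match-or-save behavior in claim 19 can be sketched as a lookup in a sub-template library that grows when a portion is not recognized. This is an illustration under assumptions: each portion is represented by a hashable "signature" tuple (a stand-in for tree structure plus node scores), and the names `correlate`, `sub_N`, and the sample signatures are hypothetical.

```python
from __future__ import annotations


def correlate(portions: list[tuple], library: dict[tuple, str]) -> list[str]:
    """Map each portion of the area of interest to a sub-template id,
    saving unmatched portions as new sub-templates in the library."""
    selected = []
    for signature in portions:
        if signature not in library:
            # No existing sub-template matches: save this portion as new.
            library[signature] = f"sub_{len(library)}"
        selected.append(library[signature])
    return selected


# Hypothetical signatures: (tag, (child tag, child score), ...).
sig_a = ("tr", ("td", 1), ("td", 0))
sig_b = ("tr", ("td", 1), ("td", 1))
library = {sig_a: "sub_0"}
first_pass = correlate([sig_a, sig_b], library)
second_pass = correlate([sig_b], library)
```

On the first pass the unseen portion is stored, so the second pass finds it without creating another entry; this mirrors learning a sub-template once and reusing it for later messages.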
Specification