Selecting pattern matching segments for electronic communication clustering
First Claim
1. A system comprising one or more processors operably coupled with non-transitory memory that stores instructions that, in response to execution of the instructions by the one or more processors, cause the one or more processors to:
identify a set of pattern matching segments that match at least one of a corpus of email addresses;
determine a measure of coverage of each of the set of pattern matching segments across the corpus of email addresses;
determine one or more measures of flexibility associated with each of the set of pattern matching segments based on a count of wildcard characters within each pattern matching segment relative to a count of fixed text characters within each pattern matching segment;
determine, based on the measure of coverage and the one or more measures of flexibility associated with each of the set of pattern matching segments, a score associated with each pattern matching segment;
select, based on scores associated with the pattern matching segments, one or more of the pattern matching segments that satisfy one or more thresholds that are automatically adjusted;
group a corpus of emails into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to email addresses associated with the corpus of emails;
analyze emails of a given cluster of the plurality of clusters grouped based on a given pattern matching segment to identify content that is transient among the emails of the given cluster;
generate, for the given cluster, a data extraction template that is usable to extract transient content from subsequent emails that include sender email addresses that match the given pattern matching segment; and
apply the data extraction template to a subsequent email having a sender address that matches the given pattern matching segment to extract transient data from a subject or body of the subsequent email, wherein the extracted transient data is output to a user via an output device of a computing device operated by the user.
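The coverage, flexibility, scoring, and selection steps recited above can be sketched as follows. This is only an illustration under stated assumptions: glob-style wildcards (`*`, `?`) matched with Python's `fnmatch`, a hypothetical scoring formula (coverage divided by one plus flexibility), and a hypothetical threshold adjustment that lowers the threshold until enough segments pass. The claim does not specify any of these choices.

```python
from fnmatch import fnmatchcase

WILDCARDS = "*?"  # assumption: glob-style wildcard characters


def coverage(pattern, addresses):
    """Fraction of the address corpus matched by the pattern."""
    if not addresses:
        return 0.0
    return sum(fnmatchcase(a, pattern) for a in addresses) / len(addresses)


def flexibility(pattern):
    """Count of wildcard characters relative to count of fixed text characters."""
    wild = sum(ch in WILDCARDS for ch in pattern)
    fixed = len(pattern) - wild
    return wild / fixed if fixed else float("inf")


def score(pattern, addresses):
    """Hypothetical formula: reward coverage, penalize flexibility."""
    return coverage(pattern, addresses) / (1.0 + flexibility(pattern))


def select(patterns, addresses, threshold=0.5, min_selected=1, step=0.05):
    """Hypothetical auto-adjustment: lower the threshold until enough patterns pass."""
    scores = {p: score(p, addresses) for p in patterns}
    while threshold > 0:
        chosen = [p for p in patterns if scores[p] >= threshold]
        if len(chosen) >= min_selected:
            return chosen, threshold
        threshold -= step
    # Fallback: keep any pattern with a positive score.
    return [p for p in patterns if scores[p] > 0], 0.0


# Illustrative data only; these addresses and patterns are made up.
addresses = [
    "noreply@shop.example",
    "orders@shop.example",
    "alice@mail.example",
    "bob@mail.example",
]
patterns = ["*@shop.example", "*", "noreply@shop.example"]
chosen, used_threshold = select(patterns, addresses)
```

In this sketch the bare `*` segment covers every address but has no fixed text, so its score collapses to zero, while the fully literal segment covers only one address. The segment `*@shop.example` balances the two and is the one selected once the threshold drops.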
Abstract
Methods, apparatus, systems, and computer-readable media are provided for selecting pattern matching segments suitable for electronic communication clustering. A set of pattern matching segments may be identified that match at least one of a corpus of electronic communication addresses. A measure of coverage of each of the set of pattern matching segments across the corpus of electronic communication addresses may be determined. A score associated with each pattern matching segment may be determined based on the measure of coverage and one or more measures of flexibility associated with each of the set of pattern matching segments. One or more of the pattern matching segments may be selected based on the determined scores. A corpus of electronic communications may then be grouped into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to electronic communication addresses associated with the corpus of electronic communications.
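The grouping step described at the end of the abstract might look like the minimal sketch below. It assumes each communication is a dictionary with a hypothetical `sender` key, that segments use glob-style wildcards, and that a communication matching several segments is assigned to the first one in list order. None of these details come from the source.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def cluster_by_sender(emails, selected_patterns):
    """Group emails under the first selected pattern their sender address matches."""
    clusters = defaultdict(list)
    for email in emails:
        for pattern in selected_patterns:
            if fnmatchcase(email["sender"], pattern):
                clusters[pattern].append(email)
                break  # assumption: one cluster per email
    return dict(clusters)


# Illustrative data only.
emails = [
    {"sender": "orders@shop.example", "subject": "Your order 1001 has shipped"},
    {"sender": "noreply@shop.example", "subject": "Your order 2042 has shipped"},
    {"sender": "alice@mail.example", "subject": "Lunch?"},
]
clusters = cluster_by_sender(emails, ["*@shop.example"])
```

Communications whose addresses match no selected segment are simply left out of the result in this sketch.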
56 Citations
10 Claims
7. A computer-implemented method, comprising:
identifying a set of pattern matching segments that match at least one of a corpus of sender email addresses;
determining a measure of coverage of each of the set of pattern matching segments across the corpus of sender email addresses;
determining one or more measures of flexibility associated with each of the set of pattern matching segments based on a count of wildcard characters within each pattern matching segment relative to a count of fixed text characters within each pattern matching segment;
determining, based on the measure of coverage and the one or more measures of flexibility associated with each of the set of pattern matching segments, a score associated with each pattern matching segment;
selecting, based on scores associated with the pattern matching segments, one or more of the pattern matching segments that satisfy one or more thresholds that are automatically adjusted;
grouping a corpus of emails into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to sender email addresses associated with the corpus of emails;
analyzing emails of a given cluster of the plurality of clusters grouped based on a given pattern matching segment, wherein the analyzing includes identifying content that is transient among the emails of the given cluster;
generating, for the given cluster, a data extraction template that is usable to extract transient content from subsequent emails that include sender email addresses that match the given pattern matching segment; and
applying the data extraction template to a subsequent email having a sender address that matches the given pattern matching segment to extract transient data from a subject or body of the subsequent email, wherein the extracted transient data is output to a user via an output device of a computing device operated by the user.
10. At least one non-transitory computer-readable medium comprising memory that stores instructions that, in response to execution of the instructions in the memory by one or more processors of a computing system, cause the one or more processors to perform the following operations:
identifying, from a superset of pattern matching segments, a set of pattern matching segments that match at least one of a corpus of email addresses;
determining a measure of coverage of each of the set of pattern matching segments across the corpus of email addresses;
determining one or more measures of flexibility associated with each of the set of pattern matching segments based on a count of wildcard characters within each pattern matching segment relative to a count of fixed text characters within each pattern matching segment;
determining, based on the measures of coverage and flexibility, a score associated with each of the set of pattern matching segments;
selecting, based on scores associated with the pattern matching segments, one or more of the pattern matching segments that satisfy one or more thresholds that are automatically adjusted;
grouping a corpus of emails into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to sender email addresses associated with the corpus of emails;
analyzing emails of a given cluster of the plurality of clusters grouped based on a given pattern matching segment, wherein the analyzing includes identifying content that is transient among the emails of the given cluster;
generating, for the given cluster, a data extraction template that is usable to extract transient content from subsequent emails that include sender email addresses that match the given pattern matching segment; and
applying the data extraction template to a subsequent email having a sender address that matches the given pattern matching segment to extract transient data from a subject or body of the subsequent email, wherein the extracted transient data is output to a user via an output device of a computing device operated by the user.
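The analyzing, generating, and applying operations recited in the independent claims can be illustrated with a deliberately simple sketch. It assumes that transient content is whatever whitespace-delimited token differs across the subject lines of a cluster, that all subjects in the cluster have the same token count, and that the template is a regular expression with one capture group per transient position. The function names `build_template` and `extract` are hypothetical, and the claims do not prescribe this technique.

```python
import re


def build_template(subjects):
    """Build a regex whose capture groups mark token positions that vary across subjects."""
    token_rows = [s.split() for s in subjects]
    if not token_rows or len({len(row) for row in token_rows}) != 1:
        raise ValueError("sketch assumes subjects with equal token counts")
    parts = []
    for column in zip(*token_rows):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))  # fixed text shared by the cluster
        else:
            parts.append(r"(\S+)")  # transient content becomes a capture group
    return re.compile(r"\s+".join(parts))


def extract(template, subject):
    """Return the transient tokens from a subject, or None if the template does not fit."""
    match = template.fullmatch(subject)
    return list(match.groups()) if match else None


# Illustrative cluster of subjects from senders matching one selected segment.
cluster_subjects = [
    "Your order 1001 has shipped",
    "Your order 2042 has shipped",
]
template = build_template(cluster_subjects)
order_numbers = extract(template, "Your order 3777 has shipped")
```

A subsequent email whose subject does not fit the learned structure yields `None` here, which a caller could treat as a signal that the template does not apply.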
Specification