System for mending through automated processes
First Claim
1. A system for transforming historical data collected in response to one or more triggering events, in order to classify textual values, the system comprising:
a computer apparatus including a processor and a memory; and
a software module stored in the memory, comprising executable instructions that when executed by the processor cause the processor to:
access a plurality of textual values from historical transaction data;
remove undesired characters from the plurality of textual values;
implement a clustering algorithm to the plurality of textual values to identify one or more distinct patterns within the plurality of textual values, wherein the clustering algorithm comprises:
a primary process for coding the plurality of textual values into one or more phonetic components, thereby reducing the plurality of textual values into a combination of consonant sounds, wherein identifying the one or more distinct patterns within the plurality of textual values comprises comparing pronunciations and phonetics of the plurality of textual values; and
a secondary process for identifying and classifying, based on an Internet search, one or more of the plurality of textual values unable to be classified by the primary process;
create one or more clusters by grouping the plurality of textual values based, respectively, on the one or more distinct patterns output by the primary process and the Internet search of the secondary process;
apply a similarity gauge to the textual values of each of the clusters to determine similarity or dissimilarity among the textual values of each cluster;
filter the textual values of each cluster to determine which textual values belong in each cluster and which textual values do not belong in each cluster, wherein the textual values that belong are cluster values;
pass the cluster values for each cluster to a reference table;
store the cluster values for each cluster in the reference table for future access; and
in response to a need for classification of a future set of textual values, access the reference table and lookup the future set of textual values in the reference table to determine whether any of the future set of textual values are cluster values.
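The cleaning and primary phonetic-coding steps of the claim can be illustrated with a minimal sketch. The claim does not name a specific phonetic algorithm, so the consonant-skeleton key below is an assumption chosen for illustration; the function names `clean`, `consonant_key`, and `cluster_by_key` and the sample values are hypothetical and do not come from the patent.

```python
import re
from collections import defaultdict


def clean(value: str) -> str:
    """Remove undesired characters: keep letters and spaces, collapse whitespace, uppercase."""
    letters_only = re.sub(r"[^A-Za-z ]+", " ", value)
    return re.sub(r"\s+", " ", letters_only).strip().upper()


def consonant_key(value: str) -> str:
    """Reduce a value to a combination of consonants (an assumed, simplified phonetic code).

    Vowels and spaces are dropped, and a letter that immediately repeats
    the previous character is kept only once.
    """
    key = []
    prev = ""
    for ch in clean(value):
        if ch not in "AEIOU " and ch != prev:
            key.append(ch)
        prev = ch
    return "".join(key)


def cluster_by_key(values):
    """Group raw textual values that share the same consonant key."""
    clusters = defaultdict(list)
    for value in values:
        clusters[consonant_key(value)].append(value)
    return dict(clusters)


if __name__ == "__main__":
    sample = ["Coffee Shop #12", "COFFE SHOP", "Grocery Mart"]
    for key, members in cluster_by_key(sample).items():
        print(key, members)
```

In this sketch, misspelled or decorated variants of the same name collapse to one key, which is the kind of distinct pattern the primary process is described as producing. Values that end up in a cluster of their own would be candidates for the secondary, search-based process, which is not modeled here.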
Abstract
Systems and methods are provided for transforming historical data collected in response to one or more triggering events, in order to classify textual values. Embodiments access a plurality of textual values from historical transaction data; identify one or more distinct patterns within the plurality of textual values; group the textual values based on the one or more distinct patterns, thereby forming one or more clusters; apply a similarity gauge to the textual values of each of the clusters to determine similarity or dissimilarity among the textual values of each cluster; and filter the textual values of each cluster to determine which textual values belong in each cluster, wherein the textual values that belong are cluster values. Some embodiments also remove undesired characters from the textual values, and in some cases identifying the distinct patterns includes comparing pronunciations and/or phonetics of the textual values.
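The similarity gauge, filtering, and reference-table lookup described in the abstract could be sketched as follows. The abstract does not say which similarity measure is used, so this example assumes the standard-library `difflib.SequenceMatcher` ratio and a threshold of 0.8; the medoid-based filter, the function names, and the sample data are likewise hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Assumed similarity gauge: case-insensitive SequenceMatcher ratio in [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def medoid(values):
    """Pick the value with the highest total similarity to all cluster members."""
    return max(values, key=lambda v: sum(similarity(v, other) for other in values))


def filter_cluster(values, threshold=0.8):
    """Keep only values close enough to the cluster's medoid (the cluster values)."""
    if not values:
        return []
    center = medoid(values)
    return [v for v in values if similarity(v, center) >= threshold]


def build_reference_table(clusters, threshold=0.8):
    """Store the filtered cluster values, keyed by uppercased text, for later lookup."""
    table = {}
    for label, values in clusters.items():
        for value in filter_cluster(values, threshold):
            table[value.upper()] = label
    return table


def classify(table, future_values):
    """Look up future values; None means the value is not a known cluster value."""
    return {v: table.get(v.upper()) for v in future_values}


if __name__ == "__main__":
    clusters = {
        "COFFEE": ["COFFEE SHOP", "COFFE SHOP", "COFFEE SHOPPE", "XYZ HARDWARE"],
    }
    table = build_reference_table(clusters)
    print(classify(table, ["coffe shop", "Unknown Vendor"]))
```

Filtering against a medoid is only one way to decide which values belong; a pairwise-average or nearest-neighbor rule would fit the same description. The reference table here is an in-memory dictionary standing in for whatever persistent store an embodiment would use.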
15 Claims
8. A computer program product for transforming historical data collected in response to one or more triggering events, in order to classify textual values, the computer program product comprising:
a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to access a plurality of textual values from historical transaction data;
computer readable program code configured to remove undesired characters from the plurality of textual values;
computer readable program code configured to implement a clustering algorithm to the plurality of textual values to identify one or more distinct patterns within the plurality of textual values, wherein the clustering algorithm comprises:
a primary process coding the plurality of textual values into one or more phonetic components, thereby reducing the plurality of textual values into a combination of consonant sounds, wherein identifying the one or more distinct patterns within the plurality of textual values comprises comparing pronunciations and phonetics of the plurality of textual values; and
a secondary process for identifying and classifying, based on an Internet search, one or more of the plurality of textual values unable to be classified by the primary process;
computer readable program code configured to create one or more clusters by grouping the plurality of textual values based, respectively, on the one or more distinct patterns output by the primary process and the Internet search of the secondary process;
computer readable program code configured to apply a similarity gauge to the textual values of each of the clusters to determine similarity or dissimilarity among the textual values of each cluster;
computer readable program code configured to filter the textual values of each cluster to determine which textual values belong in each cluster and which textual values do not belong in each cluster, wherein the textual values that belong are cluster values;
computer readable program code configured to pass the cluster values for each cluster to a reference table;
computer readable program code configured to store the cluster values for each cluster in the reference table for future access; and
computer readable program code configured to, in response to a need for classification of a future set of textual values, access the reference table and lookup the future set of textual values in the reference table to determine whether any of the future set of textual values are cluster values.
15. A method for transforming historical data collected in response to one or more triggering events, in order to classify textual values, the method comprising:
accessing a plurality of textual values from historical transaction data;
removing undesired characters from the plurality of textual values;
implementing a clustering algorithm to the plurality of textual values to identify one or more distinct patterns within the plurality of textual values, wherein the clustering algorithm comprises:
a primary process for coding the plurality of textual values into one or more phonetic components, thereby reducing the plurality of textual values into a combination of consonant sounds, wherein identifying the one or more distinct patterns within the plurality of textual values comprises comparing pronunciations and phonetics of the plurality of textual values; and
a secondary process for identifying and classifying, based on an Internet search, one or more of the plurality of textual values unable to be classified by the primary process;
creating one or more clusters by grouping the plurality of textual values based, respectively, on the one or more distinct patterns output by the primary process and the Internet search of the secondary process;
applying a similarity gauge to the textual values of each of the clusters to determine similarity or dissimilarity among the textual values of each cluster;
filtering the textual values of each cluster to determine which textual values belong in each cluster and which textual values do not belong in each cluster, wherein the textual values that belong are cluster values;
passing the cluster values for each cluster to a reference table;
storing the cluster values for each cluster in the reference table for future access; and
in response to a need for classification of a future set of textual values, accessing the reference table and lookup the future set of textual values in the reference table to determine whether any of the future set of textual values are cluster values.
Specification