Detecting and measuring risk with predictive models using content mining
Abstract
Computer-implemented methods and systems for processing transactions to determine the risk of a transaction convert high categorical information, such as text data, into low categorical information, such as category or cluster IDs. The text data may be merchant names or other textual content of the transactions, or data related to a consumer or any other type of entity that engages in the transaction. Content mining techniques provide the conversion from high to low categorical information. In operation, the resulting low categorical information is input, along with other data, into a statistical model, which outputs the level of risk in the transaction. Methods of converting the high categorical information to low categorical clusters, of using such information, and other aspects of the use of such clusters are disclosed.
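For illustration only, the conversion from high categorical merchant names to low categorical cluster IDs might be sketched as follows. This is not the disclosed implementation. It assumes that two merchants "co-occur" when the same consumer has transacted with both, substitutes raw co-occurrence count vectors for learned context vectors, and groups them with a simple greedy cosine-similarity pass. The function names and the `min_similarity` threshold are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def context_vectors(transactions):
    """Build one co-occurrence count vector per merchant name.

    transactions: iterable of (consumer_id, merchant_name) pairs.
    Assumption: merchants co-occur when they share a consumer.
    """
    by_consumer = defaultdict(set)
    for consumer, merchant in transactions:
        by_consumer[consumer].add(merchant)

    merchants = sorted({m for ms in by_consumer.values() for m in ms})
    index = {m: i for i, m in enumerate(merchants)}
    vectors = {m: [0.0] * len(merchants) for m in merchants}

    for ms in by_consumer.values():
        for m in ms:
            vectors[m][index[m]] += 1.0  # count the merchant's own consumers
        for a, b in combinations(sorted(ms), 2):
            vectors[a][index[b]] += 1.0  # shared consumer, both directions
            vectors[b][index[a]] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster(vectors, min_similarity=0.7):
    """Greedy grouping: a merchant joins the first cluster whose seed
    vector is similarly oriented, otherwise it seeds a new cluster."""
    seeds = []  # seed vector of each cluster, indexed by cluster ID
    assignment = {}
    for name in sorted(vectors):
        for cluster_id, seed in enumerate(seeds):
            if cosine(vectors[name], seed) >= min_similarity:
                assignment[name] = cluster_id
                break
        else:
            assignment[name] = len(seeds)
            seeds.append(vectors[name])
    return assignment


# Made-up example data: two consumers shop at hardware-type merchants,
# two others at food-type merchants.
transactions = [
    ("c1", "ACME HARDWARE"), ("c1", "BOBS LUMBER"),
    ("c2", "ACME HARDWARE"), ("c2", "BOBS LUMBER"),
    ("c3", "CAFE ROMA"), ("c3", "DELI 42"),
    ("c4", "CAFE ROMA"), ("c4", "DELI 42"),
]
clusters = cluster(context_vectors(transactions))
# clusters == {'ACME HARDWARE': 0, 'BOBS LUMBER': 0, 'CAFE ROMA': 1, 'DELI 42': 1}
```

Four distinct merchant names collapse into two cluster IDs, which is the kind of low categorical input a statistical model can consume directly.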
28 Claims
1. A system for detecting risk in a transaction, the system comprising:
a database of unique merchant names, each merchant name associated with a merchant cluster, each merchant name being textual data or other high categorical data;
at least one computing system implementing a transaction processing component that receives a transaction of a plurality of transactions between a consumer and a merchant, the transaction processing component deriving transaction data from the transaction, the transaction processing component determining from the database a unique merchant identity for the merchant; and
at least one computing system implementing a statistical model that receives the derived transaction data and the unique entity identity to output a score indicative of a level of risk in the transaction;
wherein the statistical model uses clustered context vectors generated by:
selecting a plurality of high categorical information elements from the plurality of transactions,
linking each high categorical information element with a context vector in a vector space such that high categorical information elements that co-occur in the plurality of transactions have context vectors that are similarly oriented in the vector space, the co-occurrence representing that context vectors corresponding to the co-occurring high categorical information elements are less than a predetermined distance apart in the vector space for more than a predetermined number of transactions, and
clustering the context vectors of the high categorical information elements into a number of clusters that is less than the number of high categorical information elements, each cluster being a low categorical information cluster.
2. A computer-implemented method of determining a level of risk in a transaction, the method comprising:
receiving, by at least one data processor, a transaction of a plurality of transactions between a first entity and a second entity, the first entity comprising a consumer associated with the transaction and the second entity comprising a merchant associated with the transaction;
deriving, by at least one data processor and by using a selecting process, high categorical information elements from the transaction, the high categorical information elements being text data;
determining, by at least one data processor, a low categorical information cluster that is closest in vector space to a context vector that is derived from context vectors of the high categorical information elements, the determining comprising:
linking, by at least one data processor, each high categorical information element with a context vector in a vector space such that high categorical information elements that co-occur in the plurality of transactions have context vectors that are similarly oriented in the vector space, the co-occurrence representing that context vectors corresponding to the co-occurring high categorical information elements are less than a predetermined distance apart in the vector space for more than a predetermined number of transactions, and
clustering, by at least one data processor, the context vectors of the high categorical information elements into a number of clusters that is less than the number of high categorical information elements, each cluster being a low categorical information cluster; and
applying, by at least one data processor, the low categorical information cluster and data derived from the transaction to a predictive model to output the level of risk in the transaction to detect if the transaction is fraudulent.
Dependent claims: 3
4. A computer implemented method for determining a level of risk of a transaction between a consumer and an entity, the method comprising:
storing, by at least one data processor, a plurality of entity clusters, the plurality of entity clusters determined from statistical co-occurrences of the entity identified in a plurality of transactions, the statistical co-occurrences identified using entity identifiers or other high categorical data, the entity comprising a merchant associated with the transaction, the plurality of entity clusters being generated by:
selecting a plurality of high categorical information elements from the plurality of transactions,
linking each high categorical information element with a context vector in a vector space such that high categorical information elements that co-occur in the plurality of transactions have context vectors that are similarly oriented in the vector space, the co-occurrence representing that context vectors corresponding to the co-occurring high categorical information elements are less than a predetermined distance apart in the vector space for more than a predetermined number of transactions, and
clustering the context vectors of the high categorical information elements into a number of entity clusters that is less than the number of high categorical information elements, each entity cluster being a low categorical information cluster;
receiving, by at least one data processor, data from said transaction between said consumer and said entity;
determining, by at least one data processor, an entity cluster of the plurality of entity clusters linked to the entity of the transaction based on at least one of the entity identifier and the other high categorical data; and
applying, by at least one data processor, the determined entity cluster in conjunction with data derived from the transaction to a predictive model, the predictive model outputting a level of risk of the transaction to detect if the transaction is fraudulent.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A computer implemented method for determining a level of risk of a transaction between a consumer and an entity, the method comprising:
storing, by at least one data processor, a plurality of entity clusters, the plurality of entity clusters determined from statistical co-occurrences of the entity identified in a plurality of transactions, the statistical co-occurrences identified using entity identifiers or other high categorical data, the entity comprising a merchant associated with the transaction, the plurality of entity clusters being generated by:
selecting a plurality of high categorical information elements,
linking each high categorical information element with a context vector in a vector space such that high categorical information elements that co-occur in the plurality of transactions have context vectors that are similarly oriented in the vector space, the co-occurrence representing that context vectors corresponding to the co-occurring high categorical information elements are less than a predetermined distance apart in the vector space for more than a predetermined number of transactions, and
clustering the context vectors of the high categorical information elements into a number of entity clusters that is less than the number of high categorical information elements, each entity cluster being a low categorical information cluster;
receiving, by at least one data processor, data from said transaction between said consumer and said entity;
determining, by at least one data processor, an entity cluster of the plurality of entity clusters associated with the entity of the transaction based on at least one of the entity identifier and the other high categorical data;
determining, by at least one data processor, an affinity measure indicative of an affinity of a consumer to the entity cluster; and
applying, by at least one data processor, the affinity measure in conjunction with data derived from the transaction to a predictive model, the predictive model outputting the level of risk of the transaction to characterize whether the transaction is fraudulent, wherein the affinity measure comprises at least one of transaction frequency, total currency volume, and average transaction amount, which are associated with transactions between the consumer and one or more entities that are included in the entity cluster.
Dependent claims: 17, 18, 19
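As a purely illustrative aid, the three affinity quantities named in claim 16 (transaction frequency, total currency volume, and average transaction amount) could be computed along the following lines. The record layout, the `Affinity` type, and the `cluster_affinity` function are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affinity:
    frequency: int         # number of transactions in the cluster
    total_volume: float    # summed transaction amounts
    average_amount: float  # total_volume / frequency, or 0.0 if none


def cluster_affinity(history, merchant_cluster, consumer, cluster_id):
    """Summarize a consumer's past activity within one entity cluster.

    history: iterable of (consumer_id, merchant_name, amount) records.
    merchant_cluster: mapping of merchant_name -> cluster ID.
    """
    amounts = [
        amount
        for who, merchant, amount in history
        if who == consumer and merchant_cluster.get(merchant) == cluster_id
    ]
    total = sum(amounts)
    average = total / len(amounts) if amounts else 0.0
    return Affinity(len(amounts), total, average)


# Made-up example data.
merchant_cluster = {"ACME HARDWARE": 0, "BOBS LUMBER": 0, "CAFE ROMA": 1}
history = [
    ("c1", "ACME HARDWARE", 40.0),
    ("c1", "BOBS LUMBER", 60.0),
    ("c1", "CAFE ROMA", 5.0),
    ("c2", "ACME HARDWARE", 10.0),
]
affinity = cluster_affinity(history, merchant_cluster, "c1", 0)
# affinity == Affinity(frequency=2, total_volume=100.0, average_amount=50.0)
```

Any one of the three fields, or all of them, could then be passed to a predictive model alongside the other data derived from the transaction.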
20. A method of determining the level of risk in a transaction by a consumer, the method comprising:
storing, by at least one data processor, a plurality of entity clusters, the plurality of entity clusters determined from statistical co-occurrences of the entity identities in a plurality of transactions, the entity identities being entity identifiers or other high categorical data, the plurality of entity clusters being generated by:
selecting a plurality of high categorical information elements from the plurality of transactions,
linking each high categorical information element with a context vector in a vector space such that high categorical information elements that co-occur in the plurality of transactions have context vectors that are similarly oriented in the vector space, the co-occurrence representing that context vectors corresponding to the co-occurring high categorical information elements are less than a predetermined distance apart in the vector space for more than a predetermined number of transactions, and
clustering the context vectors of the high categorical information elements into a number of entity clusters that is less than the number of high categorical information elements, each entity cluster being a low categorical information cluster;
receiving, by at least one data processor, data of a current transaction between a consumer and an entity, the entity comprising a merchant associated with the current transaction;
determining, by at least one data processor, a predicted entity cluster in which the consumer is predicted to perform a future transaction based on transactions of the consumer prior to the current transaction;
determining, by at least one data processor, an actual entity cluster associated with the entity of the transaction based on the entity identity, wherein said entity identity is an entity identifier or other high categorical data;
determining, by at least one data processor, a measure of difference between the predicted entity cluster and the actual entity cluster; and
applying, by at least one data processor, the determined measure of difference in conjunction with data derived from the transaction to a predictive model, and outputting the level of risk of the transaction to detect if the transaction is fraudulent.
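Claim 20 does not state how the predicted entity cluster or the measure of difference is computed. The sketch below shows one possible reading under stated assumptions: the predicted cluster is taken to be the consumer's most frequent prior cluster, each cluster is represented by a centroid vector, and the difference is one minus the cosine similarity of the two centroids. The names `predict_cluster` and `cluster_difference` and the example centroids are hypothetical.

```python
import math
from collections import Counter


def predict_cluster(prior_clusters):
    """Assumed predictor: the cluster seen most often in the consumer's
    prior transactions (ties go to the cluster encountered first)."""
    return Counter(prior_clusters).most_common(1)[0][0]


def cluster_difference(centroids, predicted, actual):
    """Assumed measure: 1 - cosine similarity of the cluster centroids.

    Returns 0.0 when the centroids point the same way and 1.0 when
    they are orthogonal or when either centroid has zero length.
    """
    u, v = centroids[predicted], centroids[actual]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return 1.0 - dot / norm if norm else 1.0


# Made-up centroids for three entity clusters in a 2-D vector space.
centroids = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

predicted = predict_cluster([0, 0, 1])  # consumer mostly used cluster 0
same = cluster_difference(centroids, predicted, 0)      # 0.0
distant = cluster_difference(centroids, predicted, 1)   # 1.0
```

A larger difference indicates that the current transaction falls in a cluster unlike the one the consumer's history suggests, and that value would be one input among others to the predictive model.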
21. A method for implementation by one or more data processors comprising:
receiving, by at least one data processor, data characterizing a transaction of transactions between a consumer and merchant;
linking, by at least one data processor, the merchant with one of a plurality of pre-defined merchant clusters, the merchant clusters defined by a frequency at which merchants historically shared customers, the merchant clusters being generated by:
selecting a plurality of high categorical information elements from the transactions,
linking each high categorical information element with a context vector in a vector space such that high categorical information elements that co-occur in the transactions have context vectors that are similarly oriented in the vector space, the co-occurrence representing that context vectors corresponding to the co-occurring high categorical information elements are less than a predetermined distance apart in the vector space for more than a predetermined number of transactions, and
clustering the context vectors of the high categorical information elements into a number of merchant clusters that is less than the number of high categorical information elements, each merchant cluster being a low categorical information cluster;
determining, by at least one data processor, an affinity of the customer to an associated merchant cluster;
generating, by at least one data processor, a risk score using at least one predictive model and at least the determined affinity; and
initiating, by at least one data processor, provision of data characterizing the generated score.
Dependent claims: 22, 23, 24, 25, 26, 27, 28
Specification