Should I recommend you buy pasta or toothpaste?


Transaction logs are generally stored securely, yet companies often fail to take advantage of them. A transaction can be any instantaneous exchange of goods, services, or funds. What makes transactions special is that several of them may share certain points in common (the buyer, the seller, the credit card, the ATM...), yet each transaction must be scored individually, rather than by any point it shares with other transactions. For example, rich information can be gleaned from previous transactions made with the same credit card, by the same card holder, or even with the same merchant. Still, in fraud detection it is the transaction itself that has to be flagged as fraudulent, not the credit card, its holder, or the merchant. Past transactions therefore need to be linked to the current one to provide that information.
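To make this concrete, here is a minimal sketch (with hypothetical toy data and feature names) of how past transactions on the same card can be turned into features for scoring the current transaction individually:

```python
from datetime import datetime

# Hypothetical toy data: each transaction is scored on its own,
# but its features are built from past transactions on the same card.
transactions = [
    {"id": 1, "card": "A", "amount": 20.0, "ts": datetime(2023, 1, 1)},
    {"id": 2, "card": "A", "amount": 25.0, "ts": datetime(2023, 1, 5)},
    {"id": 3, "card": "B", "amount": 500.0, "ts": datetime(2023, 1, 6)},
    {"id": 4, "card": "A", "amount": 900.0, "ts": datetime(2023, 1, 7)},
]

def features_for(txn, history):
    """Build features for one transaction by linking it to earlier
    transactions that share the same card."""
    past = [t for t in history
            if t["card"] == txn["card"] and t["ts"] < txn["ts"]]
    avg = sum(t["amount"] for t in past) / len(past) if past else 0.0
    return {
        "n_past": len(past),
        "avg_past_amount": avg,
        # Ratio to the card's historical average: a simple fraud signal.
        "amount_ratio": txn["amount"] / avg if avg else 1.0,
    }

print(features_for(transactions[3], transactions))
```

The point of the shared key (here, the card) is only to link transactions together; the score still applies to a single transaction.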


Let’s consider a concrete example, where the transactions are orders from an online grocery delivery service.


We have general information about the order and the aisle (date of order, customer, name of the aisle, department to which the aisle belongs...). And we want to improve the performance of our model by exploiting the historical data on orders and aisles, including all of the customer's previous orders.

To build aggregates, the relational data must follow a star schema. The identifier of the entity to be scored (here, each order plus aisle) must be used as the join key between the Central and Peripheral tables.

In this specific case, this means that the identifier of the upcoming order plus aisle must be added to the Peripheral table.

At the start, the user id is the only join key. But the entity we want to analyse and predict is the aisle in the context of an order (the order-plus-aisle identifier). A simple query, joining on the user id, allows us to add this new identifier to the Peripheral table.
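As a sketch of that query, here is a pandas version with hypothetical table and column names (the source does not specify them): joining the Peripheral table to the Central table on the user id propagates the order-plus-aisle identifier into the Peripheral table, where it can then serve as the star-schema join key.

```python
import pandas as pd

# Central table: one row per entity to score (order plus aisle).
central = pd.DataFrame({
    "order_aisle_id": ["o10_a1", "o10_a2", "o11_a1"],
    "user_id":        [1, 1, 2],
    "aisle":          ["pasta", "toothpaste", "pasta"],
})

# Peripheral table: the users' past order lines, keyed only by user_id.
peripheral = pd.DataFrame({
    "user_id":  [1, 1, 2],
    "aisle":    ["pasta", "milk", "toothpaste"],
    "order_ts": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
})

# Join on user_id to add the new identifier to the Peripheral table.
# Each past order line is repeated once per upcoming order-plus-aisle
# of the same user, so the new column can act as the join key.
peripheral_keyed = peripheral.merge(
    central[["order_aisle_id", "user_id"]], on="user_id"
)
print(peripheral_keyed)
```

Note that each user's history is duplicated once per entity to score; that duplication is what lets aggregates be computed per order-plus-aisle rather than per user.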

Then, in the platform, we create a project, upload both the Central and Peripheral data sets, get insights, and improve the first model by requesting aggregates. From there, we see that two features from the Central table (the aisle and the department) are informative, and that the aggregates built from the Peripheral table were evaluated positively, so they contribute to the new model.
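To give an idea of the kind of aggregate involved (the source does not list the exact ones, so this is an illustrative assumption), a natural example is counting how often each user has previously ordered from each aisle:

```python
import pandas as pd

# Hypothetical past order lines for two users.
past = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "aisle":   ["pasta", "pasta", "milk", "toothpaste"],
})

# Aggregate over the Peripheral table: how often has each user
# bought from each aisle in previous orders?
agg = (past.groupby(["user_id", "aisle"])
           .size()
           .reset_index(name="n_prev_orders"))
print(agg)
```

Such counts, joined back onto the entity to score, are exactly the kind of feature that lets past transactions inform the prediction for the upcoming one.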

In this way, we can score transactional data, improving the relevance of our predictions by integrating past transactions from the same customer.

Intrigued? To get a sense of what we do, please visit our Free Trial page.