Anti-money laundering (AML) plays a significant role in the daily operation of banks. Effective anti-money laundering work can curb economic crime. However, determining whether a transaction record reflects money laundering is tedious and error-prone. Traditionally, banks use rule-based models to filter out the records that are obviously not money laundering and manually review the rest.

Rule-based models help a lot, but their limited coverage still leaves a large volume of records for costly manual review. Moreover, while they work well on known, traditional patterns, they offer little insight into unknown cases. Machine learning models are therefore a promising way to uncover hidden associations among all the features.

WeBank, the first Internet bank in China, currently applies machine learning models such as logistic regression and has extended its feature set to more than 900 features. Nevertheless, these models suffer badly from a lack of data, more specifically, a lack of positive money laundering cases. Without enough positive cases, models evaluate poorly and can hardly generalize to unseen cases.

Fortunately, WeBank has introduced federated learning to solve this problem. Federated learning enables multiple institutions to build a common model without physically sharing their data. To achieve this, WeBank adopted FATE (Federated AI Technology Enabler), an open-source project initiated by its AI Department.

Using FATE, WeBank united several banks to train anti-money laundering models jointly. The cooperation process can be described as follows:

[Figure: workflow of the joint modeling cooperation]

The federated model is called Homogeneous Logistic Regression (Homo-LR). All the banks provide homogeneous data: they share the same features but hold different sample IDs. Combined in this way, the overall dataset contains many more positive cases, which lets the model perform well. The principle of Homo-LR is shown below.

[Figure: principle of Homo-LR training]
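This horizontal ("homogeneous") partitioning can be illustrated with a toy example. The bank names, feature names, and values below are hypothetical, chosen only to show the shape of the data:

```python
# Toy illustration of horizontally partitioned ("homogeneous") data:
# every bank records the SAME features, but for DIFFERENT customers.
# Feature names and values here are hypothetical.

FEATURES = ["txn_amount", "txn_count_30d", "cross_border_ratio"]

# Each row: (sample_id, feature_vector, label); label 1 = confirmed laundering case.
bank_a = [
    ("A001", [1200.0, 3, 0.0], 0),
    ("A002", [98000.0, 41, 0.7], 1),   # one of bank A's rare positive cases
]
bank_b = [
    ("B101", [560.0, 2, 0.0], 0),
    ("B102", [150000.0, 88, 0.9], 1),  # bank B contributes its own positives
    ("B103", [430.0, 5, 0.1], 0),
]

# Sample IDs are disjoint while the feature schema is identical:
# this is exactly the horizontal-federation setting of Homo-LR.
ids_a = {sid for sid, _, _ in bank_a}
ids_b = {sid for sid, _, _ in bank_b}
assert ids_a.isdisjoint(ids_b)

# Jointly, the virtual dataset contains more positive cases than either
# bank alone, even though no raw rows are exchanged during training.
positives = sum(label for _, _, label in bank_a + bank_b)
print(positives)  # 2
```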

In every iteration, each party trains the model on its own data and sends its model weights or gradients to a third party called the Arbiter. The Arbiter aggregates all the weights or gradients and sends the updated result back to each party. Each party's data never leaves its own database, yet the model is trained on all of it.
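The iteration above can be sketched as a federated-averaging loop. This is a minimal plain-Python sketch, not FATE's actual implementation: the parties, data generator, learning rate, and round count are all hypothetical, and real deployments add encryption to the exchanged gradients.

```python
import math
import random

def local_gradient(weights, data):
    """Gradient of the logistic loss computed on ONE party's private data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return [g / len(data) for g in grad]

def arbiter_aggregate(party_grads, party_sizes):
    """Arbiter: sample-size-weighted average of gradients.

    The Arbiter sees only gradients, never the raw transaction data.
    """
    total = sum(party_sizes)
    return [
        sum(g[j] * n for g, n in zip(party_grads, party_sizes)) / total
        for j in range(len(party_grads[0]))
    ]

# Hypothetical toy data: two parties, same 2 features, different samples,
# drawn from one shared underlying pattern.
random.seed(0)
def make_party(n):
    data = []
    for _ in range(n):
        x = [random.gauss(0, 1), random.gauss(0, 1)]
        y = 1 if x[0] + 0.5 * x[1] > 0 else 0
        data.append((x, y))
    return data

parties = [make_party(50), make_party(80)]
weights = [0.0, 0.0]
lr = 0.5

# Each round: local gradients -> Arbiter averages -> broadcast update.
for _ in range(100):
    grads = [local_gradient(weights, d) for d in parties]
    agg = arbiter_aggregate(grads, [len(d) for d in parties])
    weights = [w - lr * g for w, g in zip(weights, agg)]

# The jointly trained weights recover the sign of the shared pattern.
assert weights[0] > 0 and weights[1] > 0
```

The design point is that only model parameters cross institutional boundaries; the gradient exchange is the sole communication step in each round.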


The inference process is also easy to understand and execute.
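Concretely, once training finishes, each bank can score its own transactions locally with the shared weights. The weights and features below are hypothetical placeholders:

```python
import math

def score(weights, x):
    """Probability that a transaction is a laundering case under the shared model."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights broadcast to every party after federated training.
shared_weights = [0.8, 1.3, 2.1]

# Each bank scores its OWN transactions locally; no data leaves the bank.
txn = [0.5, 1.0, 0.2]  # hypothetical normalized features of one transaction
p = score(shared_weights, txn)
flag = p > 0.5  # route high-probability cases to manual review
assert 0.0 < p < 1.0
```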

This cooperation has been a great success, improving model performance significantly. The AUC of the logistic regression model increased by 14%, substantially reducing the volume and difficulty of manual review work.

Here is a comparison of the results before and after using the joint model. The number in each square represents the probability of the case being money laundering.

With a traditional single-bank model, these cases could not be identified as suspicious. On reviewing the details of the two red cases, they showed the characteristics of illegal-settlement underground banks, which have a high probability of using our e-bank accounts for transfers.

[Figure: case probabilities before and after joint modeling]

Furthermore, the AUC keeps increasing as the amount of modeling data grows, which underscores the value of pooling more data.

[Figure: AUC growth with increasing modeling data]

As a result, with the rule-based model alone, more than one thousand cases needed to be reviewed per day. With federated Homo-LR, this number has dropped to 38.

With FATE, the data-island problem has been solved creatively, greatly expanding the range of applications for artificial intelligence. At the same time, user privacy and institutional data security are preserved.