Unsupervised Learning Part 2: The AML Connection

February 22, 2018   FICO

Putting innovation into production is a big theme at FICO, as we commercialize analytic breakthroughs from FICO Labs into the products our customers rely on, worldwide. Recently, this has included applying advanced unsupervised learning to money laundering, one of the many domains in which FICO technology fights financial crime.

In my first blog about unsupervised learning, I took a deep dive into this machine learning technique, which draws inferences in the absence of outcomes. In a nutshell: “Good unsupervised learning requires more care, judgement and experience than supervised, because there is no clear, mathematically representable goal for the computer to blindly optimize to without understanding the underlying domain.”

Springboarding from that blog, today’s post covers three categories of unsupervised learning that FICO has investigated, refined and put into our anti-money laundering (AML) solutions.

The State of the Art in Unsupervised Analytics

Category 1: Finding distance-based outliers relative to training points

This category of unsupervised learning quantifies “outlierness” under the principle that a query point close to many training points is ordinary, while one farther from them should score higher, denoting greater outlierness. These are the best-known and most intuitive approaches, and the ones a data scientist would typically reach for if asked on the spot to find outliers in multi-dimensional data. Here are two classic techniques:

  • Near-neighbor statistics, for example the average distance from the query point to its M closest training neighbors, with higher values corresponding to greater outlierness. Arbitrariness in the distance metric is a difficult problem, and run-time scoring requires potentially expensive searches through saved training data.
  • Clustering, a technique that, in training, estimates a finite number of cluster centers, characterizing the dataset by a smaller number of prototypical points. A query point’s score is its distance to the closest cluster center. Clustering suffers from an arbitrary free parameter that defines the number of clusters.
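
The near-neighbor statistic above can be sketched in a few lines of numpy. This is an illustrative toy, not FICO product code; the function name and the choice of Euclidean distance and M = 5 are arbitrary, which is exactly the metric-arbitrariness problem the text raises.

```python
import numpy as np

def knn_outlier_scores(train, queries, m=5):
    """Average Euclidean distance from each query point to its
    m closest training points; larger values = more outlying."""
    # Pairwise distances, shape (n_queries, n_train).
    d = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=-1)
    # Keep the m smallest distances per query and average them.
    nearest = np.sort(d, axis=1)[:, :m]
    return nearest.mean(axis=1)

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))            # dense cloud of "normal" points
queries = np.array([[0.0, 0.0],              # inside the cloud
                    [8.0, 8.0]])             # far from every training point
scores = knn_outlier_scores(train, queries)  # scores[1] >> scores[0]
```

Note that scoring requires holding all 200 training points in memory and searching through them per query, the run-time cost mentioned above.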

These techniques hinge on the choice of metric, and excess cross-correlation between features can be a big problem. A Mahalanobis distance may help somewhat with correlation, but it performs poorly with the non-Gaussian distributions and categorical features frequently encountered in practice. Defining a proper metric becomes as much art as science: cross-correlation and improper variable scaling can emphasize some outliers while leaving the score less sensitive to others.
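
To make the Mahalanobis point concrete, here is a minimal numpy sketch (illustrative only; the data and function name are invented for the example). On strongly correlated data, a point that follows the correlation scores low even though its Euclidean distance from the mean equals that of a point that violates the correlation:

```python
import numpy as np

def mahalanobis_scores(train, queries):
    """Distance from the training mean, rescaled by the training
    covariance so correlated features are not double-counted."""
    mu = train.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(train, rowvar=False))
    diff = queries - mu
    # diff_i . cov_inv . diff_i for each query row i
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
# Two strongly correlated features: the data hugs the line y = x.
train = np.hstack([x, x + 0.1 * rng.normal(size=(500, 1))])
# (3, 3) follows the correlation; (3, -3) violates it.
m_scores = mahalanobis_scores(train, np.array([[3.0, 3.0], [3.0, -3.0]]))
```

Both queries are equally far from the mean in Euclidean terms, yet the correlation-violating point receives a far larger Mahalanobis score, as intended; the approach still fails, as noted, when distributions are non-Gaussian or features are categorical.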

Category 2: Machine learning of underlying data

Unsupervised learning methods that adapt to the underlying data offer a more sophisticated approach, with explicit machine learning. They are less commonly used and understood than the Category 1 methods, but have support in the published scientific literature, and they adapt better to data with complex distributions and correlations. The one-class support vector machine (OCSVM), for example, fits a kernel-based boundary around the bulk of the training data and flags points falling outside it as outliers. The OCSVM has notable drawbacks, however:

  • An explicit dot-product/metric is necessary in the kernel function and has major effects on results
  • A significant fraction of the training data must be stored for scoring
  • The raw score has no quantitative interpretation and often has a very ugly empirical distribution

Furthermore, the training procedure for the OCSVM, like most support vector models, does not scale sufficiently to the sizes of data sets now commonly encountered.
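
For concreteness, here is a minimal scikit-learn sketch of a one-class SVM (a toy illustration, not FICO's implementation). The `nu` and `gamma` values below are arbitrary choices with large effects on the result, which is precisely the kernel-arbitrariness criticism above:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = rng.normal(size=(300, 2))

# nu bounds the fraction of training points treated as outliers;
# gamma sets the RBF kernel width. Both are arbitrary free parameters.
ocsvm = OneClassSVM(kernel='rbf', nu=0.05, gamma=0.5).fit(train)

# decision_function: positive = inlier side, negative = outlier side.
# Note the raw value has no probabilistic interpretation, and the fitted
# model must retain its support vectors (a subset of training data).
oc_scores = ocsvm.decision_function(np.array([[0.0, 0.0], [8.0, 8.0]]))
```

The returned scores order the two points correctly, but their magnitudes carry no quantitative meaning, one of the drawbacks listed above.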

  • Neural-network autoencoders train multi-layer neural networks to predict their inputs, with a hidden layer “bottleneck” of smaller dimensionality than the original space. Outliers are those points that are less predictable than inliers. Autoencoders adapt to nonlinearity well, and explicitly reduce redundancy in inputs by mapping to a lower-dimensional manifold. In addition, being a neural network, training is easily scalable to large data set sizes, scoring is fast and score distributions are smooth.

    Neural network practitioners can readily use this technique to adapt to combined continuous and categorical features. There are still some arbitrary choices in the distance metric, but the arbitrary element has less direct impact than in previously described methods in which the metric intimately controls the geometry.
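
A tiny numpy sketch may help fix the autoencoder idea (again illustrative, not FICO code): a linear network with a one-unit bottleneck is trained to reproduce its 2-D inputs, so points near the learned low-dimensional manifold reconstruct well while off-manifold points do not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
# Correlated 2-D "normal" data lying near the line y = x.
train = np.hstack([x, x + 0.1 * rng.normal(size=(500, 1))])

# Tiny linear autoencoder: 2 -> 1 -> 2 (one-unit bottleneck).
W_enc = rng.normal(scale=0.1, size=(2, 1))
W_dec = rng.normal(scale=0.1, size=(1, 2))
lr = 0.02
for _ in range(2000):
    h = train @ W_enc                       # encode to the bottleneck
    err = h @ W_dec - train                 # reconstruction residual
    # Gradient-descent steps on mean squared reconstruction error.
    gW_dec = h.T @ err / len(train)
    gW_enc = train.T @ (err @ W_dec.T) / len(train)
    W_dec -= lr * gW_dec
    W_enc -= lr * gW_enc

def recon_error(points):
    """Squared reconstruction error; off-manifold points score high."""
    recon = (points @ W_enc) @ W_dec
    return np.square(points - recon).sum(axis=1)

ae_scores = recon_error(np.array([[2.0, 2.0],     # on the data manifold
                                  [2.0, -2.0]]))  # off the manifold
```

Real deployments use deeper nonlinear networks, but the principle is the same: the bottleneck forces a reduced-redundancy representation, training scales to large data, and scoring is a fast forward pass.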

Category 3: Probabilistic and topological detection

Probabilistic and topological methods showcase FICO’s latest machine learning innovations; our data science team is not aware of any other technique that offers all of the associated advantages. FICO has developed and implemented these in our labs from scratch, mindful of our experience with the previously discussed varieties of unsupervised learning.

  • Probabilistic neural-network outlier detection obtains a true calibrated probability estimate by starting from an analytically compact probability model. Bayesian principles are then used to correct and enhance this first stage with a multi-layer neural network, so that the model best represents the observed training data. Training is scalable to large data sizes; scoring is rapid and provides smooth, calibrated probability estimates; and the neural network learns complex correlations. Furthermore, the flexibility of the starting probability model has let FICO’s data science team overcome the vexing boundaries and constraints of methods with explicit distances, such as the one-class SVM.
  • “Topological” outlier detection finds outliers that show patterns and relationships rarely or never seen in the baseline data. Some new data points are outliers simply as exaggerations or mutations of normal data: professional athletes, for instance, are outliers in being faster and stronger than 99.9% of us. But topological (or “structural”) outliers are more exotic, exhibiting structure the baseline never shows. Compared to a professional athlete, imagine an otherwise normal person who happens to have an entirely different configuration of internal organs. That individual would be a topological outlier!
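
FICO's probabilistic neural-network method described in the first bullet is proprietary and not reproduced here. To illustrate only what "calibrated probability" means, the toy numpy sketch below starts from an analytically compact base model (a single Gaussian, a stand-in assumption) and returns, for each query, the fraction of training points that sit closer to the model's center, a score in [0, 1] with a direct probabilistic reading, unlike a raw distance:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 2))

# Stage 1: an analytically compact base model -- here a single Gaussian.
mu = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def sq_mahalanobis(points):
    d = points - mu
    return np.einsum('ij,jk,ik->i', d, cov_inv, d)

def outlier_probability(points):
    """Calibrated score in [0, 1]: the fraction of training points at
    least as close to the model's center as the query point is."""
    train_d = sq_mahalanobis(train)
    return np.array([(train_d <= d).mean() for d in sq_mahalanobis(points)])

probs = outlier_probability(np.array([[6.0, 6.0],   # extreme outlier
                                      [0.0, 0.0]])) # dead center
```

In FICO's actual method the simple base model is corrected by a multi-layer neural network rather than an empirical rank, but the output contract is the same: a smooth, calibrated probability instead of an uninterpretable raw score.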

[Figure: Unsupervised Learning Topological Outliers]

Advanced Analytics for AML

FICO’s advanced analytics for anti-money laundering incorporates our most sophisticated machine learning-based outlier detection technologies. One key feature of our product is a quantification, from transaction and other data, of the degree to which a few customers exhibit unusual and risky behavior relative to the bulk of low-risk customers. Our fundamental innovations in unsupervised modeling and outlier scoring have improved the sophistication, palatability and success of our efforts to tackle one of the world’s most elusive and disturbing channels of crime: money laundering and the associated crimes against humanity.

Follow me on Twitter @ScottZoldi.
