Data Analytics for Financial Crime Monitoring
Expert-defined terms from the Professional Certificate in AI in Financial Crime Compliance course at UK School of Management.
**Anti-Money Laundering (AML)**
: A set of laws, regulations, and procedures designed to prevent financial insti…
AML compliance typically involves customer identification, transaction monitoring, and suspicious activity reporting.
**Artificial Intelligence (AI)**
: The simulation of human intelligence in machines that are programmed to think…
AI can be categorized as either weak (designed to perform a narrow task, such as voice recognition) or strong (general artificial intelligence that can perform any intellectual task that a human being can do).
**Automated Machine Learning (AutoML)**
: The process of automating the machine learning pipeline, including data prepar…
AutoML can help non-experts build machine learning models more efficiently and effectively.
**Bias**
: A systematic error in a machine learning model that leads to unfair or inaccur…
Bias can arise from a variety of sources, including biased data, biased algorithms, and biased decision-makers.
**Clustering**
: A type of unsupervised machine learning that involves grouping data points bas…
Clustering can be used for customer segmentation, anomaly detection, and data exploration.
**Confusion Matrix**
: A table used to evaluate the performance of a machine learning model
A confusion matrix contains four values: true positives, false positives, true negatives, and false negatives. These values can be used to calculate metrics such as accuracy, precision, recall, and F1 score.
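As a minimal sketch of how the four counts turn into those metrics, the function below applies the standard formulas. The function name and the example counts are illustrative assumptions, not figures from the course.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive common metrics from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that were truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a transaction-alert model.
metrics = classification_metrics(tp=40, fp=10, tn=940, fn=10)
```

With heavily imbalanced data, as is typical in financial crime monitoring, accuracy alone can look high even when many positives are missed, which is why precision and recall are usually reported alongside it.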
**Deep Learning**
: A subset of machine learning that involves training neural networks with many…
Deep learning models can learn complex patterns in large datasets and are often used for image and speech recognition, natural language processing, and game playing.
**Decision Tree**
: A type of supervised machine learning that involves creating a tree-like model…
Decision trees can be used for classification or regression tasks and are often used for their interpretability.
**Ensemble Learning**
: A machine learning technique that involves combining the predictions of multip…
Ensemble learning can reduce overfitting and improve robustness. It works best when the combined models are diverse, so that their errors are not strongly correlated.
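One simple way to combine predictions is majority voting, sketched below. The three "models" are hypothetical hand-written rules standing in for trained classifiers, and the field names and thresholds are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]


# Three hypothetical base models; each looks at a different signal.
def model_amount(txn):
    return "suspicious" if txn["amount"] > 10_000 else "normal"


def model_velocity(txn):
    return "suspicious" if txn["txns_last_hour"] > 5 else "normal"


def model_new_account(txn):
    return "suspicious" if txn["account_age_days"] < 7 else "normal"


MODELS = [model_amount, model_velocity, model_new_account]


def ensemble_predict(txn):
    """Collect one prediction per model and combine them by majority vote."""
    return majority_vote([model(txn) for model in MODELS])


# Two of the three models flag this hypothetical transaction.
txn = {"amount": 15_000, "txns_last_hour": 2, "account_age_days": 3}
label = ensemble_predict(txn)
```

An odd number of voters avoids ties for a two-class problem; other combination schemes, such as averaging predicted probabilities, follow the same pattern.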
**Feature Engineering**
: The process of selecting and transforming variables (features) to improve the…
Feature engineering can involve techniques such as data cleaning, normalization, scaling, and dimensionality reduction.
**Feature Selection**
: The process of selecting a subset of relevant features from a larger set of va…
Feature selection can help reduce the complexity of a model, improve interpretability, and reduce overfitting.
**False Negative**
: A prediction that incorrectly classifies a positive instance as negative
False negatives can lead to missed opportunities or failures to detect fraud.
**False Positive**
: A prediction that incorrectly classifies a negative instance as positive
False positives can lead to unnecessary investigations or false accusations.
**Fraud Detection**
: The process of identifying and preventing fraudulent activity in financial tra…
Fraud detection can involve machine learning models, rule-based systems, and expert systems.
**Fraud Ring**
: A group of individuals or organizations that collaborate to commit fraud
Fraud rings can be difficult to detect and investigate due to their complexity and coordination.
**General Data Protection Regulation (GDPR)**
: A regulation in EU law that governs the processing and movement of personal da…
GDPR imposes obligations on organizations that process personal data, including the requirement to establish a lawful basis for processing (consent is one of several), implement appropriate technical and organizational measures, and, in certain cases, appoint a data protection officer.
**Hyperparameter Tuning**
: The process of adjusting the parameters of a machine learning model to optimiz…
Hyperparameter tuning can involve techniques such as grid search, random search, and Bayesian optimization.
**Imputation**
: The process of replacing missing or invalid data with estimated values
Imputation can help improve the completeness of a dataset and reduce the bias caused by discarding incomplete records, although a poorly chosen method can introduce bias of its own.
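A minimal sketch of one common approach, mean imputation, is shown below. It assumes missing values are represented as `None`; the function name and sample amounts are illustrative only.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical transaction amounts with two missing entries.
amounts = [120.0, None, 80.0, 100.0, None]
completed = impute_mean(amounts)
```

Mean imputation is easy to apply but shrinks the variance of the column, so median or model-based imputation may be preferable when the data are skewed.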
**Interpretability**
: The ability of a machine learning model to be understood and explained by huma…
Interpretability is important for building trust in machine learning models and ensuring that they are used ethically and responsibly.
**K-means Clustering**
: A type of unsupervised machine learning that involves partitioning data points…
K-means clustering is a simple and efficient algorithm that can be used for customer segmentation, anomaly detection, and data exploration.
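The standard iteration (assign each point to its nearest centroid, then move each centroid to the mean of its assigned points) can be sketched in one dimension as follows. The function name, starting centroids, and sample amounts are assumptions chosen for illustration; production work would normally rely on an established library implementation.

```python
def k_means_1d(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations on one-dimensional data."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical transaction amounts forming two obvious groups.
amounts = [10, 12, 11, 500, 520, 510]
final_centroids, final_clusters = k_means_1d(amounts, centroids=[0, 100])
```

The result depends on the starting centroids and on the chosen number of clusters, so in practice the algorithm is usually run several times with different initializations.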
**Logistic Regression**
: A type of supervised machine learning that involves estimating the probability…
Logistic regression is a simple and interpretable model that can be used for classification tasks.
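To illustrate why the model is easy to interpret, the sketch below computes a predicted probability by passing a weighted sum of features through the logistic (sigmoid) function. The weights, bias, and feature meanings are made-up values for illustration, not a fitted model.

```python
import math


def predict_probability(features, weights, bias):
    """Return the logistic-regression probability of the positive class."""
    # Linear score: bias plus the weighted sum of the feature values.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The sigmoid maps any real-valued score into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for two scaled features:
# [transaction amount, number of transactions in the last hour].
weights = [1.5, 0.8]
bias = -2.0

low_risk = predict_probability([0.1, 0.0], weights, bias)
high_risk = predict_probability([2.0, 3.0], weights, bias)
```

Each weight shows how strongly, and in which direction, its feature moves the predicted probability, which is what makes the model straightforward to explain to reviewers.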
**Machine Learning**
: A subset of artificial intelligence that involves training algorithms to learn…
Machine learning models can be supervised, unsupervised, or semi-supervised and can be used for a variety of tasks, including classification, regression, clustering, and anomaly detection.
**Natural Language Processing (NLP)**
: A field of artificial intelligence that involves analyzing and generating huma…
NLP can be used for tasks such as sentiment analysis, text classification, and machine translation.
**Neural Network**
: A type of machine learning model inspired by the structure and function of the…
Neural networks can learn complex patterns in large datasets and are often used for image and speech recognition, natural language processing, and game playing.
**Normalization**
: The process of scaling numerical data to a common range, typically between 0 a…
Normalization can help improve the performance of a machine learning model by preventing features with large numeric ranges from dominating those with small ranges.
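A minimal sketch of min-max scaling, one common form of normalization, is shown below. The function name and sample values are illustrative assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical transaction amounts.
scaled = min_max_scale([50, 100, 150])
```

Because the minimum and maximum drive the result, a single extreme value can compress the rest of the data into a narrow band, so outliers are often handled before scaling.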
**Overfitting**
: A machine learning problem that occurs when a model is too complex and learns…
Overfitting can lead to poor generalization performance and high variance.
**Principal Component Analysis (PCA)**
: A technique for dimensionality reduction that involves projecting high-dimensi…
PCA can help improve the performance of a machine learning model and reduce noise.
**Random Forest**
: An ensemble learning method that involves training multiple decision trees on…
Random forests can improve the accuracy and robustness of a machine learning model and reduce overfitting.
**Recall**
: A metric for evaluating the performance of a machine learning model that measu…
Recall is also known as sensitivity or the true positive rate.
**Regression**
: A type of supervised machine learning that involves estimating a continuous ou…
Regression models can be linear or nonlinear and can be used for tasks such as prediction, forecasting, and trend analysis.
**Rule-based System**
: A system that uses predefined rules to make decisions or predictions
Rule-based systems can be useful for simple or well-defined tasks, but may struggle with complexity or uncertainty.
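As a rough sketch of what such a system looks like in transaction monitoring, the function below checks a transaction against two predefined rules. The threshold, the placeholder country codes, and the field names are hypothetical and do not reflect any regulatory requirement.

```python
# Hypothetical rule parameters, chosen only for illustration.
AMOUNT_THRESHOLD = 10_000
HIGH_RISK_COUNTRIES = {"XX", "YY"}  # placeholder codes, not real jurisdictions


def evaluate_rules(txn):
    """Return the names of all rules that the transaction triggers."""
    alerts = []
    if txn["amount"] >= AMOUNT_THRESHOLD:
        alerts.append("large_amount")
    if txn["country"] in HIGH_RISK_COUNTRIES:
        alerts.append("high_risk_country")
    return alerts


flagged = evaluate_rules({"amount": 12_500, "country": "XX"})
clean = evaluate_rules({"amount": 200, "country": "ZZ"})
```

Every alert can be traced to a named rule, which makes the outcome easy to explain, but the fixed threshold also shows the limitation: a transaction just below it passes unnoticed.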
**Support Vector Machine (SVM)**
: A type of supervised machine learning that involves finding a hyperplane that…
SVMs can be used for classification or regression tasks and can handle nonlinear decision boundaries using kernel functions.
**Supervised Learning**
: A type of machine learning that involves training a model on labeled data, whe…
Supervised learning can be used for classification or regression tasks. The accuracy and generalization performance of the resulting model depend heavily on the quality and quantity of the labeled data.
**Synthetic Data**
: Artificially generated data that simulates real-world scenarios
Synthetic data can be used for training machine learning models, testing algorithms, or validating hypotheses.
**Transfer Learning**
: The process of using a pre-trained machine learning model as a starting point…
Transfer learning can help improve the performance and efficiency of a machine learning model and reduce the amount of labeled data required.
**True Negative**
: A prediction that correctly classifies a negative instance as negative
True negatives are important for ensuring that a machine learning model is not overly sensitive or prone to false positives.
**True Positive**
: A prediction that correctly classifies a positive instance as positive
True positives are important for ensuring that a machine learning model is not overly conservative or prone to false negatives.
**Unsupervised Learning**
: A type of machine learning that involves training a model on unlabeled data, w…
Unsupervised learning can be used for clustering, dimensionality reduction, or anomaly detection tasks and can help uncover hidden patterns or structures in the data.
**Underfitting**
: A machine learning problem that occurs when a model is too simple and fails to…
Underf