Machine Learning Fundamentals — Glossary · Professional Certificate in AI in Robotic Process Automation

Machine Learning Fundamentals #

Machine Learning (ML) is a subset of artificial intelligence (AI) that focuses o… #

In the Professional Certificate in AI in Robotic Process Automation course, understanding the fundamentals of Machine Learning is crucial for developing intelligent automation solutions.

Algorithm #

An algorithm is a set of rules or instructions that a computer follows to solve… #

In Machine Learning, algorithms are used to train models on large datasets to make predictions, classify data, or optimize processes.

Artificial Intelligence (AI) #

Artificial Intelligence is the simulation of human intelligence processes by mac… #

AI encompasses various technologies, including Machine Learning, natural language processing, and computer vision, to perform tasks that typically require human intelligence.

Classification #

Classification is a type of Machine Learning task where the goal is to predict t… #

For example, classifying emails as spam or non-spam is a common classification problem in text analysis.

Clustering #

Clustering is a Machine Learning technique used to group similar data points tog… #

It is often used in customer segmentation, anomaly detection, and recommendation systems.
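As a rough illustration of grouping similar points, the sketch below uses k-means on one-dimensional data. The glossary does not name a specific clustering algorithm, so k-means is an assumption here, and the function name, starting centroids, and sample values are all hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around centroids by repeated assign-and-average steps."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each point joins the cluster of its nearest centroid.
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of values.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.0]
```

No labels are supplied; the grouping emerges only from the distances between points.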

Deep Learning #

Deep Learning is a subset of Machine Learning that uses artificial neural networ… #

Deep Learning models have shown remarkable performance in image recognition, speech recognition, and natural language processing.

Feature Engineering #

Feature Engineering is the process of selecting, extracting, or creating meaning… #

It involves transforming data into a format that is easier for algorithms to interpret.

Hyperparameter #

Hyperparameters are parameters that are set before the training process of a Mac… #

Examples of hyperparameters include learning rate, number of hidden layers, and batch size. Tuning hyperparameters is essential for optimizing model performance.

Label #

In Machine Learning, a label is the output or target variable that the model aim… #

For example, in a classification task, the labels represent the different classes that the model needs to assign to input data.

Model Evaluation #

Model Evaluation is the process of assessing the performance of a Machine Learni… #

Common evaluation metrics include accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC-ROC).
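The metrics listed above can be computed directly from true and predicted labels. The following is a minimal sketch for a binary task; the function names and the sample labels are illustrative, not taken from the course.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))             # 0.75
print(precision_recall_f1(y_true, y_pred))  # precision, recall, and F1 all near 0.75
```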

Neural Network #

A Neural Network is a computational model inspired by the structure and function… #

It consists of interconnected nodes (neurons) arranged in layers, where each neuron processes input data and passes the output to the next layer.
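The layer-by-layer flow described above can be sketched as a forward pass through two small layers. The weights, biases, and activation choices below are made-up values for illustration; a real network would learn its weights during training.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)


def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """Each neuron takes a weighted sum of all inputs, adds a bias, then activates.

    weights[j] holds the input weights of neuron j.
    """
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
inputs = [1.0, 2.0]
hidden = dense_layer(inputs, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu)
output = dense_layer(hidden, [[1.0, -1.0]], [2.0], sigmoid)
print(hidden)  # [0.0, 2.0]
print(output)  # [0.5]
```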

Overfitting #

Overfitting occurs when a Machine Learning model performs well on the training d… #

This usually happens when the model is too complex or when it memorizes noise in the training dataset.

Preprocessing #

Preprocessing is the initial step in data preparation where raw data is cleaned,… #

Common preprocessing techniques include data normalization, feature scaling, and handling missing values.
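Two of these techniques, handling missing values and feature scaling, can be sketched in a few lines. Mean imputation and min-max scaling are only one possible choice for each step, and the function names and sample column are hypothetical.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw column with one missing value.
raw = [10.0, None, 30.0, 20.0]
filled = impute_mean(raw)
scaled = min_max_scale(filled)
print(filled)  # [10.0, 20.0, 30.0, 20.0]
print(scaled)  # [0.0, 0.5, 1.0, 0.5]
```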

Regression #

Regression is a type of Machine Learning task where the goal is to predict a con… #

Examples of regression problems include predicting house prices, predicting stock prices, and forecasting demand.

Supervised Learning #

Supervised Learning is a type of Machine Learning where the algorithm learns fro… #

The model aims to predict the correct output for new, unseen data based on the patterns learned from the training data.

Unsupervised Learning #

Unsupervised Learning is a type of Machine Learning where the algorithm learns p… #

The goal is to discover hidden structures or groupings within the data without the need for predefined output labels.

Validation #

Validation is the process of assessing the performance of a Machine Learning mod… #

It helps detect overfitting and provides an estimate of how the model will perform on new, unseen data.

Feature Selection #

Feature Selection is the process of choosing the most relevant features or varia… #

By selecting the right features, the model can focus on the most important information for making accurate predictions.

Cross-Validation #

Cross-Validation is a technique used to assess the performance and generalization ability of Machine Learning models by splitting the dataset into multiple subsets for training and testing. It helps provide a more reliable estimate of the model's performance compared to a single train-test split.
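The splitting step can be sketched as k-fold index generation, where each fold serves once as the test set while the remaining folds form the training set. This is a simplified version without shuffling; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print(train, test)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

A model would be trained and scored once per fold, and the scores averaged to give the final estimate.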

Ensemble Learning #

Ensemble Learning is a Machine Learning technique that combines multiple models… #

Popular ensemble methods include Random Forest, Gradient Boosting, and AdaBoost, which leverage the wisdom of crowds to make better predictions.

Feature Extraction #

Feature Extraction is the process of automatically selecting or transforming raw… #

It helps reduce the dimensionality of the data and can improve model efficiency.

Gradient Descent #

Gradient Descent is an optimization algorithm used to minimize the error or loss… #

It is a fundamental technique for training neural networks and other complex models.
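A minimal sketch of the idea: repeatedly step against the gradient of a loss until the parameter settles near the minimum. The toy loss (w - 3)^2, the learning rate, and the step count are assumptions chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Move the parameter a small step against the gradient, many times."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def toy_gradient(w):
    """Gradient of the toy loss (w - 3)^2, which has its minimum at w = 3."""
    return 2 * (w - 3)


w_min = gradient_descent(toy_gradient, start=0.0)
print(round(w_min, 6))  # 3.0
```

With a learning rate that is too large (above 1.0 for this particular loss), the updates overshoot and diverge instead of converging.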

Loss Function #

A Loss Function is a mathematical function that quantifies the difference betwee… #

The goal is to minimize the loss function during training to improve the model's accuracy and performance.
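Two widely used loss functions are sketched below: mean squared error for regression and binary cross-entropy for classification. The glossary does not name specific losses, so these are illustrative choices with made-up inputs.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # 2.5
print(binary_cross_entropy([1, 0], [0.9, 0.2]))    # small: confident and correct
print(binary_cross_entropy([1, 0], [0.1, 0.8]))    # large: confident and wrong
```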

Optimization #

Optimization is the process of adjusting the parameters of a Machine Learning mo… #

Common optimization techniques include Gradient Descent, Stochastic Gradient Descent, and Adam.

Regularization #

Regularization is a technique used to prevent overfitting in Machine Learning mo… #

Popular regularization methods include L1 (Lasso) and L2 (Ridge) regularization, which help improve model generalization.
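Both methods add a penalty on the model's weights to the training loss: L1 penalizes the sum of absolute weights and L2 the sum of squared weights. The sketch below shows only the penalty arithmetic; the weights, the strength `lam`, and the function names are hypothetical.

```python
def l1_penalty(weights, lam):
    """L1 (Lasso) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (Ridge) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Total training objective: fit to the data plus a penalty on weight size."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty


weights = [0.5, -2.0, 0.0]
print(l1_penalty(weights, lam=0.1))  # about 0.25
print(l2_penalty(weights, lam=0.1))  # about 0.425
```

Because the L2 penalty grows with the square of each weight, it punishes a few large weights more heavily than many small ones.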

Underfitting #

Underfitting occurs when a Machine Learning model is too simple to capture the u… #

It usually happens when the model is not complex enough to represent the true relationship.

Anomaly Detection #

Anomaly Detection is a Machine Learning task that involves identifying rare or u… #

It is used in fraud detection, network security, and predictive maintenance to detect outliers and potential threats.
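One simple way to flag outliers is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. The glossary does not prescribe a method, so this rule, the threshold, and the sample readings are assumptions.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold std devs."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical sensor readings with one unusual value.
readings = [10, 11, 9, 10, 12, 10, 50]
print(z_score_outliers(readings))  # [50]
```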

Bias-Variance Tradeoff #

The Bias-Variance Tradeoff is a fundamental concept in Machine Learning that illustrates the balance between bias (underfitting) and variance (overfitting) in model performance. Finding the right balance is essential for building models that generalize well to new data.

Convolutional Neural Network (CNN) #

A Convolutional Neural Network (CNN) is a type of deep neural network designed f… #

CNNs use convolutional layers to extract features hierarchically and learn spatial patterns from the input data.
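The core operation of a convolutional layer can be sketched as sliding a small kernel over a 2-D grid and taking a weighted sum at each position. The kernel below is hand-picked to respond to vertical edges; in a real CNN the kernel values are learned, and the image here is a made-up example.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 3x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]
print(conv2d_valid(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The output is largest exactly where the dark-to-bright boundary sits, which is the kind of spatial pattern a convolutional layer picks up.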

Decision Tree #

A Decision Tree is a simple yet powerful Machine Learning algorithm that uses a… #

Decision Trees are easy to interpret and can handle both categorical and numerical features.

Dimensionality Reduction #

Dimensionality Reduction is the process of reducing the number of input features… #

Principal Component Analysis (PCA) is commonly used for dimensionality reduction, while t-Distributed Stochastic Neighbor Embedding (t-SNE) is used mainly to visualize high-dimensional data in two or three dimensions.

Ensemble Method #

An Ensemble Method is a Machine Learning technique that combines multiple base l… #

Ensemble methods leverage the diversity of individual models to improve overall performance and robustness.
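A minimal sketch of one way to combine models: let each base model vote and return the most common prediction. The three rule-based "models" below are hypothetical stand-ins for trained learners.

```python
from collections import Counter


def majority_vote(models, sample):
    """Return the prediction that most of the base models agree on."""
    votes = [model(sample) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical base models: each looks at the sample differently.
def rule_first_feature(sample):
    return int(sample[0] > 0.5)


def rule_second_feature(sample):
    return int(sample[1] > 0.5)


def rule_sum(sample):
    return int(sample[0] + sample[1] > 1.0)


models = [rule_first_feature, rule_second_feature, rule_sum]
print(majority_vote(models, (0.9, 0.2)))  # 1 (votes: 1, 0, 1)
print(majority_vote(models, (0.1, 0.6)))  # 0 (votes: 0, 1, 0)
```

Each model is wrong on some inputs, but the vote is only wrong when most of them fail together.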

Grid Search #

Grid Search is a hyperparameter tuning technique that exhaustively searches thro… #

It is commonly used in Machine Learning to fine-tune models and improve accuracy.
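The exhaustive search can be sketched as a loop over every combination in the grid, keeping the best-scoring one. The scoring function below is a hypothetical stand-in for training a model and measuring its validation score, and the grid values are made up.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination of values in param_grid; return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(learning_rate, batch_size):
    """Hypothetical validation score that peaks at learning_rate=0.1, batch_size=32."""
    return -abs(learning_rate - 0.1) - abs(batch_size - 32) / 100


grid = {"learning_rate": [0.01, 0.1, 1.0], "batch_size": [16, 32, 64]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)  # {'learning_rate': 0.1, 'batch_size': 32}
```

The number of combinations is the product of the list lengths (9 here), so the cost grows quickly as more hyperparameters are added.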

K-Nearest Neighbors (KNN) #

K-Nearest Neighbors (KNN) is a simple yet effective Machine Learning algorithm u… #

For classification, KNN predicts the majority class among the k nearest neighbors in the feature space; for regression, it averages their target values.
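A minimal sketch of KNN classification: measure the distance from the query to every training point, take the k closest, and return their most common label. Euclidean distance and the sample points are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
train_labels = ["a", "a", "a", "b", "b"]
print(knn_predict(train_points, train_labels, (2, 2)))  # a
print(knn_predict(train_points, train_labels, (8, 9)))  # b
```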

Logistic Regression #

Logistic Regression is a statistical model used for binary classification tasks… #

Despite its name, logistic regression is used for classification rather than regression: it is a linear model that predicts the probability of a sample belonging to a specific class.
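The prediction step can be sketched as a weighted sum of the features passed through the sigmoid function, which maps any number to a probability between 0 and 1. The weights and bias below are made-up values; in practice they are learned from training data.

```python
import math


def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class: sigmoid of a linear combination."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical learned parameters.
weights, bias = [1.0, -1.0], 0.0
print(predict_proba(weights, bias, [3.0, 1.0]))  # about 0.88
print(predict(weights, bias, [3.0, 1.0]))        # 1
print(predict(weights, bias, [1.0, 3.0]))        # 0
```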

Naive Bayes #

Naive Bayes is a probabilistic Machine Learning algorithm based on Bayes' theore… #

Naive Bayes is commonly used for text classification, spam filtering, and sentiment analysis due to its simplicity and efficiency.

One-Hot Encoding #

One-Hot Encoding is a technique used to convert categorical variables into a numerical format that Machine Learning algorithms can process. Each category is represented by a binary vector where only one element is "hot" (1) while the others are "cold" (0).
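A minimal sketch of the encoding: collect the distinct categories, give each a position, and emit a vector with a single 1 at that position. The function name and the color values are illustrative.

```python
def one_hot_encode(values):
    """Return the sorted category list and one binary vector per input value."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1  # exactly one "hot" element per row
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```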

Random Forest #

Random Forest is an ensemble learning algorithm that combines multiple decision… #

Random Forest trains each tree on a random (bootstrap) sample of the training data, considers only a random subset of features at each split, and aggregates the trees' predictions to make more robust decisions.

Reinforcement Learning #

Reinforcement Learning is a Machine Learning paradigm where an agent learns to i… #

Reinforcement Learning is used in autonomous driving, game playing, and robotics to learn complex behaviors through trial and error.

Support Vector Machine (SVM) #

A Support Vector Machine (SVM) is a powerful Machine Learning algorithm for clas… #

SVM finds the optimal hyperplane that separates different classes in the feature space by maximizing the margin, the distance between the hyperplane and the nearest data points of each class (the support vectors).

Text Mining #

Text Mining, also known as text analytics, is the process of extracting meaningf… #

Machine Learning algorithms are used in text mining tasks such as sentiment analysis, topic modeling, and named entity recognition.

Transfer Learning #

Transfer Learning is a Machine Learning technique where a model trained on one t… #

Transfer Learning is used to leverage knowledge learned from large datasets to improve performance on smaller, domain-specific datasets.

Word Embedding #

Word Embedding is a technique used to represent words as dense vectors in a cont… #

Word embeddings capture semantic relationships between words and are widely used in natural language processing tasks such as text classification and machine translation.
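Semantic closeness between embeddings is often measured with cosine similarity. The sketch below uses tiny hand-written three-dimensional vectors purely for illustration; real embeddings are learned from text and typically have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, not taken from any trained model.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

related = cosine_similarity(embeddings["king"], embeddings["queen"])
unrelated = cosine_similarity(embeddings["king"], embeddings["banana"])
print(related > unrelated)  # True
```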