by Prof Tim Dodwell
Machine Learning Workflow
8. Section Overview
In this section, you will learn not only about the components of the machine learning workflow but also how they relate to each other and how to apply them in practice. We'll cover topics such as data preprocessing, feature engineering and selection, model selection and evaluation, and finally deployment of models in production. By the end of this section, you will have a complete understanding of the machine learning workflow.
Understanding the Machine Learning Workflow
Different machine learning algorithms and tasks may require different data, questions, and deployments, so it is important to remain flexible in your approach. However, as a newcomer to machine learning, it can be helpful to have a clear idea of the workflow so that you always know what comes next.
Below is a simple diagram that outlines a typical machine learning workflow or pipeline, and one that we follow regularly throughout this course.
- Data Preparation: The first step in a machine learning workflow is data preparation. Real-world data is often noisy, incomplete, or in inconsistent formats, and it must be cleaned before it is suitable for use with a machine learning algorithm. Although not always glamorous, data preparation is an essential part of any AI project, and having good processes in place to store and clean data will make you a more effective AI practitioner.
- Feature Representation: An important aspect of machine learning is finding the most efficient way to represent data for a particular task. For example, when trying to identify pictures of cats, it would take a huge amount of data if we were to analyze each image pixel by pixel. A better approach would be to use unsupervised machine learning algorithms to detect salient features (such as pointy ears) that are common to all cats. Feature representation is a crucial part of the machine learning workflow.
- Model Selection: In practice, you rarely know in advance exactly which method to use on a dataset. Although there are some broad principles about which methods tend to work best for certain problems, in reality many different types of models and structures must be tested before making a final decision. This process is referred to as model selection.
- Training Process: The training process is where the calculations take place. During this phase, models are adjusted to fit the training data and their performance is evaluated. This process relies on advanced optimization techniques, but luckily there are many accessible tools that let you use these methods without needing to understand their inner workings.
- Validation: The final part of the traditional machine learning workflow involves validating the model's performance. This is done by setting aside a portion of the data before training and testing how well the model generalizes to this unseen data. Depending on the application, it's important to take a close look at your model during this step and identify any areas where improvements can be made. This will help ensure that your model is ready to be deployed with confidence.
This isn't a linear process. At any stage, building a machine learning model may require you to loop back: try a different model, tune parameters to adjust training, collect more data, or explore a different approach to representing inputs.
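To make the five steps a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic dataset, the 80/20 split, the two candidate models, and all function names (`featurize`, `fit_constant`, `fit_linear`, `mse`) are hypothetical and not taken from the course material. It also shows one simple form of looping back, namely trying more than one model and comparing them on held-out data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# 1. Data preparation: hypothetical noisy records (x, y), some with a missing target.
raw = []
for _ in range(200):
    x = random.uniform(0.0, 10.0)
    y = 3.0 * x + 2.0 + random.gauss(0.0, 1.0)
    raw.append((x, y if random.random() > 0.1 else None))  # roughly 10% incomplete
clean = [(x, y) for x, y in raw if y is not None]  # drop incomplete records

# Hold out 20% of the cleaned data before any fitting, to use for validation.
random.shuffle(clean)
split = int(0.8 * len(clean))
train, valid = clean[:split], clean[split:]

# 2. Feature representation: standardize x using training statistics only.
mean_x = sum(x for x, _ in train) / len(train)
std_x = (sum((x - mean_x) ** 2 for x, _ in train) / len(train)) ** 0.5

def featurize(x):
    return (x - mean_x) / std_x

# 3. Model selection: two candidate models, each given as a (fit, predict) pair.
def fit_constant(data):
    # Baseline model: always predict the mean training target.
    return sum(y for _, y in data) / len(data)

def predict_constant(params, x):
    return params

def fit_linear(data):
    # 4. Training: closed-form least squares for y = a * z + b, with z = featurize(x).
    zs = [featurize(x) for x, _ in data]
    ys = [y for _, y in data]
    mean_z, mean_y = sum(zs) / len(zs), sum(ys) / len(ys)
    a = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys)) / sum(
        (z - mean_z) ** 2 for z in zs
    )
    b = mean_y - a * mean_z
    return a, b

def predict_linear(params, x):
    a, b = params
    return a * featurize(x) + b

def mse(predict, params, data):
    # Mean squared error of a fitted model on a dataset.
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

candidates = {
    "constant": (fit_constant, predict_constant),
    "linear": (fit_linear, predict_linear),
}

# 5. Validation: fit each candidate on the training split, score it on held-out data.
scores = {}
for name, (fit, predict) in candidates.items():
    params = fit(train)
    scores[name] = mse(predict, params, valid)

best = min(scores, key=scores.get)
print("validation MSE per model:", scores)
print("selected model:", best)
```

In a real project each step would usually be handled by a library rather than written by hand, but the shape of the loop stays the same: prepare, represent, choose candidates, train, validate, and go back if the validation score is not good enough.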
In this section, we take a close look at each of these five parts, focusing particularly on training, loss functions, and training curves.
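As a small preview of how training, a loss function, and a training curve fit together, the sketch below fits a straight line by gradient descent and records the loss after every step. The choice of mean squared error as the loss, the learning rate, the number of epochs, and the tiny dataset are all assumptions for illustration; the names `mse_loss` and `train` are hypothetical.

```python
def mse_loss(w, b, data):
    # Loss function: mean squared error of the line y = w * x + b on the data.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, learning_rate=0.3, epochs=200):
    # Gradient descent on the MSE loss, starting from w = 0, b = 0.
    w, b = 0.0, 0.0
    n = len(data)
    curve = [mse_loss(w, b, data)]  # training curve: loss before any update
    for _ in range(epochs):
        # Gradients of the MSE loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        curve.append(mse_loss(w, b, data))  # loss after this update
    return w, b, curve

# Hypothetical noise-free data lying on the line y = 2x + 1, with x in [0, 1].
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(11)]

w, b, curve = train(data)
print("learned w, b:", w, b)
print("loss at start and end:", curve[0], curve[-1])
```

Plotting `curve` against the epoch number gives the training curve: for this setup it falls steadily as the parameters approach `w = 2` and `b = 1`. A learning rate that is too large for the problem would instead make the loss oscillate or grow, which is the kind of behaviour a training curve helps diagnose.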