At Inawisdom, we are very passionate about MLOps; we’ve been productionising and operationalising our clients’ machine learning models for over six years now. In that time we’ve gathered a wealth of experience and best practice that we have always been excited to share – from insights on what’s next in MLOps, to deep dives into key concepts and the amazing use cases our clients have involved us in.
As Inawisdom’s CTO of AI and ML, I am very excited that in this blog I can introduce you to our Inawisdom MLOps Framework, MALEO, a new capability within our RAMP platform. MALEO is built from years of experience and with all our best practices, to provide our clients with a way of rapidly delivering Machine Learning at the highest possible quality. This will allow our clients to unlock the value contained in their data even faster than before.
MALEO is a collection of components built around the same four tenets as our RAMP platform:
- Reusability: The components within MALEO are built to be used across multiple use cases and are customised via configuration options.
- Adaptability: MALEO is built to work within the wide range of AWS setups and support the widest possible choice of AWS native services and technology options.
- Standardisation: MALEO standardises how MLOps is done across all use cases within a business, allowing for consistency across Data Science teams.
- Modularity: MALEO has loosely coupled components with clear responsibilities and interfaces. This has two main advantages: keeping costs low by using only the components required for a particular use case and supporting the adaptability of MALEO so that clients can drop in their own preferred tooling if required.
We have built MALEO from the bottom up, focusing on the key challenges our clients face and enabling our Data Science teams to deploy their models as fast as possible. The initial version of MALEO has the following components:
- Data Science Code Delivery: MALEO enables the delivery and productionisation of Data Science code by supporting the full Machine Learning life cycle. MALEO builds and packages Data Science code into versioned artefacts with complete auditability and quality assurance built in, using the AWS continuous delivery stack (CodeBuild, CodePipeline, CodeArtifact and CodeCommit).
- Retrain Pipelines: The Data Science code, with your data, can then be used to create your Machine Learning models. MALEO allows you to do this using pipelines that run the validate, transform, train, and evaluate stages of your models. The pipelines are implemented using AWS StepFunctions and are configurable based on what your use case needs. The stages themselves run as Docker containers on AWS Fargate to give you maximum flexibility.
- Model Promotion: Once your models have been evaluated and you’re happy with the results, you’ll want to promote the models so that your inference process can use them. MALEO uses its inbuilt model registry to record this approval and then, using a pipeline powered by AWS StepFunctions, deploys your model to AWS ECS so that it can be run as a task or service.
- Model Registry: MALEO includes a “lite” model registry, which stores the Docker Images in ECR, the trained models and data sets in S3, and metadata about a model in DynamoDB.
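To make the “lite” registry concrete, here is a minimal sketch of what a model-version record spanning those three stores might look like. The key names and schema are illustrative assumptions, not MALEO’s actual design.

```python
# Hypothetical sketch of a "lite" model registry record: the Docker image
# lives in ECR, the trained model in S3, and this metadata item in DynamoDB.
# All field names are assumptions for illustration.
import datetime

def build_registry_item(model_name, version, image_uri, artefact_s3_uri, metrics):
    """Build a DynamoDB item describing one trained model version."""
    return {
        "pk": f"MODEL#{model_name}",          # partition key: one model per partition
        "sk": f"VERSION#{version}",           # sort key: one item per version
        "image_uri": image_uri,               # Docker image in ECR
        "artefact_uri": artefact_s3_uri,      # trained model artefact in S3
        "metrics": metrics,                   # evaluation metrics for this version
        "status": "PENDING_APPROVAL",         # promotion would flip this to APPROVED
        "created_at": datetime.datetime.utcnow().isoformat(),
    }

item = build_registry_item(
    "churn", "1.4.0",
    "123456789012.dkr.ecr.eu-west-1.amazonaws.com/churn:1.4.0",
    "s3://models/churn/1.4.0/model.tar.gz",
    {"auc": 0.91},
)
# A real implementation would then persist it with boto3, e.g.
# dynamodb.Table("model-registry").put_item(Item=item)
```

Keeping image, artefact, and metadata locations in one item is what lets the promotion pipeline later find everything it needs to deploy from a single lookup.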
MALEO has full MLOps life cycle support, with the following steps:
- Validation: The validation of datasets for training and inference
- Transform: The transforming of training datasets using features and encoding, including splitting data into train and test datasets
- HPO: The optimisation of hyper-parameters for a model, to improve the accuracy or generalisation of a model
- Train: The training of a model on a dataset prepared by the transform stage
- Evaluate: The evaluation of a trained model against a holdout data set or against a set of metrics
- Predict (Deploy): The ability to carry out inferences and get predictions from an approved model
The invocation of MLOps life cycle steps is automated using AWS StepFunctions that can be triggered on new training data or changes to the logic of the model. The AWS StepFunctions for each step are driven from configuration. Here is an example of running a pipeline:
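As a hedged sketch of what triggering such a pipeline could look like, the snippet below builds the JSON input for a StepFunctions execution; the input shape, step names, and state-machine ARN are assumptions for illustration, not MALEO’s actual interface.

```python
# Illustrative sketch: kicking off a retrain pipeline as an AWS StepFunctions
# execution, with the lifecycle steps to run driven from the input payload.
import json

def build_pipeline_input(use_case, steps, data_uri):
    """Assemble the execution input that selects which lifecycle steps run."""
    return {
        "use_case": use_case,
        "steps": steps,               # subset of validate/transform/hpo/train/evaluate
        "training_data": data_uri,    # S3 location of the new training data
    }

payload = build_pipeline_input(
    "churn",
    ["validate", "transform", "train", "evaluate"],
    "s3://data/churn/2021-06/train.csv",
)
# With boto3 this would be passed to the state machine, e.g.
# sfn = boto3.client("stepfunctions")
# sfn.start_execution(stateMachineArn=RETRAIN_ARN, input=json.dumps(payload))
```

Because the steps are listed in the input rather than hard-coded, the same state machine can skip HPO for a quick retrain or run the full lifecycle for a new model.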
As mentioned above, MALEO has a basic model registry to track models; this is updated by the pipelines. The registry uses DynamoDB, S3 and ECR – which I’ve written about in more detail here.
The neat thing about the model registry is that it is decoupled from the pipelines by being event-driven. The pipelines (StepFunctions) send events to a bus in Amazon EventBridge; a Lambda in the model registry then listens for those events and updates the metadata. Here are the tables containing the metadata and an extract from the model summary:
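A minimal sketch of that event-driven update might look like the following. The EventBridge `detail` fields and the handler’s behaviour are assumptions for illustration; the real event schema is MALEO’s own.

```python
# Illustrative sketch of the registry's event-driven update path: a pipeline
# puts an event on an EventBridge bus, and this Lambda turns it into a
# metadata update. Event field names are assumptions.
def handler(event, context=None):
    """Lambda entry point: translate a pipeline event into a registry update."""
    detail = event["detail"]
    update = {
        "model": detail["model_name"],
        "version": detail["version"],
        "stage": detail["stage"],               # e.g. "train" or "evaluate"
        "metrics": detail.get("metrics", {}),   # present for evaluate events
    }
    # A real handler would persist this with boto3, e.g.
    # dynamodb.Table("model-registry").update_item(...)
    return update

example_event = {
    "detail-type": "pipeline.stage.completed",
    "detail": {"model_name": "churn", "version": "1.4.0",
               "stage": "evaluate", "metrics": {"auc": 0.91}},
}
result = handler(example_event)
```

The decoupling means a pipeline never writes to DynamoDB directly; swapping the registry implementation only requires pointing a different consumer at the same bus.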
Deployment, Inference & Runtimes
MALEO is designed to support deploying our Data Science code across a range of deployment types and runtimes. The following are supported as part of the MVP:
- Real-Time using API Gateway and Lambda
- Mini Batch using API Gateway and Lambda
- Full Batch using ECS Fargate
The key enabler here is Docker. The Data Science code uses Docker as its runtime, which means it can be ported across these AWS services with ease. Soon MALEO will also have a SageMaker option using the Bring Your Own Container pattern.
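To illustrate that portability, here is a small sketch of how one scoring entry point could be fronted by both a Lambda handler (real-time and mini batch) and a batch loop (ECS Fargate). The function names and payload shapes are hypothetical, not MALEO’s actual interfaces.

```python
# Sketch of why Docker makes the runtimes interchangeable: the same predict()
# entry point can sit behind a Lambda handler or a full-batch loop, with only
# the thin entry layer changing per runtime. predict() is a stand-in for the
# real model-scoring code baked into the container.
def predict(records):
    """Placeholder scoring function; the real one loads the trained model."""
    return [{"id": r["id"], "score": 0.5} for r in records]

def lambda_handler(event, context=None):
    """Real-time / mini-batch entry: score the records in the API payload."""
    return {"predictions": predict(event["records"])}

def batch_main(batches):
    """Full-batch entry, as it might run inside an ECS Fargate task."""
    return [predict(batch) for batch in batches]

response = lambda_handler({"records": [{"id": 1}, {"id": 2}]})
```

Only the entry layer differs per deployment type, which is what keeps adding a new runtime (such as the planned SageMaker option) cheap.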
To be as rapid and as reusable as possible, MALEO takes a configuration-based approach to the delivery of Machine Learning models. The configuration options include which runtimes to build, which steps of the pipelines to run, and how to run them. Below are two examples of the configuration:
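As a sketch of the kind of configuration this describes, here are two hypothetical fragments, one for a retrain pipeline and one for a deployment. The keys and values are illustrative assumptions, not MALEO’s actual schema.

```python
# Two hypothetical MALEO-style configuration fragments. Key names are
# assumptions for illustration only.

# 1) Retrain pipeline: which lifecycle steps run, and with what resources.
RETRAIN_CONFIG = {
    "use_case": "churn",
    "steps": ["validate", "transform", "train", "evaluate"],  # HPO skipped here
    "training": {"runtime": "fargate", "cpu": 2048, "memory": 8192},
}

# 2) Deployment: which runtime to build and how inference is exposed.
DEPLOY_CONFIG = {
    "use_case": "churn",
    "runtime": "lambda",                       # or "ecs-fargate" for full batch
    "endpoint": {"type": "real-time", "api_gateway": True},
}
```

Holding both concerns in configuration rather than code is what lets a new use case reuse the same components with only these files changing.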
It was an absolute pleasure being involved in the creation of MALEO. We spent hours and hours around a whiteboard (in person) diving deep. There were times when ideas just flowed as we debated what MALEO would and wouldn’t do. We decided to keep our MVP scope tight, focused on delivering a thin slice that proves the overall design and the features our clients need most. This means we have an exciting roadmap for MALEO going forward – and top of the list is SageMaker support.
MALEO is our new accelerator, but the awesome thing about it is that to us, it’s not new at all. It’s everything we do, every day, to productionise and operationalise Machine Learning models for our clients. We’re really excited to be able to package this experience and best practice into a tool that allows businesses to deliver high-quality ML models faster. If you would like to know more or would like to see a demonstration, then please get in touch.