Just before Christmas, I was lucky enough to attend re:Invent 2022. This was my third time at re:Invent and my second time with Inawisdom. As an AWS Ambassador, I had a very busy week with lots of exciting things to do – including filming some videos with AWS (coming soon!) and, of course, attending the keynotes to catch all the latest announcements.
So far in our ReCap series, we’ve looked at highlights from the data and serverless announcements, so today’s focus is on all things AI and ML.
Amazon Comprehend for IDP
Over the last year at Inawisdom, we’ve seen a lot of interest from our customers around Intelligent Document Processing (IDP). With a number of IDP projects in play, we decided to build an accelerator using Amazon Textract and Comprehend as core services. Because the two services use completely different inputs and outputs, this required some Lambda functions to do the data transformation and a Step Functions workflow to bring it all together.
So it was exciting to see that Comprehend now has direct support for Word, PDF and image inputs. This will remove some of the boilerplate code for simple use cases, although for complex ones you’ll still want to decouple extraction from understanding.
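As a rough illustration of what the simple case might look like, here is a minimal sketch that passes raw document bytes straight to a custom Comprehend classifier instead of running Textract and a transformation step first. It assumes a custom classification endpoint already exists; the function names and the endpoint ARN are hypothetical, and the reader settings shown are just one plausible configuration, not the setup used in our accelerator.

```python
from pathlib import Path
from typing import Any, Dict


def build_classify_request(document_bytes: bytes, endpoint_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for Comprehend's ClassifyDocument call.

    The raw PDF, Word or image bytes are passed as-is; DocumentReaderConfig
    tells Comprehend how to extract the text before classifying it.
    """
    return {
        "EndpointArn": endpoint_arn,  # hypothetical custom classifier endpoint
        "Bytes": document_bytes,
        "DocumentReaderConfig": {
            # Assumed settings: plain text detection with the service defaults.
            "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
            "DocumentReadMode": "SERVICE_DEFAULT",
        },
    }


def classify_file(path: str, endpoint_arn: str) -> Dict[str, Any]:
    """Send a local file to the endpoint (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("comprehend")
    request = build_classify_request(Path(path).read_bytes(), endpoint_arn)
    return client.classify_document(**request)
```

Keeping the request builder separate from the call makes it easy to unit test, and easy to swap back to a decoupled Textract step for the more complex cases.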
Learn more on the AWS launch blog >
Amazon SageMaker Geospatial Support
AWS also announced built-in Geospatial Support in SageMaker… and I love it! I’ve seen first-hand how time-consuming it can be to work with geospatial data.
In our engagement with logistics provider Aramex, working out the optimal route and exact location of each delivery address was key. To do this, we had to assign millions of addresses, each with a latitude and longitude, to polygons to see how close they were to a planned route. Luckily the labelling of the data was the easy part, as it was all stored in Redshift – using its geospatial support, we managed to get 5 years of data labelled in 3 hours!
However, when it came to the ML models, we had to do it the long way, developing and optimising nearly a thousand lines of Keras layers for a TensorFlow model. AWS have now simplified the Machine Learning side of things with Geospatial Support, including a dedicated algorithm, which will save a lot of time.
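To make the labelling step mentioned above more concrete, here is a minimal pure-Python sketch of assigning a point to a polygon using the standard ray-casting test. It is only an illustration: the zone names and coordinates are made up, it treats longitude and latitude as flat x/y coordinates (a reasonable approximation only over small areas), and it is not how Redshift implements its geospatial functions.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def label_address(point: Point, zones: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical delivery zones, given as lists of polygon corners.
zones = {
    "zone_a": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "zone_b": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

print(label_address((1.0, 1.0), zones))
print(label_address((3.0, 0.5), zones))
print(label_address((5.0, 5.0), zones))
```

At the scale of millions of addresses you would push this work into the database or a geospatial library rather than loop in Python, which is exactly why built-in support matters.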
Learn more on the AWS launch blog >
Shadow Testing for SageMaker Endpoints!
This update has been much needed, making like-for-like A/B testing possible within SageMaker Endpoints! SageMaker Endpoints has supported deploying multiple variants behind a single endpoint for at least the past 4 years, whether that is a version of the same model trained on different data or two different types of model.
However, you had to split the traffic between the models by percentage, which made comparing individual predictions difficult: each request only reached one model, and the models might not be deterministic. With this new capability, you can send the same traffic to all your models, allowing you to compare the results more accurately.
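The difference between the two approaches can be sketched with a small simulation. This is purely illustrative and does not call SageMaker: the two "models" are toy functions and all names are hypothetical. With a weighted split each request is scored by exactly one variant, whereas with shadow testing every request is scored by both, so predictions can be compared pair by pair.

```python
import random
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]


def split_traffic(
    requests: List[float],
    models: Dict[str, Model],
    weights: Dict[str, float],
    seed: int = 0,
) -> List[Tuple[float, str, float]]:
    """Weighted split: each request is scored by exactly one variant."""
    rng = random.Random(seed)
    names = list(models)
    results = []
    for x in requests:
        chosen = rng.choices(names, weights=[weights[n] for n in names])[0]
        results.append((x, chosen, models[chosen](x)))
    return results


def shadow_traffic(
    requests: List[float], production: Model, shadow: Model
) -> List[Tuple[float, float, float]]:
    """Shadow test: every request is scored by both variants.

    Only the production prediction would be returned to the caller;
    the shadow prediction is kept purely for comparison.
    """
    return [(x, production(x), shadow(x)) for x in requests]


def production_model(x: float) -> float:
    return 2 * x  # toy stand-in for the live model


def candidate_model(x: float) -> float:
    return 2 * x + 0.5  # toy stand-in for the new model


requests = [1.0, 2.0, 3.0, 4.0]
paired = shadow_traffic(requests, production_model, candidate_model)
mean_gap = sum(abs(p - s) for _, p, s in paired) / len(paired)
print(f"mean absolute difference on identical requests: {mean_gap}")
```

With the split approach you can only compare aggregate metrics across two different sets of requests; the paired results above are what make a per-prediction comparison possible.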
Learn more on the AWS launch blog >
SageMaker ML Governance
Amazon SageMaker Governance was a great announcement, although it’s not really a new service. It is, however, bringing together several services in a more unified and consumable way, to aid in the governance of models. Here are the key elements:
- Role Manager: This is essentially a wizard for Machine Learning on IAM. It creates the IAM Roles needed for a data scientist to access only the Data Sets and Elements of SageMaker needed for a project and can help enforce tagging.
- Model Cards: Model Cards provide the ability to capture additional information on top of the standard lineage that SageMaker pushes into the SageMaker Model Registry. This can include things like who owns the model, the business problem it addresses, who can use the model, and where the data was sourced from.
- Model Dashboards: This is getting closer to what I see as the ‘360 view’ of the model; it brings together SageMaker Model Monitor and CloudWatch so you can see the performance of all your models from one dashboard. I would, however, like it to tell you how much a model is costing you and advise on any savings (building on the Inference Recommender announced last year).
Next-Generation SageMaker Notebooks
A series of useful new features for SageMaker Studio Notebooks was announced, including:
- Simplified Data Preparation: Using SageMaker Data Wrangler (which is an awesome tool), this allows you to see the characteristics of your data straight from a notebook.
- Shared Spaces: This snuck in with hardly any mention! But for me this will be a massive win, especially if you have a medium-to-large team working in SageMaker. Up until now, the ability to share notebooks was poor – you had to commit them to Git, use session sharing and/or S3. Shared Spaces uses a file share on EFS, which is much simpler – although I am keen to see how things are version-controlled.
- Automatically Convert Notebook Code to Production-Ready Jobs: AWS have filled a gap here that others have been focusing on for a while, and that’s how to take a Notebook and run it as a Job. This is a good feature to speed up the deployment of ML; however, it does not address the performance of Notebooks, and it assumes those Notebooks have already been cleansed of experimental code. So it does not replace the ML Engineer!
AWS SimSpace Weaver
This one is not strictly Machine Learning related, but for me it was one of the coolest announcements. SimSpace Weaver allows you to build 3D worlds using the Unreal Engine and run simulations to see the impact of particular events. This is useful for things like city planning – for example, seeing the impact of a new road layout or what would happen if a river bursts its banks. The interesting thing is that it can pull in data from other sources. This means Machine Learning could power those source systems.
Other SageMaker updates
Just a few more minor announcements worth noting:
- New UI for Amazon SageMaker Studio: Not really a new feature, but an upgrade to the Studio interface.
- Real-Time and Batch Inference in Amazon SageMaker Data Wrangler: Amazon SageMaker Data Wrangler could previously only be used to prepare training data sets. Now it can also be used for inference.
- Amazon SageMaker Data Wrangler Support for SaaS Applications as Data Sources: Allows you to use SaaS data directly from SageMaker by using Amazon AppFlow to ingest the data into S3 and track it in the Glue Data Catalog.
The main thing I am really pleased to see from these announcements, apart from AWS SimSpace Weaver, is that Amazon are doubling down on improving or combining existing services. The past few years have seen a lot of “first phase” services announced, which were all very disconnected from a user experience point-of-view. There now seems to be a push to bring it all back together and improve the overall user experience.
This wraps up our re:Invent ReCap series! Overall, it was a great event, with lots of cool things happening and plenty of opportunities to catch up with the AWS community. I was so pleased to see everything back to full scale this year, all action and just as crazy as it was before the pandemic.
Time to start counting down to re:Invent 2023!