Tonight I’m presenting at the Big Data and Machine Learning – London meetup.  I appreciate it’s a bit late notice, but there are still a few spaces left, based on the organisers’ over-occupancy planning model, which assumes most meetups suffer a 60% no-show rate.

The topic…

Rather than the usual slideware, I thought I’d get brave.  Using a public dataset, we’ll look at how you might use a Jupyter notebook to characterise and explore it.  In a “here’s one I made earlier” kind of way, I’ll show the kind of data quality issues that crop up with imperfect, real-world data.  Then I’ll highlight some not-so-immediately-obvious conclusions drawn from that analysis, and go on to show how you’d build an XGBoost predictive model from it for a specific business use case.
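
To give a flavour of those data quality checks, here’s a minimal sketch of the kind of first-pass notebook cells I mean.  The tiny DataFrame below is made up purely for illustration and is not the dataset I’ll use on the night:

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame standing in for the public dataset (illustration only)
df = pd.DataFrame({
    "age":    [34, 45, np.nan, 29, 45],
    "income": [52000, None, 61000, 48000, None],
    "city":   ["London", "London", "leeds", "Leeds", "London"],
})

# Shape, dtypes and memory footprint at a glance
df.info()

# Missing values per column -- often the first surprise in real-world data
print(df.isnull().sum().sort_values(ascending=False))

# Duplicates and inconsistent categorical labels ("leeds" vs "Leeds")
print("Duplicate rows:", df.duplicated().sum())
print(df["city"].value_counts())
```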

XGBoost Feature Importance
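
If you want to produce that kind of feature importance chart yourself, a minimal sketch using XGBoost’s scikit-learn wrapper looks something like the following.  The synthetic data and hyperparameters are placeholders, not the actual dataset or settings from the talk:

```python
import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset, purely for illustration
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A small, untuned model -- real hyperparameters depend on the use case
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("Hold-out accuracy:", model.score(X_test, y_test))

# Which features the boosted trees leaned on most
xgb.plot_importance(model, max_num_features=15)
plt.tight_layout()
plt.show()
```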

So yes, Python without a safety net.  I’ll leave out all the fun and games I had getting XGBoost compiled on my machine in the first place 🙂