The excellent KDNuggets site has recently published its latest survey of data science tool usage, and as always it’s interesting reading.  This is the 17th year (!) so it’s a great way to look at trends.  I won’t repeat all their analysis or results here, but I will just pull out a few points.

Python and R

Python is still “the daddy” of data science tooling, i.e. it’s still growing fast, so the trajectory is in its favour. I know there is a bit of a religious war between the R and Python worlds, i.e. we all have our favourites. It seems to me to be a very friendly rivalry, and to be fair R has a massive and solid user base. Having used both for customer projects I must say I prefer Python, but maybe that’s because I’m originally from a C and Java background. I find the availability of the latest machine learning libraries and developments seems better in Python, but that may just be my confirmation bias. I’m also lazy and like learning as few languages as possible, so Python’s fit with the AWS Lambda functions that we use makes it a useful Swiss Army knife.

Tableau and QlikView

Tableaux église Saint-Léonor de Tréflaouénan (by Kergourlay)

I didn’t have a very clear preconceived view of their respective market shares, but I was surprised to see Tableau receive 4-5 times as many votes as QlikView. The poll is open to vendor influence, since vendors can ask their users to vote, so this result needs to be taken with a health warning. But if I were a QlikView exec I’d be worried by this, even if both vendors’ core markets are the MI/BI rather than data science communities. This finding is backed up for me by one of our customers, who is moving away from QlikView to a competitor product at the moment.

TensorFlow and MXNet

Typical CNN architecture (by Aphex34)

This is one of the big findings in the survey. Google’s TensorFlow has come from nowhere over the last 2 years to being widely used, whereas in comparison MXNet (which Amazon are backing) has around a tenth of the usage. Google have played a blinder with TensorFlow, both technically and in terms of building developer mindshare. It feels like we are right at the peak of the hype curve for deep learning. Despite the amazing achievements with neural network-based architectures over the last few years, noises are starting to emerge about their limitations, and about whether the future of AI requires radically different approaches. I’m sure TensorFlow and other deep learning frameworks will still feature strongly in the survey next year, especially as they enter the enterprise space. But in the meantime, research leaders may well be increasingly moving away from this area.