data science

What can we learn from Simpson's Paradox?

Simpson’s Paradox is one of the best-known paradoxes in statistics. A quick Google search turns up plenty of blog posts (many from the data science community) about this puzzling phenomenon. It is clearly a topic of real-world significance, and there seem to be some important lessons that we are supposed to learn from it. But what are those lessons? Is it nothing more than a cautionary tale about how easily data analyses can go wrong?
Read more

A mind-boggling analogy between machine learning and quantum physics

A recent paper published in PNAS titled “The Fermi-Dirac distribution provides a calibrated probabilistic output for binary classifiers” caught my attention, because it describes a surprising relationship between machine learning and quantum physics. In fact, surprising is an understatement. Mind-boggling is more like it. According to the analogy developed by the authors, positive samples in binary classification problems are like… fermions?! What?! I decided that I should try to understand the gist of this paper, at least to the extent that I can.
Read more

Use basic data science skills to debunk a myth about koalas!

Did you know that the koala is the dumbest animal in the world? According to an Internet meme, koalas have really tiny brains because the eucalyptus leaves that they eat are toxic and poor in nutrition. That seems plausible to me, but you shouldn’t believe Internet memes. Let’s turn instead to the most authoritative source of knowledge in the world, Wikipedia. This is what Wikipedia has to say about the koala’s brain:
Read more