A mind-boggling analogy between machine learning and quantum physics

A recent paper published in PNAS titled “The Fermi-Dirac distribution provides a calibrated probabilistic output for binary classifiers” caught my attention, because it describes a surprising relationship between machine learning and quantum physics. In fact, surprising is an understatement. Mind-boggling is more like it. According to the analogy developed by the authors, positive samples in binary classification problems are like… fermions?! What?! I decided that I should try to understand the gist of this paper, at least to the extent that I can.

Heh, Emacs LISP function! Lemme give you a piece of advice!

Did you know that in some programming languages, you can give a function a piece of advice? The basic idea is this: if you are using an application or a library written by somebody else, what can you do if you need to modify the behavior of a particular function? You could modify its source code, if your version improves it for everybody. However, if you only want to customize it for your personal needs, a more lightweight solution might be desirable.
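To make the idea concrete, here is a minimal sketch of "around" advice written in Python rather than Emacs Lisp. It is only an analogy: the helpers `add_around_advice` and `remove_advice`, the stand-in `library` namespace, and the `shout` advice are all made-up names for illustration, not Emacs's actual advice API. The point is that the original function's source is never touched; it is wrapped, and the wrapper can be removed again.

```python
import functools
import types


def add_around_advice(namespace, name, advice):
    """Wrap namespace.<name> so that calls go through `advice` first.

    `advice` receives the original function as its first argument and
    decides whether and how to call it (hypothetical helper).
    """
    original = getattr(namespace, name)

    @functools.wraps(original)  # also stores the original as __wrapped__
    def advised(*args, **kwargs):
        return advice(original, *args, **kwargs)

    setattr(namespace, name, advised)


def remove_advice(namespace, name):
    """Restore the function that was in place before it was advised."""
    advised = getattr(namespace, name)
    setattr(namespace, name, getattr(advised, "__wrapped__", advised))


# Stand-in for a library written by somebody else (assumption).
def _greet(name):
    return "Hello, " + name


library = types.SimpleNamespace(greet=_greet)


# Personal customization: post-process the original function's result.
def shout(original, *args, **kwargs):
    return original(*args, **kwargs).upper() + "!"


add_around_advice(library, "greet", shout)
advised_result = library.greet("world")

remove_advice(library, "greet")
restored_result = library.greet("world")
```

Because the advice lives outside the library, it can be added or dropped per user without changing the function's behavior for anybody else.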

Beautiful ideas in programming: generators and continuations

In this post, I’ll summarize what I’ve learned from an attempt to gain a deeper understanding of two important concepts in programming: Python’s generators and Scheme’s continuations. The aim is not to teach Python or Scheme programming. Rather, what I want to do is to demonstrate that generators are special cases of a much more powerful construct: continuations. Continuations allow programmers to invent new control structures, and they are the foundation upon which iterators, generators, coroutines, and many other useful constructs can be built.
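As a small illustration of the generator side of this comparison, the sketch below shows a Python generator suspending at each `yield` and resuming where it left off on the next request. This "rest of the computation" that gets saved and resumed is what the comparison with continuations is about. The function name `countdown` is made up for the example.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, pausing after each value."""
    while n > 0:
        yield n  # execution is suspended here until the next value is requested
        n -= 1   # ...and resumes here, with local state intact


gen = countdown(3)
first = next(gen)  # runs the body up to the first yield
rest = list(gen)   # resumes repeatedly until the generator is exhausted
```

Here `first` holds the first value yielded, and `rest` collects the values produced by resuming the same generator object.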

Simple exercises with grep, sed and awk in org-mode

For text processing, I had never bothered to learn classic Unix tools such as sed and awk, because I could always use Python's regular expression library. The syntax of sed and awk just seemed too arcane to me. However, I recently realized that for many simple ad-hoc tasks, even writing a Python script is too much overhead. This motivated me to learn to use regular expressions directly on the command line.

Use basic data science skills to debunk a myth about koalas!

Did you know that the koala is the dumbest animal in the world? According to an Internet meme, koalas have really tiny brains because the eucalyptus leaves that they eat are toxic and poor in nutrition. That seems plausible to me, but you shouldn’t believe Internet memes. Let’s turn instead to the most authoritative source of knowledge in the world, Wikipedia. This is what Wikipedia has to say about the koala’s brain: