What can we learn from Simpson's Paradox?

Simpson’s Paradox is one of the best-known paradoxes in statistics. A quick Google search turns up plenty of blog posts (many from the data science community) about this puzzling phenomenon. It is clearly a topic of real-world significance, and there seem to be some important lessons that we are supposed to learn from it. But what are those lessons? Is it nothing more than a cautionary tale about how easily data analyses can go wrong?
A colleague recommended a book called “Computer Age Statistical Inference” by Efron & Hastie. I love the organization: Part I covers the classical stuff, Part II early computer-age methods, and Part III 21st-century topics. That’s exactly the type of textbook that we need.