Statistical testing of the Lindy effect with real data

Aleksandr Levchuk
Sep 24, 2019


TL;DR: Network protocols adhere to the Lindy effect more strongly than Linux distributions.

The Lindy effect is a theory that the future life expectancy of some non-perishable things like a technology or an idea is proportional to their current age, so that every additional period of survival implies a longer remaining life expectancy. — Wikipedia

I first found out about the Lindy effect via the Stephan Livera podcast. It was mentioned in passing with a brief definition similar to the Wikipedia one above. Yet it captured my attention.

The idea is simple and testable. I ran an experiment against real-world data on the history of Linux distributions. A Linux distribution, also known as a “distro”, is a collection of open source programs frozen in time at specific versions so that the programs can be tested, distributed, and upgraded together. There is a community for each distro.

Data showing the birth, death, and relationships of Linux distributions. https://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg

The experiment itself is posted at https://github.com/alevchuk/lindy-effect. It shows that simply picking the oldest survivor and betting that it will survive to at least double its current life span would have been correct 59.8% of the time.

Specifically, the algorithm was correct by picking:

Slackware is alive and well today in 2019, yet I could only run the experiment up to 2005, so that enough later data remains to check whether each subject survived 2x its current life span.

Charts

Each circle represents the age and future life, in days, of a Linux distribution at a point in time between 1992 and 2005. Time points are at monthly granularity. Data was available up to April 2019, which is counted as the time of death for all remaining survivors.
Picking the oldest survivor at each time point. We pick a new survivor only after the previous pick has died.
A snippet of intermediate data showing the “thought process” of the Lindy effect algorithm, leading up to the final result of the pick being correct 59.8% of the time.
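The picking rule described in the captions above can be sketched in a few lines of Python. This is only one reading of the rule, not the code from the linked repository, which may differ in detail. The records in `TOY`, the helper `month_starts`, and the function `lindy_hit_rate` are hypothetical names and made-up values used for illustration only.

```python
from datetime import date

# Hypothetical toy records (name, birth, death). NOT the real distro data.
TOY = [
    ("A", date(1992, 1, 1), date(1995, 1, 1)),
    ("B", date(1993, 1, 1), date(2019, 4, 1)),
    ("C", date(1994, 6, 1), date(1996, 6, 1)),
]

def month_starts(start, end):
    """Yield the first day of each month from start up to, but not including, end."""
    y, m = start.year, start.month
    while date(y, m, 1) < end:
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

def lindy_hit_rate(records, start, end):
    """Fraction of monthly time points at which the held pick went on to
    live at least as long again as it had already lived (2x its age)."""
    pick = None
    hits = total = 0
    for now in month_starts(start, end):
        alive = [r for r in records if r[1] <= now < r[2]]
        if not alive:
            continue
        # Keep the previous pick while it is alive; otherwise take the oldest survivor.
        if pick is None or pick not in alive:
            pick = min(alive, key=lambda r: r[1])
        _, birth, death = pick
        age = now - birth
        total += 1
        if death >= now + age:
            hits += 1
    return hits / total if total else 0.0

print(lindy_hit_rate(TOY, date(1992, 1, 1), date(2005, 1, 1)))
```

Counting one bet per monthly time point is an assumption here; counting one bet per pick would give a different percentage.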

About the results

59.8% does not seem like a lot, yet it’s better than random. In his TED talk, Hans Rosling describes how even reaching 50% is sometimes hard for educated humans. Here, a lead of 9.8 percentage points over a 50% random baseline was achieved by a simple rule that knew nothing about Linux. To be precise, it knew strictly nothing other than the current age of the distro.

Better examples of Lindy effect

Linux distribution data attracted me because of the awesome visualization.

The SVG file lets you zoom in and pan around this vast map of life and death. Open https://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg and press Ctrl and + (or ⌘ and +) to take a deep dive.

However, there is not much of a network effect between Linux distributions. If my server runs Ubuntu and your server runs Red Hat, the differences are strictly local and not visible to the outside world. Moreover, while running different distros, our servers can still talk to each other over the Internet.

It would be much more interesting to evaluate the Lindy effect on network protocols. Today we know that TCP/IP is the winner in world communications, yet there were others before it. For example, the Network Control Program (NCP) was ARPANET’s protocol before the transition to the Transmission Control Protocol (TCP) in 1983.

… let’s do a back-of-the-envelope calculation now!

ARPANET’s first successful message was sent in 1969, fifty years ago. TCP is known to dominate from 1983 to 2019. Let’s say we run the experiment for the first 25 years, up to 1994. Let’s define the Lindy effect as being correct when the picked protocol goes on to live for 2x its current age. Assuming NCP was in use for 14 years before the switch to TCP, the Lindy effect is correct for ARPANET’s first 7 years (1969 to 1976), then wrong for 7 years (1976 to 1983), then correct again for 11 years, from 1983 to 1994. That makes it correct for 18 out of 25 years, or 72% of the time.
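The same arithmetic can be checked with a short Python sketch. It treats time as continuous: a bet on a protocol is correct during the first half of that protocol's eventual lifetime, clipped to the experiment window. The function name `correct_years` is hypothetical, and 2019 is used as TCP's last known year, not as a real end of life.

```python
def correct_years(birth, death, window_end):
    """Years inside the window during which 'it will live to 2x its age' holds."""
    lifetime = death - birth
    observed = min(death, window_end) - birth  # years this pick is held inside the window
    return min(lifetime / 2, observed)

WINDOW_START, WINDOW_END = 1969, 1994

ncp = correct_years(1969, 1983, WINDOW_END)  # NCP: assumed in use 1969 to 1983
tcp = correct_years(1983, 2019, WINDOW_END)  # TCP: 1983 to 2019 (still alive)

share = (ncp + tcp) / (WINDOW_END - WINDOW_START)
print(f"correct for {ncp + tcp:g} of {WINDOW_END - WINDOW_START} years: {share:.0%}")
```

Evaluating year by year instead of continuously can shift the count by a year, depending on whether age is measured at the start or the end of each year.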

Network protocols have strong effects on each other because a new protocol is not compatible with the rest of the world. Once TCP/IP was in use and had not failed catastrophically, the benefits of any new protocol were not nearly enough to outweigh the benefit of staying connected to everyone else.
