My God, it's full of powerlaws!
Yes, everywhere you look you will find powerlaw distributions. They're an inevitable consequence of variety and inequality, both of which are forces of nature--human and otherwise. I'm delighted that my work on the Long Tail has encouraged people to turn their gaze from the steep left ("hit") side of these curves and notice that the shallow right goes on and on and on. But I will resist turning this into the Powerlaw Blog, which is really more Clay Shirky's domain anyway.
That said, I can't resist posting this one from Alex Barnett, who asks "is this the Long Tail of conversations and relationships?"
The data comes from a paper by the HCI lab of the University of Maryland, which analyzed one individual’s fifteen-year email archive, consisting of about 45,000 messages and over 4,000 relationships.




"They are an inevitable consequence of variety and inequality"
No, they aren't. Important power law fact: preferential binding is required.
Let me explain: One of the original examples of power laws was the distribution of Number of Citations vs. Number of Papers in academic fields. A very few papers have a great many citations, most have hardly any, and the result is a classic power law curve.
I propose a strawman model of citations: each researcher reads a few random papers and cites them. This *doesn't* lead to a power law.
However, if we assume "preferential binding" we get a power law distribution. By preferential binding I mean that commonly cited papers are more likely to be cited in the future.
(The term comes from power law graphs where new nodes `bind' to existing nodes, but that isn't important.)
So, variety is normally distributed, and a normal distribution doesn't give rise to a power law. However, if "the rich get richer" or "the commonly cited are more likely to be cited", a power law emerges.
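The two citation models described above can be compared with a small simulation. This is only a sketch under stated assumptions: papers arrive one at a time, each new paper cites a fixed number of earlier papers, and "preferential binding" (more commonly called preferential attachment) is modeled by picking a paper with probability proportional to its citations so far plus one. The function name, parameters, and paper counts are made up for illustration.

```python
import random


def simulate_citations(n_papers, cites_per_paper, preferential, seed=0):
    """Return a list with the number of citations each paper received."""
    rng = random.Random(seed)
    counts = [0] * n_papers
    # The urn holds one entry per existing paper plus one entry per citation
    # received, so a draw picks a paper with probability proportional to
    # (citations so far + 1). This weighting is an assumption of the sketch.
    urn = []
    for new in range(n_papers):
        if new > 0:  # the very first paper has nothing to cite
            for _ in range(cites_per_paper):
                if preferential:
                    target = rng.choice(urn)      # "the rich get richer"
                else:
                    target = rng.randrange(new)   # strawman: any earlier paper
                counts[target] += 1
                urn.append(target)
        urn.append(new)  # added after citing, so a paper never cites itself
    return counts


if __name__ == "__main__":
    for label, pref in (("random", False), ("preferential", True)):
        counts = sorted(simulate_citations(5000, 3, pref), reverse=True)
        print(label, "top five citation counts:", counts[:5])
```

In runs of this sketch the most-cited papers under the preferential rule collect far more citations than under the random rule, whose tail falls off much faster. It says nothing about real citation data, and a repeated citation of the same paper is allowed for simplicity.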
Hope that clears something up.
AGL
Posted by: Adam Langley | April 03, 2005 at 02:19 PM
Well, what I said is that they're a natural function of variety *and inequality*. In the first instance you cite, good papers get cited more than poor ones. In the second, your randomization appears to eliminate the quality factor--thus making them all, in a sense, equal. Am I missing something?
Posted by: Chris Anderson | April 03, 2005 at 02:24 PM
>Am I missing something?
Possibly.
Let's suppose that God receives a copy of Physical Review Letters A and He judges each paper, thus giving it a quality, Q. It's safe to assume that Q is normally distributed (as are people's heights, abilities, etc.). Not all papers are equal (so, is this *inequality*?).
However, if the papers were cited according to their quality, we would not end up with a power law graph.
In other words: if people's earnings were directly related to their ability, we would not see a power law distribution of earnings. Without a feedback factor ("rich get richer") the power law doesn't emerge.
(Actually, the example of earnings is probably bad because it's only a power law for the top 3%: http://www.newscientist.com/article.ns?id=mg18524904.300
It appears that the feedback factor doesn't kick in for the bottom 97%. In fact, if you look at the graph in that article the non-power part looks just the same as mine below, suggesting a normal distribution.)
If you already understood all this and I just misread - sorry for wasting your time!
Posted by: Adam Langley | April 03, 2005 at 02:25 PM
Adam,
Well, I certainly understand *some* of it. Indeed, I discussed the New Scientist article on my blog last week:
http://longtail.typepad.com/the_long_tail/2005/03/microstructure_.html
I realize that "powerlaw", as I've used it, is a pretty sloppy description for the multitude of non-linear distributions that one sees out there (almost, but not quite, as sloppy as The 80/20 Rule). I also realize that, as in the case of the wealth distribution above, in many cases the distribution function changes somewhere down the curve.
I'm no expert on this (yet), but I think what causes powerlaws to show up so often is exactly what you say: the combination of inequality and positive feedback/network effects that make the rich richer, the famous more famous and so on. So, to have been more precise, I should have said "They're an inevitable consequence of variety and inequality, amplified by ubiquitous network effects". But not everyone understands what network effects are, so I erred on the side of simplicity.
Posted by: chris anderson | April 03, 2005 at 02:35 PM
"Power law" isn't a vague term, it's:
log(y) = a*log(x) + c
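Put another way, the formula says y = e^c * x^a, so the points fall on a straight line when both axes are logarithmic. A minimal sketch of recovering a and c by ordinary least squares on the logged values follows; the function name and the sample data are made up for illustration, and a good straight-line fit is only a rough check, since other heavy-tailed distributions can look nearly straight over a limited range.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = a*log(x) + c. Returns (a, c)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the best-fit line through the logged points.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    a = sxy / sxx
    c = mean_y - a * mean_x
    return a, c


if __name__ == "__main__":
    # Synthetic data that follows y = 1000 * x**-2 exactly (illustrative only).
    xs = list(range(1, 51))
    ys = [1000 * x ** -2 for x in xs]
    a, c = fit_power_law(xs, ys)
    print("slope a:", a, "intercept c:", c)
```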
If you need a term for "very few with bucket loads and very many with scraps" then maybe ... erm. I can't think of one, but it's probably needed.
Posted by: Adam Langley | April 03, 2005 at 02:36 PM
You should read the classic text on probability by Feller. He mentions Zipfian distributions (the generalization of 80-20 rules) quite explicitly and cautions against the tendency to see them everywhere.
A pointer to the relationship between 'Zipfian' and log normal distributions, and how they can show up from natural local processes, can be found at:
http://geomblog.blogspot.com/2004/05/zipfs-law-and-log-normal-distributions.html
Posted by: Suresh | April 03, 2005 at 08:36 PM
Suresh,
As it happens, you can search inside both volumes (1,2) of Feller's classic on Amazon. Not only does neither mention Zipfian distributions, but I can't find a place where he warns against the tendency to find Paretos either. Can you give me a page number or some other reference?
Posted by: Chris Anderson | April 03, 2005 at 09:49 PM
ooh. I will have to go look up my copy of Feller. It is in vol. 2 as far as I can recall. I will check it and post the answer here.
In computer science (especially in the new fields of web and internet research), power laws are ubiquitous, to the point that one finds them slightly annoying: power laws are mathematically ugly to deal with, and their very ubiquity makes the statement "so and so obeys a power law" very uninformative. Usually one has to dig much deeper to understand the true phenomenon, which is some kind of fractal, self-similar behaviour.
Have you seen the work by Walter Willinger (AT&T) and others on power laws for internet traffic? That is actually quite an interesting application of power laws.
Posted by: Suresh | April 04, 2005 at 12:12 AM
I attended a talk by Steve Marron (University of North Carolina at Chapel Hill) yesterday at Cornell (there's a Workshop on Heavy Tails going on), and he had some data that suggested that HTTP traffic was NOT a simple Pareto, but a sum of three Pareto-like distributions with heavy tails.
The short version is that file sizes tend to "cluster" around certain typical sizes in a couple of categories. (Small images and 404s, larger webpages, and larger files (multimedia, office documents, etc.)) Within each group it's Pareto-like, and since the sum generally has a heavy tail, some of the ideas still apply.
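To make the clustering idea concrete, here is a rough sketch of a three-component Pareto mixture for file sizes. Every number in it (the weights, the minimum sizes, and the shape parameters) is a hypothetical placeholder, not a figure from the talk; only the three categories come from the description above.

```python
import random

# Hypothetical components: (weight, minimum size in bytes, Pareto shape alpha).
COMPONENTS = [
    (0.6, 1_000, 1.5),      # small images and 404s
    (0.3, 30_000, 1.3),     # larger webpages
    (0.1, 1_000_000, 1.1),  # multimedia, office documents, etc.
]


def sample_file_size(rng):
    """Pick a component by weight, then draw a Pareto-distributed size from it."""
    weights = [weight for weight, _, _ in COMPONENTS]
    _, min_size, alpha = rng.choices(COMPONENTS, weights=weights, k=1)[0]
    # paretovariate(alpha) returns values >= 1, so min_size is the lower bound.
    return min_size * rng.paretovariate(alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = sorted(sample_file_size(rng) for _ in range(10_000))
    print("median size:", sizes[len(sizes) // 2])
    print("largest size:", sizes[-1])
```

Each component is Pareto on its own, and the far tail of the mixture is governed by the component with the smallest shape parameter, which is one way to see why heavy-tail ideas still apply to the combined data.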
Posted by: John Thacker | April 23, 2005 at 10:31 AM
I would recommend visiting the Blog by Harvard's Program on Networked Governance headed by David Lazer.
The Complexity and Social Networks blog covers various issues such as powerlaws: http://www.iq.harvard.edu/blog/netgov/
You can find even more information on SNA by Ines Mergel here: http://www.ksg.harvard.edu/netgov/html/sna.htm
Posted by: Peter | January 29, 2006 at 01:19 PM