
April 03, 2005

My God, it's full of powerlaws!

Yes, everywhere you look you will find powerlaw distributions. They're an inevitable consequence of variety and inequality, both of which are forces of nature--human and otherwise. I'm delighted that my work on the Long Tail has encouraged people to turn their gaze from the steep left ("hit") side of these curves and notice that the shallow right goes on and on and on. But I will resist turning this into the Powerlaw Blog, which is really more Clay Shirky's domain anyway.

That said, I can't resist posting this one from Alex Barnett, who asks "is this the Long Tail of conversations and relationships?"

[Figure: the long tail of email relationships]

The data comes from a paper by the HCI lab of the University of Maryland, which analyzed one individual’s fifteen-year email archive, consisting of about 45,000 messages and over 4,000 relationships.


Comments

"They are an inevitable consequence of variety and inequality"

No, they aren't. Important power law fact: preferential binding is required.

Let me explain: one of the original examples of power laws was the distribution of Number of Citations vs. Number of Papers in academic fields. A very few papers have many, many citations, and most have few, in a classic power law curve.

I propose a strawman model of citations: each researcher reads a few random papers and cites them. This *doesn't* lead to a power law.

However, if we assume "preferential binding" we get a power law distribution. By preferential binding I mean that commonly cited papers are more likely to be cited in the future.

(The term comes from power law graphs where new nodes "bind" to existing nodes, but that isn't important.)

So, variety is normally distributed, and a normal distribution doesn't give rise to a power law. However, if "the rich get richer" or "the commonly cited are more likely to be cited," a power law emerges.
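A rough way to check the claim above is to simulate both citation models. The sketch below is only an illustration: the function name `simulate_citations`, its parameters, and the "citations so far + 1" weighting are assumptions made for the example, not something taken from the comment. (The rich-get-richer mechanism is more commonly called preferential attachment.)

```python
import random

def simulate_citations(n_papers, cites_per_paper=3, preferential=False, seed=0):
    """Return a list with the number of citations each paper received.

    Papers are published one at a time, and each new paper cites
    `cites_per_paper` earlier papers (repeats are allowed, to keep it simple).
    preferential=False: targets are picked uniformly at random.
    preferential=True:  a target is picked with probability proportional
                        to (citations so far + 1).
    """
    rng = random.Random(seed)
    counts = [0] * n_papers
    # The urn holds one entry per published paper plus one entry per
    # citation received, so a uniform draw from it is a draw
    # proportional to (citations + 1).
    urn = []
    for new in range(n_papers):
        if new > 0:
            for _ in range(cites_per_paper):
                if preferential:
                    target = rng.choice(urn)
                else:
                    target = rng.randrange(new)
                counts[target] += 1
                urn.append(target)
        urn.append(new)
    return counts

if __name__ == "__main__":
    for mode in (False, True):
        counts = simulate_citations(20000, 3, preferential=mode)
        print("preferential" if mode else "uniform", "max citations:", max(counts))
```

Both runs hand out the same total number of citations. In the uniform model the counts fall off roughly exponentially, while the preferential model leaves a handful of papers with far more citations than anything the uniform model produces.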

Hope that clears something up.


AGL

Well, what I said is that they're a natural function of variety *and inequality*. In the first instance you cite, good papers get cited more than poor ones. In the second, your randomization appears to eliminate the quality factor--thus making them all, in a sense, equal. Am I missing something?

>Am I missing something?

Possibly.

Let's suppose that God receives a copy of Physical Review Letters A and He judges each paper, thus giving it a quality, Q. It's safe to assume that Q is normally distributed (as are people's heights, abilities, etc.). Not all papers are equal (so, is this *inequality*?).

However, if the papers were cited according to their quality, we would not end up with a power law graph.

In other words, if people's earnings were directly related to their ability, we would not see a power law distribution of earnings. Without a feedback factor ("rich get richer") the power law doesn't emerge.

(Actually, the example of earnings is probably bad because it's only a power law for the top 3%: http://www.newscientist.com/article.ns?id=mg18524904.300

It appears that the feedback factor doesn't kick in for the bottom 97%. In fact, if you look at the graph in that article, the non-power-law part looks just the same as mine below, suggesting a normal distribution.)

If you already understood all this and I just misread - sorry for wasting your time!

Adam,

Well, I certainly understand *some* of it. Indeed, I discussed the New Scientist article on my blog last week:
http://longtail.typepad.com/the_long_tail/2005/03/microstructure_.html

I realize that "powerlaw", as I've used it, is a pretty sloppy description for the multitude of non-linear distributions that one sees out there (almost, but not quite, as sloppy as The 80/20 Rule). I also realize that, as in the case of the wealth distribution above, in many cases the distribution function changes somewhere down the curve.

I'm no expert on this (yet), but I think what causes powerlaws to show up so often is exactly what you say: the combination of inequality and positive feedback/network effects that make the rich richer, the famous more famous and so on. So, to have been more precise, I should have said "They're an inevitable consequence of variety and inequality, amplified by ubiquitous network effects". But not everyone understands what network effects are, so I erred on the side of simplicity.

"Power law" isn't a vague term, it's:
log(y) = a*log(x) + c
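To make the definition concrete: a power law is a straight line on log-log axes, so the exponent a can be estimated as the slope of that line. The sketch below is a minimal illustration under that reading; `fit_loglog` is a hypothetical helper name, and a plain least-squares fit is only a crude check, since a roughly straight log-log plot does not by itself prove that data follow a power law.

```python
import math

def fit_loglog(xs, ys):
    """Least-squares fit of log(y) = a*log(x) + c. Returns (a, c).

    xs and ys must be positive and xs must contain at least two
    distinct values.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    a = sxy / sxx            # slope: the power law exponent
    c = mean_y - a * mean_x  # intercept: log of the constant factor
    return a, c

if __name__ == "__main__":
    # Synthetic data that follows y = 5 * x**-2 exactly.
    xs = [1, 2, 4, 8, 16, 32]
    ys = [5 * x ** -2 for x in xs]
    print(fit_loglog(xs, ys))
```

On exact power law data like the synthetic example, the fit recovers the exponent and log(5) up to floating-point error. On real data the points usually bend away from the line somewhere, which is the case discussed above for earnings.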

If you need a term for "very few with bucket loads and very many with scraps" then maybe ... erm. I can't think of one, but it's probably needed.

You should read the classic text on probability by Feller. He mentions Zipfian distributions (the generalization of 80-20 rules) quite explicitly and cautions against the tendency to see them everywhere.

A pointer to the relationship between 'Zipfian' and log normal distributions, and how they can show up from natural local processes, can be found at:

http://geomblog.blogspot.com/2004/05/zipfs-law-and-log-normal-distributions.html

Suresh,

As it happens, you can search inside both volumes (1,2) of Feller's classic on Amazon. Not only does neither mention Zipfian distributions, but I can't find a place where he warns against the tendency to find Paretos either. Can you give me a page number or some other reference?

Ooh. I will have to go look up my copy of Feller. It is in vol. 2 as far as I can recall. I will check it and post the answer here.

In computer science (especially in the new fields of web and internet research), power laws are ubiquitous, to the point that one finds them slightly annoying: power laws are mathematically ugly to deal with, and their very ubiquity makes the statement "so and so obeys a power law" very uninformative. Usually one has to dig much deeper to understand the true phenomenon, which is some kind of fractal, self-similar behaviour.

Have you seen the work by Walter Willinger (AT&T) and others on power laws for internet traffic? That is actually quite an interesting application of power laws.

I attended a talk by Steve Marron (University of North Carolina at Chapel Hill) yesterday at Cornell (there's a Workshop on Heavy Tails going on), and he had some data that suggested that HTTP traffic was NOT a simple Pareto, but a sum of three Pareto-like distributions with heavy tails.

The short version is that file sizes tend to "cluster" around certain typical sizes in a couple of categories. (Small images and 404s, larger webpages, and larger files (multimedia, office documents, etc.)) Within each group it's Pareto-like, and since the sum generally has a heavy tail, some of the ideas still apply.
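One way to get a feel for this is to sample file sizes from a mixture of three Pareto distributions and see how concentrated the total is. Everything specific in the sketch below is an assumption for illustration: the category shares, minimum sizes, and tail index are made-up numbers, not figures from the talk, and the "sum" is read here as a mixture in which each file belongs to one category.

```python
import random

# Hypothetical categories:
# (label, share of files, minimum size in bytes, Pareto tail index alpha)
CATEGORIES = [
    ("small images and 404s", 0.6, 1e3, 1.5),
    ("web pages", 0.3, 2e4, 1.5),
    ("large files", 0.1, 1e6, 1.5),
]

def sample_sizes(n, seed=0):
    """Draw n file sizes from the three-category Pareto mixture."""
    rng = random.Random(seed)
    weights = [share for _, share, _, _ in CATEGORIES]
    sizes = []
    for _ in range(n):
        _, _, x_min, alpha = rng.choices(CATEGORIES, weights=weights)[0]
        # paretovariate returns a value >= 1, so the size is >= x_min.
        sizes.append(x_min * rng.paretovariate(alpha))
    return sizes

def top_share(values, fraction):
    """Fraction of the total accounted for by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

if __name__ == "__main__":
    sizes = sample_sizes(50000)
    print("share of bytes in the largest 1% of files:", top_share(sizes, 0.01))
```

With these made-up parameters a small fraction of files carries a large share of the bytes, even though the histogram of sizes would show three separate clusters rather than a single Pareto curve.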

I would recommend visiting the Blog by Harvard's Program on Networked Governance headed by David Lazer.

The Complexity and Social Networks blog covers various issues such as powerlaws: http://www.iq.harvard.edu/blog/netgov/

You can find even more information on SNA by Ines Mergel here: http://www.ksg.harvard.edu/netgov/html/sna.htm
