June 12, 2005



Yes, I think this is something that most people just don't understand about the market for music: a title that sells 10,000 units in a given year is probably in the top 10% of sales for that year (i.e., 90% of all discs released that year will have sold fewer copies) and certainly in the top 20%.

In fact, I saw a figure somewhere, probably based on SoundScan data, stating that most CDs sell fewer than 1,000 units. The leap from selling 1,000 units to 10,000 units is much, much bigger than most people realize (including many musicians, who do not realize just how big an accomplishment it is to sell 10,000 records).

That's because power laws are in full effect in the music business, where most of the real spoils go only to the very top of the curve. For example, the L.A. Times reported a while back that "[s]tatistics tabulated by SoundScan... [indicate that]...[o]f the 6,188 albums released [in 2000], only 50 sold more than a million copies. Sixty-five sold 500,000 units and 356 sold 100,000 or more."
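Expressed as shares of the 6,188 releases, the quoted figures make the point starkly. A small sketch: the tier counts are copied from the quote, and since the quote doesn't say whether the tiers overlap, each is reported separately rather than summed.

```python
# Counts copied from the quoted SoundScan figures for albums released in 2000.
TOTAL_RELEASED = 6_188

# The quote doesn't say whether these tiers overlap (e.g. whether the
# "100,000 or more" group includes the million-sellers), so each is
# reported separately rather than summed.
TIERS = {
    "more than 1,000,000": 50,
    "500,000": 65,
    "100,000 or more": 356,
}

def share_of_releases(count: int, total: int = TOTAL_RELEASED) -> float:
    """Return `count` as a percentage of all albums released that year."""
    return 100.0 * count / total

for label, count in TIERS.items():
    print(f"sold {label:>19}: {count:3d} albums = {share_of_releases(count):.2f}% of releases")
```

Even the broadest tier, 100,000 copies or more, covers under 6% of releases (about 5.75%), and million-sellers are under 1% (about 0.81%).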

So I agree with you. In aggregate, there's much more noise at the long-tail end of the curve.



It would be interesting to plot this out with real data for an N > 5. For instance, Audioscrobbler's aggregate listening data for ~150,000 people is there for the taking at: http://www.audioscrobbler.com/data/

Looking at actual listening behavior also removes some of the potential self-report bias of just asking people what their top records of the year are. Of course, it also reveals taste preferences instead of just purchase preferences, which may in fact show different distribution patterns. If anything, I would suspect that repeated-listening patterns are even more top-heavy than people realize (and self-report).
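The hunch that repeated listening is more top-heavy than purchasing could be checked by measuring how much of the total play volume the most-played items capture. A minimal sketch, assuming play counts have already been aggregated per artist into a dictionary; it does not parse the actual Audioscrobbler dump, whose format isn't described here, and the numbers are made up for illustration:

```python
from collections.abc import Mapping

def top_share(play_counts: Mapping[str, int], top_fraction: float) -> float:
    """Fraction of all plays that go to the most-played `top_fraction` of items.

    `play_counts` maps an item (artist, album, track) to its total play count,
    aggregated across users. This input layout is an assumption, not the
    format of any particular published dataset.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    counts = sorted(play_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Number of items in the "head"; always keep at least one.
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / total

# Hypothetical toy numbers, purely for illustration.
plays = {"A": 500, "B": 200, "C": 100, "D": 50, "E": 40,
         "F": 30, "G": 30, "H": 20, "I": 20, "J": 10}
print(f"Top 10% of artists get {top_share(plays, 0.10):.0%} of plays")
# Top 10% of artists get 50% of plays
```

Running the same function on unit-sales counts for the same artists would show whether listening is in fact more concentrated than buying.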

Jakob Nielsen

One weakness in your data analysis: you are confounding current sales with cumulative sales or past sales.

The probability of being represented in a music collection will be some function of the lifetime popularity of the CD (in other words, its cumulative sales) or possibly its peak popularity at some point in the past.

A CD's rank on Amazon, on the other hand, is almost purely dictated by its current sales. (I believe that Amazon does give some weight to historical sales, but sales that happened more than a month ago have very little weight.)

In my own case, my CD collection has twenty years' worth of music. I wasn't even an early adopter, so many people will have been collecting for longer.

So, for example, I own a Paul McCartney CD that was a best-seller in 1985 when I got it as a birthday present. This CD currently ranks #27,163 on Amazon, which is remarkable for an old CD.

Anyway, your curve, if plotted for my collection, would be bumped up in the 25,000-30,000 bin, even though that CD really represents a purchase from the <5,000 bin.
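Jakob's objection is easy to see once the binning is spelled out. The sketch below assumes titles are grouped into 5,000-wide bins by Amazon sales rank (the bin width and the function names are assumptions for illustration). Because it can only use the rank a title has today, a CD bought as a best-seller in 1985 lands in whatever bin its current rank dictates.

```python
from collections import Counter

BIN_WIDTH = 5_000  # assumed width, matching the 5,000-rank bins mentioned above

def rank_bin(rank: int, width: int = BIN_WIDTH) -> tuple[int, int]:
    """Return (low, high) for the bin holding `rank`; it covers ranks low+1 .. high."""
    if rank < 1:
        raise ValueError("sales ranks start at 1")
    low = (rank - 1) // width * width
    return low, low + width

def rank_histogram(current_ranks: list[int], width: int = BIN_WIDTH) -> Counter:
    """Count a collection's titles per bin, using each title's *current* rank.

    This is exactly the confound described above: a title's position today,
    not its rank when it was bought, decides which bin it falls into.
    """
    return Counter(rank_bin(r, width) for r in current_ranks)

# The McCartney CD: a best-seller when bought in 1985, rank #27,163 today.
print(rank_bin(27_163))  # (25000, 30000)
```

Correcting for this would require each title's rank at the time of purchase, which a snapshot of current rankings cannot provide.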

chris anderson

Anon: Absolutely, I need to do this with larger-N datasets (I've updated the post to make that clearer). BigChampagne is one company that has that data, and we're talking. For the record, the Ns for the individuals I looked at ranged from a low of 20 to a high of more than 300. John did a rough analysis of his own 3,000-CD collection, and I took a few liberties in charting it. Overall, as I mentioned, it's only suggestive at this size, but it does point the way to a proper analysis as a next step.

chris anderson

Jakob: Thanks for yet another insightful comment. I did realize that the problem with Amazon rankings is that they're mostly based on current sales, which depresses the rankings of older titles (although back-catalog sales represent a bigger part of total sales here than in any other industry I've looked at).

Given that I won't be able to tell when people bought their music, or what its sales rank was at that time, I need to look for useful proxies. One option, using P2P data such as BigChampagne's, would be to rank music by the number of personal collections it appears in. That way I could compare individual data with collective data, which seems like a reasonable approach to plotting personal collections on an overall popularity chart. Do you agree?
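The proposed proxy boils down to counting, for each title, how many collections contain it, then ranking by that count. A minimal sketch, assuming each person's collection is available as a list of title identifiers; this is not how BigChampagne or any particular service exposes its data, and the IDs and function names below are hypothetical:

```python
from collections import Counter
from typing import Iterable

def collection_counts(collections: Iterable[Iterable[str]]) -> Counter:
    """Count, for each title, how many collections contain it (once per collection)."""
    counts: Counter = Counter()
    for collection in collections:
        counts.update(set(collection))  # set() ignores duplicates within one collection
    return counts

def popularity_ranks(collections: Iterable[Iterable[str]]) -> dict[str, int]:
    """Rank titles by collection count: 1 = most widely held; ties share a rank."""
    counts = collection_counts(collections)
    ordered = sorted(counts.values(), reverse=True)
    first_position: dict[int, int] = {}
    for position, count in enumerate(ordered, start=1):
        first_position.setdefault(count, position)  # competition ("1224") ranking
    return {title: first_position[count] for title, count in counts.items()}

# Hypothetical collections, one list of title IDs per person.
collections = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["A"]]
overall = popularity_ranks(collections)
print(sorted(overall.items()))  # [('A', 1), ('B', 2), ('C', 2)]

# Place one individual's collection on the overall chart.
mine = ["B", "C"]
print([overall[t] for t in mine])  # [2, 2]
```

Ties share a rank here, which matters far down the tail, where many titles may each appear in only one or two collections.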

Chris Neumann

It feels to me like this data is skewed toward the top of the tail right now because no good filter exists for music. That is, how am I supposed to find stuff that's further down the curve? For instance, when I'm working, I like to listen to "chill" music such as Morcheeba and Air. Someone recently recommended Zero 7 to me, and I really like it, but word of mouth is not a great filter when you have hundreds of thousands of albums to choose from. I know there are guys out there trying to brute-force it by hiring grad students to evaluate and categorize music, and there are companies trying to do some sort of waveform analysis, but both approaches seem doomed. Netflix does a good job of making these recommendations, but I haven't seen anything very good for music. So, two things:

1. What sorts of filters work best for recommending music?
2. Once the filters are working well, won't more of the music in people's collections start shifting down the tail? I think this has already happened a little with TV: there used to be just a few channels, and now the viewing audience is spread out over many more of them.


that first graph intrigued me since it reminds me of a Temperature vs Pressure phase diagram. i've been turning over the problem(?) of "cult" content (movies like "Rocky Horror", products like Pez, and so on) in my head for a while and haven't seen it resolved graphically (to be fair, i'm not following all this as closely as i could; apologies). but it's as if cult items never quite reach critical mass, the kind that might carry them out of their niche. so i started equating that with the energy required for state changes.

you can see an example phase diagram here.

plus i like how this graph might offer, via sublimation, a kind of explanation for inexplicable success stories. "The Blair Witch Project" comes to mind. it very much reminds me of something starting out as a solid and going straight to a gas.

anyway, thought i'd throw that out there. and if someone could point me to where "cult" stuff is folded into all this, i'd appreciate it.


Estimates based on SoundScan source data (web addresses listed at the end of this note) for albums released in 2004:

~40,000 new album titles were released in 2004.
~269 million units of new album titles were sold in 2004.

~55% of new album titles sold fewer than 100 units.
~25% of new album titles sold at least 100 units but fewer than 1000 units.
46 new album titles sold at least one million units. Yes, just slightly more than 0.1% of new album titles reached platinum status.
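A few derived numbers follow directly from these estimates. The sketch below uses only the figures listed above (all approximate) and plain arithmetic; the variable names are illustrative.

```python
# Figures quoted above (approximate SoundScan-based estimates for 2004 releases).
TITLES = 40_000
UNITS = 269_000_000
SHARE_UNDER_100 = 0.55
SHARE_100_TO_999 = 0.25
PLATINUM_TITLES = 46

avg_units_per_title = UNITS / TITLES  # the mean, pulled far upward by the hits
share_under_1000 = SHARE_UNDER_100 + SHARE_100_TO_999
platinum_share_of_titles = PLATINUM_TITLES / TITLES
# Each platinum title sold at least 1,000,000 units, so together they account
# for at least this fraction of all units sold.
platinum_min_share_of_units = PLATINUM_TITLES * 1_000_000 / UNITS

print(f"Mean units per title:          {avg_units_per_title:,.0f}")
print(f"Titles selling < 1,000 units:  {share_under_1000:.0%}")
print(f"Platinum share of titles:      {platinum_share_of_titles:.3%}")
print(f"Platinum share of units (min): {platinum_min_share_of_units:.1%}")
```

The mean works out to about 6,725 units per title, yet a majority of titles sold fewer than 100 and about 80% sold fewer than 1,000. Meanwhile the 46 platinum titles, roughly 0.115% of releases, account for at least 17.1% of all units sold.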

source data:

Koranteng Ofosu-Amaah

Hi there Chris... I give you:

On The Long Tail of Music, Metrics and Recommendations

I have some better data for you (read: my entire music collection), some commentary (it's a blog after all; I need to add some bits of value), and pointers to some artists who inhabit this fringe.


Spot on with the last point: "It's a big world out there, and the top 40 is just the beginning of it, not the end."

Top 40 works as the end as well. It's the last stop: majors, big push, marketing, etc., distilled by refined ears and bigger budgets. Most of the Top 40 surfs off the long tail. Time defines the tail, in a way. The Killers were part of the noise, off the radar three or four years ago, known to few rather than many. Now they are Top 40, even Top 10: part of the short end of the curve. Or are they just kicking out to another wave, the next, larger universe?

That said, the Top 40 is an enabler, both financially and in terms of format. It thankfully gets lost in what is now an ocean of industry and music ("Top 100 is irrelevant in an abundant market"). Its relevance lies in being the enabler. It's just that the record industry has become the music industry, and the economics of the new digital world have punctuated that clearly.

The Top 40's expensive sounds, production values, and mass-media marketing machine serve as a central benchmark: one to destroy musically, but a center that's needed to go long on. Oddly enough, the Top 40 is the noise for the non-music consumer, yet it's the closest they come to any tail at all: the start of their "consumer behavior" and a bigger piece of the entertainment segment.

It was not that long ago that the music sector did not compete with games, DVDs, the Net, etc. for
Tim Oren

You might get some traction by backing off the Shannonesque view of information implicit in S/N ratios and instead looking at Bateson's definition of information: "A difference that makes a difference". This implies the question: a difference to whom? That puts POV at the core of the analysis and (I believe) suggests some new directions for the evolution of 'search' and other elements of citizens' media platforms.

I like Jakob's point (hi!) as well. The surrounding market changes over time. For that matter, we change over time. Today's POV is not yesterday's.

Hamish MacEwan

"there's more noise in the tail because there's more everything there."

I don't think that holds at all. The S/N ratio depends on the proportion of signal to noise, not on the absolute size of either.

Shawn Fumo

One thing to be careful of when talking about using filters to "move people down the tail": it makes the tail itself sound static, which it isn't.

If there were so little music that we could all listen to everything before buying, the tail would really represent everyone's taste, but obviously it isn't like that, so we need the filtering.

But when the filtering helps you move down the tail, you'll also buy what you like, which moves it UP the tail. So the very act of filtering changes the tail itself. Amazon is itself an example of the long tail, which is why the results seem a bit counterintuitive.

My guess is that all of this is just getting started, and that as time goes on the curve will become a bit more linear, with a less severe dropoff at the top. Or at least the gap between the Top 40 and the niches won't be quite so big, as the popular niches get pushed up the tail by newer forms of exposure and filtering.

