August 03, 2005


Morris Rosenthal

I took a further look at the time dependency after corresponding with Chris all weekend, and I realized that the current analysis will yield exactly the same Long Tail contribution whether we look at one day, two days, or a week. The controlling factor is where the slope of the rank/sales line changes on the log-log graph, which I drew using straight-line approximations. The true curve would of course be a curve, rather than a series of straight lines, which would lead to a gradual drop in Long Tail contribution as the time period increased. By the time the sales rank reaches 4,000,000, the curve needs to become asymptotic to a vertical line.

Also, 36% is just the first estimate using the 100,000 break point. An identical analysis using the 130,000 break point from his original article gave a Long Tail contribution of 25%. Staying with the 100,000 break point, but adjusting my rank/sales chart for the fact that the data was gathered in October/November (a slow season) brought the estimate down appreciably, and using 125,000 instead of 200,000 for the estimate of unique titles Amazon sells in a given day dropped it all the way to 12%.
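To make the two points above concrete, here is a minimal sketch of a rank/sales curve built from straight segments on a log-log graph. The knot values in `KNOTS`, the function names, and the assumption that a longer period simply multiplies every title's sales by the same factor are all hypothetical; none of them are the measured data or the actual method behind the 36%, 25%, or 12% figures. The sketch only shows the mechanics: scaling the whole curve leaves the Long Tail share unchanged, while moving the break point or the count of unique titles changes it.

```python
import math

# Hypothetical knots: (sales rank, copies sold per day). Illustrative only,
# not calibrated against any real Amazon data.
KNOTS = [(1, 3000.0), (1_000, 15.0), (100_000, 0.2), (200_000, 0.05)]

def daily_sales(rank, knots=KNOTS):
    """Interpolate along straight segments on a log-log graph."""
    for (r0, s0), (r1, s1) in zip(knots, knots[1:]):
        if r0 <= rank <= r1:
            slope = math.log(s1 / s0) / math.log(r1 / r0)
            return s0 * (rank / r0) ** slope
    raise ValueError("rank outside the modelled range")

def tail_share(break_point, n_titles, days=1):
    """Fraction of unit sales from titles ranked beyond break_point."""
    sales = [days * daily_sales(r) for r in range(1, n_titles + 1)]
    return sum(sales[break_point:]) / sum(sales)

base = tail_share(100_000, 200_000)            # one day
week = tail_share(100_000, 200_000, days=7)    # same curve, scaled by 7
later_break = tail_share(130_000, 200_000)     # break point moved out
fewer_titles = tail_share(100_000, 125_000)    # fewer unique titles per day

print(f"one day {base:.3f}, one week {week:.3f}")
print(f"break at 130,000: {later_break:.3f}")
print(f"125,000 titles: {fewer_titles:.3f}")
```

With straight segments the share depends only on where the knots sit, so the one-day and one-week figures match. A curve that bends continuously would not have that property.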



It seems to me that more data would generate more accuracy, and something like a customer survey would help a lot.

Couldn't we set up a system where participants of the survey register their current Amazon book purchases? That would potentially generate a lot of data that can be cross-checked with the books' rankings and improve reliability of the data.

Morris Rosenthal

It's an interesting idea, but the critical number for long tail analysis is the total number of unique titles sold by Amazon on a given day. That's not something that can be derived from a large data set; it's just a question of how you interpret Amazon rankings. It would also be great to know the exact number of books Amazon ships per year for the financial checksum, which is where the M.I.T. study Chris relies on doesn't make the cut.


John "Z-Bo" Zabroski

I think Rosenthal correctly mentions a real issue with his study when he says it is difficult to account for used book sales. IMHO, you are not studying the long tail if you do not account for this information. I don't think just because there is a glut of products and services that Long Tail companies offer that there is a glut of repetitive content (i.e, being sold new, but rather being re-sold used).


Agree with John Z - Amazon is tremendously useful for out of print used books and that market should be part of the long tail for their business.
(I really enjoyed this discussion, but found it awfully hard to read down the narrow column BTW.)

Morris Rosenthal

Surprisingly enough, the contribution from used book sales, while a large percentage of book item sales, is a very small percentage of sales dollars, because Amazon reports only the income they receive from the sellers. Even if Amazon were selling one used book for every two new books, the contribution to book revenues would be well less than 10% of the total reported. You could actually back out a reasonable estimate for the actual number of used books sold if you first established an average used book price through a fairly complex data collection job (you'd have to watch a lot of books to get prices for those that actually sold, rather than were simply offered). The sales rank curve doesn't require knowing how many used books are selling for calibration, as long as you use ebooks or watch your test books like a hawk and adjust for used sales. Since they are included in the sales rank graph, used book sales are included in my Long Tail estimates by default. Where I don't account for them is in the checksum, but I believe all that really impacts is the average price part, which is itself a rough estimate.
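The arithmetic behind that claim can be sketched as follows. Every number here is a placeholder chosen for illustration: the average new price, the average used price, and the flat commission rate are assumptions, not figures from Amazon or from the comment above, and fixed per-item fees are ignored. The function names are hypothetical as well.

```python
def used_share_of_reported_revenue(new_units, used_units,
                                   avg_new_price, avg_used_price,
                                   commission_rate):
    """Share of reported book revenue that comes from used sales,
    assuming only the seller commission is reported for used books."""
    new_revenue = new_units * avg_new_price
    used_revenue = used_units * avg_used_price * commission_rate
    return used_revenue / (new_revenue + used_revenue)

def used_units_from_revenue(used_revenue, avg_used_price, commission_rate):
    """Back out the number of used books sold from reported commission income."""
    return used_revenue / (avg_used_price * commission_rate)

# One used book for every two new books, with made-up prices and commission.
share = used_share_of_reported_revenue(
    new_units=2, used_units=1,
    avg_new_price=15.0, avg_used_price=8.0, commission_rate=0.15,
)
print(f"used share of reported revenue: {share:.1%}")

# Round trip: 1.20 of commission income at these assumptions is one used book.
units = used_units_from_revenue(1.20, avg_used_price=8.0, commission_rate=0.15)
print(f"used units implied: {units:.2f}")
```

Under these assumed inputs the used share stays in the low single digits, which is consistent with the "well less than 10%" estimate, but the result moves directly with whatever average used price and commission rate you plug in.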

John "Z-Bo" Zabroski

Yeah, so what? The point isn't what percentage of sales dollars resellers account for. The point is any break in Continuity and the continuous characteristic makes it harder to effectively study the Long Tail. In other words, the resellers are a far more mischievous data set than you would like to think; convenience in product continuity and continuously linked choices is Amazon.com's business model for selling books. To try and create a graph about Amazon.com's book selling without accounting for product continuity and continuously linked choices means your study is leaking a lot of information. You assume that sales rank is the most effective way to reverse engineer Amazon's sales figures to get some good Long Tail data... I am not so sure about this. Amazon is a recommendation engine so it's trivial to figure out what frequency of a statistical distribution a certain product falls into. What is important is how the filters of product X recommend the filters of product Y, which leads to sales of either X or Y or both, or sales of some "Omega product" that is nontrivial to X and Y because in some way the filters of X and Y contribute to the sales of product Omega without changing the sales of X or Y... You can then understand why Jeffrey Bezos would be secretive of sales information... so long as Amazon.com's knowledge management hides its sales information and marketing/promotion strategies they will have at least some form of competitive advantage.

David Regal

After a significant reduction in impact of the long tail, I don't see what the issue about long tail is. Long tail is nothing more than a huge mistake on your part, Chris. Long tail is a non-issue and doesn't really have much impact at all since you cannot build a growing business on it!


Morris Rosenthal

Unlike my hand-graphed curve, all of these academic economics studies used the Pareto function, and I don't see how it could have been applied to modeling the old Amazon ranking system. The Pareto function describes a nice continuous distribution, which the old ranking system wasn't. I believe they became enamored with using it because with a few data points or observations, they could produce a curve that agreed with their economics mindset.

The old ranking system they all wrote their papers about was not a continuous function. At the head of the curve, the ranking was done almost entirely based on "what have you done for me lately." This was necessary so that the latest Harry Potter wouldn't remain the #1 book until the next one came out. Out on the Long Tail, the only thing that mattered was the total sales for all time, be it 1 book or 100. In the middle of the curve (say 1,000 to 100,000) neither of these ranking functions dominated, producing a mix of books that had sold a large number of copies in the past (anywhere from a few hundred to a few thousand or more), along with books that were selling at a good enough rate that the system PREDICTED they would eventually belong there. Prediction was a key for the mid region, which is why I believe one of the original purposes of the system was to help Amazon with ordering. There was a large discontinuity at 100,000 on the way down that was either caused by a third function being used for mid-region or by a change in time period for integration, which is how I tried to compensate for it.

I could go on at length with observations of how different books acted over time that support my conclusions, and I should point out that the only "flaw" I ever found in the old system was that it frequently locked up, probably from "divide by zero" errors in a subroutine:-) It's simply a system that, by design or accident, acted like two or three individual functions summed to make a prediction for a given time period. Similar systems are common in antenna transmission, where the contributions to electromagnetic field strength come from additive functions that dominate in regions defined by the distance from the antenna: near field, mid field and far field. The effect is so strong that you can closely approximate the field strength in any region by using only the dominant function, though boundaries are a bugger.
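The dominant-function idea can be illustrated with a toy sum. This is not a field calculation and not a model of the Amazon ranking system: the three terms 1/x, 1/x^2 and 1/x^3 are simply assumed here because they echo the distance dependence of short-antenna field terms, with constants and phase ignored. The function names are hypothetical.

```python
def total(x):
    """Toy sum of three additive terms, each dominating in its own region."""
    return 1 / x + 1 / x**2 + 1 / x**3

def dominant(x):
    """Approximate the sum by its single largest term."""
    return max(1 / x, 1 / x**2, 1 / x**3)

def relative_error(x):
    """How much of the true sum the dominant-term approximation misses."""
    return (total(x) - dominant(x)) / total(x)

for x in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"x = {x:>6}: relative error {relative_error(x):.3f}")
```

Far from x = 1 the single largest term captures almost all of the sum, while at the boundary where the terms are comparable the approximation misses most of it, which is the "boundaries are a bugger" problem.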

The whole reason I got involved in estimating Amazon sales from ranks back five or six years ago was to give anxious authors and small publishers who checked their ranks on a regular basis some idea of actual sales. Long Tail assumptions and other economic analysis had nothing to do with it. Even today, the bulk of my correspondence on the subject is from authors who believe that their publisher is cheating them because they saw their sales rank go from 100,000 to 10,000 in a day and they think they must have sold a lot of books.

As another comment observed, nothing of import happens on the Long Tail. That's certainly true for authors and regular publishers, both large and small, but Amazon and some of the larger marketplace sellers and subsidy publishers make significant money out on the Long Tail. If you run a huge subsidy press with tens of thousands of POD titles in print, you can make a few hundred thousand a month on the Long Tail, but it's not a business I'd want to be in:-)

John "Z-Bo" Zabroski

Hmm, I value what you are saying and I understand your point about discontinuity, and it might be valid that previous studies of Amazon were biased. I am not sure that was addressing my point, though you definitely defended your study well and explained some practical uses for it. The real thing I think most people are interested in is the concealed impact of the Long Tail: does having more content mean recommendations become more powerful? At least to a degree I am sure they do -- Touching the Void is an example of serious impact... but how big is this phenomenon? That's the real phenomenon of Chris Anderson's story in the Long Tail, imho... forget the fact you have more choices. It's the social networks that exist between each choice that create otherwise nonexistent networks between products/people. Another issue I might mention is that certain books I have purchased through Amazon.com I have had to purchase through Amazon.co.uk INSTEAD only because Amazon.com does not "Stock" them but their UK site does -- a truly weird situation. An example of this is books by Charles Handy, in particular "The Empty Raincoat."

David Gibbons

Interesting study.
David R - you miss the value of the tail - John "Z-Bo", you nail it! When the social value in selling slow-moving items outweighs the cost to do so, NOT selling the LT is a huge mistake - the challenge is keeping the cost down. I worked at Amazon from '01 to '04 and spent much of that time integrating vendors to drop-ship the tail direct to customers from their warehouses. Externalizing fulfillment is how a very long tail is most cost effectively managed - and as John says, its impact on future demand is cumulative.

Alex Choo

We've created a search engine to search products in Amazon's long tail.

Our blog has the details.


Lee Herrin

Have you seen the study "Thriving in a World of Consumer Control" by the Boston Consulting Group (available at http://www.bcg.com/publications/files/Thriving_in_a_World_of_Consumer_Control_June05_T&C.pdf)?

They don't seem aware of the Long Tail yet but the article does have a lot of research & numbers that might bolster your media section.


PS tell your publishers that this *IS* the way to prepare a book! I'm enjoying watching your progress.

Working Nomad

I too don't see what the issue about long tail is. How can a business be sustainable based on this model? Is Amazon an actual example because I don't think it is.


It's very difficult competing with Amazon and we agree that they are very good at what they do. But we actually believe that we are better.



