
August 08, 2005

Comments

Morris Rosenthal

Professors Brynjolfsson and Smith are satisfied with the Pareto distribution, but as I described in the comments on the previous Long Tail posting, I don't believe it could be used for the old sales rank system (up to October 2004), which all of their analysis and related studies were based on. In short, the Pareto equation is continuous; the Amazon sales rank function was not. They simply used points from the middle couple decades of the curve where the Pareto function happened to work OK.

In addition, their analysis put the U.S. Amazon sales rate for 2001 at 99.4 million titles per year. Amazon's total U.S. media sales for 2001, which include books, CDs, and DVDs/video, were $1.688 billion, which works out to $1.17 billion for books and yields an average Amazon selling price of $11.77 per book. The same M.I.T. analysis put the average selling price of a book at Amazon between $29 and $41, which would mean the area under their curve was off by a factor of three, or 300%.
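The arithmetic behind that claim can be checked with a short sketch. It uses only the figures quoted in this comment; the variable names are illustrative, and the split of media revenue into a book share is taken as stated rather than derived.

```python
# Figures as quoted in the comment above (U.S., 2001).
book_revenue_usd = 1.17e9   # stated book share of the $1.688 billion media sales
titles_sold = 99.4e6        # annual sales rate attributed to the paper

# Average selling price implied by revenue divided by units.
implied_avg_price = book_revenue_usd / titles_sold
print(f"implied average price: ${implied_avg_price:.2f}")

# Compare with the two ends of the quoted $29 to $41 price range.
ratios = {price: price / implied_avg_price for price in (29.0, 41.0)}
for price, ratio in ratios.items():
    print(f"${price:.0f} is {ratio:.1f}x the implied average")
```

On these inputs the ratio falls between roughly 2.5 and 3.5, which is where the "factor of three" comes from.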

I have to admit this is my first experience trying to carry on a conversation via comments on a long blog post, and I suspect these particular trees kept getting lost in the forest.

Erik Brynjolfsson

I appreciate Morris Rosenthal’s interest in our paper and welcome any suggestions for improving the research. He makes a number of points in his August 9 post, but unfortunately they require some clarification.

1. The Pareto distribution is widely used for discrete data like book sales ranks -- Vilfredo Pareto's first use was for wealth of individuals. Individuals are discrete (except, perhaps, those assimilated by the Borg). The Pareto distribution is also commonly fit to the size ranks of cities, sand particles, meteorites, and numerous other discrete data. More broadly, the essence of econometrics is fitting continuous lines and curves to discrete data, and in this case, the Pareto fit happened to be unusually good. If Mr. Rosenthal wants to propose an alternative functional form that fits the old or new book sales rank data even better, we'd love to see it.
2. I’m confused that he takes us to task for using “points from the middle couple decades of the curve where the Pareto function happened to work OK” yet our data of 861 points run from less than 250 to around 1,000,000, which is basically the same as his dataset which from this description http://www.fonerbooks.com/surfing.htm seems to run from 1,000 to 1,000,000.
3. Mr. Rosenthal also notes that our estimate implied a 2001 sales rate of 99.4 million books for Amazon. In comparison, his estimate is 101 million. In my experience, 99.4 ~ 101 in most of the social sciences, but I’ll grant that we each may be off by 2% or more, which would translate into a somewhat smaller potential error in the size of the long tail.
4. A careful reader of our paper would notice that we do NOT use the Dealtime data, with prices of $29 to $41, as the basis for our calculations as Mr. Rosenthal implies (see the text above table 5 at http://ssrn.com/abstract=400940 and equation 9). Instead, we use the average selling price for this purpose, exactly as he advocates. The area under the curve was not off by “a factor of three”, although we did offer error bands of about 30% in the paper.
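
Point 1 above describes fitting a continuous curve to discrete rank data. A minimal sketch of one common way to do that follows: ordinary least squares on log10(sales) against log10(rank). The data points are synthetic and made up for illustration (they are not the paper's 861 observations), the exponent and scale are arbitrary assumptions, and `fit_power_law` is a hypothetical helper, not the authors' code.

```python
import math

def fit_power_law(ranks, sales):
    """Least-squares line through (log10 rank, log10 sales).

    Returns (slope, intercept, r_squared) of the log-log fit.
    """
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(s) for s in sales]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic example: sales = 1000 * rank^-0.9, nudged by fixed multipliers
# so the points do not sit exactly on the line.
ranks = [250, 1_000, 5_000, 20_000, 100_000, 400_000, 1_000_000]
wobble = [1.2, 0.85, 1.1, 0.9, 1.15, 0.8, 1.05]
sales = [1000 * r ** -0.9 * w for r, w in zip(ranks, wobble)]

slope, intercept, r2 = fit_power_law(ranks, sales)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

A high R^2 here only says the straight line describes these particular points well; it says nothing about ranks outside the fitted range, which is the part of the curve under dispute in this thread.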

Bottom line: The forest of the long tail remains visible to anyone who steps back and looks at the big picture, even if particular trees are occasionally lost. Our results in the paper (since peer-reviewed and published in the journal Management Science) were indeed correct with a reasonable margin of error, though they are at times misquoted or misunderstood.

However, I can only agree that carrying on conversations via blog posts can be frustrating, and welcome Mr. Rosenthal to email or call us if he has additional questions or suggestions on our research (and especially if he has data he’d like to share!)

Morris Rosenthal

Ah, communications. I assume I'll be able to find your e-mail somewhere and write direct, but I may as well respond to your numbered list online.

1) I don't propose an alternative to the Pareto function for the old ranking system. The result of the overlapping Amazon ranking systems was not amenable to a single exponential function. I don't understand why you all assumed it was.

2) Data from 250 to 1,000,000 on a log graph is between 3 and 4 decades. I would define 3 or 4 as "a few." Depending on the number of data points you had in the area from 250 - 1,000 or close to 1,000,000, you may well find you're closer to 3 than 4. That said, the graph you are looking at on my site corresponds to the new ranking system. If you read through to the bottom of the page, you'll get a description of how the old system worked, plus my old graph which covered 7 decades. I eventually split the head of the curve into multiple lines to drive home the fact it was a moving target.

3 + 4) The only average price information I find in your paper is Table 5, where you give the average Amazon price you observed for ranks under 100,000 and over 100,000 as $29.26 and $41.60. I don't follow your Dealtime comment. My assertion that the area under your curve was off by a factor of 3 by your own data takes the average price to be on the low side of your spread, $33. Using revenue data from Amazon's financial reports for that time period and the number of titles you estimated they sold yields an average selling price of about $11.00. If $11 doesn't go into $33 three times, I'll admit defeat. If it does, you have a factor of three to explain:-)
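
The "decades" count in point 2 is the base-10 logarithm of the ratio between the largest and smallest rank. A small sketch, using the rank ranges mentioned in this thread (the function name `decades` is illustrative):

```python
import math

def decades(low_rank, high_rank):
    """Number of factor-of-ten steps spanned on a log axis."""
    return math.log10(high_rank / low_rank)

span_250 = decades(250, 1_000_000)      # range quoted for the 861 points
span_1000 = decades(1_000, 1_000_000)   # range if the points below 1,000 are sparse

print(f"250 to 1,000,000:   {span_250:.2f} decades")
print(f"1,000 to 1,000,000: {span_1000:.2f} decades")
```

The first range spans about 3.6 decades and the second exactly 3, consistent with "between 3 and 4."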

Morris

erikbrynjolfsson

0. My email address is erikb (at) mit.edu. You can find it by clicking on my name next to my post and then going to my home page.

1. Glad to hear you don't seem to object to research examining whether the Pareto is a good fit. We didn't assume it necessarily would be a fit but we did examine this hypothesis. What we found, and reported, was that this very simple equation has an R^2 of over 80% for these 861 data points. Perhaps you don't find that worth publishing. No problem.

2. Yes, I was comparing our results to your newest results. They seem to use a comparable span of data. And I heartily commend you on the span of data you chose to use!

3. A careful reading of our paper will reveal that we do not use the $29.26 and $41.60 figures (which, as we note next to table 5, are from Dealtime) to estimate the value of sales in the Long Tail. (For the record, we use these figures to support our conjecture that average prices in the "tail" are not lower than at the "head". This allows us to allocate a proportional amount of total revenues to the tail and compute the total consumer surplus using equation 9 and total revenues, and implicitly Amazon's -- NOT Dealtime's -- overall average selling prices). I'm not sure what you think we are using $33 for, but if you Read The Fine Manuscript you should get a clearer idea of our methods, which we tried our very best to describe carefully to the interested reader.

4. Yes, you are absolutely, positively correct that 33 is three times 11. Unfortunately for both of us, this has little or nothing to do with our analysis or results. To be as clear as possible: we could omit the Dealtime numbers entirely from the paper and the basic calculations for the Long Tail would be unchanged. Please see equation 9 and the rest of the detailed methodology if you are genuinely interested in learning what we did.

I know your goal is to help illuminate the blogosphere on this topic but I think it would be most useful if you carefully read the paper (and/or ask one of the authors to explain it) before posting your interpretations. For my part, I apologize for not writing the paper (and my postings) more clearly.

Please let me buy you a cup of coffee if you come to Cambridge and I'll go into as much detail as you like. Perhaps we can jointly analyze some of the fascinating new data you have.

Morris Rosenthal

In case anybody is actually following this thread, Erik and I are now in a direct correspondence and hope to arrive at some mutually agreed conclusion:-)

Morris

