Q: Why are people so uncomfortable with Wikipedia? And Google? And, well, that whole blog thing?
A: Because these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.
Q: Huh?
A: Exactly. Our brains aren't wired to think in terms of statistics and probability. We want to know whether an encyclopedia entry is right or wrong. We want to know that there's a wise hand (ideally human) guiding Google's results. We want to trust what we read.
When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out for such things as accuracy. But now we're depending more and more on systems where nobody's in charge; the intelligence is simply emergent. These probabilistic systems aren't perfect, but they are statistically optimized to excel over time and large numbers. They're designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.
But how can that be right when it feels so wrong?
There's the rub. This tradeoff is just hard for people to wrap their heads around. There's a reason why we're still debating Darwin. And why Jim Surowiecki's book on Adam Smith's invisible hand is still surprising (and still needed to be written) more than 200 years after the great Scotsman's death. Both market economics and evolution are probabilistic systems, which are simply counterintuitive to our mammalian brains. The fact that a few smart humans figured this out and used that insight to build the foundations of our modern economy, from the stock market to Google, is just evidence that our mental software has evolved faster than our hardware.
Probability-based systems are, to use Kevin Kelly's term, "out of control". His seminal book by that name looks at example after example, from democracy to bird-flocking, where order arises from what appears to be chaos, seemingly reversing entropy's arrow. The book is more than a dozen years old and decades from now we'll still find the insight surprising. But it's right.
Is Wikipedia "authoritative"? Well, no. But what really is? Britannica is reviewed by a smaller group of reviewers with higher academic degrees on average. There are, to be sure, fewer (if any) total clunkers or fabrications than in Wikipedia. But it's not infallible either; indeed, it's a lot more flawed than we usually give it credit for.
Britannica's biggest errors are of omission, not commission. It's shallow in some categories and out of date in many others. And then there are the millions of entries that it simply doesn't--and can't, given its editorial process--have. But Wikipedia can scale to include those and many more. Today Wikipedia offers 860,000 articles in English - compared with Britannica's 80,000 and Encarta's 4,500. Tomorrow the gap will be far larger.
The good thing about probabilistic systems is that they benefit from the wisdom of the crowd and as a result can scale nicely both in breadth and depth. But because they do this by sacrificing absolute certainty on the microscale, you need to take any single result with a grain of salt. As Zephoria puts it in this smart post, Wikipedia "should be the first source of information, not the last. It should be a site for information exploration, not the definitive source of facts."
The same is true for blogs, no single one of which is authoritative. As I put it in this post, "blogs are a Long Tail, and it is always a mistake to generalize about the quality or nature of content in the Long Tail--it is, by definition, variable and diverse." But collectively they are proving more than an equal to mainstream media. You just need to read more than one of them before making up your own mind.
Likewise for Google, which seems both omniscient and inscrutable. It makes connections that you or I might not, because they emerge naturally from math on a scale we can't comprehend. Google is arguably the first company to be born with the alien intelligence of the Web's large-N statistics hard-wired into its DNA. That's why it's so successful, and so seemingly unstoppable.
Paul Graham puts it beautifully:
"The Web naturally has a certain grain, and Google is aligned with it. That's why their success seems so effortless. They're sailing with the wind, instead of sitting becalmed praying for a business model, like the print media, or trying to tack upwind by suing their customers, like Microsoft and the record labels. Google doesn't try to force things to happen their way. They try to figure out what's going to happen, and arrange to be standing there when it does."
The Web is the ultimate marketplace of ideas, governed by the laws of big numbers. That grain Graham sees is the weave of statistical mechanics, the only logic that such really large systems understand. Perhaps someday we will, too.
[Update: Nicholas Carr, who seems to have inherited the Clifford Stoll chair of reliable techno-skepticism, has a clever and well-written response here.]
Wikipedia is not a probabilistic system.
I do not really "understand" Google because the math is beyond me, but I trust it. I understand Wikipedia just fine, which is why I don't trust it.
Information systems are only useful to the user at the point in time at which the system is accessed. At the time of a Google search you are presented with a mathematically determined 'average' value: the sum wisdom of the internet's hyperlinks. It is an average value, and even if 30% of the links on the web are "wrong" you still get the right answer.
Wikipedia does not work like that. When you access Wikipedia you do not get the average value of an article; you get the last author's value only. Instead of a probabilistic average, you are getting a single data point.
Google is "wrong" only when the entire web is wrong. This happens on occasion, such as when an urban legend becomes more popular than the truth (when it's done purposefully it's called a Google Bomb). Wikipedia is wrong when a single person is wrong. It is also far easier to "bomb" Wikipedia: anyone with a login can do it with one minute's work. With 860,000 articles, an error in an obscure article can remain undetected for some time.
(I found an article where someone had inserted "Jake is the best!" or something like that in the middle of a sentence. As an experiment I left it there to see how long it took for someone to find it. It's still there 4 months later, and that's with an obvious error. An error in the data that only an authoritative source would know was wrong is likely to last even longer.)
To use an analogy most survivors of the Dot.Bomb would understand, a Google search is like predicting stock performance by taking the average stock price of every Wall St. analyst (occasionally wrong and sometimes very wrong, but usually close); while a Wikipedia search is like doing the same by trolling chat rooms for tips.
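Brock's picture of a Google search as a link-weighted aggregate can be made concrete with a toy PageRank power iteration. The graph, damping factor, and iteration count below are illustrative assumptions, not Google's actual parameters:

```python
# Toy PageRank: each page's score is fed by the scores of the pages
# linking to it, so the result reflects the whole link structure.
links = {               # page -> pages it links to (made-up graph)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}   # start uniform
d = 0.85                                    # standard damping factor

for _ in range(50):                         # iterate until ranks settle
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += d * rank[p] / len(outs)
    rank = new

print(sorted(rank, key=rank.get, reverse=True))
```

With this toy graph, "c", the most-linked page, ends up ranked highest, which is the sense in which the result reflects the whole web of links rather than any single contributor's opinion.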
Posted by: Brock | December 18, 2005 at 05:39 PM
Brock,
In the popular entries with many eyes watching, Wikipedia becomes closer to the statistical average of the views of the participants, weighted by such factors as the authority of each as defined by the others (frequent contributors to any entry tend to win any vote-offs). Studies have shown that for such entries, the mean time to repair vandalism of the sort you describe is measured in minutes. As Wikipedia grows, that rapid self-repairing property will spread to more entries.
But the main point I was making about Wikipedia was not that any single entry is probabilistic, but that the *entire encyclopedia* is probabilistic. Your odds of getting a substantive, up-to-date and accurate entry for any given subject are excellent on Wikipedia, even if every individual entry isn't excellent.
To put it another way, the quality range in Britannica goes from, say, 5 to 9, with an average of 7. Wikipedia goes from 0 to 10, with an average of, say, 5. But given that Wikipedia has ten times as many entries as Britannica, your chances of finding a reasonable entry on the topic you're looking for are actually higher on Wikipedia.
That doesn't mean that any given entry will be better, only that the overall value of Wikipedia is higher than Britannica when you consider it from this statistical perspective.
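Chris's back-of-envelope numbers can be checked with a quick simulation. The topic universe, the uniform quality ranges, and the "reasonable entry" cutoff below are illustrative assumptions built from his example (ten times the coverage, wider quality spread):

```python
import random

random.seed(0)

TOPICS = 800_000        # assumed universe of lookup topics (illustrative)
BRIT_ENTRIES = 80_000   # Britannica's entry count, from the post
THRESHOLD = 6           # quality cutoff for a "reasonable entry" (assumption)

def hit_rate(entries, lo, hi, trials=100_000):
    """Chance that a randomly chosen topic has an entry of quality >= THRESHOLD."""
    hits = 0
    for _ in range(trials):
        covered = random.random() < entries / TOPICS   # is the topic covered?
        if covered and random.uniform(lo, hi) >= THRESHOLD:
            hits += 1
    return hits / trials

print(f"Britannica (quality 5-9, 1/10 coverage): {hit_rate(BRIT_ENTRIES, 5, 9):.3f}")
print(f"Wikipedia  (quality 0-10, full coverage): {hit_rate(TOPICS, 0, 10):.3f}")
```

Under these assumptions Britannica wins on average quality per entry, but Wikipedia's coverage makes a random lookup several times more likely to return something usable.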
Posted by: chris anderson | December 18, 2005 at 06:03 PM
Either way it takes the academics in ivory towers out of the equation, which is both a very good and a very bad thing.
Posted by: kitchen hand | December 18, 2005 at 08:04 PM
Chris,
I agree that Wikipedia as a whole has more total value than Britannica as a whole. It probably does produce more social utility than Britannica, just as the Web + Google produces more utility than a good library + a card catalog.
But no one needs the whole of Wikipedia. They need the article they need, and they need it to be (mostly) right.
My point was that individual Google searches are probabilistic, but that individual Wikipedia articles (the ones in the Long Tail at any rate) are not. Since individual searches and articles are what matter to individual people, I think that's the more important thing to focus on.
I think Wikipedia would be more probabilistic to the user if disputed issues, history of changes, and "voting" were displayed in the actual article, without having to comb through the changes. Put the statistics of opinion right out in front where the intelligent reader can judge them for himself.
I just want to make clear that I think Wikipedia is great in a lot of ways, but it is engineered poorly. Wikipedia is a lot like Communism - a nice idea, but inappropriate for humans. Too many of us have motivations far from the pursuit of objective truth. It would be far better if each author could write his own, complete version (perhaps borrowing sections under a Creative Commons license). If you don't like it, write your own, but don't mess with his. Then all readers have to do is find both articles, read them, and judge for themselves.
Of course Step 1, "finding", brings us back to Google ... :-)
Posted by: Brock | December 18, 2005 at 09:56 PM
Brock;
You don't care about the entire Google database either, just one or two entries. PageRank isn't an average either; it's basically whoever has the most links today (with weighting).
I think you make an erroneous argument: that the latest Wikipedia article is the result of only the last person's edit. This would be true if every edit involved a complete rewrite of the article. That is astronomically rare. Almost all changes are incremental, and as a matter of practical interest they're often reviewed by the most recent contributors. As such, the wiki article you view is more of an average, or better an aggregation, of all previous edits. The most recent edit might be less trusted than the previous ten, but it usually represents a small portion of the article.
Add to that, if you have even the slightest doubt about something, you can peruse the article history to find when such a crazy thing was added.
Then there's the human habit of yielding to people who seem to know what they're talking about. This means that uninformed people tend to avoid putting in the work to contest something they don't understand, and informed and motivated people tend to do most of the work. Wikipedia's NPOV policy, maintained by a crowd without the natural stimuli toward mob mentality, means that demagogues naturally lose. This is rather unlike the practice of the mid-sized groups that produce traditional encyclopedias.
And finally, I have to say that anyone who regards any *single* source as authoritative gets what they deserve. Wikipedia is my first stop, and it's sometimes my last stop (for revisions) when I find out most authoritative sources say something a little different.
Just try to write a report on something like witchcraft based on the Encyclopedia Britannica I grew up with. It won't even get you started. Wikipedia will, though, because contributors try to be comprehensive about all input, not authoritative about what something should be. That's precisely Wikipedia's strength: it's not meant to be authoritative, but it will take authoritative input (even when two authorities viciously disagree). It doesn't take academic authorities out of the equation--they're reduced from all-powerful to a merit-weighted influence.
Posted by: JCJ | December 18, 2005 at 11:20 PM
Brock makes some good points about Wikipedia. Surowiecki explains in WoC that a good "aggregation function" is critical to extracting the wisdom from the crowd, such as a voting mechanism or calculating the average. Wikipedia doesn't really have one. Chris suggests that "frequent contributors" win vote-offs, but that is rare, and it puts the quality issue back in the hands of a few. (Google's aggregation function is the math that Brock and I don't understand, and is their core asset).
There is another concept relevant to the WoC that Surowiecki does not spend much time on called the Condorcet Jury Theorem, which says that if the members of the crowd each individually have a less than 50% chance of getting the answer right, then the chance that the crowd's majority gets it right approaches 0% as the crowd grows. (See http://www.lessig.org/blog/archives/003027.shtml). That is a real likelihood in Wikipedia, especially if the "frequent contributors" are few and in the < 50% category.
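The theorem's two-sided behavior is easy to see numerically: a crowd of slightly-worse-than-even voters converges toward certain wrongness, and a crowd of slightly-better-than-even voters toward certain rightness. The crowd sizes and accuracies below are arbitrary illustrations:

```python
from math import comb

def majority_correct(p, n):
    """P(a simple majority of n independent voters is right)
    when each voter is right with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (11, 101, 1001):
    below = majority_correct(0.45, n)   # each voter slightly worse than a coin flip
    above = majority_correct(0.55, n)   # each voter slightly better
    print(f"n={n:5d}  p=0.45 -> {below:.4f}   p=0.55 -> {above:.4f}")
```

So jelons17's worry cuts both ways: whether a growing Wikipedia converges on truth or nonsense depends entirely on whether its active contributors are above or below that 50% line.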
Chris has faith that as "wikipedia grows" it will become better. I fear that the growth necessary is similar to that of a Ponzi scheme: every human being on Earth will need to be doing nothing but editing the wiki entries they have knowledge on all the time for it to be reliable.
Posted by: jelons17 | December 19, 2005 at 01:31 AM
jelons17 -
You say I fear that the growth necessary is similar to that of a Ponzi scheme: every human being on Earth will need to be doing nothing but editing the wiki entries they have knowledge on all the time for it to be reliable.
Not so, I think - there are technical fixes around that problem. An expert on, say, the First World War only really needs a stored RSS search that informs them if the pages on that subject change, or even better one that informs them if the pages on that subject change in particular ways. Any given person might need to keep a feed of the page about them (if there is one); their company (if they have one); and whatever other tiny number of things they happen to be sufficiently expert in that they would be expected to constantly edit those pages on Wikipedia in your Ponzi model.
Now, admittedly, Wikipedia doesn't have RSS searches that tell you when pages have changed. Yet. But lots of newspapers - the Baltimore Sun-Times is, I think, the longest-running example and the NYT the most recent - have saved RSS search facilities; it's not especially hard to do.
Posted by: Seamus | December 19, 2005 at 02:23 AM
I really wish I'd read this post before I wrote my post concerning what I think is going on:
http://www.well.com/~wiggy/2005/12/battle-in-new-war.html
The interesting thing is that people don't see this for what it is: an outright philosophical war. Some people think a small group who are qualified in some capacity can produce 'better' information than a much larger group that is on average less qualified, but INCLUDES the small 'highly qualified' group anyway.
It's OK to think about these things in terms of probabilistic systems, but it's much simpler than that: you either believe in democracy and freedom of speech or you don't. Twenty years from now, information will be a more valued resource than oil. In some industries, it already is. We have a choice: do we want to put the systems in place now to make sure we all own it, or do we actively fight against a system that seems counter-intuitive, thereby putting the ball back into the court of a very small group of people?
The Internet needs Wikipedia and sites like it. It needs information to be free and editable by anybody. To fail to work out the very small glitches and protect assets from the attacks predicted by game theory would be to plan to lose to the Murdochs, the Turners, the Rumsfelds of this World. Simple as that.
Posted by: Paul Robinson | December 19, 2005 at 05:33 AM
re: "the *entire encyclopedia* is probabilistic."
Doesn't this ignore the way users access the content on Wikipedia? Sure, some scholars may browse subject areas, and therefore the greater content is probabilistic - but most folk engage in hit-and-run activity. Quick in, quick out. This is the age of attention deficit - we want single entries now, not subject areas or entire encyclopedias. Wikipedia has broken our trust in single entries of content (and isn't doing much to rebuild it, to be honest) - and this could be Wikipedia's downfall.
Posted by: Piers Fawkes | December 19, 2005 at 06:08 AM
"Twenty years from now, information will be a more valued resource than oil"
This ignores supply and demand. In twenty years we'll be saturated in information and thirsting for oil
Posted by: robbie | December 19, 2005 at 06:57 AM
Actually, I couldn't disagree more with your observation. All biological systems are emergent, including ourselves. If you notice, the folks having the problem with these types of systems are scientists, engineers or business folk. These people have been trained since their formative years to think in a quite unnatural way when dealing with the world. The world does not follow a simple set of linear equations that can be pulled from your typical college textbook; it is emergent. Yes, they may be using a mathematical technique to exploit emergent properties in an information space, but that doesn't make it any less emergent. For most of us, it actually feels right already. It's most of you who it feels wrong for, with "you" being the scientists, academics, professionals, etc.
Posted by: Tony Mendoza | December 19, 2005 at 07:15 AM
I always think the best example of "Wisdom of Crowds" is the "Ask the audience" part of Who Wants to be a Millionaire. The crowd is almost never wrong. The people who don't know the answer make a random guess, but all the random guesses cancel each other out and you're left with the people who really DO know the answer.
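Andrew's cancellation argument simulates nicely. The audience size, the share of people who actually know the answer, and the four answer options below are assumptions for illustration, not data from the show:

```python
import random
from collections import Counter

random.seed(1)

def audience_poll(n=100, knowers=0.3, options=4):
    """One 'Ask the Audience' poll: option 0 is correct; a minority knows it,
    and everyone else guesses uniformly at random."""
    votes = Counter()
    for _ in range(n):
        if random.random() < knowers:
            votes[0] += 1                        # a knower votes correctly
        else:
            votes[random.randrange(options)] += 1  # a guesser votes at random
    return votes.most_common(1)[0][0]            # the plurality winner

wins = sum(audience_poll() == 0 for _ in range(1_000))
print(f"correct answer wins {wins} of 1000 polls")
```

Even with only 30% of the audience knowing the answer, the guessers spread themselves roughly evenly across all four options, so the correct one nearly always tops the poll.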
Posted by: Andrew Thomas | December 19, 2005 at 07:44 AM
Umm, all you seem to be saying is that these systems are built to be mostly right, most of the time, and we strange, weird primitives don't "GET IT" when we are bothered that they're notably wrong many times.
That's a comprehensible view - but not necessarily an easily defensible view!
Posted by: Seth Finkelstein | December 19, 2005 at 09:44 AM
Piers: I can't speak for anyone, but I find that due to the interlinked structure of Wikipedia, it's rare that I view only a single entry. Typically, I surf broadly related entries for a half-hour or more, absorbing information on a variety of topics.
Obviously, this is pure, not applied, research. But is Wikipedia really the tool for applied research anyways?
Posted by: Mike Purvis | December 19, 2005 at 11:04 AM
The "Rumsfelds of the world"? Is he a media mogul now, too?
Posted by: Nate | December 19, 2005 at 11:40 AM
Re: everyone editing Wikipedia all the time. It should not be necessary to edit a topic more than *once* or to monitor it constantly. The "technical fixes" should take care of all that. Once information is entered it should be preserved, not hidden away under "changes" where a casual reader may not see it. If each entry could be made probabilistic as well as the whole site, it would increase the value of each entry. The value of the entry is what most visitors will be interested in, especially in our consumer-based society.
Posted by: Daniel | December 19, 2005 at 12:01 PM
For those of you who don't trust Wikipedia, I pose the question, "Do you trust Britannica?" If so, check out this link. Seems there are errors either way. The search for truth is an endless quest.
Posted by: Trent | December 19, 2005 at 02:02 PM
Wikipedia is probabilistically successful even if you hit and run. The question is "Given a query, what is the chance that you will get an answer, and that it will be correct?" With Britannica, the latter half of that question is a bit higher, but the first half is much lower. Getting no answer at all is a failure, too. Overall, your odds of getting useful information are higher on wikipedia.
I do think it could do with better aggregation. There are plenty of experiments out there, wikipedia's just one of them...we'll get there.
Posted by: joe | December 19, 2005 at 02:25 PM
I trust that someone is accountable for mistakes in Britannica. That may be misguided too. But it's also why we have defamation law. Wikipedia is kind of defamation-proof. Sure, if someone complains, the offending publication will be removed. But in at least some cases, the damage is already done, and the distributed nature of the WP makes it difficult or impossible to hold anyone accountable (particularly given the ease with which people can post anonymously). Conversely, if Britannica did the same thing, they'd face a defamation suit. This, I would imagine, is a pretty profound incentive to err on the side of not publishing untruths or information damaging to someone's reputation.
I like WP quite a bit myself. It's a very nice way to interface with info to the extent the info is accurate. I love being able to read one article and then drill down on a term by clicking on a link. That's a really great way to explore.
But I think the WP should do a better job making clear to the users the inherent limitations of the WP at the micro level (i.e., there's a pretty good chance that any given article could be wrong in a pretty major way).
I'm one of those over educated academic/professional people someone was complaining about above. But from time to time, I teach college students. The limitations of the WP are not at all obvious to them. They just want the easiest path to getting an answer (or at least the feeling of getting the answer), regardless of whether the answer is accurate. Clearly, it's the job of teachers to help educate students about the limitations of things like the WP, but it sure would help if the WP folks were a bit more forthright with the user about the WP limitations.
WP does have a disclaimer. But you must click an 8 point type link below the fold at the bottom of the page to get to it. How many people ever click on links like that? Not many.
Instead, I think each article should begin with some language like this followed by a link to the longer disclaimer:
"WIKIPEDIA IS A PLACE TO START RESEARCH, NOT A PLACE TO FINISH IT. THE WIKIPEDIA COMMUNITY DOES ITS BEST TO POLICE THE ACCURACY OF THE INFORMATION HERE. BUT BECAUSE WIKIPEDIA ALLOWS ANONYMOUS CONTRIBUTORS, NO INDIVIDUAL OR INSTITUTION IS LEGALLY ACCOUNTABLE FOR THE ACCURACY OF THIS INFORMATION. THEREFORE, THIS INFORMATION IS PRESENTED "AS IS," WITH NO WARRANTY TO ITS ACCURACY, AND THE BEST PRACTICE IS TO CHECK WIKIPEDIA ENTRIES AGAINST OTHER MORE EASILY VERIFIABLE SOURCES."
Posted by: j-lon | December 19, 2005 at 02:48 PM
With respect, I don't buy the idea that the human mind can't handle the notion of a micro/macro tradeoff. In point of fact, the human brain is BUILT to discard information at the microscale and produce decent average results at the macroscale.
A simple case in point is the concept of temperature. There's no such thing at the microscale, in this case meaning atomic scale. Temperature is an aggregate property of the average motion of huge numbers of atoms, not the instantaneous, or even long-term-average, motion of a single atom.
Even at the macroscale, human perception of temperature involves more loss of low-level precision. Most people can't tell you how the temperature at their elbows compares to the temperature at their knees, let alone how much signal they're getting from a single, specific nerve. Nor can most people give you a precise statement of the absolute temperature around them at the moment.
It's hard to find a part of the human information-processing system that doesn't characterize information and throw away the detail before passing the message up to the next level of processing, in fact.
IMO, the real trouble is that people want to believe that every problem has a simple, easily-stated, one-size-fits-all solution that will always provide good answers. No such solution exists, or ever has, but in time, people get used to the inaccuracies of whatever system is in use at the time, and learn to ignore them.
We discount the fact that many specific news stories about, say, atrocities in the Superdome following Katrina, were completely inaccurate, because we believe that on average the mechanism of news production gives reasonably good results.
People have trouble with Google and such because they haven't had time to develop a blind spot that lets them ignore the erroneous results, and go back to their comfortable assumption that the system is Platonically perfect.
Posted by: Mike Stone | December 19, 2005 at 06:43 PM
If Wikipedia is a "place to start", that "shouldn't be cited" and is beneficial for the ability to "surf a bunch of interrelated topics through links to get a quick overview", built by anonymous contributors who can't be checked for authority, how is it any different from the Web with Google?
Also, does the success/quality of Wikipedia require that there be only one Wikipedia? If there is more than one Wikipedia, doesn't that make it harder for each individual article to have the many eyes necessary to improve quality? If so, who gets to decide which Wikipedia is the one?
Posted by: jelons17 | December 19, 2005 at 08:02 PM
"Given a query, what is the chance that you will get an answer, and that it will be correct?"
Wiki does much better at this than one would initially assume because queries aren't randomly distributed through wiki-space -- people share common interests. Queries cluster. The more likely it is that you are interested in a particular topic, the more likely it is that other people were interested too. Interested enough to create, modify, and watchlist that topic.
Thus, Wikipedia could easily be >99% accurate (measured as percentage of accurate answers returned) even if half the articles in the database were complete nonsense, so long as the /right/ articles are in the accurate half. The important question is whether the articles being given the most attention are the ones people care most about the answer to. Which is where the probability comes in.
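Glen's point, that clustered queries can make overall accuracy far higher than per-article accuracy, can be sketched with a Zipf-shaped query distribution. The article count, the Zipf exponent of 1, and the assumption that the accurate articles are exactly the most-queried half (his best case) are all illustrative:

```python
# Assumed: query popularity follows Zipf(1) over articles, and the
# accurate half of the database is exactly the popular half.
N = 100_000                                      # articles (arbitrary)
weights = [1 / r for r in range(1, N + 1)]       # popularity of rank-r article
total = sum(weights)
accurate_mass = sum(weights[: N // 2]) / total   # query share hitting accurate half

print(f"share of queries answered accurately: {accurate_mass:.3f}")
```

Under these assumptions, roughly 94% of queries land on a good article even though half the database is nonsense; sharper clustering of attention (or a steeper popularity curve) pushes that figure higher still.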
Posted by: Glen Raphael | December 19, 2005 at 11:10 PM
Ok, two last points.
What I've been trying to say is that WP does not provide enough of a filter. Information gets in too easily. "Correct" information almost always has a higher signal strength than incorrect information, so raising the bar should not damage WP.
Soundbite: With each additional user WP gets less reliable and Google gets more so. (And the evidence of WP's co-founder Wales editing his own bio should make my point quite clearly)
And on Daniel's point, he's right. Correct information shouldn't have to be constantly guarded. Vigilance is a high-cost activity and I have better things to do with my time than constantly watch out for people editing the article about me, or mentioning me in other articles.
And as a last, third point, Paul Robinson (above) is full of crap. This is not a philosophical war. This is a straight-up social engineering question of how information is processed within a society, how information is filtered, and how decisions are made. Some systems are better than others at different kinds of tasks. The only "War" is the war to improve Wikipedia.
Posted by: Brock | December 20, 2005 at 02:11 AM
Re: . It's most of you who it feels wrong for, with "you" being the scientists, academics, professionals, etc.
Posted by: Tony Mendoza | December 19, 2005 at 07:15 AM
-----
There is a reason scientists would have a concern about this. There are "laws" of nature that are immutable as far as we are concerned. We can analyze a process, and if done the same way every time we will get the same results. This ONLY applies to the real, observable science fields (mathematics, physics, etc) - not to fields like archaeology, anthropology, etc where people see what they want to see. There is truth, and there are embellishments of it, retractions from it, etc. It only takes a person with an "agenda" to put their slant on the information to make it "tainted." Same applies for standard encyclopedias.
Posted by: Eric Johnson | December 20, 2005 at 11:18 AM
What if filters worked together on an aggregation platform focused on the specific content the filters cared about, and completely eliminated the clutter found around it? What if filters could start aggregating the specific content they like from other sites? What if there was a web search engine that did not contain web pages, but instead contained only the specific stuff a user wanted from any given page? What if the scalability of filters was infinite and their actions over time created a social engine of purely filtered content? We are attempting all of this and more at clipmarks.com. Click my name to see what I am filtering...
Posted by: Adam Moskowitz | December 20, 2005 at 11:53 AM
I don't buy the "not wired" argument, that seems more a matter of how we're trained to think. Many people are quite good at figuring out things intuitively, and I'd be inclined to guess that what we call intuition is an innate capacity for sorting out imprecise probabilities without consciously working out special algorithms.
And evolution? That's not a question of probabilities confusing people, that's a question of refusing to accept notions that counter what you've been raised to think.
Posted by: Pablo | December 22, 2005 at 09:29 AM
Article prompted by this post: An Introduction to Connective Knowledge
-- Stephen
Posted by: Stephen Downes | December 22, 2005 at 09:29 AM
I think Chris' point with Wikipedia was to illustrate how people often think in binary terms - things are 'bad' or 'good'. Further, they make these determinations by examining the exceptions to the rule rather than the rule itself. For instance, the media recently publicized a few very egregious examples of incorrect entries on Wikipedia, which caused a lot of people to doubt the 'referencability' of the entire body of work (see Nicholas Carr's Rough Type for an illustration). However, this type of thinking ignores the fact that Wikipedia is as accurate as other historically accepted reference materials *on average*.
People who dismiss Wikipedia because of a few instances of bad entries are missing out on the rest of the material, which may be as good as any other source. This kind of thinking can be limiting, whether for personal research or for building enormous businesses (Google).
Think about how much Google was denigrated when it IPO'd. People just don't get it. Why? Because they thought Google was a "dot-com". They couldn't appreciate the fact that conditions may have changed that would allow the company to leverage distributed information using an innovative business model as many smart people had predicted.
I think Wikipedia has an opportunity to adjust the system to improve its accuracy. And if it does, then it goes from being widely disputed to the global standard. Wikipedia may undergo a similar transformation to Google's: a little tweak could make all the difference.
In other words, I think it's probable that Wikipedia's short-comings can be addressed.
;)
Posted by: Jake Kaldenbaugh | December 22, 2005 at 01:21 PM
I keep forgetting about Wikipedia; I just hit the homepage button and I am on Google. But I will change that.
Posted by: Marko | December 22, 2005 at 11:10 PM
An insightful piece, and well placed in context. Although I reject your premise, based solely on the fact that known data about mammalian brains shows operation based on noise reduction and recursion. The systems you mention, such as Google and Wikipedia, currently work as a magnifying glass on the output side and as a massively backpropagating neural net on the input side; in each case there is random input and well-filtered output. Obviously the number of calculations and the bias/dampening on the whole are far less than those of a rodent's brain at this stage; the point is the interface: we are all looking through ACME-brand scale systems.
Given Bayes Theorem applied to a multi-user state enabled system like Wikipedia you will find that the difference in the ratio of input to output quality is decreasing but the interface is still the same. Wikipedia view models have a much closer correlation to input then Google (hence the scheduled crawl/whatnaught). Given that Google bases results on known indexed web pages, video, and other items, Wikipedia is very different and bases all results on directly written items. If the system continues at the pace it is currently there will be lengthy entries for the word "the" and the word "this". Disruptions in the quality of both systems benefit the back propagation and enhance the returned results in the long run en masse. Now we CRAWL soon we will WALK eventually they will run.
The human brain and most other animals do large amounts of filtering and pre-processing before it is deemed as conscious. The brain does absorb more audio input then consciously recognized, as well as tactile and so on. Signal degradation among the pathways in the Limbic system, Hypothalamus and such damper this causing less processing to be done in the Cerebrum. Google and Wikipedia operate in similar fashion but the interface is much smaller and the output is very generalized. A close example is the stop words you see at returned results from Google excluding words such as "the", "him", "it". Kurzweil identifies that this gap is shortening in the statement about "Law of accelerating returns".
Invoking Darwinism, I will agree that it is better to be on the side of the masses in this respect. How many editors revert or change Britannica entries on a daily basis? How many books has the librarian read, versus texts indexed by Google? But in Britannica's defense, how many similar methods for building such volumes existed before it?
I will agree that my statement is pure bullshit. We are just building stuff off the only thing we know which is "us". At the current state we haven't hit the level of "cave art" yet.
Posted by: cwolf | December 22, 2005 at 11:20 PM
Speaking of Wikipedia and Google,
"They're designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale."
But, what if it matters that your use of the information is "right" or "wrong"? When your use of information from these sources is judged by others who "know" the information, it does matter whether one micro-fact is correct or not!
Posted by: Larry Irons | December 26, 2005 at 09:11 AM
"When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out for such things as accuracy."
That's the same old "Crazy Yenta Gossip Line" argument that Harlan Ellison was on about ten years ago.
It's still wrong. Journalists aren't any smarter than the rest of us, and every time I've run across a story about something I'm an expert in, they've gotten it so wrong that these days I treat them as no more reliable than blogs. If they can't get it right when I can tell they're wrong, how can I trust them to get it right when I don't know better?
See my response to Ellison by following the link...
Posted by: Peter da Silva | December 27, 2005 at 12:17 PM
Wikipedia is so inaccurate that its writers are referred to as wikiling writers. Another problem that the article above overlooks is that wikiling administrators actually ban writers who post correct info that is disliked by the administrators. That skews all "accuracy" discussions. http://rexcurry.net/wikipedialies.html
Recently, someone who posts to Wikipedia wised up and improved the "Roman salute" article somewhat so that it recognizes and repeats some of Dr. Rex Curry's discoveries. http://rexcurry.net/wikipedia-lies.html
Wikiling writers cover up new discoveries by Professor Curry that the salute of the horrid National Socialist German Workers' Party originated from the USA's Pledge of Allegiance. http://rexcurry.net/book1a1contents-pledge.html
Wikiling writers cover up Dr. Curry's discovery that although the swastika was an ancient symbol, it was also used sometimes by German National Socialists to represent "S" letters for their "socialism." Hitler altered his own signature to use the same stylized "S" letter for "socialist" and similar alphabetic symbolism still shows on Volkswagens. http://rexcurry.net/book1a1contents-swastika.html
Posted by: Pointer Institute for Media Studies | December 27, 2005 at 01:30 PM
Chris,
The claim that humans don't get statistics is something that we should be able to verify, perhaps by looking at 'experimental economics' and Kahneman's work. Perhaps that is evidence you could marshal for the claim.
The Wikipedia vs. Britannica debate is a funny one. Surely the point is that both Wikipedia and Britannica are jumping-off points, unless you are checking the really, really trivial.
In most cases, Wikipedia is a 'good enough' jumping-off point, but anyone doing anything more detailed may want to be furnished with longer bibliographies (of both journals and books), which EB may (or may not) help with.
Even for canonical resources, there is revisionism and post-revisionism.
Posted by: azeem | December 28, 2005 at 04:49 AM
The problem with Wikipedia is exactly that at a microscopic scale it's often wrong. As someone pointed out above, there is no built-in aggregator to display group wisdom; you merely see the last edit at the point in time you read it. Further, it's often (perhaps always?) worse to 'know' something that is wrong than to not know anything (the so-called failing of Britannica).
For example, I'm a mathematician. I was reading a topology book and wanted examples of a particular idea. I looked on Wikipedia and found such examples; unfortunately, the five examples listed were entirely wrong. I had the ability to recognize this, but in general most of the audience for the entry wouldn't (and hence wouldn't have needed to look at it in the first place). In mathematics, knowing something wrong is definitely worse than not knowing anything; hence, the fact that Wikipedia isn't trustworthy means it's close to useless for many subject fields in which correctness is paramount.
Worse yet, I annotated the five examples with a note explaining how they were wrong, said that I didn't have time to fix them, then gave one correct example and a dividing line (a horizontal rule) to offset it. Someone then came along and removed the horizontal rule, so there was a note saying "the following examples are all wrong for blah blah blah reasons, but here is a correct example" with no divider between the correct example and the incorrect ones. The article remained this way for another month or so, IIRC. Hence, for subjects such as mathematics where there is a distinct right and wrong, MathWorld is far more useful than Wikipedia will ever be, until they find authorities and start locking articles.
earl
Posted by: Earl | December 28, 2005 at 01:32 PM
Earl,
You raise an interesting point when you say "As someone pointed out above, there is no built in aggregator to display group wisdom; you merely see the last edit at the given point in time you read it."
But I think there is, in fact, a cumulative effect that comes from the "community of ownership" formed by the contributors. If someone takes the trouble to edit an entry, they're more likely to put it on their watchlist and take an interest in its further development. Thus, over time, more contributors means more people invested in improving and protecting the quality of the entry.
That's why entries tend to get better, not worse, as time goes on.
Posted by: chris anderson | December 28, 2005 at 01:52 PM
Chris,
You're wrong -- you still only see a snapshot. Now, community interest may (though often doesn't) mean that errors in Wikipedia are corrected. Nonetheless, the key difference is that while Google *always* displays aggregated community knowledge, Wikipedia *always* displays some instance of one person's edit. Thus Google always approximates correctness, while Wikipedia is often quite wrong. And that, of course, is the reason Wikipedia isn't at all useful for finding correct facts -- you may well visit during one of the periods of incorrectness, and these periods last a highly variable (and potentially quite long) amount of time.
earl
Posted by: Earl | December 28, 2005 at 02:46 PM
Earl,
Well, of course it's just a snapshot. But my point was that as the community of ownership grows, the change delta between snapshots and the average length of time to correct errors will shrink. No guarantees for any particular moment, but over time the entry becomes statistically more likely to be accurate.
Posted by: chris anderson | December 28, 2005 at 02:54 PM
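[Editor's aside: Chris's statistical claim above -- that a larger community of watchers shrinks the time errors survive, making an entry accurate a greater fraction of the time -- can be sketched as a toy simulation. All parameters here are hypothetical illustrations, not measurements of Wikipedia: errors appear at a fixed rate, and each watcher independently has a small per-step chance of fixing an existing error.]

```python
import random

def fraction_accurate(watchers, steps=100_000, p_err=0.02, p_fix=0.005, seed=1):
    """Toy model of one article over discrete time steps.

    When the article is clean, an error appears with probability p_err.
    When it has an error, each watcher independently fixes it with
    probability p_fix per step. Returns the fraction of steps the
    article spent error-free.
    """
    rng = random.Random(seed)
    has_error = False
    clean_steps = 0
    for _ in range(steps):
        if not has_error:
            clean_steps += 1
            if rng.random() < p_err:
                has_error = True
        else:
            # Probability that at least one of the watchers fixes it this step
            if rng.random() < 1 - (1 - p_fix) ** watchers:
                has_error = False
    return clean_steps / steps

for w in (1, 10, 100):
    print(f"{w:>3} watchers -> accurate {fraction_accurate(w):.0%} of the time")
```

With these made-up rates, the error-free fraction rises steeply with the number of watchers, which is the shape of the argument: no guarantee at any particular snapshot, but a statistically better article over time.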
If you keep thinking in terms of probability instead of simple either/or logic, another issue comes up which REALLY unnerves some people:
Countless other universes.
I know, I know: the existence of other universes (with their own natural laws, energies, matter, life and other things) cannot be proved with the old either/or logic.
But in terms of probability, the idea that ONLY this universe exists, and still has intelligent life in it, is absurdly improbable.
The "Shmintelligent Design" crowd loves to point this out as "proof" of God's objective existence... but they hate the idea of many other universes because it annihilates the ID argument: if our cosmos is one of (infinitely) many, our existence is not just likely -- it's inevitable.
Mention the probability argument for the existence of other universes at a party, and watch the other guests explode: "YOU CAN'T PROVE THAT!!"
You can of course reply: "I can't logically prove that other people really exist as thinking beings and not as automatons, but probability logic argues that they DO exist. In your case, though, I'll make an exception..."
Then run -- fast.
;)
Posted by: A.R.Yngve | January 12, 2006 at 05:35 AM
I just wanted to take a moment to thank you for writing your article, The Probabilistic Age. I read it about a month ago and it really expanded my thinking about how to operate a content-based Web site in the age of the blog.
I run the American Marketing Association's Web site at http://www.marketingpower.com. I've been aware of blogs for a while, but I didn't really get the concept.
Based on some of the ideas I found in your article, I'm developing a new content strategy for the site that ties into all the blog and audience generated content activity.
I've even started my own blog -- all of one day ago -- to work through the ideas, Little Wolf (http://littlewolfpack.blogspot.com).
You have a fascinating site, thanks again!
James Heckman
Posted by: James Heckman | January 27, 2006 at 06:31 PM