As always, the gnarly questions that come up when I give a speech are the best part of the experience. On Thursday I spoke at The Media Center's Emerging Technology conference at Stanford University, where I was asked a toughie: could bad actors drown the Tail in a tide of spam and other robotic graft?
The fear, to put it simply, is that the Tail is fragile and that niche quality can easily be overwhelmed by industrial-scale automated marketing, fraud and vandalism. After all, the signal-to-noise ratio of the Tail is already low; could digital graffiti push it to insignificance? The fall of the once-great Usenet, which collapsed under the weight of autospam and trolls, is the cautionary example. Will the same happen to the blogosphere and all the trust networks and other recommendation mechanisms that we use to pull diamonds from the rough?
I think the answer is no. At the conference I said, controversially, that I felt a combination of technologies had finally come together to turn the corner on the email spam problem and I didn't see why the same approach wouldn't work for comment spam and other automated web scams. In short, I think the good guys, backed by smart technology, will win in the end. (Ross Mayfield did a fine job covering the talk here.)
My confidence comes primarily from seeing the spam problem, which was until recently deemed an out-of-control epidemic, get reduced to a very manageable two-or-three-a-day annoyance in my own case. I use four spam filters, two on the server and two on the client, and the result is the graph above, taken from just one day. Although more spam than real email is sent to me, almost all of it is filtered out with virtually no false positives.
(Note: I know I may not be typical, but I do think I'm not a total outlier either. I have a public email address, so I get a good bit of spam. And of those four filters, three are automatically provided by my company, ISP or email software. I only pay for the fourth. In other words, most people's problems are not as bad as mine and their solutions are equally within reach, so their experience shouldn't be much worse than my own.)
I call this multiple-filter approach a "cocktail therapy", intentionally invoking the drug combinations that have shown such success in fighting AIDS. The best strategy is to attack simultaneously on many fronts.
My spam-blockers are a combination of Bayesian filters and networked blacklists. Although they all play a role, I think the most powerful among them is Cloudmark's SafetyBar, which uses collaborative filtering. Its client software fingerprints email that you identify as spam and sends that fingerprint to all the other Cloudmark clients out there. If enough people with enough good track records call something spam, it probably is. The client software automatically moves a blacklisted email into the spam folder for everyone else, saving them the trouble of clicking on it themselves. (If someone mistakenly calls something spam and others reverse it, the mistaken user's rating will go down, reducing their influence until they earn enough confirmatory credit again.) This is The Wisdom of Crowds at work, market forces applied to fight automated crime.
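To make the mechanics concrete, here is a minimal sketch of reputation-weighted collaborative filtering as described above. It is not Cloudmark's actual algorithm, which isn't detailed here: the `SharedBlacklist` class, the threshold of 2.0, the halving penalty, and the exact-match SHA-256 fingerprint are all assumptions made for illustration. A real system would presumably use a fuzzier fingerprint that survives small changes to the message.

```python
import hashlib
from collections import defaultdict

SPAM_THRESHOLD = 2.0  # hypothetical: combined reporter trust needed to blacklist
PENALTY = 0.5         # hypothetical: trust multiplier after a reversed report


def fingerprint(message: str) -> str:
    """Hash a case- and whitespace-normalized message (exact match only)."""
    normalized = " ".join(message.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedBlacklist:
    """Toy model of a shared, reputation-weighted spam blacklist."""

    def __init__(self):
        self.trust = defaultdict(lambda: 1.0)  # reporter -> trust weight
        self.reports = defaultdict(set)        # fingerprint -> reporters

    def report_spam(self, user: str, message: str) -> None:
        """Record that `user` flagged `message` as spam."""
        self.reports[fingerprint(message)].add(user)

    def score(self, message: str) -> float:
        """Sum the trust of everyone who flagged this message."""
        reporters = self.reports.get(fingerprint(message), set())
        return sum(self.trust[user] for user in reporters)

    def is_spam(self, message: str) -> bool:
        return self.score(message) >= SPAM_THRESHOLD

    def reverse(self, message: str) -> None:
        """The verdict was overturned: clear reports, penalize reporters."""
        for user in self.reports.pop(fingerprint(message), set()):
            self.trust[user] *= PENALTY


blacklist = SharedBlacklist()
blacklist.report_spam("alice", "Buy  CHEAP pills now")
blacklist.report_spam("bob", "buy cheap pills NOW")
print(blacklist.is_spam("Buy cheap pills now"))
```

The design point is that a report counts for only as much as its sender's track record, so a spammer who floods the system with bogus votes loses influence as those votes get reversed.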
I think we can combat the problem of comment and trackback spam using the same forces--democracy and shared information. The computers that are depositing links to porn sites in my comments and trackbacks (now all deleted, albeit by hand) are doing the same at other blogs. Yet the implicit data contained in my deletions is lost if it isn't shared with those other sites. A little bit of software can make all the difference, recording the thumbs-down "votes" of vigilant bloggers pruning their comments of spam and automatically passing that information on to other blogs, where the deletions can happen with no human intervention.
There are, to be sure, differences between email and comment spam. For starters, most comment spam occurs in the parts of the web only visited by machines: dusty archives patrolled just by spiders and, unlike your own inbox, rarely seen by humans. Furthermore, Google has added financial fuel to the fire by making links, even if nobody follows them, essentially free money, which is more than can be said for most herbal Viagra email spams.
But the similarities are even more striking. Already, an open source project called MT Blacklist (MT=Movable Type, the blogging software from SixApart that underlies the TypePad hosting services that I use, too) is showing how collaborative comment-spam filtering might work. It's still pretty crude and not yet automatic, but Jay Allen, its creator, describes the promising future of the project here. Another advance is Google's "nofollow" link attribute (rel="nofollow"), which allows bloggers to deprive comment spammers of Google juice.
The low-hanging fruit for this is hosted blogging services such as TypePad, where everything is done on a central server and the information about who-is-deleting-what can be easily shared. I was pleased to hear that the Cloudmark and SixApart folks have met to talk about this. "We're going to be adapting the lessons of the email spam world to comment spam," Michael Sippey, SixApart's VP for Products, told me. "We're looking at ways to include community feedback mechanisms into that filtering process." No announcements yet, but I'd be surprised if within the year my trackback and comment spam wasn't automagically cleaned up as rapidly as my inbox now is.
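As an aside on the Google link attribute mentioned above: the idea is that blog software stamps rel="nofollow" on every link inside a reader comment, so search engines give the link no ranking credit. Below is a rough sketch of how that rewriting might look, using only Python's standard library. The `NofollowRewriter` class and `add_nofollow` function are hypothetical names, not part of Movable Type, TypePad, or any real blogging package, and the sketch silently drops HTML comments and doctype declarations.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit HTML, forcing rel="nofollow" onto every <a> tag."""

    def __init__(self):
        # Keep entity references intact so text round-trips unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _render(self, tag, attrs, closing=">"):
        if tag == "a":
            # Replace any existing rel value rather than trusting it.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._render(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._render(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" on all links."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


print(add_nofollow('Nice post! <a href="http://example.com/">pills</a>'))
```

This only removes the search-ranking payoff. The spam itself still lands in the comments, which is why the shared-deletion approach above still matters.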
I think Thunderbird warrants a mention here. It has free spam blocking, which is good but has to be trained. If the Mozilla Foundation could set up a system where marking something as spam would send information to their server to be downloaded by others, it may make a real difference. Microsoft is doing something similar with their Anti-Spyware beta.
Posted by: Kirk | February 14, 2005 at 11:03 AM
I'm not sure I know what the question means. Does it mean anything in the context of eBay, Netflix, Amazon, abeBooks? Does it mean anything in the context of the wealth distribution? Or is this just about some important subset of the systems that leverage long tail dynamics?
One of the key features of the long tail systems is that the tail is full of junk - stuff that by most measures doesn't rise above some quality bar or another. So asking if bad actors can pollute it seems a bit strange.
Posted by: Ben Hyde | February 14, 2005 at 12:02 PM
I'm not convinced that the email spam issue has been 'solved' at all; maybe it's just paranoia, but I'm beginning to see this as the calm before the storm. Hotmail & Gmail's spam filters have been growing progressively worse over time. A lot of spam I've seen lately appears to be chunked with random text (physics texts are popular for some reason) which can only be an attempt to degrade the effect of the filters.
Posted by: Alex Dante | February 14, 2005 at 03:39 PM
I always wonder what the point of that is. Whenever I get an e-mail with a bunch of nonsense words in the subject, even if it makes it through my filters, I just delete it without opening it. Are spammers sending these things out specifically to mess with the filtering, or do they actually expect people to open their e-mails?
Posted by: David | February 14, 2005 at 04:08 PM
I dunno. I think you would actually want a lot more spam filters with more false positives, so that it'll throw out real mail and make that graph look more like the Long Tail. :)
Posted by: fling93 | February 14, 2005 at 07:09 PM
I asked that question, and while your post here is more well-thought-out than the on-your-feet response you gave at the conference, I still don't buy that e-mail spam is conquered. And I don't think recommendation spam will be easily defeated.
If/when spammers realize recommendations are the key to navigating the long tail, they're going to be pulling out all of the stops to influence that navigation. MT Blacklist is a cool application and it's serving me pretty well, but I still have to wade through and delete/report a lot of crap each day. There also are setup problems that I'm still fighting through. This isn't turnkey.
And remember: It's the people who aren't using Blacklist and who own blogs packed with comment spam that encourage/enable this spamming.
While I think your argument has merit and I hope it goes a long way toward solving the problem, I wouldn't underestimate how cunning and innovative the spammers can be ...
Posted by: Benz | February 15, 2005 at 10:54 AM
Only to say that, educated and literate as I am, I can't make heads or tails--hah!--of what you're talking about on your blog here. "Gnarly" I know but "robotic graft" and "signal-to-noise-ratio!" Totally foreign. And I got here from some nice parenting blog. How could it be!? Good luck.
Posted by: Eve | February 20, 2005 at 10:33 PM
I generally agree with your answer, but I'm still struggling with the premise of the question. Doesn't it turn on the definition of "spam"? By definition, a lot of content in the tail is irrelevant to the majority of consumers. We might call that spam, or we might just call it content that appeals to a minority interest. I've never been clear on the difference. Eric.
Posted by: Eric Goldman | February 21, 2005 at 01:37 PM
Eric, it seems to me that the difference between spam and other irrelevant content is that the latter sits out there on the web, and the spam ends up in your mailbox. Likewise w/ programs (rather than people) that deposit comments on blogs; it's not just irrelevant content, it's delivered to your door in bulk.
And Eve, I think any good parenting blog should address the issue of signal-to-noise ratio.
Posted by: David Palmer | February 21, 2005 at 08:51 PM
David, not sure the push/pull distinction works--especially in the day of RSS readers, email alerts like Google News alerts, etc. In the end, the problem is wanted v. unwanted; the medium we use to get there seems irrelevant to me. Eric.
Posted by: Eric Goldman | February 23, 2005 at 12:19 PM
I'll second Bad Behavior. I installed it, as well as reCAPTCHA, after my blog was knocked out by a massive attack of comment spam (more than 20K an hour at the height of it) and it's done an amazing job of picking the spammers off. Only a few get through and Akismet gets those and, unfortunately, Howard's comments, but I no longer have to wade through pages and pages of comment spam to find the occasional false negative.
Posted by: xmas gifts | November 09, 2009 at 03:08 AM