There's been some interesting discussion in the comments about what
kind of filtering works best and under which conditions. Reading it, it
occurred to me that I haven't seen a good taxonomy of filter types,
which would probably help tremendously. (Yes, I know, taxonomies
are so last century; feel free to collectively remix this into a much
more modern folksonomy out there on your own blogs). I'll lay out a first pass at a family tree here and then revise based on feedback.
There are, as I see it, two main categories of filters (or, to be precise, "post-filters"): Software and People, with several subtypes and loads of variation in the
wild. In the following construction, I'm probably missing some sub-species (or an
entire genus or two), so please offer suggestions and corrections in
the comments.
The Two Families of Filters:
- Software (recommendations)
- Wisdom of crowds (algorithms that measure collective opinion or behavior)
- Buzz: Google PageRank, del.icio.us, Technorati (see the sketch after this list)
- Ratings (active*): Netflix recommendations, eBay ratings
- Behavior (passive**): Amazon recommendations, Audioscrobbler, Yahoo Music custom radio stations
- AI: Semantic analysis, audio analysis
- People (tastemakers, influentials)
- Pros (editors): Critics, celebrities, Rhapsody category editors, librarians, SavageBeast (Pandora) musicologists (+)
- Amateurs (mavens): Blogs, playlists, customer reviews, Wikipedia contributors, smart friends (++)
- Mobs (distributed intelligence): taggers, linkers, file-traders (+++)
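Most of the "buzz" measures above boil down to one idea: count links or votes, weighted by the importance of whoever is doing the linking. Here is a minimal power-iteration sketch in the spirit of PageRank; the toy link graph is invented, and the 0.85 damping factor is just the commonly published default, not a claim about Google's production system.

```python
# Minimal PageRank-style power iteration over a toy link graph.
# The graph and the 0.85 damping factor are illustrative assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a base share, plus a share of the rank of
        # every page that links to it.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue  # dangling pages simply leak rank in this sketch
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# "c" is the buzzed-about page: three pages point to it, so it ranks highest.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(toy_web))
```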
-------
Notes:
* "Active" filtering: These are systems that ask you if you like or dislike things, so they can offer you better choices in the future. Because they require users to make an effort to participate, typically by clicking on some rating system, they're vulnerable to sampling bias. That can range from who participates to how they choose to interpret their choices, be it thumbs up and thumbs down or one to five stars..
** "Passive" filtering: Also known as collaborative filtering,
this is software that tracks the behavior and actions of lots of
people and extracts meaning and recommendations from that. Amazon's
"people like you" recommendations are the best-known example, but
hundreds of other services, from Netflix to Yahoo, do the same. The
advantage of passive filtering is that it's based on what people do,
not what they say. As such, it tends to avoid the participation bias
mentioned above.
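To make the passive approach concrete, here is a minimal "people who bought X also bought Y" sketch based on simple co-occurrence counting; the purchase histories are invented, and real systems, Amazon's included, are far more sophisticated about popularity normalization and scale.

```python
from collections import Counter

# Minimal item-to-item collaborative filtering: recommend whatever
# co-occurs most often with an item across purchase histories.
# The baskets below are invented examples.

histories = [
    {"trumpet", "jazz-cd", "metronome"},
    {"jazz-cd", "metronome", "music-stand"},
    {"trumpet", "jazz-cd"},
    {"novel", "bookmark"},
]

def also_bought(item, histories, top_n=3):
    """Count how often other items appear in the same basket as `item`."""
    co_counts = Counter()
    for basket in histories:
        if item in basket:
            co_counts.update(basket - {item})
    return co_counts.most_common(top_n)

# e.g. [('metronome', 2), ('trumpet', 2), ('music-stand', 1)]
print(also_bought("jazz-cd", histories))
```

Note that nobody rated anything: the recommendation falls out of what people did, which is exactly why passive filtering dodges the participation bias.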
---
+ By "pros", I mean people typically working within some
service to help guide consumers, such as Rhapsody editors, although they could also be independent
domain experts who have built a business around their influence.
++ By "amateurs", I mean power consumers, people who are passionate and informed enough about a category to advise their peers.
+++ People who are part of a "mob" may not think of themselves as
offering recommendations or guidance at all. They're just doing what
they do. The fact that other people are following their example is a
consequence of network effects at work, not necessarily the intention
of the members.
"Active filtering" might be a poor phrase choice. The operative word Active is potentially confusing because anyone can easily make the obvious literary arguement that anything that is a filter is a filter; a "device that removes something from whatever passes through it." Filters are there for action and, because of this, activity is closely associated with filters.
I understand that the term "active filters" may originate in electrical engineering, where it is usually covered under the topic of electronic filters as a noise-reduction technique. Unless there is an intentional comparison to "Q" factors, I think "active filters" is an awkward phrase for typifying a kind of filter.
Electrical engineering and electronic filters are also a distinct form of filtering, separate from what you have listed. For instance, both MPEG Audio Layer-3 (MP3) and Ogg Vorbis (OGG) are lossy compression formats for packing audio data into storable files. In the process of compression, however, MP3s throw out treble information while OGGs throw out bass information, which interacts with the electronic filters used in acoustic engineering. The point I am making is that "Software" as a family of filters seems to exclude "Hardware", based on how "Software" is generally defined. Of course, any electronics guru will understand that the hardware is both electrical components and embedded-systems software. At the same time, the guru knows that electrical-engineering side effects in the hardware can alter the presentation of the software. These side effects are pronounced and powerfully demonstrated by the software itself: acoustics hardware engineered without sufficient bass boost will distort the presentation of OGG files, and likewise, insufficient treble will distort the presentation of MP3 files.
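To make the electronic-filter comparison concrete, here is a minimal software model of a first-order low-pass filter, the kind of stage that passes bass and attenuates treble; the cutoff frequency and sample rate are arbitrary illustrative values.

```python
import math

# Minimal one-pole (first-order) low-pass filter: bass passes through,
# treble is attenuated. Cutoff and sample rate are arbitrary values.

def low_pass(samples, cutoff_hz=1000.0, sample_rate=44100.0):
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # analog RC time constant
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)                    # smoothing coefficient
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)      # exponential smoothing
        out.append(prev)
    return out

# A 10 kHz (treble) tone comes out sharply attenuated by the 1 kHz cutoff.
treble = [math.sin(2 * math.pi * 10000 * t / 44100) for t in range(441)]
print(max(abs(s) for s in low_pass(treble)))  # well below the input's 1.0
```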
Additionally, being part of a "family of filters" means very little, doesn't it? It doesn't seem as though you have them grouped by technique, although your initial framing does suggest you might feel it's incomplete as well. Grouping filters by technique would probably be the preferred way, on account of the point made earlier that a filter is a "device that removes something from whatever passes through it."
Another phrase I have to examine closely is "tastemakers." Do they really make the tastes, or simply provide taste samples? I would think the number of people who actually make a new taste is quite small for any given taste: for instance, how many people invented Ogg Vorbis audio compression? A critic does not make the taste but rather aggregates it through narrowcasting and broadcasting. In broadcasting, there is a difference between creating a fad and creating the product that makes the fad possible.
Am I being too obtuse? I hope not. These comments are intended to raise pedagogical/andragogical issues to consider when you are writing your book, for the greatest clarity.
Posted by: John "Z-Bo" Zabroski | July 28, 2005 at 03:05 PM
Separately:
I think closer attention should be paid to Mark Sigal's complaint (emphasis mine):
Mark is discussing a far more important issue than the classification of filters; he is addressing the impact of filters. It is great to have a concept or a theory, or even an execution strategy, but those things mean little if the impact is not discussed. I think Mark was suggesting that at least some consumers will eventually want sub-filtering or re-filtering and custom filtering. Let us address each of these desires separately:
(1) Custom filtering comes from a desire, which I want to say is a desire for censoring: "deleting parts of publications or correspondence or theatrical performances" (see http://www.google.com/search?hl=en&lr=&safe=off&q=define%3A+censorship&btnG=Search). This is different from normal filtering, because the intention is to block something. Adblock is a form of custom filtering (following emphasis mine): "Adblock allows the user to specify filters, which remove unwanted content based on the source-address." An entire source of content is being blocked at the individual user level.
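A minimal sketch of that kind of source-address blocking, with an invented blocklist, might look like this:

```python
from urllib.parse import urlparse

# Adblock-style custom filter: drop any content whose source address is
# on a user-maintained blocklist. The domains here are invented examples.

BLOCKLIST = {"ads.example.com", "tracker.example.net"}

def allow(url):
    """Return True unless the URL's host is on the user's blocklist."""
    host = urlparse(url).hostname or ""
    return host not in BLOCKLIST

requests = [
    "http://ads.example.com/banner.gif",
    "http://news.example.org/story.html",
]
print([u for u in requests if allow(u)])  # only the news story survives
```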
(2) Sub-filtering or re-filtering comes from another (and perhaps different) desire, which I want to say is a desire for current filters to become easier for the user to use. I also suggest this desire comes primarily from "Experienced" usability zealots and not the "Beginner" user. Jef Raskin, the creator of the Apple Macintosh project and its interface, said in a Dr. Dobb's Journal interview, "Imagine if every Thursday your shoes exploded if you tied them the usual way. This happens to us all the time with computers, and nobody thinks of complaining." I think most "Beginner" users have a hard enough time getting a product to work at all, let alone criticizing its design effectively. Examples of sub-filtering are Google's personalized search and Yahoo's MyWeb 2.0. Re-filtering, for which I cannot find a clear day-to-day example, would be something like Google's re-indexing of its PageRank, which is simultaneously the greatest strength and weakness of any search engine optimization/marketing strategy.
Posted by: John "Z-Bo" Zabroski | July 28, 2005 at 03:45 PM
Chris,
What about the 'clipping service' filters? These target a specific user with a known need in mind, instead of relying on some group activity.
The prime example here is Egosearch, where one puts his name into PubSub/Technorati/Google Alerts and gets notified automatically of all the webpages and blog entries that mention the name. Book authors love this one.
In terms of consumption opportunities, I might be interested in an emerging topic that does not yet have a dedicated news gatherer (e.g. Machinima 9 months ago).
A clipping service allows me to track the happenings and to comment on and contribute to them with ease, thus speeding up the growth of the niche until it attracts enough interest to be trackable via other means.
The focus here is how easily I can set up my personal filters. If it is hard or requires conscious effort (e.g. a weekly topic search), I will probably just chase other interesting items that are more established. But if it takes one minute to set up a filter, I might do it for multiple very obscure, very low-volume interests.
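A clipping-service filter really can be that cheap to set up. Here is a minimal sketch that scans a stream of entries for personal search terms; the entries and terms are invented, and services like PubSub and Google Alerts of course do the same thing at web scale over live feeds.

```python
# Minimal clipping-service filter: collect every entry whose title
# mentions one of my personal search terms. Entries are invented.

entries = [
    {"title": "Machinima festival announced", "url": "http://example.org/1"},
    {"title": "Gardening tips for July", "url": "http://example.org/2"},
]

def clip(entries, terms):
    """Return the entries that mention any of the given terms."""
    terms = [t.lower() for t in terms]
    return [e for e in entries if any(t in e["title"].lower() for t in terms)]

# Setting up a new obscure-interest filter is a one-liner:
for hit in clip(entries, ["machinima"]):
    print(hit["url"])
```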
Regards,
Alex.
Posted by: Alexandre Rafalovitch | July 29, 2005 at 10:07 AM
last.fm falls under the ratings section above, but with a very low threshold for participation (or none at all, if you choose not to participate) - you can choose to love, hate or skip a track.
As an aside, taxonomy is the oldest profession, not the one usually referred to. God's first job for Adam was to name the animals. To name them you need to differentiate species so that you give them different names.
Posted by: Paul Morriss | July 29, 2005 at 12:31 PM
[Mike Vicic (vicicm@prodigy.net) emailed an alternative taxonomy that he's allowed me to post here:]
The examples for your two categories (people, software) seem to focus on three types of data:
A. opinions/ratings;
B. behavior;
C. facts/features.
Furthermore, this data is either centralized (1) or distributed (2).
This setup gives a nice 3x2 matrix with the role of people and software
(actors) for different use cases in each box of the matrix.
A1. People provide opinions/ratings to a site; software collects and presents them (Netflix recs, eBay recs).
A2. People post reviews/opinions/ratings locally; software finds, collects, analyzes and presents them (???).
B1. People use features at a site (bookmark, search, buy, upload); software collects, analyzes and presents (del.icio.us, Google, Amazon recs, Audioscrobbler).
B2. People use features locally (bookmarks, playlists, links, etc.); software finds, collects, analyzes and presents (some antivirus software, Google, others?).
C1. People categorize, catalog and deconstruct facts and features at a site (IMDb, Wikipedia, TV.com); software analyzes the content and further deconstructs it to find relationships and similarities that people did not (or cannot) find or use.
C2. People categorize, catalog and deconstruct facts and features locally; software finds, collects, analyzes and presents (???).
Of course, this isn't quite that clean, since Google is a combination of B1 (search evaluation at its site) and B2 (links on individual pages). A single piece of software can span multiple boxes. But maybe it's that feature that made Google so good at the start.
Posted by: chris anderson | July 30, 2005 at 11:44 AM
A very impressive effort by Mike Vicic. I think it is more robust than the original, and obviously, judging by the question marks ("???"), there are still some issues to be worked out.
As an aside, I was making my way through various chains of links in the User Comments section and was reading Paul Morriss's blog, when something caught my interest.
"The Problem Living in Thames Valley", Paul Morriss, July 13, 2005
Morriss shows some of the effects of an electronic filter and its "Q" factor.
Posted by: John "Z-Bo" Zabroski | July 31, 2005 at 07:18 AM
Take a look at B2 in the above taxonomy. (The question marks for B2 and C2 were meant as placeholders for examples.) I recently read at Popgadget about StumbleUpon. StumbleUpon is software that apparently analyzes your bookmarks (behavior) and ratings (opinions) in a distributed fashion. You use your own browser as you would normally, unlike del.icio.us, which requires that you visit a centralized site.
I'm surprised that StumbleUpon didn't use some other metric (time spent per visit, number of visits per week) instead of user ratings, so that the recommendations would be based entirely on behavior and completely transparent to the user.
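A purely behavioral score of the kind Mike suggests could be as simple as the following sketch; the weights and the logarithmic dampening are invented choices, not anything StumbleUpon actually does.

```python
import math

# Implicit "rating" inferred from behavior alone: time on page and visit
# frequency, log-dampened so one marathon visit can't dominate.
# The 0.6/0.4 weights are arbitrary illustrative choices.

def implicit_score(seconds_per_visit, visits_per_week):
    return 0.6 * math.log1p(seconds_per_visit) + 0.4 * math.log1p(visits_per_week)

print(implicit_score(300, 5))  # a page read often and at length scores high
print(implicit_score(10, 1))   # a page bounced off once scores low
```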
Posted by: Mike Vicic | August 05, 2005 at 11:16 AM
"I’d rather read my articles at Klikhir.com because sophisticated filtering offers me products I am probably looking for - in the same page! That saves me time searching for trusted brands elsewhere."
Inspired by the teachings of Chris Anderson, Klikhir.com is a Web 2.0 recommendation filter that receives a daily article feed and automatically combines each article with products and services related to what the user is reading. The articles themselves are the filter.
The article feeds arrive in emails at a rate of around 150-200 per day. Klikhir's engine scans each email and extracts the subject, content, author, URL and date. Using sophisticated word processing, Klikhir scans the content for classification keywords; it currently generates accurate keywords 93% of the time.
The keywords are required for Index Classification and for the Google AdSense, Commission Junction, Amazon, and eBay Web 2.0 services. These services are queried simultaneously using the keywords generated from Klikhir's articles. The result is a quality article of interest wrapped with products and services directly related to what the user is reading - on the same web page.
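Klikhir's engine isn't public, so this is only a guess at the approach, but the keyword step could plausibly be a frequency count over the article text, something like this sketch (the stopword list is abbreviated):

```python
import re
from collections import Counter

# Naive keyword extraction: the most frequent non-stopwords in the text.
# This is a guess at the technique, not Klikhir's actual engine.

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "that"}

def keywords(text, top_n=5):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

article = "Long tail filters help readers find niche products; filters beat browsing."
print(keywords(article))  # e.g. ['filters', 'long', 'tail', 'help', 'readers']
```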
In essence, I am trying to create a New Economy factory that adds value to articles by using them to filter affiliate products, on autopilot.
What do you think?
Posted by: Steven Rich | February 13, 2007 at 12:00 AM