APML, a new standardized format for expressing attention preferences, has been receiving a lot of buzz in recent weeks. Mashable covers the topic here, Brad Feld here, Jeff Nolan here, and Read/WriteWeb here.
It’s great to see an increasing number of folks getting behind the concept of ‘standardized structured attention’ and embracing this emerging standard.
Attention has long been a topic of interest to me, and something I’ve blogged about on a number of occasions. At my company Orchestr8, we’ve been working on solutions that can automatically capture the ‘context’ of a user’s attention and leverage this data in various ways. We’re currently implementing APML support in the next version of our software, which should enable some really interesting capabilities.
The thing that excites me about APML is that it’s a relatively straightforward standard (far, far simpler than the many RSS/ATOM variants). This will ease adoption and simplify the portability of attention preference data across many products and services. Since APML expresses attention in a relatively abstract way, multiple products (even product domains, for instance Web versus Email) can leverage the same attention data.
One additional technical note: thank you, APML authors, for strictly standardizing the date format in the APML spec (ISO 8601). If only we could have been so lucky with RSS/ATOM. Now let’s hope people actually stick to the date format!
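To make this concrete, here’s a rough sketch of what emitting a tiny APML profile with ISO 8601 timestamps might look like. The element and attribute layout follows my reading of the APML 0.6 draft (so verify against the spec), and the profile name, concept key, and `from` source are all hypothetical:

```python
# Minimal sketch of emitting an APML-style profile with ISO 8601 timestamps.
# Element/attribute names follow my reading of the APML 0.6 draft; treat them
# as illustrative rather than authoritative.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def build_apml(concepts):
    """concepts: list of (key, value) pairs, value typically in [-1.0, 1.0]."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")  # ISO 8601
    apml = ET.Element("APML", {"version": "0.6"})
    body = ET.SubElement(apml, "Body", {"defaultprofile": "everyday"})
    profile = ET.SubElement(body, "Profile", {"name": "everyday"})
    concepts_el = ET.SubElement(
        ET.SubElement(profile, "ImplicitData"), "Concepts")
    for key, value in concepts:
        ET.SubElement(concepts_el, "Concept",
                      {"key": key, "value": str(value),
                       "from": "alchemypoint", "updated": now})
    return ET.tostring(apml, encoding="unicode")
```

Because every timestamp goes through one `strftime` call, every consumer sees the same ISO 8601 shape — exactly the discipline RSS/ATOM never managed.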
Implicit, automatic, passive, or whatever you call it, this form of content analysis is starting to be recognized as a powerful tool in both a business and consumer/prosumer context. Companies like Adaptive Blue are using automatic content analysis techniques to personalize and improve the consumer shopping experience, while others like TapeFailure are leveraging this technology to enable more powerful web analytics.
Content analysis takes clickstream processing one step further, providing a much deeper level of insight into user activity on the Web. By peering into web page content (in the case of Adaptive Blue) or user behavioral data (as with TapeFailure), all sorts of new & interesting capabilities can be provided to both end-users and businesses. One capability that I’ll focus on in this post is automatic republishing.
Automatic republishing is the process of taking some bit of consumed/created information (a web page, mouse click, etc.) and leveraging it in a different context.
Let me give an example:
I read Slashdot headlines. Yes, I know. Slashdot is old-hat, Digg is better. Yadda-yadda. That’s beside the point of this example.
Note that I said “I read Slashdot headlines.” This doesn’t include user comments. There’s simply too much junk. Even high-ranked posts are often not worthy of reading or truly relevant. But alas, there is some good stuff in there — if you have the time to search it out. I don’t.
So this is a great example of a situation where passive/implicit content analysis can be extremely useful. Over the course of building and testing my company’s AlchemyPoint mashup platform, I decided to play with this particular example to see what could be done.
The aspect of the “Slashdot comments problem” I was particularly interested in addressing was extracting useful related web links from the heap of user comments. Better yet, I wanted to automatically bookmark these links for later review (or consumption in an RSS reader), generating appropriate category tags without any user help.
What I ended up with was a passive content analysis mashup that didn’t modify my web browsing experience in any way, but rather just operated in the background, detecting interactions with the Slashdot web site.
When it sees me reading a Slashdot article, it scans through the story’s user comments looking for those that meet my particular search criteria. In this case, it is configured to detect any user comment that has been rated 4+ and labeled “Informative” or “Interesting.”
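That filtering rule is simple enough to express as a standalone predicate. Here’s a sketch; the dict layout for a parsed comment is hypothetical, since the real mashup works on Slashdot’s scraped HTML rather than a clean data structure:

```python
# The comment filter described above, as a standalone predicate. The dict
# layout for a parsed comment is hypothetical; the actual mashup operates
# on scraped Slashdot HTML.
WANTED_LABELS = {"Informative", "Interesting"}
MIN_SCORE = 4

def is_worth_reading(comment):
    """True when a comment is rated 4+ and carries one of the wanted labels."""
    return comment["score"] >= MIN_SCORE and comment["label"] in WANTED_LABELS
```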
Upon finding user comments that match the search criteria, the mashup then searches the comment text for any URL links to other web sites. It then passively loads these linked pages in the background, extracting both the web page title and any category tags that were found. If the original Slashdot article was given category tags, these also are collected.
The mashup then uses the del.icio.us API to post these discovered links to the web, “republishing them” for future consumption.
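The harvesting-and-republishing flow can be sketched in a few small functions. This is only an outline of the idea, not the AlchemyPoint implementation: the `posts/add` endpoint and HTTP Basic auth reflect the del.icio.us v1 API as I recall it, so double-check the API docs before relying on it, and the URL/title regexes are deliberately crude:

```python
# Hedged sketch of the republishing pipeline: harvest links from comment text,
# grab each page's <title>, then post the bookmark to del.icio.us. The
# posts/add endpoint and its parameters reflect the del.icio.us v1 API as I
# remember it; verify against the current documentation.
import base64
import re
import urllib.parse
import urllib.request

URL_RE = re.compile(r'https?://[^\s"<>]+')
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)

def extract_links(comment_text):
    """Harvest absolute http(s) URLs from a blob of comment text."""
    return URL_RE.findall(comment_text)

def fetch_title(url):
    """Fetch a linked page and pull its <title>; fall back to the URL itself."""
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else url

def post_to_delicious(url, title, tags, user, password):
    """Post one bookmark via the del.icio.us v1 API (posts/add, Basic auth)."""
    query = urllib.parse.urlencode(
        {"url": url, "description": title, "tags": " ".join(tags)})
    req = urllib.request.Request("https://api.del.icio.us/v1/posts/add?" + query)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return urllib.request.urlopen(req).read()
```

Chaining these per qualifying comment — extract, fetch, post — is the whole republishing loop.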
Using an RSS reader (Bloglines), I subscribe to the del.icio.us feed generated by this mashup. This results in a filtered view of interesting/related web links appearing in the newsreader shortly after I click on a Slashdot story via my web browser.
This is a fairly basic example of content analysis in a user context, but it proves interesting because the entire process (filtering user comments, harvesting links from comment text, crawling any discovered links, extracting title/tag information from linked pages, and posting to del.icio.us) happens automatically, with no user intervention.
I think we will see this type of automatic prefiltering/republishing become increasingly prevalent as developers and Internet companies continue to embrace “implicit web” data-gathering techniques.
There’s been a lot of buzz recently surrounding the “Implicit Web” concept, something I’ve blogged about in the past.
ReadWriteWeb has a great write-up on the subject, with case studies focusing on several popular websites that incorporate implicit data collection techniques (Last.fm, etc.).
Even more interesting is a “live attention-stream” viewer created by Stan James of Lijit. This neat little webapp utilizes clickstream data gathered by the Cluztr social browsing plugin, allowing Internet users to “follow along” with another user’s web browsing session.
This and other recent work on leveraging implicit data-flows is pretty exciting stuff, and we’re really only starting to scratch the surface as to what’s possible.
I’ve been toying around with implicit data gathering techniques for the last six months or so, using my company’s AlchemyPoint platform to gain access to clickstreams and other information. Because the AlchemyPoint system operates as a transparent proxy-server, it makes it easy to build simple analysis/data-mining applications that “jack in” to web browsing, email, and instant messaging activity.
So what’s possible if you’re “jacked in”? Let’s start with something very basic: gathering statistics on the usage of various web sites.
Above is a snippet from something I’ve been calling a dash-up. So what’s a dash-up?
Think dashboard + mash-up.
Essentially, a dash-up is a presentation-level mashup that collects data from multiple sources and presents it in a useful graphical dashboard view (in this case, dynamically updating activity charts). The above screenshot shows both a general web traffic history and more detailed statistics on my music-listening activity on the popular Internet radio site Pandora.com.
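The tally feeding a chart like that is very simple. Here’s a toy version; the `(timestamp, url)` event format is hypothetical, standing in for the real requests AlchemyPoint’s proxy sees go by:

```python
# Toy version of the statistics behind a dash-up chart: tally page views per
# host from a clickstream. The (timestamp, url) tuple format is hypothetical;
# in practice the proxy observes the real requests.
from collections import Counter
from urllib.parse import urlparse

def site_usage(clickstream):
    """Count visits per hostname from (timestamp, url) click events."""
    return Counter(urlparse(url).netloc for _ts, url in clickstream)
```

Feed a browsing session through it and the resulting counts are ready to plot as an activity chart.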
OK, statistics-gathering is kinda neat, but what about something more useful? One of my favorites is passive tagging or implicit blogging activity. Stay tuned for my next post, which will detail some of the ways passive/implicit data collection enhances (through filtering, tagging, etc.) the Internet sites I use on a daily basis.
If you haven’t already done so, check out David Henderson’s recent post on Attention Mashups. He points to All Crazy Style, a mashup that integrates last.fm usage data with Upcoming.org listings to discover information on local band performances matching a user’s tastes.
This is great stuff.
Properly leveraged attention streams can be incredibly powerful. Google and Amazon are great examples of companies that utilize attention data with significant economic success. Mashups like All Crazy Style allow users to more directly benefit from their attention streams in the form of recommendations and other personalization features.
One of my favorite types of attention mashups is the auto-tagger. These do exactly what you might expect — categorize content automatically, using attention stream data to improve tagging accuracy. Tagging is extremely popular in web 2.0 and media applications, but users don’t seem to like entering tags manually. A recent post on BijanBlog discusses the need for good auto-tagging solutions, mentioning attempts like Riya to solve the problem:
“Riya tried to do photo autotagging but they wanted to replace stuff that we all use already with a new service. That’s really hard.”
I totally agree with this point. The most successful solutions will incorporate themselves into a user’s existing habits and browsing activity rather than forcing the use of a distinct service.
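For a sense of what an auto-tagger’s starting point looks like, here’s the naive frequency baseline. An attention-aware tagger would re-weight these candidate tags using the user’s attention stream; this sketch shows only the baseline, and the stopword list is deliberately tiny:

```python
# Baseline auto-tagger: the most frequent non-stopword terms become tags.
# A real attention mashup would re-weight candidates using attention-stream
# data; this is only the frequency baseline with a deliberately tiny stopword
# list.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "at"}

def auto_tags(text, n=3):
    """Return the n most frequent candidate tags found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _count in counts.most_common(n)]
```

The point of the attention data is precisely to fix the cases this baseline gets wrong: frequent-but-uninteresting terms get demoted, and terms matching the user’s demonstrated interests get promoted.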