Content Analysis & the Implicit Web

Posted in Attention, ImplicitWeb, Mashups at 3:46 pm by admin

Implicit, automatic, passive: whatever you call it, this form of content analysis is starting to be recognized as a powerful tool in both business and consumer/prosumer contexts. Companies like Adaptive Blue are using automatic content analysis techniques to personalize and improve the consumer shopping experience, while others like TapeFailure are leveraging the same technology to enable more powerful web analytics.

Content analysis takes clickstream processing one step further, providing a much deeper level of insight into user activity on the Web. By peering into web page content (in the case of Adaptive Blue) or user behavioral data (as with TapeFailure), all sorts of new & interesting capabilities can be provided to both end-users and businesses. One capability that I’ll focus on in this post is automatic republishing.

Automatic republishing is the process of taking some bit of consumed/created information (a web page, mouse click, etc.) and leveraging it in a different context.

Let me give an example:

I read Slashdot headlines. Yes, I know. Slashdot is old-hat, Digg is better. Yadda-yadda. That’s beside the point of this example. :)

Note that I said “I read Slashdot headlines.” This doesn’t include user comments. There’s simply too much junk. Even high-ranked posts are often not worthy of reading or truly relevant. But alas, there is some good stuff in there — if you have the time to search it out. I don’t.

So this is a great example of a situation where passive/implicit content analysis can be extremely useful. Over the course of building and testing my company’s AlchemyPoint mashup platform, I decided to play with this particular example to see what could be done.

What I was particularly interested in addressing related to the “Slashdot comments problem” was the ability to extract useful related web links from the available heap of user comments. Better yet, I wanted to be able to automatically bookmark these links for later review (or consumption in an RSS reader), generating appropriate category tags without any user help.

What I ended up with was a passive content analysis mashup that didn’t modify my web browsing experience in any way, but rather just operated in the background, detecting interactions with the Slashdot web site.

When it sees me reading a Slashdot article, it scans through the story’s user comments looking for those that meet my particular search criteria. In this case, it is configured to detect any user comment that has been rated 4+ and labeled “Informative” or “Interesting.”
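The post doesn't show AlchemyPoint's internals, so here is a hypothetical sketch of that filtering step. The `Comment` structure and function names are my own; only the criteria (score of 4 or higher, labeled "Informative" or "Interesting") come from the description above.

```python
# Hypothetical sketch of the comment-filtering step described in the post.
# The Comment type and function names are illustrative, not AlchemyPoint's API.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Comment:
    text: str    # comment body (may contain HTML links)
    score: int   # Slashdot moderation score, -1 to 5
    label: str   # moderation label, e.g. "Informative", "Funny"


def matches_criteria(c: Comment, min_score: int = 4,
                     labels: Tuple[str, ...] = ("Informative", "Interesting")) -> bool:
    """True if the comment meets the configured search criteria."""
    return c.score >= min_score and c.label in labels


def filter_comments(comments: List[Comment]) -> List[Comment]:
    """Keep only the comments worth mining for links."""
    return [c for c in comments if matches_criteria(c)]
```

In a real deployment the criteria would presumably be user-configurable, which is why the score threshold and label set are parameters rather than constants.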

Upon finding user comments that match the search criteria, the mashup then searches the comment text for any URL links to other web sites. It then passively loads these linked pages in the background, extracting both the web page title and any category tags that were found. If the original Slashdot article was given category tags, these also are collected.
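Those two sub-steps, pulling URLs out of comment text and scraping a title plus any keyword tags from each linked page, could be sketched with the standard library alone. This is a minimal illustration, not the mashup's actual code; the regex and the reliance on a `<meta name="keywords">` tag are simplifying assumptions.

```python
# Illustrative sketch: harvest URLs from comment text, then pull the page
# title and any <meta name="keywords"> tags from a fetched page's HTML.
import re
from html.parser import HTMLParser
from typing import List

URL_RE = re.compile(r'https?://[^\s<>"\']+')


def extract_urls(comment_text: str) -> List[str]:
    """Find http/https links embedded in a comment."""
    return URL_RE.findall(comment_text)


class TitleTagParser(HTMLParser):
    """Collects the <title> text and comma-separated meta keywords."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.tags: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() == "keywords":
            content = attrs.get("content", "")
            self.tags = [t.strip() for t in content.split(",") if t.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
```

The actual fetch (e.g. via `urllib.request.urlopen`) is omitted here; the parser just needs the page's HTML as a string.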

The mashup then uses the del.icio.us API to post these discovered links to the web, “republishing them” for future consumption.
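For reference, the old del.icio.us v1 API exposed a `posts/add` endpoint that took the link, a description, and space-separated tags as query parameters (sent over HTTPS with Basic auth). A request to it might be assembled like this; the helper name is mine, and the parameter set shown is the minimal one.

```python
# Sketch of building a request for the historical del.icio.us v1 posts/add
# endpoint. Parameter names (url, description, tags) follow that old API;
# build_post_url is an illustrative helper, and auth handling is omitted.
import urllib.parse
from typing import List

API_BASE = "https://api.del.icio.us/v1/posts/add"


def build_post_url(link: str, title: str, tags: List[str]) -> str:
    """Assemble the posts/add request URL for one discovered link."""
    params = {
        "url": link,
        "description": title or link,  # fall back to the URL if no title found
        "tags": " ".join(tags),        # the v1 API used space-separated tags
    }
    return API_BASE + "?" + urllib.parse.urlencode(params)
```

A real client would send this with HTTP Basic authentication and respect the API's rate limits; both are left out of the sketch.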

Using an RSS reader (Bloglines), I subscribe to the del.icio.us feed generated by this mashup. This results in a filtered view of interesting/related web links appearing in the newsreader shortly after I click on a Slashdot story via my web browser.

This is a fairly basic example of content analysis in a user context, but it is interesting because the entire process (filtering user comments, harvesting links from comment text, crawling any discovered links, extracting title/tag information from linked pages, and posting to del.icio.us) happens automatically, with no user intervention.

I think we will see this type of automatic prefiltering/republishing become increasingly prevalent as developers and Internet companies continue to embrace “implicit web” data-gathering techniques.


  1. 1 said,


    Yes, this is pretty cool and I agree that implicit is good. My other thought is that perhaps there is a family of heuristics for determining what you liked that can be leveraged.


  2. 1 said,


    Nice!!! Yes, I agree that this passive analysis will be huge. The trick is finding the right way to do it.

    Speaking of delicious, I’ve recently been wondering about the whole “bookmarking” concept: maybe we need something between the full clickstream provided by the AttentionTrust recorder (full disclosure: I authored it) and the time/effort/seriousness of a real “bookmark”…

    What I’d like to do is, when bored, surf through the sites that my friend Todd marked as noteworthy in the last day or so. “Noteworthy” could be something as simple as a funny video. Then you could build multi-user filters where I get a feed of only those URLs that were marked by at least two of my friends, and so forth.

    Just a thought… keep those experiments going!

  3. Aqua Regia - Mashups and the Implicit Web » Automated Content Monitoring said,

    February 14, 2008 at 12:59 pm

    [...] a big fan of leveraging automated techniques to optimize my daily workflow — many of my previous blog posts have focused on this topic. [...]

Leave a Comment