05.20.08

Automated Content Alerting

Posted in Coding, ImplicitWeb, Orchestr8, Protocols at 10:15 am by elliot

Too many websites. Too little syndication.

This is a theme that’s been bouncing around my subconscious for months; something I’ve blogged about in the past.

But really, syndication is only part of the problem. Syndication normalizes data, and makes it readily accessible to 3rd parties — but it doesn’t push data where you want it. It’s a pull-focused technology.

For push, we need some sort of alerting capability.

Recently, I’ve been in the habit of checking delegate counters for the 2008 Presidential Election Primary Races. I check them daily, watching for updates to pledged delegates, super-delegates, etc.

Checking for updates doesn’t take a significant amount of time, but it’s yet another activity that can interrupt my workflow. Leveraging automation would be a much better way to handle this.

Recently, my company added an Alerts capability to our AlchemyGrid beta service. You can create an alert based on anything — any sort of web content (syndicated or not). Alerts can travel over many communications mediums (Email, AIM, SMS, Twitter, etc.), and they support lots of customization options (how often to check for updates, what counts as a “unique update”, and so on).
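To make those options concrete, here’s roughly what an alert definition boils down to. This is a hypothetical sketch, not AlchemyGrid’s actual configuration format:

```python
# Hypothetical alert definition -- illustrative only, not AlchemyGrid's real schema.
alert = {
    "source_url": "http://example.com/delegate-tracker",  # any web page, feed or not
    "check_interval_minutes": 60,                         # how often to re-fetch
    "uniqueness_key": "pledged_delegate_totals",          # what counts as a "new" update
    "channels": ["email", "aim", "sms", "twitter"],       # delivery mediums
    "recipients": {
        "email": "me@example.com",
        "twitter": "demdelegate08",
    },
}
```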

I used this new service to create an Alert that monitors delegate counts for the Democratic Presidential candidates (the GOP has already chosen its candidate). Any updates to delegate counts are automatically posted to the Twitter account “demdelegate08”. You can see the Twitter feed (and follow it if you wish) here:

http://twitter.com/demdelegate08
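For the curious, the general shape of such an alert boils down to: fetch the page, pull out the delegate counts, compare against the last values seen, and hand any change off to a delivery channel. Here’s a toy sketch of that loop — the URL, the regex, and the send_update stub are all hypothetical; the real Alert is configured inside AlchemyGrid rather than hand-coded:

```python
import json
import re
import urllib.request

STATE_FILE = "delegate_counts.json"
PAGE_URL = "http://example.com/2008-delegate-tracker"  # hypothetical source page

def fetch_counts():
    """Scrape candidate -> delegate count pairs from the tracker page (toy regex)."""
    html = urllib.request.urlopen(PAGE_URL).read().decode("utf-8", "replace")
    # Hypothetical markup: <td class="candidate">Obama</td><td class="total">1600</td>
    pattern = r'class="candidate">([^<]+)</td>\s*<td class="total">(\d+)'
    return {name.strip(): int(total) for name, total in re.findall(pattern, html)}

def load_state():
    try:
        with open(STATE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def send_update(message):
    """Placeholder delivery step: post to Twitter, send an email/IM, etc."""
    print("ALERT:", message)

def check_once():
    old, new = load_state(), fetch_counts()
    for candidate, total in new.items():
        if old.get(candidate) != total:
            send_update(f"{candidate}: {total} delegates (was {old.get(candidate, 'n/a')})")
    with open(STATE_FILE, "w") as f:
        json.dump(new, f)

if __name__ == "__main__":
    check_once()   # run from cron every N minutes
```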

If you aren’t a Twitter user, and are interested in getting delegate updates via Email, SMS, or AIM, you can “Subscribe to / Follow” my Alert here:

DCW Delegate Alert

A few implementation notes, for the geeks out there: we’re using custom-engineered AIM, Email, and SMS backends for our Alerts implementation. We interact with external services directly at the protocol/API level, rather than piggy-backing off 3rd-party gateways or other unreliable modes of communication.
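To give a flavor of what “interacting at the protocol level” means for the email channel, here’s a minimal stdlib sketch — not our actual backend code, just an illustration of speaking SMTP to a mail server directly instead of relaying through a notification gateway (host and addresses are placeholders):

```python
import smtplib
from email.mime.text import MIMEText

def send_alert_email(subject, body, sender, recipient, smtp_host="localhost"):
    """Deliver an alert by speaking SMTP directly to a mail server.
    Host, addresses, and credentials here are placeholders."""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient

    with smtplib.SMTP(smtp_host) as server:
        server.sendmail(sender, [recipient], msg.as_string())

# send_alert_email("Delegate update", "Obama: 1600 pledged delegates",
#                  "alerts@example.com", "me@example.com")
```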

05.15.08

New Sidebar Widget

Posted in Clipping, Contextual, ImplicitWeb, NLP, Orchestr8, Uncategorized, Widgets at 9:02 am by elliot

I’ve just added a new sidebar widget to my blog: “Related Content”.

This is a demonstration of “contextual widgets” from my company’s AlchemyGrid service.

Contextual widgets utilize a custom-engineered “statistical topic keyword extraction from natural language text” facility that we’ve recently integrated into our products. If you’re familiar with the “Yahoo Term Extraction” API, our system does essentially the same sort of thing. Natural language processing is fun (and challenging) work. Here are a few notes regarding our implementation:

1. AlchemyGrid’s Term Extraction facility supports multiple languages (English, German, French, Italian, Spanish, and Russian!). This was an important requirement for us, enabling contextual content generation for non-English websites and blogs. There are significant differences between languages in terms of punctuation rules, word stemming, and other details. Hats off to our Term Extraction developers — you’ve done a great job ensuring good initial language coverage.

2. Our Term Extraction facility is entirely statistical in its basis; it doesn’t rely on a hard-coded lexicon. This enables it to extract contextually-relevant topic keywords even when they’re (a) new topics, (b) rarely used common nouns or people’s names, or (c) misspelled. A toy illustration of the statistical idea appears below.
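For the curious, here’s what a deliberately naive version of that looks like: score candidate unigrams and bigrams purely by in-document frequency against a small stopword list. This is not our algorithm (the real extractor is language-aware and far more sophisticated); it’s only meant to show why “no lexicon” means brand-new names and even misspellings can still surface:

```python
import re
from collections import Counter

# A deliberately tiny stopword list; the point is that there's no lexicon of
# "allowed" topic terms -- anything frequent and non-trivial can surface.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "are", "was", "from"}

def extract_terms(text, top_n=10):
    """Toy statistical keyword extraction: frequency over unigrams and bigrams."""
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", text, re.UNICODE)]
    counts = Counter()
    for i, w in enumerate(words):
        if w not in STOPWORDS and len(w) > 2:
            counts[w] += 1
        if i + 1 < len(words):
            w2 = words[i + 1]
            if w not in STOPWORDS and w2 not in STOPWORDS:
                counts[w + " " + w2] += 2   # weight multi-word terms a bit higher
    return [term for term, _ in counts.most_common(top_n)]

print(extract_terms("Contextual widgets use statistical topic keyword "
                    "extraction from natural language text."))
```

Because nothing is looked up in a dictionary, an unseen product name or a misspelled one is treated just like any other token.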

We’ve just integrated Term Extraction into our Grid service, so there may be a few minor kinks to work out in the coming weeks — but overall we’re happy with the initial results. Contextual capability vastly expands the utility of AlchemyGrid widgets, as their content can now be automatically customized to relate to your content. This applies to *any* input-enabled widget in the grid (ALL widgets are contextual). Here’s another contextual example (a related Amazon book):

We’ll be enabling the other supported languages in the coming weeks, as well as rolling out some additional enhancements to our text processing algorithms (for the geeks in the audience: enhancements to our sentence boundary detector, inline punctuation processor, etc.).
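Sentence boundary detection sounds trivial until abbreviations, initials, and dates show up. Our detector is statistical; the naive rule-based baseline below (purely illustrative, not our algorithm) shows the kind of edge cases any real detector has to beat:

```python
import re

# A handful of abbreviations that should NOT end a sentence; a real detector
# learns these patterns statistically rather than enumerating them.
ABBREVIATIONS = {"mr", "mrs", "dr", "vs", "etc", "e.g", "i.e", "nov", "dept"}

def split_sentences(text):
    """Naive sentence splitter: break on ./!/? followed by whitespace and a
    capital letter, unless the period ends a known abbreviation or a digit."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+\s+(?=[A-Z])", text):
        end = match.end()
        before = text[start:match.start()].rstrip()
        last_word = before.split()[-1].lower().rstrip(".") if before.split() else ""
        if last_word in ABBREVIATIONS or (before and before[-1].isdigit()):
            continue  # probably not a real boundary
        sentences.append(text[start:end].strip())
        start = end
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith arrived Nov. 5th. The conference starts tomorrow."))
```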

02.14.08

Automated Content Monitoring

Posted in ImplicitWeb, Orchestr8, Scraping at 12:54 pm by elliot

I use the Internet constantly during the course of my everyday life — looking up telephone numbers, reading restaurant reviews, etc.

One task I’m frequently engaged in is Content Monitoring; that is, checking (and re-checking) websites of interest for updates and new information.

Now wait a sec — wasn’t syndication (RSS, ATOM, etc.) supposed to do this for me? Sure, if a website actually exposes data feeds. If it doesn’t, you’re mostly out of luck.

Alas, there are many websites out there with no form of syndicated access. This is just plain irritating.

Luckily, tools are starting to appear that can eliminate this irritant. My company released a new service earlier this week that makes great strides toward solving this problem.

This new service performs Automated Content Monitoring: a way of programmatically monitoring information sources that currently lack syndication features.
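The core idea is simple enough to sketch: fetch the page, normalize the region you care about, and compare it against what you saw last time. Here’s a stripped-down toy version — the URL and the normalization heuristic are placeholders, and the real service layers on scheduling, per-source extraction rules, and notification channels:

```python
import hashlib
import re
import urllib.request

WATCH_URL = "http://example.com/venue/upcoming-shows"   # a page with no RSS feed
LAST_HASH_FILE = "last_hash.txt"

def fetch_region(url):
    """Grab the page and keep only a normalized text view (toy heuristic:
    strip tags and collapse whitespace so cosmetic changes don't fire alerts)."""
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    text = re.sub(r"<[^>]+>", " ", html)          # drop markup
    return re.sub(r"\s+", " ", text).strip()      # normalize whitespace

def check_for_update():
    digest = hashlib.sha1(fetch_region(WATCH_URL).encode("utf-8")).hexdigest()
    try:
        last = open(LAST_HASH_FILE).read().strip()
    except FileNotFoundError:
        last = ""
    if digest != last:
        print("Page changed -- send a notification here")
        with open(LAST_HASH_FILE, "w") as f:
            f.write(digest)

if __name__ == "__main__":
    check_for_update()   # schedule via cron, every 30 minutes or so
```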

I’m a big fan of leveraging automated techniques to optimize my daily workflow — many of my previous blog posts have focused on this topic. Leveraging algorithms to improve efficiency, access to information, and integration of data is a central theme of the Implicit Web, and it’s both a personal interest of mine and a business interest of my company. Automated Content Monitoring fits perfectly within this arena.

I’m currently using our new Automated Content Monitoring service to track a variety of information sources: new events at my preferred concert venues, special deals offered by local radio stations, etc. Monitoring each of these information sources automatically frees up my time for more useful activities, and gives notification of website updates far sooner than if I were performing these tasks manually.

We’ll likely see increased uptake of Automated Content Monitoring solutions in the near future, as more individuals succumb to the dreaded Attention Crash.

11.04.07

Defrag Tomorrow!

Posted in Defrag, Denver, ImplicitWeb at 11:05 pm by elliot

Tomorrow (Nov. 5th) is the beginning of the Defrag Conference, here in Denver, Colorado.

Bottom line: Defrag is a conference about “augmenting” how we turn loads of information into layers of knowledge; about the “aha” moment of the brainstorm. As such, it encompasses many technologies we’re all familiar with (wikis, blogs, search) and many new, developing technologies (context, relevance, next-level discovery) — and tries to see them all through a new prism.

I’ll be there, so if you’re attending — be sure and say Hello!

10.11.07

Defrag Conference

Posted in Defrag, Denver, ImplicitWeb at 10:17 am by elliot

If you haven’t heard already, a new conference on Implicit Web topics will be held Nov. 5-6, in Denver, Colorado:

Defrag is the first conference focused solely on the internet-based tools that transform loads of information into layers of knowledge, and accelerate the “aha” moment. Defrag is about the space that lives in between knowledge management, “social” networking, collaboration and business intelligence. Defrag is not a version number. Rather it’s a gathering place for the growing community of implementers, users, builders and thinkers that are working on the next wave of software innovation.

This conference is being organized by Eric Norlin, who has been blogging on implicit web topics for a while now. I’ve started getting to know Eric via e-mail and look forward to meeting him and other Implicit Web folks in person at this conference.

If you haven’t registered for Defrag already, do it here. See everyone at the conference!

09.21.07

Mashups in the Middle – Bridging the Gap to the Semantic Web

Posted in ImplicitWeb, Mashups, SemanticWeb at 9:45 am by elliot

Yesterday brought an enlightening post by Alex Iskold, entitled “Top-Down: A New Approach to the Semantic Web”:

“While the original vision of the semantic web is grandiose and inspiring, in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of specific and simple consumer focus makes it mostly an academic exercise.”

The post touches upon some of the practical issues keeping semantic technology out of the hands of end-users, and potential ways around these roadblocks. Summaries are given for three top-down “mechanisms” that may provide workarounds to some issues:

  • Leveraging Existing Information
  • Using Simple / Vertical Semantics
  • Creating Pragmatic / Consumer-Centric Apps.

I couldn’t agree more with the underlying principle of this post: top-down approaches are necessary in order to expose end-users to semantic search & discovery (at least in the near-term).

However, this isn’t to say that there isn’t value in bottom-up semantic web technologies like RDF, OWL, etc. On the contrary, these technologies can provide extremely high quality data, such as categorization information. In the past year, there’s been significant growth in the amount of bottom-up data that’s available. This includes things like the RDF conversion of Wikipedia structured data (DBpedia), the US Census, and other sources. Indeed, the “W3C Linking Open Data” project is working on interlinking these various bottom-up sources, further increasing their value for semantic web applications. What’s the point of all this data collection/linking? “It’s all about enabling connections and network effects.”

My personal feeling is that neither a bottom-up nor a top-down approach will attain complete “success” in facilitating the semantic web. Top-down approaches are good enough for some applications, but sometimes generate dirty results (incorrect categorizations, etc.). Bottom-up approaches can generate some incredible results when operating within a limited domain, but can’t deal with messy data. What’s needed is a bridging of the gap between the two modes: leveraging top-down approaches for initial, dirty classification, and incorporating cleaner bottom-up sources when they’re available.

So how do we bridge the gap? Here’s what I’m betting on: Process-oriented, or agent-based mashups. These sit between the top-down and bottom-up stacks, filtering/merging/sorting/classifying information. More on this soon.
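To make that a little more concrete, here’s a hand-wavy sketch of what an “agent in the middle” might look like: run a cheap top-down classifier over an item, then, when a bottom-up source (DBpedia and friends) actually knows about the entities involved, let its cleaner categories refine or override the guess. Both helper functions are stand-ins for real components, not anything we’ve shipped:

```python
def topdown_classify(text):
    """Stand-in for a statistical, top-down classifier: fast, broad, sometimes dirty."""
    if "delegate" in text.lower() or "primary" in text.lower():
        return {"category": "politics", "confidence": 0.7}
    return {"category": "unknown", "confidence": 0.2}

def bottomup_lookup(entity):
    """Stand-in for a structured, bottom-up source (e.g. an RDF store keyed by entity).
    Clean when it has coverage, silent when it doesn't."""
    curated = {"Barack Obama": "politics/us-elections/2008-primaries"}
    return curated.get(entity)

def classify(text, entities):
    """The agent in the middle: start with a dirty guess, refine with clean data."""
    result = topdown_classify(text)
    for entity in entities:
        clean = bottomup_lookup(entity)
        if clean:
            result = {"category": clean, "confidence": 0.95, "source": "bottom-up"}
            break
    return result

print(classify("Barack Obama picks up more pledged delegates", ["Barack Obama"]))
```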

09.17.07

Company Website / Tech. Preview Launch

Posted in ImplicitWeb, Mashups, Orchestr8 at 11:49 am by elliot

My company (Orchestr8) launched its corporate website today:

We’re letting a select number of individuals take a “first look” at our AlchemyPoint platform in the form of a Technology Preview. We’re looking for user feedback to help us expand and improve the AlchemyPoint system before its final release.

Access to the Technology Preview is being provided in a staged fashion with priority given to users who apply first, so sign up today!

Please note: The AlchemyPoint Technology Preview is pre-Beta software. It may contain software bugs or other limitations.

08.23.07

The Cut-and-Paste Web

Posted in Clipping, ImplicitWeb at 1:59 pm by elliot

A few days ago I noticed a great quote from Steve Rubel @ Micro Persuasion:

“Imagine for a moment that you can take any piece of online content that you care about – a news feed, an image, a box score, multimedia, a stream of updates from your friends – and easily pin it wherever you want.”

[...snip...]

“This isn’t some far off vision. It’s the near-term future. It’s the coming era of the Cut and Paste Web.”

It’s exciting to see discussion on this topic, as this is something my company has been working towards for some time now. Our AlchemyPoint mashup platform enables the visual cutting and pasting of web content, even dynamic content (like search results). “Clipped” content can be inserted anywhere — into your home page or blog, Google results pages, CNN articles, etc.

Below are several screencasts that illustrate cut-and-paste clipping of web content:

Adding Yahoo Image Search Results into Google Search Results

Integrating the Google News Top-Story into the Rocky Mountain News Homepage

These screencasts illustrate two things:

  1. Grabbing content from a page via the mouse, and storing it in a “Clipboard” for later reuse.
  2. Inserting content into a new page, selecting from the available “Clipboard” of previously grabbed content.

Using this methodology one can clip any arbitrary piece of web content (images, articles, headlines, blog posts, etc.) and insert it into any other web page. It’s worth noting that this process occurs almost entirely using the mouse; the only keyboard interaction required involves typing out a name to identify the clipped content.

On a technical level, cutting and pasting web content is difficult; one cannot simply grab and re-insert raw HTML fragments into web pages. There are a number of hurdles to overcome in order to perform these types of manipulations reliably. A few items that must be considered include: relative URL links, CSS content, Javascript, name/class/id conflicts between a web page and any pasted content, character set differences, how remote servers deal with Referrer headers, etc. We’ve had a good time working out solutions to these issues and others not mentioned above.
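As one small example of those hurdles: a clipped fragment full of relative links breaks the moment it’s pasted into a page on a different host, so href/src attributes have to be rewritten against the source page’s URL. Here’s a minimal sketch of just that step — a regex-based simplification, not our actual pipeline, which parses the markup properly and also deals with CSS, scripts, and id collisions:

```python
import re
from urllib.parse import urljoin

def absolutize_urls(fragment, base_url):
    """Rewrite relative href/src attributes in a clipped HTML fragment so they
    still resolve once the fragment is pasted into a different page."""
    def rewrite(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        return f'{attr}={quote}{urljoin(base_url, url)}{quote}'
    return re.sub(r'\b(href|src)=(["\'])(.*?)\2', rewrite, fragment, flags=re.IGNORECASE)

clipped = '<div><a href="/reviews/item.html">review</a><img src="img/shot.png"></div>'
print(absolutize_urls(clipped, "http://news.example.com/articles/today/"))
# Links now resolve against news.example.com instead of the paste target.
```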

For those interested in playing around with cutting-and-pasting web content, we’re going to be opening up invitations to our AlchemyPoint Technology Preview in the next few weeks. This preview supports the ability to perform all sorts of web manipulations, cut-and-paste of web content being just one example.

06.18.07

Information Retrieval & the Implicit Web

Posted in ImplicitWeb, Mashups at 2:54 pm by elliot

One of the central themes surrounding the Implicit Web is the power of the electronic footprint.

Wherever we go, we leave footprints. In the real world, these are quickly washed away by erosion and other natural forces. The electronic world, however, is far, far different: footprints often never disappear. Every move we make online, every bit of information we post, every web link we click, can be recorded.

The Implicit Web is about leveraging this automatically-recorded data to achieve new and useful goals.

One area that’s particularly exciting to me is the utility provided by merging implicit data collection/analysis and automatic information retrieval.

The folks at Lijit have done some pretty interesting work in this arena with their “Re-Search” capability. Using some Javascript magic, Lijit Re-Search detects when you arrive at a blog via an Internet search. For example, if I visit my blog via a Google search for “implicit web,” Re-Search activates. My original Google query is then used to look up additional related content via the Lijit Network of trusted information sources. Any discovered content is then automatically shown on-screen.
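The referrer-sniffing trick at the heart of this is small enough to sketch. Lijit does it client-side in Javascript; here’s the same parsing step in Python (for consistency with the other sketches in these posts), with the search-engine parameter map treated as illustrative rather than exhaustive:

```python
from urllib.parse import urlparse, parse_qs

# Query-string parameters used by a couple of search engines (Google-style "q",
# Yahoo-style "p"); illustrative, not a complete mapping.
SEARCH_PARAMS = {"google.": "q", "yahoo.": "p"}

def query_from_referrer(referrer):
    """Return the original search query if the referrer looks like a search results page."""
    parsed = urlparse(referrer)
    for domain_fragment, param in SEARCH_PARAMS.items():
        if domain_fragment in parsed.netloc:
            values = parse_qs(parsed.query).get(param)
            return values[0] if values else None
    return None

print(query_from_referrer("http://www.google.com/search?q=implicit+web"))  # -> "implicit web"
```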

Neat stuff! I love the idea of “re-searching” automatically, leveraging an Internet user’s original search query.

A few days ago I decided to mess around with this “re-search” idea and ended up with something that I’ve been calling “pre-search.”

Pre-search is the concept of preemptive search, or retrieving information before a user asks for it (or even knows to ask). This idea can be of particular use with blogs and other topical information sources.

I created two basic pre-search mashups for Feedburner-enabled blogs, using the Feedburner FeedFlare API:

Both of these are pretty straightforward, doing the following:

1. For every blog post, use the Yahoo Term Extraction API to gather ‘key terms’ from the post title.

2. Use Google Blog Search or Lijit Network Search to find related content for the previously extracted ‘key terms.’

3. Formulate the top three results into clickable links and show them below the blog post.

This results in the automatic display of related content for a given blog post, using a combination of content analysis (on the blog post title) and information retrieval (Lijit/Google Blog Search).
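The FeedFlare plumbing aside, the interesting part is the three-step pipeline above. Here’s a sketch of that pipeline with both external calls stubbed out — the real flares hit the Yahoo Term Extraction API and Google Blog Search / Lijit; the stubs below just show the wiring:

```python
def extract_terms(title):
    """Stub for the term-extraction call; the real flare sends the post title
    to the Yahoo Term Extraction API and gets back a list of key phrases."""
    return [title]   # degenerate fallback: treat the whole title as the query

def blog_search(query, limit=3):
    """Stub for the related-content search (Google Blog Search or Lijit)."""
    return [
        {"title": f"Result {i} for '{query}'", "url": f"http://example.com/{i}"}
        for i in range(1, limit + 1)
    ]

def presearch_links(post_title):
    """Steps 1-3: terms from the title, related results, top three as links."""
    terms = extract_terms(post_title)
    results = blog_search(" ".join(terms), limit=3)
    return [f'<a href="{r["url"]}">{r["title"]}</a>' for r in results]

for link in presearch_links("Information Retrieval & the Implicit Web"):
    print(link)
```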

I’ve enabled both Google and Lijit Pre-Search on my blog. You can add them to your Feedburner-enabled blog by visiting here and here.

06.15.07

Content Analysis & the Implicit Web

Posted in Attention, ImplicitWeb, Mashups at 3:46 pm by elliot

Implicit, automatic, passive, or whatever you call it, this form of content analysis is starting to be recognized as a powerful tool in both a business and consumer/prosumer context. Companies like Adaptive Blue are using automatic content analysis techniques to personalize and improve the consumer shopping experience, while others like TapeFailure are leveraging this technology to enable more powerful web analytics.

Content analysis takes clickstream processing one step further, providing a much deeper level of insight into user activity on the Web. By peering into web page content (in the case of Adaptive Blue) or user behavioral data (as with TapeFailure), all sorts of new & interesting capabilities can be provided to both end-users and businesses. One capability that I’ll focus on in this post is automatic republishing.

Automatic republishing is the process of taking some bit of consumed/created information (a web page, mouse click, etc.) and leveraging it in a different context.

Let me give an example:

I read Slashdot headlines. Yes, I know. Slashdot is old-hat, Digg is better. Yadda-yadda. That’s beside the point of this example. :)

Note that I said “I read Slashdot headlines.” This doesn’t include user comments. There’s simply too much junk. Even high-ranked posts are often not worthy of reading or truly relevant. But alas, there is some good stuff in there — if you have the time to search it out. I don’t.

So this is a great example of a situation where passive/implicit content analysis can be extremely useful. Over the course of building and testing my company’s AlchemyPoint mashup platform, I decided to play with this particular example to see what could be done.

What I particularly wanted to address with the “Slashdot comments problem” was the ability to extract useful related web links from the available heap of user comments. Better yet, I wanted to automatically bookmark these links for later review (or consumption in an RSS reader), generating appropriate category tags without any user help.

What I ended up with was a passive content-analysis mashup that doesn’t modify my web browsing experience in any way; it just operates in the background, detecting interactions with the Slashdot web site.

When it sees me reading a Slashdot article, it scans through the story’s user comments looking for those that meet my particular search criteria. In this case, it is configured to detect any user comment that has been rated 4+ and labeled “Informative” or “Interesting.”

Upon finding user comments that match the search criteria, the mashup then searches the comment text for any URL links to other web sites. It then passively loads these linked pages in the background, extracting both the web page title and any category tags that were found. If the original Slashdot article was given category tags, these also are collected.

The mashup then uses the del.icio.us API to post these discovered links to the web, “republishing them” for future consumption.
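Stripped to its skeleton, the whole mashup is a short pipeline. In the sketch below the comment-filtering step is a stand-in (the real thing parses Slashdot’s markup and its score/label metadata), and the republish step follows the del.icio.us v1 posts/add API as I remember it being documented — treat the details as illustrative rather than production code:

```python
import re
import urllib.parse
import urllib.request

def qualifying_comments(story_html):
    """Stand-in for the filtering step: return the text of comments scored 4+
    and labeled Informative/Interesting. Real parsing depends on Slashdot's markup."""
    return re.findall(r'<div class="comment good">(.*?)</div>', story_html, re.S)

def links_in(comment_text):
    return re.findall(r'https?://[^\s"<>]+', comment_text)

def title_of(url):
    try:
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        match = re.search(r"<title[^>]*>(.*?)</title>", html, re.S | re.I)
        return match.group(1).strip() if match else url
    except Exception:
        return url

def republish(url, title, tags):
    """Post a discovered link via the del.icio.us v1 API (HTTP basic auth)."""
    params = urllib.parse.urlencode({"url": url, "description": title,
                                     "tags": " ".join(tags)})
    request = urllib.request.Request("https://api.del.icio.us/v1/posts/add?" + params)
    # request.add_header("Authorization", "Basic ...")   # credentials go here
    urllib.request.urlopen(request)

def process_story(story_html, story_tags):
    for comment in qualifying_comments(story_html):
        for url in links_in(comment):
            republish(url, title_of(url), story_tags)
```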

Using an RSS reader (Bloglines), I subscribe to the del.icio.us feed generated by this mashup. This results in a filtered view of interesting/related web links appearing in the newsreader shortly after I click on a Slashdot story via my web browser.

This is a fairly basic example of content analysis in a user context, but it’s interesting because the entire process (filtering user comments, harvesting links from comment text, crawling any discovered links, extracting title/tag information from linked pages, and posting to del.icio.us) happens automatically, with no user intervention.

I think we will see this type of automatic prefiltering/republishing become increasingly prevalent as developers and Internet companies continue to embrace “implicit web” data-gathering techniques.
