Mashup fans, rejoice! Microsoft has opened up access to its Popfly mashup beta.
Techcrunch has coverage here, CNET has coverage here, O’Reilly Radar has a post here, Mashable here, Webpronews here, and more is sure to come. I’ve also blogged about Popfly in the past.
Popfly should do great things for increasing consumer awareness of mashup technology. It also has a great name. [Though, I really prefer "Microsoft Visual Mashup Creator Express, October 2007 Community Tech Preview Internets Edition"]
One item of particular interest is the ability for developers to code new Popfly ‘modules’ using Microsoft Visual Studio. That’s a great feature, indeed. It’s somewhat a shame, however, that Microsoft has tied this software so heavily to MS tools and runtimes. Requiring MS Visual Studio for in-depth development and for your users to all have the Silverlight run-time isn’t ideal, but regardless — Popfly is pretty cool stuff!
I’m looking forward to getting my hands on Popfly and coding up some AlchemyPoint modules to allow for really fun back-end processing. Combining these two systems should provide for some interesting possibilities.
Techcrunch has posted an interesting write-up on a new feature in OS X Leopard: Web Clipping
It’s great seeing innovative companies like Apple embracing web clipping technology. I’m a big believer in the “cut-and-paste web” and my company has been working for some time now to make this concept a reality.
I’ve blogged on this subject previously, discussing some of the technical hurdles that must be overcome to reliably clip arbitrary web content. Regarding clipping in Leopard: Apple’s solution is somewhat limited in that it only displays clipped content in a mini-browser; it isn’t capable of inserting clipped content into other web pages or applications.
For those interested in seeing mouse-based clipping of web content in action, check out any of these screencasts:
Clipping a 10-day Weather Forecast and Inserting It Into Another Webpage
Clipping Search Results from Yahoo News and Integrating Into Google Search Results
A tutorial on how to clip web content is also available here.
Yesterday brought an enlightening post by Alex Iskold, entitled “Top-Down: A New Approach to the Semantic Web”:
“While the original vision of the semantic web is grandiose and inspiring in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of specific and simple consumer focus makes it mostly an academic exercise.”
The post touches upon some of the practical issues keeping semantic technology out of the hands of end-users, and potential ways around these roadblocks. Summaries are given for three top-down “mechanisms” that may provide workarounds to some issues:
- Leveraging Existing Information
- Using Simple / Vertical Semantics
- Creating Pragmatic / Consumer-Centric Apps.
I can’t agree more with the underlying principle of this post: top-down approaches are necessary in order to expose end-users to semantic search & discovery (at least in the near-term).
However, this isn’t to say that there isn’t value in bottom-up semantic web technologies like RDF, OWL, etc. On the contrary, these technologies can provide extremely high quality data, such as categorization information. In the past year, there’s been significant growth in the amount of bottom-up data that’s available. This includes things like the RDF conversion of Wikipedia structured data (DBpedia), the US Census, and other sources. Indeed, the “W3C Linking Open Data” project is working on interlinking these various bottom-up sources, further increasing their value for semantic web applications. What’s the point of all this data collection/linking? “It’s all about enabling connections and network effects.”
My personal feeling is that neither a bottom-up nor a top-down approach will attain complete “success” in facilitating the semantic web. Top-down approaches are good enough for some applications, but sometimes generate dirty results (incorrect categorizations, etc.). Bottom-up approaches can generate some incredible results when operating within a limited domain, but can’t deal with messy data. What’s needed is a way of “bridging the gap” between the two modes: leveraging top-down approaches for initial dirty classification, and incorporating cleaner bottom-up sources when they’re available.
So how do we bridge the gap? Here’s what I’m betting on: Process-oriented, or agent-based mashups. These sit between the top-down and bottom-up stacks, filtering/merging/sorting/classifying information. More on this soon.
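To make the "bridging" idea concrete, here is a minimal sketch, assuming a curated lookup table stands in for a clean bottom-up source and a simple callable stands in for a dirty top-down classifier. The names `classify`, `curated`, and `heuristic` are hypothetical and chosen only for illustration; this is not how any particular system does it.

```python
from typing import Callable, Dict, Optional, Tuple


def classify(
    item: str,
    curated: Dict[str, str],
    heuristic: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Prefer clean bottom-up data; fall back to a dirty top-down guess.

    Returns (category, source) so callers can see how trustworthy the label is.
    """
    # Bottom-up: curated, structured data (hypothetical stand-in for RDF-style sources).
    if item in curated:
        return curated[item], "bottom-up"
    # Top-down: heuristic guess that may be wrong, used only when nothing cleaner exists.
    guess = heuristic(item)
    if guess is not None:
        return guess, "top-down"
    return None, "unknown"
```

Keeping the source alongside the category lets a later step replace a top-down guess once cleaner bottom-up data becomes available.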
My company (Orchestr8) launched its corporate website today:
We’re letting a select number of individuals take a “first look” at our AlchemyPoint platform in the form of a Technology Preview. We’re looking for user feedback to help us expand and improve the AlchemyPoint system before its final release.
Access to the Technology Preview is being provided in a staged fashion with priority given to users who apply first, so sign up today!
Please note: The AlchemyPoint Technology Preview is pre-Beta software. It may contain software bugs or other limitations.
Here’s another FeedFlare mashup that I put together for Feedburner-enabled blogs:
This mashup utilizes Yahoo’s Term Extraction API to extract ‘key phrases’ from blog post titles, then uses Google’s Cross-Language Search to locate related content in the desired foreign language (select from 15 available languages). Discovered ‘related content’ results are then displayed below each blog post as a clickable link, with any text displayed as automatically translated English. For example, if you are a Chinese student living in the USA, you can integrate automatic Chinese-language pre-searches into your English-language blog.
On a side note, congratulations to my friends over at Cerulean Studios, whose Trillian Instant Messenger product was a winner in the Communications category of the recent WebWare 100 Awards (ranked #2, beaten only by GMail!). Speaking of this, does anyone know why the WebWare folks took down their individual product rankings for each category? These category rankings were up a few days ago, but now they’re gone and all winners have been rearranged in alphabetic order. Considering that this is a listing of Awards winners, I don’t really see the utility of an alphabetic ordering.
One of the central themes surrounding the Implicit Web is the power of the electronic footprint.
Wherever we go, we leave footprints. In the real world, these are quickly washed away by erosion and other natural forces. The electronic world, however, is far, far different: footprints often never disappear. Every move we make online, every bit of information we post, every web link we click, can be recorded.
The Implicit Web is about leveraging this automatically-recorded data to achieve new and useful goals.
One area that’s particularly exciting to me is the utility provided by merging implicit data collection/analysis and automatic information retrieval.
Neat stuff! I love the idea of “re-searching” automatically, leveraging an Internet user’s original search query.
A few days ago I decided to mess around with this “re-search” idea and ended up with something that I’ve been calling “pre-search.”
Pre-search is the concept of preemptive search, or retrieving information before a user asks for it (or even knows to ask). This idea can be of particular use with blogs and other topical information sources.
I created two basic pre-search mashups for Feedburner-enabled blogs, using the Feedburner FeedFlare API:
Both of these are pretty straightforward, doing the following:
1. For every blog post, use the Yahoo Term Extraction API to gather ‘key terms’ from the post title.
2. Use Google Blog Search or Lijit Network Search to find related content for the previously extracted ‘key terms.’
3. Format the top three results as clickable links and show them below the blog post.
This results in the automatic display of related content for a given blog post, using a combination of content analysis (on the blog post title) and information retrieval (Lijit/Google Blog Search).
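The three steps can be sketched roughly as follows. This is only an illustration: `extract_key_terms` is a naive stopword filter standing in for a real term-extraction service, and the `search` argument is a hypothetical callable standing in for whichever search backend is used. Neither reflects the actual Yahoo, Google, or Lijit APIs.

```python
import html
import re
from typing import Callable, List, Tuple

# Hypothetical stopword list; a real term-extraction service is far more sophisticated.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "for", "with", "is", "are", "how", "why", "what"}

# A search backend maps a query string to (link text, URL) pairs.
SearchFn = Callable[[str], List[Tuple[str, str]]]


def extract_key_terms(title: str, max_terms: int = 3) -> List[str]:
    """Step 1 (stand-in): keep unique non-stopwords, longest first."""
    words = re.findall(r"[A-Za-z0-9']+", title.lower())
    candidates = [w for w in dict.fromkeys(words) if w not in STOPWORDS]
    # sorted() is stable, so terms of equal length keep their title order.
    return sorted(candidates, key=len, reverse=True)[:max_terms]


def pre_search(title: str, search: SearchFn, max_results: int = 3) -> List[str]:
    """Steps 2 and 3: query the backend and format the top results as links."""
    query = " ".join(extract_key_terms(title))
    results = search(query)[:max_results]
    return [
        f'<a href="{html.escape(url, quote=True)}">{html.escape(text)}</a>'
        for text, url in results
    ]
```

Passing the search backend in as a function keeps the pipeline the same whether the related content comes from a blog search or a network search.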
I’ve enabled both Google and Lijit Pre-Search on my blog. You can add them to your Feedburner-enabled blog by visiting here and here.
Implicit, automatic, passive, or whatever you call it, this form of content analysis is starting to be recognized as a powerful tool in both a business and consumer/prosumer context. Companies like Adaptive Blue are using automatic content analysis techniques to personalize and improve the consumer shopping experience, while others like TapeFailure are leveraging this technology to enable more powerful web analytics.
Content analysis takes clickstream processing one step further, providing a much deeper level of insight into user activity on the Web. By peering into web page content (in the case of Adaptive Blue) or user behavioral data (as with TapeFailure), all sorts of new & interesting capabilities can be provided to both end-users and businesses. One capability that I’ll focus on in this post is automatic republishing.
Automatic republishing is the process of taking some bit of consumed/created information (a web page, mouse click, etc.) and leveraging it in a different context.
Let me give an example:
I read Slashdot headlines. Yes, I know. Slashdot is old-hat, Digg is better. Yadda-yadda. That’s beside the point of this example.
Note that I said “I read Slashdot headlines.” This doesn’t include user comments. There’s simply too much junk. Even high-ranked posts are often not worthy of reading or truly relevant. But alas, there is some good stuff in there — if you have the time to search it out. I don’t.
So this is a great example of a situation where passive/implicit content analysis can be extremely useful. Over the course of building and testing my company’s AlchemyPoint mashup platform, I decided to play with this particular example to see what could be done.
With the “Slashdot comments problem,” what I particularly wanted was the ability to extract useful related web links from the available heap of user comments. Better yet, I wanted to be able to automatically bookmark these links for later review (or consumption in an RSS reader), generating appropriate category tags without any user help.
What I ended up with was a passive content analysis mashup that didn’t modify my web browsing experience in any way, but rather just operated in the background, detecting interactions with the Slashdot web site.
When it sees me reading a Slashdot article, it scans through the story’s user comments looking for those that meet my particular search criteria. In this case, it is configured to detect any user comment that has been rated 4+ and labeled “Informative” or “Interesting.”
Upon finding user comments that match the search criteria, the mashup then searches the comment text for any URL links to other web sites. It then passively loads these linked pages in the background, extracting both the web page title and any category tags that were found. If the original Slashdot article was given category tags, these also are collected.
The mashup then uses the del.icio.us API to post these discovered links to the web, “republishing them” for future consumption.
Using an RSS reader (Bloglines), I subscribe to the del.icio.us feed generated by this mashup. This results in a filtered view of interesting/related web links appearing in the newsreader shortly after I click on a Slashdot story via my web browser.
This is a fairly basic example of content analysis in a user context, but does prove to be interesting because the entire process (filtering user comments, harvesting links from comment text, crawling any discovered links, extracting title/tag information from linked pages, and posting to del.icio.us) happens automatically, with no user intervention.
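The filtering and link-harvesting steps might look something like the sketch below. The `Comment` structure, the URL regular expression, and the function names are assumptions made for illustration; the crawling and del.icio.us posting steps are left out, since they depend on services not described here.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Comment:
    """Hypothetical shape of a parsed user comment."""
    score: int
    label: str
    text: str


# Simplified URL matcher; real comment HTML would need proper parsing.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")
WANTED_LABELS = {"Informative", "Interesting"}


def matching_comments(comments: Iterable[Comment], min_score: int = 4) -> List[Comment]:
    """Keep comments rated min_score or higher with a wanted label."""
    return [c for c in comments if c.score >= min_score and c.label in WANTED_LABELS]


def harvest_links(comments: Iterable[Comment], min_score: int = 4) -> List[str]:
    """Collect unique URLs from matching comments, in first-seen order."""
    seen = set()
    links = []
    for comment in matching_comments(comments, min_score):
        for url in URL_PATTERN.findall(comment.text):
            url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
            if url not in seen:
                seen.add(url)
                links.append(url)
    return links
```

Each harvested URL would then be fetched in the background for its title and tags before being posted as a bookmark.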
I think we will see this type of automatic prefiltering/republishing become increasingly prevalent as developers and Internet companies continue to embrace “implicit web” data-gathering techniques.
There’s been a lot of buzz recently surrounding the “Implicit Web” concept, something I’ve blogged about in the past.
ReadWriteWeb has a great write-up on the subject, with case studies focusing on several popular websites that incorporate implicit data collection techniques (Last.fm, etc.).
Even more interesting is a “live attention-stream” viewer created by Stan James of Lijit. This neat little webapp utilizes clickstream data gathered by the Cluztr social browsing plugin, allowing Internet users to “follow along” with another user’s web browsing session.
This and other recent work on leveraging implicit data-flows is pretty exciting stuff, and we’re really only starting to scratch the surface as to what’s possible.
I’ve been toying around with implicit data gathering techniques for the last six months or so, using my company’s AlchemyPoint platform to gain access to clickstreams and other information. Because the AlchemyPoint system operates as a transparent proxy-server, it makes it easy to build simple analysis/data-mining applications that “jack in” to web browsing, email, and instant messaging activity.
So what’s possible if you’re “jacked in”? Let’s start with something very basic: gathering statistics on the usage of various web sites.
Above is a snippet from something I’ve been calling a dash-up. So what’s a dash-up?
Think dashboard + mash-up.
Essentially, a dash-up is a presentation-level mashup that collects data from multiple sources and presents it in a useful graphical dashboard view (in this case, dynamically updating activity charts). The above screenshot is showing both a general web traffic history, and more detailed statistics on my music-listening activity on the popular Internet Radio web site, Pandora.com.
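As a rough illustration of the statistics-gathering step behind such a view, here is a minimal sketch that tallies visits per site from a list of URLs. It assumes the clickstream is already available as plain URL strings; the function name `site_usage` is hypothetical, and how a proxy would actually capture the URLs is not shown.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def site_usage(clickstream: Iterable[str]) -> Counter:
    """Count page visits per host, treating 'www.example.com' and 'example.com' alike."""
    counts: Counter = Counter()
    for url in clickstream:
        host = urlsplit(url).hostname
        if not host:
            continue  # skip entries that are not absolute URLs
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return counts
```

A dashboard could then chart `site_usage(...).most_common(10)` and refresh it as new clicks arrive.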
OK, statistics-gathering is kinda neat, but what about something more useful? One of my favorites is passive tagging or implicit blogging activity. Stay tuned for my next post, which will detail some of the ways passive/implicit data collection enhances (through filtering, tagging, etc.) the Internet sites I use on a daily basis.
OK, I just had to re-post the above comment from a TechCrunch preview of Microsoft’s Popfly — it made me laugh.
Taking a quick look at the Popfly screencast, I’m totally impressed. This looks like a super application and a great complement to existing tools within the Mashup market space.
I especially like the ability to construct custom modules and integrate them into Popfly creations. At Orchestr8 we can’t wait to get our hands on this app, to try out building some fun presentation-level mashups that incorporate our AlchemyPoint Server for doing behind-the-scenes processing. Popfly’s incredible UI should be a great complement to some of the fun stuff we’re doing on the backend.
Mashups just keep getting more interesting! I’m still waiting for Google to get its feet wet, now that both Yahoo and Microsoft have officially entered the pool.
Over the past few days I’ve seen the term “Intelligence Amplification” thrown around quite a bit. This is an interesting technology theme that seems to be progressing under a variety of names: Dynamics of Information, Digital Cortex, Context Aggregators, etc. I like the “Intelligence Amplification” name; I’ve been referring to this same theme around the office as “Contextual Information Awareness,” but it never really stuck.
What is this stuff all about, anyway? Ryan McIntyre describes it as:
software that lets computers do what they excel at (fast computation, “perfect memory”, gruelingly repetitive tasks, statistical analysis, etc) [while leveraging] what humans are far better at (face recognition, voice recognition, any cognition, matters of cultural discernment, language generation, etc).
David Henderson also strikes at the core of this subject, under the “Context Aggregation” moniker:
Web 3.0, the semantic web, Intelligence Amplification, Return on Attention, the wisdom of your crowd, context aggregators, whatever you want to call it is the new king! And the new king works on behalf of the user not the content creator or distributor! And the new king will use ALL meta data, implicit and explicit, about users interactions (attention and intentions) with people, services and content (created and consumed), not just metadata around siloed content to supply the value.
I love this stuff: automated mechanisms that utilize meta-data, click-streams, tags or any other information to keep users better informed as they use a computer. This is the core of what my company has been working on for some time now. Here are two mashups we’ve created during product testing that might fall into this Intelligence Amplification theme:
Popup-Politicians – a widget from Sunlight Labs that adds popup mini-profiles for Members of Congress to your blog. A neat tool, but only useful if a blog author or web-site creator actually chooses to embed it in their site. Here’s a modified version of the widget that lets you see information on Members of Congress for any web-site you visit (CNN, blogs, etc.):
I find this modified widget useful when reading political news to check on the campaign contributions and voting records for those individuals quoted in articles, etc. Mashups like these are great because they only bring information to the surface when it’s contextually relevant to the user. Price comparison widgets and similar tools also fit into the same category.
Here’s another example of a mashup that I think fits into the IA theme:
Menu-izer – sorts/colors/rewrites a restaurant menu based on a user’s preferences and allergies. In this scenario, I’m telling the system that my favorite ingredient is tomatoes and that I’m allergic to nuts:
Menu-izer is something I’ve been using around the office as an example of a “specialized purpose” or “single use” mashup. These likely hold value only for the person using them, and should be capable of being created extremely quickly and being thrown away after being used. Use of specialized IA mashups like Menu-izer and more generalized ones (such as Pop-up Politicians) is likely to increase as better tools for creating, manipulating and sharing mashups are made available — especially if these tools can be non-intrusively integrated into a user’s existing habits.
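For a sense of how small a "single use" mashup like this can be, here is a minimal sketch of the sort/color step. It assumes the menu has already been parsed into dishes with ingredient sets; the `Dish` type, the color names, and the `menuize` function are hypothetical and not taken from the actual Menu-izer.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Dish:
    """Hypothetical parsed menu entry."""
    name: str
    ingredients: Set[str]


def menuize(menu: List[Dish], favorites: Set[str], allergens: Set[str]) -> List[Tuple[str, str]]:
    """Return (dish name, color) pairs: favorites first, allergen dishes last."""

    def classify(dish: Dish) -> Tuple[int, str]:
        if dish.ingredients & allergens:
            return 2, "red"    # contains something the user is allergic to
        if dish.ingredients & favorites:
            return 0, "green"  # contains a favorite ingredient
        return 1, "black"      # neutral

    # sorted() is stable, so dishes in the same group keep their menu order.
    ranked = sorted(menu, key=lambda d: classify(d)[0])
    return [(d.name, classify(d)[1]) for d in ranked]
```

Allergens are checked before favorites, so a dish containing both is still flagged rather than promoted.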