Here’s a fun little demo app that a co-worker and I built:
This application leverages the MS Kinect to manipulate 3d visualizations of social media data. It tracks the 3d motion of a person’s hand and uses it as a virtual mouse cursor.
The social media data was mined from tens of millions of news articles and blog posts over a period of more than a month. Natural language processing algorithms analyze the article and blog contents, identify named entities and trends, and track momentum over time.
Info on this app:
real-time 3d visualization of social media data, represented as a force-directed-graph.
social media data was mined from tens of millions of news articles and blog posts over a 1+ month period.
news / blog data analyzed using natural language processing (NLP) algorithms including: named entity extraction, keyword extraction, concept tagging, sentiment extraction.
high-performance temporal data-store enables visualization of connections between named entities (e.g., “Nicolas Sarkozy -> Francois Hollande”)
system tracks billions of data-points (persons, companies, organizations, …) for tens of millions of pieces of content.
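For anyone who hasn’t run into force-directed graphs before, here is a minimal Python sketch of the general idea. It is not the code behind the demo: the function name `force_layout`, its parameters, and the constants are all assumptions made for illustration. Every pair of nodes repels, connected nodes attract, and the layout settles after many small steps.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, k=1.0, step=0.05,
                 max_move=0.1, seed=0):
    """Return {node: [x, y, z]} after a simple force-directed relaxation.

    Illustrative only: parameter names and defaults are assumptions.
    `nodes` is a list of hashable labels, `edges` a list of (a, b) pairs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0) for _ in range(3)] for n in nodes}

    def offset(a, b):
        # Vector from b to a, plus its length (never exactly zero).
        delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
        dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
        return delta, dist

    for _ in range(iterations):
        force = {n: [0.0, 0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                delta, dist = offset(a, b)
                for axis in range(3):
                    f = delta[axis] / dist * (k * k / dist)
                    force[a][axis] += f
                    force[b][axis] -= f
        # Nodes joined by an edge attract, more strongly when far apart.
        for a, b in edges:
            delta, dist = offset(a, b)
            for axis in range(3):
                f = delta[axis] / dist * (dist * dist / k)
                force[a][axis] -= f
                force[b][axis] += f
        # Move each node a small, capped distance along its net force.
        for n in nodes:
            move = [step * f for f in force[n]]
            length = math.sqrt(sum(m * m for m in move))
            scale = min(1.0, max_move / length) if length > 0 else 0.0
            for axis in range(3):
                pos[n][axis] += move[axis] * scale
    return pos


# Two connected entities plus one unconnected placeholder entity.
entities = ["Nicolas Sarkozy", "Francois Hollande", "Unrelated Entity"]
layout = force_layout(entities, [("Nicolas Sarkozy", "Francois Hollande")])
```

In the resulting layout the two connected entities sit close together while the unconnected one drifts away, which is what makes clusters of related entities visible at a glance.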
This is an example “20% time” employee project at my company, AlchemyAPI. We do fun projects like this to spur the imagination and as a creative diversion. Other projects (which I’ll get around to posting at some point) involve speech recognition, robots, and other geektacular stuff.
This is a theme that’s been bouncing around my subconscious for months, and something I’ve blogged about in the past.
But really, syndication is only part of the problem. Syndication normalizes data, and makes it readily accessible to 3rd parties — but it doesn’t push data where you want it. It’s a pull-focused technology.
For push, we need some sort of alerting capability.
Recently, I’ve been in the habit of checking delegate counters for the 2008 Presidential Election Primary Races. I check them daily; seeing updates to pledged delegates, super-delegates, etc.
Checking for updates doesn’t take a significant amount of time, but it’s yet another activity that can interrupt my work flow. Automating it would be a much better way to do this.
Recently, my company added Alerts capability to our AlchemyGrid beta service. You can create an alert based on anything: any sort of web content (syndicated or not). Alerts can travel over many communication channels (Email, AIM, SMS, Twitter, etc.). They support lots of customization options (how often to check for updates, what’s considered a “unique update”, etc.).
I used this new service to create an Alert that monitors delegate counts for the Democratic Presidential candidates (the GOP has already chosen its candidate). Any updates to delegate counts are automatically posted to the Twitter account “demdelegate08”. You can see the Twitter feed (and follow it if you wish) here:
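To make the “unique update” idea concrete, here is a rough Python sketch of a polling alert. It is not how AlchemyGrid is implemented; the `Alert` class, its `fetch`, `notify`, and `normalize` callables, and the sample strings are all hypothetical. The point is just that an alert boils down to: fetch the content, normalize it, compare a fingerprint against the last one seen, and notify only on a real change.

```python
import hashlib


class Alert:
    """Hypothetical polling alert: notify only when content really changes."""

    def __init__(self, fetch, notify, normalize=str.strip):
        self.fetch = fetch          # callable returning the current content
        self.notify = notify        # callable that delivers the update
        self.normalize = normalize  # decides what counts as "the same"
        self._last_digest = None

    def poll(self):
        """Check once; return True if a notification was sent."""
        content = self.normalize(self.fetch())
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            return False            # nothing new since the last check
        first_check = self._last_digest is None
        self._last_digest = digest
        if first_check:
            return False            # first poll only records a baseline
        self.notify(content)
        return True


# Simulated page contents across three checks (made-up placeholder values).
pages = iter(["Candidate A: 10", "Candidate A: 10  ", "Candidate A: 12"])
sent = []
alert = Alert(fetch=lambda: next(pages), notify=sent.append)
results = [alert.poll(), alert.poll(), alert.poll()]
```

The second check differs only by trailing whitespace, so the normalizer treats it as unchanged. Swapping in a different `normalize` function is one way to express “what counts as a unique update”, and how often `poll` gets called is the “how often to check” knob.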
A few implementation notes, for the geeks out there: We’re using a custom-engineered AIM, Email, and SMS backend for our Alerts implementation. We’re interacting with external services directly at the Protocol/API level, not piggy-backing off 3rd-party gateways or using other unreliable modes of communication.
RFC2396 specifies support for “URI parameters” within segments of a URI “path”. For those of you who don’t enjoy reading RFCs with your morning coffee, URI parameters are the “;SESSION=f12aa” looking thingys that sometimes appear in HTTP URLs. Yahoo and some other well-known websites use them, but most don’t.
I see the usefulness of URI parameters, but making them allowed on a per-segment basis just seems like overkill to me. I’ve also seen no production environment that makes use of per-segment parameters (if you have, let me know!).
With per-segment URI params, you can do lovely stuff like this:
Looking through URI parsing routines in some common open source code, it seems not everybody is handling this scenario. It’s stated as allowed, though not really expanded upon with any meaningful examples, in RFC2396 (Section 3.3). Many folks who are parsing URI params seem to be assuming they can only appear at the last URI segment, but this is definitely not the case.
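As a rough illustration, here is a minimal Python sketch of per-segment parsing. The helper `split_path_params` and the example URL are made up for this post, and treating each parameter as a `key=value` pair is an assumption (the RFC doesn’t require that shape). For comparison, Python’s standard `urlparse` only splits parameters off the last path segment.

```python
from urllib.parse import unquote, urlparse


def split_path_params(path):
    """Split a URI path into (segment, params) pairs, one per segment.

    Hypothetical helper. Assumes each parameter looks like key=value;
    a bare parameter gets an empty string as its value.
    """
    result = []
    for raw_segment in path.split("/"):
        if not raw_segment:
            continue  # skip the empty piece before the leading slash
        # Split on ';' before percent-decoding, so an encoded ';' stays data.
        name, *raw_params = raw_segment.split(";")
        params = {}
        for raw in raw_params:
            key, _, value = raw.partition("=")
            params[unquote(key)] = unquote(value)
        result.append((unquote(name), params))
    return result


# Made-up URL with parameters on the first two segments, none on the last.
url = "http://example.com/shop;SESSION=f12aa/items;sort=price/list"
parsed = urlparse(url)

# urlparse only looks at the last segment, so it finds no parameters here
# and leaves the earlier ones embedded in the path string.
last_segment_only = parsed.params

# The per-segment parser recovers both sets of parameters.
per_segment = split_path_params(parsed.path)
```

Here `last_segment_only` comes back empty while `per_segment` holds the `SESSION` and `sort` parameters attached to their own segments. That gap is exactly the kind of thing a proxy or filter could miss.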
This sort of encoding/decoding tedium reminds me of my days building Network Intrusion Detection Systems, where even a slight encoding error can have truly disastrous results (see the ’98 Ptacek/Newsham paper on NIDS Insertion/Evasion for the reasons why). I’ve gone ahead and implemented per-path-segment URI parameter support in our AlchemyPoint core code, but I shudder to think of how many HTTP application layer proxies, firewalls, and NIDS aren’t handling this sort of thing correctly.