05.11.12

3d Visualization of Semantic Data using MS-Kinect

Posted in Coding, NLP, Personal, SemanticWeb at 1:02 pm by elliot

Here’s a fun little demo app that a co-worker and I built:

This application leverages the MS Kinect to manipulate 3d visualizations of social media data. The application tracks 3d motion of a person’s hand, using it as a virtual mouse cursor.

The social media data was mined from tens of millions of news articles and blog posts over a period of more than a month, using natural language processing algorithms to analyze article and blog contents, identify named entities and trends, and track momentum over time.

Info on this app:

  • real-time 3d visualization of social media data, represented as a force-directed-graph.
  • social media data was mined from tens of millions of news articles and blog posts over a 1+ month period.
  • news / blog data analyzed using natural language processing (NLP) algorithms including: named entity extraction, keyword extraction, concept tagging, sentiment extraction.
  • high-performance temporal data-store enables visualization of connections between named entities (e.g., “Nicolas Sarkozy -> Francois Hollande”)
  • system tracks billions of data-points (persons, companies, organizations, …) for tens of millions of pieces of content.
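For anyone wondering what a force-directed graph in 3d boils down to, here is a minimal Python sketch in the style of the Fruchterman-Reingold layout: every pair of nodes repels, connected nodes attract, and a cooling "temperature" caps each move. This is not the code behind the demo; the function name `layout_3d`, the parameter values, and the tiny example graph are all assumptions made for illustration.

```python
import math
import random


def layout_3d(nodes, edges, iterations=200, k=1.0, seed=0):
    """Return {node: [x, y, z]} after a simple force-directed relaxation.

    Hypothetical helper for illustration; `nodes` must be a list.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0) for _ in range(3)] for n in nodes}

    def offset(a, b):
        # Vector from b to a, plus its length (clamped to avoid dividing by 0).
        delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
        dist = max(math.sqrt(sum(d * d for d in delta)), 1e-6)
        return delta, dist

    for i in range(iterations):
        force = {n: [0.0, 0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, with strength k^2 / distance.
        for idx, a in enumerate(nodes):
            for b in nodes[idx + 1:]:
                delta, dist = offset(a, b)
                for axis in range(3):
                    push = delta[axis] / dist * (k * k / dist)
                    force[a][axis] += push
                    force[b][axis] -= push
        # Connected nodes attract, with strength distance^2 / k.
        for a, b in edges:
            delta, dist = offset(a, b)
            for axis in range(3):
                pull = delta[axis] / dist * (dist * dist / k)
                force[a][axis] -= pull
                force[b][axis] += pull
        # Cap each move by a temperature that cools linearly toward zero.
        temperature = 0.1 * (1.0 - i / iterations)
        for n in nodes:
            size = math.sqrt(sum(f * f for f in force[n]))
            if size > 0.0:
                scale = min(size, temperature) / size
                for axis in range(3):
                    pos[n][axis] += force[n][axis] * scale
    return pos


# Tiny made-up graph: two linked entities plus one unconnected node.
entities = ["Nicolas Sarkozy", "Francois Hollande", "AlchemyAPI"]
links = [("Nicolas Sarkozy", "Francois Hollande")]
for name, xyz in layout_3d(entities, links).items():
    print(name, [round(c, 2) for c in xyz])
```

With these force rules, two linked nodes settle at a distance of roughly `k` from each other, while unlinked nodes drift apart. A real-time viewer would run a few iterations per frame rather than all of them up front.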

This is an example “20% time” employee project at my company, AlchemyAPI. We do fun projects like this to spur the imagination and as a creative diversion. Other projects (which I’ll get around to posting at some point) involve speech recognition, robots, and other geektacular stuff.


03.27.09

AlchemySnap – OCR+Photo Search for TMobile G1

Posted in API, Coding, Contextual, NLP, Technology, Twitter at 11:56 am by elliot

Here’s a demo app I created for the T-Mobile G1, to show off my company’s AlchemyAPI image / text mining infrastructure service.

Watch the video for more info:

05.20.08

Automated Content Alerting

Posted in Coding, ImplicitWeb, Orchestr8, Protocols at 10:15 am by elliot

Too many websites. Too little syndication.

This is a theme that’s been bouncing around my subconscious for months; something I’ve blogged about in the past.

But really, syndication is only part of the problem. Syndication normalizes data, and makes it readily accessible to 3rd parties — but it doesn’t push data where you want it. It’s a pull-focused technology.

For push, we need some sort of alerting capability.

Recently, I’ve been in the habit of checking delegate counters for the 2008 Presidential Election Primary Races. I check them daily; seeing updates to pledged delegates, super-delegates, etc.

Checking for updates doesn’t take much time, but it’s yet another activity that can interrupt my workflow. Automating it would be much better.

Recently, my company added an Alerts capability to our AlchemyGrid beta service. You can create an alert based on anything: any sort of web content (syndicated or not). Alerts can travel over many communication channels (Email, AIM, SMS, Twitter, etc.). They support lots of customization options (how often to check for updates, what’s considered a “unique update”, etc.).
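To make the “unique update” idea concrete, here is a minimal Python sketch of change detection by content fingerprint. It is not how AlchemyGrid works internally; the `ContentAlert` class, its whitespace-collapsing normalization rule, and the sample strings are assumptions made purely for illustration. Fetching the page and delivering the message are left out.

```python
import hashlib


class ContentAlert:
    """Hypothetical helper: reports when watched content really changes."""

    def __init__(self, normalize=None):
        self._last_digest = None
        # Default rule (an assumption): whitespace-only changes are not updates.
        self._normalize = normalize or (lambda text: " ".join(text.split()))

    def check(self, content):
        """Return True if `content` differs from the previously seen content."""
        normalized = self._normalize(content)
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        # The first observation only sets the baseline; it is not an update.
        changed = self._last_digest is not None and digest != self._last_digest
        self._last_digest = digest
        return changed


# Made-up page snapshots, as a scheduler might fetch them over time.
alert = ContentAlert()
print(alert.check("Candidate A: 100   Candidate B: 90"))  # baseline, no alert
print(alert.check("Candidate A: 100 Candidate B: 90"))    # whitespace only
print(alert.check("Candidate A: 105 Candidate B: 90"))    # a real change
```

Swapping in a different `normalize` function is one way to express what counts as a “unique update”, for example stripping timestamps before hashing.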

I used this new service to create an Alert that monitors delegate counts for the Democratic Presidential candidates (the GOP has already chosen its candidate). Any updates to delegate counts are automatically posted to the Twitter account “demdelegate08”. You can see the Twitter feed (and follow it if you wish) here:

http://twitter.com/demdelegate08

If you aren’t a Twitter user, and are interested in getting delegate updates via Email, SMS, or AIM, you can “Subscribe to / Follow” my Alert here:

DCW Delegate Alert

A few implementation notes, for the geeks out there: We’re using a custom-engineered AIM, Email, and SMS backend for our Alerts implementation. We’re interacting with external services directly at the Protocol/API level, not piggy-backing off 3rd-party gateways or using other unreliable modes of communication.

12.31.07

RFC2396: Per-Path-segment URI Parameters.

Posted in Coding, Protocols at 4:22 pm by elliot

OK, just a short rant.

RFC2396 specifies support for “URI parameters” within segments of a URI “path”.  For those of you who don’t enjoy reading RFCs with your morning coffee, URI parameters are the “;SESSION=f12aa” looking thingys that sometimes appear in HTTP URLs.  Yahoo and some other well-known websites use them, but most don’t.

I see the usefulness of URI parameters, but making them allowed on a per-segment basis just seems like overkill to me.  I’ve also seen no production environment that makes use of per-segment parameters (if you have, let me know!).

With per-segment URI params, you can do lovely stuff like this:

/foo;blah=x/bar;blah2=y/abc.txt;blah3=z

Which converts down to an actual URI path of:

/foo/bar/abc.txt (params: blah=x, blah2=y, blah3=z)

sheesh!

Looking through URI parsing routines in some common open source code, it seems not everybody is handling this scenario.  It’s stated as allowed, though not really expanded upon with any meaningful examples, in RFC2396 (Section 3.3).  Many folks who are parsing URI params seem to be assuming they can only appear at the last URI segment, but this is definitely not the case.
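Here is a minimal Python sketch of what handling params on every segment can look like. It is not the AlchemyPoint code; the function name `split_path_params` and the `example.com` host are made up for illustration, and the sketch skips percent-decoding and comma-separated values. For comparison, it also runs the same path through Python’s standard `urllib.parse.urlparse`, which only separates out params on the last segment.

```python
from urllib.parse import urlparse


def split_path_params(path):
    """Split a URI path into (plain_path, [(segment, key, value), ...]).

    Hypothetical helper: honors ";key=value" params on every path segment.
    """
    plain_segments = []
    params = []
    for segment in path.split("/"):
        # Everything before the first ";" is the segment name itself.
        name, *raw_params = segment.split(";")
        plain_segments.append(name)
        for raw in raw_params:
            if not raw:
                continue  # tolerate stray ";;"
            key, _, value = raw.partition("=")
            params.append((name, key, value))
    return "/".join(plain_segments), params


path = "/foo;blah=x/bar;blah2=y/abc.txt;blah3=z"
plain_path, params = split_path_params(path)
print(plain_path)  # /foo/bar/abc.txt
print(params)      # [('foo', 'blah', 'x'), ('bar', 'blah2', 'y'), ('abc.txt', 'blah3', 'z')]

# Standard library, for contrast: params on earlier segments stay in .path.
parsed = urlparse("http://example.com" + path)
print(parsed.path)
print(parsed.params)
```

Keeping the owning segment alongside each key/value pair matters, because the same key could legally appear on more than one segment.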

This sort of encoding/decoding tedium reminds me of my days building Network Intrusion Detection Systems, where even a slight encoding error can have truly disastrous results (see the ’98 Ptacek/Newsham paper on NIDS Insertion/Evasion for the reasons why).  I’ve gone ahead and implemented per-path-segment URI parameter support in our AlchemyPoint core code, but I shudder to think of how many HTTP application layer proxies, firewalls, and NIDS aren’t handling this sort of thing correctly.