Here’s a fun little demo app myself and a co-worker built:
Thisapplication leverages the MS Kinect to manipulate 3d visualizations of social media data. The application tracks 3d motion of a person’s hand, using it as a virtual mouse cursor.
Social media data mined from tens of millions of news articles and blog posts over a period of 1+ month, using natural language processing algorithms to analyze article/blog contents, identify named entities and trends, and track momentum over time.
Info on this app:
real-time 3d visualization of social media data, represented as a force-directed-graph.
social media data was mined from tens of millions of news articles and blog posts over a 1+ month period.
news / blog data analyzed using natural language processing (NLP) algorithms including: named entity extraction, keyword extraction, concept tagging, sentiment extraction.
high-performance temporal data-store enables visualization of connections between named entities (eg, “Nicolas Sarkozy -> Francois Hollande”)
system tracks billions of data-points (persons, companies, organizations, …) for tens of millions of pieces of content.
This is an example “20% time” employee project at my company, AlchemyAPI. We do fun projects like this to spur the imagination and as a creative diversion. Other projects (which I’ll get around to posting at some point) involve speech recognition, robots, and other geektacular stuff.
I’ve just added a new sidebar widget to my blog: “Related Content”.
This is a demonstration of “contextual widgets” from my company’s AlchemyGrid service.
Contextual widgets utilize a custom-engineered “statistical topic keyword extraction from Natural Language Text” facility we’ve recently integrated into our products. If you’re familiar with the “Yahoo Term Extraction” API, our system is essentially doing the same sort of stuff. Natural language processing is fun (and challenging) stuff. Here’s a few notes regarding our implementation:
1. AlchemyGrid’s Term Extraction facility supports multiple languages (English, German, French, Italian, Spanish, and Russian!). This was an important requirement for us, to enable contextual content generation for non-English websites/blogs. There are significant differences between languages in terms of punctuation rules, word stemming, and other details. Hats off to our Term Extraction developers, you’ve done a great job ensuring good initial language coverage.
2. Our Term Extraction facility is entirely statistical in its basis, not using a hard-coded lexicon, etc. This enables it to extract contextually-relevant topic keywords even when they’re (a) new topics, (b) rarely used common nouns/people-names, or (c) misspelled.
We’ve just integrated Term Extraction into our Grid service, so there may be a few minor kinks to work out in the coming weeks — but overall we’re happy with the initial results. Contextual capability vastly expands the utility of AlchemyGrid widgets, as their content can now be automatically customized to relate to your content. This applies to *any* input-enabled widget in the grid (ALL widgets are contextual). Here’s another contextual example (a related Amazon book):
We’ll be enabling the other supported languages in coming weeks, as well as rolling out some additional enhancements to our text processing algorithms (for the geeks in the audience, enhancements to our sentence boundaries detector, inline punctuation processor, etc.).