Here’s a fun little demo app a co-worker and I built:
This application leverages the Microsoft Kinect to manipulate 3D visualizations of social media data. It tracks the 3D motion of a person’s hand, using it as a virtual mouse cursor.
Info on this app:
real-time 3D visualization of social media data, represented as a force-directed graph.
social media data was mined from tens of millions of news articles and blog posts over a 1+ month period.
news / blog data analyzed using natural language processing (NLP) algorithms including: named entity extraction, keyword extraction, concept tagging, sentiment extraction.
high-performance temporal data-store enables visualization of connections between named entities (e.g., “Nicolas Sarkozy -> Francois Hollande”)
system tracks billions of data-points (persons, companies, organizations, …) for tens of millions of pieces of content.
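To make the entity-connection idea above concrete, here’s a minimal sketch of how co-occurring named entities can be turned into a weighted graph suitable for a force-directed layout. The function name and the toy input data are my own illustrations, not part of the actual application; a real pipeline would feed it entities produced by an NLP extraction step.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(documents):
    """Count how often two named entities appear in the same document.

    `documents` is a list of entity lists, as a named-entity extractor
    might produce; each co-occurring pair becomes a weighted edge.
    """
    edges = Counter()
    for entities in documents:
        # Deduplicate and sort so each unordered pair gets one stable key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

# Toy stand-in for extracted entities from three articles:
docs = [
    ["Nicolas Sarkozy", "Francois Hollande", "France"],
    ["Nicolas Sarkozy", "Francois Hollande"],
    ["France", "Germany"],
]
graph = cooccurrence_graph(docs)
print(graph[("Francois Hollande", "Nicolas Sarkozy")])  # 2
```

Edge weights like these are exactly what a force-directed renderer can map to spring strength, pulling frequently linked entities together on screen.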
This is an example of a “20% time” employee project at my company, AlchemyAPI. We do fun projects like this to spur the imagination and as a creative diversion. Other projects (which I’ll get around to posting at some point) involve speech recognition, robots, and other geektacular stuff.
I’ve always been fascinated by data — both of the companies I’ve founded have addressed aspects of the “data overload” problem. The first, MimeStar, developed NIDS (Network Intrusion Detection System) technology that analyzed gigabits of network traffic every second, reconstructing every IP frame, TCP session, and application-layer protocol stream — looking for computer intrusions and other inappropriate activity. MimeStar was acquired in early 2000 and our products are still protecting government and corporate networks 10 years later. NIDS is fascinating technology, reducing massive packet flows down to intelligible event/activity streams & security alerts.
My present company builds natural language processing (computational linguistics) technology to make sense of the huge quantities of unstructured text residing across the web and within company data warehouses. We’re helping build the semantic web, by “bootstrapping” unstructured content into a form that is understandable by machines. NLP is an exciting space, with real disruption potential. It’s becoming a critical technology for Semantic & Web 3.0 applications/services.
What’s that? You haven’t heard of the Semantic Web? Check out this fantastic video, created by Kate Ray of NYU. Her short documentary does a great job of summing up many of the drivers behind the Semantic Web (such as data overload), and touches upon many of the future applications of this technology.
If disruptive innovation, artificial intelligence, and Web 3.0 are your bread-and-butter, AlchemyAPI is currently hiring. We’re based in Denver, CO and are growing rapidly. Join our team and help build the next generation of semantic technology!
APML, a new standardized format for expressing attention preferences, has been receiving a lot of buzz in recent weeks. Mashable covers the topic here, Brad Feld here, Jeff Nolan here, and Read/WriteWeb here.
It’s great to see an increasing number of folks getting behind the concept of ‘standardized structured attention’ and embracing this emerging standard.
Attention has always been a topic of interest to me, something I’ve blogged about in the past, on a number of occasions. At my company Orchestr8, we’ve been working on solutions that can automatically capture the ‘context’ of a user’s attention and leverage this data in various ways. We’re currently implementing APML support into the next version of our software, which should provide for some really interesting capabilities.
The thing that excites me about APML is that it’s a relatively straightforward standard (far, far simpler than the many RSS/Atom variants). This will ease adoption and simplify portability of attention preference data across many products and services. Since APML expresses attention in a relatively abstract way, multiple products (even product domains, for instance Web versus Email) can leverage the same attention data.
A technical note: thank you, APML authors, for strictly standardizing the date format in the APML spec (ISO 8601). If only we could have been so lucky with RSS/Atom. Now let’s hope people actually stick to it!
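As a quick sketch of why the strict ISO 8601 requirement matters, here’s a tiny Python example that builds an APML-style fragment and parses its timestamp in one unambiguous call. The element and attribute names below approximate the APML draft spec from memory and should be treated as assumptions, not a normative example.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Build a minimal APML-style profile (element/attribute names are
# assumptions approximating the draft spec, for illustration only).
apml = ET.Element("APML", version="0.6")
body = ET.SubElement(apml, "Body")
profile = ET.SubElement(body, "Profile", name="tech")
implicit = ET.SubElement(profile, "ImplicitData")
concepts = ET.SubElement(implicit, "Concepts")
ET.SubElement(concepts, "Concept",
              key="natural language processing",
              value="0.85",
              updated="2007-12-09T14:30:00+00:00")

# Because dates are pinned to ISO 8601, parsing is a single call --
# no sniffing RFC 822 vs. RFC 3339 variants as with RSS/Atom feeds.
concept = apml.find(".//Concept")
updated = datetime.fromisoformat(concept.get("updated"))
print(updated.year, updated.month)  # 2007 12
```

That single `fromisoformat` call is the whole payoff: every conforming APML producer emits dates a consumer can parse the same way.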
“While the original vision of the semantic web is grandiose and inspiring in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of specific and simple consumer focus makes it mostly an academic exercise.”
The post touches upon some of the practical issues keeping semantic technology out of the hands of end-users, and potential ways around these roadblocks. Summaries are given for three top-down “mechanisms” that may provide workarounds to some issues:
Leveraging Existing Information
Using Simple / Vertical Semantics
Creating Pragmatic / Consumer-Centric Apps.
I couldn’t agree more with the underlying principle of this post: top-down approaches are necessary in order to expose end-users to semantic search & discovery (at least in the near-term).
However, this isn’t to say that there isn’t value in bottom-up semantic web technologies like RDF, OWL, etc. On the contrary, these technologies can provide extremely high quality data, such as categorization information. In the past year, there’s been significant growth in the amount of bottom-up data that’s available. This includes things like the RDF conversion of Wikipedia structured data (DBpedia), the US Census, and other sources. Indeed, the “W3C Linking Open Data” project is working on interlinking these various bottom-up sources, further increasing their value for semantic web applications. What’s the point of all this data collection/linking? “It’s all about enabling connections and network effects.”
My personal feeling is that neither a bottom-up nor a top-down approach will attain complete “success” in facilitating the semantic web. Top-down approaches are good enough for some applications, but sometimes generate dirty results (incorrect categorizations, etc.). Bottom-up approaches can generate some incredible results when operating within a limited domain, but can’t deal with messy data. What’s needed is a bridge between the two modes: leveraging top-down approaches for an initial dirty classification, and incorporating cleaner bottom-up sources when they’re available.
So how do we bridge the gap? Here’s what I’m betting on: Process-oriented, or agent-based mashups. These sit between the top-down and bottom-up stacks, filtering/merging/sorting/classifying information. More on this soon.
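As a minimal sketch of this bridging idea: prefer a clean bottom-up source when it knows about an entity, and fall back to the dirty-but-broad top-down classifier otherwise, flagging low-confidence guesses for review. Everything here (function names, labels, the toy data) is illustrative, not an actual interface from any product.

```python
def classify(entity, topdown_classifier, bottomup_lookup, threshold=0.8):
    """Bridge the top-down and bottom-up stacks for one entity.

    A clean bottom-up source (think structured data linked from something
    like DBpedia) wins when it covers the entity; otherwise we fall back
    to the statistical top-down classifier and flag shaky results.
    """
    clean_label = bottomup_lookup.get(entity)
    if clean_label is not None:
        return {"label": clean_label, "source": "bottom-up", "confidence": 1.0}
    label, confidence = topdown_classifier(entity)
    return {
        "label": label,
        "source": "top-down",
        "confidence": confidence,
        "needs_review": confidence < threshold,
    }

# Toy stand-ins for the two stacks:
bottomup = {"Nicolas Sarkozy": "Person"}          # small but clean
topdown = lambda e: ("Organization", 0.6)         # broad but dirty guess

print(classify("Nicolas Sarkozy", topdown, bottomup))
print(classify("Orchestr8", topdown, bottomup))
```

An agent-based mashup would chain many small steps like this one, each filtering, merging, or reclassifying as data flows between the two stacks.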