05.16.12

Google Knowledge Graph: A Tipping Point

Posted in Contextual, News, SemanticWeb at 2:05 pm by elliot

The semantic web / NLProc world is abuzz today with the news of Google’s Knowledge Graph.

I’m thrilled and fascinated by Google’s work in this arena. They’re taking a true “web scale” approach towards knowledge extraction. My company (AlchemyAPI) has been working in this area intensely over the past year, examining large swaths of the web (billions of pages), performing language / structural analysis, and extracting reams of factual and ontological data. We’re using the gathered data for different purposes than Google is (we’re enhancing our semantic analysis service, AlchemyAPI, whereas Google is improving the search experience for their customers), but we are both using some analogous approaches to find and extract this sort of information.

What’s interesting to me, however, is how this is really a sort of tipping point for Google. We’re witnessing their evolution from “search engine” to “knowledge engine”, something many have expected for years — but which carries a number of consequences (intended and unintended).

Google has always maintained a careful balance of risk/reward with content owners/creators. They provide websites with referral traffic (web page hits), while performing what some may argue is wholesale copyright infringement (copying entire web pages, images, even screenshots of web pages).

This has historically worked out quite well for Google. Website owners get referral traffic, and thus can show ads, sell subscriptions, and get paid. Google copies their content (showing snippets/images/etc on Google.com properties) to make this virtuous cycle happen.

Stuff like the “Knowledge Graph” potentially torpedoes this equation. Instead of pointing users to the web page that contains the answer to their search, Google’s semantic algorithms can directly display an answer, without the user ever leaving Google.com.

Say you’re a writer for About.com, spending your time gathering factual information on your topic of choice (e.g., “Greek Philosophers”). You carefully curate your About.com page, and make money on ads shown to users who read your content (many of whom are referred from Google.com).

If Google can directly extract the “essence” of these pages (the actual entities and facts contained within), and show this information to users — what incentive do these same individuals have to visit your About.com page? And where does this leave content creators?

The risk here isn’t necessarily a legal one: there’s quite a bit of established precedent which states that “facts” cannot be easily owned or copyrighted. But sites could start blocking Google’s crawlers. No one is likely to do this anytime soon, as Google’s semantic features are only just getting started and “referral traffic” is still the biggest game in town. But what does the future hold?
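For the curious, here is roughly what “blocking Google’s crawlers” looks like in practice. This is a minimal sketch, not anything a particular site actually runs: the robots.txt rules, the example.com URL, the “SomeOtherBot” agent name, and the can_crawl helper are all made up for illustration. It uses Python’s standard urllib.robotparser to check what a well-behaved crawler would conclude from those rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/greek-philosophers"  # illustrative URL
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, page))
```

Keep in mind robots.txt is only a request that polite crawlers honor, and a site that does this gives up its Google referral traffic along with the extraction.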

I’m guessing Google will work out these sorts of bumps in the road on their path towards becoming a true Knowledge Engine. But it’s an interesting point to think about.

PS: Google Squared could be argued to be an earlier “tipping point”, but it was largely an experiment. The Google Knowledge Graph represents a true, web-scale commercial effort in this arena. A real tipping point.


03.27.09

AlchemySnap – OCR+Photo Search for T-Mobile G1

Posted in API, Coding, Contextual, NLP, Technology, Twitter at 11:56 am by elliot

Here’s a demo app I created for the T-Mobile G1, to show off my company’s AlchemyAPI image / text mining infrastructure service.

Watch the video for more info:

05.15.08

New Sidebar Widget

Posted in Clipping, Contextual, ImplicitWeb, NLP, Orchestr8, Uncategorized, Widgets at 9:02 am by elliot

I’ve just added a new sidebar widget to my blog: “Related Content”.

This is a demonstration of “contextual widgets” from my company’s AlchemyGrid service.

Contextual widgets utilize a custom-engineered “statistical topic keyword extraction from Natural Language Text” facility we’ve recently integrated into our products. If you’re familiar with the “Yahoo Term Extraction” API, our system is essentially doing the same sort of thing. Natural language processing is fun (and challenging) stuff. Here are a few notes regarding our implementation:

1. AlchemyGrid’s Term Extraction facility supports multiple languages (English, German, French, Italian, Spanish, and Russian!). This was an important requirement for us, to enable contextual content generation for non-English websites/blogs. There are significant differences between languages in terms of punctuation rules, word stemming, and other details. Hats off to our Term Extraction developers, you’ve done a great job ensuring good initial language coverage.

2. Our Term Extraction facility is entirely statistical in its basis, not using a hard-coded lexicon, etc. This enables it to extract contextually-relevant topic keywords even when they’re (a) new topics, (b) rarely used common nouns/people-names, or (c) misspelled.
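To give a feel for what “entirely statistical” can mean, here is a toy sketch. It is not AlchemyGrid’s actual algorithm; it swaps in a plain TF-IDF style score, and the function names, the sample text, and the tiny background corpus are all invented for illustration. Terms are ranked purely by how often they appear in the document versus a background collection, with no word list anywhere, so new topics, rare names, and misspellings can still surface.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Runs of Unicode letters, lowercased; no language-specific lexicon.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def extract_keywords(document: str, background: List[str], top_n: int = 3) -> List[str]:
    """Rank document terms by term frequency times inverse document frequency."""
    doc_counts = Counter(tokenize(document))

    # Document frequency: in how many background texts does each term occur?
    doc_freq = Counter()
    for text in background:
        doc_freq.update(set(tokenize(text)))

    n_docs = len(background) + 1  # background texts plus the document itself
    scores = {
        term: count * math.log(n_docs / (1 + doc_freq[term]))
        for term, count in doc_counts.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


if __name__ == "__main__":
    background = [
        "the cat and the dog",
        "about the weather and the news",
        "the forms of government",
    ]
    document = "Plato and Socrates: Plato wrote about Socrates and the forms."
    print(extract_keywords(document, background, top_n=2))
```

Common words such as “the” and “and” sink to the bottom only because they show up across the background texts, not because anyone listed them as stop words. A real system would need a far larger background corpus plus per-language handling of stemming and punctuation.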

We’ve just integrated Term Extraction into our Grid service, so there may be a few minor kinks to work out in the coming weeks — but overall we’re happy with the initial results. Contextual capability vastly expands the utility of AlchemyGrid widgets, as their content can now be automatically customized to relate to your content. This applies to *any* input-enabled widget in the grid (ALL widgets are contextual). Here’s another contextual example (a related Amazon book):

We’ll be enabling the other supported languages in the coming weeks, as well as rolling out some additional enhancements to our text processing algorithms (for the geeks in the audience: enhancements to our sentence boundary detector, inline punctuation processor, etc.).