AI services for developers

I wanted to better understand what Artificial Intelligence APIs are available for developers. What follows are the services available today via the big providers (Amazon, Google, IBM, and Microsoft), organized into three layers:

  • Applications - pre-trained deep learning models for specific purposes such as vision, speech recognition, and natural language understanding.
  • Platforms - fully provisioned services ready to be trained on big data, often pulled from a data warehouse, file system, or database.
  • Frameworks - the technologies underpinning the platforms and applications, such as TensorFlow on Hadoop or Caffe on Spark.

Applications

  • Amazon Lex - provides automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text.
  • Amazon Polly - converts text into lifelike speech including dozens of lifelike voices across a variety of languages.
  • Amazon Rekognition - detects objects, scenes, and faces; searches and compares faces; and identifies inappropriate content in images (a short usage sketch follows this list).
  • Google Cloud Speech API - converts audio to text; recognizes over 80 languages and variants.
  • Google Cloud Vision API - classifies images into thousands of categories, detects individual objects and faces within images, and finds and reads printed words contained within images.
  • IBM Watson Conversation - allows you to quickly build, test and deploy a bot or virtual agent.
  • IBM Watson Speech to Text - converts audio voice into written text.
  • IBM Watson Text to Speech - converts written text into natural sounding audio in a variety of languages and voices.
  • IBM Watson Visual Recognition - understands and tags visual concepts, recognizes food, finds human faces, approximates age and gender, and finds similar images in a collection.
  • Microsoft Computer Vision API - extracts rich information from images to categorize and process visual data, and provides machine-assisted moderation of images to help curate your services.
  • Microsoft Bing Speech API - converts audio to text, understands intent, and converts text back to speech for natural responsiveness.
  • Microsoft Recommendations API - recommends items your customers want by learning from previous transactions.
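
To make the application layer more concrete, here is a minimal sketch that labels a photo with Amazon Rekognition and then speaks the result with Amazon Polly. It assumes the boto3 Python SDK is installed, AWS credentials are configured, and a local photo.jpg exists; the file names, region, voice, and confidence threshold are illustrative choices, not anything the services prescribe.

    import boto3

    # Ask Rekognition for the top labels (objects, scenes) it sees in a local image.
    rekognition = boto3.client("rekognition", region_name="us-east-1")
    with open("photo.jpg", "rb") as f:
        image_bytes = f.read()

    labels = rekognition.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=5,          # keep only the five most likely labels
        MinConfidence=80.0,   # ignore low-confidence guesses
    )["Labels"]

    summary = ", ".join(f"{label['Name']} ({label['Confidence']:.0f}%)" for label in labels)
    print("Rekognition sees:", summary)

    # Hand the summary to Polly to synthesize spoken audio, and save it as an MP3.
    polly = boto3.client("polly", region_name="us-east-1")
    speech = polly.synthesize_speech(
        Text=f"I can see {summary}",
        OutputFormat="mp3",
        VoiceId="Joanna",     # one of Polly's built-in English voices
    )

    with open("summary.mp3", "wb") as out:
        out.write(speech["AudioStream"].read())

The point is less the specific services than the shape of the integration: a few API calls (wrapped here by the SDK), no model training, and no infrastructure to manage.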

Platforms

Frameworks


Notes:

  • Microsoft has an extensive (and growing) list of special-purpose AI APIs. I've just listed a few of the most generally useful.
  • IBM does offer more services than listed above, though most deal with natural language processing - likely a result of its AlchemyAPI acquisition.
  • Google, IBM, and Microsoft all offer language translation services (not listed here).
  • Google and Microsoft both offer services for working with video (Microsoft appears to offer real-time video analysis).

A look at the daily routines of nine workers

I really enjoy getting a glimpse into how other people live their lives - like wandering history museums reading the placards or visiting people's homes - so this piece by The Outline really resonated with me.

Here's an excerpt from non-fiction writer Deborah Baker:

What are your typical hours?
I get up at 6:15 a.m., swim at the Y if I'm in Brooklyn, or at the public pool if I'm in India. I'm generally at my desk by 8 a.m. and don't leave it again until about 6 p.m., except for a break for lunch or to reheat my coffee. Regular office hours.
What do you spend most of your workday doing?
These days it is fine tuning sentences or figuring out if I need to cut some details or re-arrange the order of sentences in which a paragraph unfolds. A month ago it was reframing entire chapters, moving stuff around, paying attention to the pace and asking myself whether, in this bit or that bit, I was expecting too much from the reader. Most of my job takes place in my head. To call this labor makes it sound grander than it is.
What do you find to be the hardest part of your job?
If I was pressed to answer I would say that the hardest part for me of being a writer is finding a subject that will sustain my interest for the three or so years it takes to get from initial idea to complete manuscript, but also one which I feel I can convince other people (editors, readers) to be as engaged with as I am.

Authors: Rawiya Kameir, Khalila Douze, Ann-Derrick Gaillot

Ben Edwards on the benefits of self-driving cars

Ben Edwards, of Alt Text, shares a terrific overview of the many benefits we will gain with autonomous cars: lives saved, the end of traffic, a better use of our time, improved health, a cleaner earth, and much more. But I'm still worried that it's going to be a while before we get there - not because the technology won't be ready, but because people won't be ready.

Here's Ben's take:

There will be some significant hurdles to get over with regards to public adoption, as many recent polls suggest the U.S. population is not yet ready to give up the wheel.
Three out of four U.S. drivers said they would feel “afraid” to ride in self-driving cars, according to the AAA survey released on 1 Mar 2016. Just one in five said they would actually trust a driverless vehicle to drive itself with them inside.
There was likely a time when people didn’t trust calculators to add and subtract for them either. I’m sure we’ll soon be in the “break-in” period, where people get used to the numerous semi-automated features already in place on many vehicles, as a way to build trust. But as I like to say, autonomous cars don’t need to be 100% perfect in order for them to be a vast improvement over human drivers.

The world will be a better place when humans are no longer driving.

Google Lens brings search to the physical world

Here is Sundar Pichai, on Google Lens, at Google's I/O keynote yesterday (text from Stratechery):

We are clearly at an inflection point with vision, and so today, we are announcing a new initiative called Google Lens. Google Lens is a set of vision-based computing capabilities that can understand what you’re looking at and help you take action based on that information. We’ll ship it first in Google Assistant and Photos, and then other products.

How does it work? If you run into something and you want to know what it is, say a flower, you can invoke Google Lens, point your phone at it and we can tell you what flower it is…Or if you’re walking on a street downtown and you see a set of restaurants across you, you can point your phone, because we know where you are, and we have our Knowledge Graph, and we know what you’re looking at, we can give you the right information in a meaningful way.

As you can see, we are beginning to understand images and videos. All of Google was built because we started understanding text and web pages, so the fact that computers can understand images and videos has profound implications for our core mission.

And Ben Thompson's reaction:

The profundity cannot be overstated: by bringing the power of search into the physical world, Google is effectively increasing the addressable market of searchable data by a massive amount, and all of that data gets added back into that virtuous cycle. The potential upside is about more than data though: being the point of interaction with the physical world opens the door to many more applications, from things like QR codes to payments.

Ben's excitement is contagious: AI becomes a conduit for information between the digital and physical worlds. We're on the cusp of amazing change.

Amazon and virtuous monopolies

Ben Thompson, of Stratechery, has argued previously (Antitrust and Aggregation) that today's aggregators (Facebook for social interactions, WeChat for commerce and business in China, and Amazon for retail) are, in effect, modern monopolies unlike the traditional monopolies of the last century:

...consumers are attracted to an aggregator through the delivery of a superior experience, which attracts modular suppliers, which improves the experience and thus attracts more consumers, and thus more suppliers in the aforementioned virtuous cycle... 
Thanks to these virtuous cycles, the big get bigger; indeed, all things being equal the equilibrium state in a market covered by Aggregation Theory is monopoly: one aggregator that has captured all of the consumers and all of the suppliers.
This monopoly, though, is a lot different than the monopolies of yesteryear: aggregators aren’t limiting consumer choice by controlling supply (like oil) or distribution (like railroads) or infrastructure (like telephone wires); rather, consumers are self-selecting onto the Aggregator’s platform because it’s a better experience.

That last point offers a key insight into how these modern monopolies are different: we consumers enjoy being a part of them because we get better prices and better service. This contrasts with the traditional definition of a monopoly, which often meant consumers faced higher prices, reduced investment in the offering, or both.

A second difference is that these platforms are dominating more than one business - for example, Amazon in retail and cloud - and using their size in each to generate synergies beyond what either could produce independently. An oft-cited example is Amazon's use of AWS to power its retail business.

Zack Kanter, writing for TechCrunch, has identified a third key difference:

In the 10+ years since AWS’s debut, Amazon has been systematically rebuilding each of its internal tools as an externally consumable service. A recent example is AWS’s Amazon Connect — a self-service, cloud-based contact center platform that is based on the same technology used in Amazon’s own call centers. Again, the “extra revenue” here is great — but the real value is in honing Amazon’s internal tools.
If Amazon Connect is a complete commercial failure, Amazon’s management will have a quantifiable indicator (revenue, or lack thereof) that suggests their internal tools are significantly lagging behind the competition. Amazon has replaced useless, time-intensive bureaucracy like internal surveys and audits with a feedback loop that generates cash when it works — and quickly identifies problems when it doesn’t. They say that money earned is a reasonable approximation of the value you’re creating for the world, and Amazon has figured out a way to measure its own value in dozens of previously invisible areas.
But this much is obvious — we all know about AWS. The incredible thing here is that this strategy — in one of the most herculean displays of effort in the history of the modern corporation — has permeated Amazon at every level. Amazon has quietly rolled out external access in nooks and crannies across their entire ecosystem, and it is this long tail of external service availability that I think will be nearly impossible to replicate.

Amazon's approach of commercializing and externalizing all aspects of its business is clever: it uses market forces to ensure that the systems and processes it relies on remain as efficient as possible. This approach likely works best in high-volume, price-sensitive businesses; it may not work as well for luxury goods or high-touch services. But for Amazon - a company relentlessly focused on getting product to customers faster and cheaper - it's a near-perfect fit.