AI services for developers

I wanted to better understand what Artificial Intelligence APIs are available for developers. What follows are the services available today via the big providers (Amazon, Google, IBM, and Microsoft), organized into three layers:

  • Applications - pre-trained deep learning models for a specific purpose, such as vision, speech recognition, or speech synthesis.
  • Platforms - fully provisioned services ready to be trained on big data, often pulled from a warehouse, file system, or database.
  • Frameworks - the technologies underpinning the platforms and applications, such as TensorFlow on Hadoop or Caffe on Spark.


  • Amazon Lex - provides automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text.
  • Amazon Polly - converts text into lifelike speech, with dozens of voices across a variety of languages.
  • Amazon Rekognition - detects objects, scenes, and faces; searches and compares faces; and identifies inappropriate content in images.
  • Google Cloud Speech API - converts audio to text and recognizes over 80 languages and variants.
  • Google Cloud Vision API - classifies images into thousands of categories, detects individual objects and faces within images, and finds and reads printed words contained within images.
  • IBM Watson Conversation - allows you to quickly build, test and deploy a bot or virtual agent.
  • IBM Watson Speech to Text - converts spoken audio into written text.
  • IBM Watson Text to Speech - converts written text into natural-sounding audio in a variety of languages and voices.
  • IBM Watson Visual Recognition - understands and tags visual concepts, recognizes food, finds human faces, approximates age and gender, and finds similar images in a collection.
  • Microsoft Computer Vision API - extracts rich information from images to categorize and process visual data, and provides machine-assisted moderation of images to help curate your services.
  • Microsoft Bing Speech API - converts audio to text, understands intent, and converts text back to speech for natural responsiveness.
  • Microsoft Recommendations API - recommends items your customers want by learning from previous transactions.
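
To give a feel for what calling one of these pre-trained services involves, here is a minimal Python sketch that builds the JSON body for a label-detection request to the Google Cloud Vision API. It assumes the v1 REST endpoint (`https://vision.googleapis.com/v1/images:annotate`) and its `requests` / `image` / `features` fields as I understand them. The function name `build_label_request` is my own invention, and nothing is actually sent over the network; a real call would also need an API key or other credentials.

```python
import base64
import json


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build a request body for a Cloud Vision label-detection call.

    Hypothetical helper: the field names follow the v1 `images:annotate`
    REST format as assumed above; check the official reference before use.
    """
    return {
        "requests": [
            {
                # Image content is sent inline as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Ask only for labels (the "classifies images" feature).
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    body = build_label_request(b"placeholder bytes, not a real image")
    print(json.dumps(body, indent=2))
```

The other application-layer services have a similar shape: send the input (image, audio, or text) plus a short description of the analysis you want, and get structured JSON back.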




  • Microsoft has an extensive (and growing) list of special-purpose AI APIs. I've just listed a few of the most generally useful.
  • IBM does offer more services than listed above, though most deal with natural language processing - likely a result of its AlchemyAPI acquisition.
  • Google, IBM, and Microsoft all offer language translation services (not listed here).
  • Google and Microsoft both offer services for working with video (Microsoft appears to offer real-time video analysis).