AlexaClientSDK 1.16.0
A cross-platform, modular SDK for interacting with the Alexa Voice Service
What is the Alexa Voice Service (AVS)?

The Alexa Voice Service (AVS) enables developers to integrate Alexa directly into their products, bringing the convenience of voice control to any connected device. AVS provides developers with access to a suite of resources to build Alexa-enabled products, including APIs, hardware development kits, software development kits, and documentation.

Overview of the AVS Device SDK

The AVS Device SDK provides libraries written in C++ (C++11 or later) that leverage the AVS API to create device software for Alexa-enabled products. The SDK is modular and abstracted, providing components that handle discrete functions such as speech capture, audio processing, and communications; each component exposes APIs that you can use and customize for your integration. It also includes a sample app that demonstrates interactions with AVS.

Get Started

You can set up the SDK on the following platforms:

You can also prototype with a third-party development kit:

Or, if you prefer, you can start with our SDK API Documentation.

Learn More About The AVS Device SDK

Watch this tutorial to learn how this SDK works and how to set it up.

SDK Architecture

This diagram illustrates the data flows between components that comprise the AVS Device SDK for C++.

SDK Architecture Diagram

Audio Signal Processor (ASP) - Third-party software that applies signal processing algorithms to both input and output audio channels. The applied algorithms produce clean audio through features including acoustic echo cancellation (AEC), beamforming (fixed or adaptive), voice activity detection (VAD), and dynamic range compression (DRC). If a multi-microphone array is present, the ASP constructs and outputs a single audio stream for the array.
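As a toy illustration of one of these features, the sketch below flags a frame of 16-bit PCM audio as voice if its root-mean-square (RMS) level exceeds a fixed threshold. This is a deliberate simplification under stated assumptions, not how a production ASP implements VAD (real detectors typically rely on spectral features and adaptive thresholds); the function `isVoiceActive` and its threshold parameter are hypothetical and not part of the SDK.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the AVS Device SDK: a toy energy-based VAD.
// Returns true if the RMS level of a frame of 16-bit PCM samples exceeds
// a fixed threshold; an empty frame is never treated as voice.
bool isVoiceActive(const std::vector<std::int16_t>& frame, double rmsThreshold) {
    assert(rmsThreshold >= 0.0);
    if (frame.empty()) {
        return false;
    }
    double sumOfSquares = 0.0;
    for (std::int16_t sample : frame) {
        const double s = static_cast<double>(sample);
        sumOfSquares += s * s;
    }
    const double rms = std::sqrt(sumOfSquares / static_cast<double>(frame.size()));
    return rms > rmsThreshold;
}
```

For example, with a threshold of 500.0 a frame of silence (all zeros) is rejected, while a frame whose samples all equal 1000 (RMS of 1000) is accepted.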

Shared Data Stream (SDS) - A single-producer, multi-consumer buffer that allows for the transport of data between a single writer and one or more readers. SDS performs two key tasks:

  1. Passes audio data between the audio front end (or Audio Signal Processor), the wake word engine, and the Alexa Communications Library (ACL) before the data is sent to AVS
  2. Passes data attachments sent by AVS to specific capability agents via the ACL

SDS uses a ring buffer on a product-specific (or user-specified) memory segment, allowing for interprocess communication. Keep in mind that the writer and reader(s) might be in different threads or processes.
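The single-writer, multi-reader idea can be sketched in-process as a ring buffer in which the writer keeps one monotonically increasing write index and each reader keeps its own read index. This is a simplified, single-threaded illustration; the class `SimpleSharedStream` and its methods are hypothetical, and the SDK's own SDS implementation has a different API and adds the synchronization and shared-memory support described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified single-writer/multi-reader ring buffer.
// Not thread-safe; a real SDS also synchronizes access and can live in shared memory.
class SimpleSharedStream {
public:
    explicit SimpleSharedStream(std::size_t capacity)
        : buffer_(capacity), writeIndex_(0) {
        assert(capacity > 0);
    }

    // Registers a reader that starts at the current write position; returns its id.
    std::size_t addReader() {
        readIndices_.push_back(writeIndex_);
        return readIndices_.size() - 1;
    }

    // The single writer appends samples, overwriting the oldest data when full.
    void write(const std::vector<std::int16_t>& samples) {
        for (std::int16_t s : samples) {
            buffer_[writeIndex_ % buffer_.size()] = s;
            ++writeIndex_;
        }
    }

    // Each reader consumes independently from its own cursor. If the writer has
    // lapped the reader, the reader skips ahead to the oldest sample still held.
    std::vector<std::int16_t> read(std::size_t readerId, std::size_t maxSamples) {
        assert(readerId < readIndices_.size());
        std::uint64_t& readIndex = readIndices_[readerId];
        if (writeIndex_ - readIndex > buffer_.size()) {
            readIndex = writeIndex_ - buffer_.size();  // overrun: lost data is skipped
        }
        std::vector<std::int16_t> out;
        while (readIndex < writeIndex_ && out.size() < maxSamples) {
            out.push_back(buffer_[readIndex % buffer_.size()]);
            ++readIndex;
        }
        return out;
    }

private:
    std::vector<std::int16_t> buffer_;
    std::uint64_t writeIndex_;                // total samples ever written
    std::vector<std::uint64_t> readIndices_;  // total samples consumed, per reader
};
```

In this sketch the writer never waits for slow readers, so a reader that falls more than `capacity` samples behind loses the oldest data; a design that blocks the writer instead would be equally valid and trades latency for completeness.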

Wake Word Engine (WWE) - Software that spots wake words in an input stream. Two binary interfaces make up the WWE. The first handles wake word spotting (or detection), and the second handles specific wake word models (in this case "Alexa"). Depending on your implementation, the WWE might run on the system on a chip (SOC) or dedicated chip, like a digital signal processor (DSP).

Audio Input Processor (AIP) - Handles audio input that is sent to AVS via the ACL. Input sources include on-device microphones, remote microphones, and other audio inputs.

The AIP also includes the logic to switch between different audio input sources. AVS can receive audio from only one input source at a time.
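The one-source-at-a-time rule can be sketched as a selector that keeps a registry of known sources and a single active slot, so activating one source implicitly deactivates the previous one. The class `AudioInputSelector`, its methods, and the source names used with it are hypothetical and do not reflect the SDK's actual Audio Input Processor interface.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical sketch, not the SDK's AIP API: tracks registered audio input
// sources and enforces that at most one of them is active at any time.
class AudioInputSelector {
public:
    void registerSource(const std::string& name) {
        assert(!name.empty());
        sources_.insert(name);
    }

    // Makes `name` the single active source, replacing any previously active
    // one. Returns false (and changes nothing) for an unregistered source.
    bool switchTo(const std::string& name) {
        if (sources_.count(name) == 0) {
            return false;
        }
        active_ = name;
        return true;
    }

    // An empty string means no source is currently selected.
    const std::string& activeSource() const { return active_; }

private:
    std::set<std::string> sources_;
    std::string active_;
};
```

Because there is only one `active_` slot, the invariant "one input source at a time" holds by construction rather than by checks scattered across callers.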

Alexa Communications Library (ACL) - Serves as the main communications channel between a client and AVS. The ACL performs two key functions:

  1. Establishes and maintains long-lived persistent connections with AVS. The ACL adheres to the messaging specification detailed in Managing an HTTP/2 Connection with AVS.
  2. Provides message sending and receiving capabilities, including support for JSON-formatted text and binary audio content. For more information, see Structuring an HTTP/2 Request to AVS.
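To make the JSON-formatted part of a message concrete, the sketch below wraps a payload in the envelope that AVS events use: an `event` object containing a `header` (with `namespace`, `name`, and `messageId`) and a `payload`. The helper `buildEventJson` is hypothetical; it omits fields that some events carry, such as `context` or a `dialogRequestId`, and it assumes the header values need no JSON escaping.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical helper, not an SDK API: wraps an already-serialized JSON payload
// in the AVS event envelope (event -> header {namespace, name, messageId} + payload).
// For brevity it rejects header values that would need JSON escaping.
std::string buildEventJson(const std::string& nameSpace,
                           const std::string& name,
                           const std::string& messageId,
                           const std::string& payloadJson = "{}") {
    for (const std::string* field : {&nameSpace, &name, &messageId}) {
        assert(field->find_first_of("\"\\") == std::string::npos);
    }
    return "{\"event\":{\"header\":{\"namespace\":\"" + nameSpace +
           "\",\"name\":\"" + name +
           "\",\"messageId\":\"" + messageId +
           "\"},\"payload\":" + payloadJson + "}}";
}
```

For example, `buildEventJson("SpeechSynthesizer", "SpeechStarted", "msg-1", "{\"token\":\"abc\"}")` produces `{"event":{"header":{"namespace":"SpeechSynthesizer","name":"SpeechStarted","messageId":"msg-1"},"payload":{"token":"abc"}}}`. In a real client, each `messageId` should be unique.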

Alexa Directive Sequencer Library (ADSL): Manages the order and sequence of directives from AVS, as detailed in the AVS Interaction Model. This component manages the lifecycle of each directive and instructs the appropriate Directive Handler (which might be a Capability Agent) to handle the message.
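The sketch below shows the basic idea of in-order dispatch under stated assumptions: directives are queued in arrival order, routed by namespace to a registered handler, and dropped if they carry a `dialogRequestId` from a dialog other than the current one. The types `Directive` and `SimpleDirectiveSequencer` are hypothetical; a real sequencer also has to deal with concerns this sketch omits, such as directives that take time to complete, and this sketch simply drops directives that have no handler.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified directive model (not the SDK's types).
struct Directive {
    std::string nameSpace;        // e.g. "SpeechSynthesizer"
    std::string name;             // e.g. "Speak"
    std::string dialogRequestId;  // empty for directives not tied to a dialog
};

class SimpleDirectiveSequencer {
public:
    using Handler = std::function<void(const Directive&)>;

    void registerHandler(const std::string& nameSpace, Handler handler) {
        assert(handler != nullptr);
        handlers_[nameSpace] = std::move(handler);
    }

    // Starting a new dialog makes directives from older dialogs stale.
    void setDialogRequestId(const std::string& id) { currentDialogRequestId_ = id; }

    void onDirective(const Directive& directive) { queue_.push_back(directive); }

    // Dispatches queued directives strictly in arrival order. Stale or
    // unhandled directives are dropped. Returns the number dispatched.
    int processAll() {
        int dispatched = 0;
        while (!queue_.empty()) {
            Directive d = std::move(queue_.front());
            queue_.pop_front();
            const bool stale = !d.dialogRequestId.empty() &&
                               d.dialogRequestId != currentDialogRequestId_;
            auto it = handlers_.find(d.nameSpace);
            if (stale || it == handlers_.end()) {
                continue;
            }
            it->second(d);
            ++dispatched;
        }
        return dispatched;
    }

private:
    std::map<std::string, Handler> handlers_;
    std::deque<Directive> queue_;
    std::string currentDialogRequestId_;
};
```

Keeping a single FIFO queue is what guarantees ordering here: handlers never see a directive before every earlier, still-valid directive has been dispatched.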

Activity Focus Manager Library (AFML): Provides centralized management of audiovisual focus for the device. Focus uses channels, as detailed in the AVS Interaction Model, to govern the prioritization of audiovisual inputs and outputs.

Channels can be either in the foreground or in the background. At any given time, only one channel can be in the foreground and have focus. If more than one channel is active, you need to respect the following priority order: Dialog > Alerts > Content. When the channel in the foreground becomes inactive, the next active channel in the priority order moves into the foreground.

Focus management isn't specific to Capability Agents or Directive Handlers, and non-Alexa-related agents can also use it. This allows all agents using the AFML to have a consistent focus across a device.
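The priority rule can be sketched with an ordered map from channel to owning agent: because the map is sorted by priority, its first entry is always the foreground channel, and releasing that channel automatically promotes the next active one. `Channel`, `SimpleFocusManager`, and the agent names used with them are hypothetical; the SDK's own focus manager also notifies agents when their focus changes, which this sketch leaves out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch of channel-based focus (not the SDK's FocusManager API).
// Lower value = higher priority, giving Dialog > Alerts > Content.
enum class Channel { Dialog = 0, Alerts = 1, Content = 2 };

class SimpleFocusManager {
public:
    // An agent (Alexa-related or not) takes ownership of a channel.
    void acquire(Channel channel, const std::string& agent) {
        assert(!agent.empty());
        active_[channel] = agent;
    }

    void release(Channel channel) { active_.erase(channel); }

    // std::map is ordered by key, so the first entry is the highest-priority
    // active channel, i.e. the one in the foreground. Returns an empty string
    // when no channel is active.
    std::string foregroundAgent() const {
        return active_.empty() ? std::string() : active_.begin()->second;
    }

private:
    std::map<Channel, std::string> active_;  // active channel -> owning agent
};
```

For example, if a hypothetical non-Alexa `LocalMediaPlayer` holds Content and `SpeechSynthesizer` then acquires Dialog, `foregroundAgent()` returns `"SpeechSynthesizer"`; once Dialog (and any active Alerts channel) is released, `LocalMediaPlayer` returns to the foreground.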

Capability Agents: Handle Alexa-driven interactions, specifically directives and events. Each capability agent corresponds to a specific interface exposed by the AVS API. These interfaces include:

Security Best Practices

All Alexa products should adopt the Security Best Practices for Alexa. When building Alexa with the SDK, you should adhere to the following security principles.

Important Considerations

Release Notes and Known Issues

Note: Feature enhancements, updates, and resolved issues from previous releases are available to view in "".

v1.16.0 released 10/25/2019:


Bug Fixes

Known Issues

AlexaClientSDK 1.16.0 - Copyright 2016-2019, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0