SpeechRecognizer Interface¶
The SpeechRecognizer interface is one of the key required interfaces in the Alexa experience. Integrate with the SpeechRecognizer AASB message interface (and the additional required interfaces described in the following sections) to allow the user to invoke Alexa.
Provide audio data to the Engine¶
At Engine startup time, the SpeechRecognizer component in the Engine opens an audio input channel of type VOICE for your application to provide the user speech to Alexa. Your application subscribes to the AudioInput.StartAudioInput and AudioInput.StopAudioInput messages as outlined in the AudioInput interface documentation. When the Engine expects to receive audio from your application, the Engine publishes a StartAudioInput message with audioType set to VOICE. Your application provides the voice audio input until the Engine publishes the StopAudioInput message for the same audio type.
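For illustration, a StartAudioInput message for voice audio might look like the following sketch. The header fields shown (version, messageType, id, messageDescription) follow the standard AASB message envelope; the version, id, and streamId values are placeholders, and the complete payload schema is defined in the AudioInput interface documentation.

{
    "header": {
        "version": "<AASB version>",
        "messageType": "Publish",
        "id": "<message-uuid>",
        "messageDescription": {
            "topic": "AudioInput",
            "action": "StartAudioInput"
        }
    },
    "payload": {
        "audioType": "VOICE",
        "streamId": "<stream-uuid>"
    }
}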
The user decides when to speak to Alexa by invoking her with a tap-to-talk GUI button press, a push-to-talk physical button press, or—in vehicles supporting voice-initiated listening—an "Alexa" utterance.
Invoke Alexa with tap-and-release¶
For button press-and-release Alexa invocation, your application publishes the SpeechRecognizer.StartCapture message with initiator set to TAP_TO_TALK to tell the Engine that the user pressed the Alexa invocation button and wants to speak to Alexa. When requested, your application provides audio to the Engine until Alexa detects the end of the user's speech. The Engine publishes a SpeechRecognizer.EndOfSpeechDetected message to your application and requests your application to stop providing audio if no other Engine components require it.
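For example, a tap-to-talk StartCapture message published by your application might look like this sketch, assuming the standard AASB envelope (the version and id values are placeholders):

{
    "header": {
        "version": "<AASB version>",
        "messageType": "Publish",
        "id": "<message-uuid>",
        "messageDescription": {
            "topic": "SpeechRecognizer",
            "action": "StartCapture"
        }
    },
    "payload": {
        "initiator": "TAP_TO_TALK"
    }
}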
Invoke Alexa with press-and-hold¶
For button press-and-hold Alexa invocation, your application publishes the SpeechRecognizer.StartCapture message with initiator set to HOLD_TO_TALK to tell the Engine that the user is holding down the Alexa invocation button and wants to speak to Alexa until releasing the button. When requested, the application provides audio to the Engine. When the user finishes speaking and releases the button, your application notifies the Engine by publishing the SpeechRecognizer.StopCapture message, and the Engine requests your application to stop providing audio if no other Engine components require it.
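The StartCapture message for press-and-hold matches the tap-to-talk sketch above with initiator set to HOLD_TO_TALK instead. When the user releases the button, your application publishes a StopCapture message, sketched below assuming an empty payload (consult the SpeechRecognizer message reference for the exact schema):

{
    "header": {
        "version": "<AASB version>",
        "messageType": "Publish",
        "id": "<message-uuid>",
        "messageDescription": {
            "topic": "SpeechRecognizer",
            "action": "StopCapture"
        }
    },
    "payload": {}
}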
Invoke Alexa with voice using Amazonlite wake word engine¶
Note: To use the Amazonlite wake word engine in your application, contact your Amazon Solutions Architect or partner manager.
When the application uses the Amazonlite Auto SDK module for wake word detection, your application notifies the Engine when the user has hands-free listening enabled (i.e., privacy mode is off) by publishing a PropertyManager.SetProperty message with property set to aace.alexa.wakewordEnabled and value set to true. The Engine enables Amazonlite wake word detection and requests audio input from your application. Your application provides audio to the Engine for continuous wake word detection until your application disables hands-free listening by setting the aace.alexa.wakewordEnabled property to false. After disabling Amazonlite wake word detection, the Engine requests your application to stop providing audio if no other Engine components require it.
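For illustration, enabling hands-free listening might look like the following SetProperty sketch. The payload field names shown here (name, value) are assumptions based on the PropertyManager API; check the PropertyManager interface documentation for the exact schema. Note that property values are passed as strings:

{
    "header": {
        "version": "<AASB version>",
        "messageType": "Publish",
        "id": "<message-uuid>",
        "messageDescription": {
            "topic": "PropertyManager",
            "action": "SetProperty"
        }
    },
    "payload": {
        "name": "aace.alexa.wakewordEnabled",
        "value": "true"
    }
}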
When Amazonlite detects the "Alexa" wake word in the continuous audio stream provided by your application, the Engine publishes the SpeechRecognizer.WakewordDetected message and starts an interaction similar to one triggered by tap-to-talk invocation. When Alexa detects the end of the user's speech, the Engine publishes the SpeechRecognizer.EndOfSpeechDetected message but keeps the audio input stream open for further wake word detection.
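A WakewordDetected message published by the Engine might look like the following sketch; the wakeword payload field is an assumption for illustration (see the SpeechRecognizer message reference for the exact schema):

{
    "header": {
        "version": "<AASB version>",
        "messageType": "Publish",
        "id": "<message-uuid>",
        "messageDescription": {
            "topic": "SpeechRecognizer",
            "action": "WakewordDetected"
        }
    },
    "payload": {
        "wakeword": "alexa"
    }
}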
Reduce data usage with audio encoding¶
To save bandwidth when the Engine sends user speech to Alexa in SpeechRecognizer.Recognize events, you can configure the Engine to encode the audio with the Opus audio encoding format by adding the following object to your Engine configuration:
{
    "aace.alexa": {
        "speechRecognizer": {
            "encoder": {
                "name": "opus"
            }
        }
    }
}
With this configuration, the Engine compresses the captured audio prior to sending it to Alexa in the Recognize event.
Generate the configuration programmatically with the C++ factory function
If your application generates Engine configuration programmatically instead of using a JSON file, you can use the aace::alexa::config::AlexaConfiguration::createSpeechRecognizerConfig factory function to create the EngineConfiguration object.
#include <AACE/Alexa/AlexaConfiguration.h>
#include <AACE/Core/EngineConfiguration.h>

// Collect the EngineConfiguration objects to pass to the Engine
std::vector<std::shared_ptr<aace::core::config::EngineConfiguration>> configurations;
// Create the speech recognizer configuration with Opus audio encoding
auto speechRecognizerConfig = aace::alexa::config::AlexaConfiguration::createSpeechRecognizerConfig("opus");
configurations.push_back(speechRecognizerConfig);
// ... create other EngineConfiguration objects and add them to configurations...
m_engine->configure(configurations);