Answering Machine Detection (AMD) is a feature that detects whether a call was answered by a human or by a machine. The analysis has two main outcomes: a determination that a machine answered (answering machine, fax, or SIT tones) or a determination that a human answered. There are three common use cases:
AMD can be enabled on calls created via a POST to the /Calls API or to the /Participants API, and on <Dial><Number> and <Dial><Sip> calls.
Elastic SIP Trunking calls cannot use AMD because they bypass the Programmable Voice infrastructure. <Dial><Client>, <Dial><Conference>, and <Dial><Queue> calls cannot use AMD because the destination is a Twilio internal service.
There are two aspects of AMD that are important to understand. The first is tone detection: the ability to detect machine beeps, fax tones, busy tones, in-band DTMF, and so on.
The second is voice activity detection, or human speech detection: the ability to separate human speech from background noise such as barking dogs. AMD analyzes timing, pattern, and frequency information to make its determination.
Twilio's AMD analyzes the inbound media leg returned from the called party. Depending on the configuration parameters provided, it either returns the results of its analysis or notifies your application when a voicemail beep has been detected.
With default settings, we see accurate detection for US destinations more than 90% of the time. There are many caveats, however, especially since Twilio provides multiple parameters that control detection performance.
DetectMessageEnd accuracy is close to 100%, because the tones emitted by voicemail boxes and answering machines are distinctly different from the frequency and amplitude patterns of human speech. However, it is possible to set the timeout too short. In that case detection never happens, because the AMD engine stops listening before the beep.
As with accuracy, there is a big caveat: AMD can be configured in ways that slow detection down. On average, Twilio's AMD returns results within about 4 seconds of the call being answered. Once the determination has been made, Twilio makes a request to the provided webhook with the AnsweredBy results. This network transit to your server, and your application's response, are the most common culprits when investigating "slow" AMD times. Best practice is to specify the edge for webhook egress to eliminate as much latency as possible. If you can run the application in the same AWS region that your webhooks egress from, you can eliminate significant latency.
There is a trade-off between speed and accuracy. If you set a timeout to an extremely short duration, the system may not have enough time to gather sufficient data to make a determination, which hurts accuracy.
Twilio's AMD is tuned for accuracy over speed. Even with the parameters provided, however, it can return a false positive or false negative. An occasional failure is unavoidable and to be expected, but you can track AMD detection results in Voice Insights Call Summary records via the Annotation API. Doing so may help you identify commonalities among inaccurate detection results. For example, you may discover that most of your false positives are associated with a single destination carrier in a specific region. You may also find that a handful of agents in a contact center are disproportionately represented in the inaccurate results, which would indicate a process adherence or data integrity problem. Once you have identified what the failed detections have in common, you can A/B test different configurations based on those commonalities. For example, if you know that calls to Comcast landlines in the US need a slightly longer timeout than calls to other landline carriers, you can adjust the parameters whenever you know the destination is a Comcast landline.
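A minimal sketch of that analysis in Python follows. It assumes you have exported your own outcome records, pairing each call's AnsweredBy value with the true outcome your agents confirmed afterward. The record fields `carrier`, `answered_by`, and `actual` are hypothetical names for illustration, not a Voice Insights schema.

```python
from collections import defaultdict


def coarse_label(answered_by):
    """Collapse Twilio's detailed AnsweredBy values into human/machine/unknown."""
    if answered_by in ("human", "unknown"):
        return answered_by
    # machine_start, machine_end_beep, machine_end_silence, machine_end_other, fax
    return "machine"


def error_rates_by(records, key):
    """Group AMD outcomes by `key` and return {group: (errors, total, rate)}.

    Each record is a dict with hypothetical fields:
      answered_by: the AnsweredBy value Twilio returned
      actual:      "human" or "machine", as confirmed after the fact
    An "unknown" result never matches `actual`, so it counts as an error here.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        if coarse_label(rec["answered_by"]) != rec["actual"]:
            errors[group] += 1
    return {g: (errors[g], totals[g], errors[g] / totals[g]) for g in totals}


records = [
    {"carrier": "CarrierA", "answered_by": "human", "actual": "human"},
    {"carrier": "CarrierA", "answered_by": "machine_start", "actual": "human"},
    {"carrier": "CarrierB", "answered_by": "machine_end_beep", "actual": "machine"},
    {"carrier": "CarrierB", "answered_by": "fax", "actual": "machine"},
]
print(error_rates_by(records, "carrier"))
```

To look for agent-level patterns instead, pass a different key, such as a hypothetical `agent` field.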
MachineDetection=Enable is useful for use cases that aim to reduce agent idle time. Twilio returns AnsweredBy as soon as a determination is made.
MachineDetection=DetectMessageEnd is geared toward the "leave a message in the voicemail box" use case. If a machine is detected, Twilio waits until it hears a beep before returning AnsweredBy.
By default, when you use the /Calls API, AMD completes before Twilio fetches TwiML instructions. The /Calls API also provides an AsyncAmd boolean that lets the TwiML instructions execute and the call progress normally while AMD runs in the background.
AMD on the /Participants API and on <Dial><Number> or <Dial><Sip> is always asynchronous and cannot be configured to behave otherwise.
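As a sketch, these are the form fields for a POST to the /Calls API that runs AMD in the background. MachineDetection, AsyncAmd, and AsyncAmdStatusCallback are Twilio's documented REST parameter names. AsyncAmdStatusCallback is the webhook that receives AnsweredBy when AMD runs asynchronously. The phone numbers and URLs are placeholders.

```python
from urllib.parse import urlencode


def async_amd_call_fields(to, from_, twiml_url, amd_callback_url):
    """Form fields for POST /2010-04-01/Accounts/{AccountSid}/Calls.json.

    With AsyncAmd=true, Twilio fetches twiml_url and starts executing it right
    away, while AMD runs in the background and posts AnsweredBy to
    amd_callback_url.
    """
    return {
        "To": to,
        "From": from_,
        "Url": twiml_url,
        "MachineDetection": "Enable",
        "AsyncAmd": "true",
        "AsyncAmdStatusCallback": amd_callback_url,
    }


fields = async_amd_call_fields(
    "+15555550100",                   # placeholder destination
    "+15555550101",                   # placeholder caller ID
    "https://example.com/twiml",      # placeholder TwiML URL
    "https://example.com/amd-result", # placeholder AMD callback
)
body = urlencode(fields)  # application/x-www-form-urlencoded request body
```

With the twilio Python helper library, the same request should correspond to `client.calls.create(...)` with `machine_detection="Enable"`, `async_amd="true"`, and `async_amd_status_callback=...`.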
Humans answering phones as individuals, either at residences or on their mobiles; e.g. "Hello?" or "Hi, this is Michael." These greetings are typically short, under 1800 ms.
Businesses answer phones like "Thanks for calling Duct Tape Warehouse. This is Howard." These greetings are typically longer, roughly 1800-3000 ms.
Answering machines answer with longer messages, commonly punctuated with beeps whose audio frequencies fall outside the normal speech range; e.g. "Hi, you've reached the Fletchers. We're not here right now, please leave a message after the beep. [BEEP]" These greetings are typically longer than 3000 ms.
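To make those duration bands concrete, here is a toy Python classifier that maps a measured greeting length to the three categories above. It only illustrates the typical durations. It is not Twilio's AMD algorithm, which also weighs tones, timing patterns, and frequency.

```python
def greeting_category(duration_ms: float) -> str:
    """Bucket a greeting by length using the typical ranges described above."""
    if duration_ms < 1800:
        return "individual"  # "Hello?" / "Hi, this is Michael."
    if duration_ms <= 3000:
        return "business"    # "Thanks for calling Duct Tape Warehouse..."
    return "machine"         # long greeting, usually ending in a beep


for ms in (900, 2400, 4500):
    print(ms, greeting_category(ms))
```

Real greetings overlap these bands, which is one reason Twilio exposes tunable thresholds rather than fixed cut-offs.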
Use Programmable Voice's call recording capabilities to capture recordings that start from ringing. You can then open the recording file in an audio editor such as Audacity or GarageBand and measure precise timings. Useful measurements include the gap between answer and the first audio, how long the initial utterance lasts, and how long the silence lasts before the AMD determination is made. Use those values to adjust performance, but see our warning about hypertuning performance to a single destination below.
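If you would rather measure those timings programmatically than by eye, a simple energy-based approach is one possible starting point. This Python sketch makes several assumptions. The recording must be a mono, 16-bit PCM WAV. You supply the answer time (`answer_ms`) from your own call data. The 20 ms frame size and the RMS threshold are illustrative values you would tune to your recordings. This is a generic technique, not how Twilio's engine segments audio.

```python
import array
import math
import sys
import wave

FRAME_MS = 20  # analysis window length (assumption, not a Twilio value)


def frame_rms(samples, rate, frame_ms=FRAME_MS):
    """Root-mean-square energy of each consecutive full frame of 16-bit samples."""
    size = max(1, rate * frame_ms // 1000)
    rms = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        rms.append(math.sqrt(sum(s * s for s in chunk) / size))
    return rms


def greeting_timings(samples, rate, answer_ms=0, threshold=500.0, frame_ms=FRAME_MS):
    """Return (gap_ms, utterance_ms, silence_ms) measured from answer_ms.

    gap_ms:       answer until the first frame at or above threshold
    utterance_ms: length of the first continuous run of loud frames
    silence_ms:   quiet time after that run until the next loud frame (or the end)
    Returns None if no frame after the answer crosses the threshold.
    """
    loud = [r >= threshold for r in frame_rms(samples, rate, frame_ms)]
    first = answer_ms // frame_ms  # skip the ringing before the answer
    onset = next((i for i in range(first, len(loud)) if loud[i]), None)
    if onset is None:
        return None
    end = onset
    while end < len(loud) and loud[end]:
        end += 1
    resume = next((i for i in range(end, len(loud)) if loud[i]), len(loud))
    return ((onset - first) * frame_ms, (end - onset) * frame_ms, (resume - end) * frame_ms)


def load_mono_pcm16(path):
    """Read a mono 16-bit PCM WAV file into (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        data = array.array("h", wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            data.byteswap()  # WAV samples are little-endian
        return data, wav.getframerate()
```

Typical usage would be `samples, rate = load_mono_pcm16("recording.wav")` followed by `greeting_timings(samples, rate, answer_ms=...)`. The file name is a placeholder.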
It's not possible to eliminate unknown results entirely. An unknown result is a call where the thresholds and timeouts provided to the AMD algorithm did not give the engine enough information to make a decision. This is most commonly caused by people or machines answering with silence that lasts longer than the configured speech end or machine detection timeouts. The more aggressively you tune for faster responses, the more AnsweredBy = unknown results you will receive.
MachineDetectionTimeout is only relevant for MachineDetection=DetectMessageEnd, and shortening this value does not speed up detection for MachineDetection=Enable.
For MachineDetection=DetectMessageEnd, our default MachineDetectionTimeout is 30 seconds. Almost all of the changes we see to this option decrease the value rather than increase it. If you are only trying to land messages in residential voicemail boxes, 30 seconds is probably sufficient in most cases. If you are trying to land messages in business voicemail boxes, 30 seconds is frequently not enough time. Also, some residences have bizarrely long greetings, so expect some outliers.
Use MachineDetection=DetectMessageEnd and make sure you provide ample time for the beep to occur, since some people have very long answering machine greetings. Then return TwiML that uses <Play> (or <Say>) to deliver the message.
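Here is a minimal Python sketch of that flow. It assumes synchronous AMD on the /Calls API: Twilio requests your TwiML URL only after detection finishes and includes AnsweredBy in that request. The field names and the machine_end_* AnsweredBy values are Twilio's. The message URL and the choice to hang up on anything that is not a finished greeting are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# AnsweredBy values that mean the machine greeting has finished.
MESSAGE_END = {"machine_end_beep", "machine_end_silence", "machine_end_other"}


def voicemail_call_fields(to, from_, twiml_url, timeout_s=30):
    """Form fields for POST .../Calls.json that wait for the end of the greeting."""
    return {
        "To": to,
        "From": from_,
        "Url": twiml_url,
        "MachineDetection": "DetectMessageEnd",
        # Leave ample time for long greetings; 30 s is the default.
        "MachineDetectionTimeout": str(timeout_s),
    }


def twiml_for(answered_by, message_url):
    """TwiML to return from twiml_url once AMD has finished."""
    response = ET.Element("Response")
    if answered_by in MESSAGE_END:
        ET.SubElement(response, "Play").text = message_url
    else:
        # Placeholder: handle human, fax, and unknown as your use case requires.
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


print(twiml_for("machine_end_beep", "https://example.com/message.mp3"))
```

Building the XML with ElementTree rather than string concatenation keeps any special characters in the message URL properly escaped.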
Use MachineDetection=Enable. If you are calling individuals, residences, or mobile phones, customers have had good results setting MachineDetectionSpeechThreshold to 1800-2000 ms and MachineDetectionSpeechEndThreshold to 1400-1500 ms.
If you are calling businesses, set MachineDetectionSpeechThreshold somewhere between 1800 and 3000 ms, with MachineDetectionSpeechEndThreshold set to 1400-1500 ms.
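These recommendations can be captured in a small Python helper. This sketch encodes only the ranges quoted above, in milliseconds. It defaults to the low end of each range and rejects overrides that fall outside it. The destination labels "residential" (individuals, residences, and mobiles) and "business" are hypothetical names for your own classification.

```python
# Ranges quoted above, in milliseconds: (low, high).
SPEECH_THRESHOLD = {"residential": (1800, 2000), "business": (1800, 3000)}
SPEECH_END_THRESHOLD = (1400, 1500)


def enable_amd_fields(destination, speech_ms=None, speech_end_ms=None):
    """MachineDetection=Enable tuning fields to merge into a /Calls request."""
    low, high = SPEECH_THRESHOLD[destination]
    end_low, end_high = SPEECH_END_THRESHOLD
    speech = low if speech_ms is None else speech_ms
    speech_end = end_low if speech_end_ms is None else speech_end_ms
    if not low <= speech <= high:
        raise ValueError(f"speech threshold {speech} outside {low}-{high} ms")
    if not end_low <= speech_end <= end_high:
        raise ValueError(
            f"speech end threshold {speech_end} outside {end_low}-{end_high} ms"
        )
    return {
        "MachineDetection": "Enable",
        "MachineDetectionSpeechThreshold": str(speech),
        "MachineDetectionSpeechEndThreshold": str(speech_end),
    }


print(enable_amd_fields("business", speech_ms=2500))
```

Keeping the ranges in one table makes it easier to A/B test a different value for one segment of destinations without touching the others.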
To mitigate the impact of false positives, your default behavior should be to leave a message. Configure your application with the same MachineDetection=Enable parameters above, but add a handler that listens for the AnsweredBy parameter. If your application receives AnsweredBy = human, modify the call via the API to point to new TwiML instructions that connect the called party to a waiting agent.
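Here is a minimal Python sketch of the handler side. It assumes the call was created with AsyncAmd=true, so the leave-a-message TwiML starts immediately and AnsweredBy arrives at your AsyncAmdStatusCallback. The agent identity `agent_1` and the <Dial><Client> target are hypothetical; use whatever TwiML reaches your waiting agent. `Twiml` is Twilio's documented field for modifying a live call.

```python
import xml.etree.ElementTree as ET


def agent_twiml(agent_identity):
    """TwiML that bridges the called party to a waiting agent (hypothetical target)."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    ET.SubElement(dial, "Client").text = agent_identity
    return ET.tostring(response, encoding="unicode")


def handle_amd_callback(params, agent_identity="agent_1"):
    """Decide what to do with an async AMD callback.

    params: the form fields Twilio posted (CallSid, AnsweredBy, ...).
    Returns (call_sid, update_fields) when the call should be redirected to an
    agent. Returns None otherwise, so the default leave-a-message TwiML keeps
    running.
    """
    if params.get("AnsweredBy") != "human":
        return None
    return params["CallSid"], {"Twiml": agent_twiml(agent_identity)}


print(handle_amd_callback({"CallSid": "CA123", "AnsweredBy": "human"}))
```

With the twilio Python helper library, you would then apply the redirect with `client.calls(call_sid).update(twiml=update_fields["Twiml"])`.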