
Voice Modems

Why This Matters

Voice modems highlight the legacy of early cellular and PC architectures, where call audio was managed independently from the main device, impacting modern smartphone design and functionality. Understanding this history explains current challenges in call recording and audio integration, emphasizing the importance of evolving hardware and software integration in the tech industry. This insight underscores the ongoing need for innovation to improve user experience and device interoperability.

Key Takeaways

If you've done much with modern cellphones, you've probably noticed just how odd the architecture can be around audio. Specifically, I mean call audio: modern smartphones have made call audio less of a special case (mostly by just becoming more complicated in general), but in older phones you would often find arrangements where the cellular modem had direct analog audio to the microphone and speaker, perhaps via some switching to share amplifiers. That design meant that the cellular modem functioned basically as a completely independent device, a fully-capable "cellular phone" with the ability to make and receive voice calls. The role of the rest of the smartphone, and its operating system, was just to provide control messages for starting and ending calls.

In modern phones the audio path to and from the modem is digital and it's more integrated into the operating system audio service, but still not fully. You might have noticed, for example, that it is excessively difficult to record call audio on most phones. Regulatory and liability pressures are one reason for this, but another is that it's actually kind of difficult: there may not be any physical way for software running on the main processor to receive audio from the cellular modem. The designer has to put in explicit effort to make that work, effort that only became common more recently to facilitate automatic transcription—and VoLTE, a whole complication that I will simply ignore for the sake of a cleaner historical narrative. You come here to read about old phones, not new ones.

You've probably read enough of my writing to know where this is going: the design of cellular radios, which assume call audio to be part of Their Exclusive Domain, is a legacy of an age-old architectural decision traceable to the original Hayes Smartmodem. It relates to a feature of modems that was widely available, but sparsely used, for much of the PC revolution. The details are odd!

First, for context, let's recede into our mind palaces and travel back to the 1980s. AT&T-designed modems like the Bell 103 had created a standardized family of protocols for data over voice lines, and a company called Hayes introduced a Bell 103-like implementation called the Smartmodem. The Smartmodem was quite successful on its own, but it was more significant for having introduced a common control interface between the modem and the computer. Previous modems had acted as transparent devices that expected Something Else to perform call setup tasks, while the Hayes Smartmodem could pick up the line and dial all on its own. That required that the computer send commands to the modem to configure and start a call.

Hayes designed a simple scheme for sending commands to the modem and switching it in and out of transparent data mode, and that protocol was then widely copied by other modem manufacturers. You could call it the "Hayes command set," and older documents often do, but these days it's more commonly known by the two characters that prefix most commands: the AT protocol.
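To make the mode switching concrete, here is a toy model in Python. It is a sketch under loud assumptions, not an emulator: the `ToyModem` class and its `feed` method are invented for illustration, only a handful of classic commands (`ATD`, `ATO`, `ATH`) are handled, and the guard-time silence that real modems require around the `+++` escape is ignored.

```python
class ToyModem:
    """Tiny model of Hayes-style mode switching (hypothetical, illustration only)."""

    ESCAPE = b"+++"

    def __init__(self):
        self.mode = "command"    # either "command" or "data"
        self.connected = False
        self.line_out = []       # chunks passed through to the "phone line"

    def feed(self, chunk):
        """Take one chunk from the computer and return the modem's reply."""
        if self.mode == "data":
            # Real modems also insist on silence before and after the escape.
            if chunk == self.ESCAPE:
                self.mode = "command"
                return b"\r\nOK\r\n"
            self.line_out.append(chunk)   # transparent: passed on untouched
            return b""
        return self._command(chunk)

    def _command(self, chunk):
        line = chunk.decode("ascii", errors="replace").strip().upper()
        if not line.startswith("AT"):
            return b""                    # anything without the prefix is ignored
        body = line[2:]
        if body == "":
            return b"\r\nOK\r\n"          # a bare AT just checks the modem is there
        if body.startswith("D"):          # ATD...: dial, then go transparent
            self.connected = True
            self.mode = "data"
            return b"\r\nCONNECT\r\n"
        if body == "O" and self.connected:  # ATO: return to the call in progress
            self.mode = "data"
            return b"\r\nCONNECT\r\n"
        if body == "H":                   # ATH: hang up
            self.connected = False
            return b"\r\nOK\r\n"
        return b"\r\nERROR\r\n"


modem = ToyModem()
modem.feed(b"AT\r")              # command mode: answered by the modem
modem.feed(b"ATDT5551234\r")     # dials and switches to data mode
modem.feed(b"ATH\r")             # data mode: this is just payload now
modem.feed(b"+++")               # escape back to command mode
modem.feed(b"ATH\r")             # now it really is a hang-up command
```

The point of the demo at the bottom is that the same bytes, `ATH`, mean two different things depending on the mode: payload while the call is transparent, a command once the escape has been sent.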

From its origin in 1981, AT has shown remarkable staying power. Virtually all computer-connected modems, to this very day, continue to use AT commands for basic configuration. Likewise, the basic architecture of the Smartmodem persists: the Smartmodem connected to the host computer using a single RS-232 link that switched between carrying control messages and data. The very latest 5G modems still work the same way, complicated by the addition of multiple separate UART serial channels (so that, for example, control commands, data, and GNSS data can each have their own separate channel) and the adoption of the USB communications device class "Abstract Control Model," a standard UART-over-USB implementation mostly intended to simplify modems. Plug a modern 5G modem into a Linux machine and you can easily observe this: virtually all cellular modems are USB-attached and will appear as a USB composite device with multiple serial adapters, usually attached as /dev/ttyACM* due to the USB-CDC ACM class.
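If you want to poke at this yourself, a minimal sketch in Python follows. It assumes a Linux machine, uses only the standard library, and the helper names `find_acm_ports` and `probe` are my own. Which of a modem's several ports actually answers AT commands varies by device, and you may need appropriate permissions (or to stop a service such as ModemManager) before a port will open.

```python
import glob
import os
import select
import termios
import tty


def find_acm_ports(dev_dir="/dev"):
    """Return the ttyACM* device paths under dev_dir, sorted by name."""
    return sorted(glob.glob(os.path.join(dev_dir, "ttyACM*")))


def probe(path, timeout=1.0):
    """Send a bare AT and return whatever the port answers before the timeout."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        tty.setraw(fd)                          # no echo or line editing in the tty layer
        termios.tcflush(fd, termios.TCIOFLUSH)  # drop any stale bytes
        os.write(fd, b"AT\r")
        reply = b""
        while b"OK" not in reply and b"ERROR" not in reply:
            ready, _, _ = select.select([fd], [], [], timeout)
            if not ready:
                break                           # nothing arrived in time
            chunk = os.read(fd, 256)
            if not chunk:
                break
            reply += chunk
        return reply
    finally:
        os.close(fd)


if __name__ == "__main__":
    for port in find_acm_ports():
        try:
            print(port, probe(port))
        except (OSError, termios.error) as exc:
            print(port, "unavailable:", exc)
```

A port that is a control channel should answer with a result code; the others may stay silent or refuse to open, which is why the loop treats errors as ordinary.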

Courtesy of the V.250 standard (a formalization of AT commands) and considerable effort by driver implementers, USB-attached modems "Just Work" as network interfaces on modern Linux—but under the hood, the kernel is communicating with the modem over separate serial interfaces. Back in the olden days, it was common to run PPP (point-to-point protocol) over one of the serial interfaces to use the actual data (bearer) channel, but now PPP has mostly given way to "Direct IP" where you just push packets over the serial link.

Just to complicate things a touch more, there are vendor-specific standards like QMI (Qualcomm) that completely replace AT and find use in modern smartphones, but they're messy with regard to Linux support. If you are personally interacting at this layer, messing with modems or writing communications software or whatever, you are almost certainly going to stick to AT commands. Modem vendors continue to build on AT. If you look at LTE modems made for IoT applications, for example, it's common for them to provide a complete HTTP implementation (and sometimes MQTT, and sometimes some kind of proprietary message broker protocol) accessible via AT commands. That means you can implement an IoT device without a network stack at all, deferring all network operations to the modem itself. With a JSON-over-HTTP backend, for example, you might send AT commands with JSON payloads over the serial control channel and then get JSON back. You never interact with the network at all; the modem is a completely self-contained system. At the extreme, you might implement your entire device using exclusively the modem. This is a common approach for telematics devices like GPS trackers: they consist of nothing but a cellular modem, the telemetry application is built for the modem using an SDK from its vendor, and you interact with it using AT commands. IoT-class modems frequently provide GPIO and user flash for just this purpose.
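The JSON-over-HTTP pattern can be sketched roughly like this in Python. Everything specific here is an assumption made for illustration: `AT+XHTTPPOST` is a made-up command name (each vendor defines its own HTTP commands, so consult the AT manual for your part), the framing of the request is invented, and `fake_modem` stands in for the serial port so the example runs without hardware.

```python
import json

# HYPOTHETICAL command name. Real modems use vendor-specific HTTP commands.
HTTP_POST_CMD = "AT+XHTTPPOST"


def build_post(url, payload):
    """Frame a JSON payload as one (invented) AT command plus body."""
    body = json.dumps(payload, separators=(",", ":"))
    header = f'{HTTP_POST_CMD}="{url}",{len(body.encode())}'
    return (header + "\r" + body).encode()


def parse_reply(raw):
    """Split a reply into (final result code, decoded JSON or None)."""
    lines = [ln.strip() for ln in raw.decode().split("\r\n") if ln.strip()]
    status = lines[-1] if lines else "ERROR"
    doc = None
    for ln in lines[:-1]:
        if ln.startswith("{"):
            doc = json.loads(ln)
    return status, doc


def fake_modem(request):
    """Stand-in for the serial link: pretends the backend echoed our JSON."""
    header, _, body = request.partition(b"\r")
    if not header.startswith(HTTP_POST_CMD.encode()):
        return b"\r\nERROR\r\n"
    echoed = json.dumps({"received": json.loads(body)})
    return b"\r\n" + echoed.encode() + b"\r\nOK\r\n"


request = build_post("http://example.invalid/telemetry", {"temp_c": 21.5})
status, doc = parse_reply(fake_modem(request))
print(status, doc)
```

Note what is absent: no sockets, no TLS, no HTTP library. The host side only formats a string and parses one back, which is the whole appeal for a small microcontroller.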

None of that is actually what this article is about, but I want to make clear how profound the implications of the Smartmodem heritage are. In 1981, the Smartmodem was a standalone device controlled over serial because the limitations of the era's computers made that a practical necessity. Processors weren't fast enough to run the modem DSP alongside other workloads, certification requirements for telephone-connected devices were stricter, etc. Despite the late-'90s detour into "winmodems," most of those constraints still exist, just in the different forms of the cellular network. Today's modems are less v.54 and more 5G, but they still act as standalone devices controlled over serial channels.

Most telephone modems of the 1980s were exclusively data modems. You could use AT commands to make a call, switch into data mode, and then you basically had a very long serial cable from your device to the computer on the other end of the call. That was all these modems did; their only interaction with "The Telephone System" besides as a pair of wires was for basic call control like detecting dial tone and sending DTMF dialing. That was quite natural considering their evolution from acoustic coupler modems (where you dialed the phone yourself and then set the handset on the modem), but by the late '80s, as devices like the Smartmodem with their own call control were common, it started to feel primitive. With Carterfone and the breakup of the AT&T monopoly, computers were starting to feel like first-class citizens on the telephone system. Shouldn't they have more complete support for, well, telephone things?
