
Apple trained an AI to recognize previously unseen hand gestures from wearable sensors

Why This Matters

Apple's EMBridge advances gesture recognition by enabling AI models to identify hand gestures they were never trained on, using EMG signals alone. That capability could make wearable devices and AR/VR systems more intuitive and versatile, and it reflects Apple's continued push to bring sophisticated AI into both consumer and medical applications.

Key Takeaways

In a new study, Apple taught an AI model to recognize hand gestures that weren't part of its original training dataset. Here are the details.

What is EMG?

Apple has published a new study on its Machine Learning Research blog, titled EMBridge: Enhancing Gesture Generalization from EMG Signals through Cross-Modal Representation Learning. The paper will be presented at ICLR 2026 in April.

In it, the researchers explain how they trained an AI model to recognize hand gestures, even when those specific hand gestures weren’t part of its original dataset.

To achieve this, they developed EMBridge, “a cross-modal representation learning framework that bridges the modality gap between EMG and pose.”
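The excerpt doesn't detail how EMBridge bridges the two modalities. One common way to align paired embeddings from two modalities (here, an EMG encoder and a pose encoder) is a contrastive InfoNCE-style loss; the sketch below is an assumption for illustration, not Apple's actual method, and the function name `info_nce` is hypothetical.

```python
import numpy as np

def info_nce(emg_emb, pose_emb, temperature=0.07):
    """Contrastive alignment of paired (emg, pose) embeddings.

    emg_emb, pose_emb: (batch, dim) arrays where row i of each
    comes from the same time window. Matched pairs should end up
    more similar than mismatched ones.
    """
    # L2-normalize each embedding so similarity is cosine similarity
    e = emg_emb / np.linalg.norm(emg_emb, axis=1, keepdims=True)
    p = pose_emb / np.linalg.norm(pose_emb, axis=1, keepdims=True)
    # Pairwise similarity matrix, scaled by temperature
    logits = (e @ p.T) / temperature
    # Cross-entropy with the matched pair (the diagonal) as target
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls each EMG embedding toward its paired pose embedding and pushes it away from the other poses in the batch, which is one standard recipe for closing a modality gap.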

EMG, or electromyography, measures the electrical activity generated by muscles during contraction. Its practical applications range from medical diagnosis and physical therapy to prosthetic limb control.

More recently, though the area itself isn't new, EMG has been explored more widely in wearables and AR/VR systems.

Meta’s Ray-Ban Display glasses, for instance, use EMG technology in the form of what Meta calls a Neural Band, a wrist-worn device that “interprets your muscle signals to navigate Meta Ray-Ban Display’s features,” per the company’s description.

In Apple's study, the researchers didn't collect the training EMG signals with a device of their own. Instead, they used two existing datasets:

emg2pose : “[…] a large-scale open-source EMG dataset containing 370 hours of sEMG and synchronized hand pose data across 193 consenting users, 29 different behavioral groups that include a diverse range of discrete and continuous hand motions such as making a fist or counting to five. The hand pose labels are generated using a high-resolution motion capture system. The full dataset contains over 80 million pose labels and is of similar scale to the largest computer vision equivalents. Each user completed four recording sessions per gesture category, each with a different EMG-band placement. Each session lasted 45–120 s, during which users repeatedly performed a mix of 3–5 similar gestures or unconstrained freeform movements. We use non-overlapping 2-second windows as input sequences. EMG is instance-normalized, band-pass filtered (2–250 Hz), and notch-filtered at 60 Hz.”
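The preprocessing pipeline quoted above (instance normalization, a 2–250 Hz band-pass, a 60 Hz notch filter, and non-overlapping 2-second windows) can be sketched with standard signal-processing tools. This is a minimal illustration of those stated steps, not Apple's code; the sample rate `fs` and the filter orders are placeholder assumptions not given in the excerpt.

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

def preprocess_emg(emg, fs=2000.0, win_s=2.0):
    """emg: (channels, samples) raw sEMG. fs is an assumed sample rate."""
    emg = emg.astype(np.float64)
    # Instance normalization: zero mean, unit variance per channel
    emg = (emg - emg.mean(axis=-1, keepdims=True)) / (
        emg.std(axis=-1, keepdims=True) + 1e-8
    )
    # Band-pass filter, 2-250 Hz (assumed 4th-order Butterworth, zero-phase)
    b, a = butter(4, [2.0, 250.0], btype="bandpass", fs=fs)
    emg = filtfilt(b, a, emg, axis=-1)
    # Notch filter at 60 Hz to suppress mains-line interference
    bn, an = iirnotch(60.0, Q=30.0, fs=fs)
    emg = filtfilt(bn, an, emg, axis=-1)
    # Split into non-overlapping 2-second windows:
    # result shape (num_windows, channels, win_samples)
    win = int(win_s * fs)
    n = emg.shape[-1] // win
    return np.stack([emg[:, i * win:(i + 1) * win] for i in range(n)])
```

Each 2-second window then becomes one input sequence for the model, matching the windowing described in the quote.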
