Tech News
← Back to articles

Mosh (Mobile Shell)

read original related products more articles

Remote-shell protocols traditionally work by conveying a byte-stream from the server to the client, to be interpreted by the client's terminal. (This includes TELNET, RLOGIN, and SSH.) Mosh works differently and at a different layer. With Mosh, the server and client both maintain a snapshot of the current screen state. The problem becomes one of state-synchronization: getting the client to the most recent server-side screen as efficiently as possible.

This is accomplished using a new protocol called the State Synchronization Protocol, for which Mosh is the first application. SSP runs over UDP, synchronizing the state of any object from one host to another. Datagrams are encrypted and authenticated using AES-128 in OCB3 mode. While SSP takes care of the networking protocol, it is the implementation of the object being synchronized that defines the ultimate semantics of the protocol.

Roaming with SSP becomes easy: the client sends datagrams to the server with increasing sequence numbers, including a "heartbeat" at least once every three seconds. Every time the server receives an authentic packet from the client with a sequence number higher than any it has previously received, the IP source address of that packet becomes the server's new target for its outgoing packets. By doing roaming “statelessly” in this manner, roaming works in and out of NATs, even ones that may themselves be roaming. Roaming works even when the client is not aware that its Internet-visible IP address has changed. The heartbeats allow Mosh to inform the user when it hasn't heard from the server in a while (unlike SSH, where users may be unaware of a dropped connection until they try to type).

Mosh runs two copies of SSP, one in each direction of the connection. The connection from client to server synchronizes an object that represents the keys typed by the user, and with TCP-like semantics. The connection from server to client synchronizes an object that represent the current screen state, and the goal is always to convey the client to the most recent server-side state, possibly skipping intermediate frames.

Because SSP works at the object layer and can control the rate of synchronization (in other words, the frame rate), it does not need to send every byte it receives from the application. That means Mosh can regulate the frames so as not to fill up network buffers, retaining the responsiveness of the connection and making sure Control-C always works quickly. Protocols that must send every byte can't do this.

Careful terminal emulation

One benefit of working at the terminal layer was the opportunity to build a clean UTF-8 terminal emulator from scratch. Mosh fixes several Unicode bugs in existing terminals and in SSH, and was designed as a fresh start to try to be robust and correct even for pathological inputs.

Tricky unicode Only Mosh and the OS X Terminal correctly handle a Unicode combining character in the first column.

xterm: circumflex on wrong letter. xterm: circumflex on wrong letter.

GNOME Terminal: no circumflex at all. GNOME Terminal: no circumflex at all.

... continue reading