Why are neural networks and cryptographic ciphers so similar?
At first glance, training language models and encrypting data seem like completely different problems: one learns patterns from examples to generate text, the other scrambles information to hide it. Yet their underlying algorithms share a curious resemblance, and it’s not for lack of creativity.
Sequence processing: the sequential version
Consider the venerable recurrent neural network, feeding text token by token into a recurrent state before generating the output text:
[Figure: an encoder-decoder RNN. The recurrent function f absorbs in_0, in_1, …, in_n into its state; the decoder then emits out_0 through out_m, feeding each output back in, starting from <S> and ending at <E>.]
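To make the shape concrete, here is a minimal sketch of that loop in Python; f, encode, and decode are toy placeholders (a tanh mixing step), not a trained network:

```python
# A minimal sketch of the sequential absorb/emit loop, assuming a toy
# update rule; `f`, `encode`, and `decode` are illustrative, not a real model.
import numpy as np

def f(state, chunk):
    # Toy recurrent update: mix one input chunk into the running state.
    return np.tanh(state + chunk)

def encode(chunks, state_size=4):
    # Absorb variable-length input into a fixed-size state, one chunk at a time.
    state = np.zeros(state_size)
    for chunk in chunks:
        state = f(state, chunk)
    return state

def decode(state, steps):
    # Emit outputs one by one, feeding each output back in,
    # as the <S> ... <E> loop in the diagram does.
    outputs, prev = [], np.zeros_like(state)
    for _ in range(steps):
        state = f(state, prev)
        prev = state.copy()
        outputs.append(prev)
    return outputs
```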
This is structurally identical to the Sponge construction in SHA-3, absorbing bytes into a state before squeezing out the hash:
[Figure: the sponge construction. In the absorbing phase, in_0 through in_n are fed into the rate portion of the state between applications of the permutation f; in the squeezing phase, out_0 through out_m are read from the same portion, while the capacity portion stays hidden.]
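For comparison, a toy sponge in the same style. Here f stands in for the Keccak permutation and addition stands in for XOR; only the rate/capacity split and the absorb-then-squeeze schedule are kept from the real construction:

```python
# A toy sponge, assuming a placeholder permutation; real SHA-3 uses the
# Keccak-f permutation and XOR instead of addition.
import numpy as np

RATE, CAPACITY = 4, 4

def f(state):
    # Placeholder permutation: any fixed mixing of the full state.
    return np.tanh(np.roll(state, 1) + state)

def sponge(blocks, out_blocks):
    state = np.zeros(RATE + CAPACITY)
    for block in blocks:              # absorbing phase
        state[:RATE] += block         # only the rate portion sees the input
        state = f(state)
    out = []
    for _ in range(out_blocks):       # squeezing phase
        out.append(state[:RATE].copy())
        state = f(state)
    return out
```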
Perhaps this similarity isn’t surprising: to process variable-length input into a fixed-size state, absorbing sequentially is a natural choice.
Sequence processing: the parallel version
Modern hardware is parallel all the way down, so sequential absorbing wastes performance. Both fields found the same solution: run the expensive function f on all chunks in parallel rather than sequentially, then combine the results with simple addition.
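A minimal sketch of that parallel variant, with the same kind of placeholder f: each chunk is processed independently, so the map step can run in parallel, and an order-insensitive sum merges the results into one state:

```python
# Sketch of the parallel variant: f runs on every chunk independently,
# and a simple sum combines the results. `f` is again a toy stand-in.
import numpy as np

def f(chunk):
    # Expensive per-chunk function; independent calls can run in parallel.
    return np.tanh(chunk)

def parallel_absorb(chunks):
    # The sum is associative and commutative, so chunk order and
    # evaluation order don't matter: ideal for parallel hardware.
    return sum(f(c) for c in chunks)
```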