Why are neural networks and cryptographic ciphers so similar?
At first glance, training language models and encrypting data seem like completely different problems: one learns patterns from examples to generate text, the other scrambles information to hide it. Yet their underlying algorithms share a curious resemblance, and it’s not for lack of creativity.
Sequence processing: the sequential version
Consider the venerable recurrent neural network, feeding text token by token into a recurrent state before generating the output text:
[Figure: an encoder-decoder RNN. The recurrent function f absorbs in_0, in_1, …, in_n into its state; the decoder then emits out_0 through out_m, feeding each output back in, starting from <S> and ending at <E>.]
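To make the shape concrete, here is a minimal sketch of that loop in Python; f, encode, and decode are toy placeholders (a tanh mixing step), not a trained network:

```python
# A minimal sketch of the sequential absorb/emit loop, assuming a toy
# update rule; `f`, `encode`, and `decode` are illustrative, not a real model.
import numpy as np

def f(state, chunk):
    # Toy recurrent update: mix one input chunk into the running state.
    return np.tanh(state + chunk)

def encode(chunks, state_size=4):
    # Absorb variable-length input into a fixed-size state, one chunk at a time.
    state = np.zeros(state_size)
    for chunk in chunks:
        state = f(state, chunk)
    return state

def decode(state, steps):
    # Emit outputs one by one, feeding each output back in,
    # as the <S> ... <E> loop in the diagram does.
    outputs, prev = [], np.zeros_like(state)
    for _ in range(steps):
        state = f(state, prev)
        prev = state.copy()
        outputs.append(prev)
    return outputs
```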
This is structurally identical to the Sponge construction in SHA-3, absorbing bytes into a state before squeezing out the hash:
[Figure: the sponge construction. In the absorbing phase, in_0 through in_n are fed into the rate portion of the state between applications of the permutation f; in the squeezing phase, out_0 through out_m are read from the same portion, while the capacity portion stays hidden.]
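For comparison, a toy sponge in the same style. Here f stands in for the Keccak permutation and addition stands in for XOR; only the rate/capacity split and the absorb-then-squeeze schedule are kept from the real construction:

```python
# A toy sponge, assuming a placeholder permutation; real SHA-3 uses the
# Keccak-f permutation and XOR instead of addition.
import numpy as np

RATE, CAPACITY = 4, 4

def f(state):
    # Placeholder permutation: any fixed mixing of the full state.
    return np.tanh(np.roll(state, 1) + state)

def sponge(blocks, out_blocks):
    state = np.zeros(RATE + CAPACITY)
    for block in blocks:              # absorbing phase
        state[:RATE] += block         # only the rate portion sees the input
        state = f(state)
    out = []
    for _ in range(out_blocks):       # squeezing phase
        out.append(state[:RATE].copy())
        state = f(state)
    return out
```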
Perhaps this similarity isn’t surprising: to process variable-length input into a fixed-size state, absorbing sequentially is a natural choice.
Sequence processing: the parallel version
Modern hardware is parallel all the way down, so sequential absorbing wastes performance. Both fields found the same solution: run the expensive function f on all chunks in parallel rather than sequentially, then combine the results with simple addition.
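A minimal sketch of that parallel variant, with the same kind of placeholder f: each chunk is processed independently, so the map step can run in parallel, and an order-insensitive sum merges the results into one state:

```python
# Sketch of the parallel variant: f runs on every chunk independently,
# and a simple sum combines the results. `f` is again a toy stand-in.
import numpy as np

def f(chunk):
    # Expensive per-chunk function; independent calls can run in parallel.
    return np.tanh(chunk)

def parallel_absorb(chunks):
    # The sum is associative and commutative, so chunk order and
    # evaluation order don't matter: ideal for parallel hardware.
    return sum(f(c) for c in chunks)
```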