Going faster than memcpy
While profiling Shadesmar a couple of weeks ago, I noticed that for large binary unserialized messages (>512kB), most of the execution time is spent copying the message (using memcpy) from process memory to shared memory and back. I had a few hours to kill last weekend, so I tried to implement a faster way to do memory copies.

Autopsy of memcpy

Here’s the dump of perf when running pub-sub for messages of sizes between 512kB and 2MB.

Children Self Shared Ob