Today we’re announcing public availability of hyperpb, a fully dynamic Protobuf parser that is 10x faster than dynamicpb, the standard Go solution for dynamic Protobuf. In fact, it’s so efficient that it’s 3x faster than parsing with generated code! It also matches or beats vtprotobuf’s generated code on almost every benchmark, without skimping on correctness.
Don’t believe us? We think our parsing benchmarks speak for themselves.
Here, we show two benchmark variants for hyperpb: out-of-the-box performance with no optimizations enabled, and performance with real-time profile-guided optimization (PGO) plus every other optimization we currently offer.
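To make the "fully dynamic" part concrete, here's a minimal sketch of what out-of-the-box use looks like. The hyperpb calls follow the usage shown in the project's documentation (CompileMessageDescriptor, NewMessage, and plain proto.Unmarshal); the surrounding descriptor loading is ordinary protobuf-go, and the schema source and message name are placeholders:

```go
package dynparse

import (
	"fmt"

	"buf.build/go/hyperpb"
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/reflect/protodesc"
	"google.golang.org/protobuf/reflect/protoreflect"
	"google.golang.org/protobuf/types/descriptorpb"
)

// parseDynamic parses wire-format bytes against a schema known only at
// runtime, e.g. a FileDescriptorSet fetched from a schema registry.
func parseDynamic(
	fds *descriptorpb.FileDescriptorSet,
	fullName protoreflect.FullName,
	wire []byte,
) (*hyperpb.Message, error) {
	files, err := protodesc.NewFiles(fds)
	if err != nil {
		return nil, err
	}
	desc, err := files.FindDescriptorByName(fullName)
	if err != nil {
		return nil, err
	}
	md, ok := desc.(protoreflect.MessageDescriptor)
	if !ok {
		return nil, fmt.Errorf("%s is not a message", fullName)
	}

	// Compiling a hyperpb type is the expensive step; in real code, compile
	// once per schema and cache the result.
	msgType := hyperpb.CompileMessageDescriptor(md)

	// hyperpb messages behave like any other proto.Message, so the stock
	// proto.Unmarshal entry point works; parsed fields are then read back
	// through Protobuf reflection.
	msg := hyperpb.NewMessage(msgType)
	if err := proto.Unmarshal(wire, msg); err != nil {
		return nil, err
	}
	return msg, nil
}
```

The PGO numbers come from the same flow with one extra step: record a profile while parsing representative traffic, then recompile the type against that profile. We won't reproduce the profiling hooks here; they're covered in hyperpb's documentation.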
Dynamic parsing may seem like a niche problem. However, at Buf we believe that schema-driven development is the future, and that means building services that are generic over all Protobuf message types.
Building a dynamic Protobuf parser with the throughput to match (or outperform) ahead-of-time generated code unlocks enormous possibilities. Products that were previously impractical at scale become ordinary, even essential.
Specifically, hyperpb enables us to process and validate large amounts of arbitrary streamed data in a type-aware manner. Dynamic parsing was exactly the bottleneck we encountered while building Bufstream.
Broker-side validation
We have long been vocal about the shortcomings of client-side-only validation in the world of Kafka. The downstream costs of invalid data slipping into a topic are very high, since it introduces server-side failure modes. Yet the high compute cost of broker-side validation is often cited as the reason for tolerating bad data and mitigating the resulting corruption on an ongoing basis instead. This is the real reason that broker-side, schema-aware validation isn’t a big-ticket item for the big cloud players: they can’t figure out how to make it fast enough.
But we can’t accept the status quo.
We built Bufstream to enable broker-side validation with Protobuf, an industry standard for high-performance, schema-enforced serialization. We also maintain Protovalidate, the gold-standard semantic validation library for Protobuf. In a nutshell, Bufstream uses schemas to parse incoming data from our customers and runs Protovalidate on the result to determine whether the producer sent us a bad message. The poor state of dynamic Protobuf parsing would otherwise make this process slow and resource-intensive.
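As a rough illustration of that flow (not Bufstream's actual code), here's a sketch of a per-record check that reuses the parsing setup from the earlier snippet. The protovalidate-go calls are the library's documented New and Validate entry points; checkRecord and its parameters are illustrative names:

```go
package broker

import (
	"fmt"

	"buf.build/go/hyperpb"
	protovalidate "github.com/bufbuild/protovalidate-go"
	"google.golang.org/protobuf/proto"
)

// checkRecord parses one produced record against its topic's schema and
// rejects it if it violates the schema's Protovalidate rules. msgType is a
// cached *hyperpb.MessageType compiled from the topic's message descriptor,
// as in the earlier sketch.
func checkRecord(msgType *hyperpb.MessageType, wire []byte) error {
	msg := hyperpb.NewMessage(msgType)
	if err := proto.Unmarshal(wire, msg); err != nil {
		return fmt.Errorf("malformed record: %w", err)
	}

	// In real code, construct the validator once and reuse it.
	v, err := protovalidate.New()
	if err != nil {
		return err
	}

	// hyperpb messages support full Protobuf reflection, which is all
	// Protovalidate needs to evaluate the rules embedded in the schema.
	if err := v.Validate(msg); err != nil {
		return fmt.Errorf("invalid record: %w", err)
	}
	return nil
}
```

Because both the parser and the validator operate purely through reflection, the broker never needs generated code for customers' message types.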