Benjamin Atkin

MessagePack and Cap’n Proto are two different serialization tools. MessagePack is a single serialization format. Cap’n Proto instead derives a message layout from a schema written in a .capnp file. The schema is read by the capnp command-line tool, which is implemented in C++, so Cap’n Proto doesn’t give you one universal format that can hold arbitrary data without a schema. In this it is similar to Protocol Buffers. MessagePack, on the other hand, has client libraries with the equivalents of JSON.parse() and JSON.stringify().

Cap’n Proto is known for making it possible to skip the serialization/deserialization step and access the raw data in memory directly. Something similar could also be done with MessagePack, though many libraries don’t support it.
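To make the idea of reading raw data in place concrete, here is a minimal sketch. It is not the real Cap’n Proto wire format, which also has segments and pointers. The 16-byte record layout and the names RECORD, build and score_at are made up for illustration. The point is only that when every field sits at a known offset, one value can be read straight from the buffer without decoding anything else.

```python
import struct

# Hypothetical fixed layout (an assumption for this sketch, not Cap'n Proto's
# actual encoding): each record is 16 bytes, little-endian,
# with a uint64 id at offset 0 and a float64 score at offset 8.
RECORD = struct.Struct("<Qd")

def build(records):
    """Write (id, score) pairs into one contiguous buffer."""
    buf = bytearray(RECORD.size * len(records))
    for i, (id_, score) in enumerate(records):
        RECORD.pack_into(buf, i * RECORD.size, id_, score)
    return bytes(buf)

def score_at(buf, index):
    """Read a single field from its offset; nothing else is parsed."""
    return struct.unpack_from("<d", buf, index * RECORD.size + 8)[0]

buf = build([(1, 0.5), (2, 1.5), (3, 2.5)])
third_score = score_at(buf, 2)
```

A format with variable-length values, like MessagePack, generally has to walk past the earlier values to find the third record, which is the cost this layout avoids.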

For the use case of sending a batch of JSON-like records, the default way of using MessagePack sends every key once per record, so with N records each key is sent N times. Cap’n Proto prescribes defining a struct for this use case, so the field names don’t need to be sent with the data at all. They only need to be included in the code that uses the protocol. In MessagePack this can be worked around with extensions or just by changing the data layout, perhaps by using a two-dimensional array like a CSV file, with a header row.
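Here is a rough sketch of the repeated-key cost and the CSV-style workaround. The pack function is a hand-written encoder for a small subset of MessagePack (non-negative integers, short strings, arrays and maps), written for this example so it has no dependencies. A real project would use a MessagePack library instead. The sample records are made up.

```python
import struct

def _header(fix_base, tag16, n):
    """Collection header: fix form for up to 15 items, else the 16-bit form."""
    if n <= 15:
        return bytes([fix_base | n])
    if n <= 0xFFFF:
        return tag16 + struct.pack(">H", n)
    raise ValueError("collection too large for this sketch")

def pack(obj) -> bytes:
    """Encode a small subset of MessagePack (illustration only)."""
    if isinstance(obj, int) and not isinstance(obj, bool):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                        # positive fixint
        if 0 <= obj <= 0xFF:
            return b"\xcc" + struct.pack(">B", obj)    # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)    # uint 16
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)    # uint 32
    elif isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw      # fixstr
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw   # str 8
    elif isinstance(obj, list):
        head = _header(0x90, b"\xdc", len(obj))        # fixarray / array 16
        return head + b"".join(pack(x) for x in obj)
    elif isinstance(obj, dict):
        head = _header(0x80, b"\xde", len(obj))        # fixmap / map 16
        return head + b"".join(pack(k) + pack(v) for k, v in obj.items())
    raise ValueError(f"unsupported value: {obj!r}")

# Made-up sample data: 100 records with the same three keys.
records = [{"id": i, "name": f"user{i}", "score": i * 10} for i in range(100)]

# Default layout: an array of maps, so every key travels with every record.
keyed = pack(records)

# CSV-style layout: one header row, then one array of values per record.
columns = ["id", "name", "score"]
table = pack([columns] + [[r[c] for c in columns] for r in records])

print(len(keyed), len(table))
```

The table layout pays for the key names once in the header row. The reader then has to know that the first row is a header, which is the same kind of out-of-band agreement a Cap’n Proto struct makes explicit.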

With MessagePack, a client can simply use a MessagePack library, because MessagePack is a self-describing format, like JSON. With Cap’n Proto, you need to ship a serializer/deserializer built from the schema.

Cap’n Proto pays attention to alignment, while MessagePack uses just enough bytes for the data. It’s worth looking at the MessagePack spec: integers from -32 to 127 need only one byte apiece. Cap’n Proto’s alignment means an array of numbers of the same type could presumably be stored efficiently, but an array of mixed values would likely take more space. However, the alignment makes in-memory access to raw Cap’n Proto data very fast.
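The integer sizes can be checked with a small sketch. The pack_int function below is written for this example and follows the integer family in the MessagePack spec, picking the shortest encoding for each value. It is a hand-rolled illustration, not a library API.

```python
import struct

def pack_int(n: int) -> bytes:
    """Encode an integer with the shortest MessagePack int representation."""
    if 0 <= n <= 0x7F:
        return struct.pack("B", n)                 # positive fixint, 1 byte
    if -32 <= n < 0:
        return struct.pack("b", n)                 # negative fixint, 1 byte
    if n >= 0:
        if n <= 0xFF:
            return b"\xcc" + struct.pack(">B", n)  # uint 8
        if n <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", n)  # uint 16
        if n <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", n)  # uint 32
        if n <= 0xFFFFFFFFFFFFFFFF:
            return b"\xcf" + struct.pack(">Q", n)  # uint 64
    else:
        if n >= -0x80:
            return b"\xd0" + struct.pack(">b", n)  # int 8
        if n >= -0x8000:
            return b"\xd1" + struct.pack(">h", n)  # int 16
        if n >= -0x80000000:
            return b"\xd2" + struct.pack(">i", n)  # int 32
        if n >= -0x8000000000000000:
            return b"\xd3" + struct.pack(">q", n)  # int 64
    raise OverflowError("integer does not fit in 64 bits")

for value in (-33, -32, 0, 127, 128, 65535, 65536, 2**32):
    print(value, len(pack_int(value)))
```

In a Cap’n Proto schema, by contrast, an integer field has a declared fixed width, so an Int64 field takes eight bytes whether it holds 1 or 2**40 (leaving aside Cap’n Proto’s optional packed encoding).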

The creators of each are very accomplished software engineers. The author of MessagePack founded Treasure Data, which was acquired by Arm. The author of Cap’n Proto founded Sandstorm and now works at Cloudflare.

MessagePack is similar to tnetstrings. It was used by Socket.io, but they stopped using it in favor of JSON plus a minimal custom netstrings-like format for efficiently carrying binary data. That shows that the techniques can be used without the tools.

I think that MessagePack and Cap’n Proto are very good for teaching different serialization techniques. They should also be easy to use together because, unlike JSON, both can store binary data efficiently. With JSON you have the overhead of converting binary data to and from base64. Both formats can also be nested efficiently. If you put JSON data inside a JSON string, all the double quotes need to be escaped, and this can inflate the size if it’s nested deeply. So if you make a library that uses MessagePack but some users want to use it with Cap’n Proto, I say keep using MessagePack, and vice versa. Currently, for me, the edge goes to Cap’n Proto because it is so fast and is more space efficient under typical use. However, Cap’n Proto’s efficient layouts could possibly be partially recreated in MessagePack, and that might be preferable to many.
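The two JSON overheads mentioned above, base64 for binary data and quote escaping for nested JSON, are easy to measure with the Python standard library. This is a small sketch with made-up sample data. The nest helper is a name invented for the example.

```python
import base64
import json

# Binary data in JSON: base64 turns every 3 bytes into 4 characters.
payload = bytes(range(256)) * 4            # 1024 bytes of sample binary data
encoded = base64.b64encode(payload)
print(len(payload), len(encoded))

def nest(value, depth):
    """Serialize value, then wrap the result in a JSON string depth times."""
    text = json.dumps(value)
    for _ in range(depth):
        text = json.dumps(text)            # escapes every quote and backslash
    return text

# Each extra level of nesting escapes the escapes of the level below it.
sizes = [len(nest({"k": "v"}, depth)) for depth in range(4)]
print(sizes)
```

A binary format with length-prefixed byte strings can embed one message inside another at the cost of a short length header, with no escaping at any depth.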
