json-to-msgpack

Last updated: 2024-02-15 19:35:21 +0000

Upstream URL: git clone http://chriswarbo.net/git/json-to-msgpack.git

Contents of README follows


JSON to MessagePack

A simple program which outputs MessagePack data when given the path to a file containing JSON data:

json-to-msgpack myFile.json > myFile.msgpack

The aim is to convert large JSON files (> 1GB) without running out of memory. To do this we need to avoid loading entire data structures into memory. Instead we make multiple passes over the input file, which is why we need a real file rather than stdin.
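To illustrate why a real file matters, here is a minimal Python sketch (not taken from this program; the name `supports_multiple_passes` is hypothetical). It shows that a regular file can be rewound for another pass, while a pipe, which is what stdin is when data is piped in, cannot:

```python
import os
import tempfile


def supports_multiple_passes(f) -> bool:
    """Return True if f can be rewound, which a multi-pass converter relies on."""
    return f.seekable()


# A regular file can be rewound with seek()...
with tempfile.TemporaryFile() as regular:
    regular_ok = supports_multiple_passes(regular)

# ...but a pipe only delivers each byte once, so a second pass is impossible.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as pipe_reader, os.fdopen(write_fd, "wb"):
    pipe_ok = supports_multiple_passes(pipe_reader)
```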

Note that we still use ordinary recursion to parse nested data structures, so nesting thousands of objects and arrays will blow the stack.

We also have no support for numbers at the moment; this is simply because the author didn’t need it, and nobody’s asked for it yet ;)

The major difference between JSON and MessagePack is that JSON is delimited, so we don’t know how large each value is until we’ve parsed the whole thing (e.g. to find the length of a string we need to keep reading bytes until we find an unescaped ‘"’; for arrays and objects we keep reading values until we hit ‘]’ or ‘}’). MessagePack uses length fields instead, which is better suited to streaming applications.

To turn a JSON value into MessagePack we first emit the relevant MessagePack ‘tag’ byte to indicate whether it’s true/false/null or a string/array/object. For strings, arrays and objects we then scan through the JSON to see how large the value is, emit this length as a 32-bit integer, and rewind the JSON to start processing the contents (recursively, for arrays and objects).
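The scan, emit and rewind steps can be sketched in Python as follows. This is an illustration under stated assumptions, not this program's actual code: all function names are hypothetical, the tag bytes are the nil/false/true and 32-bit str/array/map tags from the MessagePack spec (assumed because the lengths are 32-bit), and for brevity only single-character escapes are decoded, not \uXXXX.

```python
import io
import struct

# Tag bytes from the MessagePack spec; the 32-bit variants are an assumption.
NIL, FALSE, TRUE = b"\xc0", b"\xc2", b"\xc3"
STR32, ARRAY32, MAP32 = b"\xdb", b"\xdd", b"\xdf"

SKIP = b" \t\r\n,:"  # ',' and ':' are skipped just like whitespace
# First byte of each constant -> (tag, number of remaining bytes to skip)
CONSTANTS = {b"t": (TRUE, 3), b"f": (FALSE, 4), b"n": (NIL, 3)}
ESCAPES = {b'"': b'"', b"\\": b"\\", b"/": b"/", b"b": b"\b",
           b"f": b"\f", b"n": b"\n", b"r": b"\r", b"t": b"\t"}


def next_byte(f):
    """Consume skippable bytes and return the first significant byte (b'' at EOF)."""
    while True:
        b = f.read(1)
        if b == b"" or b not in SKIP:
            return b


def string_bytes(f):
    """Yield the decoded bytes of a string; f sits just after the opening quote."""
    while True:
        b = f.read(1)
        if b in (b'"', b""):
            return
        if b == b"\\":
            b = ESCAPES[f.read(1)]  # \uXXXX is not handled in this sketch
        yield b


def skip_value(f, first):
    """Measuring pass: consume one value whose first byte was already read."""
    if first == b'"':
        for _ in string_bytes(f):
            pass
    elif first in (b"[", b"{"):
        count_items(f)
    elif first in CONSTANTS:
        f.read(CONSTANTS[first][1])
    else:
        raise ValueError(f"unsupported value starting with {first!r}")


def count_items(f):
    """Count values up to the closing bracket; f sits just after the opening one."""
    n = 0
    while True:
        b = next_byte(f)
        if b in (b"]", b"}", b""):
            return n
        skip_value(f, b)
        n += 1


def emit(f, out, first):
    """Write one value to out, given its already-consumed first byte."""
    if first in CONSTANTS:
        tag, rest = CONSTANTS[first]
        f.read(rest)
        out.write(tag)
        return
    start = f.tell()  # where the contents begin, so we can rewind to it
    if first == b'"':
        length = sum(len(chunk) for chunk in string_bytes(f))
        f.seek(start)
        out.write(STR32 + struct.pack(">I", length))
        for chunk in string_bytes(f):
            out.write(chunk)
    elif first in (b"[", b"{"):
        n = count_items(f)
        f.seek(start)
        if first == b"{":
            # A map's length field counts key/value pairs, not single values.
            out.write(MAP32 + struct.pack(">I", n // 2))
        else:
            out.write(ARRAY32 + struct.pack(">I", n))
        for _ in range(n):
            emit(f, out, next_byte(f))
        next_byte(f)  # consume the closing bracket
    else:
        raise ValueError(f"unsupported value starting with {first!r}")


def convert(f, out):
    """Convert the single JSON value in seekable binary file f to MessagePack."""
    emit(f, out, next_byte(f))


# Usage with in-memory buffers; a real run would open the input file in "rb" mode.
example_out = io.BytesIO()
convert(io.BytesIO(b'{"a": [true, null]}'), example_out)
```

In this sketch every container is re-scanned once per enclosing level, and input is read a byte at a time, so it trades speed for never holding more than one byte of a value in memory.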

Note that JSON has a lot of redundancy, so our algorithm will accept lots of invalid data without complaint. For example we treat ‘,’ and ‘:’ as whitespace and we only look at the first byte of constants (true, false and null).
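As a rough illustration of this leniency, the Python sketch below classifies constants by their first byte and skips ‘,’ and ‘:’ like whitespace. The function name is hypothetical, and skipping a fixed number of bytes after the first one is an assumption made for the example, not necessarily what this program does:

```python
SEPARATORS = b" \t\r\n,:"  # ',' and ':' are skipped like whitespace
# First byte of each constant -> the constant it is taken to be
CONSTANTS = {ord("t"): "true", ord("f"): "false", ord("n"): "null"}


def read_constants(data: bytes) -> list:
    """Return the constants found in data, judging each by its first byte only."""
    found = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in SEPARATORS:
            i += 1
        elif byte in CONSTANTS:
            name = CONSTANTS[byte]
            found.append(name)
            i += len(name)  # skip the remaining bytes without checking them
        else:
            raise ValueError(f"unexpected byte {byte!r} at offset {i}")
    return found


valid = read_constants(b"true, false, null")
swapped_separators = read_constants(b"true:false:null")
misspelled = read_constants(b"txyz fzzzz")
```

All three inputs are accepted, and the misspelled one is read as true followed by false, because nothing after the first byte is inspected.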