This is a proposal for a new file format.

The TSV file format is very simple: fields are separated by tabs and records by newlines. The price of that simplicity is that neither a tab nor a newline can appear inside a field.

The JSON file format is both simple and powerful, but it can be a bit clumsy compared to TSV. There is also a tradition of putting one JSON object on each line, but a file written that way isn’t itself JSON, and the convention doesn’t have a name that’s caught on.

The TSJ file format is a specialization of TSV in which each field is either a JSON expression or a plain string. Here is how to tell which one a field is:

  1. Fields whose first character is “{”, “"”, “[”, “-”, or a digit (0-9) are treated as JSON expressions. If such a field doesn’t parse as JSON, the file is invalid TSJ. To store a plain string that begins with one of these characters, write it as a JSON string expression instead.
  2. Fields that are exactly one of the words “true”, “false”, or “null” are treated as JSON expressions.
  3. All other fields are treated as strings, exactly as written.
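The three rules above amount to a small classifier that looks at one field at a time. Here is a minimal Python sketch, assuming a field has already been split out of its line; the name `parse_field` is hypothetical and not part of the proposal. It uses the standard `json` module and passes `parse_constant` so that `NaN`, `Infinity` and `-Infinity` (which Python accepts by default but strict JSON does not) are rejected.

```python
import json

# Rule 1: first characters that mark a field as a JSON expression.
JSON_START_CHARS = set('{"[-0123456789')
# Rule 2: bare words that are JSON expressions.
JSON_KEYWORDS = {"true": True, "false": False, "null": None}


def _reject_constant(name):
    # Python's json module accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def parse_field(field: str):
    """Decode one TSJ field (already split on tabs, not trimmed)."""
    if field[:1] in JSON_START_CHARS:
        try:
            return json.loads(field, parse_constant=_reject_constant)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            raise ValueError(f"invalid TSJ field: {field!r}") from exc
    if field in JSON_KEYWORDS:
        return JSON_KEYWORDS[field]
    # Rule 3: anything else is a plain string, kept exactly as written.
    return field
```

With this sketch, `parse_field('"[draft]"')` gives the string `[draft]`, while `parse_field("[draft]")` raises `ValueError`, because a field starting with “[” must be valid JSON.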

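Both strings and JSON expressions are encoded as UTF-8. Fields are not trimmed after a line is split on tab characters, so a file containing “true\t false\n” is read as [[true, " false"]]: the first field is the JSON value true, and the second is the string " false", leading space included.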
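Putting the field rules together with UTF-8 decoding and untrimmed tab splitting gives a complete reader. This is a minimal Python sketch with hypothetical names (`parse_field`, `parse_tsj`). It assumes each record ends with "\n", tolerates a missing newline after the last record, and gives "\r" no special meaning, since the proposal doesn’t say how other line endings are handled.

```python
import json

JSON_START = set('{"[-0123456789')


def _reject_constant(name):
    raise ValueError(f"non-standard JSON constant: {name}")


def parse_field(field: str):
    # Rules 1 and 2: JSON start characters, or exactly true/false/null.
    if field[:1] in JSON_START or field in ("true", "false", "null"):
        try:
            return json.loads(field, parse_constant=_reject_constant)
        except ValueError as exc:
            raise ValueError(f"invalid TSJ field: {field!r}") from exc
    return field  # Rule 3: plain string, not trimmed


def parse_tsj(data: bytes):
    text = data.decode("utf-8")  # strings and JSON are both UTF-8
    lines = text.split("\n")
    if lines[-1] == "":          # a final "\n" ends the last record
        lines.pop()
    # Split on every tab and keep surrounding spaces inside each field.
    return [[parse_field(f) for f in line.split("\t")] for line in lines]
```

For example, `parse_tsj(b"true\t false\n")` returns `[[True, " false"]]`, matching the example in the text.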


Escaped tabs and newlines (\t and \n) will be allowed inside JSON expressions, since only literal tab and newline characters collide with the TSV structure. I won’t need to write a new parser for the first version, because my TSJ parser will run an existing JSON parser on each field. To make it more efficient, though, I’ll need to take the JSON parser and modify part of it.
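The point about escaped tabs and newlines can also be seen from the writing side. The proposal doesn’t describe a writer, so the following is only a sketch with hypothetical names (`format_field`, `format_tsj`). It assumes one policy that follows from the rules: a string is written as a JSON string whenever it would otherwise be misread as JSON, or when it contains a literal tab or newline (which a plain TSV field can’t hold). Python’s `json.dumps` always escapes control characters such as tab and newline, even with `ensure_ascii=False`, so the only literal tabs and newlines in the output are separators.

```python
import json

JSON_START_CHARS = set('{"[-0123456789')
JSON_KEYWORDS = {"true", "false", "null"}


def format_field(value) -> str:
    if isinstance(value, str):
        needs_json = (
            value[:1] in JSON_START_CHARS  # would be read as JSON (rule 1)
            or value in JSON_KEYWORDS      # would be read as JSON (rule 2)
            or "\t" in value               # would split the field
            or "\n" in value               # would split the record
        )
        if not needs_json:
            return value  # plain string, written exactly as is
        # json.dumps escapes tab/newline as \t and \n inside the string.
        return json.dumps(value, ensure_ascii=False)
    # Numbers, booleans, None, lists, dicts: compact single-line JSON.
    return json.dumps(value, ensure_ascii=False, allow_nan=False)


def format_tsj(rows) -> bytes:
    lines = ("\t".join(format_field(v) for v in row) + "\n" for row in rows)
    return "".join(lines).encode("utf-8")
```

With this policy, `format_field("a\tb")` produces the six characters `"a\tb"` (with a backslash, not a tab), and `format_field("null")` produces `"null"`, so the string survives a round trip instead of turning into JSON null.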
