Unifying jsonpointer and unix file paths

Update 2013-11-10: I added some edge cases to show the completeness of this format.

Kris Zyp’s jsonpointer is a tool for addressing part of a JSON object, that’s designed to be both powerful and familiar. It supports any value for a key that JSON supports, yet it looks like a path from a URL or a filesystem.

A jsonpointer is the path from the root node of a JSON document to either itself or an interior node. A jsonpointer starts with a “/”. A “/” is also the simplest jsonpointer. After that, it contains one or more keys or indexes (in JavaScript an index is a key, but the JSON spec makes a distinction between keys and indexes), separated by “/” characters. To include a “/” character in a key (JSON supports this), use “~1”. To include a “~” character in a key, use “~0”.
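The escaping rules above can be sketched in a few lines of JavaScript. This is a minimal sketch, not Kris Zyp's implementation: the function names are made up for illustration, and it assumes the convention described here, where “/” alone means the root (RFC 6901 differs on that point, using the empty string for the root).

```javascript
// Hypothetical helpers illustrating the "~0" / "~1" escaping described above.

// Escape "~" first, so the "~" introduced by "~1" is not escaped again.
function escapeKey(key) {
  return String(key).replace(/~/g, '~0').replace(/\//g, '~1');
}

// Unescape "~1" first, so "~01" correctly becomes "~1" rather than "/".
function unescapeKey(token) {
  return token.replace(/~1/g, '/').replace(/~0/g, '~');
}

// Build a pointer from a list of keys and indexes; [] gives "/" (the root).
function toPointer(keys) {
  return '/' + keys.map(escapeKey).join('/');
}

// Split a pointer back into its keys (indexes come back as strings).
function fromPointer(pointer) {
  if (pointer[0] !== '/') {
    throw new Error('a jsonpointer must start with "/"');
  }
  const rest = pointer.slice(1);
  return rest === '' ? [] : rest.split('/').map(unescapeKey);
}
```

For example, toPointer(['a/b', 'c~d', 0]) produces “/a~1b/c~0d/0”, and fromPointer turns that back into the keys “a/b”, “c~d” and “0”.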

What it lacks are relative paths and home paths (“~”). Since a jsonpointer must start with a “/”, I am free to prepend anything but “/” to the expression. Here are the things I can prepend:

  1. A dot (.), which makes it a relative path, like “./foo”.
  2. Two dots (..), which makes it a relative path from the parent node, like “../foo”.
  3. More than two dots (...), which will go up extra levels. “.../foo” would be analogous to “../../foo” in UNIX file paths. “../../foo” in my augmented json pointer would go from “baz” to “quux” in this json object: {“foo”: “not here”, “a”: {“b”: {“c”: “baz”}, “..”: {“foo”: “quux”}}} That is, instead of jumping up three levels, it would jump up two levels and interpret the second “..” as a key in a json object. Two dots go up one level, three dots go up two levels, four dots go up three levels, and so on.
  4. A tilde (~) makes it a path from the home directory.

Some examples of edge cases:

  1. the absolute path [“.”, “hello”] to access “foo” in {“.”: {“hello”: “foo”}} will be “/./hello”.
  2. the relative path [“.”, “hello”] to access “foo” in {“quux”: {“.”: {“hello”: “foo”}}} where the current path is “/quux” will be “././hello”.
  3. the relative path [“..”, “baz”] to access “foo” in {“quux”: {“..”: “hello”}, “baz”: “foo”} where the current path is “/quux”, and the current node itself has a “..” key, will be “../baz”. To access “hello” using a relative path from “/quux”, use “./..”.
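Here is a minimal sketch of how these prefixes could be resolved, assuming the current path and the home path are given as arrays of keys. The function names and the homePath parameter are my own inventions for illustration (JSON has no built-in notion of a home), and “/” alone is taken to mean the root, as described above.

```javascript
// Hypothetical resolver: turns an augmented pointer into an absolute list of keys.
function resolvePointer(pointer, currentPath, homePath) {
  const slash = pointer.indexOf('/');
  if (slash === -1) {
    throw new Error('pointer must contain a "/"');
  }
  // Everything before the first "/" is the prefix; the rest is a plain jsonpointer.
  const prefix = pointer.slice(0, slash);
  const rest = pointer.slice(slash + 1);
  const tokens = rest === '' ? [] : rest.split('/').map(function (token) {
    return token.replace(/~1/g, '/').replace(/~0/g, '~');
  });

  let base;
  if (prefix === '') {
    base = [];                       // absolute path
  } else if (prefix === '~') {
    base = homePath || [];           // home path (assumed to be supplied by the caller)
  } else if (/^\.+$/.test(prefix)) {
    const up = prefix.length - 1;    // "." = 0 levels, ".." = 1, "..." = 2, ...
    if (up > currentPath.length) {
      throw new Error('cannot go above the root');
    }
    base = currentPath.slice(0, currentPath.length - up);
  } else {
    throw new Error('unknown prefix: ' + prefix);
  }
  // Dots after the first "/" are ordinary keys, never "go up" instructions.
  return base.concat(tokens);
}

// Follow a list of keys from the root of a document.
function lookup(doc, path) {
  return path.reduce(function (node, key) { return node[key]; }, doc);
}
```

With this sketch, resolving “../baz” from the current path [“quux”] gives [“baz”], while “./..” gives [“quux”, “..”], matching the third edge case.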

the tsj file format

This is a proposal for a new file format.

The TSV file format is very simple. It requires that fields be delimited by tabs, and has a limitation that tabs can’t appear within fields. Newlines are also forbidden inside a field.

The JSON file format is both simple and powerful, but it can be a tiny bit clumsy compared to TSV. Also there is a tradition of putting one JSON object on a line, but that format isn’t JSON, and it doesn’t have a name that’s caught on.

The TSJ file format is a specialization of TSV, where a field is either a JSON expression or a string. Here is how you tell if it’s a string or a JSON expression:

  1. Fields that start with “{”, ‘"’, “[”, “-”, or a digit (0-9) are treated as JSON expressions. If they don’t parse, it’s invalid TSJ. If you want to store such a value as a string, use a JSON string expression.
  2. Fields that are exactly one of the words “true”, “false”, or “null” are treated as JSON expressions.
  3. All other fields are treated as strings.

Strings and JSON expressions are both UTF-8. Fields are not trimmed after they are split by tab characters. A file containing “true\t false\n” will be read as [[true, “ false”]].
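The rules above are small enough to sketch as a reader. This is only an illustration under my own assumptions: the function names are hypothetical, the input is an already-decoded string, and a trailing newline at the end of the file does not start an extra row.

```javascript
// Hypothetical TSJ reader following the three field rules above.
function parseTsjField(field) {
  const looksLikeJson = /^[{"\[\-0-9]/.test(field) ||
    field === 'true' || field === 'false' || field === 'null';
  // JSON.parse throws a SyntaxError if the field is not valid JSON,
  // which corresponds to "invalid TSJ".
  return looksLikeJson ? JSON.parse(field) : field;
}

function parseTsj(text) {
  const lines = text.split('\n');
  // Drop the empty piece that follows a final newline.
  if (lines[lines.length - 1] === '') {
    lines.pop();
  }
  // Fields are split on tabs and deliberately not trimmed.
  return lines.map(function (line) {
    return line.split('\t').map(parseTsjField);
  });
}
```

Calling parseTsj('true\t false\n') returns [[true, ' false']]: the first field is a JSON expression, and the second is a plain string because of its leading space.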

json has elements too

XML/HTML are centered around elements. An element is something that has an opening tag and a closing tag, or simply has a single self-closing tag (the slash before the closing angle bracket is required in XML and XHTML but not HTML). For example, this has six elements:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>Hello, world.</title>
  </head>
  <body><p>Hello, world.</p></body>
</html>

In this example each tag name is only used once. The elements are (by order of first appearance) html, head, meta, title, body, p. The DOCTYPE definition is special and is not considered to be an HTML or XML element. The charset part of this document is an attribute.

What does element mean in JSON? Something a bit different. An XML element is a compound data type. A JSON element is part of a compound data type.

If you look at the white box on the right part of json.org detailing the grammar, you can see that it has names for items in JSON’s compound data types, objects and arrays. An item in an object is called a pair, and an item in an array is called an element.

It helps when building APIs for processing data formats to have names for things. The terminology for XML is used in XML tools, and has made them easier to understand. For instance, the jQuery API docs use the terms element and attribute heavily.

I think the JSON community should adopt the terms pair and element and use them more frequently, to enhance understanding of the data. It’s useful to talk about a pair rather than a key and a value, because when you add something to a JavaScript object that’s what you’re adding. You aren’t adding just a value. A JavaScript array can include multiple copies of the same value, so when you’re removing a single occurrence of a value, it makes more sense to say you’re removing an element than removing a value.
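A small sketch of what that vocabulary looks like in code, using only built-in JavaScript. The data and the removeFirstElement helper are made up for illustration.

```javascript
// An object is a collection of pairs; an array is a sequence of elements.
const person = { name: 'Ada', pets: ['cat', 'dog', 'cat'] };

// Object.entries lists the pairs: each one is a key together with its value.
const pairs = Object.entries(person);

// Adding to an object adds a pair, not just a value.
person.age = 36;

// Removing one occurrence of a value means removing a single element;
// other elements with the same value stay in place.
function removeFirstElement(array, value) {
  const index = array.indexOf(value);
  if (index !== -1) {
    array.splice(index, 1);
  }
  return array;
}

removeFirstElement(person.pets, 'cat');
```

After the last call, person.pets still contains one “cat” element, which is why “removing an element” describes the operation better than “removing a value”.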

JSON has been popular for years now, in many vital technology communities, but it is still lacking in processing support compared to XML. I think clear terminology is one thing that XML has right, and indeed the JSON spec has clear terminology. What’s lacking is widespread use of the terminology.

ANN: js2json and json2js (for CouchApps)

Here’s a copy of the email I sent to the CouchDB users mailing list:

I made a couple of npm (node.js) modules for editing CouchDB design documents that involve fewer files than python CouchApp, but like python CouchApp support two-way sync. The function source is left intact when converting between JavaScript source and JSON. The JavaScript source version just shows it in an embedded function expression, which makes it syntax highlightable.


Here’s a quick example. If I stick this in books.js:

module.exports = {
  "views": {
    "author": {
      "map": function(doc) {
        if (doc.type == 'book')
          emit(doc.author, null);
      }
    },
    "title": {
      "map": function(doc) {
        if (doc.type == 'book')
          emit(doc.title, null);
      }
    }
  }
};

…and run this (after npm install json2js js2json):

var js2json = require('js2json');
var json2js = require('json2js');
var fs = require('fs');

var jsSource = fs.readFileSync('books.js', 'utf8');
var jsonValue = js2json.convert(jsSource);
fs.writeFileSync('books.json', JSON.stringify(jsonValue, null, 2) + "\n");
var jsSourceFromJson = json2js.convert(jsonValue);
fs.writeFileSync('books-from-json.js', jsSourceFromJson + "\n");

…I get the following in books.json:

{
  "views": {
    "author": {
      "map": "function(doc) {\n  if (doc.type == 'book')\n    emit(doc.author, null);\n}"
    },
    "title": {
      "map": "function(doc) {\n  if (doc.type == 'book')\n    emit(doc.title, null);\n}"
    }
  }
}

…and books-from-json.js is exactly the same as books.js.
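To illustrate the general idea of keeping function source as strings in JSON, here is a rough sketch using only JavaScript built-ins. It is not how js2json and json2js work internally (they operate on the source text), and the helper names are hypothetical. The reverse direction evaluates strings as code, so it is only suitable for JSON you trust.

```javascript
// Hypothetical helpers, NOT the js2json/json2js API.

// Replace every function in a value with its source text.
function functionsToStrings(value) {
  return JSON.parse(JSON.stringify(value, function (key, v) {
    return typeof v === 'function' ? v.toString() : v;
  }));
}

// Turn strings that start with "function" back into functions.
// This evaluates the string as code: only use it on trusted data.
function stringsToFunctions(value) {
  return JSON.parse(JSON.stringify(value), function (key, v) {
    if (typeof v === 'string' && /^function\b/.test(v)) {
      return new Function('return (' + v + ');')();
    }
    return v;
  });
}

const designDoc = {
  views: {
    author: {
      map: function(doc) {
        if (doc.type == 'book')
          emit(doc.author, null);
      }
    }
  }
};

const asJson = functionsToStrings(designDoc);   // map is now a string
const revived = stringsToFunctions(asJson);     // map is a function again
```

The design choice that matters for a design doc editor is the same one described above: the function body travels as text, so nothing about it is lost when moving between the two forms.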

I explain it more in my blog post (linked at the top of this message). I need to add a cli tool that syncs, a way to handle attachments, and a way to handle embedded multiline strings for it to be a full-featured design doc editor. I have much bigger plans for this, though: I want to break up CouchApps into a bunch of smaller documents! The source and tests for these two modules are written in the same style. I think storing functions in JSON makes CouchDB just a little bit like Smalltalk, with a much more familiar language.

Thanks for reading. Feedback welcome and appreciated.

A JSON Column Coder for Rails 3.1

Rails 3.1 has a serialize method that can take a custom column coder. A custom coder needs to respond to dump and load; otherwise the argument is treated as the required type for the built-in YAML coder, YAMLColumn.

While the JSON module has the two required methods, it doesn’t allow specifying a default, so I created a custom coder. I don’t know the best file and module location for the class, so I won’t include them here. This is the class, though:

class JSONColumn
  def initialize(default={})
    @default = default
  end

  # this might be the database default and we should plan for empty strings or nils
  def load(s)
    s.present? ? JSON.load(s) : @default.clone
  end

  # this should only be nil or an object that serializes to JSON (like a hash or array)
  def dump(o)
    JSON.dump(o || @default)
  end
end

Since load and dump are instance methods, an instance of JSONColumn needs to be passed rather than the class. Here’s an example that works for me inside of the rails console:

class Person < ActiveRecord::Base
  validates :name, :pets, :presence => true
  serialize :pets, JSONColumn.new([])
end

Update: Added .clone to the load method. HT @miyagawa.