Unifying jsonpointer and unix file paths

Update 2013-11-10: I added some edge cases to show the completeness of this format.

Kris Zyp’s jsonpointer is a tool for addressing part of a JSON object, that’s designed to be both powerful and familiar. It supports any value for a key that JSON supports, yet it looks like a path from a URL or a filesystem.

A jsonpointer is the path from the root node of a JSON document to either itself or an interior node. A jsonpointer starts with a “/”. A “/” is also the simplest jsonpointer. After that, it contains one or more keys or indexes (in JavaScript an index is a key, but the JSON spec makes a distinction between keys and indexes), separated by “/” characters. To include a “/” character in a key (JSON supports this), use “~1”. To include a “~” character in a key, use “~0”.

What it lacks are relative paths and home paths (“~”). Since jsonpointer requires that it start with a “/”, this frees me to prepend anything but “/” to the start of the expression. Here are the things I can prepend:

  1. A dot (.), which makes it a relative path, like “./foo”.
  2. Two dots (..), which makes it a relative path from the parent node, like “../foo”.
  3. More than two dots (…), which will go up extra levels. “…/foo” would be analogous to “../../foo” in UNIX file paths. “../../foo” in my augmented json pointer would go to “baz” to quux in this json object: {“foo”: “not here”, “a”: {“b”: {“c”: “baz”}, “..”: {“foo”: “quux”}}} That is, instead of jumping up three levels, it would jump up two levels and interpret the second “..” as a key in a json object. Two dots go up one level, three dots go up two levels, four dots go up three levels, and so on.
  4. A tilde (~) makes it a path from the home directory.

Some examples of edge cases:

  1. the absolute path [“.”, “hello”] to access foo in {“.”: {“hello”: “foo”}} will be /./hello
  2. the relative path [“.”, hello”] to access foo in {“quux”: {“.”: {“hello”: “foo”}}} where the current path is /quux will be “././hello”.
  3. the relative path [“..”, “baz”] to access “foo” in {“quux”: {“..”: “hello”}, “baz”: “foo”} when where the current path is “/quux” where there is a “..” in the current node will be “../baz”. To access “hello” using a relative path from “/quux”, use “./..”.

the tsj file format

This is a proposal for a new file format.

The TSV file format is very simple. It requires that fields be delimited by tabs, and has a limitation that tabs can’t appear within fields. Newlines are also forbidden inside a field.

The JSON file format is both simple and powerful, but it can be a tiny bit clumsy compared to TSV. Also there is a tradition of putting one JSON object on a line, but that format isn’t JSON, and it doesn’t have a name that’s caught on.

The TSJ file format is a specialization of TSV, where a field is either a JSON expression or a string. Here is how you tell if it’s a string or a JSON expression:

  1. Fields that start with “{“, ‘”‘, “[“, “-“, or the digits (0-9) are treated as JSON expressions. If they don’t parse, it’s invalid TSJ. If you want to store such a value, use a JSON string expression.
  2. Fields that are exactly one of the words “true”, “false”, or “null” are treated as JSON expressions.
  3. All other fields are treated strings.

Strings and JSON expressions are both UTF-8. Fields are not trimmed after they are split by tab characters. A file containing “true\t false\n” will be read as [[true, ” false”]].

json has elements too

XML/HTML are centered around elements. An element is something that has an opening tag and a closing tag, or simply has a single self-closing tag (The slash before the closing angle bracket is required in XML and XHTML but not HTML). For example, this has six elements:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>Hello, world.</title>
  </head>
  <body>
    <p>Hello, world.</p>
  </body>
</html>

In this example each tag name is only used once. The elements are (by order of first appearance) html, head, meta, title, body, p. The DOCTYPE definition is special and is not considered to be an HTML or XML element. The charset part of this document is an attribute.

What does element mean in JSON? Something a bit different. An XML element is a compound data type. A JSON element is part of a compound data type.

If you look at the white box on the right part of json.org detailing the grammar, you can see that it has names for items in JSON’s compound data types, objects and arrays. An item in an object is called a pair, and an item in an array is called an element.

It helps when building APIs for processing data formats to have names for things. The terminology for XML is used in XML tools, and has made them easier to understand. For instance, the jQuery API docs use the terms element and attribute heavily.

I think the JSON community should adopt the terms pair and element and use them more frequently, to enhance understanding of the data. It’s useful to talk about a pair rather than a key and a value, because when you add something to a JavaScript object that’s what your adding. You aren’t adding just a value. A JavaScript array can include multiple copies of the same value, so when you’re removing a single occurrence of a value, it makes more sense to say you’re removing an element than removing a value.

JSON has been popular for years now, in many vital technology communities, but it is still lacking in processing support compared to XML. I think clear terminology is one thing that XML has right, and indeed the JSON spec has clear terminology. What’s lacking is widespread use of the terminology.

ANN: js2json and json2js (for CouchApps)

Here’s a copy of the email I sent to the CouchDB users mailing list:

I made a couple of npm (node.js) modules for editing CouchDB design documents that involves fewer files than python CouchApp, but like python CouchApp supports two-way sync. The function source is left intact when converting between JavaScript source and JSON. The JavaScript source version just shows it in an embedded function expression, which makes it syntax highlightable.

http://benatkin.com/2012/09/04/js2json-and-json2js/
https://github.com/benatkin/js2json
https://github.com/benatkin/json2js

Here’s a quick example. If I stick this in books.js:

module.exports = {
  "views": {
    "author": {
      "map": function(doc) {
        if (doc.type == 'book')
          emit(doc.author, null);
      }
    },
    "title": {
      "map": function(doc) {
        if (doc.type == 'book')
          emit(doc.title, null);
      }
    }
  }
}

…and run this (after npm install json2js js2json):

var js2json = require('js2json');
var json2js = require('json2js');
var fs = require('fs');

var jsSource = fs.readFileSync('books.js', 'utf8');
var jsonValue = js2json.convert(jsSource);
fs.writeFileSync('books.json', JSON.stringify(jsonValue, null, 2) + "\n");
var jsSourceFromJson = json2js.convert(jsonValue);
fs.writeFileSync('books-from-json.js', jsSourceFromJson + "\n");

…I get the following in books.json:

{
  "views": {
    "author": {
      "map": "function(doc) {\n  if (doc.type == 'book')\n
emit(doc.author, null);\n}"
    },
    "title": {
      "map": "function(doc) {\n  if (doc.type == 'book')\n
emit(doc.title, null);\n}"
    }
  }
}

…and books-from-json.js is exactly the same as books.js.

I explain it more in my blog post (linked at the top of this message). I need to add a cli tool that syncs, a way to handle attachments, and a way to handle embedded multiline strings for it to be a full-featured design doc editor. I have much bigger plans for this, though: I want to break up CouchApps into a bunch of smaller documents! The source and tests for these two modules is programmed in the same style. I think storing functions in JSON makes CouchDB just a little bit like Smalltalk, with a much more familiar language.

Thanks for reading. Feedback welcome and appreciated.

A JSON Column Coder for Rails 3.1

Rails 3.1 has a serialize function that can take a custom column coder. A custom coder needs to have dump and load methods set, or else it will be recognized as a required type for the built-in YAML coder called YAMLColumn.

While the JSON class has the two required methods, it doesn’t allow specifying a default. So I created a custom coder. I don’t know where the best file and module locations to put the class in are, so I won’t include them here. This is the class, though:

class JSONColumn
  def initialize(default={})
    @default = default
  end

  # this might be the database default and we should plan for empty strings or nils
  def load(s)
    s.present? ? JSON.load(s) : @default.clone
  end

  # this should only be nil or an object that serializes to JSON (like a hash or array)
  def dump(o)
    JSON.dump(o || @default)
  end
end

Since load and dump are instance methods, an instance of JSONColumn needs to be passed rather than the class. Here’s an example that works for me inside of the rails console:

class Person < ActiveRecord::Base
  validate :name, :pets, :presence => true
  serialize :pets, JSONColumn.new([])
end

Update: Added .clone to the load method. HT @miyagawa.

Early WordPress Plugin Preview: Resource Infobox

I’m working on a plugin for displaying infoboxes for URLs called Resource Infobox. The interface is a shortcode with a single url property. To include an infobox, use an expression like the following:

[resource-infobox url="https://github.com/benatkin/wordpress-resource-infobox"]

This will show an Infobox with some basic information:

Examples of Infoboxes include those on Wikipedia, in the upper-right corner of pages, and the old Crunchbase ones I saw on TechCrunch:

The neat thing about the ones built by this WordPress Plugin is that it renders based on some JSON rules. The current rule file has two resource types, GitHub Repositories and GitHub Users. These are not limited to GitHub, though; with the current implementation they could be written for twitter, Crunchbase, and any URL that has straightforward mapping from the HTML url to the JSON url. Here are the current rules:

{
  "github": {
    "repository": {
      "url": "https://github.com/:owner/:repo",
      "api_url": "http://github.com/api/v2/json/repos/show/:owner/:repo",
      "fields": [
        {
          "label": "Repository",
          "param": "repo",
          "url": "https://github.com/:owner/:repo/"
        },
        {
          "label": "Owner",
          "param": "owner",
          "url": "https://github.com/:owner/"
        },
        {
          "label": "Last Commit",
          "path": "repository/pushed_at",
          "type": "date",
          "format": "Y-m-d"
        },
        {
          "label": "Watchers",
          "path": "repository/watchers"
        }
      ]
    },
    "user": {
      "url": "https://github.com/:username",
      "api_url": "http://github.com/api/v2/json/user/show/:username",
      "fields": [
        {
          "label": "Username",
          "param": "username",
          "url": "https://github.com/:username/"
        },
        {
          "label": "Followers",
          "path": "user/followers_count"
        },
        {
          "label": "Respositories",
          "path": "user/public_repo_count"
        }
      ]
    }
  }
}

It’s a JSON DSL inspired by the DSL for Django’s admin interface. It has few features now, but following Django’s lead, it can have more features added to it while retaining its essence. The nice thing about using a JSON DSL rather than custom code is that it can easily be validated, and due to its declarative nature, XSS attacks and other undesired behavior can be avoided. It uses WordPress’s esc_attr and esc_url functions to render strings.

I’d love feedback on this. My goals for it include:

  • Caching (WP Total Cache greatly reduces requests to GitHub’s API on this page)
  • Adding more resource types
  • Adding more field types
  • Adding more display customization
  • Adding instructions for using code from this plugin as a base for other Infobox embedding plugins
  • Improving the resource type DSL
  • Building an editor for creating custom resource types
  • Supporting grabbing a resource type definition based on a custom header of a resource (like X-Infobox-Resource-Type)

I’d also like to make this work with more than just WordPress.