I’ve been reading quite a bit about parsing and templating in ruby as I attempt to port a templating engine from JavaScript to Ruby. Here are some scattered links:

Jison

Jison is a parser generator for JavaScript that has separate lex and bnf definitions. It’s used by Handlebars.js.

Repositoryjison
Ownerzaach

Treetop

Treetop is a parser generator for Ruby. It’s installed with Ruby On Rails, through the mail gem which is installed by ActionMailer.

Repositorytreetop

Citrus

Citrus is another promising parsing gem for Ruby. It seems to be very easy to get started with, and I like many of the design decisions.

Repositorycitrus

Temple

Temple is a templating-specific library that helps with a lot of the AST transformation. It doesn’t seem to have a CFG syntax so it seems that using treetop or citrus would make sense for complex grammars. Otherwise, strscan could be used.

Repositorytemple
Ownerjudofyr

temple-mustache

This is an implementation of a mustache renderer in . It uses strscan to generate the initial parse tree, and Temple to generate the ruby code. It supports mustache sections.

Repositorytemple-mustache
Ownerminad

Slim

Slim is a Haml-like templating library for Ruby that’s used in production by many. It uses Temple, with a line-based parser, which works well because it uses significant indentation for nesting.

Repositoryslim
Ownerstonean

During the last few years, while working as a web developer, I’ve seen awareness of the potential for XSS increase. I’ve also seen frameworks and open source applications come with better tools for avoiding XSS. There is still work to be done, though.

The most common way to avoid XSS is to have a very convenient way to put the escaped contents of a variable into HTML. What this does, is instead of inserting the HTML, <script>, it inserts the text, <script>, using the HTML, &lt;script&gt;. In Ruby On Rails and Django, variables are now by default escaped. In Django if you want a variable to not be escaped, you either need to mark it as safe in the python code, or include the template tag “|safe” in the template code. In mustache, you have double-mustaches for escaped ({{name}}) and triple-mustaches for unescaped ({{{html}}}). These encourage using the escaped option. With WordPress, using standard interpolated PHP, you have to be more careful. The simple <? echo $name; ?> will result in an unescaped value. To get an escaped value, the slightly longer <?php echo esc_attr($name); ?> or <?php esc_attr_e($name ?> can be used.

The above methods work for values inside and outside of tags, as they escape, at a minimum, <>"'. The angle bracket escaping deals with variables inserted outside of tags, and the quotes deal with data inside of tags (such as attributes).

There is one important case that they don’t cover, though: sometimes attribute values can do bad things. An href attribute can be set to javascript code, and if the users click on it, it will run. This is a security problem.

This means that, assuming the data isn’t validated first, the following mustache code is insufficient:

<a href="{{url}}">{{name}}</a>

Before it gets rendered, the URL must be sanitized to remove the javascript: URL scheme if it’s present.

Different tools have different ways of dealing with this:

  • Django doesn’t appear to have anything built-in, but the Bleach library covers this in its sanitization of HTML, at least partially.
  • Ruby On Rails has a built-in HTML sanitizer.
  • mustache, while extremely convenient for escaping HTML, doesn’t solve the URL problem, and so developers using it will need to look elsewhere. Google Caja would be one option.
  • WordPress has my favorite solution of all of these: the esc_url function. Running a full HTML sanitizer is overkill when using templates. I think it got a nice solution out of necessity: WP can be set up so users that aren’t logged in are allowed to comment and leave a link to their website, and malicious users could easily submit a link with a javascript: URI scheme. I took a look at the source for esc_uri and the functions it calls and it seems to handle a lot of cases, including a value starting with javascript:javascript: and the URI scheme being in upper case.

Most of these aren’t commonly known, and yet many, many apps provide ways to add URLs. Because of this, I think this must be a very common way of doing XSS attacks. It’s frustrating that I don’t have a quick way for developers to get up to speed on this, no matter what tool they’re using.

A better solution

I’d very much like a JSON test suite and a list of recipes to run the test suite in various languages, with extra recipes for languages where there is a framework or a library that makes the code shorter.