When HTML Can Be Better Than JSON
It all starts like this:
- Create a ".html" file and open it in a browser.
- Play around with JavaScript to build some interactivity.
- Later, grab a framework and return some data from a server in a custom JSON structure.
That’s the beginning of many people's careers as web developers.
However, the communication between servers and browsers is more than just a bunch of random JSON payloads and JavaScript frameworks. There are simple hidden ideas
This post is going to talk about them.
More precisely, this post is going to talk about how the HTML Content-Type can provide better foundations to build Application Programming Interfaces. The use case for this is when you want to expose the website's workflow
You can apply some of these fundamentals in a custom XML API. However, this post is not about XML. With XML, you create a new representation of your website for a different purpose, at a URL different from the one that serves the website.
You can also apply most of these fundamentals in a custom JSON API. However, besides the same point as with XML above, for a JSON API to be useful, you need to choose a hypermedia message format specification like
There's no problem with XML or JSON. However, if you intend to expose the website's workflow, there are alternatives. This post is only for the use cases where you already have a website that provides visual data for a human to consume, and you also want to expose the very same workflow to machines.
HTML-like content types can provide better foundations to build APIs for the Web
Let's start at the beginning.
When you publicly serve static files from your computer, an S3 bucket or GitHub Pages, you need a web server. If you're serving a web page, that server has to return HTML in the first interaction for that page, even if it’s a single script tag to load a bunch of JavaScript code.
That understanding is critical.
What many developers don’t get is that the HTML the server produces is merely a message: a piece of information transferred over the network. That information happens to be understood by the code that runs inside the browser. The HTML specification defines how the browser should render that message visually in a backward-compatible manner.
However, HTML is not only for rendering. You can add metadata to the markup so that other kinds of non-visual clients can interpret it.
There’s a server somewhere, even if you don’t see it.
A long time ago, there was this Search Engine called Google which had a compelling idea to rank the relevance of websites. They did so by writing code that would read the HTML and look for hyperlinks (<a>). Once they indexed how many websites were linking to each other, they could quickly check which ones were more relevant than the others. This way, when the user looked up a search term, the most relevant websites would appear at the top of the search results.
Google has added a lot of additional heuristics since the first time they crawled the web. However, the fundamentals of crawling hyperlinks are still there. Somewhere.
You can effortlessly write software that can understand what the author of a website meant. Given how the HTML specification defines HTML and hyperlinks, the author of the crawling script and the author of the page don’t need to know each other; they only need to build code that can write and read the message from each computer in a standard format.
Given how HTML defines anchor tags, you can write code to crawl any website and follow links, as long as the author of the website can produce a meaningful message that follows the HTML specification of how to create anchors.
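To make the idea concrete, here is a minimal sketch of the link-reading half of a crawler, written in Python with only the standard library. The page content, the example.com URLs and the names LinkExtractor and extract_links are assumptions made for illustration. A real crawler would also fetch pages over HTTP, respect robots.txt and avoid revisiting URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: the crawler's author knows nothing about its layout.
PAGE = """
<nav><a href="/about">About</a></nav>
<p>Read the <a href="https://example.org/spec">spec</a>.</p>
<a name="no-href">an anchor without a link</a>
"""

links = extract_links(PAGE, "https://example.com/index.html")
print(links)
```

The sketch never looks at how the page is laid out. It relies only on the meaning the HTML specification gives to the href attribute of an anchor.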
A Web Crawler is not the only type of client that doesn’t care about how the browser renders the HTML. A test automation script, driven by a tool such as Selenium or headless Chrome, only cares about how to identify specific elements of the page. The automation code doesn't care if the underlying test engine has to simulate full browser rendering or not.
For example, if you want to test how an authentication system works, you write a script that can identify the input fields and the form to submit. If you’re testing a website you control, then you can write attributes in the markup to identify those elements. This way, those attributes become the contract between client and server to identify the elements in the page for a test automation script. As long as they remain on the right elements after you refactor the HTML, the code that depends on them keeps working.
If the server adds the authentication form in other places, say the navigation menu, the code you write for the automation script doesn’t need to change.
You can enhance the website’s HTML to provide attributes that a test automation script can consume. That test automation script can understand the website’s workflow.
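As a rough sketch of that contract, the Python below finds a sign-in form by the attributes it carries rather than by its position in the page. The class names (acme-sign-in, acme-username, acme-password), both page layouts and the helper names are hypothetical. Python's standard library has no "query selector", so the small ClassTokenFinder class stands in for one.

```python
from html.parser import HTMLParser


class ClassTokenFinder(HTMLParser):
    """Records every element whose class attribute contains a given token."""

    def __init__(self, token):
        super().__init__()
        self.token = token
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.token in classes:
            self.matches.append((tag, attributes))


def find_by_class(html, token):
    finder = ClassTokenFinder(token)
    finder.feed(html)
    finder.close()
    return finder.matches


def describe_sign_in(html):
    """Returns what a test script needs in order to submit the sign-in form."""
    form = find_by_class(html, "acme-sign-in")[0][1]
    username = find_by_class(html, "acme-username")[0][1]
    password = find_by_class(html, "acme-password")[0][1]
    return {
        "action": form["action"],
        "username_field": username["name"],
        "password_field": password["name"],
    }


# Hypothetical layout 1: the form lives in the main content.
LOGIN_PAGE = """
<main>
  <form class="acme-sign-in" action="/session" method="post">
    <input class="acme-username" name="user">
    <input class="acme-password" name="pass" type="password">
  </form>
</main>
"""

# Hypothetical layout 2: the same form moved into the navigation menu.
NAV_MENU_PAGE = """
<header><nav><ul><li>
  <form class="compact acme-sign-in" action="/session" method="post">
    <label>User <input class="acme-username" name="user"></label>
    <label>Password <input class="acme-password" name="pass" type="password"></label>
  </form>
</li></ul></nav></header>
"""

print(describe_sign_in(LOGIN_PAGE))
print(describe_sign_in(NAV_MENU_PAGE))
```

Both layouts produce the same description, because the script depends on the attributes and not on where the form sits in the markup.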
Given all this, you may want to create specific attributes aimed towards test automation scripts. For example, a “test” attribute with the value “open product details,” or a “QA” attribute with the value “checkout.”
That’s a mistake.
Keep in mind there are other clients besides a test automation script that might be interested in the workflow of the website. If you identify the elements using attributes relevant to their domain, then other types of clients can consume it, not only test automation scripts.
It’s very hard, sometimes impossible, to change the name or value of an attribute once it becomes a contract unless you create a new element. The construction of test-specific attributes creates coupling between the test automation script and the attributes of the website. If you name the attributes with a terminology that is specific to test automation clients, that name makes no sense for a client that is not a test automation script.
Do not create HTML attributes that are test-specific
Here’s an example of another type of client besides a test automation script:
Say you work at a social network website with a bad UI. The company hires a new designer to redesign the website. The designer wants to know the triggers where users click to update their privacy settings.
In a real-life scenario, the most efficient solution is to get up from the chair and talk to the developers. However, everybody is working remotely in different time zones. Communication has to be asynchronous.
The developers decide to code the website in a different way to solve this problem. Some buttons that can trigger the user’s privacy settings are marked up with a class containing the prefix “company name” and “privacy.”
On the page which has the news feed of this Social Network, there’s an “always post as anonymous” shortcut button with the class “company privacy always anonymous.” On the “settings” page, under the “privacy” group, there’s a button with the same semantics. In both places, the functionality is the same. Therefore, it has the same class.
The developers write a script that the designer can add to their browser as a bookmark. Every time the designer logs into the website in a test environment, they can click on the bookmark, and the script highlights all the privacy controls on the screen.
In the example above, there’s also a client-server relationship. The client is the JavaScript code that the designer executes from the bookmark. The server is the Document Object Model which serves the metadata for the script to identify which privacy controls are visible. In this context, the client is the consumer of the metadata; the server is the producer of it.
The HTML attributes are not specific to a given client. Therefore, you can also read them from a test automation script. You don’t need to make changes to the website unless it’s to add a new identifier for another element on the page. The scripts that deal with the existing elements keep working. They survive the test of time without significant maintenance impact when you add a new element to the page.
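Here is a sketch of a second client reading the same metadata, this time in Python instead of a bookmark script. The "acme-privacy-" prefix, the "hide-profile" control and both page fragments are assumptions that stand in for the company-specific names described above.

```python
from html.parser import HTMLParser

# Hypothetical "<company name>-privacy-" prefix shared by every privacy control.
PRIVACY_PREFIX = "acme-privacy-"


class PrivacyControlFinder(HTMLParser):
    """Lists every element that carries a class starting with the privacy prefix."""

    def __init__(self):
        super().__init__()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name.startswith(PRIVACY_PREFIX):
                # Keep the tag and the part of the class that names the control.
                self.controls.append((tag, name[len(PRIVACY_PREFIX):]))


def privacy_controls(html):
    finder = PrivacyControlFinder()
    finder.feed(html)
    finder.close()
    return finder.controls


# Hypothetical news feed page with the shortcut button.
NEWS_FEED = """
<section class="feed">
  <article>A post</article>
  <button class="shortcut acme-privacy-always-anonymous">Always post as anonymous</button>
</section>
"""

# Hypothetical settings page: same control, plus one more in the privacy group.
SETTINGS = """
<form>
  <fieldset><legend>Privacy</legend>
    <button class="acme-privacy-always-anonymous">Post anonymously</button>
    <button class="acme-privacy-hide-profile">Hide my profile</button>
  </fieldset>
</form>
"""

print(privacy_controls(NEWS_FEED))
print(privacy_controls(SETTINGS))
```

Adding the "hide-profile" button to the settings page does not affect a client that only looks for "always-anonymous". That client finds the same control on both pages.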
Besides, you can release those attributes as a Public API. You can deploy everything to production so that third-party clients can effortlessly write the same kind of scripts to read the website’s workflow as a non-visual machine, not as a visual user.
You can enhance the website’s HTML to provide attributes that any script can use, including third-party consumers.
These principles are not just for HTML. You can do the same thing with raw JSON.
- You return a bunch of “link” properties that can point to other websites or pages.
- You design endpoints which can return custom JSON to represent your website’s data and write a test script for it.
However, just because you can doesn’t mean you should.
Imagine you’re writing code against a raw JSON structure that is specific to every page. There’s a high degree of coupling between the code you write and the message written in JSON. Ideally, you would write code to traverse the message structure and have a standard, or a language, for how to identify the attributes. That carries a development overhead of its own.
If the server returns HTML and the client code uses a function such as “query selector” to look for specific attributes identifying the elements, the server can change the whole structure of the HTML without breaking the clients. You also avoid the overhead of writing code to traverse the message; there's a standard language right there.
By default, the structure of the HTML is not specific to the website. You can add, rename or remove tags and the browser will render the page differently, but it won't break. The “query selector” function understands the HTML specification. As long as the attributes are on the right elements, the clients won’t break either. If a requirement drives you to rename an attribute, treat that as a new element for backward compatibility.
The code which retrieves the latitude from a message written in a JSON structure specific to the website breaks when the server changes the structure. That doesn’t happen if you use HTML and "query selector," as you can see in the examples below:
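A minimal sketch of that comparison in Python, assuming two hypothetical versions of each message. The JSON payloads, the HTML fragments, the "sydney latitude" class names and the select_text helper (a stand-in for "query selector", which Python's standard library does not provide) are all illustrative.

```python
import json
from html.parser import HTMLParser


class TextByClass(HTMLParser):
    """Collects the text of elements that carry all the given class names.

    Assumes the matching elements contain only text, no nested tags.
    """

    def __init__(self, *class_names):
        super().__init__()
        self.wanted = set(class_names)
        self.capturing = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted <= set(classes):
            self.capturing = True
            self.texts.append("")

    def handle_data(self, data):
        if self.capturing:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        self.capturing = False


def select_text(html, *class_names):
    parser = TextByClass(*class_names)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Two versions of a website-specific JSON message (hypothetical).
JSON_V1 = '{"sydney": {"latitude": "-33.87"}}'
JSON_V2 = '{"cities": [{"name": "sydney", "position": {"latitude": "-33.87"}}]}'

# Two versions of the equivalent HTML message (hypothetical).
HTML_V1 = '<div><span class="sydney latitude">-33.87</span></div>'
HTML_V2 = '<table><tr><th>Sydney</th><td class="geo sydney latitude">-33.87</td></tr></table>'


def latitude_from_json(payload):
    # The path through the structure is hard-coded into the client.
    return json.loads(payload)["sydney"]["latitude"]


def latitudes_from_html(page):
    # The client only names the identifiers, not the path to them.
    return select_text(page, "sydney", "latitude")


try:
    latitude_from_json(JSON_V2)
    json_client_survived = True
except KeyError:
    json_client_survived = False

print("JSON client survived the restructure:", json_client_survived)
print("HTML client, before and after:", latitudes_from_html(HTML_V1), latitudes_from_html(HTML_V2))
```

The JSON client fails as soon as the path to the value changes. The HTML client keeps working because it never encoded a path in the first place.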
The HTML examples store the value of the latitude in an array. Say you have many latitudes for Sydney at different places, you can add a new "sydney latitude" element and the code still works:
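A sketch of that case in Python, with the same caveats: the page fragment, the coordinate values and the select_text helper are assumptions, and the helper only approximates what “query selector all” does in a browser.

```python
from html.parser import HTMLParser


class TextByClass(HTMLParser):
    """Collects the text of elements that carry all the given class names.

    Assumes the matching elements contain only text, no nested tags.
    """

    def __init__(self, *class_names):
        super().__init__()
        self.wanted = set(class_names)
        self.capturing = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted <= set(classes):
            self.capturing = True
            self.texts.append("")

    def handle_data(self, data):
        if self.capturing:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        self.capturing = False


def select_text(html, *class_names):
    parser = TextByClass(*class_names)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Hypothetical page listing two latitudes for Sydney and one longitude.
PAGE = """
<ul>
  <li class="sydney latitude">-33.87</li>
  <li class="sydney latitude">-33.86</li>
  <li class="sydney longitude">151.21</li>
</ul>
"""

# The client always receives a list, whether the page has one match or many.
latitudes = [float(text) for text in select_text(PAGE, "sydney", "latitude")]
print(latitudes)
```

Because the lookup always returns a list, a client written for one latitude handles two without any change.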
The “query selector all” function understands "text/html". If the server changes the structure of the message, the code still works. With a hypermedia specification like this, you can rely on code somebody else wrote, like "query selector," and trust that it keeps working.
If the server returns raw JSON as a response, clients have to write code that is tightly coupled to the structure. Any significant change to the structure breaks the clients.
If you want to use JSON and create a robust integration between two computers using HTTP for many different purposes, you need to use a standard hypermedia specification that can tell you where servers should put identifiers and where clients should look for them. You need to choose a hypermedia specification like Jasonette, Mason, Collection+JSON, HAL or JSON API. You need a language.
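As an illustration of what such a language buys you, here is a short Python sketch that reads a HAL document. HAL reserves the "_links" property for link relations, so a client knows where to look without knowing anything else about the payload. The order resource, its URLs and the link_href helper are hypothetical.

```python
import json

# Hypothetical HAL document. Only "_links" and "href" come from the HAL format;
# the relation names and the other properties are made up for this sketch.
ORDER = json.loads("""
{
  "_links": {
    "self": {"href": "/orders/523"},
    "payment": {"href": "/orders/523/payment"}
  },
  "total": 30.0,
  "status": "awaiting-payment"
}
""")


def link_href(resource, rel):
    """Returns the href for a link relation, or None if the server does not offer it."""
    link = resource.get("_links", {}).get(rel)
    # HAL allows a relation to hold a single link object or a list of them.
    if isinstance(link, list):
        link = link[0] if link else None
    return link["href"] if link else None


print(link_href(ORDER, "payment"))
print(link_href(ORDER, "refund"))
```

The server can add, move or rename the other properties of the resource, and a client that only follows link relations is not affected.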
If the server uses no specification at all, you have to write specific code to interpret the message only for that website.
That creates coupling and a considerable cost of development.
The most common workaround for a fragile API without a language — where every small change can break clients — is to start sending versions in URLs or headers instead of versioning the message or the clients. Given there’s no specification and every change can break stuff, you need to add versions and increase the maintenance cost of the website.
Roy Fielding has had many rants about it in the past:
If you look at how the browser codes against HTML, you’ll see it has already solved the problem of API versioning. You can change the structure of the website and the page won’t break, so there's no need for website versioning. As long as the HTML syntax matches the DOCTYPE and the browser supports that element or attribute, you can render something correctly without breaking anything.
HTML is an API to the browser. Its specification gives you, for free, many of the benefits that an API built with JSON without a hypermedia specification lacks.
HTML is not just some magical thing to render a website. The server writes in that language to create an API the browser can consume and provide value to visual clients: the humans. It can also be enriched by the server to provide metadata for non-visual clients: the machines.
server -> HTML -> browser -> human brain
server -> HTML metadata -> browser -> machine
You can write HTML in ways that can allow other kinds of clients to understand what the server is trying to say:
- Follow links to other pages or websites.
- Write test automation scripts that don’t need to change if the elements of the page change.
- Write bookmark scripts that can highlight certain elements.
- Expose a Public API for third-party consumers to understand the website's workflow.
Given there’s a significant cost to adopt the right hypermedia specification for JSON, a sensible default is to enhance the website’s HTML and leverage a battle-tested specification that is already there.
HTML is not only for rendering. It’s not only for the browser.
HTML is a great message format that can be enriched to serve as a foundation to build great APIs and expose information about the website’s workflow without significant effort and with a high return on investment. In this specific circumstance, you may not need JSON at all.
It’s time to challenge your assumptions.
Stop trying to fit a square wheel into a standard car.
Also, stop reinventing the wheel, unless you can clearly show how the new wheel can be a better fit.