December 31, 2023

RDF and HTTPS

HTTPS is becoming the new norm on the Internet, but in some places HTTP is still the "default". RDF is one of them ‒ a vocabulary term is identified by one particular URI, so even a change as minor as adding one "s" to the URI scheme makes it identify a completely distinct resource. For this reason, RDF URIs (and similar things like XML namespaces) will never change to https. However, this is not such a big deal, for at least two reasons:

  • URIs in RDF are not "meant" for loading data. They should be dereferenceable, to make it possible to find out more about the resource from the URI itself, but there is no obligation. Especially when talking about vocabulary terms, there are not that many automatic and security-critical processes that would request descriptions of every encountered vocabulary term, and arguably there shouldn't be ‒ if it is so important to load the vocabulary from a trustworthy source, you should not download it from the URI in the first place, since it should already be on your disk and verified manually. HTTPS is not going to save you from network outages, random errors, or hacked web servers, and the worst thing a potential attacker could cause through HTTP anyway is to break your inference.
  • Browsers have already started treating http in URIs as less binding: they may switch to HTTPS under various conditions, and they send the Upgrade-Insecure-Requests header so that the server can direct you to HTTPS as quickly as possible. With HSTS (HTTP Strict Transport Security), a site asks the browser to never use plain HTTP for the given domain, and even if an http address is entered manually, the browser will switch to HTTPS (with no bypassing). It is also possible to preload a domain into a list that is used by all modern browsers, so that the first HTTP-only request never has to take place.
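
To make the HSTS part more concrete, here is a minimal Python sketch of how a client might read the Strict-Transport-Security response header. The name parse_hsts and the shape of the returned dictionary are made up for illustration, and real browsers apply more rules than this (for example, they ignore the header when it arrives over plain HTTP).

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a small dict.

    Hypothetical helper for illustration only. Returns None when max-age
    is missing or malformed, since the policy is unusable without it.
    """
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()          # directive names are case-insensitive
        value = value.strip().strip('"')     # max-age may be a quoted string
        if name == "max-age" and value.isascii() and value.isdigit():
            policy["max_age"] = int(value)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy if policy["max_age"] is not None else None
```

A loader that remembers the result for max_age seconds knows it can go straight to HTTPS for that host next time.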

In case you actually are making an automated RDF vocabulary loader, how to interpret http in URIs becomes a somewhat important question. There are a few possibilities:

  • Do nothing special. If there are no particular requirements, I don't think there is anything wrong with upholding the protocol indicated by the URI.
  • Many web servers always auto-redirect to HTTPS, disregarding Upgrade-Insecure-Requests. Arguably, if security is so important for these servers that they ignore the user's preference, they should also send Strict-Transport-Security to make this redirect automatic. If you cache this information, you can feel safe treating http as https the next time you make a request (but only for the connection, don't rewrite it in RDF!).
  • Due to the forced redirect most websites do, it may be a worthwhile optimization to use HTTPS automatically and revert to HTTP only in case of issues (can't connect or no actual RDF description found).
  • The HSTS preload list is available to download, and you can use it to skip the first round of HTTPS redirecting altogether. There are a few nuisances though ‒ the list is huge but stored as JSON (and the download is base64-encoded), which is not such a great streaming format, and there are also comments. Its location has also been needlessly changed at least once, so better be on guard for surprises.
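
The last option could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the function names are hypothetical, comments are assumed to be whole lines starting with //, and entries are assumed to look like {"name": ..., "mode": "force-https", "include_subdomains": ...}. The sketch also loads the whole list into memory rather than streaming it, and it leaves URIs with an explicit port alone.

```python
import base64
import json
from urllib.parse import urlsplit


def load_preload_hosts(base64_text):
    """Decode the downloaded list and return {host: include_subdomains}.

    Assumed format (not verified here): JSON with whole-line // comments
    and an "entries" array of objects with "name", "mode" and
    "include_subdomains" keys.
    """
    raw = base64.b64decode(base64_text).decode("utf-8")
    # JSON has no comment syntax, so drop comment lines before parsing.
    body = "\n".join(
        line for line in raw.splitlines() if not line.lstrip().startswith("//")
    )
    hosts = {}
    for entry in json.loads(body).get("entries", []):
        if entry.get("mode") == "force-https":
            hosts[entry["name"].lower()] = bool(entry.get("include_subdomains"))
    return hosts


def is_preloaded(host, hosts):
    """True if the host itself, or a parent covering subdomains, is listed."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in hosts and (i == 0 or hosts[candidate]):
            return True
    return False


def connection_uri(term_uri, hosts):
    """Return the URI to connect to; the RDF term itself is never rewritten."""
    parts = urlsplit(term_uri)
    if (
        parts.scheme == "http"
        and parts.hostname
        and parts.port is None  # an explicit port is left untouched
        and is_preloaded(parts.hostname, hosts)
    ):
        return parts._replace(scheme="https").geturl()
    return term_uri
```

The important design point is that connection_uri only decides where to connect: the term stored in the RDF graph keeps its original http URI.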

I decided to explore the last point and assess how exactly this would help the vocabularies found on the web. The results were not great...

June 12, 2023

JSON-LD is the new XML

It's been a pastime of mine to compare XML and JSON, as a mature, expressive and self-descriptive format against one that is ambiguous and unextensible. As many of you should realize by now, this comparison is actually quite meaningless, because the two formats have a vastly different focus and primary area of use. XML is focused on documents (apparent for example in situations where formatting may be relevant), while JSON is focused on representing commonly-used structures in programming languages (well, only those in JavaScript) and nothing more and nothing less. There is however one area of use where the two formats overlap: describing entities or objects of various kinds, linked together using properties. In other words, linked data.

March 13, 2023

Converting between XML and JSON

There are hundreds of things I could be doing now, so... let's think of a way to convert from JSON to XML and vice-versa!

Why? Because why not? JSON seems to be becoming quite a prevalent format for storing and transmitting data, despite its lack of mature tooling, visualization, comments or extension mechanisms, so it might be advantageous to work with XML which, despite being constructed from JSON, would be reasonably "natural". Let's get started!

November 28, 2022

Mapping .NET exceptions to XPath Errors

Because why not?

The namespace http://www.w3.org/2005/xqt-errors# (shortened to err:, also described here and here) hosts a number of error codes relevant in the context of XPath functions, XSLT or XQuery. While these errors are usually useful only to XML processors, there is really no reason not to use them when describing errors in general. Imagine you are trying to use RDF to describe the result of a process, or perhaps to monitor a single function in a program. The function may fail, end in an exception or otherwise not produce the desired effect, in which case it is useful to have a standardized identification for the cause of the error. Of course the original exception would be more useful to the people actually fixing the error, but this should be "language-agnostic", potentially translatable to any other language which uses a similar concept of exceptions or errors.

I decided to browse the error codes listed in the namespace, and tried to match them with exception types in .NET. There are not that many of them that align, but the useful ones are still representable. Coupled with properties like err:code, err:description, err:value and err:line-number, these may be useful when describing arbitrary program errors in RDF.
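
To give a rough idea of what such a mapping might look like, here is a small Python sketch. The three pairs are illustrative guesses of mine based on the descriptions of the error codes, not the mapping worked out in the post, and the names DOTNET_TO_XPATH_ERROR and error_uri are hypothetical.

```python
# Namespace as written in the text above (with the trailing #).
ERR_NS = "http://www.w3.org/2005/xqt-errors#"

# Illustrative pairs only (an assumption, not the post's actual table).
DOTNET_TO_XPATH_ERROR = {
    "System.DivideByZeroException": "FOAR0001",  # Division by zero
    "System.OverflowException": "FOAR0002",      # Numeric operation overflow/underflow
    "System.InvalidCastException": "FORG0001",   # Invalid value for cast/constructor
}

FALLBACK_CODE = "FOER0000"  # Unidentified error


def error_uri(exception_type_name):
    """Return an err: URI for a .NET exception type name, with a generic fallback."""
    code = DOTNET_TO_XPATH_ERROR.get(exception_type_name, FALLBACK_CODE)
    return ERR_NS + code
```

Anything without an obvious counterpart falls back to the generic "unidentified error" code, so every exception still gets some identifier.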

July 26, 2022

Thought on the Future of Computers

I am going to make a prediction about the future of computing, not about the distant future but about what might happen within a few decades. It's also not a particularly creative prediction, because the process that might eventually lead to it has already started. Yet it's interesting to imagine, so here it is:

June 14, 2022

RDF-star and prototypes

Updating the RDF specification to create a new version is not an easy task. While getting new features added in is (relatively) not that hard, the cloud of related specifications, such as SPARQL, SHACL or OWL, is pretty large, and the whole ecosystem of applications and databases made to work with it is even larger. As such, getting new features actually out in the field is definitely not trivial, and some of the issues that may arise as part of that are still apparent today ‒ RDF 1.1 came out in 2014, while the latest version of SPARQL is still from 2013, and as a result, plain literals in SPARQL are still not quite exactly the same as string literals, even though RDF 1.1 doesn't make the distinction anymore.

A different approach is to develop a new version sideways, i.e. create a variant of RDF just with your own favourite new feature. This is similar to how XML is defined, as a set of specifications whereby the core XML specification, to the surprise of many, doesn't actually define anything related to namespaces, xml:base or datatypes. The folks at RDF did a similar thing and created RDF-star, which is why this flew under my radar for so long.