IS4 Code Blog

April 29, 2021

Why you should never unescape a URI

The street address of the Internet, the URI, is an interesting piece of syntax. Its specification is longer than those of some data languages, and together with all the existing schemes, it would take a couple of books to describe it all. The simplified form of an (absolute) URI is this:

scheme://host/path?query#fragment

Virtually all portions except the scheme can be omitted including their surrounding syntax, making the format very flexible and usable for a multitude of cases. Yet there are some cases where some of the design decisions cause troubles.

URI scheme for identifying linked data entities by their identifier

It is a good practice in RDF to identify entities with a URI scheme that may allow one to eventually arrive at a description of the identified entity. However, the individual URI patterns aren't semantical, and there isn't a uniform standard that would allow identifying an entity within a particular dataset. This is an attempt to devise one. The resulting URI should behave like a "link" that connects to the dataset and finds the actual URIs used for the entity. In theory, it could even be used directly to identify the entity itself.

Identifying resources works by traversing inverse functional properties. This kind of a property (an instance of owl:InverseFunctionalProperty) behaves like a function from its range to its domain. Thus a particular value assigned to the property serves as its primary key which, when used with the property, uniquely identifies the resource.

More about datatypes in RDF

Datatypes in RDF are arguably the most confusing and also underutilized tool in RDF. To understand datatypes, we first need to understand literals.

Literals

While RDF and XML are very different in many aspects, they share some core concepts. XML at its core doesn't really have standard datatypes like you'd find in many programming languages and other data languages like JSON. Instead, you have text (or character data if you will) which might get a specific meaning via other facilities, but to an external observer, everything is only text (or whitespace if you want to go into details).

RDF is very similar to XML in that a literal is simply a piece of (character) data. Unlike XML however, it is also possible to assign a datatype to the literal. The notion of a plain (untyped) literal was changed somewhat in RDF 1.1, making xsd:string the implicit datatype (more on that later). Specific serialization formats may define syntax for other common typed literals, such as numbers or booleans, but all of them are still backed by text. Thus a literal is simply a piece of text, optionally with something that identifies its datatype (we forget language-tagged literals for now).

Recommendations for producing linked data

While working with RDF and linked data, I have tried to come up with some tips that, in my opinion, make publishing RDF data better in a wide range of qualities. I shall call them IS4 Recommendations and list them in this article (which will be updated in case of new ones):

Dynamic Self-contained Infinite XML

Imagine a window showing some sort of a feed. There are multiple panels displaying various messages as they arrive, plus some boxes that show statuses of other things, or the time, for example. This is a complex system usually thought of as calls to different APIs, various types of messages, presentation layer etc. What if I told you however that such a feat could be done even with only standardized formats, no client-side scripting and a single XML document and HTTP request?

Null in RDF

There is nothing like null in RDF, but sometimes it is necessary to express its meaning in RDF documents as well. The issue with null however is that its semantics can vary from use to use, and thus one has to think about the intended meaning before going for one of the alternatives, as it may negatively affect the consistency of the data when an incorrect representation is selected.

Let's look at some examples.

A completely meaningless comparison of XML and JSON

I have never had anything against XML. Despite its many quirks and burdens, I think it is even nowadays a perfectly reasonable format for providing or accepting structured data, and I think it is meaningless (albeit trendy) to compare it with JSON trying to determine whichever is better. It is obvious those two languages were designed for different purposes, and the real virtue in data engineering (or software engineering in general) is to know and use the proper tools for completing a task.

In that regard, no language should come out as the winner in this comparison, but it should give you an idea when to use one or the other.

IS4 Code Blog

April 29, 2021

Why you should never unescape a URI

March 14, 2021

URI scheme for identifying linked data entities by their identifier

March 13, 2021

More about datatypes in RDF

Literals

February 27, 2021

Recommendations for producing linked data

February 22, 2021

Dynamic Self-contained Infinite XML

February 20, 2021

Null in RDF

February 15, 2021

A completely meaningless comparison of XML and JSON