June 14, 2022

RDF-star and prototypes

Updating the RDF specification to create a new version is not an easy task. While getting new features added in is (relatively) not that hard, the cloud of related specifications, such as SPARQL, SHACL or OWL, is pretty large, and the whole ecosystem of applications and databases made to work with it is even larger. As such, getting new features actually out in the field is definitely not trivial, and some of the issues that may arise as part of that are still apparent today ‒ RDF 1.1 came out in 2014, while the latest version of SPARQL is still from 2013, and as a result, plain literals in SPARQL are still not quite exactly the same as string literals, even though RDF 1.1 doesn't make the distinction anymore.

A different approach is to develop a new version sideways, i.e. create a variant of RDF with just your own favourite new feature. This is similar to how XML is defined: as a set of specifications whereby the core XML specification, to the surprise of many, doesn't actually define anything related to namespaces, xml:base or datatypes. The RDF community did a similar thing and created RDF-star, which is why this flew under my radar for so long.

The stars of the show are metastatements, i.e. statements about other statements. If you know RDF intimately, you are probably shouting "reification!" right now, and you are quite right: reification allows you to express precisely the same set of statements that RDF-star can, so what is the advantage?

In Turtle, under RDF-star, you might write a statement like this:

<< _:a :name "Alice" >> :statedBy :bob .

I don't think an explanation is needed, considering what we already know about the purpose of RDF-star. You might be inclined to think that you could write the same in, say, RDF/XML:

<rdf:Description rdf:about="http://example.org/alice">
  <name rdf:ID="alice-name">Alice</name>
</rdf:Description>
<rdf:Description rdf:about="#alice-name">
  <statedBy rdf:resource="http://example.org/bob"/>
</rdf:Description>

Nope! There is a fundamental difference in the semantics, but to explain, we first have to talk about prototypes.

You might know prototypes from languages like JavaScript, where they are the default way to implement object-oriented programming. Actually, JavaScript is the only language I know that does it this way, but that's true for many things anyway. In any case, an object in JavaScript is a specialization of a prototype object, adding additional properties or overriding the old ones. As a result, there is really no difference between creating an instance of a type and subtyping.
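For readers who haven't met prototypes before, here is a minimal sketch in TypeScript (JavaScript with type annotations). The objects animal, dog and rex are made up purely for illustration; Object.create is the standard built-in that makes a new object with a given prototype.

```typescript
// A prototype object: an ordinary object that others can specialize.
const animal = {
  name: "generic animal",
  sound: "...",
  describe(): string {
    return `${this.name} says ${this.sound}`;
  },
};

// "Subtyping": dog specializes animal by overriding one property.
const dog: typeof animal = Object.create(animal);
dog.sound = "woof";

// "Instantiating": rex specializes dog in exactly the same way.
const rex: typeof animal = Object.create(dog);
rex.name = "Rex";

console.log(rex.describe());                                    // Rex says woof
console.log(Object.getPrototypeOf(rex) === dog);                // true
console.log(Object.prototype.hasOwnProperty.call(rex, "sound")); // false: inherited from dog
```

Note that the same call, Object.create, is used both for what a class-based language would call deriving a subtype and for what it would call creating an instance.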

In all fairness, I actually quite like this system, and I think it is quite elegant and much better suited for a dynamic language than the traditional class-based system, while it supports most if not all of its features. However, I am not mentioning it here to adore it, but as an example of the difference between RDF-star and reification.

Getting back to the example, the first glaring difference is obviously that the RDF/XML snippet actually also asserts the statement. Automatic reification unfortunately isn't possible in RDF/XML without asserting the triple, so you would have to use a separate document in order to express some doubt about the reified statement, for example.

The actual difference I am pointing out, however, is that #alice-name is the actual, concrete triple in the document that is asserting it. At least in this syntax, you can form a link between a part of the serialization and a URI node, which means you could also potentially make tools that add statements to or remove them from an RDF/XML document based on their ID. Note that other RDF syntaxes do not have this feature, even with RDF-star: you may assert a triple and you may identify a reified statement (and Turtle in RDF-star makes both possible), but you cannot link a particular assertion of such a triple in the actual document or graph to a reified one.

In contrast, triple nodes in RDF-star are prototypes, the Platonic ideals of triples, which may be made concrete in specific RDF graphs. In that regard, the triple in RDF/XML is more specific than the one in the RDF-star example; while both have the same constituents, only the one in RDF/XML can be differentiated from every other occurrence of an equivalent triple.
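One way to picture this distinction is the following TypeScript sketch. It is only an analogy, not the API of any actual RDF library: the Triple interface, the ideal and occurrence helpers, and the doc1.rdf / doc2.rdf identifiers are all hypothetical. The "ideal" triple is looked up by its constituents, so equal constituents always yield the same object, while each occurrence is a fresh object that uses the ideal as its prototype.

```typescript
// Hypothetical model for illustration; not taken from any RDF library.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// One shared "ideal" per combination of constituents, like an RDF-star triple node.
const ideals = new Map<string, Triple>();

function ideal(subject: string, predicate: string, object: string): Triple {
  const key = JSON.stringify([subject, predicate, object]);
  let triple = ideals.get(key);
  if (triple === undefined) {
    triple = { subject, predicate, object };
    ideals.set(key, triple);
  }
  return triple;
}

// A concrete occurrence: a fresh object whose prototype is the ideal,
// plus whatever extra properties describe this particular occurrence.
function occurrence(
  proto: Triple,
  extra: Record<string, string>,
): Triple & Record<string, string> {
  return Object.assign(Object.create(proto), extra);
}

const a = ideal("_:a", ":name", '"Alice"');
const b = ideal("_:a", ":name", '"Alice"');
console.log(a === b); // true: same constituents, same triple node

const first = occurrence(a, { source: "doc1.rdf#alice-name" });
const second = occurrence(a, { source: "doc2.rdf#alice-name" });
console.log(first === second);                   // false: two distinct occurrences
console.log(first.subject === second.subject);   // true: both inherit from the ideal
console.log(Object.getPrototypeOf(first) === a); // true
```

The occurrences carry their own identity and their own properties, yet their subject, predicate and object are never stored on them; they come from the shared ideal through the prototype chain.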

In my mind, I imagine the following to be true for any triple <A> <B> <C>:

<< <A> <B> <C> >> # the Platonic ideal of this triple
  a rdf:Statement ;
  rdf:subject <A> ;
  rdf:predicate <B> ;
  rdf:object <C> .

[ # any occurrence of the triple
  a rdf:Statement ;
  rdf:subject <A> ;
  rdf:predicate <B> ;
  rdf:object <C>
  # any other permitted property
] skos:broader << <A> <B> <C> >> .

I use SKOS to represent the prototype link between the actual instance of the triple and its ideal. I think it is a good match for the type of this relation, as SKOS doesn't really care about classes or instances; it only has concepts. You could consider the RDF-star triple to actually be a class and the concrete rdf:Statement to be an instance of it, but that is cumbersome. You don't really need higher-order set theory just to narrow down a concept to one particular occurrence of it.

A similar situation, just this time in reverse, happened as a result of the effort to specify the direction of language-tagged strings. One of the solutions was to add an rdf:CompoundLiteral, making it possible to specify not just the value, language, and direction of a plain text value, but in essence anything that might be worth specifying. In doing so, however, the literal receives its own identity. It is no longer just a simple member of rdf:langString's value space, but an instance of such a member, one that has its own distinguishing properties. Again, I would imagine this to hold true for every literal:

"text"@en # invalid Turtle
  a rdf:CompoundLiteral ;
  rdf:value "text" ;
  rdf:language "en" .

[ # any occurrence of the literal
  a rdf:CompoundLiteral ;
  rdf:value "text" ;
  rdf:language "en"
  # any other permitted property
] skos:broader "text"@en .

Once again it is possible to add any property to the "literal", and it is no longer identical to any other occurrence of the literal; it is your literal.

