February 20, 2021

Null in RDF

RDF has no built-in notion of null, but sometimes its meaning has to be expressed in RDF documents anyway. The issue with null, however, is that its semantics vary from use to use. One therefore has to think about the intended meaning before picking one of the alternatives, because an incorrect representation may compromise the consistency of the data.

Let's look at some examples.

No information

Null is commonly used to signal a lack of information about the value of a particular property. RDF makes it very easy to express that there is nothing to be said about a property: simply don't write anything. If data validation permits it, such a graph cannot be used to infer anything about the value of said property, which is precisely what we wanted.

Some but unknown value

Plain RDF also supports the case when we know that there is some value, but we cannot say what it is. This is the perfect use case for blank nodes:

ex:Anna ex:brother [] .

We do know that Anna has a brother, but we cannot identify him. This is also useful when we want to express values that are unknown but shared, such as when several people have the same (but still unknown) brother.
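
As a small sketch of the shared case, a labelled blank node can be reused within a single document. Here ex:Bob is a hypothetical second sibling added only for illustration:

ex:Anna ex:brother _:b1 .
ex:Bob ex:brother _:b1 .   # the same unknown brother as Anna's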

In cases when we know that some value must exist, a blank node may pass validation where the absence of any value would not.
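
Which validation language is used does not matter much here. Assuming SHACL, a shape along the following lines would require at least one ex:brother value, and a blank node counts as one. The shape name ex:PersonShape and the ex: namespace are made up for this sketch:

@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .   # assumed namespace

ex:PersonShape
  a sh:NodeShape ;
  sh:targetNode ex:Anna ;
  sh:property [
    sh:path ex:brother ;
    sh:minCount 1        # at least one value, known or not
  ] .

ex:Anna ex:brother [] .  # conforms; without this triple the shape is violated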

It is also possible to use OWL to express a cardinality restriction on a property. This way we can infer that Anna must have a father, even though we don't mention him anywhere:

ex:Person rdfs:subClassOf [
  a owl:Restriction ;
  owl:onProperty ex:father ;
  owl:cardinality 1
] .
ex:Anna a ex:Person .

Here we specify that every person is something that has exactly one father, so it must hold for Anna as well.
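
To make the inference concrete: under OWL entailment, the graph above should entail the following triple even though it is never stated. This is a sketch of the expected conclusion, not output from any particular reasoner:

ex:Anna ex:father [] .   # some father exists, identity unknown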

Null in source

When our data comes from another format, we may not know what a null in it really means, but we may still want to preserve it. With some care, this is possible as well.

Some languages use null synonymously with an empty sequence or tuple, in which case rdf:nil (the empty list) is a good fit. For text-based formats such as XML or CSV, an empty string might be sufficient as well.
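
Both options might look like this in Turtle. The properties ex:children and ex:middleName are hypothetical and serve only as illustration:

ex:Anna ex:children () .       # () is Turtle shorthand for rdf:nil
ex:Anna ex:middleName "" .     # empty string, e.g. from an empty CSV cell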

When this is not applicable, one has to use something less common, like a custom URI or datatype, to express the null value. I wasn't able to find anything recommended by the W3C, but something like "null"^^<tag:yaml.org,2002:null>, taken from YAML, seems perfectly applicable to me.
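
Used in a triple, such a literal could look as follows (ex:middleName is again a hypothetical property). A processor that does not recognize the datatype will most likely treat the literal as an opaque value:

ex:Anna ex:middleName "null"^^<tag:yaml.org,2002:null> .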

In any case, one has to be careful when using a special value like this, as it can make the graph inconsistent under some entailment regimes. This includes every situation where the domain or range of a property is restricted, because using null there will (by definition) put it in the corresponding class. While null does act like a bottom type in some type systems, in RDF the vocabulary has to permit this explicitly. For illustration, the classes for SKOS concepts and SKOS concept schemes are disjoint, so using the same null in a place that expects a concept and in a place that expects a concept scheme will put it in both classes, making the graph inconsistent (null cannot be both a concept and a concept scheme).
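
The SKOS case can be sketched like this, with ex:null standing in for a hypothetical shared null resource and ex:cats for an arbitrary concept:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/> .   # assumed namespace

ex:cats skos:broader ex:null ;    # range (via skos:semanticRelation): ex:null a skos:Concept
        skos:inScheme ex:null .   # range: ex:null a skos:ConceptScheme

Since SKOS declares skos:Concept and skos:ConceptScheme disjoint, a reasoner that applies both the range axioms and the disjointness should report an inconsistency.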

A contradiction

Sometimes a null value means an error or an issue of some sort, in which case such an inconsistency can be encoded in a graph as well:

process:1 ex:result "\u0000" .

In RDF 1.1, this literal has the datatype xsd:string, and under XML Schema the null character is not part of that datatype's lexical space (it cannot be represented in XML text). Under an entailment regime that recognizes xsd:string, such as RDFS 1.1 entailment with the XML Schema datatypes, the literal is therefore ill-typed. It is to be interpreted as something that should not exist, so the graph is automatically inconsistent. Beware that the use of such graphs may lead to issues when transferring data between different formats.

Not applicable

Specific combinations of properties may render other properties inapplicable, in which case specifying values for them should be considered an error. The ontology in use should describe this by making the respective classes disjoint.
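
One possible way to model this, with all names (ex:spouse, ex:Married, ex:Single, ex:Bob) invented for the sketch:

ex:spouse rdfs:domain ex:Married .        # anyone with a spouse is Married
ex:Single owl:disjointWith ex:Married .   # nobody is both

ex:Anna a ex:Single .
# Adding the following triple would make the graph inconsistent under OWL entailment:
# ex:Anna ex:spouse ex:Bob .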

No value

Perhaps the most interesting case is when there is simply no value for the property and we know it. It would be an error to specify one (even a representation of null) in this case, as simple queries may consider it a valid existing value. Instead, it's better to introduce an auxiliary property that carries the information about its absence:

ex:Anakin ex:hasFather false .

OWL can again be used to make sure that the values of ex:hasFather and ex:father are logically linked:

[
  a owl:Restriction ;
  owl:onProperty ex:hasFather ;
  owl:hasValue false
] owl:equivalentClass [
  a owl:Restriction ;
  owl:onProperty ex:father ;
  owl:cardinality 0
] .

[
  a owl:Restriction ;
  owl:onProperty ex:hasFather ;
  owl:hasValue true
] owl:equivalentClass [
  a owl:Restriction ;
  owl:onProperty ex:father ;
  owl:minCardinality 1
] .
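
With these axioms in place, an OWL reasoner should be able to keep the two properties in sync. A sketch of the expected behaviour, where ex:Luke and ex:Someone are hypothetical resources:

ex:Luke ex:father ex:Anakin .        # entails: ex:Luke ex:hasFather true
ex:Anakin ex:hasFather false .
# ex:Anakin ex:father ex:Someone .   # would contradict cardinality 0: inconsistent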
