March 13, 2023

Converting between XML and JSON

There are hundreds of things I could be doing now, so... let's think of a way to convert from JSON to XML and vice-versa!

Why? Because why not? JSON seems to be becoming quite a prevalent format for storing and transmitting data, despite its lack of mature tooling, visualization, comments or extension mechanisms, so it might be advantageous to work with XML which, despite being constructed from JSON, would be reasonably "natural". Let's get started!

Limitations

The system that is going to be designed here will not be perfect, and will absolutely be ambiguous. XML and JSON differ in what their concept of "data" means, and as a result, not all JSON documents might be convertible to XML in a nice way.

Non-object JSON documents

As a first consequence, the JSON document has to be a JSON object to be converted, not an array or a value, since XML requires you to have a root element. The object has to be non-empty, since otherwise there is nothing to convert.
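A minimal sketch of that check, assuming the document arrives as a string. The name load_root is made up for illustration:

```python
import json


def load_root(text):
    """Parse JSON text and reject roots that cannot become an XML root element."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        # Arrays and bare values have no name to give the root element.
        raise ValueError("the document root must be a JSON object")
    if not doc:
        raise ValueError("an empty object has nothing to convert")
    return doc
```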

Primary properties

Every JSON object should have a property that is the most important one for some definition of important, such as the most complex, the longest, or the most varied. This property should always be present, and may have any type except object. The presence of this property implies the use of an XML element with the same name, so if a property "prop" is primary, the resulting element created from the object would be <prop>. You can already see here that a property whose name is not a valid XML name cannot be directly converted to an element.

The content of the element is taken from the value of the property. If the value is null, xsi:nil="true" could be used to indicate that, and xsi:type may also be used to specify the proper type of the value. If the property contains an array, all objects within may be encoded similarly to the root object, while values could be encoded like other properties. Nested arrays cannot be encoded in this way.
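Here is a rough sketch of the scalar and null cases in Python, using the standard xml.etree.ElementTree module. The function name primary_to_element is hypothetical, the caller is assumed to already know which property is primary, and writing numbers and booleans in their JSON spelling is my own choice. Arrays and xsi:type are left out to keep it short.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Ask ElementTree to serialize this namespace with the usual "xsi" prefix.
ET.register_namespace("xsi", XSI)


def primary_to_element(obj, primary):
    """Create the element named after the primary property of obj."""
    value = obj[primary]
    if isinstance(value, dict):
        raise ValueError("the primary property may not hold an object")
    if isinstance(value, list):
        raise NotImplementedError("arrays are left out of this sketch")
    elem = ET.Element(primary)
    if value is None:
        # null becomes xsi:nil="true" with no text content.
        elem.set(f"{{{XSI}}}nil", "true")
    elif isinstance(value, str):
        elem.text = value
    else:
        # Assumption: numbers and booleans keep their JSON spelling.
        elem.text = json.dumps(value)
    return elem
```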

Other properties

XML attributes are the prime choice for the other properties on an object, if there are any. They are unordered, but they are also ambiguous: the type information is erased from the attribute, and you need a schema (or XLite) to retain it. Nevertheless, all types up to arrays of scalars can be represented; an array is encoded as its values joined with the space character.
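A possible sketch of the attribute encoding, again with xml.etree.ElementTree. The names attr_text and set_attributes are made up, and writing null, numbers and booleans in their JSON spelling is an assumption on my part. Note that a string containing a space becomes indistinguishable from an array once joined, which is exactly the kind of ambiguity mentioned above.

```python
import json
import xml.etree.ElementTree as ET


def attr_text(value):
    """Render one JSON scalar as attribute text."""
    if isinstance(value, str):
        return value
    if isinstance(value, (dict, list)):
        raise ValueError("only scalars can be rendered here")
    # Assumption: true, false, null and numbers keep their JSON spelling.
    return json.dumps(value)


def set_attributes(elem, obj, primary):
    """Copy every non-primary property of obj onto elem as an attribute."""
    for name, value in obj.items():
        if name == primary:
            continue
        if isinstance(value, list):
            # Arrays of scalars are joined with a single space.
            elem.set(name, " ".join(attr_text(item) for item in value))
        else:
            elem.set(name, attr_text(value))
    return elem
```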

Complex properties

If the value of a property is another object, its properties can be flattened and encoded as attributes on the parent object, joined with ., such as a.x and a.y for sub-properties x and y on a.
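The flattening itself could look something like this. The function name flatten is hypothetical, and the sketch ignores two edge cases: a property name that already contains a dot would collide with a flattened one, and an empty nested object simply disappears.

```python
def flatten(obj, prefix=""):
    """Flatten nested objects into a single dict with dot-joined names."""
    flat = {}
    for name, value in obj.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse so that deeper levels become a.b.c and so on.
            flat.update(flatten(value, key))
        else:
            flat[key] = value
    return flat
```

The resulting flat dictionary can then be written out as attributes like any other set of non-primary properties.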

Extensions

There is no standard mechanism for extending JSON documents with external properties, but there are some specifications which do have their own "keywords". I know of JSON Schema (starting with $) and JSON-LD (with @). These could be converted to full-fledged XML namespaces, such as jsonschema:ref created from $ref with xmlns:jsonschema="https://json-schema.org/draft/2020-12/schema", or jsonld:context from @context with xmlns:jsonld="http://www.w3.org/ns/json-ld" (these namespaces are not definitive). Other schemes for namespaces might map to XML namespaces directly.
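As a sketch of the keyword mapping, the snippet below turns a prefixed property name into the {uri}local form that xml.etree.ElementTree uses for qualified names. The table NAMESPACES and the function qualified_name are made up for illustration, and the namespace URIs are just the non-definitive ones from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Sigil -> (XML prefix, namespace URI); the URIs are placeholders, not definitive.
NAMESPACES = {
    "$": ("jsonschema", "https://json-schema.org/draft/2020-12/schema"),
    "@": ("jsonld", "http://www.w3.org/ns/json-ld"),
}

# Make ElementTree serialize with these prefixes instead of ns0, ns1, ...
for _prefix, _uri in NAMESPACES.values():
    ET.register_namespace(_prefix, _uri)


def qualified_name(prop):
    """Map $ref or @context to a namespaced name; leave other names alone."""
    for sigil, (_, uri) in NAMESPACES.items():
        if prop.startswith(sigil) and len(prop) > len(sigil):
            return f"{{{uri}}}{prop[len(sigil):]}"
    return prop
```

Setting an attribute under qualified_name("$ref") and serializing the element with ET.tostring should then emit it under the jsonschema prefix, together with the matching xmlns declaration.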
