April 1, 2022

The Glory of Semi-Structured Data

A not-so-recent trend in information storage has been the growing use of an interesting blend of structured and unstructured data. Such a style of expressing information is convenient for users and, with the advent of neural networks and prompt-based programming, also manageable for automatic processing.

March 12, 2022

Delphi form data (TPF0) binary format

Delphi uses a Pascal-like syntax for .dfm files, which are meant to store data about forms and various controls, but also other resources and miscellaneous data. These files are compiled to a special binary format and then used to construct and initialize the forms when the application starts.

I wanted to parse these files to retrieve some of the more useful data therefrom, but I wasn't able to find any documentation about the format, so I analyzed it myself, and that means you don't have to. What follows is therefore a description of the format and the data model it stores. Some basic knowledge of the principles of data storage is expected.

February 14, 2022

The Models of CONCURrent SGML

Nowadays, the old glory of SGML is almost forgotten, as it has been replaced by XML in all but a handful of locations, but the language itself still has some very impressive capabilities that, I dare say, might yet make it a viable option for some very specific cases in the future. One of those features is "CONCUR", a.k.a. the original namespaces. This comparison is inaccurate, of course, so I will first take a few paragraphs to describe what this feature is and how it relates to the whole SGML ecosystem.

December 25, 2021

Identifiers for some file formats

I managed to find a gold mine when it comes to URIs and identifiers: the DocBook notations module. DocBook is an older SGML/XML-based standard developed by OASIS and used for writing documentation, and with its DTD comes a module that “imports” several then-common notations for use with unparsed entities.

December 21, 2021

What Flavour is Your Function?

One programming article definitely worth reading is “What Color is Your Function?” by Bob Nystrom. A good old rant about the design of several programming languages, one that separates them into two groups of good languages and bad languages, depending on the presence of a certain feature.

The gist of the article is a description of a feature called “colored functions”, followed by an analysis of languages depending on whether they have some mechanism akin to it. The presence of a callback mechanism for asynchronous programming is basically the only application of these colored functions, but I believe pointing out other similar mechanisms is worth doing.
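To make the idea of “color” concrete, here is a minimal sketch in Python, where `async def` functions play the role of one color and plain functions the other. The names `fetch_value`, `plain_caller` and `async_caller` are made up for illustration and do not come from either article.

```python
import asyncio

async def fetch_value() -> int:
    # An "async-colored" function: its result is only available via await.
    await asyncio.sleep(0)
    return 42

def plain_caller() -> bool:
    # A plain function cannot await; calling fetch_value() here only
    # produces a coroutine object, not the value itself.
    pending = fetch_value()
    is_coroutine = asyncio.iscoroutine(pending)
    pending.close()  # discard it cleanly so no "never awaited" warning appears
    return is_coroutine

async def async_caller() -> int:
    # Only another async function can await the result directly,
    # so the "color" spreads to every caller that needs the value.
    return await fetch_value()

got_coroutine = plain_caller()
result = asyncio.run(async_caller())
print(got_coroutine, result)
```

The plain function has to hand the work over to an event loop (here via `asyncio.run`) to get at the value, which is the kind of split the article complains about.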

December 12, 2021

XML Lite (definitely not XML 2.0)

For years after the conception of XML, it was customary for programmers to point out its perceived flaws and suggest fixes to the language. This trend has somewhat declined in recent years, as developers have either learnt to live with XML, or discovered that they can use something else.

I do not share these feelings towards XML. I find it suitable for many purposes, and I think much of the criticism stemmed from misunderstanding. Yet even I would like to add some features to the language, based on my experience. Let's take a look at them.

November 29, 2021

Finding the UUID for almost anything

Universally Unique Identifiers are a nifty way of obtaining identifiers for resources, objects, or concepts, without the need for a central assigning authority. Arguably the largest public use of UUIDs is in Microsoft's products, where they (known as GUIDs) identify classes or interfaces within COM; for example, 450d8fba-ad25-11d0-98a8-0800361b1103 identifies the My Documents folder, accessible via the shell.

What's less known is the fact that UUIDs have a specific structure and are not necessarily composed of random numbers. The earliest generated UUIDs used time as one guarantee of uniqueness (the UUID in the example above was created in 1997); these are version 1 UUIDs. Nowadays, you can still use them if you want to preserve the creation time in the identifier, but the most common are version 4 UUIDs that consist almost entirely of (pseudo-)random bytes.
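The creation time of a version 1 UUID can be read back out of it. The sketch below uses Python's standard `uuid` module on the GUID from the example above; the helper name `uuid1_created` is my own, and the timestamp is assumed to follow the usual layout (a count of 100-nanosecond intervals since 15 October 1582).

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by version 1 UUID timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_created(u: uuid.UUID) -> datetime:
    """Return the creation time stored in a version 1 UUID (hypothetical helper)."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp in this layout")
    # u.time is the 60-bit count of 100-nanosecond intervals since the epoch;
    # dividing by 10 converts it to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

my_documents = uuid.UUID("450d8fba-ad25-11d0-98a8-0800361b1103")
created = uuid1_created(my_documents)
print(my_documents.version, created.year)  # 1 1997
```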

Usually, one associates the generation of UUIDs with some random process that produces different identifiers each time it is invoked. A lesser-known version of UUIDs, however, makes it possible to produce identifiers for certain resources deterministically, that is, based solely on some input data and producing the same result each time.
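The paragraph above does not name the version. Assuming it means the name-based kind (versions 3 and 5 in RFC 4122), deterministic generation might look like the following sketch in Python. The helper `uuid_for_url` and the example URLs are placeholders chosen for illustration.

```python
import uuid

def uuid_for_url(url: str) -> uuid.UUID:
    """Derive a version 5 (SHA-1, name-based) UUID from a URL (hypothetical helper)."""
    # NAMESPACE_URL is one of the predefined namespaces in the standard library;
    # the same namespace and name always hash to the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_URL, url)

first = uuid_for_url("https://example.org/")
second = uuid_for_url("https://example.org/")
other = uuid_for_url("https://example.org/other")

print(first == second)  # True: same input, same identifier
print(first != other)   # True: different input, different identifier
print(first.version)    # 5
```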