March 12, 2022

Delphi form data (TPF0) binary format

Delphi uses a Pascal-like syntax for .dfm files which are meant to store data about forms and various controls, but also other resources and miscellaneous data. These files are compiled to a special binary format and then used to construct and initialize the forms when the application starts.

I wanted to parse these files to extract some of the more useful data, but I couldn't find any documentation of the format, so I analyzed it myself, which means you don't have to. What follows is a description of the format and the data model it stores. Some basic knowledge of data storage principles is expected.

The file's header consists of the 4-byte signature (TPF0 in ASCII), followed by a string that identifies the type of the object, and a string containing the name of the object. All strings are stored in a variable-length format, prefixed by their length as a single byte and with no terminating characters. Some of the stored strings (like the first two) are also symbols, meaning they refer to existing entities within the program. In that case, their name should be a valid identifier (a letter or underscore followed by letters, digits or underscores) or multiple identifiers joined by a dot (.).
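As a rough illustration, the header layout described above could be read like this in Python. This is only a sketch: the helper names (read_exact, read_string, read_symbol, read_header) are made up for this example, and decoding the bytes as Latin-1 is an assumption, since the encoding depends on the program that produced the file.

```python
import io
import re

SIGNATURE = b"TPF0"

# One or more identifiers joined by dots, per the rule described above.
SYMBOL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def read_exact(stream, count):
    """Read exactly `count` bytes or fail on a truncated file."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError(f"expected {count} bytes, got {len(data)}")
    return data


def read_string(stream):
    """Read a string prefixed by a single length byte, with no terminator."""
    length = read_exact(stream, 1)[0]
    # Assumption: Latin-1 maps every byte to a character, so decoding never
    # fails; the real encoding depends on the program that wrote the file.
    return read_exact(stream, length).decode("latin-1")


def read_symbol(stream):
    """Read a string and check that it looks like a (dotted) identifier."""
    name = read_string(stream)
    if name and not SYMBOL_RE.fullmatch(name):
        raise ValueError(f"not a valid symbol: {name!r}")
    return name


def read_header(stream):
    """Return (type_name, object_name) from the start of a TPF0 stream."""
    if read_exact(stream, 4) != SIGNATURE:
        raise ValueError("missing TPF0 signature")
    type_name = read_symbol(stream)
    object_name = read_symbol(stream)
    return type_name, object_name


# Hand-built sample: signature, then two length-prefixed strings.
sample = b"TPF0" + b"\x06TForm1" + b"\x05Form1"
print(read_header(io.BytesIO(sample)))
```

Empty strings are let through by read_symbol on purpose, because an empty name is used as a terminator elsewhere in the format.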

The main object's name is followed by its data properties. Each property starts with its name as a symbol, followed by its value. If the name is empty, there is no value and there are no more properties. The value of the property starts with its type identifier, a single byte with a specific value; multi-byte numbers are stored in little-endian order. These are the types that I encountered, and how they may be mapped to .NET types:

  • 0x02: signed byte (1 byte); System.SByte.
  • 0x03: signed int16 (2 bytes); System.Int16.
  • 0x04: signed int32 (4 bytes); System.Int32.
  • 0x05: double-precision float (8 bytes); System.Double.
  • 0x06: a string; System.String.
  • 0x07: a symbol; System.String.
  • 0x08: the value false (0 bytes); System.Boolean.
  • 0x09: the value true (0 bytes); System.Boolean.
  • 0x0A: a binary blob, prefixed by its length (int32); System.Byte[].
  • 0x0B: a list of symbols, terminated by a zero-length symbol; System.String[].
  • 0x0D: the value null (0 bytes); System.Object.
  • 0x0E: a list of items. Each item is prefixed by its type: 0x00 ends the list, and 0x01 introduces a dictionary (no other item types are known), stored as a standard list of properties; System.Collections.Generic.Dictionary<string, object>[].
  • 0x01: a list of typed values, each in the same form as a property value; System.Object[].
  • 0x00: terminates a 0x01 list (0 bytes).
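
The property list and the type table above can be sketched in Python roughly as follows. This is a minimal illustration and not a tested parser for real files: the function names are invented for the example, the byte widths are taken straight from the table, the strings are decoded as Latin-1 by assumption, and any type byte not listed above is rejected.

```python
import io
import struct


def read_exact(stream, count):
    """Read exactly `count` bytes or fail on truncated data."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("unexpected end of data")
    return data


def read_string(stream):
    """Length-prefixed string; Latin-1 decoding is an assumption."""
    return read_exact(stream, read_exact(stream, 1)[0]).decode("latin-1")


def read_number(stream, fmt):
    """Unpack one little-endian number described by a struct format."""
    return struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))[0]


# Fixed-width numbers and zero-byte constants, as listed in the table.
FIXED = {0x02: "<b", 0x03: "<h", 0x04: "<i", 0x05: "<d"}
CONSTANTS = {0x08: False, 0x09: True, 0x0D: None}


def read_value_body(stream, tag):
    """Read the value that follows an already consumed type byte."""
    if tag in FIXED:
        return read_number(stream, FIXED[tag])
    if tag in CONSTANTS:
        return CONSTANTS[tag]
    if tag in (0x06, 0x07):  # string or symbol
        return read_string(stream)
    if tag == 0x0A:  # blob prefixed by an int32 length
        return read_exact(stream, read_number(stream, "<i"))
    if tag == 0x0B:  # symbols until a zero-length one
        symbols = []
        while symbol := read_string(stream):
            symbols.append(symbol)
        return symbols
    if tag == 0x0E:  # items: 0x01 + property list, ended by 0x00
        items = []
        while (item_tag := read_exact(stream, 1)[0]) != 0x00:
            if item_tag != 0x01:
                raise ValueError(f"unknown item type 0x{item_tag:02X}")
            items.append(read_properties(stream))
        return items
    if tag == 0x01:  # typed values, ended by 0x00
        values = []
        while (item_tag := read_exact(stream, 1)[0]) != 0x00:
            values.append(read_value_body(stream, item_tag))
        return values
    raise ValueError(f"unknown value type 0x{tag:02X}")


def read_value(stream):
    """Read a type byte and the value it announces."""
    return read_value_body(stream, read_exact(stream, 1)[0])


def read_properties(stream):
    """Read name/value pairs until an empty name is found."""
    properties = {}
    while name := read_string(stream):
        properties[name] = read_value(stream)
    return properties


# Hand-built sample: Width (int16), Visible (true), Caption (string).
sample = (b"\x05Width\x03" + struct.pack("<h", 640)
          + b"\x07Visible\x09" + b"\x07Caption\x06\x05Hello" + b"\x00")
print(read_properties(io.BytesIO(sample)))
```

Splitting read_value from read_value_body lets the 0x01 list loop look at each type byte first, so that 0x00 can end the list without being treated as a value type anywhere else.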

The list of properties is followed by a list of nested objects, each starting with its type and name (as symbols), followed by its own list of properties and nested objects. An empty type name marks the end of the list of nested objects.
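
The recursive layout of objects can be sketched in Python like this. It is an illustration under a few assumptions: all names are made up for the example, the result is a plain dictionary, and read_properties here is a deliberately trimmed stand-in that only understands strings, symbols and booleans, so it would need to be replaced by a full value reader for real files.

```python
import io


def read_exact(stream, count):
    """Read exactly `count` bytes or fail on truncated data."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("unexpected end of data")
    return data


def read_string(stream):
    """Length-prefixed string; Latin-1 decoding is an assumption."""
    return read_exact(stream, read_exact(stream, 1)[0]).decode("latin-1")


def read_properties(stream):
    """Trimmed stand-in: handles only strings, symbols and booleans."""
    properties = {}
    while name := read_string(stream):
        tag = read_exact(stream, 1)[0]
        if tag in (0x06, 0x07):
            properties[name] = read_string(stream)
        elif tag in (0x08, 0x09):
            properties[name] = tag == 0x09
        else:
            raise ValueError(f"type 0x{tag:02X} not handled in this sketch")
    return properties


def read_object(stream, type_name):
    """Read name, properties and nested objects of one object."""
    return {
        "type": type_name,
        "name": read_string(stream),
        "properties": read_properties(stream),
        "children": read_children(stream),
    }


def read_children(stream):
    """Read nested objects until an empty type name ends the list."""
    children = []
    while type_name := read_string(stream):
        children.append(read_object(stream, type_name))
    return children


def read_file(stream):
    """Read the signature and the main object with everything below it."""
    if read_exact(stream, 4) != b"TPF0":
        raise ValueError("missing TPF0 signature")
    return read_object(stream, read_string(stream))


# Hand-built sample: a form with one caption and one empty button.
sample = (b"TPF0\x06TForm1\x05Form1"
          + b"\x07Caption\x06\x02Hi\x00"   # properties of the form
          + b"\x07TButton\x07Button1"      # nested object header
          + b"\x00"                        # no properties
          + b"\x00"                        # no nested objects
          + b"\x00")                       # end of the form's children
print(read_file(io.BytesIO(sample)))
```

The dictionary literal in read_object is evaluated in order, which matters here because name, properties and children must be read from the stream in exactly that sequence.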

Well, that's all there is to it, or at least everything I was able to find. If you happen to find other possibilities, feel free to let me know (with examples).

Since I intended to use this knowledge in .NET, here is some code that should be able to read these files and convert them to a tree-like structure. The result is mostly similar to JSON, with a few more types, and explicit names and types of objects, so perhaps YAML might be a better choice if you want to convert it to a standardized format.
