December 19, 2019

The four kinds of a reference

This article applies not only to C#, but also to other languages with references, like C++.

A reference is a special type that allows you to access data stored in a location separate from the variable that contains the reference. When a reference variable is accessed, indirection is performed automatically, i.e. first the actual address of the remote variable is fetched, then the object stored in that variable is accessed. This behaviour is distinct from pointers, which employ the same indirection mechanism but are treated as normal values, so dereferencing must be explicit.
In C#, there are three keywords that denote a reference: ref, out, and in. The latter two work only for method parameters; however, ref readonly can be used instead of in for variables and return types (out cannot be used in the same way). These map to different metadata in the underlying CIL code, but all use the internal CLI managed pointer type (T&). ref is translated directly to the managed pointer type. out is marked with OutAttribute. Both in and ref readonly are marked with InAttribute, but notice that the attribute is only allowed on parameters. Rather than allowing the attribute to be used on return values in addition to parameters, or using a different type like IsReadOnlyAttribute, the return type itself is actually decorated with a required modifier (modreq) of InAttribute, which is one of the rare occasions where the compiler uses a modifier instead of an attribute. (After a quick search in the Roslyn repo, I’ve found other modifiers, like IsVolatile used for volatile fields, and the unmanaged constraint also uses a required modifier of UnmanagedType. Creative.)

This also means that technically, you could have two methods with a ref return that differ only in readonly, as the required modifier is part of the signature and the CLI allows methods that differ only in the return type (unlike attributes, which aren’t part of the signature).

In C++, a normal ref reference translates to T& while an in reference translates to T const &, although the behaviour of the latter differs from C# (calling a non-readonly method through an in reference creates a defensive copy of the value in C#, while calling a non-const method through a const reference is prohibited in C++). There is no analogue to out in C++, as it is recommended to return these values directly (and the code is usually optimized so that the least amount of copying is necessary).
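
The mapping described above can be sketched in C++ roughly as follows. This is only an illustration: the Counter type and the functions bump, peek and make_counter are made-up names, not anything from a real library.

```cpp
#include <cassert>

// Hypothetical value type used only for illustration.
struct Counter {
    int value = 0;
    void increment() { ++value; }        // non-const: mutates the object
    int get() const { return value; }    // const: callable through a const reference
};

// Counterpart of a C# `ref` parameter: the callee may modify the argument.
void bump(Counter& c) { c.increment(); }

// Counterpart of a C# `in` parameter: read-only access, no copy is made.
int peek(const Counter& c) {
    // c.increment();  // would not compile: non-const method on a const reference
    return c.get();
}

// Instead of an `out` parameter, the result is simply returned by value.
Counter make_counter(int start) {
    Counter c;
    c.value = start;
    return c;
}
```

Calling make_counter(5), passing the result to bump and then reading it with peek should yield 6. Uncommenting the line in peek makes the compiler reject the code, whereas the equivalent C# code would compile and silently work on a defensive copy.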

So far, I’ve outlined three kinds of reference, so what is the fourth one? It is a controlled-mutability reference. In object-oriented languages, such a reference allows the modification of the target value only via the methods it provides. In C#, any object reference is a controlled-mutability reference, as you cannot copy the state from one object to another without invoking its methods or setting all the fields manually. This also holds true for boxed value types: even though the CLI permits unboxing the object and obtaining a reference to its state, the state can only be changed by calling the methods or setting the fields that allow it. In other words, it always behaves like a normal object.

I am not aware of any mechanism in C++ that would directly mimic this functionality. You can turn all non-const references to an object into controlled-mutability references by deleting or hiding operator= on the type, so the state of an already constructed object will always be protected, and then use a wrapper type that has access to operator= in case assignability is required. A simpler option is to create a special pointer type whose operator* returns a const reference but whose operator-> returns a non-const pointer. Even though nothing stops you from calling ptr->operator=(val) or *ptr.operator->() = val, at least the syntax makes it stand out and makes you think about whether you are doing something safe or not.
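
A minimal sketch of such a pointer type might look like this. The names controlled_ptr, Account and demo are made up for the example, and the wrapper deliberately ignores ownership and null checks.

```cpp
#include <cassert>

// Hypothetical wrapper; the name controlled_ptr is made up for this sketch.
template <typename T>
class controlled_ptr {
public:
    explicit controlled_ptr(T* target) : target_(target) {}

    // Dereferencing yields a const reference, so `*p = value` does not compile.
    const T& operator*() const { return *target_; }

    // Member access yields a non-const pointer, so mutating methods stay callable.
    T* operator->() const { return target_; }

private:
    T* target_;  // non-owning; the target must outlive the wrapper
};

// Hypothetical type whose state should normally change only through its methods.
struct Account {
    int balance = 0;
    void deposit(int amount) { balance += amount; }
};

int demo() {
    Account account;
    controlled_ptr<Account> p(&account);

    p->deposit(10);        // fine: the change goes through a method
    // *p = Account();     // rejected: operator* returns const Account&

    Account other;
    other.balance = 50;
    p->operator=(other);   // still possible, but the syntax stands out

    p->deposit(10);
    return (*p).balance;   // reading through operator* is always allowed
}
```

Here demo() returns 60: the explicit operator= call overwrites the first deposit with a balance of 50, and the second deposit adds 10. Deleting operator= on Account would close that last loophole as well, at the cost of needing a separate wrapper wherever assignment is genuinely required.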

Side note: In C#, you can (semantically) turn any controlled-mutability reference into a plain, assignable reference to the object’s state by having the type implement an interface like the following IAssignable<>:

public interface IAssignable<T> where T : struct, IAssignable<T>
{
    void Assign(in T value);
}

struct MyStruct : IAssignable<MyStruct>
{
    public void Assign(in MyStruct value)
    {
        // Overwrites the whole state of this struct with a copy of value.
        this = value;
    }
}

This works even when a field is readonly. For reference types, implementing this pattern is a bit harder as you have to deal with inheritance, and the state of an object can only be assigned from a state of an object of the same type, so the operation may fail (virtual bool TryAssign(object value) is probably the simplest way).
