July 31, 2015

The Cloneable pattern

In most languages that support OOP, one concept is always hard to approach: the cloning of objects. First of all, there are two ways to implement cloning: deep cloning and shallow cloning. Deep cloning analyzes the whole structure of the object, including the objects the instance references, and produces a complete copy that is separate from the original structure. Shallow cloning, on the other hand, copies only the most accessible data, the technical value of the object (in C#, the values of all its fields).
In C#, a shallow copy of the current object is provided by the protected method Object.MemberwiseClone. On its own it is almost totally useless, and is good only for simple sealed classes. Not only is it inaccessible from outside the object, but in most cases you cannot use it to help implement your own cloning either. It returns an object that is already initialized without running any constructor, so, for example, readonly fields cannot be given new values afterwards.
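To make the shallow behavior concrete, here is a minimal sketch. The Inventory class and its ShallowCopy method are hypothetical names used only for illustration. Because MemberwiseClone is protected, the class has to expose it through its own public method. The copy then shares the referenced list with the original.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class, used only to illustrate MemberwiseClone.
public class Inventory
{
    public string Name { get; set; }
    public List<string> Items { get; } = new List<string>();

    // MemberwiseClone is protected, so it must be exposed from inside the class.
    // It copies every non-static field without running a constructor;
    // for reference-type fields (Name, Items) only the reference is copied.
    public Inventory ShallowCopy()
    {
        return (Inventory)MemberwiseClone();
    }
}
```

After `var copy = original.ShallowCopy(); copy.Items.Add("pear");`, the original's Items list contains "pear" as well, because both instances point at the same List<string>. Assigning a new string to copy.Name, however, leaves original.Name untouched.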
Fortunately, there is an interface intended for this purpose: ICloneable. However, there is a frequently raised question about this interface: does it mean a deep or a shallow copy?
It turns out it means neither. This is what the documentation says:
Supports cloning, which creates a new instance of a class with the same value as an existing instance.
Now, what is the value of an object? The technical value consists of the values of all non-static fields of the instance, including references, but the semantic value depends on what those fields mean. A field (or a property) can express one of two relationships: association or composition (a distinction discussed mostly in C++). Association simply links two objects: they reference each other, but do not depend on each other. Composition establishes ownership, where one object owns other objects that cannot exist without their owner. The IDisposable pattern is similar to composition: owned resources are disposed when their owner is disposed (even if they are still referenced outside the object).
So a field basically says one of two things: "I exist outside of this object, and the object only uses me", or "I am part of this object, and I can't exist without it". The second meaning is what makes up the semantic value of the object.
Unfortunately, C# doesn't provide a way to distinguish between these meanings (C++/CLI comes closer with its stack semantics for reference types, such as declaring a System::Object without the ^). If I have a Wall that contains a Window, the two are in an ownership relation, so cloning the wall should yield a new wall with a new window. On the other hand, if the wall keeps a reference to the House it is in, cloning it shouldn't create a new house.
Enough theory. How do we implement it?

using System;

public class House
{
    
}

public class Wall : ICloneable
{
    public Window Window { get; private set; }
    public House House { get; private set; }
    
    public Wall(Window window, House house)
    {
        Window = window;
        House = house;
    }
    
    protected Wall(Wall prototype)
    {
        Window = (Window)prototype.Window.Clone(); // composition: creates a new window
        House = prototype.House; // association: keeps the same house
    }
    
    public virtual object Clone()
    {
        return new Wall(this);
    }
}

public class Window : ICloneable
{
    public int Size { get; private set; }
    
    public Window(int size)
    {
        Size = size;
    }
    
    protected Window(Window prototype)
    {
        Size = prototype.Size;
    }
    
    public virtual object Clone()
    {
        return new Window(this);
    }
}

This uses the copy-constructor technique to initialize an object from a prototype, and lets derived classes extend the cloning with their own state. The copy-constructor is protected so that programmers use the Clone method instead of calling the constructor directly.
A problem with this approach is that Clone doesn't declare what it actually returns (callers have to cast the object result), although the returned type is part of the contract.
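As a sketch of that extensibility, here is a Window class modeled on the listing above (repeated so the example compiles on its own) and a hypothetical TintedWindow subclass that adds its own field. The subclass chains its copy-constructor to the base one and overrides Clone, so cloning through a Window reference still produces a TintedWindow.

```csharp
using System;

public class Window : ICloneable
{
    public int Size { get; private set; }

    public Window(int size) { Size = size; }

    // Copies the state declared in Window.
    protected Window(Window prototype) { Size = prototype.Size; }

    public virtual object Clone() { return new Window(this); }
}

// Hypothetical subclass, not part of the original example.
public class TintedWindow : Window
{
    public double Tint { get; private set; }

    public TintedWindow(int size, double tint) : base(size) { Tint = tint; }

    // The base copy-constructor copies Size; this one adds Tint.
    protected TintedWindow(TintedWindow prototype) : base(prototype)
    {
        Tint = prototype.Tint;
    }

    // Overriding Clone keeps the dynamic type of the copy.
    public override object Clone() { return new TintedWindow(this); }
}
```

Note that nothing in the pattern forces a subclass to override Clone: if TintedWindow omitted the override, cloning it would silently produce a plain Window.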

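using System;

public class Wall : ICloneable
{
    public Window Window { get; private set; }
    public House House { get; private set; }
    
    public Wall(Window window, House house)
    {
        Window = window;
        House = house;
    }
    
    protected Wall(Wall prototype)
    {
        Window = prototype.Window.Clone(); // no cast needed any more
        House = prototype.House;
    }
    
    object ICloneable.Clone()
    {
        return CloneImpl();
    }
    public Wall Clone()
    {
        return (Wall)CloneImpl();
    }
    protected virtual object CloneImpl()
    {
        return new Wall(this);
    }
}

public class Window : ICloneable
{
    public int Size { get; private set; }
  
    public Window(int size)
    {
        Size = size;
    }
  
    protected Window(Window prototype)
    {
        Size = prototype.Size;
    }
  
    object ICloneable.Clone()
    {
        return CloneImpl();
    }
    public Window Clone()
    {
        return (Window)CloneImpl();
    }
    protected virtual object CloneImpl()
    {
        return new Window(this);
    }
}

public class TrueWindow : Window
{
    public TrueWindow(int size) : base(size)
    {
      
    }
  
    protected TrueWindow(TrueWindow prototype) : base(prototype)
    {
      
    }
  
    public new TrueWindow Clone()
    {
        return (TrueWindow)CloneImpl();
    }
    protected override object CloneImpl()
    {
        return new TrueWindow(this);
    }
}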

There is no real way for the ICloneable interface to force the implementing class to return an instance of its own type, but at least this way it is apparent from the outside that it does so, at the cost of three methods in the base class and two methods in each derived class.
Or is there a way to enforce it better? What if we introduce a generic ICloneable<T> interface to do it? Well, let's see:

using System;

public interface ICloneable<out T> : ICloneable where T : ICloneable<T>
{
    new T Clone();
}

public class Wall : ICloneable<Wall>
{
    public Window Window { get; private set; }
    public House House { get; private set; }
    
    public Wall(Window window, House house)
    {
        Window = window;
        House = house;
    }
    
    protected Wall(Wall prototype)
    {
        Window = prototype.Window.Clone();
        House = prototype.House;
    }
    
    object ICloneable.Clone()
    {
        return this.Clone();
    }
    public Wall Clone()
    {
        return (Wall)CloneImpl();
    }
    protected virtual object CloneImpl()
    {
        return new Wall(this);
    }
}

public class Window : ICloneable<Window>
{
    public int Size { get; private set; }
  
    public Window(int size)
    {
        Size = size;
    }
  
    protected Window(Window prototype)
    {
        Size = prototype.Size;
    }
  
    object ICloneable.Clone()
    {
        return CloneImpl();
    }
    public Window Clone()
    {
        return (Window)CloneImpl();
    }
    protected virtual object CloneImpl()
    {
        return new Window(this);
    }
}

public class TrueWindow : Window, ICloneable<TrueWindow>
{
    public TrueWindow(int size) : base(size)
    {
      
    }
  
    protected TrueWindow(TrueWindow prototype) : base(prototype)
    {
      
    }
  
    public new TrueWindow Clone()
    {
        return (TrueWindow)CloneImpl();
    }
    protected override object CloneImpl()
    {
        return new TrueWindow(this);
    }
}

Not that hard, actually. We have now introduced a contract that an object implementing ICloneable<T> returns a T from Clone. We can't really restrict T to the actual type of the object, but this is the best we can get.
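To see why T cannot be pinned to the implementing type, consider this sketch. It repeats the ICloneable<T> interface from above; Sheep and Impostor are hypothetical classes. The constraint T : ICloneable<T> is satisfied by Sheep, so Impostor may legally declare itself ICloneable<Sheep> and return a Sheep from its Clone.

```csharp
using System;

public interface ICloneable<out T> : ICloneable where T : ICloneable<T>
{
    new T Clone();
}

// Hypothetical well-behaved implementation: returns its own type.
public class Sheep : ICloneable<Sheep>
{
    public string Name { get; private set; }

    public Sheep(string name) { Name = name; }

    object ICloneable.Clone() { return Clone(); }
    public Sheep Clone() { return new Sheep(Name); }
}

// Hypothetical "impostor": it compiles because T = Sheep satisfies the
// constraint, yet its Clone returns a Sheep rather than an Impostor.
public class Impostor : ICloneable<Sheep>
{
    object ICloneable.Clone() { return Clone(); }
    public Sheep Clone() { return new Sheep("Dolly"); }
}
```

The constraint only guarantees that T is itself cloneable into T; it says nothing about the relationship between T and the class that declares the interface.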
But was it really necessary?
In our attempt to get rid of a cast, we have introduced a new interface that is painful to implement, plus lots of unnecessary methods in all classes. What we really needed was a method that clones the object and casts the result to its type. Extension methods to the rescue!

using System;
public static class CloneableExtensions
{
    public static T Copy<T>(this T obj) where T : ICloneable
    {
        return (T)obj.Clone();
    }
}
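A usage sketch of the extension method (repeated so the example compiles on its own), with a hypothetical Note class standing in for any class that implements ICloneable. T is inferred from the static type of the variable, so the call site needs no cast. If Clone returned some unrelated type, the cast inside Copy would throw InvalidCastException at run time; the compiler cannot catch it.

```csharp
using System;

public static class CloneableExtensions
{
    // Clones obj and casts the result back to its static type T.
    public static T Copy<T>(this T obj) where T : ICloneable
    {
        return (T)obj.Clone();
    }
}

// Hypothetical cloneable class used to exercise Copy.
public class Note : ICloneable
{
    public string Text { get; private set; }

    public Note(string text) { Text = text; }

    protected Note(Note prototype) { Text = prototype.Text; }

    public virtual object Clone() { return new Note(this); }
}
```

With this in place, `Note copy = note.Copy();` compiles without a cast, because T is inferred as Note.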

The only problem left is cloning structures, which requires boxing and unboxing the value. Although structures aren't usually used to represent such complex concepts, we can clone them, too:

using System;

public struct WallStructure : ICloneable
{
    public Window Window { get; private set; }
    
    public WallStructure(Window window) : this()
    {
        Window = window;
    }
    
    private WallStructure(WallStructure prototype) : this()
    {
        Window = prototype.Window.Clone(); // uses the strongly typed Window.Clone from above
    }
    
    object ICloneable.Clone()
    {
        return Clone();
    }
    
    public WallStructure Clone()
    {
        return new WallStructure(this);
    }
}

As you can see, it is possible and not so hard, but you shouldn't really design structures like that. Because structures are passed by value, one might expect Clone to do the same thing as a plain copy (although if it did, there would be no need for the public method and the copy-constructor at all). ICloneable on structures is useful only when you want to clone the boxed value, not the unboxed one.
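A sketch of the difference, assuming a minimal cloneable Window class (a simplified, non-virtual version repeated so the example compiles on its own) and the WallStructure above. Plain assignment copies the struct's fields, so both copies share one Window, while Clone creates a new window. Cloning through the ICloneable interface works on the boxed value and returns another boxed WallStructure.

```csharp
using System;

// Simplified Window, assumed here only so the sketch is self-contained.
public class Window : ICloneable
{
    public int Size { get; private set; }
    public Window(int size) { Size = size; }
    protected Window(Window prototype) { Size = prototype.Size; }
    object ICloneable.Clone() { return Clone(); }
    public Window Clone() { return new Window(this); }
}

public struct WallStructure : ICloneable
{
    public Window Window { get; private set; }

    public WallStructure(Window window) : this() { Window = window; }

    private WallStructure(WallStructure prototype) : this()
    {
        Window = prototype.Window.Clone(); // composition: a new window
    }

    object ICloneable.Clone() { return Clone(); } // the result is boxed
    public WallStructure Clone() { return new WallStructure(this); }
}
```

In short, `var b = a;` gives a shallow copy whose Window is the same object as a.Window, whereas `a.Clone()` gives a wall with its own window.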

I hope I have clarified the correct way of cloning objects, as well as the semantic value of objects. See you next time!