DevCity.NET - http://devcity.net
Copying vs. Cloning
http://devcity.net/Articles/196/1/article.aspx
Phil Eakins
Having followed a successful career in the legal profession for over 20 years, Phil saw the light when he bought his first home computer (an Oric) in 1982. Since then he has attained a Bachelor's Degree in Computing and Maths from The Open University (and met his Wife at the same time), as well as a CNE qualification. Phil has worked extensively in the fields of Network Engineering and in the teaching and training of IT subjects to adults.
Published on 2/9/2006

Introduction

As a newcomer to the wonderful world of .NET (although not to programming) last year I started, as most of us do, by taking a course. It was whilst working on one of the exercises that I ran across a ‘feature’ of the .NET Framework which has caused me no end of fun and enjoyment!

I had filled an array of Char, Array1(), with data and I wanted to sort it.  However, I also wanted to keep a copy of the unsorted data to revert to.  This apparently simple requirement led me deep into the field of shallow copies, deep copies, copying and cloning.

This article reflects the result of my research and I hope it will be of help to you if you have a similar requirement at some time in the future.

This was the problem. To keep the unsorted data, I copied Array1() to a second variable (Array2 = Array1) and then sorted Array1().

The unexpected result was that both Array1() and Array2() were sorted simultaneously!
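The original listing is not reproduced here. The following is a minimal VB.NET sketch of the same mistake, assuming a four-element Char array with made-up contents. The module and function names are hypothetical. The "copy" is made by plain assignment, and the result shows both variables holding the sorted data.

```vbnet
Imports System

Module AssignmentCopyDemo
    ' Hypothetical reconstruction of the problem: "copying" an array by assignment.
    Function SortAfterAssignment() As Char()()
        Dim Array1() As Char = {"d"c, "b"c, "a"c, "c"c}   ' made-up data
        ' Only the pointer held on the Stack is copied; both variables
        ' now refer to the same array on the Managed Heap.
        Dim Array2() As Char = Array1
        Array.Sort(Array1)
        Return New Char()() {Array1, Array2}
    End Function

    Sub Main()
        Dim arrays As Char()() = SortAfterAssignment()
        Console.WriteLine(New String(arrays(0)))   ' abcd
        Console.WriteLine(New String(arrays(1)))   ' abcd: the "copy" was sorted too
        Console.WriteLine(arrays(0) Is arrays(1))  ' True: one array, two pointers
    End Sub
End Module
```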

As I later discovered, I was the victim of what is called a ‘shallow’ copy, and yes, there is a ‘deep’ copy which I will discuss, and demonstrate, later. But how and why did this happen? The answer, I’m afraid, lies deep in the Object hierarchy and in how Objects are stored in memory.

Objects

This is a very quick primer on how Objects inter-relate. Please feel free to skip the next few lines if I’m repeating what you already know.

We start by saying that (almost) everything in .NET is an Object (capitalisation intentional). That is, it is not just something we can describe as an object, but an instance of a Type derived from the Framework Class System.Object. The Object Class is the ultimate parent of every Type, whether that Type is contained within the Framework or derived from it.

Unfortunately, below the System.Object level, things get a bit more complicated and the idea of an Object having a Type must be introduced.

Reference Types and Value Types

Figure 1: Value & Reference Types, Summary

As Figure 1 shows, the Primitive and Enumerated Types are derived from the ValueType Class (System.ValueType) and are, therefore, called Value Types. Everything else, including (you will note) the String Type and arrays, derives from the Object Type without passing through ValueType, and is described as a Reference Type.
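To check which side of the hierarchy a given Type falls on, System.Type exposes an IsValueType property. The sketch below reports that property for a few of the Types mentioned above. The module and function names are my own and purely illustrative.

```vbnet
Imports System

Module TypeKindDemo
    ' Hypothetical helper. Passing a value as Object boxes it,
    ' but GetType still reports the value's original Type.
    Function IsValueTypeInstance(ByVal value As Object) As Boolean
        Return value.GetType().IsValueType
    End Function

    Sub Main()
        Console.WriteLine(IsValueTypeInstance("x"c))               ' True: Char
        Console.WriteLine(IsValueTypeInstance(42))                 ' True: Integer
        Console.WriteLine(IsValueTypeInstance(DayOfWeek.Monday))   ' True: an enumeration
        Console.WriteLine(IsValueTypeInstance("text"))             ' False: String
        Console.WriteLine(IsValueTypeInstance(New Char() {"x"c}))  ' False: an array of Char
    End Sub
End Module
```

The last line anticipates the problem above: even an array of a Value Type is itself a Reference Type.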

The Type of the Object dictates the way in which the Framework allocates memory, as detailed below.

The Value Type

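A Value Type is stored as a value directly on the Thread Stack. If two numeric variables, Val1 and Val2, are declared, the Stack will contain a separate entry for each, holding its own value.

Copying Val1 to Val2 (Val2 = Val1) and incrementing Val1 (Val1 += 1) therefore changes only Val1's entry. Val1 holds the incremented value, while Val2 still holds the value that was copied to it.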

Just as we would expect.
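A minimal sketch of the Value Type case follows. The starting value of 5 and the module and function names are arbitrary assumptions.

```vbnet
Imports System

Module ValueCopyDemo
    ' Returns {Val1, Val2} after copying Val1 to Val2 and then incrementing Val1.
    Function CopyThenIncrement() As Integer()
        Dim Val1 As Integer = 5      ' starting value chosen arbitrarily
        Dim Val2 As Integer = Val1   ' the value itself is copied into Val2's own Stack entry
        Val1 += 1                    ' only Val1's entry changes
        Return New Integer() {Val1, Val2}
    End Function

    Sub Main()
        Dim values() As Integer = CopyThenIncrement()
        Console.WriteLine("Val1 = " & values(0).ToString() & ", Val2 = " & values(1).ToString())
        ' Val1 = 6, Val2 = 5
    End Sub
End Module
```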

The Reference Type

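How do Reference Types differ? Quite a lot: Reference Types put their values on the Managed Heap and point to them from the Thread Stack. If a String is assigned to Val1 and Val1 is then copied to Val2 (Val2 = Val1), only the pointer is copied.

Both variables therefore point to the same Object.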

Things become very interesting when we add to the String using Val2 &= "a new word is added".  A String is immutable (it cannot be changed), so a new Object is created on the Heap for the concatenated String and Val2's Stack pointer is altered to point to it. Val1 still points to the original String.

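One side effect of this is that, whilst Val1 remains in scope and still refers to the original String, the Garbage Collector cannot reclaim that String's memory. If the original value is no longer needed, Val1 can be set to Nothing so that the String becomes eligible for collection. (In optimised release builds the JIT compiler may treat a local variable that is no longer used as dead even while it is still in scope, so this matters most for fields and long-lived variables.)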
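The String behaviour described above can be checked with VB's Is operator, which compares the pointers rather than the text. This is a minimal sketch. The module name, function name and starting text are assumptions.

```vbnet
Imports System

Module StringCopyDemo
    ' Returns {shared before &=, shared after &=, Val1 unchanged}.
    Function ConcatenateCopy() As Boolean()
        Dim Val1 As String = "original text"
        Dim Val2 As String = Val1                     ' copies the pointer, not the String
        Dim sharedBefore As Boolean = Val2 Is Val1    ' same Object on the Heap
        Val2 &= " a new word is added"                ' builds a new String; Val2 is repointed
        Dim sharedAfter As Boolean = Val2 Is Val1
        Return New Boolean() {sharedBefore, sharedAfter, Val1 = "original text"}
    End Function

    Sub Main()
        Dim results() As Boolean = ConcatenateCopy()
        Console.WriteLine("Shared before &=: " & results(0).ToString())  ' True
        Console.WriteLine("Shared after &=: " & results(1).ToString())   ' False
        Console.WriteLine("Val1 unchanged: " & results(2).ToString())    ' True
    End Sub
End Module
```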

Why Didn't the Copy Command Work?

Looking at Figure 1, Char is a Value Type and hasn't been Boxed, so its copy should behave independently.  But (and it's a big but) the Array Type is a Reference Type, and it doesn't matter that the contents of the array are of a Value Type.  The array contents are still placed on the Heap, with a pointer on the Stack.

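Thus, when I thought I'd copied the array, I'd actually copied the pointer to it. Array1() and Array2() were two Stack entries pointing to the same array on the Heap, so sorting through either variable sorted the data seen by both.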

Cloning

Many Framework Objects include a Clone method, but whether that method will produce a truly independent copy of an array depends very much upon the content of the array.  In this case, the statement Array2 = DirectCast(Array1.Clone(), Char())

will produce an independent copy of Array1(), as the content is of a Value Type.  Clone returns its result as an Object, so it must be cast back to the correct Type (here Char(), an array of Char), as I have done using DirectCast. DirectCast is appropriate here because the clone is a Reference Type.
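Repeating the earlier experiment with Clone in place of assignment shows the difference. This is a minimal sketch using the same made-up data as before. The module and function names are hypothetical.

```vbnet
Imports System

Module CloneCharArrayDemo
    Function CloneThenSort() As Char()()
        Dim Array1() As Char = {"d"c, "b"c, "a"c, "c"c}   ' made-up data
        ' Clone returns Object, so the result is cast back to Char().
        Dim Array2() As Char = DirectCast(Array1.Clone(), Char())
        Array.Sort(Array1)   ' sorts only the original array
        Return New Char()() {Array1, Array2}
    End Function

    Sub Main()
        Dim arrays As Char()() = CloneThenSort()
        Console.WriteLine(New String(arrays(0)))   ' abcd
        Console.WriteLine(New String(arrays(1)))   ' dbac: the unsorted copy survives
        Console.WriteLine(arrays(0) Is arrays(1))  ' False: two separate arrays
    End Sub
End Module
```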

Confusion!

The MS documentation* says that Cloning produces a 'shallow' copy - but in this case it produced a different sort of shallow copy than when the Copy command is used.  Confused?  I was, which is why I set about compiling a comparison of the different ways of copying the content of arrays.
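The word 'shallow' describes what happens to the elements. Clone creates a new array, but copies each element exactly as it stands. Value Type elements are therefore duplicated, while Reference Type elements are copied only as pointers. The sketch below uses a hypothetical Member class (not one of the article's test classes) to show the second case.

```vbnet
Imports System

' Hypothetical class used only for this illustration.
Class Member
    Public Name As String
    Public Sub New(ByVal initialName As String)
        Name = initialName
    End Sub
End Class

Module ShallowCloneDemo
    Function CloneThenRename() As Member()()
        Dim Array1() As Member = {New Member("Ann"), New Member("Bob")}
        ' Clone creates a new array, but each slot receives a copy of the
        ' pointer it held, so both arrays refer to the same Member Objects.
        Dim Array2() As Member = DirectCast(Array1.Clone(), Member())
        Array1(0).Name = "Changed"
        Return New Member()() {Array1, Array2}
    End Function

    Sub Main()
        Dim arrays As Member()() = CloneThenRename()
        Console.WriteLine(arrays(0) Is arrays(1))   ' False: two distinct arrays
        Console.WriteLine(arrays(1)(0).Name)        ' Changed: the element is shared
    End Sub
End Module
```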

Investigation

Introduction

In order to investigate the various methods of obtaining a copy of an array, I wrote a short program (the VS 2003 source code can be downloaded from the link) and systematically worked my way through copying or cloning both System and User Designed Types.

These tests also introduce the concept of a ‘deep copy’: a copy of an Object that is completely independent of the original. Everything directly or indirectly referenced by any field in that Object must also be copied.

A true deep copy was thought, by me at least, to be difficult to attain, as it would have to be implemented on an object-by-object basis, and even then only so long as each object’s Type implemented the ICloneable interface. However, Richter & Balena suggested a remedy using Streams: the Object is serialised into a stream and then deserialised as a new, independent Object. Balena enhanced the method in "Programming Microsoft Visual Basic .NET Version 2003" (MS Press 2004, ISBN 0-7356-2059-8), and that version is used, largely as published, in the following tests as 'CloneObject'. The catch, of course, is that the objects in the Array being deep copied by this technique must each support Serialisation, as must each Object referred to by them.
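The serialisation technique can be sketched as follows. This is not Balena's CloneObject listing. It is a hypothetical DeepClone helper written in the same spirit, with hypothetical Owner and Pet classes. It assumes the .NET Framework, as the article's VS 2003 program did: BinaryFormatter is obsolete in recent versions of .NET and is disabled or removed there. The Using block also needs VB 2005 or later (Try/Finally would be needed in VS 2003).

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

<Serializable()> Class Pet            ' hypothetical
    Public Name As String
End Class

<Serializable()> Class Owner          ' hypothetical
    Public Name As String
    Public OwnPet As Pet
End Class

Module DeepCopyDemo
    ' Hypothetical helper in the spirit of CloneObject: serialise the whole
    ' object graph to a MemoryStream, then deserialise it as a fresh graph.
    Function DeepClone(ByVal source As Object) As Object
        Using stream As New MemoryStream()
            Dim formatter As New BinaryFormatter()
            formatter.Serialize(stream, source)
            stream.Position = 0
            Return formatter.Deserialize(stream)
        End Using
    End Function

    Function DeepCopyThenRename() As Owner()()
        Dim original As New Owner()
        original.Name = "Ann"
        original.OwnPet = New Pet()
        original.OwnPet.Name = "Rex"
        Dim Array1() As Owner = {original}
        ' An array is serialisable when its elements are, so the whole graph is copied.
        Dim Array2() As Owner = DirectCast(DeepClone(Array1), Owner())
        Array1(0).OwnPet.Name = "Changed"
        Return New Owner()() {Array1, Array2}
    End Function

    Sub Main()
        Dim arrays As Owner()() = DeepCopyThenRename()
        Console.WriteLine(arrays(0)(0).OwnPet.Name)   ' Changed
        Console.WriteLine(arrays(1)(0).OwnPet.Name)   ' Rex: the copied Pet is independent
    End Sub
End Module
```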

Preparation

The following tables list the tests that were carried out on arrays of System Types and of User Designed Types, in order to establish how effectively each method produces a truly independent copy of the array.

The test of System Types is implemented by creating Array1() and filling it with appropriate data.  Array1() is then copied to Array2() using the selected method.

Array1() is altered using the method shown in Table 1 and the arrays are then compared for independence.

A list of the data used is shown in Appendix 1.
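The article's own list of methods was given in its tables. As an assumption, the four candidates below are the usual ways of producing a second array from Array1() in VB.NET. The sketch copies an Integer array by each method, alters Array1(), and reports which copies stayed independent. All names and data are hypothetical.

```vbnet
Imports System

Module CopyMethodsDemo
    ' Copies Array1 four ways, alters Array1(0), and reports whether each
    ' copy still holds the original first element.
    Function IndependenceByMethod() As Boolean()
        Dim Array1() As Integer = {3, 1, 2}

        ' 1. Assignment copies only the reference.
        Dim byAssignment() As Integer = Array1
        ' 2. Clone returns Object, so it is cast back to Integer().
        Dim byClone() As Integer = DirectCast(Array1.Clone(), Integer())
        ' 3. Array.Copy needs an existing destination that is large enough.
        Dim byArrayCopy(Array1.Length - 1) As Integer
        Array.Copy(Array1, byArrayCopy, Array1.Length)
        ' 4. CopyTo copies into an existing array, starting at the given index.
        Dim byCopyTo(Array1.Length - 1) As Integer
        Array1.CopyTo(byCopyTo, 0)

        Array1(0) = 99   ' alter the original after copying

        Return New Boolean() {byAssignment(0) = 3, byClone(0) = 3, _
                              byArrayCopy(0) = 3, byCopyTo(0) = 3}
    End Function

    Sub Main()
        Dim labels() As String = {"Assignment", "Clone", "Array.Copy", "CopyTo"}
        Dim independent() As Boolean = IndependenceByMethod()
        For i As Integer = 0 To labels.Length - 1
            Console.WriteLine(labels(i) & ": independent = " & independent(i).ToString())
        Next
        ' Assignment: False; Clone, Array.Copy and CopyTo: True
    End Sub
End Module
```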

The Person Class, which supports both Clone and CloneObject methods, was subclassed into PetOwner and Team Classes.

The PetOwner Object can have a reference to a Pet Object, which does not have a Clone Method, or a SecondPet Object, which does.

Team Objects have no references.

Array1() is filled with one of the Person Objects, see Appendix 1.  Array1() is copied to Array2() using the chosen method, and then altered by changing Person, Pet or Team data as detailed in Appendix 1.  The arrays are then compared.
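The article's class listings are not reproduced here, so the shapes below are a hypothetical sketch based only on the description above. Person implements ICloneable through MemberwiseClone. PetOwner also clones the SecondPet it refers to, while Pet has no Clone method. The field names are my own assumptions. The sketch shows why a Native Framework Method can go only as deep as the referenced Objects allow.

```vbnet
Imports System

Class Pet                          ' hypothetical: no Clone method
    Public Name As String
End Class

Class SecondPet                    ' hypothetical: can clone itself
    Implements ICloneable
    Public Name As String
    Public Function Clone() As Object Implements ICloneable.Clone
        Return Me.MemberwiseClone()
    End Function
End Class

Class Person
    Implements ICloneable
    Public Name As String
    ' MemberwiseClone copies every field as it stands: a shallow copy.
    Public Overridable Function Clone() As Object Implements ICloneable.Clone
        Return Me.MemberwiseClone()
    End Function
End Class

Class PetOwner
    Inherits Person
    Public OwnPet As Pet
    Public OwnSecondPet As SecondPet
    Public Overrides Function Clone() As Object
        Dim copy As PetOwner = DirectCast(MyBase.Clone(), PetOwner)
        ' SecondPet supports cloning, so the copy can be given its own instance...
        If OwnSecondPet IsNot Nothing Then
            copy.OwnSecondPet = DirectCast(OwnSecondPet.Clone(), SecondPet)
        End If
        ' ...but Pet does not, so copy.OwnPet still points to the original Pet.
        Return copy
    End Function
End Class

Module CloneHierarchyDemo
    Function CloneThenRename() As PetOwner()
        Dim original As New PetOwner()
        original.Name = "Ann"
        original.OwnPet = New Pet()
        original.OwnPet.Name = "Rex"
        original.OwnSecondPet = New SecondPet()
        original.OwnSecondPet.Name = "Tom"
        Dim copy As PetOwner = DirectCast(original.Clone(), PetOwner)
        original.OwnPet.Name = "Changed"
        original.OwnSecondPet.Name = "Changed"
        Return New PetOwner() {original, copy}
    End Function

    Sub Main()
        Dim owners() As PetOwner = CloneThenRename()
        Console.WriteLine(owners(1).OwnPet.Name)        ' Changed: Pet is shared
        Console.WriteLine(owners(1).OwnSecondPet.Name)  ' Tom: SecondPet was cloned
    End Sub
End Module
```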

Results and Conclusion

The results of the test are tabulated below:

Testing for Existence of Second Array

The Is operator compares pointers to see whether they point to the same Object on the Managed Heap, as in Array1 Is Array2.

I have used this comparison to provide the text for the ‘Independence’ Label on the lower right corner of the Form in the accompanying program. The test takes place after the data is written for the second time.

Whilst it provides accurate results for the System Types, it is not sufficient for the User Designed Types. Even when the two array Pointers refer to different Objects, the elements of those Arrays are, in some cases, still shared rather than independently copied (compare Table 4 with Table 5).
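One way to extend the check is to compare the elements as well as the arrays; this is offered only as a hypothetical sketch. The generic function below treats two arrays as independent only if they are different Objects and no position holds the same element Object. It still looks only one level down, so it would not notice two distinct PetOwner Objects sharing one Pet. For that, each element's own references would have to be checked too.

```vbnet
Imports System

Class Member                       ' hypothetical element type
    Public Name As String
End Class

Module IndependenceCheckDemo
    ' Hypothetical helper: True only if the arrays are different Objects and
    ' no position in them refers to the same element Object (one level deep).
    Function IsIndependentCopy(Of T As Class)(ByVal a() As T, ByVal b() As T) As Boolean
        If a Is b Then Return False
        For i As Integer = 0 To Math.Min(a.Length, b.Length) - 1
            If a(i) IsNot Nothing AndAlso a(i) Is b(i) Then Return False
        Next
        Return True
    End Function

    Sub Main()
        Dim Array1() As Member = {New Member()}
        Dim Array2() As Member = DirectCast(Array1.Clone(), Member())
        Console.WriteLine(Array1 Is Array2)                    ' False: passes the simple test
        Console.WriteLine(IsIndependentCopy(Array1, Array2))   ' False: the Member is shared
    End Sub
End Module
```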

Conclusion

This has been a most interesting exercise and in doing the research I have learned some valuable lessons about how .NET 1.1 organises variables in memory. In particular, I now know what to avoid!

As can easily be seen, copying an array of System Types is straightforward, but simply copying the Array pointer to another variable should be avoided, unless a second reference to the same array is actually what is required, of course.

User Designed Types are also straightforward so long as the Object holds no references to other Objects. They are a little more complicated if it does, and impossible to copy by a Native Framework Method if the Object referred to does not, itself, support cloning.

However, one thing that stands out very clearly indeed is that any copy made using the CloneObject Method/Function will succeed in every case, provided that all of the objects involved are serialisable.

All of the tests were carried out using the associated program (developed using VS2003) which can be downloaded, with the source code, from the link.  Please feel free to explore different options and to change the code as you see fit.  I should, however, like to hear of your experiences so that I can keep this paper up-to-date.

No doubt there will be questions and comments, which I will be happy to deal with on-line.

Acknowledgements and Appendix

Acknowledgements

I could not have started upon this quest without having (frequently) to hand a copy of ‘Building Applications and Components with VB.NET’ by Pattison & Hummel, published by Addison-Wesley, ISBN 0-201-73495-8, on which Figure 1 is based and from which I learned so much.  I should also mention Dr Hummel’s webcasts, to be found on msdn.microsoft.com, which are without equal (IMHO).

Secondly, ‘Applied Microsoft .NET Framework Programming in MS Visual Basic .NET’ by Richter & Balena published by Microsoft Press ISBN 0-7356-1787-2, particularly chapter 14.

Francesco Balena’s book ‘Programming Microsoft Visual Basic .NET Version 2003’ (2004) has already been mentioned.