Sunday 13 March 2011

Versioning in Serialization

The ability to save and restore objects raises an interesting question: what happens when an object has been stored for so long that, upon restoration, its format has been superseded by a new, different version of the class?
The stream reading the serialized representation is responsible for accounting for any differences. The intent is that a newer version of a Java class should be able to interoperate with older representations of the same class, as long as there have not been certain changes in the class structure. The same does not necessarily hold true for an older version of the class, which may not be able to effectively deal with a newer representation.
So, we need some way to determine at runtime (or more appropriately, deserialization-time) whether we have the necessary backward compatibility.
Since Java 1.1, a class can declare a version number for this purpose. A specific class variable, serialVersionUID (the Stream Unique Identifier, or SUID), identifies which serialized representations the class considers compatible: a stream can be deserialized only if the SUID recorded in it equals the SUID of the class doing the reading. The SUID is declared as follows:
static final long serialVersionUID = 2L;

This particular declaration and assignment marks the class as version 2. The runtime compares SUIDs for equality, not order, so this class is not compatible with an object written by version 1 of the class, and every object it writes carries a SUID of 2. If it encounters a version 1 object in a stream (such as when restoring from a file), an InvalidClassException will be thrown.

The SUID is a measure of backward compatibility. The same SUID can be used for multiple representations of a class, as long as newer versions can still read the older versions.
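As a minimal sketch of the declaration in context, the following program gives a class an explicit SUID and round-trips an instance through a byte array. The class name Account, its fields, and the helper methods write() and read() are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SuidDeclarationDemo {
    // Hypothetical example class; the name and fields are illustrative only.
    static class Account implements Serializable {
        // Explicit SUID: streams written with this value remain readable
        // as long as later versions of Account keep the same number.
        private static final long serialVersionUID = 2L;
        String owner;
        int balance;

        Account(String owner, int balance) {
            this.owner = owner;
            this.balance = balance;
        }
    }

    // Serialize an object into a byte array.
    static byte[] write(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Restore an object from a byte array.
    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account restored = (Account) read(write(new Account("Ada", 100)));
        System.out.println(restored.owner + " " + restored.balance);
    }
}
```

Because writer and reader are the same class here, the SUIDs trivially match; the declaration starts to matter once the class is edited and recompiled between writing and reading.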

If you do not explicitly assign a SUID, a default value is computed at runtime when the class is serialized or deserialized. This default SUID is a hash computed from the class name, interfaces, methods, and fields, so editing any of these can change it. The hash is produced with the Secure Hash Algorithm (SHA-1). Refer to the Sun Java documentation for details.

The JDK (MRJ) utility program serialver will display the default (hash) SUID for a class. You can then paste this value in any subsequent, compatible versions of the class. (It is not required in the initial version of the class.) As of this writing the serialver program has not been included in the MRJ SDK, but hopefully will be in the future.

How can you obtain the SUID for a class at runtime to determine compatibility? First, query the Virtual Machine for information about the class represented in the stream, using methods of the class ObjectStreamClass. Here is how we can get the SUID of the current version of the class named MyClass, as known to the Virtual Machine:
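// Requires java.io.ObjectStreamClass. Class.forName() can throw ClassNotFoundException,
// and lookup() returns null if the class is not serializable.
ObjectStreamClass descriptor = ObjectStreamClass.lookup(
        Class.forName( "MyClass" ) );
long theSUID = descriptor.getSerialVersionUID();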
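A self-contained version of this lookup might look like the sketch below. The three nested classes and the helper suidOf() are hypothetical names introduced for illustration. The sketch also handles the case where ObjectStreamClass.lookup() returns null, which it does for a class that is not serializable.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SuidLookupDemo {
    // Hypothetical classes used only for illustration.
    static class WithExplicitSuid implements Serializable {
        private static final long serialVersionUID = 2L;
        int value;
    }

    static class WithDefaultSuid implements Serializable {
        int value; // no serialVersionUID declared: the runtime computes one
    }

    static class NotSerializable { }

    // Returns the SUID the VM associates with a class, or null when the
    // class is not serializable (lookup() returns null in that case).
    static Long suidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        return descriptor == null ? null : descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + suidOf(WithExplicitSuid.class));
        System.out.println("default:  " + suidOf(WithDefaultSuid.class));
        System.out.println("none:     " + suidOf(NotSerializable.class));
    }
}
```

The value printed for WithDefaultSuid is the computed hash, the same number the serialver utility would report for that class.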

Now when we restore an Externalizable object, we can compare its SUID to the class SUID just obtained. If there is a mismatch, we should take appropriate action. This may involve telling the user that we cannot handle the restoration, or we may have to assign and use some default values.

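If we are restoring a Serializable object, the runtime will check the SUID for us when it reads the class description from the stream, before any field values are restored. If you supply your own readObject() method, that check has already passed by the time your method runs, so the method is the place to deal with missing or default values rather than with the SUID itself.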
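A class-supplied readObject() is also a convenient place to substitute a fallback when the stream carries no usable value for a field. The sketch below assumes a hypothetical Settings class with a single theme field; the fallback string "default" and the helper roundTrip() are likewise illustrative choices, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ReadObjectDefaultsDemo {
    // Hypothetical class; the field and its fallback value are illustrative.
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        String theme; // may be null after restoration

        Settings(String theme) {
            this.theme = theme;
        }

        // Invoked by the serialization runtime while restoring a Settings object.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();   // restore the fields in the usual way
            if (theme == null) {
                theme = "default";    // substitute a fallback value
            }
        }
    }

    // Write the object to a byte array and read it straight back.
    static Settings roundTrip(Settings settings) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(settings);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Settings) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Settings(null)).theme);
        System.out.println(roundTrip(new Settings("dark")).theme);
    }
}
```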

How do you determine which changes between class versions are acceptable? An earlier version of a class, which may contain fewer fields, may run into problems when it tries to read a serialized object written by a later version. Classes tend to gain fields as they evolve, so the earlier version does not know about the newer fields. In the other direction, when a newer version of a class looks for fields that are not present in an older stream, it assigns default values to those fields.

This can be seen in the example code when we add a new field to the MyVersionObject class but don't update the SUID. The new class can still read the older stream representation, even though no values exist in that stream for the new fields. The runtime assigns 0 to the new int and null to the new String, and doesn't throw any exceptions. If we then increment the SUID (from 1 to 2) to indicate that we do not consider older class versions compatible with this version, the runtime throws an InvalidClassException when attempting to read a version 1 object from the stream.
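Both outcomes can be reproduced in a single program with a small trick, sketched below. It does not use the MyVersionObject example code; instead, three hypothetical classes (PersonV1, PersonV2, PersonV3) stand in for successive versions of one class. The sketch serializes a PersonV1 and then overwrites the last character of the class name recorded in the stream, so that the bytes look as if an older version of PersonV2 or PersonV3 had written them. The byte offset relies on the serialization stream layout (4 header bytes, 2 type codes, a 2-byte name length, then the name); treat this purely as a simulation device, not as something to do in production code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class VersionSkewDemo {
    // Stand-in for the original class version: one field.
    static class PersonV1 implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "Ada";
    }

    // Stand-in for a later version: two added fields, SUID unchanged.
    static class PersonV2 implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        int age;
        String nickname;
    }

    // Stand-in for a later version whose SUID was incremented.
    static class PersonV3 implements Serializable {
        private static final long serialVersionUID = 2L;
        String name;
        int age;
        String nickname;
    }

    // Serialize a PersonV1, then patch the final digit of the class name in
    // the stream so the data appears to belong to PersonV2 or PersonV3.
    static byte[] writeV1RenamedTo(char versionDigit) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new PersonV1());
        }
        byte[] data = bytes.toByteArray();
        int nameLength = PersonV1.class.getName().getBytes(StandardCharsets.UTF_8).length;
        int lastNameByte = 8 + nameLength - 1; // name starts at offset 8
        data[lastNameByte] = (byte) versionDigit;
        return data;
    }

    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Same SUID, extra fields: restoration succeeds with default values.
        PersonV2 person = (PersonV2) read(writeV1RenamedTo('2'));
        System.out.println(person.name + " " + person.age + " " + person.nickname);

        // Different SUID: the runtime rejects the stream.
        try {
            read(writeV1RenamedTo('3'));
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In a real project the two versions would be the same class compiled at different times; the renaming here only lets one JVM hold "old" and "new" definitions side by side.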

The Sun documentation lists the various class format changes that can adversely affect the restoration of an object. A few of these include:

  • Deleting a field, or changing it from non-static or non-transient to static or transient, respectively.
  • Changing the position of classes in a hierarchy.
  • Changing the data type of a primitive field.
  • Changing the interface for a class from Serializable to Externalizable (or vice-versa).

On the other hand, not every change will have a negative effect. Here are some changes to class versions that do not have a detrimental effect on object behavior:

  • Adding fields, which will result in default values (based on data type) being assigned to the new fields upon restoration.
  • Adding classes to the hierarchy. The class structure information is included in the stream, so an object can still be created; since the stream holds no data for the added class, its fields will be set to the default values.
  • Adding or removing the writeObject() or readObject() methods.
  • Changing the access modifier (public, private, etc.) for a field, since it is still possible to assign a value to the field.
  • Changing a field from static or transient to non-static or non-transient, respectively.
