Object Serialization

Link:  http://java.sun.com/products/jdk/1.1/docs/guide/serialization/spec/serial-arch.doc.html

What is it:  It's the act of taking a data structure (object, array, tree, etc.) and bundling it up into a stream of bytes, such that you can get the original thing back by reading those bytes.

Why have it:  To save the state of a program on disk, to send an object across the net, or to marshal arguments for RMI.

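The Java Scoop:  Two interfaces are important here.  Serializable is a marker interface: the runtime bundles and recreates the object's fields for you, records class information in the stream for verification, and handles the state of serializable superclasses.  Externalizable extends Serializable and hands full control of the format to the class, which must write and read its own contents (including any superclass state) in writeExternal and readExternal.  Normally Serializable is the interface of choice.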
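
The difference between the two interfaces can be sketched as follows. This is only an illustration: the classes PointS and PointE and the helper roundTrip are made-up names, and an in-memory buffer stands in for a file or socket.

```java
import java.io.*;

// Hypothetical class: default serialization, the runtime handles the fields.
class PointS implements Serializable {
    int x, y;
    PointS(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical class: Externalizable, the class writes and reads its own state.
class PointE implements Externalizable {
    int x, y;
    public PointE() { }   // a public no-arg constructor is required
    PointE(int x, int y) { this.x = x; this.y = y; }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        // Fields must be read in the order writeExternal wrote them.
        x = in.readInt();
        y = in.readInt();
    }
}

class InterfaceDemo {
    // Serialize to an in-memory buffer and read the object back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(o);
        out.close();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        return in.readObject();
    }

    public static void main(String[] args) throws Exception {
        PointS s = (PointS) roundTrip(new PointS(1, 2));
        PointE e = (PointE) roundTrip(new PointE(3, 4));
        System.out.println(s.x + "," + s.y + " " + e.x + "," + e.y);
    }
}
```

Callers use writeObject and readObject the same way for both classes; only the code inside the class differs.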

What can you serialize: Anything that implements the Serializable interface.  Many classes in the standard class library do, with the exception of things like file and socket handles.  Even the AWT components are serializable!  For your own classes, you need to implement Serializable yourself; the nearest non-serializable superclass (Object, in the simplest case) must have an accessible no-argument constructor.

What about new types of objects?  If your program deserializes an object whose class is not already loaded, that class will be searched for and loaded by the standard mechanisms (CLASSPATH); if it cannot be found, readObject throws a ClassNotFoundException.  How could this situation arise?  By adding new subclasses to an object hierarchy.

How is this more powerful than just printing to a file and reading from a file?  Because it lets you store and retrieve very large, complex objects with just a few method calls.

Example Writing:
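        // Serialize a string and today's date to a file.
        // Needs: import java.io.*; import java.util.Date;
        FileOutputStream f = new FileOutputStream("tmp");
        ObjectOutputStream s = new ObjectOutputStream(f);
        s.writeObject("Today");
        s.writeObject(new Date());
        s.flush();
        s.close();   // release the file handle when done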

First an OutputStream, in this case a FileOutputStream, is needed to receive the bytes. Then an ObjectOutputStream is created that writes to the OutputStream. Next, the string "Today" and a Date object are written to the stream. More generally, objects are written with the writeObject method and primitives are written to the stream with the methods of DataOutput.

The writeObject method serializes the specified object and recursively traverses its references to other objects in the object graph to create a complete serialized representation of the graph. Within a stream, the first reference to any object results in the object being serialized or externalized and in the assignment of a handle for that object. Subsequent references to that object are encoded as the handle alone, which gives a very compact representation. Using object handles also preserves the sharing and circular references that occur naturally in object graphs.
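
The effect of handles can be observed with a small round trip. The sketch below is illustrative only: Node and HandleDemo are made-up names, and the stream is an in-memory buffer rather than a file.

```java
import java.io.*;

// Hypothetical node type used only for this illustration.
class Node implements Serializable {
    String name;
    Node next;
    Node(String name) { this.name = name; }
}

class HandleDemo {
    // Write two references to the same node, then read both back.
    static Object[] writeTwiceAndRead(Node n) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(n);   // first reference: the object graph is serialized
        out.writeObject(n);   // second reference: only a handle is written
        out.close();

        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        return new Object[] { in.readObject(), in.readObject() };
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;           // circular reference
        Object[] copies = writeTwiceAndRead(a);
        Node first = (Node) copies[0];
        System.out.println(copies[0] == copies[1]);     // sharing is preserved
        System.out.println(first.next.next == first);   // the cycle is preserved
    }
}
```

Both reads return the same restored object, and the restored graph still contains the cycle between the two nodes.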

What about primitive data types (not objects)?  Primitive data types are written to the stream with the methods in the DataOutput interface, such as writeInt, writeFloat, or writeUTF. Individual bytes and arrays of bytes are written with the methods of OutputStream. All primitive data is written to the stream in block-data records prefixed by a marker and the length. Putting the data in records allows it to be skipped if necessary.
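
A minimal sketch of writing and reading primitives through the same object streams. PrimitiveDemo and writeAndRead are made-up names, and the values are arbitrary.

```java
import java.io.*;

class PrimitiveDemo {
    // Write three primitive values and read them back from an in-memory buffer.
    static String writeAndRead() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeInt(42);
        out.writeFloat(1.5f);
        out.writeUTF("hello");
        out.close();

        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        // Primitives must be read back in the order they were written.
        int i = in.readInt();
        float f = in.readFloat();
        String s = in.readUTF();
        in.close();
        return i + " " + f + " " + s;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAndRead());
    }
}
```

The stream records no field names for these values, so the reader has to know the order and types the writer used.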

Example Reading:
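        // Deserialize a string and date from a file, in the order they were written.
        // Needs: import java.io.*; import java.util.Date;
        FileInputStream in = new FileInputStream("tmp");
        ObjectInputStream s = new ObjectInputStream(in);
        String today = (String)s.readObject();
        Date date = (Date)s.readObject();
        s.close();   // release the file handle when done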

The Serializable Interface

Object Serialization produces a stream with information about the Java classes for the objects that are being saved. For Serializable objects, sufficient information is kept to
restore those objects even if a different (but compatible) version of the class's implementation is present. The interface Serializable is defined to identify classes that
implement the Serializable protocol:

package java.io;

public interface Serializable {};

A Serializable object:

      Must implement the java.io.Serializable interface.
      Must mark its fields that are not to be persistent with the transient keyword.
      Can implement a writeObject method to control what information is saved, or to append additional information to the stream.
      Can implement a readObject method so it can read the information written by the corresponding writeObject method, or to update the state of the object after it has been restored.
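
The points in the list above can be sketched in one class. Everything named here (Account, its fields, CustomDemo, copy) is hypothetical; the extra length value exists only to show information being appended and read back.

```java
import java.io.*;

// Hypothetical class for illustration.
class Account implements Serializable {
    String owner;
    transient String password;   // not written to the stream
    transient long loadedAt;     // recomputed after the object is restored

    Account(String owner, String password) {
        this.owner = owner;
        this.password = password;
    }

    // Called by ObjectOutputStream; must be private with this signature.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();        // write the non-transient fields
        out.writeInt(owner.length());    // append additional information
    }

    // Called by ObjectInputStream; must be private with this signature.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();          // restore the non-transient fields
        int len = in.readInt();          // read what writeObject appended
        if (len != owner.length()) {
            throw new InvalidObjectException("owner length mismatch");
        }
        loadedAt = System.currentTimeMillis();   // update state after restore
    }
}

class CustomDemo {
    // Serialize the account to memory and return the restored copy.
    static Account copy(Account a) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(a);
        out.close();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        return (Account) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Account restored = copy(new Account("alice", "s3cret"));
        System.out.println(restored.owner + " " + restored.password);
    }
}
```

In the restored copy the owner field comes back from the stream, while the transient password field is left at its default value of null.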

ObjectOutputStream and ObjectInputStream are designed and implemented to allow the Serializable classes they operate on to evolve. Evolve in this context means to
allow changes to the classes that are compatible with the earlier versions of the classes.

Protecting Sensitive Information

When developing a class that provides controlled access to resources, care must be taken to protect sensitive information and functions. During deserialization, the private
state of the object is restored. For example, a file descriptor contains a handle that provides access to an operating system resource. Being able to forge a file descriptor
would allow some forms of illegal access, since restoring state is done from a stream. Therefore, the serializing runtime must take the conservative approach and not trust the
stream to contain only valid representations of objects. To avoid compromising a class, the sensitive state of an object must not be restored from the stream, or it must be
reverified by the class. Several techniques are available to protect sensitive data in classes.

The easiest technique is to mark fields that contain sensitive data as private transient. Transient and static fields are not serialized or deserialized. Marking the field will
prevent the state from appearing in the stream and from being restored during deserialization. Since writing and reading (of private fields) cannot be superseded outside of
the class, the class's transient fields are safe.

The stream is not encrypted: anything written to it can be read by anyone who obtains the bytes.  Particularly sensitive classes should therefore not be serialized at all. To accomplish this, the class should implement neither the Serializable nor the Externalizable interface.
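
As a rough illustration of what happens when a class implements neither interface, the sketch below tries to write such an object. SecretKeyHolder and RefuseDemo are made-up names.

```java
import java.io.*;

// Hypothetical sensitive class: implements neither Serializable nor Externalizable.
class SecretKeyHolder {
    private final byte[] key = { 1, 2, 3 };
}

class RefuseDemo {
    // Returns true if the stream refused to serialize the object.
    static boolean isRefused(Object o) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            out.writeObject(o);
            return false;
        } catch (NotSerializableException e) {
            return true;   // thrown for classes that do not implement Serializable
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isRefused(new SecretKeyHolder()));
        System.out.println(isRefused("plain text"));
    }
}
```

The same exception is raised if such an object is reachable through a non-transient field of a Serializable object.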

Some classes may find it beneficial to allow writing and reading but specifically handle and revalidate the state as it is deserialized. The class should implement writeObject
and readObject methods to save and restore only the appropriate state. If access should be denied, throwing a NotSerializableException will prevent further access.
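
A minimal sketch of revalidating state during deserialization. Percent and ValidateDemo are made-up names. The sketch throws InvalidObjectException for a failed check, which is a choice made here rather than the NotSerializableException mentioned above, and it simulates a forged stream by setting the field directly before writing.

```java
import java.io.*;

// Hypothetical class whose constructor enforces 0 <= value <= 100.
class Percent implements Serializable {
    int value;   // package-private so the demo can simulate a forged stream

    Percent(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        this.value = value;
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Do not trust the stream: re-check the invariant the constructor enforces.
        if (value < 0 || value > 100) {
            throw new InvalidObjectException("value out of range: " + value);
        }
    }
}

class ValidateDemo {
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(o);
        out.close();
        return buf.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
        return in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Percent bad = new Percent(50);
        bad.value = 500;                  // bypass the constructor check
        try {
            fromBytes(toBytes(bad));
            System.out.println("accepted");
        } catch (InvalidObjectException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because deserialization does not run the class's constructor, any check the constructor performs has to be repeated in readObject.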