InterMezzo 1.0 KML and Expect Protocols
The InterMezzo filesystem keeps sets of files on multiple hosts
synchronized. It sits on top of the native filesystems on each
host and keeps track of updates to the filesystems in such a way
that it can synchronize the changes between multiple hosts. In
this document we describe the architectures and protocols that
InterMezzo uses to keep files synchronized.
This description is in its very early stages...
Coherence and Granularity
InterMezzo guarantees only very loose coherence between the
filesystems. Files are only ever handled as complete units,
changes are not propagated until the file is closed for writing,
and changes on one system are not necessarily reflected on
another immediately. In InterMezzo 1.0 whole filesystems are
replicated and only one host may have the write lock for that
filesystem at any one time.
Presto
Presto is the kernel module for InterMezzo. It implements the
various operations associated with the InterMezzo file system
under VFS and creates pseudo devices for communication with
Lento.
Lento
Lento is a user-space daemon which handles file transfers and
other caching issues on behalf of Presto. There is one Lento
per mounted InterMezzo filesystem.
The KML File
There is one KML file per mounted InterMezzo filesystem. The
KML file contains records of changes to the filesystem, and
taken as a whole the KML file can provide a script for building
a replica of the whole filesystem.
The KML file is a series of binary records, each of which
represents a single modification to the filesystem. Each record
is self-contained in that it does not have references to other
records, a property which makes the records easy to move around.
The records are of variable length, and the length of the record
is stored at the beginning and end of each record to facilitate
moving forward or backward through the file. A complete
description of the allowed KML record formats doesn't exist yet.
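Since the record formats are not yet specified, the following is only a
sketch of the framing idea: each record carries its own length at both
ends, so a reader can step through the file in either direction. The
4-byte little-endian length word, the choice to count the two length
words in the stored length, and the names frame, walk_forward and
walk_backward are all assumptions made for illustration.

```python
import struct

# Assumption: a 4-byte little-endian length word. The text only says that the
# record length is stored at the beginning and end of each record.
LEN = struct.Struct("<I")


def frame(payload: bytes) -> bytes:
    """Wrap a payload with the total record length at both ends."""
    total = 2 * LEN.size + len(payload)
    word = LEN.pack(total)
    return word + payload + word


def walk_forward(kml: bytes):
    """Yield (offset, payload) for each record, front to back."""
    pos = 0
    while pos < len(kml):
        (total,) = LEN.unpack_from(kml, pos)  # leading length word
        yield pos, kml[pos + LEN.size : pos + total - LEN.size]
        pos += total


def walk_backward(kml: bytes):
    """Yield (offset, payload) for each record, back to front."""
    end = len(kml)
    while end > 0:
        (total,) = LEN.unpack_from(kml, end - LEN.size)  # trailing length word
        start = end - total
        yield start, kml[start + LEN.size : end - LEN.size]
        end = start


# Placeholder payloads; real KML records are binary, not text.
kml = frame(b"mkdir /a") + frame(b"create /a/f") + frame(b"unlink /a/f")
```

Because every record is self-contained, the byte range of any record
found this way can be copied elsewhere without fixing up references.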
The Expect File
There is one Expect file per mounted InterMezzo filesystem. The
Expect file contains information about how this host is
synchronized with the other hosts by holding pointers into this
and other hosts' KML files. This information is stored in the
filesystem so that it will be persistent across reboots.
The Expect file has four pieces of information for each remote
host.
-
next_to_expect. A pointer to the next record in the
remote host's KML file that we expect it to send to us. If we
get a set of records that does not start at this value then
a message has been dropped somewhere and we need to
renegotiate with that host. This is NOT a hint.
-
next_to_send. A pointer to the next record in our KML
file that we intend to send to the remote host. This is just
a hint because we advance next_to_send as soon as we've sent
data to another host, not when we've gotten confirmation that
it has been received and processed. When we send KML records
to the remote host we send the value of next_to_send (plus the
gap, below) to tell the remote host where the records come
from in our KML.
-
confirmed. A pointer to the beginning of the next
record in our KML file that has not yet been confirmed as
received and processed by the remote host. This is NOT a
hint.
-
gap. An adjustment to add to next_to_send before
sending it to the remote host. This lets us move records
forward or back in our local KML file while preserving the
externally visible file locations. This is NOT a hint.
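The four fields above can be pictured as one small record per remote
host. The sketch below is not the on-disk layout, which this document
does not define; the class name ExpectEntry, the function accept_batch,
and the treatment of every pointer as a byte offset are assumptions. It
also shows why next_to_expect is not a hint: a batch that starts
anywhere else is refused.

```python
from dataclasses import dataclass


@dataclass
class ExpectEntry:
    """Per-remote-host state. Field names follow the text; layout is assumed."""

    next_to_expect: int = 0  # offset in the remote KML we expect next (not a hint)
    next_to_send: int = 0    # local offset of the next record to send (a hint)
    confirmed: int = 0       # local offset of the first unconfirmed record
    gap: int = 0             # added to local offsets before they are sent


def accept_batch(entry: ExpectEntry, start: int, nbytes: int) -> bool:
    """Accept a batch of remote records only if it starts where we expect.

    Advancing next_to_expect by the batch length in bytes is an assumption.
    """
    if start != entry.next_to_expect:
        return False  # something was dropped; the caller should renegotiate
    entry.next_to_expect += nbytes
    return True


# Hypothetical usage: one entry per remote host, keyed by host name.
expect = {"hostB": ExpectEntry(next_to_expect=128)}
in_order = accept_batch(expect["hostB"], start=128, nbytes=64)
replayed = accept_batch(expect["hostB"], start=128, nbytes=64)
```

Here the first batch is accepted and the second, which repeats an old
starting offset, is rejected.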
Legal Transformations of the KML and Expect Files
In order to maintain consistency, only certain kinds of
transformations to the KML and Expect files are allowed, and
generally they have to be done together using transactions to
make sure the system remains in a coherent state.
-
Append a Record to the KML File. This is the operation
that the normal VFS file operations end up using. The record
is appended to the KML file, and no modifications are made to
the Expect file.
-
Incorporate a Remote KML Record. In addition to
performing the operation and appending the record to the local
KML file, increment the next_to_expect for that host.
Modifies the KML and Expect files.
-
Send KML Records to a Remote Host. A block of KML
records is read from the KML file starting at next_to_send,
and are transmitted to the remote machine. next_to_send is
incremented by the number of bytes read. We effectively get a
read lock on this section of the KML file. KML is read but not
modified, and the Expect file is modified.
-
Receive Confirmation of KML Processing. We receive
confirmation from a remote host that a set of records starting
at a given point and with a certain byte length has been
received and processed. These offsets are from a remote host
so we have to subtract off the gap for that host, then compare
with what we think the confirmed pointer is, then move the
confirmed pointer. There are no KML modifications.
-
Optimize a Section of the KML File. We obtain a write
lock on the section to be optimized, then read the section in,
perform whatever optimizations we desire, then write it out
again. The newly written section must be no larger than the
previous, and if it is smaller a NOP block is inserted to fill
out the space either before or after the new section. If this
section is at the end of the KML file then the KML file can be
truncated to remove the NOP block at the end. Then the write
lock is relinquished. The KML file is modified and the Expect
file is not.
-
Punch Out a Section of the KML File. The section to be
removed must not have outstanding read or write locks, and it
can only have NOP records in it. File system magic is then
performed to release the appropriate file blocks to produce a
sparse file. The Expect file is not changed.
-
Front-Truncation of the KML File. Instead of producing a
sparse file you can remove the beginning of the KML file.
Like the punch operation, the section to be removed should
have no outstanding read or write locks and should have only
NOP operations. The file should then be truncated and the gap
values for all of the remote hosts adjusted in one
transaction.
-
Skip a NOP Block in the KML File. The next_to_send and
confirmed pointers must both be pointing to the beginning of a
NOP block. Then next_to_send, confirmed, and gap can all be
incremented by the size of the NOP block. No changes are made
in the KML file.
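The send and confirmation steps in the list above come down to simple
offset arithmetic, sketched here under stated assumptions. The names
Peer, send_records and receive_confirmation are hypothetical, locking
and transactions are left out, and rejecting a confirmation that does
not line up with the confirmed pointer is an assumption, since the text
does not say what happens on a mismatch.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """The Expect fields used when sending to one remote host."""

    next_to_send: int = 0
    confirmed: int = 0
    gap: int = 0


def send_records(kml: bytes, peer: Peer, nbytes: int):
    """Read a block at next_to_send and return (wire_offset, data).

    The caller is assumed to pick nbytes so the block ends on a record
    boundary. The KML itself is only read, never modified.
    """
    data = kml[peer.next_to_send : peer.next_to_send + nbytes]
    wire_offset = peer.next_to_send + peer.gap  # externally visible location
    peer.next_to_send += len(data)  # advanced on send, before any confirmation
    return wire_offset, data


def receive_confirmation(peer: Peer, wire_offset: int, nbytes: int) -> bool:
    """Translate a remote offset back to a local one and move confirmed."""
    local = wire_offset - peer.gap  # subtract off the gap for this host
    if local != peer.confirmed:
        return False  # assumption: an out-of-order confirmation is ignored
    peer.confirmed = local + nbytes
    return True


# Hypothetical values: a 100-byte KML and a peer whose gap is 40 bytes.
kml = bytes(100)
peer = Peer(next_to_send=10, confirmed=10, gap=40)
wire_offset, data = send_records(kml, peer, 30)
acked = receive_confirmation(peer, wire_offset, len(data))
```

Between the two calls next_to_send is already ahead of confirmed, which
is why next_to_send is only a hint while confirmed is not.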
Scenarios
Here we show how the above system components act in concert to
keep filesystems coherent.
Steve Karmesin
Last modified: Thu Apr 27 16:41:44 MDT 2000
$Id: KML-Expect.html,v 1.1 2000/04/27 22:43:16 karmesin Exp $