In this document we will describe the fundamental invariants of the InterMezzo file system. First we need to understand how InterMezzo manages its objects and what the underlying invariants are. Second we need to look at the protocols involved in exchanging file objects.
The key concepts a distributed file system needs are:
We hope that our design, unlike Coda/AFS/NFS, is simple enough to support a variety of file sharing semantics. A semantic scheme will likely be attached to a file set.
Several semantic models we would like to support are the following:
In replicator mode, the basic assumption is that caches are up to date, and are kept up to date by forwarding modification logs. A server can add a replicator only if the replicator first fills its cache. Filling a client cache is not simple when the file system may be modified during the synchronization. If modifications are made to the file set during the cache filling procedure, they can be reintegrated after the fill has completed (this might be slightly hairy).
Before modifications can be made to a file set, a client must acquire a file set permit for that file set. Once the permit has been acquired, it may proceed to modify the file set through write back caching and reintegration to the server.
The server propagates the CML to other replicating nodes. All systems confirm which records they have integrated, and the server maintains the last forwarded and last integrated record number for each replicator until every record has been reintegrated. During reintegration the server sends the client the record numbers of successfully reintegrated records; the client keeps track of these so that failed reintegration requests can be handled efficiently.
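The bookkeeping described above can be sketched as follows. This is only an illustration under assumptions: the class name `CmlTracker`, its methods, and the use of consecutive integers as record numbers are hypothetical and are not taken from the InterMezzo sources.

```python
class CmlTracker:
    """Hypothetical server-side bookkeeping for one file set's CML."""

    def __init__(self, replicators):
        self.records = {}    # record number -> record, kept until all have integrated it
        self.next_no = 1     # assumption: record numbers are consecutive integers
        self.forwarded = {r: 0 for r in replicators}    # last record number sent
        self.integrated = {r: 0 for r in replicators}   # last record number confirmed

    def append(self, record):
        """Add a record to the log and return its record number."""
        no = self.next_no
        self.records[no] = record
        self.next_no += 1
        return no

    def forward(self, replicator):
        """Return every record the replicator has not confirmed, in order.

        After a failed request the same call resends from the last
        confirmed record number, so nothing is skipped.
        """
        start = self.integrated[replicator]
        batch = [(n, self.records[n]) for n in sorted(self.records) if n > start]
        if batch:
            self.forwarded[replicator] = batch[-1][0]
        return batch

    def confirm(self, replicator, record_no):
        """Record that the replicator has integrated up to record_no."""
        self.integrated[replicator] = max(self.integrated[replicator], record_no)
        self._prune()

    def _prune(self):
        # A record may be dropped only once every replicator has integrated it.
        done = min(self.integrated.values(), default=0)
        for n in [n for n in self.records if n <= done]:
            del self.records[n]
```

In this sketch a record survives until the slowest replicator confirms it, which matches the requirement that records be kept until reintegrated everywhere.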
Upon breaking a permit, a client must pull its kernel log and reintegrate, or risk conflicts. Before granting a permit to another system, the server must forward the CML to this replicator.
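A minimal sketch of the permit handoff, assuming an in-memory server and clients. The names `PermitServer`, `Client`, `acquire`, `flush_log` and `apply` are hypothetical; the real system exchanges these messages between kernel and user-level components on different hosts.

```python
class Client:
    """Hypothetical client: buffers its own records, applies forwarded ones."""

    def __init__(self, name):
        self.name = name
        self.pending = {}   # file set -> local records not yet reintegrated
        self.applied = {}   # file set -> records forwarded by the server

    def modify(self, server, fileset, record):
        # A permit must be held before the file set is modified.
        if server.holder.get(fileset) is not self:
            server.acquire(fileset, self)
        self.pending.setdefault(fileset, []).append(record)

    def flush_log(self, fileset):
        # Called when the permit is broken: hand over the local log.
        return self.pending.pop(fileset, [])

    def apply(self, fileset, records):
        self.applied.setdefault(fileset, []).extend(records)


class PermitServer:
    """Hypothetical server: one permit holder per file set."""

    def __init__(self):
        self.holder = {}      # file set -> client holding the permit
        self.cml = {}         # file set -> records reintegrated on the server
        self.forwarded = {}   # (file set, client name) -> count of records seen

    def acquire(self, fileset, client):
        log = self.cml.setdefault(fileset, [])
        current = self.holder.get(fileset)
        if current is not None and current is not client:
            # Break the permit: the old holder reintegrates its log first.
            log.extend(current.flush_log(fileset))
            # The old holder wrote these records, so it has seen them all.
            self.forwarded[(fileset, current.name)] = len(log)
        # Forward unseen records to the requester before granting the permit.
        seen = self.forwarded.get((fileset, client.name), 0)
        client.apply(fileset, log[seen:])
        self.forwarded[(fileset, client.name)] = len(log)
        self.holder[fileset] = client
```

The ordering inside `acquire` is the point of the sketch: flush from the old holder, then forward to the new one, and only then grant.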
A client which receives a CML forwarded from the server has several options. If files are open we need to decide on a policy to replace these files or just their data (unlink vs. overwrite). In overwrite mode, the client experiences Unix semantics (data is replaced by the writer). In unlink mode, the client may see stale data.
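The difference between the two policies can be demonstrated on a local POSIX file system. The sketch below is not InterMezzo code; the function names `apply_update` and `demo` are hypothetical, and the behaviour shown relies on POSIX semantics for files that are open while being replaced.

```python
import os
import tempfile


def apply_update(path, data, mode):
    """Replace the contents of `path` using one of the two policies."""
    if mode == "overwrite":
        # Truncate and rewrite the same inode: open readers see the new data.
        with open(path, "wb") as f:
            f.write(data)
    elif mode == "unlink":
        # Write a new inode and rename it over the old name: readers that
        # already hold the file open keep the old inode and its stale data.
        directory = os.path.dirname(path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    else:
        raise ValueError(f"unknown mode {mode!r}")


def demo(mode):
    """Return what a reader that opened the file before the update sees."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file")
        with open(path, "wb") as f:
            f.write(b"old")
        with open(path, "rb") as reader:   # file is open when the update arrives
            apply_update(path, b"new", mode)
            return reader.read()
```

On a POSIX system, `demo("overwrite")` would be expected to return the new contents and `demo("unlink")` the old ones, which is the stale-data case mentioned above.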
When a client is disconnected, it may make changes to a file set, and it should grow the CML for reintegration upon reconnection.
Upon reconnection, the client acquires a permit. This forces the server to forward the records held on the server which were not seen by the client (normally because of the disconnection).
During this reintegration the client must carefully check the versions of the files, since conflicting updates might have been made during the disconnection. The CML contains these validation calls as explicit commands, and allows reintegration to be interrupted to deal with a conflict. In connected mode, the need to execute these commands arises solely from the possibility that permits were broken while files were open for write; for the most part the validation commands can be skipped (if they come from the host holding the permit).
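As a rough sketch of how explicit validation commands make reintegration interruptible, assume each CML record is a tuple `(op, path, version)` with `op` either `"validate"` or `"store"`. This record layout, the `Conflict` exception and the `trusted` parameter are illustrative assumptions, not the actual CML format.

```python
class Conflict(Exception):
    """Raised when a validation command finds an unexpected version."""

    def __init__(self, index, path):
        super().__init__(f"version conflict on {path} at record {index}")
        self.index = index   # position in the CML where replay stopped
        self.path = path


def replay(cml, versions, trusted=False):
    """Apply CML records to `versions`, a dict mapping path -> version.

    With trusted=True (records from the host holding the permit) the
    validation commands are skipped. Returns the number of records replayed.
    """
    for i, (op, path, version) in enumerate(cml):
        if op == "validate":
            # Paths absent from `versions` are treated as version 0.
            if not trusted and versions.get(path, 0) != version:
                raise Conflict(i, path)   # interrupt; caller resolves and resumes
        elif op == "store":
            versions[path] = version
        else:
            raise ValueError(f"unknown CML op {op!r}")
    return len(cml)
```

Because the exception carries the index of the failing record, a caller can resolve the conflict and resume with `replay(cml[index:], ...)` rather than starting over.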
When the server's CML records have been reintegrated, the client gets a permit and reintegrates its records to the server. This latter phase proceeds as in connected mode.
The semantic state in this model consists of permit management (which system has the permit for a file set), the CMLs, and the index numbers of the last records sent and successfully reintegrated by replicators and servers. CML records should be kept on persistent storage until reintegrated by all replicators. If quotas are exceeded due to too many CML records, any offending client must be told to switch to validation mode.
This semantic model is similar to Coda with write back caching (currently under development). InterMezzo makes no assumptions about having seen all CML records.
If a file object (file/directory/symlink) does not have the HAVE_ATTR or HAVE_DATA flags set, the object is validated and possibly re-fetched.
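The flag check can be sketched as below. The bit values, the `CachedObject` class and the `validate`/`fetch` callables are hypothetical stand-ins; only the flag names HAVE_ATTR and HAVE_DATA come from the text.

```python
HAVE_ATTR = 0x1   # hypothetical bit values, chosen for illustration
HAVE_DATA = 0x2


class CachedObject:
    """A cached file, directory or symlink with its validity flags."""

    def __init__(self, path, flags=0):
        self.path = path
        self.flags = flags


def access(obj, validate, fetch):
    """Ensure the object is usable before it is handed to the caller.

    validate(path) returns True if the cached copy is still current;
    fetch(path) refreshes the cached copy from the server.
    """
    wanted = HAVE_ATTR | HAVE_DATA
    if (obj.flags & wanted) != wanted:
        if not validate(obj.path):
            fetch(obj.path)      # re-fetch only if validation fails
        obj.flags |= wanted
    return obj
```

Objects with both flags set are served from the cache without contacting the server, so the cost of validation is paid only on first access or after a flag has been cleared.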
Servers break the callbacks before reintegrating the changes. Callbacks could be made to time out. Permit revocation forces a CML flush, but the risk of conflicts exists for files open for write at that point.
During disconnection, all callbacks are ignored, and after a reconnection all callbacks must be re-established.
Servers maintain a table of clients and files for which these clients have callbacks. To limit callback state, a timeout could be associated with the callbacks.
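A possible shape for such a callback table, with an optional timeout, is sketched here. The class `CallbackTable`, its methods and the explicit `now` argument are assumptions made for the example; a real server would use its own clock and client identifiers.

```python
class CallbackTable:
    """Hypothetical server-side table of (file, client) callbacks."""

    def __init__(self, timeout=None):
        self.timeout = timeout   # None means callbacks never expire
        self.entries = {}        # path -> {client: time the callback was registered}

    def register(self, client, path, now):
        self.entries.setdefault(path, {})[client] = now

    def _live(self, registered, now):
        return self.timeout is None or now - registered < self.timeout

    def break_callbacks(self, path, now, writer=None):
        """Drop all callbacks on `path` and return the clients to notify.

        Expired callbacks and the writer itself need no notification.
        """
        holders = self.entries.pop(path, {})
        return sorted(c for c, t in holders.items()
                      if c != writer and self._live(t, now))
```

With a timeout, a client that has not renewed its callback is simply forgotten, which bounds the state the server must keep at the price of extra validation traffic from that client.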
A file set in validation mode requires no initialization.
FIXME: finish