Evolution of Groupware for TP/Business Applications:
A Database Perspective on Lotus Domino/Notes
IBM Almaden Research Center
650 Harry Road, K01/B1
San Jose, CA 95120, USA
+1 408 927 1733
In this short write-up, I introduce some of the database aspects of Lotus Domino/Notes. At HPTS99, I would like to give a talk that focuses on these aspects, thereby illustrating the evolution of groupware for business/transaction processing applications.
More than a decade ago, Iris Associates, now a subsidiary of Lotus, pioneered the concept of groupware. Lotus Notes was subsequently released in 1989 with its own support for persistent storage management (i.e., without using a DBMS). Although it was initially designed as a workgroup product for a small number of users, it has been enhanced extensively over the years, allowing it to be successfully deployed in many large enterprises. At the end of 1998, it had an installed base of 34 million seats. In April 1999, Release 5 (R5) of Domino/Notes, which incorporates many scalability and data integrity enhancements, was released.
Since Notes was enabled for the Internet, the name Domino has been used for what was previously called the Notes Server, and the name Notes for the Notes Client. Because the database functionality supported in the client and the server is almost identical, the two names are used interchangeably here.
Unlike RDBMSs, Notes has supported semi-structured data management from the very beginning, and this remains one of its unique features. Notes supports the storage and manipulation of documents (notes) that can contain structured as well as unstructured data (e.g., audio, video). Every document in a Notes DB can be structured differently (e.g., in the number and types of its fields) from every other document in the same DB, and fields can be deleted from or added to a document at any time. These characteristics make the product ideally suited to storing web pages and XML data.
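To make the data model concrete, here is a minimal Python sketch of schema-less documents. The class names Document and DocumentStore are hypothetical and are not part of any Notes API; a plain dictionary stands in for the real on-disk note format. The sketch shows only that two documents in the same store can carry different fields, and that fields can be added or removed per document without any schema change.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical semi-structured document: a bag of named fields."""
    unid: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique id
    fields: Dict[str, Any] = field(default_factory=dict)

    def set(self, name: str, value: Any) -> None:
        # Adding or overwriting a field requires no schema change.
        self.fields[name] = value

    def remove(self, name: str) -> None:
        # Dropping a field affects only this document.
        self.fields.pop(name, None)


class DocumentStore:
    """Hypothetical DB holding documents of arbitrary, differing shapes."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}

    def add(self, doc: Document) -> str:
        self.docs[doc.unid] = doc
        return doc.unid

    def find(self, name: str, value: Any) -> List[Document]:
        # Documents that lack the field are skipped rather than rejected.
        return [d for d in self.docs.values()
                if name in d.fields and d.fields[name] == value]


store = DocumentStore()
memo = Document()
memo.set("Form", "Memo")
memo.set("Subject", "Q3 plan")
memo.set("Attachment", b"\x00\x01")  # binary data alongside text fields
photo = Document()
photo.set("Form", "Photo")
photo.set("Caption", "Offsite")
store.add(memo)
store.add(photo)
memo.remove("Attachment")  # one document's shape changes in place
print(sorted(memo.fields), sorted(photo.fields))  # ['Form', 'Subject'] ['Caption', 'Form']
```

Queries simply skip documents that lack the requested field, which is the behavior a schema-less store needs when documents of many shapes share one DB.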
All data of a given DB is stored in a single file dedicated to that DB, and a Domino server can manage any number of DBs. Notes stores its data on disk in a machine-independent format, so a DB file can be copied in binary form between dissimilar machine architectures (e.g., PC and RISC) without problems. Because of the unstructured nature of the Notes data model, DBs as well as documents are designed to be self-describing in a location-independent fashion. Structured fields and multimedia data (e.g., attachments) are managed by different storage mechanisms. Sophisticated B+-trees are used to manage indexes (materialized views). These indexes are not kept up to date synchronously as documents are updated; instead, various policies can be specified for refreshing them.
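Deferred index maintenance can be sketched as follows, under stated assumptions: a sorted Python list maintained with bisect stands in for the B+-tree, and the policy names "manual", "on_read" and "batch" are hypothetical illustrations, not actual Domino view-refresh options. Document writes only record which documents changed; the view catches up when its policy says so.

```python
import bisect
from typing import Any, Callable, Dict, List, Set, Tuple

Doc = Dict[str, Any]


class DeferredView:
    """Sorted index (materialized view) refreshed lazily, not on every write.

    Assumptions: a sorted list stands in for a B+-tree; the policy names
    "manual", "on_read" and "batch" are hypothetical.
    """

    def __init__(self, docs: Dict[str, Doc], key: Callable[[Doc], Any],
                 policy: str = "on_read", batch: int = 10) -> None:
        self.docs = docs
        self.key = key
        self.policy = policy
        self.batch = batch
        self.entries: List[Tuple[Any, str]] = []  # (sort key, doc id)
        self.pending: Set[str] = set()            # ids changed since last refresh

    def note_update(self, doc_id: str) -> None:
        # A write only records the change; the index itself is untouched.
        self.pending.add(doc_id)
        if self.policy == "batch" and len(self.pending) >= self.batch:
            self.refresh()

    def refresh(self) -> None:
        # Remove entries for changed documents, then re-insert the current
        # version of each one that still exists.
        self.entries = [e for e in self.entries if e[1] not in self.pending]
        for doc_id in self.pending:
            doc = self.docs.get(doc_id)  # None: the document was deleted
            if doc is not None:
                bisect.insort(self.entries, (self.key(doc), doc_id))
        self.pending.clear()

    def read(self) -> List[str]:
        if self.policy == "on_read" and self.pending:
            self.refresh()
        return [doc_id for _, doc_id in self.entries]


docs = {"a": {"Subject": "budget"}, "b": {"Subject": "agenda"}}
view = DeferredView(docs, key=lambda d: d["Subject"], policy="manual")
for doc_id in docs:
    view.note_update(doc_id)
print(view.read())  # [] : a manual view stays stale until refreshed
view.refresh()
print(view.read())  # ['b', 'a'] : ordered by Subject
```

The trade-off is the one the text implies: writes stay cheap because they never touch the index, at the cost of readers possibly seeing a stale view until the chosen policy triggers a refresh.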
From the beginning, support for replication and disconnected operation has been one of the most significant and innovative features of Notes. Initially, concurrent updates were checked for conflicts at document granularity; subsequent enhancements have made it possible to check for conflicts at field granularity. The replication mechanism offers a great deal of flexibility with respect to which replica to synchronize with and when. Timestamps are used to decide which documents need to be synchronized, and tombstones are used to handle document deletions.
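The following Python sketch illustrates one-way, timestamp-driven replication with tombstones and field-level conflict detection. It is a simplification under stated assumptions, not the actual Notes algorithm: a single shared counter (the hypothetical next_time) stands in for timestamps, each replica remembers when it last pulled from each peer, and a conflict is declared when both sides changed the same field since that point with different values. How such conflicts are then resolved or recorded is not modeled.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: one shared counter plays the role of synchronized timestamps.
_clock = itertools.count(1)


def next_time() -> int:
    return next(_clock)


@dataclass
class Doc:
    fields: Dict[str, Tuple[object, int]] = field(default_factory=dict)  # name -> (value, time)
    deleted_at: int = 0  # nonzero means this entry is a tombstone

    def modified(self) -> int:
        return max([t for _, t in self.fields.values()] + [self.deleted_at])


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Dict[str, Doc] = {}
        self.last_pull: Dict[str, int] = {}  # peer name -> time of last pull

    def write(self, doc_id: str, fname: str, value: object) -> None:
        self.docs.setdefault(doc_id, Doc()).fields[fname] = (value, next_time())

    def delete(self, doc_id: str) -> None:
        # Keep a tombstone instead of erasing, so the deletion can replicate.
        self.docs.setdefault(doc_id, Doc()).deleted_at = next_time()


def pull(dst: Replica, src: Replica) -> List[Tuple[str, str]]:
    """Copy changes from src into dst; return (doc id, field) conflicts."""
    since = dst.last_pull.get(src.name, 0)
    conflicts = []
    for doc_id, sdoc in src.docs.items():
        if sdoc.modified() <= since:
            continue  # timestamps show nothing changed since the last pull
        ddoc = dst.docs.setdefault(doc_id, Doc())
        ddoc.deleted_at = max(ddoc.deleted_at, sdoc.deleted_at)  # propagate tombstone
        for fname, (value, t) in sdoc.fields.items():
            if t <= since:
                continue
            local = ddoc.fields.get(fname)
            if local is not None and local[1] > since and local[0] != value:
                conflicts.append((doc_id, fname))  # both sides changed this field
            elif local is None or t > local[1]:
                ddoc.fields[fname] = (value, t)
    dst.last_pull[src.name] = next_time()
    return conflicts


a, b = Replica("A"), Replica("B")
a.write("d1", "Subject", "Plan")
a.write("d1", "Owner", "ann")
pull(b, a)
a.write("d1", "Subject", "Plan v2")  # A edits one field...
b.write("d1", "Owner", "bob")        # ...B edits a different one
print(pull(b, a))  # [] : field-level checking merges the two edits
```

With document-granularity checking, the last exchange above would count as a conflict because both sides touched d1; checking per field lets the two edits merge cleanly.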
To provide high availability in the presence of unreliable nodes, Domino supports clustering of a collection of servers with automatic failover. The clustered servers manage replicated databases that are synchronized more frequently, and in a different way, than with normal replication. A client can be switched over from one server in the cluster to another if the first server is not sufficiently responsive.
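Client-side failover can be illustrated with a short Python sketch. The handler functions, the names call_with_failover and timeout_s, and the fixed-timeout rule are assumptions for illustration; how Domino itself decides that a server is not responsive enough is not modeled. Because the clustered databases are replicas, the client can send the same request to the next cluster member when one is down or too slow.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, List, Tuple

Server = Tuple[str, Callable[[str], str]]  # (server name, request handler)


def call_with_failover(cluster: List[Server], request: str, timeout_s: float,
                       pool: ThreadPoolExecutor) -> Tuple[str, str]:
    """Try cluster members in order; fail over on error or slow response."""
    for name, handler in cluster:
        future = pool.submit(handler, request)
        try:
            return name, future.result(timeout=timeout_s)
        except FutureTimeout:
            continue  # not responsive enough: switch to the next replica
        except ConnectionError:
            continue  # server down: switch to the next replica
    raise ConnectionError("no cluster member answered in time")


def down_server(req: str) -> str:
    raise ConnectionError("node unreachable")


def slow_server(req: str) -> str:
    time.sleep(0.5)  # simulates an overloaded server
    return "slow:" + req


def healthy_server(req: str) -> str:
    return "ok:" + req


with ThreadPoolExecutor(max_workers=4) as pool:
    result = call_with_failover(
        [("srv1", down_server), ("srv2", slow_server), ("srv3", healthy_server)],
        "open db", timeout_s=0.1, pool=pool)
print(result)  # ('srv3', 'ok:open db')
```

Note that the abandoned call to the slow server keeps running in its worker thread; the with block waits for it before shutting the pool down, so a real client would also need a way to cancel or ignore such stragglers.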
One of the major features implemented in the latest release (R5) of Domino is a traditional DBMS-style, log-based recovery. Each Notes API call is implicitly treated as a transaction. Since Notes had not been designed with this type of recovery in mind, accomplishing this required significant design work. Furthermore, enhancements had to be made to our ARIES recovery method to deal with the fact that storage management in Notes is done in an unconventional way. Some of the data structures in a DB file are paginated while others are not. Over time, the data structures might also be moved around in arbitrary ways.
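The idea of treating each API call as an implicit transaction under write-ahead logging can be shown with a deliberately simplified Python sketch. It is neither ARIES nor the Domino implementation: there are no log sequence numbers on pages, no checkpoints and no compensation log records, and it ignores the non-paginated, relocatable structures that made the real work hard. It only shows the basic shape: log before and after images before changing data, then on restart redo every logged update and undo those of transactions that never committed. LoggedStore, api_call and recover are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

MISSING = object()  # marks "key did not exist" in a before image


class LoggedStore:
    """Hypothetical store in which every API call is one implicit transaction."""

    def __init__(self) -> None:
        self.log: List[Tuple] = []      # assumed to be forced to stable storage
        self.data: Dict[str, Any] = {}  # in-memory state
        self.next_txn = 1

    def api_call(self, updates: Dict[str, Any], crash_before_commit: bool = False) -> None:
        txn = self.next_txn
        self.next_txn += 1
        self.log.append(("BEGIN", txn))
        for key, new in updates.items():
            # Write-ahead rule: log the before and after images first.
            self.log.append(("UPDATE", txn, key, self.data.get(key, MISSING), new))
            self.data[key] = new
        if not crash_before_commit:  # otherwise simulate a crash before commit
            self.log.append(("COMMIT", txn))


def recover(log: List[Tuple], disk: Dict[str, Any]) -> Dict[str, Any]:
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Redo pass: repeat history, reapplying every logged update in order.
    for rec in log:
        if rec[0] == "UPDATE":
            _, _, key, _, after = rec
            disk[key] = after
    # Undo pass: roll back, newest first, updates of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, key, before, _ = rec
            if before is MISSING:
                disk.pop(key, None)
            else:
                disk[key] = before
    return disk


store = LoggedStore()
store.api_call({"doc1.Subject": "Plan"})
store.api_call({"doc1.Subject": "Plan v2", "doc2.Subject": "Draft"},
               crash_before_commit=True)
# After the crash, the disk image holds an arbitrary subset of the changes.
recovered = recover(store.log, {"doc2.Subject": "Draft"})
print(recovered)  # {'doc1.Subject': 'Plan'}
```

Whatever subset of changes reached the disk image, recovery converges on the state left by committed API calls only; this repeat-history-then-undo structure is the part the sketch shares with ARIES.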
Using companion products such as NotesPump and DECS (Domino Enterprise Connection Services), it is possible to integrate data from Notes and other sources (e.g., RDBMSs, SAP R/3). Notes applications can be written as if all the data comes from Notes itself when in fact some of the data may be dynamically or periodically materialized from other sources.
Acknowledgements: I would like to thank my IBM colleagues in the Dominotes project at IBM Almaden Research Center and our partners in Iris Associates. Our joint work gave us deep insights into the internals of the product and led to significant enhancements to its DB infrastructure/functionality.