Trends in Database Requirements
for OLTP Applications
IBM Almaden Research Center and
IBM Santa Teresa Laboratory
There has been a noticeable evolution in OnLine Transaction Processing applications built on DB2 UDB over the past three years. These applications exploit many of the new features delivered with Version 2 and Version 5 of the product, but the evolution is not a simple mapping to the new features of DB2. This experience paper will briefly discuss these new applications, the features they exploit, and the emerging trends they collectively define in OLTP systems. In particular, the traditional requirements for scalability and availability continue to expand and become more stringent. Furthermore, we are seeing applications placing increased demands on database systems for supporting richer data, increased transaction complexity, internet enablement, and support of packaged applications.
Expansion of Traditional OLTP Requirements
There is always a spectrum of requirements placed on the DBMS by applications and DBMS customers. OLTP requirements for scalability, availability, and high concurrency are met to varying degrees by different database vendors and platforms. IBM's DB2 for S/390 is an example of a system that has focused on meeting the most stringent of these requirements. Open database products from vendors such as Oracle, Informix, Sybase, and Microsoft, along with IBM's DB2 for UNIX and Intel platforms, have lagged behind such systems to varying degrees. For instance, DB2 for S/390 now supports online upgrading of software and duplexed bufferpools in shared storage, eliminating the need for planned downtime and speeding the response to unplanned failures, while the open database vendors do not currently provide this level of availability support. Our primary focus here will be on the market addressed by open database management systems.
There has been continual growth in the number of online applications and in the importance of these applications to businesses and their customers. The explosion of the internet, however, has caused an inflection point in the rate of change in demand for scalability, availability, and concurrency. The demand for internet applications, particularly for retailing and online financial transactions, has exploded, and that has made scalability critical to these customers. Online trading systems are examples where the volumes actually experienced on the systems far exceeded the sizing estimates made by the brokerage services. For example, one of our customers saw a need to add nodes to their IBM SP system at a weekly rate to keep up with the growth of their customer transaction rates. These internet applications have also greatly increased the need for availability and for elimination of planned downtime for reorganization or upgrades. One of our customers, a large department store with an online catalog system, saw over 30% of their business during the 1998 Christmas season arrive between 10:00pm and 6:00am. Two years earlier, they would have seen no customer transactions during those hours and could easily have scheduled system maintenance then. With the internet, any downtime for maintenance, even in the middle of the night, results in lost revenue.
Scalability means that the system can grow in capacity as the demand on the database increases. Scalability in the number of users supported, the number of transactions, the speed of queries, and the data volumes handled are all important. Scalability of users requires supporting more connections and greater concurrency. From a database perspective, increasing the number of supported connections means decreasing the per-agent resource requirements, decreasing the amount of work done when connections are established, and reducing the resources held by connections that are not currently active. Because DB2 uses UNIX processes or NT and OS/2 threads for its agents, the scaling of operating system process and thread scheduling algorithms is also important, as are improvements in OS support for interaction between the DBMS and OS on scheduling decisions, to avoid scheduling agents that either have no work to do or are unable to work due to lock or latch wait conditions. DB2 continues to increase the number of connections it can support with each release.
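One standard way to reduce per-connection resource requirements and connection-establishment work is to multiplex many logical users over a small set of reused physical connections. The following is a minimal sketch of that pooling idea in Python, using SQLite purely as a stand-in for a server DBMS; the class name and API are hypothetical, not DB2's.

```python
import queue
import sqlite3

class ConnectionPool:
    """Toy connection pool: a fixed set of physical connections is
    created once and reused, so each new logical user avoids the
    setup cost and per-connection memory of a fresh connection."""

    def __init__(self, database, size=4):
        self._free = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets pooled connections move
            # between worker threads (acceptable for a sketch).
            self._free.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self):
        # Blocks until a physical connection is free, which also caps
        # the resources held by the database at any one time.
        return self._free.get()

    def release(self, conn):
        self._free.put(conn)

# Usage: many logical users share a few physical connections.
pool = ConnectionPool(":memory:", size=2)
conn = pool.acquire()
assert conn.execute("SELECT 1").fetchone()[0] == 1
pool.release(conn)
```

The same pattern underlies the transaction-monitor and application-server pooling that OLTP installations of this era relied on.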
Scalability of concurrency is, to a large degree, a matter of eliminating lock conflicts between transactions. Record level locking is a necessary prerequisite for high concurrency, but it is not sufficient. DB2 has supported record level locking since the product was first released as the OS/2 Extended Edition Data Manager in 1985. Each release, however, has seen improvements to decrease the blocking effects of next-key locking in indexes, to add consistency options that support transaction protocols with reduced locking at the cost of weakened consistency, and to eliminate locking or shorten lock hold times in cases where locks had been used merely as a method to test for committed data. But high concurrency is not just a matter of making a large number of small transactions run efficiently. High concurrency must also extend to DDL operations and utilities. Reducing lock conflicts in system catalogs is important since many applications create and drop tables on a regular basis. These operations can often be avoided by alternate application coding techniques, such as the use of table expressions, but this does not eliminate the need to support high volumes of DDL operations.
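The benefit of record level locking over coarser granularities can be illustrated with a toy lock manager. This is a simplified sketch, not DB2's lock manager: it supports only shared ("S") and exclusive ("X") modes and omits queuing, deadlock detection, and lock escalation, but it shows why conflicts tracked per record rather than per table let transactions on different records proceed in parallel.

```python
from collections import defaultdict

class LockTable:
    """Toy record-level lock manager: conflicts are tracked per
    record, so transactions touching different records never block
    each other.  Modes: "S" (shared) and "X" (exclusive)."""

    def __init__(self):
        self._locks = defaultdict(dict)   # record -> {txn: mode}

    def try_lock(self, txn, record, mode):
        holders = self._locks[record]
        for other, held in holders.items():
            # S/S is the only compatible pair between distinct txns.
            if other != txn and (mode == "X" or held == "X"):
                return False
        holders[txn] = mode
        return True

    def release_all(self, txn):
        # At commit, all of the transaction's locks are released.
        for holders in self._locks.values():
            holders.pop(txn, None)

lt = LockTable()
assert lt.try_lock("T1", "row7", "S")
assert lt.try_lock("T2", "row7", "S")       # readers share a record
assert not lt.try_lock("T3", "row7", "X")   # writer blocked by readers
assert lt.try_lock("T3", "row8", "X")       # different record: no conflict
```

With table-level granularity, the last request would have blocked even though it touches a different record; that difference is the concurrency win the paragraph describes.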
One of the most important aspects of scalability is the ability to support large SMPs and clusters of SMPs. The trend in hardware is to increase the number of processors supported in symmetric multi-processors (SMPs). Sun and HP support 64-way SMPs today and have plans for supporting larger SMPs in the future. Intel will soon support 8-way SMPs with Intel-supplied technology, and today Sequent supports up to 64-way Intel processors using Sequent proprietary technology. The hardware industry is moving to support the largest shared-memory multiprocessors using Non-Uniform Memory Architectures (NUMA), where the time to access data in local memory is less than the time to access remote memory. The Sequent machine is in fact a NUMA machine. Clustering of SMPs allows an even greater number of CPUs to be applied to the database workload. Clusters are supported with either shared-disk or shared-nothing database architectures. Oracle and DB2 for S/390 use shared-disk architectures, while DB2 for UNIX and Intel uses a shared-nothing architecture. The challenge for the DBMS is to scale well on these large systems. Despite the increase in performance of individual CPUs, the demand for processing power continues to increase faster than CPU speed for many applications, so support for these very large multi-processors continues to grow in importance. An example of these scalability requirements is another large financial corporation that uses DB2 Extended Enterprise Edition on a cluster of UNIX SMPs. The application is a customer account management system that allows the customer to process financial transactions online, as well as providing portfolio analysis functions. The application is an interesting mix of a large number of very short transactions and much larger SQL requests involving up to 4-way joins to do analysis for the customer.
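In a shared-nothing architecture, each node owns a disjoint partition of the data, typically assigned by hashing a partitioning key, so that adding nodes adds capacity. The sketch below illustrates only that routing idea; the node names are hypothetical, and real systems (including DB2 Extended Enterprise Edition) use partition maps rather than a bare modulo so that data movement on growth is bounded.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]   # hypothetical cluster members

def route(key, nodes=NODES):
    """Hash-partition a row's key to the single node that owns it.
    In a shared-nothing cluster, that node alone reads and writes
    the row, so nodes need not share memory or disks."""
    # A stable hash (not Python's randomized hash()) keeps the
    # key-to-node mapping identical across processes and restarts.
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

# Every transaction on the same account always lands on the same node.
assert route("account-42") == route("account-42")
assert all(route(k) in NODES for k in range(100))
```

Short transactions that touch one key run entirely on one node, while the larger multi-way-join requests the paragraph mentions fan out across all partitions in parallel.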
The demand for processing power is driven not just by the increasing number of customers supported, but by the increasing complexity of the applications. We are just beginning to see the demand for complex OLTP. A customer's database decision is a strategic business decision, and the database must scale to support the increasing demands of the business.
Businesses depend on computing systems and DBMSs to function. Availability of the system during the time the business is in operation is an essential requirement. Seven day by twenty-four hour availability (7x24) was once required only by the largest multinational corporations. Today, most businesses that support internet commerce have become multinational, or at least multi-time-zone, corporations, and the demand for availability has increased tremendously. The two most important aspects of availability are minimization of planned outages and minimization of unplanned downtime. Planned outages are typically required for certain system configuration changes, for upgrades of hardware or software, and for certain utility operations that either require portions of the database to be offline or limit the amount of online activity against portions of the database while the utility is executing. We discussed earlier the importance of locking considerations with utilities and DDL. It is also important to support online and granular backup and reorganization, and to support online configuration changes such as increasing or decreasing resource pools such as buffers. This is an area where DB2, like other database products, is making continual improvements.
Minimization of unplanned downtime means that when something goes wrong, the database response must be as fast as possible. Database recovery speed is increased by ensuring backup is granular, so that the most critical portions of the database can be restored first to get the most critical applications back online. Minimizing the amount of data that needs to be recovered is accomplished by ensuring dirty pages are flushed from the buffer pool rapidly, including very hot pages that suffer from the problem of never becoming "old enough" to be flushed out by LRU replacement algorithms. Since many of the causes of unplanned outages are defects in code, high-quality software development practices must be followed. Even the highest quality code has bugs, and defects do occur that are repeatable, so that even when the database is successfully restored after one crash, the same problem recurs, resulting in another database failure. Thus, techniques for rapid problem determination, such as sophisticated error and diagnostic information capture and logging, are essential. Customer support geared toward rapid response to outages is likewise an availability requirement. All of these are areas of continual focus for DB2.
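The "never old enough" problem can be made concrete: if dirty pages are written out based on recency of use, a constantly re-touched hot page is never selected, and crash recovery must replay an unbounded amount of log for it. A simple remedy, sketched below, is to track the time a page was *first* dirtied and force a flush once that age crosses a threshold, regardless of how recently the page was used. This is an illustrative sketch, not DB2's page cleaner.

```python
import time

class BufferPool:
    """Sketch: schedule dirty-page writes by age since first dirtied,
    not by LRU recency, so a hot page that is touched constantly is
    still written out and crash-recovery work stays bounded."""

    def __init__(self, max_dirty_age=5.0):
        self.max_dirty_age = max_dirty_age
        self.dirty_since = {}             # page -> time first dirtied

    def mark_dirty(self, page, now=None):
        # setdefault: re-dirtying an already-dirty page must NOT
        # reset its clock, or a hot page would never age out.
        self.dirty_since.setdefault(page, time.time() if now is None else now)

    def pages_to_flush(self, now=None):
        now = time.time() if now is None else now
        return [p for p, t in self.dirty_since.items()
                if now - t >= self.max_dirty_age]

    def flushed(self, page):
        self.dirty_since.pop(page, None)

bp = BufferPool(max_dirty_age=5.0)
bp.mark_dirty("hot-page", now=0.0)
bp.mark_dirty("hot-page", now=4.0)        # re-touching does not reset the clock
assert bp.pages_to_flush(now=6.0) == ["hot-page"]
```

Bounding the oldest dirty page bounds how far back in the log recovery must begin, which is exactly the restart-speed goal the paragraph describes.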
Emerging OLTP Requirements
Richness of Data
DB2 for UNIX and Intel platforms has provided support for very large objects (up to 2GB each), user defined functions, and user defined datatypes since 1995. We also provide support for packaging these functions and types in what we call Database Extenders, so that the kinds of data supported by the database, such as video, audio, text, and geographic data, can easily be extended. Other database vendors provide database extensibility to varying degrees as well. We have seen many interesting uses of this technology in customer applications. One customer, a health care provider, offers customers an online medical information service that includes a rich collection of information on chronic diseases, covering the diseases themselves, patient care, treatment, and other resources. This application also provides a full, online medical dictionary. The application was implemented using DB2 and the DB2 Text Extender, which provides the management of the text documents and the search capabilities. A second customer, which sells digital player pianos, has an online commerce application that supports selling digital piano music, with the ability for prospective customers to listen to audio clips as part of their online catalog shopping. This uses DB2's Audio Extender as well as other extensions. There are many other customer examples of the use of extensions, but an important aspect of their use is that they are often applied where searching is important, whether over information such as title or artist that is easily stored in traditional database data types, or over information stored in the rich data types such as text or audio. These applications cannot be implemented by interlinked HTML pages alone. We see increasing demand for support of rich data types.
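The extensibility idea behind the extenders, registering new functions so that SQL queries can search inside rich data, can be shown in miniature. The sketch below uses SQLite's `create_function` purely as a stand-in for DB2's user defined functions; the `contains_word` function and the table are hypothetical, and a real text extender would add indexing, linguistics, and ranking rather than a naive substring match.

```python
import sqlite3

# A "text extender" in miniature: register a user defined function
# so SQL predicates can look inside rich text columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
conn.execute(
    "INSERT INTO docs VALUES (1, 'chronic disease care'), (2, 'piano audio clip')")

def contains_word(body, word):
    # Naive case-insensitive match; a real extender would use a
    # text index instead of scanning every row.
    return 1 if word.lower() in body.lower() else 0

conn.create_function("contains_word", 2, contains_word)

# The UDF is now usable anywhere an SQL expression is allowed.
rows = conn.execute(
    "SELECT id FROM docs WHERE contains_word(body, 'disease')").fetchall()
assert rows == [(1,)]
```

The point is the search pattern the paragraph highlights: rich-type predicates combine freely with ordinary SQL predicates in one query, which interlinked HTML pages cannot provide.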
We have already mentioned the increased complexity of transactions we are seeing with the financial customer that provides not only online financial transaction support, but also online portfolio analysis. This is one example of "application complexity scale-up," where a traditional online application supports increasingly complex function such as simple portfolio analysis. Application complexity scale-up is rapidly increasing as retailers, financial services, and other types of business compete more and more on the sophistication of the services they provide online to their customers. A second source of complexity in online transaction processing arises where complex analysis that was traditionally done offline, or by small, highly specialized parts of an organization, is made available to a larger number of people in the organization, or even to customers or suppliers outside it. This is really the process of simplifying complex analysis and taking it online. We see this occur time and time again with customers who install decision support systems and then realize the speed at which they can obtain results and the value of the information they are discovering. Business intelligence is one of the fastest growing segments of the database industry, and it is driven by this growing accessibility of analysis applications. Since most of these applications provide interfaces that generate very predictable classes of queries on the database, the automatic summary tables, or materialized views, that DB2 supports are well suited to these kinds of applications. Other vendors, including Oracle and NCR Teradata, support materialized views to varying degrees.
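An automatic summary table precomputes an aggregate over the base data and keeps it in sync, so the predictable analysis queries read a few summary rows instead of scanning the detail. The sketch below shows the idea using SQLite triggers as a stand-in for DB2's automatic summary table machinery; the table names are hypothetical, and the sketch maintains the summary only on insert, whereas a real system also handles updates, deletes, and query rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (account TEXT, amount REAL);

-- The "summary table": a precomputed aggregate kept in sync so that
-- per-account analysis queries avoid scanning the trades detail.
CREATE TABLE trade_summary (account TEXT PRIMARY KEY, total REAL);

CREATE TRIGGER maintain_summary AFTER INSERT ON trades
BEGIN
  INSERT OR IGNORE INTO trade_summary VALUES (NEW.account, 0);
  UPDATE trade_summary SET total = total + NEW.amount
   WHERE account = NEW.account;
END;
""")

conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [("A1", 100.0), ("A1", 50.0), ("A2", 25.0)])

# Analysis reads one summary row instead of aggregating the detail.
total = conn.execute(
    "SELECT total FROM trade_summary WHERE account = 'A1'").fetchone()[0]
assert total == 150.0
```

Because the portfolio-style queries in these applications are so predictable, maintaining a handful of such summaries covers most of the analysis workload, which is why materialized views fit this segment so well.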
Internet technology is being adopted at a rate that far exceeds the early adoption of many other technologies we take for granted today, including the telephone, television, and even the personal computer. We have discussed some aspects of the requirements the internet places on database management systems. There are also new demands placed on database systems as they integrate web server technology, in a manner similar to the integration of transaction monitor technology during the move to client-server computing and stored procedures. We have also seen a rapid adoption of Java as a database application programming language. What is interesting is that we are even seeing customers that traditionally used COBOL starting to write new applications in Java. Thus not only is support for Java as a programming environment for stored procedures and user defined functions important, but we are also seeing customer demand for support of other Java technologies such as JDBC, SQLJ, and distributed Java transaction services. Although the adoption of Java is not limited to internet applications, the rapid acceptance of internet technology and of Java functionality in browsers is spilling over to other distributed application development as well.
Adoption of Packaged Applications
The adoption of packaged applications such as PeopleSoft, SAP, and Baan in the ERP space is not a new phenomenon. These applications have been large drivers of database revenue for some time. What is changing is that as these vendors have grown, they have been able to demand more from database vendors in support of their online transaction processing. The applications themselves are changing, and many of these vendors are starting to provide data warehouse and analysis capabilities that exploit the important business data they help manage. An additional trend is the integration of data across vendors, so that, for instance, data from a PeopleSoft application can be fed into an SAP application. These trends are another source of evolution in the requirements placed on database management systems for OLTP support.
This paper has attempted to give the authors' view of some of the important trends experienced in OLTP systems. There are certainly other trends that have not been discussed here, and the trends described are not unique to DB2. They help point to areas where new research or technology development can be important.