HPTS 1999 Position Paper

HPTS 1999 Position Paper

Is It Time to Rethink our Architecture Design Points?

Scalability Issues in System Architecture

Pat Selinger

IBM Santa Teresa Lab

pgs@us.ibm.com

1. Enabling High Volume, High Performance Applications

As companies decide to participate in the Internet movement and become web-enabled, they are moving to a new design point in system architectures. Instead of designing and building (or purchasing) a solution for 100 or 1000 or 10000 internal employees (e.g. Bank tellers), application developers and solution providers are designing and building for potentially millions of web users. The internal tracking programs or accounting programs first implemented five or ten years ago will now feature a new web front end and quite likely will experience at least an order of magnitude more throughput demand..

What is the right architecture to use when reengineering these high volume, high performance OLTP applications? One choice is the classic multi-tier architecture, with a desktop client or browser front end, then a middle tier where the application runs with possibly a web server acting as a gateway or router, and then a third tier containing the database server. Another alternative is to exploit the configuration used by many database vendors when running benchmarks like TPC-C. This alternative uses a two-tiered architecture, with a web browser or desktop client as the first tier and a second tier that runs both the application and the database server. These may be three-tiered, with the third tier being a transaction router-manager but not doing application work. Very often this configuration exploits stored procedures (application logic together with DBMS calls) so that both the application and also the database work can be run using a single agent (process or thread). This eliminates communication overhead and network traffic delays as well as process switches between application and database. Further, because it is centralized, this architecture is easier to manage than multiple, possibly heterogeneous servers. If the application and database work all fits comfortably within a single system, this two-tier architecture provides the best performance and manageability and is very appealing.

But what happens as the workload volume increases? There are two obvious choices. Add more and/or faster processors to the system or offload function from the data server. When is moving to a distributed architecture better than a larger single system? Are these still the right choices when our new design point must do 100 times or 1000 times greater volume of work? Today we professional software engineers most often have the following set of objectives:

Build software that produces the correct answer
Try not to lose or corrupt the data
Design for performance, performance, performance

Are these still the right design objectives as we look to the future? In the rest of this paper we will analyze the considerations in determining how to make this decision.

2. Examining the Consequences of Classic 2- and 3-tier Architecture Choices

What design objective should we be using as we look at current applications and how they might evolve or be reengineered to handle massive volumes of work?

If we look at the alternative of offloading function from the data server, our first inclination is to separate at the database interface. Moving application function away from the data server will increase response time because of added network delay plus the extra instructions to send messages rather than pass parameters in memory and the added instructions to do process switches between application and message subsystem on the application side and between message subsystem and database server on the third tier. As processors become faster, memory accesses become more expensive relative to cache speeds so it is important to maintain long instruction pipelines and to avoid cache misses. When we introduce process switches and message boundaries into the unit of work by offloading application function, instruction pipelines are broken, systems must reload the instruction and data caches, and this will be costly in terms of performance.

Let’s take an example from commercial systems such as the new orders transaction from TPC-C. New_Orders spends approximately one quarter of its time in the application and three quarters of the time in the database engine and depending on the implementation has about 40 send/receive invocations between application logic and database. If the application ran on a different system, then for each of those 40 invocations, there would be the cost of two process switches, the message cost in TCP/IP, the network delay, and the cost of converting and linearizing the input/output data to the wire format. Each system (database and application) pays for half of this work, which we’ll refer to as the Offload Cost. For New_Orders in most implementations, the Offload Cost is more than the original application work! In this case, it is never worthwhile to offload the application because offloading the application slowed the database server down, made it less scalable and delivered worse performance.

Lesson 1: Offloading small amounts of work can cost more than just doing the work. In these cases both scalability and performance objectives drive towards a two-tiered architecture.

Let’s take another example. Suppose the application work of the overall transaction is 90% of the total pathlength, leaving 10% of the work to be done by the database server. In a recent S&D user benchmark for SAP, this is approximately the ratio between application and database server workloads. In that benchmark, 17 application servers at an average of 54% CPU utilization drive one database server at 99% CPU utilization. Because the application and database servers were identical systems, we can calculate that the pathlength ratio is (17*.54) or about 9 to 1. Suppose that our example has an application pathlength of 5 million instructions and the database pathlength has .5 million instructions ( a 10:1 ratio between application and database work just to make the math easier) and 40 SQL calls total. For this example, is it worth offloading the application logic to a separate server to obtain added scalability? Using some estimated costs, which of course will differ by platform and protocol, the communication and message packaging overhead of making the SQL calls remote instead of local adds 7% to the total transaction pathlength. If all the work were done on one system, a total of 5.5M instructions would be run on the database server. By offloading the application to a separate server, splitting the 7% overhead between application and data servers equally, the data server now only has to execute .7 M instructions and the application server executes 5.2 M instructions. Ignoring contention, the database server in a three-tier architecture is able to support nearly 8 times as many transactions. Meanwhile with a reasonably fast network, where the latency is overshadowed by the communication pathlength, the transaction only runs 7% longer which in many cases would still meet user response time expectations, particularly in an Internet environment where latency is already unpredictable.

Lesson 2: Offloading significant amounts of work can be very advantageous in achieving high levels of scalability. In these cases the best architecture for scalability is NOT the best architecture for high performance.

Surprisingly in this last example, high scalability and high performance work against one another, although most system designers tend to think of them together.

3. Examining the Consequences of Contention

Continuing our exploration of the architecture tradeoffs for high volume, high performance database work, let’s look at contention. The classic extreme example of contention is the application that assigns a unique and ever-increasing order number. Every instance of this application targets the same one-row one-column table, obtaining an exclusive lock, updates the order number to the next value, then commits and does the rest of its work. Essentially every application is serialized waiting for the lock on this order number. What happens if we offload the application function, including the database request to update the order number, to a separate system? This particular order number transaction drives the pace of the entire application. Therefore anything that lengthens that order number transaction reduces throughput. Because getting an order number is a very simple transaction, probably just a single database request, we could conceivably double its duration by moving it to a different system.

We know generally that the probability of deadlocks grows with the square of the update transaction duration. If moving the application logic to a separate server increases the duration of this transaction by 100%, then we increase the probability of deadlock by 10000%! Moving the application logic to a different system in this example has a devastating impact on scalability.

Admittedly, having a centrally provided monotonic order number is a terrible application design. This would be a poorly scaling application even on a single system. However, we often see hot spots in the data which mimic this behavior in a more modest way. In a number of applications that are update-intensive, lock contention becomes the scalability limit, not CPU capacity on the database server. If offloading application logic to a separate tier adds even 20% to the duration of transactions holding hot locks, we will have increased the probability of deadlocks by 44%. In such cases, doing at least some application logic on the database server improves scalability. I have seen some customer applications where contention limits throughput severely, and database server CPU utilization doesn’t grow beyond 30 or 40 percent.

Lesson 3: Offloading application work to a separate tier increases transaction duration which causes additional contention for locks and in some cases becomes the limiting factor for achieving high levels of scalability. In these cases the best architecture for scalability is again the best architecture for high performance, a single system running both application and database work.

4. The Challenge in Front of Us

This conflicting set of lessons should cause us to step back and question our own technical design judgment and objectives. Should we make our system design judgments based on achieving high performance or on achieving scalability? In the high volume Internet world, is high scalability more important than high performance? We know for certain that these two objectives are not always in concert. What we have observed is that:

1. When there is a high degree of shared data being updated by many transactions, the best scalability is achieved with the shortest transaction duration. Scalability and performance both improve with centralization of function.

2. When the proportion of application work is much larger than database work, scalability increases when application work is offloaded from the database server. Performance, however, as measured by response time is reduced. Unlike case 1, scalability does not improve as performance improves, and in fact we achieve greater scalability at the cost of worse performance.

3. In the situations where scalability increases by moving from two to three tier architectures, the amount of achievable scalability is eventually limited by one of three factors:

A. Longer pathlength and network delays will increase response time beyond a user-tolerable limit.

B. Longer pathlength and network delays will lengthen transactions so that lock contention increases, triggering deadlocks to an intolerable level.

C. The application servers drive the database server system to its full capacity. At 99% CPU utilization on the data server, adding more application servers will reduce, not increase scalability.

We began this position paper by questioning whether the old design objectives that worked well for internal application usage will also be the right ones as application services open up to broader audiences, perhaps everyone who has Internet access, which some predict will be 70% of the world by 2007. If we were building all these systems today, would we use the same objective of performance, performance, performance, or would we look more for scalability, even at the cost of reduced performance? As we look at technology evolution, we are seeing high speed networks drive down latency and improve bandwidth, so that the performance degradation by moving function to a separate system will be smaller and smaller. This offers the opportunity for new design points, even in familiar systems, and may demand that we make different design decisions than we would have made in the past.