This original article is published in Techbeacon on July 31 2017

Now that Agile and DevOps are in widespread use in software development environments, early and quick performance analysis of business-critical, high-traffic applications is essential. This has led to an increase in the adoption of performance modeling activities, which helps to predict system performance metrics during early lifecycle phases.

Due to a fear of mathematics, many performance testers mistakenly think queuing theory is too complex in nature and leave it to core performance modeling experts. But knowledge of simple mathematical queuing theory principles can make system performance analysis easier. As Performance testers, it’s important that we understand the relationships that exist between performance metrics such as response time, throughput, and server utilization so that we can quickly identify performance-, scalability-, and capacity-related problems. Here are a few basics for the uninitiated on how to use queuing theory principles in performance testing.

**What is queuing theory?**

Queuing theory is the mathematical study of waiting lines, or queues. A queuing system is represented using three primary elements: the input, the service center, and the output. The input refers to the job arrivals to the system, the service center refers to the processing component that performs the expected service, and the output refers to the job completions that leave the system. Depending on their arrival pattern & time required to service the incoming jobs, jobs might need to wait in the queue before getting the intended service.

Lets assume a abstract system which represents a simple queuing system. It has incoming job arrivals, which waits in a queue to get its turn to get the processing done. It has a service center which performs the intended service. The completed jobs leaves the system upon processing completion. Let’s imagine you are monitoring the system for a finite time (T). During this monitoring period, we observe number of job arrivals to the system (A), number of request completions (C) and the amount of time the service center is found busy (B) during the monitoring period. You can use these directly measurable quantities to derive the quantities below:

- The arrival rate, represented using the symbol lambda (λ) equals the number of request arrivals (A) divided by the time (T).

**Stated algebraically: λ = A ÷ T** - The throughput or completion rate , represented using the symbol (X) equals the number of request completions (C) divided by the time (T).

**Stated algebraically: X = C ÷ T** - Utilization (U) equals the total amount of time the service center is found busy (B) divided by the time (T).

**Stated algebraically: U = B ÷ T** - Mean service time (S) equals the total amount of time the service center is found busy (B) divided by the number of request completions (C).

**Stated algebraically: S = B ÷ C** - If there are more than 1 visits made to the service center to complete the job, then another quantity called Service Demand can be derived. Service Demand (D) represents the total demand for service center required to complete the entire job processing, before the job leaves the system. It equals the total visits (V
_{k}) made to the service center multiplied by its mean service time (S).**Stated algebraically: D = V**_{k}× S

**The operational laws of queuing theory**

Operational laws are simple equations used as an abstract representation or model of the average behavior of the queuing system. These laws, very simple and generic in nature, are popular because they do not make any assumptions about system behavior. You can derive five key operational laws using the relationship between the quantities that we discussed :

**The Utilization Law**states that service center utilization (U) equals throughput of the service center (X) multiplied by the mean service time (S) at the service center.

Stated algebraically: U = X × S**Little’s Law**states that the average number of jobs in a queuing system (N) equals the arrival rate (λ) multiplied by the average time that a job spends in the system (R). Based on flow balance assumption, arrival rate (λ) equals Throughput (X) for a stable system, Hence, Little’s law can be represented as

N = λ × R (or) N = X × R**The Interactive Response Time Law**states that the average number of jobs in the system (N) equals throughput (X) multiplied by the average response time (R) plus the think time (Z). Here Think Time is the time spend by real end user waiting (thinking or entering the data in the form fields available in the web page, etc) between successive actions. N = X × (R + Z)**The Forced Flow Law**states that throughput at service center k (X_{k)}equals visits to the service center (V_{k}) multiplied by the overall system throughput (X).

X_{k = }V_{k }× X**The Throughput Bound Law**states that the maximum achievable system throughput (X_{max}) equals or less than 1 divided by the value that represents the service center with maximum service demand in the system (D_{max}).

X_{max}< = 1 ÷ D_{max }

These simple operational laws can be used during performance testing for addressing challenges such as validating the performance non-functional requirement (NFR), validating the performance test results, validating workload models, calculating the think time, predicting the maximum achievable system throughput, estimating the server capacity requirement, building performance prediction model, and so on.

Here’s a simple case study that shows how you can use the above discussed operational laws during performance testing to accelerate system scalability and capacity analysis.

**Case in point: The three-tier web app example**

A three-tier web application under test is expected to support 1,000 users meeting a throughput target of 100 transactions per second (TPS) and a response time SLA of 2 seconds to complete its 3 transactions.

The business projects a 2.5X times increase in throughput due to promotional activity for the next quarter (a validated projection). What is the think time to be configured in the test script to meet this NFR? How should you validate the load test result? What is the maximum achievable system throughput by the system? And what is the additional server capacity required to handle the increase in throughput?

Lets apply Operational laws to answer all these questions.

**Step 1: Think time calculation**

In order to run a realistic load test, you must apply Little’s Law to calculate the think time to be configured in the script.

Little’s Law tells us that N = X × R, and the Interactive Response Time Law tells us that N = X × (R + Z), where Z represents think time. With basic math skills, we can convert the equation:

Z = (N ÷ X) – R

We know, N is 1000 users, X is 100 TPS, R is 2 seconds for its 3 transactions which will be 6 (3 × 2) seconds. We just plug in these values we know:

Z = (1,000 ÷ 100) – 6 = 4 seconds

So a think time of 4 seconds should be configured in the test script to carry out a realistic load test to meet the system performance target.

**Step 2: Load test results validation**

The actual configured users used to carry out the load test (N_{actual} = 1,000 users), the transaction response time for 3 transactions are 2.2 seconds, 1.8 seconds & 2.4 seconds and the average throughput measured during the test ( X = 100 TPS). And we know the think time is 4 seconds, Using these test observations, you can apply Little’s Law to identify the calculated number of users, N_{calculated}.

By applying Interactive Response Time law equation, N_{calculated} = X × (R1 + R2 + R3 + Z)

Hence, by plugging in the test result values, N_{calculated} can be derived as 1,040 users.

As N_{actual} & N_{calculated} values are almost equal, this confirms that you have a valid load test & test observations are valid.

It would be nice if a performance tester performs the additional calculation that is described in Step 3 and shares it along with load test report.

**Step 3: Calculation of maximum achievable system throughput **

During 1,000-user load test, assume the web server’s average CPU utilization (U_{W-CPU}) is 25%, the application server’s average CPU utilization (U_{A-CPU}) is 60%, and the database server’s average CPU utilization (U_{DB-CPU}) is 30%.

If web, application, and database server CPU utilization values are available, you can do the following calculations:

For a service center k, by applying the Utilization Law, you know U_{k} = X_{k} × S_{k }

By applying forced flow law ( X_{k} = V_{k} × X) on the above equation, you get U_{k} = (V_{k} × X) × S_{k}

This equation can be regrouped as U_{k} = X × (V_{k} × S_{k}) . You already know, V_{k }× S_{k} represents the service demand, D_{k.}

Hence, Utilization of service center (k) equals the overall system throughput (X) multiplied by its Service Demand (D_{k})

Utilization of service center, Uk = X × D_{K}

Therefore, to find service demand on the web , application & DB servers, we can apply utilization law as shown below :

D_{W-CPU }= U_{W-CPU} ÷ X, which is 0.25 ÷ 100 = 0.0025

D_{A-CPU = }U_{A-CPU} ÷ X, which is 0.6 ÷ 100 _{ }= 0.006

D_{DB-CPU = }U_{DB-CPU} ÷ X, which is 0.3 ÷ 100 = 0.003

We can then use the Throughput Bound Law (X_{max} < = 1 ÷ D_{max}) to determine the maximum achievable overall system throughput. In our case, the service center with maximum service demand is the application server (D_{A-CPU}). The overall system throughput will depend on the throughput of this slowest service center which is application server CPU resource.

Therefore:

X_{max} < = 1 ÷ 0.006 = 167 TPS

In the above equation, the maximum utilization value of the service center is considered as 100% (U = 1) to calculate the maximum achievable system throughput, X_{max}. As the CPU utilization threshold for the application server is 85% for the system under test,

The maximum achievable system throughput can be re-calculated as

X_{max} < = 0.85 ÷ 0.006 = 142 TPS

Hence, the system under test can support a maximum throughput of 142 TPS in the current test environment.

As you have seen, during the 1,000-user test, the system throughput measured using the performance test tool is 100 TPS. With the help of this load test result, you have predicted that the maximum achievable system throughput value will be 142 TPS. As the application server has the maximum service demand, the overall performance improvement is possible, only by improving the application server CPU resource performance (either by doing CPU profiling for code optimization or through CPU capacity upgrade).

**Step 4: Server capacity sizing analysis**

In order to support the projected system throughput (X = 250 TPS) provided by business, you can calculate the additional capacity required by the system by using the Utilization Law.

The web server CPU service demand is 0.0025. By applying the Utilization Law (U_{k} = X × D_{k}), you get:

U_{W-CPU} = 250 × 0.0025 = 0.63

Hence, the web server CPU utilization is projected as 63% to support the target 250 TPS load, which confirms web server has enough capacity to handle the projected 250 TPS, considerably less than web server CPU utilization threshold value of 85%.

The database server CPU service demand is 0.003. By applying the Utilization Law (U_{k }= X × D_{k}), you get:

U_{DB-CPU} = 250 × 0.003 = 0.75

Hence, the database server CPU utilization is projected as 75% to support the target 250 TPS load, which confirms DB server has just the required capacity to handle the projected 250 TPS, just under the DB server CPU utilization threshold value of 80%.

The application server CPU service demand is 0.006. By applying the Utilization Law (U_{k} = X × D_{k}), you get:

U_{A-CPU} = 250 × 0.006 = 1.5

Hence, the application server CPU utilization is projected as 150% to support the target 250 TPS load, which confirms application server CPU requires more capacity to handle the projected 250 TPS. Since the application server CPU utilization threshold is 85%, the additional utilization capacity requirement will be 1.76 (1.5 ÷ 0.85).

The application server CPU requires approximately 1.76 times the current CPU capacity to handle the projected 250 TPS load. Based on this calculation, an additional buffer can be included and you can use the TPC or SPEC benchmarks to recommend the right server configuration to meet the projected load levels.

**In the ballpark**

These types of simple performance test validations, scalability analysis, and capacity extrapolation activities offer a good representation of system performance. These calculations can be done quickly using less effort and can alert performance testers on the next steps for performance tuning.

This knowledge of operational laws is the first step toward performance modeling using analytical modeling techniques. Now you can start exploring on queuing network, simulation, or statistical modeling techniques as ways of predicting system performance with greater accuracy. But performance modeling activities aren’t a replacement for conducting performance tests. Rather, you should use them to get a quick ballpark estimate or direction when you don’t have the time or complete system available to perform more sophisticated performance tests.

*To learn more about queuing theory and how the mathematical principles can be applied in Performance Testing, see my presentation at the PerfGuild online conference. Registered attendees also have full access to my presentation after the event.*

** **