
Avalanche: How do I determine in real-time whether the bottleneck is Avalanche or the Device Under Test?

Answer

This is especially important for performance testing. Because a SimUser can cover the full spectrum of complexity, from a simple one-line HTTP GET to something very complicated like E-Commerce / HTML5 validation, it is impossible to predict resource needs before the test starts. Avalanche provides you with real-time ways of determining bottlenecks. It is important to understand how to interpret these metrics because they will show you whether you can generate more traffic at the same complexity, or should back off the load due to tester congestion. In addition, they give you a mechanism for explaining to the customer why actual and expected load differ.

 
When running an Avalanche test, here is how you determine if Avalanche is bottlenecking your test:
 
 
1.       Memory Pool Usage Issues
This is the amount of memory your test is consuming across all chassis of ‘Client’ or ‘Server’. This live metric may be found in the Run-Time stats, under the branch ‘Resources’. As a rule, when (Main Pool Used + Packet Memory Used) / Main Pool Size >= 0.95 over an extended period, the test is memory bound. Main Pool Size is the sum of all available memory for all ‘Client’ or ‘Server’ reserved resources. Packet Memory Used is the current amount of RAM used to buffer incoming and outgoing packets, and Main Pool Used is the RAM used to store open TCP connections, SimUsers, the FormsDB, content, etc.

The most likely cause of memory-bound issues is too many open TCP connections or very long-lived SimUsers. The open TCP connection issue has a few sub-causes. First, the LoadSpec used may simply exceed the capabilities of the hardware hosting Avalanche: check the Performance Matrix for your hardware, look at Attempted Open TCP Connections, and if you are in excess, reduce the open SimUser or open TCP connection count. Second, if the Device Under Test is having trouble processing TCP (e.g., excessive ACKs or SACKs, timeouts, etc.), then TCP connections will queue up and stay open in RAM. The LoadSpec engine will attempt to adapt and slow down, re-evaluating every 200 ms, but all this does is prevent new TCP sessions from loading; existing TCP sessions that are “In Trouble” may still throttle memory. Also, look at both the client and server Resources tabs, because either side may be bottlenecking your test. The default poll rate is one poll per 4 seconds, but it uses the same rate as real-time results, so it may be polled every second if desired.
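If you poll or export these Resources values yourself, the memory-bound rule is straightforward to automate. The following is a minimal sketch in Python, assuming you have collected the three values for each poll interval; the field names (main_pool_used, packet_memory_used, main_pool_size) are hypothetical placeholders, not actual Avalanche export column names.

    # Minimal sketch: flag a memory-bound window from sampled Resources values.
    # Field names below are hypothetical placeholders for your exported stats.
    def is_memory_bound(samples, threshold=0.95):
        """Return True when (Main Pool Used + Packet Memory Used) / Main Pool Size
        stays at or above the threshold for every poll in the window."""
        return all(
            (s["main_pool_used"] + s["packet_memory_used"]) / s["main_pool_size"] >= threshold
            for s in samples
        )

    # Example: three consecutive polls (values in bytes) from the Client side.
    window = [
        {"main_pool_used": 7.1e9, "packet_memory_used": 0.6e9, "main_pool_size": 8e9},
        {"main_pool_used": 7.3e9, "packet_memory_used": 0.5e9, "main_pool_size": 8e9},
        {"main_pool_used": 7.4e9, "packet_memory_used": 0.4e9, "main_pool_size": 8e9},
    ]
    print(is_memory_bound(window))  # True -> the test is memory bound over this window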
 
2.    Receive Queue Length
The receive queue length is an indicator of packet backlog for the CPU cores in the test. The x-axis is time, polled at the same polling rate set in the run options (every 4 seconds by default, but it may be polled every second); the y-axis is the sum, across all testing units of ‘Client’ or ‘Server’, of the number of packets waiting to be processed by Avalanche. A value of 0 means that the CPU core resources process all packets as soon as they enter the IO queues. The receive queue length is one indicator of congested CPU cores: if the CPUs are under high load, packets fill up the queues. Individual spikes are normal and should not impact such things as TCP timeouts. The pattern that indicates the cores are the bottleneck is a series of back-to-back bursts with an average height greater than 100 packets. Therefore, any queue pattern with an average value below 100 is generally OK, whereas when the average drifts above 100 the CPUs are bound. Typical causes are overly complex ActionLists at the current load, especially when advanced HTTP options such as MATCH/MATCH NOT, FormsDB, or Search Criteria are used. Try reducing the complexity of the action list or the number of open SimUsers.
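The 100-packet rule of thumb can also be checked programmatically if you capture the polled queue lengths. The sketch below is a minimal Python illustration of the guideline above (isolated spikes are fine, a sustained average above 100 packets is not); the list of samples is assumed to come from your own polling of the run-time stats.

    # Minimal sketch: distinguish harmless spikes from a sustained backlog.
    # queue_lengths holds polled receive-queue samples (packets) over a window.
    def cores_look_bound(queue_lengths, avg_limit=100):
        """A single spike is normal; a window whose average exceeds the limit
        suggests the CPU cores cannot drain the IO queues fast enough."""
        if not queue_lengths:
            return False
        return sum(queue_lengths) / len(queue_lengths) > avg_limit

    print(cores_look_bound([0, 0, 350, 0, 0, 0]))       # False - one isolated spike
    print(cores_look_bound([180, 220, 160, 240, 200]))  # True - back-to-back bursts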
 
 
 
 
3.       Idle Time Issues
On the Run > Loads tab, you will see the SimUsers Desired vs. Current chart, polled at the configured rate. When the two diverge, that may indicate a tester bottleneck. In the grid below, entitled “Load Generator Status”, you can determine, by user profile and port, how balanced and loaded your test and CPU cores are. Rows represent port summaries and the individual user profiles within the test. Desired and Current load are rated in SimUsers. Idle time is a measure of NOPs (No Operations) in the client-side CPU cores; a NOP is a “hole” between processing instructions. In this case, bigger is better, because plenty of free space between operations means one event does not impact or delay another. There are two guidelines for idle time: (a) idle times should have a value greater than 2,000, and (b) idle times should be relatively the same across user profiles. If you see a very low idle time on a user profile or a port summary, then the combination of complexity in your user profile and the current load is impacting your test results; as an action, either reduce the user profile complexity or reduce the load. If you see very different results across user profiles or summaries, then there is a good chance that the test is not tuned for the number of CPU cores; as an action, retune the test to the number of cores per port.
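Both idle-time guidelines can be expressed as a simple check over the per-profile values read from the “Load Generator Status” grid. The sketch below is a Python illustration only; the profile names and the 3x imbalance ratio are assumptions chosen for the example, not Avalanche-defined values.

    # Minimal sketch of the two Idle Time guidelines: an absolute floor (> 2,000)
    # and rough balance across user profiles. The imbalance ratio is an arbitrary
    # illustration of "relatively the same", not an Avalanche-defined threshold.
    def idle_time_warnings(idle_by_profile, floor=2000, imbalance_ratio=3.0):
        warnings = []
        for profile, idle in idle_by_profile.items():
            if idle <= floor:
                warnings.append(f"{profile}: idle time {idle} <= {floor}; "
                                "reduce profile complexity or load")
        values = list(idle_by_profile.values())
        if values and max(values) > imbalance_ratio * max(min(values), 1):
            warnings.append("idle time is uneven across profiles; "
                            "retune the test to the number of cores per port")
        return warnings

    # Hypothetical per-profile idle times read from the grid.
    print(idle_time_warnings({"web_browse": 9500, "ecommerce_checkout": 800}))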
 
 
 
4.       User Stats
The user stats under Runtime Statistics > SIM Users > User States can be a very useful indicator of congestion in the ActionList. The Suspension State counter is the total number of SimUsers that have been suspended; it increments whenever a suspend transition occurs. A suspend transition can occur when the system is in an error condition, such as when it has limited available memory. The total number of suspended SimUsers can be greater than the total number born when the error condition lasts a long time and the same SimUser is repeatedly suspended and reinstated. The Unnatural Death counter indicates the total number of SimUsers that have not, and will never, successfully execute the statements in the Actions list.
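Because both counters are cumulative, the useful signal is whether they are still growing between polls. The sketch below is a minimal Python illustration of that comparison; the key names (suspended_total, unnatural_deaths) are placeholders for the two counters described above.

    # Minimal sketch: compare two consecutive polls of the cumulative counters.
    def simuser_state_alerts(previous, current):
        alerts = []
        if current["suspended_total"] > previous["suspended_total"]:
            alerts.append("SimUsers are being suspended - check tester memory")
        if current["unnatural_deaths"] > previous["unnatural_deaths"]:
            alerts.append("SimUsers are dying unnaturally - the Actions list is not completing")
        return alerts

    poll_1 = {"suspended_total": 0, "unnatural_deaths": 0}
    poll_2 = {"suspended_total": 42, "unnatural_deaths": 3}
    print(simuser_state_alerts(poll_1, poll_2))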
 

Product : Avalanche, Load Testing, L4-7