Server scaling
Every server is limited in how many parallel requests it can serve, that is, in the number of available processes and threads, as well as in operating system resources such as file descriptors or memory.
Configuration options in nevisAuth
Java heap size
Configure the Java heap size via JAVA_OPTS in the env.conf configuration file.
| Scenario | JAVA_OPTS |
|---|---|
| Default value | -Xms256m -Xmx1024m |
| Production system | -Xms2048m -Xmx2048m |
| Production system with substantial workload | -Xms4096m -Xmx4096m |
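For example, to apply the production recommendation, the entry in env.conf might look as follows (a minimal sketch; the exact quoting and the surrounding entries in your env.conf may differ):

JAVA_OPTS="-Xms2048m -Xmx2048m"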
- The authentication flows in nevisAuth are highly customizable; therefore, depending on your configuration, memory requirements can vary considerably.
- It is recommended to run performance tests in a test environment to determine the exact needs of your production environment.
- In Kubernetes cluster setups, sizing is not handled with JVM arguments but with Kubernetes limits, as sketched below.
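As an illustration, memory requests and limits for a containerized nevisAuth could be set with kubectl as follows (the deployment name nevisauth is hypothetical; in practice such limits are usually defined in your deployment manifests or Helm values):

kubectl set resources deployment/nevisauth --requests=memory=2Gi --limits=memory=2Gi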
Worker threads
By default, nevisAuth uses a maximum of 200 worker threads to process incoming requests. This can be changed in the server configuration with the property max-threads.
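As a purely illustrative sketch, assuming a property-style server configuration (the exact file and syntax depend on your nevisAuth version; consult the server configuration reference), a moderate increase could look like this:

max-threads: 300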
The most common reasons for the number of threads to reach the maximum are:
- Threads blocking each other
- Bottlenecks in certain components
- The number of incoming requests requiring more parallel processing
Preferably, the worker thread value should not be changed. If you do modify it, apply only small incremental changes.
Using a value above 500 worker threads is not recommended.
Detecting number of used threads
Two methods are available for detecting the number of used threads:
- Configure the jcan.Op log category to INFO (for an example, see Logging configuration). Every jcan.Op log line contains a cR=<number> entry, which shows the number of worker threads in use at that time. It is usually very helpful to graph how cR evolves over time, using a tool such as Splunk; see the sketch after this list for extracting the values.
- Check the number of currently used threads on the OS level with the following command:

ps -o nlwp <PID> | tail -1 | xargs
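If you do not have a log analysis tool at hand, a quick shell sketch like the following extracts the cR values for plotting (the log file name esauth4sv.log is an assumption; use the path of your jcan.Op log file):

# print one cR value per jcan.Op log line, e.g. to feed into a plotting tool
grep -o 'cR=[0-9]*' esauth4sv.log | cut -d= -f2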
- Java has different mechanisms that use threads, garbage collection for example, so the overall number of threads for the process will be higher than the maximum number of worker threads.
- The worker threads live in a pool and are therefore ramped up only once; the currently running ones are occupied with processing previous requests.
- Ramping up threads is CPU and memory intensive.
- Once the load subsides, the pool slowly ramps down. cR values in the jcan.Op log fluctuate a lot because they do not count idle threads in the pool.
Searching for bottlenecks
To verify whether nevisAuth responses are slow, configure the jcan.Op log category to INFO. For more information, see Logging configuration. Once you have those jcan.Op log lines, you can see the response time of each request in the field dT=<response time>ms; a shell sketch for extracting these values follows the list below.

To identify slow parts inside nevisAuth, use the following loggers on INFO:
- AuthPerf displays the processing time of each AuthState in the authentication flow.
- DbPerformance displays response times for the Remote session store and the Out-of-context data store.
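A quick way to spot the slowest responses in such logs is a one-liner like the following (the log file name esauth4sv.log is again an assumption):

# show the ten largest dT values observed in the log, in milliseconds
grep -o 'dT=[0-9]*ms' esauth4sv.log | tr -d 'dT=ms' | sort -n | tail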
Detecting blocked threads
Bad nevisAuth response times combined with low CPU and memory usage indicate blocking between worker threads.
To investigate blocked threads, do the following:
Run the following command to create a thread dump:
sudo -u nvauser jstack -l <PID> > /tmp/threaddump.txt
Caution: jstack requires a JDK, which is not installed on all systems by default. It is delivered in the nevisAppliance package.
Open and analyze /tmp/threaddump.txt.
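When scanning the dump, a shell sketch like the following gives a first impression of how many threads are blocked and which monitors they are waiting on (based on the standard jstack output format):

# count threads reported as BLOCKED
grep -c 'java.lang.Thread.State: BLOCKED' /tmp/threaddump.txt
# list the monitors that threads are waiting to lock, most contended first
grep 'waiting to lock' /tmp/threaddump.txt | sort | uniq -c | sort -rn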
Note: Contact nevisAuth support if you are having trouble analyzing the thread dump.