Failure Safety and Load Balancing
This section shows how nevisAuth implements failure safety and load balancing. It also briefly discusses the global session store.
Failure safety
nevisAuth manages a global session. As a consequence, the server is stateful, and failure safety requires that the corresponding session cache is synchronized so that failover-aware clients can access it at a different physical location. The failure safety pattern in nevisAuth is based on a vertical line concept with horizontally paired failure safety. The following figure shows this pattern:
This design has the following advantages:
- The cost of failure safety does not increase dramatically with the number of nevisAuth server instances (synchronization overhead multiplies if the number of synchronized nodes is larger than 2).
- Configuration remains maintainable.
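The first advantage can be illustrated with a rough message count. The sketch below assumes that every session change is sent once to each other node in the same synchronization group. This is a simplification made for illustration, not a description of the actual nevisAuth protocol, and the function name and parameters are hypothetical.

```python
def replication_messages(instances: int, group_size: int, changes_per_instance: int) -> int:
    """Count sync messages when instances are split into groups of equal size.

    Assumption (illustrative only): each session change is sent once to every
    other node in the same synchronization group.
    """
    if group_size < 1 or instances % group_size != 0:
        raise ValueError("instances must be a positive multiple of group_size")
    return instances * changes_per_instance * (group_size - 1)


# Six instances, ten session changes each.
paired = replication_messages(6, group_size=2, changes_per_instance=10)     # three pairs
full_mesh = replication_messages(6, group_size=6, changes_per_instance=10)  # one large group
```

Under this assumption, the paired layout grows linearly with the number of instances, while a single group in which all nodes synchronize with each other grows roughly quadratically.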
The restrictions include:
- Session stickiness between clients and nevisAuth: For optimal performance and to prevent session loss, an established session context (e.g., the user is authenticated and the corresponding channel is associated with a client, i.e., a reverse proxy instance) should be handled over the same communication channel. If an instance fails, the client that fails over (e.g., the reverse proxy) should address the nevisAuth instance that holds the backup of the session state (slave instance). Failover to any other nevisAuth instance results in an error (unknown session exception).
- Session stickiness between clients and reverse proxy: Session loss arises when the load balancer in front of the reverse proxy does not route the user's established channel to the same reverse proxy (e.g., based on the SSL session ID, cookies or source IP address).
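The stickiness rule between clients and nevisAuth can be sketched as a small routing model. All class and method names below (`AuthInstance`, `StickyClient`, `pair`, `UnknownSessionError`) are hypothetical and only illustrate the behavior described above. They are not part of the nevisAuth API.

```python
class UnknownSessionError(Exception):
    """Raised when an instance is asked for a session it does not hold."""


class AuthInstance:
    """Toy model of one nevisAuth instance with an optional paired backup."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.peer = None       # paired instance holding the session backup
        self.sessions = set()

    def create_session(self, session_id):
        self.sessions.add(session_id)
        if self.peer is not None:
            # Assumption: the backup copy exists on the peer right away.
            self.peer.sessions.add(session_id)

    def access(self, session_id):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        if session_id not in self.sessions:
            raise UnknownSessionError(session_id)
        return self.name


def pair(a, b):
    """Link two instances as a horizontally paired failover couple."""
    a.peer, b.peer = b, a


class StickyClient:
    """Toy model of a failover-aware client such as a reverse proxy."""

    def __init__(self):
        self.affinity = {}  # session id -> instance currently serving it

    def login(self, session_id, instance):
        instance.create_session(session_id)
        self.affinity[session_id] = instance

    def call(self, session_id):
        target = self.affinity[session_id]
        if not target.up and target.peer is not None:
            # Fail over to the paired instance only, never to an arbitrary one.
            target = target.peer
            self.affinity[session_id] = target
        return target.access(session_id)
```

In this model, a call for an established session keeps going to the same instance, moves to the paired instance when the first one is down, and raises `UnknownSessionError` if it is sent to an instance outside the pair.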
The following figure shows the session synchronization between the paired nevisAuth instances. It illustrates the following aspects:
- Session changes may be propagated asynchronously with a small delay to reduce network traffic.
- Sessions are pulled when they are not found locally but need to be present.
- If a session is terminated, only the instance that received the event sends notification events. The slave instance just terminates the session without notification.
In the figure above,
- createSession is equivalent to a user login.
- joinSession is equivalent to any session access.
- killSession is equivalent to a user logout or a session time-out.
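The synchronization behavior listed above can be approximated with a small in-memory model. This is a sketch under assumptions: the class `Instance`, its `outbox` queue and the `flush` method are hypothetical stand-ins for the asynchronous propagation, and only the event names mirror createSession, joinSession and killSession.

```python
class Instance:
    """Toy model of one of two paired instances exchanging session events."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        self.sessions = {}   # session id -> attributes
        self.outbox = []     # notifications waiting to be propagated
        self.sent = []       # (kind, session id) of delivered notifications

    def create_session(self, sid, attrs):
        """createSession: store locally, queue a notification for the peer."""
        self.sessions[sid] = dict(attrs)
        self.outbox.append(("update", sid, dict(attrs)))

    def join_session(self, sid):
        """joinSession: use the local copy, or pull it from the peer if missing."""
        if sid not in self.sessions:
            remote = self.peer.sessions.get(sid)
            if remote is None:
                raise KeyError(f"unknown session {sid}")
            self.sessions[sid] = dict(remote)
        return self.sessions[sid]

    def kill_session(self, sid):
        """killSession: only the instance receiving the event notifies the peer."""
        self.sessions.pop(sid, None)
        self.outbox.append(("kill", sid, None))

    def flush(self):
        """Deliver queued notifications (models the small propagation delay)."""
        while self.outbox:
            kind, sid, attrs = self.outbox.pop(0)
            self.sent.append((kind, sid))
            self.peer.receive(kind, sid, attrs)

    def receive(self, kind, sid, attrs):
        if kind == "kill":
            # The peer terminates the session without sending a notification back.
            self.sessions.pop(sid, None)
        else:
            self.sessions[sid] = attrs
```

Calling `join_session` on the peer before `flush` has run exercises the pull path. Calling `kill_session` followed by `flush` removes the session on both sides while only the originating instance records a notification in `sent`.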
Load balancing
nevisAuth receives about two to ten hits per user session (e.g., a login, a step-up, possibly three to four session joins, some votes and a logout). Failure safety is therefore critical (nobody can work when login is not possible), while load balancing is far less important. Given the session synchronization described above (see the sync messages in the sequence diagram), a client may distribute its calls across the two nevisAuth instances. If every (distributed) client does this, synchronization delays may lead to a session being modified on both instances before the changes have been exchanged. In this case, synchronization fails because sessions are protected by an optimistic locking scheme. Synchronous synchronization (setting the synchronization delay to 0 seconds) solves this problem, but it increases the overhead and callers have to wait slightly longer.
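The conflict described above can be reproduced with a minimal optimistic locking sketch. It assumes a simple per-session version counter. The actual locking scheme used by nevisAuth is not specified here, and the names `Replica` and `VersionConflict` are hypothetical.

```python
class VersionConflict(Exception):
    """Raised when a remote change is based on an outdated session version."""


class Replica:
    """Toy session store that protects sessions with a version counter."""

    def __init__(self):
        self.store = {}  # session id -> (version, attributes)

    def local_update(self, sid, key, value):
        """Change a session locally and return the sync message for the peer."""
        version, attrs = self.store.get(sid, (0, {}))
        new_state = (version + 1, {**attrs, key: value})
        self.store[sid] = new_state
        return (sid, version, new_state)  # base version plus resulting state

    def apply_remote(self, message):
        """Apply a peer's change only if it was based on our current version."""
        sid, base_version, new_state = message
        current_version, _ = self.store.get(sid, (0, {}))
        if current_version != base_version:
            raise VersionConflict(
                f"{sid}: expected version {base_version}, found {current_version}"
            )
        self.store[sid] = new_state
```

With a delay, both replicas can run `local_update` on the same session before either message is delivered, so the later `apply_remote` raises `VersionConflict`. If each message is applied on the peer immediately after the update (a delay of 0), every change builds on the latest version and no conflict occurs.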
Global Session Store
To be able to synchronize sessions between nevisAuth instances, nevisAuth must store already authenticated sessions in a JDBC database. This allows session sharing within nevisAuth clusters of arbitrary sizes. See Session management for details on how to configure the JDBC global session store.