Version: 8.2405.x.x RR

Probability density based model

In the normalization model described here, we assume that the plug-in risk scores are statistically independent: $p(r) = p_1(r_1) \cdot p_2(r_2) \cdots p_D(r_D)$. The normalized risk score is defined as $r_{\text{normalized}} = 1 - p_1(r_1) \cdot p_2(r_2) \cdots p_D(r_D)$.

Since we would like to make full use of the interval from 0.0 to 1.0 for the normalized risk score, we scale the product $p_1(r_1) \cdot p_2(r_2) \cdots p_D(r_D)$ by its mode: $r_{\text{normalized}} = 1 - \frac{p_1(r_1) \cdot p_2(r_2) \cdots p_D(r_D)}{\mathrm{mode}_1 \cdot \mathrm{mode}_2 \cdots \mathrm{mode}_D}$, where $\mathrm{mode}_i$ is the mode of the density $p_i$, that is, its maximum value.

We estimate the individual densities $p_i$ from the observed data with a kernel density estimator using a Gaussian kernel. By default, the bandwidth $h_i$ of the kernel estimating $p_i$ is chosen as $h_i = 3 \cdot \widehat{\mathrm{VAR}}(p_i)$, where $\widehat{\mathrm{VAR}}$ denotes the empirical variance of the observed data. (The factor 3 can be replaced by any other value by configuration.)
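The model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the nevisDetect implementation: the function names `gaussian_kde_1d` and `train_normalizer` are hypothetical, the empirical variance is taken as the population variance (`np.var`), the density maximum is approximated on a grid over the interval from 0.0 to 1.0, and the training data are synthetic.

```python
import numpy as np

def gaussian_kde_1d(samples, factor=3.0):
    """Return a 1-D Gaussian kernel density estimate p(x).

    Bandwidth as in the text: h = factor * empirical variance.
    Assumption: np.var (population variance) is the intended estimator.
    """
    samples = np.asarray(samples, dtype=float)
    h = factor * np.var(samples)
    if h <= 0:
        raise ValueError("samples must have non-zero variance")

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - samples[None, :]) / h
        norm = len(samples) * h * np.sqrt(2.0 * np.pi)
        return np.exp(-0.5 * z**2).sum(axis=1) / norm

    return density

def train_normalizer(data, factor=3.0, grid_size=1001):
    """data: array of shape (n_samples, D) with plug-in risk scores in [0, 1]."""
    data = np.asarray(data, dtype=float)
    grid = np.linspace(0.0, 1.0, grid_size)
    densities = [gaussian_kde_1d(data[:, i], factor) for i in range(data.shape[1])]
    # Assumption: the maximum of each density is approximated on a grid.
    peaks = [p(grid).max() for p in densities]

    def normalize(r):
        ratio = 1.0
        for r_i, p, peak in zip(np.asarray(r, dtype=float), densities, peaks):
            ratio *= p(r_i)[0] / peak
        # Clamp, because the grid maximum may slightly underestimate the true one.
        return 1.0 - min(ratio, 1.0)

    return normalize

# Synthetic training data (not real nevisDetect scores): 657 points, D = 2.
rng = np.random.default_rng(0)
data = np.clip(rng.normal([0.2, 0.3], [0.05, 0.1], size=(657, 2)), 0.0, 1.0)
normalize = train_normalizer(data)

score_typical = normalize([0.2, 0.3])  # scores in the bulk of the training data
score_unusual = normalize([0.9, 0.9])  # scores far from all training data
print(score_typical, score_unusual)
```

In this sketch, scores near the bulk of the training data yield a low normalized risk score, while scores far from any observed data yield a value close to 1.0.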

The following pictures show 657 data points of BehavioSecSession and BehavioSecTransaction plug-in risk scores (denoted by r1 and r2) from a nevisDetect test system and a level plot of the trained normalization:

Note that the data are not realistic, since the test system has frequently been used for demonstration purposes with a confidence threshold of 0.0.

[Figure: Probability density based model]

The advantages of the probability density based model are:

  • Training the model is fast, stable, and also suited for a large training data set.
  • The Proximity property is fulfilled.

The disadvantages are:

  • The assumption of statistical independence of the plug-in risk scores does not hold.
  • The desired property of Monotonicity is in general not fulfilled.
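
The Monotonicity disadvantage can be illustrated with a small one-dimensional sketch. It assumes that Monotonicity means a higher plug-in risk score must never produce a lower normalized risk score; that definition is not given in this section. The training samples are synthetic and all names are hypothetical.

```python
import numpy as np

# Synthetic training scores concentrated around 0.5 (assumption, not real data).
samples = np.linspace(0.4, 0.6, 21)
h = 3.0 * np.var(samples)  # bandwidth: factor 3 times the empirical variance

def density(x):
    """Gaussian kernel density estimate at a single point x."""
    z = (x - samples) / h
    return np.exp(-0.5 * z**2).sum() / (len(samples) * h * np.sqrt(2.0 * np.pi))

# Approximate the density maximum on a grid over [0, 1].
grid = np.linspace(0.0, 1.0, 1001)
peak = max(density(x) for x in grid)

def normalized(r):
    return 1.0 - min(density(r) / peak, 1.0)

low = normalized(0.1)      # low plug-in score, but far from the training data
typical = normalized(0.5)  # higher plug-in score, in the bulk of the data
print(low > typical)       # True
```

Here the lower plug-in score 0.1 receives a higher normalized risk score than 0.5, because the model measures how unusual a score is relative to the training data rather than how large it is.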