Can, Başarbatu
(2022)
*Online anomaly detection in the Neyman-Pearson hypothesis testing framework.*
[Thesis]

## Abstract

We consider the statistical anomaly detection problem with particular attention to false alarm rate (or false positive rate, FPR) controllability, nonlinear modeling and computational efficiency for real-time processing. A decision theoretical solution to that problem can be formulated in the Neyman-Pearson (NP) hypothesis testing (anomaly vs nominal) framework in which the detection power is maximized subject to a user defined upper-bound constraint on the FPR. We first introduce a novel NP classifier (NP-NN) that obtains the NP test in a data driven online manner. The introduced classifier (i) infers an asymmetrical cost structure that matches the desired FPR and (ii) minimizes the resulting empirical risk, i.e., the weighted average of type I and type II errors. These two steps are conducted jointly with a single Lagrangian max objective. A compact and computationally efficient nonlinear modeling is achieved based on a single hidden layer feedforward neural network. The first layer initially expands the kernel space of the radial basis function (RBF) with sinusoidal activation, i.e., Fourier features, the second layer follows to classify, and the overall network is sequentially optimized (subject to the NP constraint) via stochastic gradient descent updates. The data processing in our algorithm is truly online, and appropriate for large scale data applications as it only has linear computational and constant space complexity with respect to the data size. Based on many publicly available and widely used benchmark datasets, our algorithm is experimentally shown to be superior to the state-of-the-art techniques with precise FPR controllability, either by outperforming in terms of the NP classification objeciv tive with a comparable computational as well as space complexity or by achieving a comparable performance with significantly lower complexity. NP-NN has two disadvantages. (i) It requires extensive cross validations for the RBF bandwidth, making it prone to overfitting (underfitting) if the bandwidth is chosen too small (large), and (ii) it has fixed modeling power since the bandwidth is kept constant, causing a model mismatch, i.e., overfitting (underfitting) in the earlier (later) phases of a data stream. In fact, the modeling power (i.e. the degree of nonlinearity) should be gradually increased in online applications as the learnability gradually improves with increasing data. In order to dynamically control the modeling power and mitigate fitting issues, we additionally propose an ensemble NP classifier (Tree OLNP) that is based on a binary partitioning tree. Tree OLNP creates an ensemble of doubly exponential (in the tree depth) number of partitions of the sample space. Each partition corresponds to an online piecewise linear (hence nonlinear) expert classifier and each expert is essentially a union of linear NP classifiers (OLNP, linear version of NP-NN). Partitions with different number of regions define experts with different modeling powers, and we emphasize that any smooth nonlinear class separation can be modeled by a powerful enough (with sufficient number of pieces) piecewise linear expert. For example, the simplest partition/ expert is at the root node with a single region/piece whereas the most powerful one is at the leaves. In this setting, and while maintaining a precise control over the FPR, Tree OLNP generates its overall prediction as a performance driven and time varying weighted combination of the experts in the ensemble. Simpler (more powerful) experts receive larger weights early (late) in the data stream, managing the bias-variance trade-off and hence mitigating the fitting issues as desired. We mathematically prove that, for any stream, our Tree OLNP asymptotically performs at least as well as of the best expert in terms of the NP performance with a regret diminishing in the order O(1/ p t) (here, t is the data size). Tree OLNP is online and its computational complexity scales linearly with respect to both the data size and tree depth, and scales twice logarithmic with respect to the number of experts. We experimentally show that Tree OLNP outperforms the best expert, and largely improves NP-NN by the introduced dynamical control over the modeling power. Even when an algorithm is designed online, running it in real time can still be challenging if the data stream is voluminous. To address this challenge of real time processing, we also consider selective processing via active learning. Using the binary entropy as a measure for the uncertainty in the aforementioned expert ensemble for a given sample, we empower our Tree OLNP with the property of processing a sample only if the uncertainty is high. As a result, while preserving the false alarm controllability, a desirable NP performance is achieved in our experiments with significantly lower number of data instances. We finally introduce a system pipeline to detect abnormal objects/actions from video streams. The introduced video processing system receives frames from a live video stream, extracts the YoLo objects, reduces dimensionality with a denoising autoencoder and detects abnormal activities with active Tree OLNP. Our experimental results, on 3 different datasets from the literature and another one from Sabanci University campus, demonstrate that Tree OLNP is a viable option for detecting abnormalities from video streams.

Item Type: | Thesis |
---|---|

Uncontrolled Keywords: | anomaly detection. -- Neyman-Pearson. -- online. -- nonlinear. -- classification. -- large scale. -- kernel. -- neural network. -- context tree. -- partitioning tree. -- active learning. -- anomali tespiti. -- çevrimiçi. -- nonlineer. -- sınıflandırma. -- büyük ölçekli. -- sinir agları. -- baglam agacı. -- bölümlere ayıran agaç. -- aktif ögrenme. |

Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering |

Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Electronics Faculty of Engineering and Natural Sciences |

Depositing User: | Dila Günay |

Date Deposited: | 24 Apr 2023 09:35 |

Last Modified: | 24 Apr 2023 09:35 |

URI: | https://research.sabanciuniv.edu/id/eprint/47036 |