Abstract. Machine Learning (ML) has become a valuable asset to solve many real-world tasks. For Network Intrusion Detection (NID), however, scientific advances are still seen with skepticism by practitioners. This disconnection is due to the intrinsically limited scope of research papers, many of which primarily aim to demonstrate new methods “outperforming” prior work, oftentimes overlooking the practical implications of deploying the proposed solutions in real systems. Yet the value of ML for NID depends on a plethora of factors, such as hardware, that are often neglected in scientific literature.
This paper aims to reduce the practitioners’ skepticism towards ML for NID by changing the evaluation methodology adopted in research. After elucidating which factors influence the operational deployment of ML in NID, we propose the notion of pragmatic assessment, which enables practitioners to gauge the real value of an ML method for NID. Then, we show that the current state of research hardly allows one to estimate the value of ML for NID. As a constructive step forward, we carry out a pragmatic assessment. We re-assess existing ML methods for NID, focusing on the classification of malicious network traffic, and consider hundreds of configuration settings, diverse adversarial scenarios, and four hardware platforms. Our large and reproducible evaluations enable estimating the quality of ML for NID. We also validate our claims through a user study with security practitioners.