Novel Techniques for Monitoring Network Traffic at the Flow Level

Bok av Eduard Glatz

Research in Internet measurement provides us with new ways to understand, operate and improve the Internet. Learning from network traffic data requires a well-chosen set of analysis techniques. In this book we delve into novel techniques and their application on large data sets to extend the choice of analysis schemes. In particular, we focus on traffic data at the network level that is readily available from commercial routers in the form of flow metadata (e.g. NetFlow) to enable analyzes of ever growing traffic volumes with low demands on the measurement infrastructure.This book consists of two major parts.In a first part, we explore a promising approach to study unsolicited traffic without the need to reserve unpopulated IP address ranges to this task, as has been done in the past. Our approach is to study one-way traffic, i.e., packets that never receive a reply in live networks. We introduce a novel scheme to classify one-way traffic at the flow level into interpretable classes. We validate this scheme based on a data set that we prepare using all informative details available from packet data (e.g. header and payload contents). We use our classifier to shed light on the composition of one-way traffic, and illustrate how the particular class of "Unreachable Services" can be used to passively detect network service outages by processing flow-level traffic data only. Moreover, to obtain a comprehensive view of one-way traffic, we conduct a large-scale study covering eight years of traffic data leading to new insights about the evolution of this exotic piece of traffic over time and space.In part two, we present novel visualization methods following the well-known adage «A picture is worth a thousand words». In particular, we tackle the problems of how to summarize data to extract the most relevant information from big data sets, and how to visualize this information in an easy interpretable way. We envision a top-down workflow that in a first step identifies probably hidden patterns in a data set captured from a potentially large network, followed by a second step that involves a closer inspection of the traffic of individual end systems or subnets. Specifically, we use frequent itemset mining to obtain a list of most relevant patterns from the traffic data of a network that we then visualize through hypergraphs. To drill down to traffic patterns of individual end systems we make use of a graph representation and a domain specific summarization scheme, which is based on the characteristics of typical host roles (e.g. client, server, P2P), to provide a quick overview of what roles an individual host assumes and what applications it runs. We demonstrate the usefulness of our approach by using proof-of-concept implementations in a number of illustrative case studies.