Privacy and Encrypted Traffic Classification: Traffic classification

The overwhelming majority of network traffic is carried over the HyperText Transfer Protocol (HTTP), and more than 99% of HTTP traffic is protected by the Transport Layer Security (TLS) protocol. As a result, both the type of data and its content are hidden from third-party observers. In other words, neither the type of traffic (video, audio, web, etc.) nor the service (Yandex, Twitch, VK, etc.) to which a particular flow belongs is transmitted in plain text. Mobile network operators and Wi-Fi network administrators nevertheless need this information to ensure a high Quality of Service (QoS) or to provide free access to specific resources. It is also used to block malicious or legally prohibited traffic.
Most existing traffic classification algorithms, including those developed in the Laboratory, classify a single flow based on the data it carries. They treat individual flows as independent, even when the flows are generated by the same application. In practice, however, a single application often generates multiple related flows. For example, traffic analysis of AR/VR applications shows that they may use several separate flows to transmit video content. The Laboratory’s researchers have therefore developed a new algorithm that takes into account the relationships between flows generated by the same application, which has significantly improved traffic classification accuracy. Moreover, this algorithm outperforms classifiers that determine traffic categories from TLS handshake metadata in scenarios where few fields remain unencrypted, such as when the new Encrypted Client Hello extension to the TLS protocol is used.
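
The idea of classifying related flows jointly rather than independently can be illustrated with a minimal sketch. This is not the Laboratory's algorithm, whose details are not given here; it is a simplified stand-in that assumes flows from the same client starting close together in time belong to the same application, and that a per-flow classifier has already produced class probabilities. The names Flow, group_related_flows, classify_group, classify_jointly and the 5-second window are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Flow:
    client_ip: str
    start_time: float              # seconds since start of capture
    class_probs: Dict[str, float]  # output of some per-flow classifier (assumed)


def classify_independently(flows: List[Flow]) -> List[str]:
    """Baseline: label every flow on its own, ignoring all other flows."""
    return [max(f.class_probs, key=f.class_probs.get) for f in flows]


def group_related_flows(flows: List[Flow], window: float = 5.0) -> List[List[int]]:
    """Group flow indices by client; a gap longer than `window` seconds
    between consecutive flow start times begins a new group.
    Assumption: such a group approximates one application session."""
    by_client = defaultdict(list)
    for i, flow in enumerate(flows):
        by_client[flow.client_ip].append(i)

    groups = []
    for indices in by_client.values():
        indices.sort(key=lambda i: flows[i].start_time)
        current = [indices[0]]
        for i in indices[1:]:
            if flows[i].start_time - flows[current[-1]].start_time <= window:
                current.append(i)
            else:
                groups.append(current)
                current = [i]
        groups.append(current)
    return groups


def classify_group(group: List[Flow]) -> str:
    """Average the per-flow probabilities and return the most likely class."""
    totals = defaultdict(float)
    for flow in group:
        for label, p in flow.class_probs.items():
            totals[label] += p / len(group)
    return max(totals, key=totals.get)


def classify_jointly(flows: List[Flow], window: float = 5.0) -> List[str]:
    """Assign every flow the label of its group, in input order."""
    labels = [""] * len(flows)
    for indices in group_related_flows(flows, window):
        label = classify_group([flows[i] for i in indices])
        for i in indices:
            labels[i] = label
    return labels
```

In this sketch, a flow whose own probabilities are ambiguous (say, 0.4 video against 0.6 web) is labeled as video if two confidently classified video flows from the same client start within the same window. The independent baseline would label it as web. A real system would need a more robust notion of "related" than client address and timing alone.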
