New generation tomography measurement in the Internet
Description of the measurement
The basic idea of Network tomography is to perform active measurements that do not require special coordination from the network operators and cause limited traffic overhead. These measurements are related to the unobserved quantities of interest such as delays inside the network. Our main goal is to set up a working piece of scalable service, which monitors network state via investigating packet delay fluctuations in a large tree structure in real-life network spanning among etomic measurement nodes.
The capture equipment and the statistical inference technique is improved
so that the quality of the extracted link level delay is comparable to
the passively measurable information.
Measurements are carried out regularly to collect historical records
of the given segment of the European Internet.
Corelated end-to-end measurements serve the basis of resolvable internal
Trains of back-to-back packets are constructed to measure the network. Each packet of the train travel to different destinations, the first packet is looped back to the DAG card to serve as a time reference. Between the packet-trains certain delay is kept to avoid buffer overflow caused by the measurement stream itself (100 ms). Measurement data are downloaded periodically to the database.
The difficulty in the evaluation is that in most cases no analytical solution can be given to the problem and numerical computation requires a summation over all possible combination of internal delays in the network, which means a non-polynomial computational complexity. Implementaions of the simplest Expectation-Maximization (EM) techniques will not scale well with the size of the network.
We introduce a highly reliable estimator through a fast computation of polynomial complexity. The recursive formulae of the EM algorythm step are mapped to a graphical problem. One EM step is interpreted as a walking up and down the tree and the calculus of local conditional probability formulae are calculated, which sum over a smaller domain.
Moreover, our implementation gives a preview of the inferred inner link delay distributions available in a few seconds and if time and CPU resources allow, the delay distributions are refined at higher resolutions.
FIG:scale-bin.eps,scale-L.eps Performance analysis of the ML implemented. LEFT: Varying the time resolution of the distributions ie. increasing the number of bins is investigated. The total runtime of the algorithm scales better than O(B^2). We show the results for the tree graph of the source node "COLB". RIGH: Scaling analysis. The runtime of the ML algorithm is tested against the number of receiver nodes in the network. Results for the sender node "COLB" are shown. The runtime increases approximately linearly with the number of leaves in the tree. The variance of the runtime denoted by the error bars grows even slower.
In the past decade it became obvious that Internet has several interesting fractal and scaling properties, such as the self-similar nature of traffic and the power law scaling of the network topology. The accelerated inference algorithm now makes it possible for us to get higher resolution distributions than it was possible in our earlier works and to find interesting new scaling properties in our data.
FIG:ELTEfa.eps A typical tree graph with the inferred queueing delay distributions. Only the nontrivial distributions are shown. Hop distance between neighboring routers are labelled. Branching nodes, anonymized as 1,2,3,4,9 and 11, are in the high speed (~10 Gbps) part of the academic research network GEANT. The queuing delays between these routers are below the resolution of the measurement and are not shown.
FIG:elte-0-1+.eps An example of queueing delay distribution for a single up-link queue determined by our new iterative algorithm from high temporal resolution delay data. INSET: The logarithm of the complementary cumulative distribution of the Weibull distribution P(X>x)=\exp(-bx^a) is a power-law function. The logarithm of the measured complementary cumulative distribution is shown on a log-log scale to derive the parameters of the Weibull approximation. The slope of the fitted line yields a, which is related to the Hurst-exponent a=2(1-H) and H=0.79.
FIG:elte-2-5.eps Inferred queueing delay distribution for the compound link 2-ERIC. This is a typical down-link from the high speed core down to the 100 Mbps local ethernet of the measurement station "ERIC". A fitted Weibull function is also shown.
A compound link consists of many hops, whose distribution is a convolution of the distributions of the hops within the compound link. The distribution is similar to a typical one-way end-to-end delay distribution. This distribution can also be well fitted by a Weibull distribution, though the Hurst exponents of the links have no simple relation to the Weibull parameter in this case.
It is possible to analyze together many delay distributions produced in different measurements. Timescales of the distributions range over three orders of magnitude. A good overall representation of the data can be achieved by plotting the standard deviation of the distributions against their averages. The standard deviation scales linearly with the average over a wide timescale range, which is consistent with our previous observation that the Weibull family fits well the individual distributions.
FIG:avg_vs_var.eps The standard deviation as a function of the average of the queueing delay distributions. The straight line is \sigma(x_i)=E(x_i). Standard deviation values are in the 0.5E(X)<\sigma(X)<1.1 E(X) range, except for the smallest averages (coming from the high speed links), where the standard deviation is inaccurate due to the limited resolution of the measurement.
The analysis of the distribution of average queuing delays reveals an interesting scaling behaviour.
FIG:ccd.eps A new scaling law for queuing delays in the Internet. The complementary cumulative distribution of the average queueing delays on a semi-logarithmic plot. The continuous line is a ~-ln(x) distribution corresponding to a ~1/x$ type scaling of the corresponding probability density of average delays.
The corresponding density shows ~1/x type of power law scaling in the same temporal range. The exact reason for this scaling is not yet obvious, a possible explanation can be that the new scaling reflects the scaling properties of link capacities built into the network.