| PolyDocs: User-defined Distributions |
|---|
| Web Polygraph |
This page describes how to specify arbitrary distributions in Polygraph. Documentation has been synchronized with Poly 1.1p0.
Polygraph has many built-in distributions: normal, exponential, Zipf(1), constant, etc. In some cases, however, a user-defined distribution is required. You can instruct Poly to use an arbitrary distribution by specifying a distribution pdf, or value-frequency histogram. Such a histogram is placed in a separate file. The file format and command line usage are described below.
The current format is line-based. That is, syntax elements cannot span lines.
Here is a simple example of a distribution called ``PConnDream''. This distribution might be used for specifying the use limit for persistent connections.
# comments are allowed
NumDistr PConnDream = { # mandatory header
1 51.0 # value 1 has frequency 51
2 28.7
3 13.3
4 2.3 # value 4 has frequency 2.3
5 1.3
6 0.7
7 0.4
8 */100 # value 8 absorbs the rest (out of 100 total)
} # closing bracket is required!
As you may have already guessed, the ``frequency'' column may have arbitrary values. Those values do not have to add up to 100. One can use percents, probabilities, actual counters, etc. as frequencies. Poly will simply sum all the values and take that sum as a ``100%'' equivalent.
The last bin of the PConnDream histogram is interesting. We used percents as frequency values (Poly did not know that!). We were too lazy to calculate how many percent were left after the first 7 entries, and just told Poly to do the math for us. Note that we had to specify the total of 100 in the wild-card entry.
Also note that Polygraph requires that the type of histogram values be specified. For persistent connections, the type is simply a ``number'' (NumDistr). For time- or size-based histograms, one has to specify the TimeDistr or SizeDistr type. For now, TimeDistr histograms use seconds as the unit, and SizeDistr histograms use bytes.
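The arithmetic behind the frequency column and the wild-card entry can be illustrated with a short Python sketch. This is not Polygraph code: the function name `resolve_frequencies` and the representation of the wild-card as a `"*/TOTAL"` string are assumptions made for illustration only.

```python
def resolve_frequencies(entries):
    """Turn (value, frequency) pairs into (value, probability) pairs.

    A frequency is either a number or a wild-card string "*/TOTAL",
    meaning "whatever is left out of TOTAL". This representation is a
    hypothetical one chosen for this sketch.
    """
    explicit = sum(f for _, f in entries if not isinstance(f, str))
    resolved = []
    for value, freq in entries:
        if isinstance(freq, str):
            # Wild-card entry: the remainder of the stated total.
            total = float(freq.split("/", 1)[1])
            freq = total - explicit
            if freq < 0:
                raise ValueError("explicit frequencies exceed the wild-card total")
        resolved.append((value, freq))
    # The sum of all frequencies is treated as the "100%" equivalent.
    grand_total = sum(f for _, f in resolved)
    return [(v, f / grand_total) for v, f in resolved]


# The PConnDream histogram from the example above.
pconn_dream = [(1, 51.0), (2, 28.7), (3, 13.3), (4, 2.3),
               (5, 1.3), (6, 0.7), (7, 0.4), (8, "*/100")]

for value, prob in resolve_frequencies(pconn_dream):
    print(value, round(prob, 4))
```

Because every frequency is divided by the grand total, the same function works whether the input column holds percents, probabilities, or raw counters.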
Here is a more complex example. The distribution below was adapted from real response time measurements on a cache server.
TimeDistr client_http_svc_time = {
1.017:1.943 2 # [min:max) range!
:3.069 34 # max above becomes current min
:4.050 303 # [3.069 : 4.050)
:4.938 792 # [4.050 : 4.938)
.... # many lines snipped
:870251.625 9
:918521.893 9
:969469.420 14
:1023243.301 6 # [969469.420 : 1023243.301)
}
For client_http_svc_time, we used the raw counters as frequencies to avoid tedious conversion into percents or probabilities.
As the example above illustrates, you can specify ranges of values and ``borrow'' maximum values from preceding lines. Note that a single value (as opposed to a range), say N, produces (for the purpose of borrowing only!) a maximum value of N+1. You may find this behavior ``natural'' for some applications:
SizeDistr reply_sizes = {
0 .01 # zero sized replies
:1025 .30 # a [1 : 1025) range, 0 is not included!
:2049 .15 # a [1025:2049) range
....
1048576 .02 # a 1MB reply precisely: [1MB : 1MB]
:2097152 .01 # replies in [1MB+1byte : 2MB) range
}
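The borrowing rule can be made concrete with a small Python sketch that expands the first column of a histogram into explicit bins. It is an illustration of the rule as described above, not Polygraph's parser; the function name `expand_bins` and the tuple representation are assumptions.

```python
def expand_bins(specs):
    """Expand bin specifications into explicit (low, high) pairs.

    "a:b"  is the range [a, b)
    ":b"   borrows its minimum from the maximum of the preceding line
    "n"    is the single value n; for borrowing purposes only, it
           leaves behind a maximum of n + 1
    """
    bins = []
    prev_max = None
    for spec in specs:
        if ":" in spec:
            low_text, high_text = spec.split(":", 1)
            if low_text:
                low = float(low_text)
            elif prev_max is None:
                raise ValueError("nothing to borrow a minimum from: " + spec)
            else:
                low = prev_max
            high = float(high_text)
            bins.append((low, high))
            prev_max = high
        else:
            n = float(spec)
            bins.append((n, n))   # a single value, not a range
            prev_max = n + 1      # used only by the next ":b" line
    return bins


# First lines of the reply_sizes example above.
print(expand_bins(["0", ":1025", ":2049"]))
```

For the reply_sizes lines shown, the single value 0 makes the following `:1025` entry start at 1, which is why zero-sized replies are not counted twice.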
A user-defined distribution can appear on the command line anywhere a distribution is expected. For example:
$ ./polysrv --pconn_use_lmt const:20000 ...
$ ./polyclt --pconn_use_lmt tab:/tmp/pconnDream.pgd ...
The special distribution name ``tab'' instructs Polygraph to load the distribution for the --pconn_use_lmt option from the specified file (/tmp/pconnDream.pgd).
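For readers who want to check a distribution file before handing it to Polygraph, the line-based format can be read with a few lines of Python. The sketch below is not Polygraph's loader and makes several assumptions: the function name `parse_distr` is hypothetical, the header is matched with flexible whitespace, and each entry is taken to be exactly two whitespace-separated columns.

```python
import re

# Header such as: NumDistr PConnDream = {
HEADER = re.compile(r"^(NumDistr|TimeDistr|SizeDistr)\s+(\w+)\s*=\s*\{$")


def parse_distr(text):
    """Return (type, name, entries) for one distribution definition.

    Entries are kept as (bin_spec, frequency) string pairs so that
    ranges such as ":3.069" and wild-cards such as "*/100" survive.
    """
    kind = name = None
    entries = []
    closed = False
    for raw in text.splitlines():
        # Comments start with '#'; syntax elements never span lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if kind is None:
            match = HEADER.match(line)
            if not match:
                raise ValueError("expected a header, got: %r" % line)
            kind, name = match.groups()
        elif line == "}":
            closed = True
            break
        else:
            bin_spec, freq = line.split()  # exactly two columns assumed
            entries.append((bin_spec, freq))
    if kind is None or not closed:
        raise ValueError("missing header or closing bracket")
    return kind, name, entries


sample = """\
# comments are allowed
NumDistr PConnDream = { # mandatory header
1 51.0
2 28.7
8 */100 # value 8 absorbs the rest
} # closing bracket is required!
"""
print(parse_distr(sample))
```

To check a real file, its contents could be passed in with `parse_distr(open(path).read())`, where `path` is the same file name given after ``tab:'' on the command line.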
$Id: tabdistr.sml,v 1.2 1999/04/27 19:03:27 rousskov Exp $