Polygraph: Traffic Model
This page explains the traffic model supported by Web Polygraph. It describes the model's components and gives a "putting-it-all-together" example.
1. Overview
The traffic model is responsible for producing a stream of HTTP requests and
replies with certain properties.
1.1 Components
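The model has several components:
- Object Popularity (OP)
- Temporal Locality (TRL)
- Hit Ratio (HR)
- Cachability Ratio (CR)
- Request Timing
- Reply Timing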
Some components depend on each other while others are fully independent. We say that B depends on A if changing A's properties affects B's characteristics as perceived by an ideal proxy. For example, changing the request submission rate does not affect temporal locality, while temporal locality may affect the popularity of objects.
Here is an outline of the algorithm that is used to simulate the request stream. We concentrate on simulating Object Ids and omit details about actual URL generation. For simplicity, we also assume that all components are enabled.
The abbreviations on the left name the components responsible for the corresponding steps.
HR   decide if we should generate a hit or a miss
     if (hit) {
         repeat(cachable)
     } else {
CR       decide if we should generate a cachable or uncachable miss
         if (cachable) {
             generate new cachable Object Id
         } else {
HR           decide if we should repeat an uncachable Id or not
             if (repeat)
                 repeat(uncachable)
             else
                 generate new uncachable Object Id
         }
     }
TRL  schedule the next repeat for the generated Object Id
     return generated Object Id
The repeat procedure mentioned above looks like this:
TRL  search the first N Ids in the schedule for an un/cachable Id
     if (found)
         return found Id
     else
OP       return any old un/cachable Id
The procedure for scheduling the next access to a given Object Id is simple:
generate an appropriate inter-request distance
scan N schedule slots around the generated distance and look for an empty slot
if (found)
    schedule next access for Id in that slot
else
    do not schedule next access for this Id
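The three procedures above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not Polygraph's implementation: the class and parameter names (`TrafficModel`, `next_object_id`, `search_window`, `uncachable_repeat_ratio`, `distance_mu`, `distance_sigma`) are made up for illustration, the fallback in the repeat step uses uniform popularity, and inter-request distances are drawn from a lognormal distribution. The schedule is a fixed-size deque in which slot 0 is the request being generated now. Note that `hit_ratio` here is only the probability of taking the hit branch; the hit ratio an ideal proxy would measure can differ.

```python
import random
from collections import deque


class TrafficModel:
    """Sketch of the Object Id generation steps above; all names are hypothetical."""

    def __init__(self, hit_ratio, cachable_ratio, uncachable_repeat_ratio,
                 depth=1000, search_window=5, delta=2,
                 distance_mu=3.0, distance_sigma=0.5, seed=None):
        self.rng = random.Random(seed)
        self.hit_ratio = hit_ratio                  # HR: chance of the hit branch
        self.cachable_ratio = cachable_ratio        # CR: cachable vs uncachable miss
        self.uncachable_repeat_ratio = uncachable_repeat_ratio  # HR: assumed knob
        self.search_window = search_window          # N in the repeat procedure
        self.delta = delta                          # candidate slots on each side
        self.distance_mu = distance_mu              # lognormal parameters of the
        self.distance_sigma = distance_sigma        # log of the distance
        # schedule[0] is the current request, schedule[i] is i requests ahead
        self.schedule = deque([None] * depth)
        self.old_ids = {True: [], False: []}        # old Ids grouped by cachability
        self.is_cachable = {}                       # Id -> cachability (never changes)
        self.last_id = -1

    def _new_id(self, cachable):
        self.last_id += 1
        self.is_cachable[self.last_id] = cachable
        self.old_ids[cachable].append(self.last_id)
        return self.last_id

    def _repeat(self, cachable):
        # TRL: look at the first N scheduled Ids for one with the wanted cachability
        for i in range(min(self.search_window, len(self.schedule))):
            oid = self.schedule[i]
            if oid is not None and self.is_cachable[oid] == cachable:
                self.schedule[i] = None
                return oid
        # OP: otherwise pick any old Id of that kind (uniform popularity here)
        pool = self.old_ids[cachable]
        return self.rng.choice(pool) if pool else None

    def _schedule_next_access(self, oid):
        # Draw an inter-request distance, measured in requests
        distance = max(1, round(self.rng.lognormvariate(self.distance_mu,
                                                        self.distance_sigma)))
        # Scan candidate slots around that distance, nearest first
        for offset in sorted(range(-self.delta, self.delta + 1), key=abs):
            slot = distance + offset
            if 1 <= slot < len(self.schedule) and self.schedule[slot] is None:
                self.schedule[slot] = oid
                return
        # No empty candidate slot: do not schedule the next access for this Id

    def next_object_id(self):
        if self.rng.random() < self.hit_ratio:                    # HR: hit
            oid = self._repeat(cachable=True)
            if oid is None:                                       # nothing old yet
                oid = self._new_id(cachable=True)
        elif self.rng.random() < self.cachable_ratio:             # CR: cachable miss
            oid = self._new_id(cachable=True)
        elif self.rng.random() < self.uncachable_repeat_ratio:    # HR: repeat uncachable
            oid = self._repeat(cachable=False)
            if oid is None:
                oid = self._new_id(cachable=False)
        else:
            oid = self._new_id(cachable=False)
        self._schedule_next_access(oid)                           # TRL
        self.schedule.popleft()                                   # advance one request
        self.schedule.append(None)
        return oid
```

For example, `TrafficModel(hit_ratio=0.55, cachable_ratio=0.8, uncachable_repeat_ratio=0.1, seed=1)` followed by repeated calls to `next_object_id()` yields a stream of integer Object Ids.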
2. Object Popularity
The Object Popularity Model determines the popularity of an Object Id, that is, how frequently the Id appears in the request stream.
See the algorithm above for the steps at which the Object Popularity Model is used. Since other steps also select Ids, the actual popularity of objects is affected by those steps as well and, in general, will not look exactly like the one implied by the model. Note also that with a positive hit ratio, the number of "known" or "old" Object Ids grows with time, which also affects the distribution.
Currently supported popularity models are:
- Zipf: When an old Id is requested, the Id is selected among previously generated Object Ids using classic Zipf's law (i.e., P[id] ~ 1/id).
- Uniform: When an old Id is requested, the Id is selected among previously generated Object Ids using a uniform distribution. Hence, all old Ids have an equal chance of being selected in any single selection.
Popularity models are selected using the --pop_model option.
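Selecting an old Id under each popularity model might look like the sketch below. `pick_old_id` is a hypothetical helper, and ranking old Ids by generation order (so the first Id generated is the most popular under Zipf) is an assumption made for illustration; the text above does not say how Polygraph ranks Ids.

```python
import random


def pick_old_id(old_ids, model, rng):
    """Pick one previously generated Object Id (hypothetical helper).

    old_ids: Ids in generation order (assumed rank order for Zipf).
    model:   "zipf" or "uniform".
    """
    if model == "uniform":
        # Every old Id has the same chance in a single selection
        return rng.choice(old_ids)
    if model == "zipf":
        # Classic Zipf's law: the Id with 1-based rank k has weight 1/k
        weights = [1.0 / k for k in range(1, len(old_ids) + 1)]
        return rng.choices(old_ids, weights=weights, k=1)[0]
    raise ValueError(f"unknown popularity model: {model}")
```

With 10 old Ids, Zipf selects the top-ranked Id about ten times as often as the last one, while the uniform model selects each about 10% of the time.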
3. Temporal Locality
Temporal Locality Model determines the distribution of distances between requests for the same Object Id. The distances are measured in number of requests rather than time.
To give some freedom to other models, the choice of the next slot for an Object Id is fuzzy. The model selects several "candidate" slots around the best location determined by the locality distribution. The number of candidates is controlled by the --tmp_loc_delta option.
The distribution for the Temporal Locality Model is selected using the --tmp_loc option. According to
some research studies, a lognormal distribution seems to model temporal locality
well.
The model keeps a schedule or plan of Object Ids to be requested. The
schedule capacity is controlled by the --tmp_loc_depth option.
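If a `logn:mean,sdev` distribution (as in `--tmp_loc logn:1000,500` in the example at the end of this page) gives the mean and standard deviation of the distance itself, which is an assumption of this sketch, then Python's `random.lognormvariate` needs them converted to the parameters of the underlying normal distribution: sigma^2 = ln(1 + sdev^2/mean^2) and mu = ln(mean) - sigma^2/2. Both helper names below are hypothetical.

```python
import math
import random


def lognormal_params(mean, sdev):
    """Convert a lognormal's mean/sdev into mu/sigma of the underlying normal."""
    sigma2 = math.log(1.0 + (sdev / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def sample_distances(mean, sdev, count, rng):
    """Draw inter-request distances (in requests), rounded and at least 1."""
    mu, sigma = lognormal_params(mean, sdev)
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(count)]
```

The conversion is exact: exp(mu + sigma^2/2) recovers the requested mean, so samples from `sample_distances(1000, 500, n, rng)` average close to 1000 requests for large n.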
4. Hit Ratio
Hit Ratio Model decides whether the next request should be for some old Object Id or for a brand new Id.
Note that in some cases, emitting two requests for the same object does not result in a hit (e.g., if an object is uncachable).
Document (object) hit ratio is controlled by the --dhr option.
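To illustrate why repeating a request is not the same as producing a hit, the hypothetical helper below computes the document hit ratio an ideal proxy (an infinite, never-expiring cache) would observe for a stream of `(object_id, cachable)` pairs. Repeated requests for uncachable objects never count as hits.

```python
def ideal_hit_ratio(requests):
    """Document hit ratio seen by an ideal cache (hypothetical helper).

    requests: sequence of (object_id, cachable) pairs.
    """
    stored = set()
    hits = 0
    for oid, cachable in requests:
        if cachable and oid in stored:
            hits += 1            # repeat of a cachable object: served from cache
        if cachable:
            stored.add(oid)      # only cachable replies are kept
    return hits / len(requests) if requests else 0.0
```

For the stream `[(1, True), (1, True), (2, False), (2, False)]`, both objects are repeated, but only the second request for object 1 is a hit, so the ratio is 0.25.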
5. Cachability Ratio
Some HTTP responses are cachable and some are not. The cachability ratio determines the fraction of cachable replies (not of unique objects!) as seen by the client.
An object preserves its cachability status during a single Polyclt execution.
Cachability ratio is controlled by the --rep_cachable option on
the client side.
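The difference between cachable replies and cachable unique objects is easy to miss. The hypothetical helper below computes both fractions for a non-empty stream of `(object_id, cachable)` pairs and rejects a stream in which an object changes its cachability status.

```python
def cachability_ratios(requests):
    """Return (fraction of cachable replies, fraction of cachable objects).

    requests: non-empty sequence of (object_id, cachable) pairs.
    """
    status = {}
    cachable_replies = 0
    for oid, cachable in requests:
        # An object keeps its cachability status for the whole run
        if status.setdefault(oid, cachable) != cachable:
            raise ValueError(f"object {oid} changed its cachability status")
        if cachable:
            cachable_replies += 1
    reply_ratio = cachable_replies / len(requests)
    object_ratio = sum(status.values()) / len(status)
    return reply_ratio, object_ratio
```

If one cachable object is requested three times and one uncachable object once, 75% of replies are cachable but only half of the unique objects are.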
6. Request Timing
Two timing options are supported: the default "Best Effort" request rate and an optional "Constant" request rate. In Best Effort mode, a robot submits the next request right after it receives the reply to the previous request. The Constant request rate emulates a Poisson stream of requests with a given mean throughput; the submission of the next request does not depend on the reply status.
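A Poisson request stream has exponentially distributed gaps between submissions, so a Constant request schedule can be sketched with `random.expovariate`. The function name is hypothetical; note that the submission times are computed up front and never consult replies.

```python
import random


def constant_rate_send_times(rate_per_sec, duration_sec, rng):
    """Submission times (seconds) of a Poisson stream with the given mean rate."""
    times = []
    t = rng.expovariate(rate_per_sec)       # exponential gap before the first request
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)  # independent of any reply status
    return times
```

At 20 requests per second over 1000 seconds, the sketch produces roughly 20,000 submission times.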
It is important to understand that a Best Effort workload introduces a tight dependency between throughput and response time. In general, those two metrics should be independent. With a Constant workload, throughput is virtually independent of response time, provided you have enough resources to support all pending transactions. For an example of the importance of this distinction, visit one of our Watchdog pages.
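A back-of-the-envelope calculation shows the dependency. Assuming each Best Effort robot keeps exactly one request outstanding, it completes one transaction per response time plus think time, so throughput falls as response time grows. A Constant workload (given enough resources) keeps throughput at the offered rate. Function names and numbers are illustrative only.

```python
def best_effort_throughput(robots, response_time_sec, think_time_sec=0.0):
    # One outstanding request per robot: each robot completes one
    # transaction per (response time + think time).
    return robots / (response_time_sec + think_time_sec)


def constant_throughput(offered_rate):
    # Submissions follow their own schedule; with enough resources,
    # throughput equals the offered rate regardless of response time.
    return offered_rate


for rt in (0.1, 0.5, 2.0):
    print(f"response time {rt}s: "
          f"best effort {best_effort_throughput(10, rt):.0f}/sec, "
          f"constant {constant_throughput(100):.0f}/sec")
```

With 10 robots, quadrupling the response time from 0.5 to 2 seconds cuts Best Effort throughput from 20 to 5 requests per second, while the Constant figure stays put.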
See --robots and --req_rate options for more
info.
Think time for Best Effort requests can be specified using the --xact_think option.
7. Reply Timing
Polyserver replies as fast as it can unless a delay distribution was
specified using the --xact_think option on the
server side.
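A server-side delay such as `norm:3sec,1.5sec` can be sketched as a draw from a normal distribution. How Polygraph handles negative draws is not described here; this hypothetical `reply_delay` helper simply clamps them to zero, which raises the mean slightly above 3 seconds.

```python
import random


def reply_delay(mean_sec, sdev_sec, rng):
    """Draw a reply delay from a normal distribution (hypothetical helper).

    Negative draws are clamped to zero, an assumption of this sketch.
    """
    return max(0.0, rng.gauss(mean_sec, sdev_sec))
```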
8. Example
The configuration below will produce a Constant Rate request stream of 20 requests per second with the Zipf popularity model and a lognormal inter-request distance with a mean of 1000 requests (with 3 object ids to choose from). The hit ratio will be kept at the 55% level and 80% of replies will be cachable. There will be a 3-second mean reply delay on the server side.
./polyclt ... --robots 1 --req_rate 20/sec --pop_model zipf --tmp_loc logn:1000,500 --tmp_loc_delta 2 \
    --dhr 55p --rep_cachable 80p

./polysrv ... --xact_think norm:3sec,1.5sec
Options irrelevant to the discussion were omitted.
$Id: traffic.sml,v 1.7 1999/06/17 20:58:47 rousskov Exp $