Polygraph: Traffic Model


This page explains the traffic model supported by Web Polygraph. Several components are described and a ``putting-it-all-together'' example is given.

1. Overview

The traffic model is responsible for producing a stream of HTTP requests and replies with certain properties.

1.1 Components

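The model has several components:

	Object Popularity (OP)
	Temporal Locality (TRL)
	Hit Ratio (HR)
	Cachability Ratio (CR)
	Request Timing
	Reply Timing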

Some components depend on each other while others are fully independent. ``A depends on B'' is defined as:

Changing A's properties affects B's characteristics as perceived by an ideal proxy.
For example, changing request submission rate does not affect temporal locality while temporal locality may affect popularity of objects.

1.2 The algorithm

Here is an outline of the algorithm that is used to simulate the request stream. We concentrate on simulating Object Ids and omit details about actual URL generation. For simplicity, we also assume that all components are enabled.

The abbreviations in the left margin name the component responsible for the corresponding step.

HR	decide if we should generate a hit or a miss
	if (hit) {
		repeat(cachable)
	} else {
CR		decide if we should generate a cachable or uncachable miss
		if (cachable) {
			generate new cachable Object Id
		} else {
HR			decide if we should repeat an uncachable Id or not
			if (repeat)
				repeat(uncachable)
			else
				generate new uncachable Object Id
		}
	}
TRL	schedule the next repeat for the generated Object Id
	return generated Object Id

The repeat procedure mentioned above looks like this:

TRL	search the first N Ids in the schedule for an un/cachable Id
	if (found)
		return found Id
	else
OP		return any old un/cachable Id

The procedure for scheduling the next access to a given Object Id is simple:

	generate an appropriate inter-request distance
	scan N schedule slots around the generated distance and
		look for an empty slot
	if (found)
		schedule next access for Id in that slot
	else
		do not schedule next access for this Id
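
The three procedures above can be put together in a short sketch. This is an illustration only, not Polygraph's code: the class, its parameter names, and the lognormal parameters are hypothetical. The sketch also assumes that the HR and CR settings are used as plain per-step probabilities, that the fallback ``any old Id'' is chosen uniformly, and that a scheduled slot which is not used by the time it is reached simply expires.

```python
import random


class RequestStream:
    """Toy Object Id generator following the outline above (names are hypothetical)."""

    def __init__(self, hit_ratio, cachable_ratio, window=3, delta=2,
                 depth=1000, seed=0):
        self.rng = random.Random(seed)
        self.hit_ratio = hit_ratio            # HR: chance of repeating an old Id
        self.cachable_ratio = cachable_ratio  # CR: chance that a new Id is cachable
        self.window = window                  # N: slots searched by the repeat step
        self.delta = delta                    # slots scanned on each side of the target
        self.depth = depth                    # largest distance that can be scheduled
        self.clock = 0                        # index of the request being generated
        self.schedule = {}                    # slot index -> (object id, cachable)
        self.old = {True: [], False: []}      # known Ids, split by cachability
        self.last_id = 0

    def _new(self, cachable):
        self.last_id += 1
        self.old[cachable].append(self.last_id)
        return self.last_id

    def _repeat(self, cachable):
        # TRL: look at the first N slots for an Id of the wanted kind.
        for slot in range(self.clock, self.clock + self.window):
            entry = self.schedule.get(slot)
            if entry is not None and entry[1] == cachable:
                del self.schedule[slot]
                return entry[0]
        # OP: fall back to any old Id (uniform popularity in this sketch).
        if self.old[cachable]:
            return self.rng.choice(self.old[cachable])
        return self._new(cachable)            # nothing to repeat yet

    def _schedule(self, oid, cachable):
        # Assumed lognormal inter-request distance, clamped to the schedule depth.
        dist = min(self.depth, max(1, round(self.rng.lognormvariate(3.0, 1.0))))
        target = self.clock + dist
        # Try the target slot first, then its neighbours, nearest first.
        for off in sorted(range(-self.delta, self.delta + 1), key=abs):
            slot = target + off
            if slot > self.clock and slot not in self.schedule:
                self.schedule[slot] = (oid, cachable)
                return True
        return False                          # no free slot: the Id is not scheduled

    def next_id(self):
        if self.rng.random() < self.hit_ratio:
            cachable, oid = True, self._repeat(True)
        elif self.rng.random() < self.cachable_ratio:
            cachable, oid = True, self._new(True)
        elif self.rng.random() < self.hit_ratio:
            cachable, oid = False, self._repeat(False)
        else:
            cachable, oid = False, self._new(False)
        self._schedule(oid, cachable)
        self.schedule.pop(self.clock, None)   # an unused current slot expires
        self.clock += 1
        return oid, cachable
```

Calling next_id() repeatedly on RequestStream(hit_ratio=0.55, cachable_ratio=0.8) yields (Object Id, cachable) pairs in which every Id keeps its cachability status for the lifetime of the stream.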

2. Object Popularity

The Object Popularity Model determines the popularity of an Object Id, that is, how frequently the Id appears in the request stream.

See the algorithm above for the steps at which the Object Popularity Model kicks in. Since the model is not used in some steps, the actual popularity of objects is affected by other steps and, in general, will not look exactly like the one implied by the model. Note that with a positive hit ratio, the number of ``known'' or ``old'' Object Ids grows with time. This growth also affects the distribution.

Currently supported popularity models are:

Zipf

When an old Id is requested, the id is selected among previously generated Object Ids using classic Zipf's law (i.e., P[id] ~ 1/id).
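
The selection rule P[id] ~ 1/id can be illustrated with a minimal sketch. The function name is hypothetical, and the sketch assumes that old Ids are numbered 1..n_old and that the Id number itself serves as the Zipf rank; how Polygraph orders the Ids internally is not stated on this page.

```python
import random


def zipf_pick(n_old, rng=random):
    """Pick an Id in 1..n_old with probability proportional to 1/id."""
    ids = range(1, n_old + 1)
    weights = [1.0 / i for i in ids]
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(ids, weights=weights, k=1)[0]
```

With 100 old Ids, Id 1 is picked about twice as often as Id 2 and about ten times as often as Id 10.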

Uniform

When an old Id is requested, the id is selected among previously generated Object Ids using a uniform distribution. Hence, all old Ids have equal chances of being selected during one selection.

Popularity models are selected using the --pop_model option.

3. Temporal Locality

The Temporal Locality Model determines the distribution of distances between requests for the same Object Id. The distances are measured in requests rather than in time.

To give some freedom to other models, the choice of the next slot for an Object Id is fuzzy. The model selects several ``candidate'' slots around the best location determined by the locality distribution. The number of candidates is controlled by the --tmp_loc_delta option.

The distribution for the Temporal Locality Model is selected using the --tmp_loc option. According to some research studies, a lognormal distribution seems to model temporal locality well.
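
As a rough illustration of a lognormal distance with fuzzy slot selection, consider the sketch below. The function names are hypothetical. The sketch assumes that the distribution is described by the mean and standard deviation of the distance itself and that the delta counts slots on each side of the best location; this page does not say how --tmp_loc and --tmp_loc_delta interpret their arguments.

```python
import math
import random


def lognormal_params(mean, stddev):
    """Return (mu, sigma) of the underlying normal for the given lognormal mean/stddev."""
    sigma2 = math.log(1.0 + (stddev / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def candidate_slots(now, mean, stddev, delta, rng=random):
    """Draw one inter-request distance and list the candidate slots around it."""
    mu, sigma = lognormal_params(mean, stddev)
    dist = max(1, round(rng.lognormvariate(mu, sigma)))
    best = now + dist
    # Only slots in the future can host the next request for the Id.
    return [s for s in range(best - delta, best + delta + 1) if s > now]
```

The conversion in lognormal_params follows from the standard identities mean = exp(mu + sigma^2/2) and variance = (exp(sigma^2) - 1) * mean^2.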

The model keeps a schedule or plan of Object Ids to be requested. The schedule capacity is controlled by the --tmp_loc_depth option.

4. Hit Ratio

The Hit Ratio Model decides whether the next request should be for some old Object Id or for a brand new Id.

Note that in some cases, emitting two requests for the same object does not result in a hit (e.g., if an object is uncachable).

Document (object) hit ratio is controlled by the --dhr option.

5. Cachability Ratio

Some HTTP responses are cachable and some are not. The cachability ratio determines the portion of cachable replies (not unique objects!) as seen by the client.

An object preserves its cachability status during a single Polyclt execution.

Cachability ratio is controlled by the --rep_cachable option on the client side.
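
The difference between counting replies and counting unique objects can be seen in a small sketch. The function name and the (Object Id, cachable) pair representation are hypothetical; they are used here only to measure both ratios on a recorded stream.

```python
def cachability_ratios(stream):
    """Return (reply-level ratio, object-level ratio) for (object id, cachable) pairs."""
    cachable_replies = sum(1 for _, cachable in stream if cachable)
    # Each Id keeps its status, so a dict collapses repeats into unique objects.
    objects = dict(stream)
    cachable_objects = sum(1 for cachable in objects.values() if cachable)
    return cachable_replies / len(stream), cachable_objects / len(objects)


# One popular cachable object requested three times, one uncachable object once.
print(cachability_ratios([(1, True), (1, True), (1, True), (2, False)]))  # (0.75, 0.5)
```

Here 75% of the replies are cachable although only half of the unique objects are.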

6. Request Timing

Two timing options are supported: the default ``Best Effort'' request rate and an optional ``Constant'' request rate. In ``Best Effort'' mode, a robot submits the next request right after the reply to the previous request has been received. The ``Constant'' request rate emulates a Poisson stream of requests with a given mean throughput; the submission of the next request does not depend on the reply status.

It is important to understand that the Best Effort workload introduces a tight dependency between throughput and response time. In general, those two metrics should be independent. With the Constant workload, throughput is virtually independent of response time, provided you have enough resources to support all pending transactions. For an example of the importance of this distinction, visit our Watchdog page.
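
The two modes can be contrasted with a minimal sketch. The function names are hypothetical, and the code only models submission times; it is not how Polygraph robots are implemented.

```python
import random


def poisson_arrivals(rate_per_sec, count, rng=random):
    """Submission times (in seconds) of a Poisson stream: exponential gaps, mean 1/rate."""
    now, times = 0.0, []
    for _ in range(count):
        now += rng.expovariate(rate_per_sec)  # independent of any reply
        times.append(now)
    return times


def best_effort_rate(response_time, think_time=0.0):
    """Requests per second from one robot that waits for each reply before the next request."""
    return 1.0 / (response_time + think_time)
```

In the Best Effort case, doubling the response time from 0.5 to 1 second halves a robot's throughput from 2 to 1 request per second, while the Poisson submission times do not involve the response time at all.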

See --robots and --req_rate options for more info.

Think time for Best Effort requests can be specified using the --xact_think option.

7. Reply Timing

Polyserver replies as fast as it can unless a delay distribution was specified using the --xact_think option on the server side.

8. Example

The configuration below will produce a Constant Rate request stream with the Zipf Popularity Model and a mean inter-request distance of 20 requests (with 3 object ids to choose from). The hit ratio will be kept at the 55% level and 80% of replies will be cachable. There will be a mean reply delay of 3 seconds on the server side.

./polyclt ... --robots 1 --req_rate 20/sec \
	--pop_model zipf --tmp_loc logn:1000,500 --tmp_loc_delta 2 \
	--dhr 55p --rep_cachable 80p
./polysrv ... --xact_think norm:3sec,1.5sec

Options irrelevant to the discussion were omitted.


$Id: traffic.sml,v 1.7 1999/06/17 20:58:47 rousskov Exp $