Frequently Answered Questions

Web Polygraph

We will categorize this list as it gets longer... Please see the User Manual for system requirements, installation procedure, and sample runs. Additions and corrections are more than welcome.

$Date: 2001/05/29 19:25:17 $

General Questions

Questions specific to Poly 1.x

Questions specific to Poly 0.0


General Questions

What is Web Polygraph?

Polygraph is a set of programs that simulate Web clients and servers. Polygraph can be configured to send HTTP requests through a proxy. High-performance simulation lets you stress-test various proxy components. The benchmarking results can be used for tuning proxy performance, evaluating caching solutions, and many other interesting activities.

What tests should I run?

You can find standard proxy benchmarking workloads on the http://polygraph.ircache.net/Workloads/ page. The somewhat outdated http://polygraph.ircache.net/strategies.html page gives ideas about micro-level benchmarks you may want to run. Finally, read the IRCache cache-off reports for methodology insights and caveats.

Describe Poly-traffic model

See our Traffic Model page for details. The following [relatively old] conversation may help as well.

> I was wondering if you could give me a general description of what
> clients do in your simulation system. Presumably, they generate web
> requests - do they follow a Zipf distribution in the types of
> documents they are requesting?

URLs are generated according to the following structure:

        http://origin_host/_obj_id/world_id/world_type

The URL generator depends on command line options.

World Id is a string constant configured using the --world_id option. World Type is a string constant configured using the --world_type option. [In Poly 0.0, World Id is an integer constant configured using the --world option, and URL Signature is the equivalent of World Type.]

Object Id is a Zipf(1) distributed integer. [In Poly 0.0, the distribution is set to Zipf(1) using the --rnd option (default) or to ``sequential'' using the --seq option.]

For Zipf, the ``world capacity'' (the total number of objects in the ``world'') is set using the --world_cap option and defaults to (2^31-1) objects, which is also the maximum supported value. [--world and (2^30) in Poly 0.0]

Finally, you can also use the --unique_urls option to get Zipf-distributed object ids but unique URLs:

        http://origin_host/_obj_id/world_id/world_type/unique_id

Unique ids are useful for simulating miss-only workloads. Note that in this mode, ``uniqueness'' is achieved by simply appending a counter to a URL. Hence, uniqueness is not guaranteed across runs (on purpose!); use World Id for that.

Note that object id is used on the server side to determine object size. Thus, (--unique --rnd) is preferred to (--seq).
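
The URL scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not Polygraph's code: the function names are hypothetical, the leading underscore of _obj_id is assumed to be literal, and the Zipf(1) draw uses a common inverse-CDF approximation, floor((N+1)^u) for uniform u, which may differ from what Poly does internally.

```python
import random

WORLD_CAP = 2**31 - 1  # default (and maximum) world capacity quoted above


def zipf1_obj_id(rng, world_cap=WORLD_CAP):
    """Draw an object id in [1, world_cap] with P(k) roughly proportional to 1/k.

    Assumption: inverse-CDF approximation, floor((world_cap + 1) ** u).
    """
    u = rng.random()  # uniform in [0, 1)
    return min(world_cap, int((world_cap + 1) ** u))


def make_url(origin_host, obj_id, world_id, world_type, unique_id=None):
    """Build a URL following the structure shown above (hypothetical helper)."""
    url = f"http://{origin_host}/_{obj_id}/{world_id}/{world_type}"
    if unique_id is not None:
        # --unique_urls mode: a counter appended to the URL makes it unique
        url += f"/{unique_id}"
    return url


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed, so the run is repeatable
    for counter in range(3):
        oid = zipf1_obj_id(rng)
        print(make_url("origin.example.com", oid, "w1", "t1", unique_id=counter))
```

With this approximation about half of all draws fall below the square root of the world capacity, which is the heavy skew toward low (popular) object ids that a Zipf-like workload is meant to produce.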

What happens when I simulate two or more robots?

Do all robots generate the same exact streams? No. That would make most workloads pretty useless.

"Robots" share the random number generators. Thus, the "total" stream of N robots is close to the stream of a single robot with an N times higher submission rate. This means that the most popular document for robot 1 is also the most popular document for robot 2 and, more generally, the i-th most popular document for robot 1 is also the i-th most popular document for robot 2.

Also, note that experiments are repeatable if the same parameters are used. Because of network delays, there is no 100% guarantee that the order of requests will be exactly the same, but it should be very close.

For most caches, the actual requester of a popular object is unlikely to affect performance. Moreover, when the simulated robots come from the same IP, they probably look like one robot to a proxy.

With Poly 1.x, you can support different "kinds" of robots by starting several client processes (with different parameters like world_id), preferably on several machines. Poly 2.x allows you to simulate different robots within one polyclt process.
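
A toy model of the shared-generator behaviour described above, assuming (hypothetically) that robots take turns drawing object ids from one seeded generator. The function name, the round-robin scheduling, and the small world capacity are illustrative only; in a real run the interleaving depends on timing and network delays.

```python
import random


def shared_stream(num_robots, num_requests, seed=1, world_cap=1000):
    """Return a list of (robot, obj_id) pairs drawn from one shared generator."""
    rng = random.Random(seed)  # a single generator shared by all robots
    stream = []
    for i in range(num_requests):
        robot = i % num_robots  # assumed round-robin turn taking
        # Zipf-like draw; the same approximation for every robot
        obj_id = min(world_cap, int((world_cap + 1) ** rng.random()))
        stream.append((robot, obj_id))
    return stream


if __name__ == "__main__":
    one = [oid for _, oid in shared_stream(1, 10)]
    four = [oid for _, oid in shared_stream(4, 10)]
    print(one == four)  # the combined stream does not depend on the robot count
```

Because every robot draws from the same sequence, the merged stream of four robots is the stream one robot would produce at four times the rate, and rerunning with the same seed reproduces it.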

Describe object size distribution

Reply sizes are distributed exponentially with a configurable mean (--rep_size option) that defaults to 13 KB. There is an upper limit that increases with object id (so the most popular object will not end up being 1MB in size, by chance).

Because of object popularity and size limits, the actual mean is usually around 11 KB with the default settings. Polygraph reports actual size statistics and a histogram of reply sizes after an experiment completes.
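
The idea of an exponentially distributed, capped reply size that depends only on the object id can be sketched as follows. This is not Polygraph's implementation: the cap formula is invented for illustration, seeding a generator with the object id is just one simple way to make the size a function of the id, and 13 KB is taken to mean 13 * 1024 bytes.

```python
import random

MEAN_REPLY_SIZE = 13 * 1024  # default mean (--rep_size), in bytes (assumed KB = 1024)


def size_cap(obj_id):
    """Hypothetical upper limit that grows with the object id (invented formula)."""
    return 64 * 1024 + 1024 * obj_id


def reply_size(obj_id, mean=MEAN_REPLY_SIZE):
    """Return a reply size in bytes that depends only on the object id."""
    rng = random.Random(obj_id)              # same id -> same size on every call
    size = int(rng.expovariate(1.0 / mean))  # exponential with the given mean
    return min(size, size_cap(obj_id))       # popular (low) ids get a tighter cap


if __name__ == "__main__":
    for oid in (1, 2, 1000):
        print(oid, reply_size(oid))
```

Clamping can only lower a drawn size, so the observed mean of such a scheme sits below the configured one, which is the direction of the effect described above.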

What special content markers are used to optimize end-of-transmission checks?

> The system call trace of the polygraph client shows that there is
> no system call between a read and a close on a file descriptor. From
> this, I gather that the logic used for closing the connection on the
> client's side either depends on the content length or a search for
> delimiters. Could you explain the code/logic which the client uses to
> shut down the connection to the web server or proxy?

Polygraph uses no special end-of-content markers or delimiters. The size of a reply is determined using the Content-Length HTTP header. Poly will close a socket when the entire HTTP message has been received. Poly assumes that requests have no body.
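
The Content-Length logic described above can be illustrated with a short Python sketch. It is not taken from the Polygraph sources; the function name is hypothetical and error handling is minimal. The reader stops as soon as the header block plus the declared number of body bytes has arrived, so it never has to wait for the peer to close the connection.

```python
import socket


def read_reply(sock, bufsize=4096):
    """Read one HTTP message from sock; return (header_bytes, body_bytes)."""
    data = b""
    # Phase 1: read until the blank line that ends the header block
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("peer closed before headers were complete")
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")

    # Phase 2: find Content-Length (the first line is the status line, skip it)
    length = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header")

    # Phase 3: read exactly the declared number of body bytes
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("peer closed before the body was complete")
        body += chunk
    return head, body[:length]


if __name__ == "__main__":
    server, client = socket.socketpair()
    server.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
    head, body = read_reply(client)  # returns although the peer is still open
    client.close()                   # the reader can close right away
    server.close()
    print(body)
```

This matches the system call trace in the question: once the last read completes the message, the next call on the descriptor can be a close.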

What platforms are supported?

Polygraph is written with portability in mind. Most compilation problems are due to non-POSIX or non-conventional C interfaces (e.g. signed-ness of system call parameters). The development platform is FreeBSD. We have verified that Poly 1.0p0 compiles and works on simple tests on the following platforms:

The compilation warnings on Linux mentioned above are probably harmless. You can edit include/config.h and xstd/include/xstd.h to remove FD_SETSIZE redefinition. We do not know how to eliminate those warnings without adding a ./configure script to the distribution (which we may do eventually).

If you discover other portability problems, please let us know.

Where can I get pre-compiled binaries?

Sorry, we do not distribute binaries. If you want to provide binaries for others, we will be happy to put a link to your archive.

Note that a binary compiled on one system may not work correctly on similar systems because of differences in system-wide settings, such as the maximum number of file descriptors. Those discrepancies may produce subtle bugs that are very hard to identify.

Finally, installing a gcc/g++ distribution is quite doable in most environments.

Does Poly support HTTP/1.1? Persistent connections?

Polygraph emulates HTTP/1.0 clients and servers. Supported HTTP/1.1 features include the Cache-Control message header field and persistent connections. Polygraph clients do not pipeline requests.

Does Poly test ICP or other non-HTTP protocols?

No. Poly-servers and clients talk exclusively HTTP. There are no plans to add ICP or FTP support in the foreseeable future.

How do I report a bug or suggest a feature?

Please check this FAQ and the User Manual first.

Bug reports should be e-mailed to polygraph-bugs@ircache.net. For compilation problems, please specify the versions of your C++ compiler and operating system. A stack trace is essential for coredump reports.

The best place to discuss new features and possible applications of polygraph is our mailing list, polygraph@ircache.net.

Questions specific to Poly 1.x

What are the essential advantages of 1.x?

Here is an incomplete list.

Where the hell are all the stats?!

Poly 1.x can log statistics and other info in two places. First, high-level stats are dumped on the ``console'' (standard output). The verbosity level can be changed using the --verb_lvl option. The console output can be redirected into a file.

If you need more than a simple request-per-second trace, you should enable the binary log (off by default). A lot of detailed stats will be dumped into the specified file in a compact binary format. All the comments displayed on the console will be stored in this log as well. Moreover, even the console output suppressed by the verbosity level option will be stored. Note that all information is kept in memory and is flushed to a file at the end of a run.

Use the Log Reader (lr) and Log Extractor (lx) programs from the Polygraph distribution to read binary logs.

Console output is convenient for run-time monitoring and debugging. The binary log should be used for report generation and long-term storage.

Console output shows no req/sec trace.

Increase the verbosity level using the --verb_lvl option. The default level is zero.

Log Reader shows console stuff only.

Working on it. The stats are logged, but we need to write more code to extract them.

I get a lot of warnings and errors compiling Polygraph

These are common reasons for a flood of error messages during build:

Note that Polygraph does not use fancy C++ so any decent C++ compiler should suffice. G++ works great for us.

I get a lot of undefined symbols when linking on Solaris

Polygraph is known to compile and link on Solaris with minimal effort. Some users reported that using a recently patched box helps. Most problems occur for two reasons:

  1. The standard Sun linker cannot find or does not link with g++lib or similar: Consider using the GNU compiler and linker, or specify the required standard C++ library manually. The standard C++ library (not to be confused with STL!) is needed for I/O stream support.
  2. Math and/or network related functions are missing: Try adding the following to the LDFLAGS variable in your Makefile:

        LDFLAGS = -lm -lxnet

    Some people had better luck adding the socket and nsl libraries (-lsocket -lnsl) instead of xnet.

Questions specific to Poly 0.0

Polygraph 0.0 dumps core on Solaris

The first release of Polygraph (0.0p0) had a bug that caused coredumps on Solaris shortly after the program started. The problem was fixed in version 0.0p1.

sysctlbyname() is not available on XXX

Polygraph version 0.0p2 uses the sysctlbyname() function, which is not portable. The easiest fix is to replace calls to that function in Gadgets.cc with -1. The bug is fixed in later versions of Polygraph.

Have you implemented your own threads library?!

> In terms of the overall design, I saw that you implemented your own
> threads package (for platform independence, I would assume).

Yes and no. There are threads, but they have nothing to do with CPU scheduling. They were just a convenient base on which to implement the rest of Polygraph. Essentially, a thread is a piece of some "logically concurrent" activity. For example, one "transaction" (e.g., send request and receive reply) is implemented using about six(!) threads.

From the OS point of view there is just one solid process, no fancy CPU scheduling...

We were not very excited about this design. Threads were eliminated in Poly 1.0.


$Id: index.html,v 1.8 2001/05/29 19:25:17 wessels Exp $