| Frequently Answered Questions |
|---|
| Web Polygraph |
We will categorize this list as it gets longer... Please see the User Manual for system requirements, installation procedure, and sample runs. Additions and corrections are more than welcome.
$Date: 2001/05/29 19:25:17 $
Polygraph is a set of programs that simulate Web clients and servers. Polygraph can be configured to send HTTP requests through a proxy. High-performance simulation lets you stress-test various proxy components. The benchmarking results can be used for tuning proxy performance, evaluating caching solutions, and many other interesting activities.
You can find standard proxy benchmarking workloads on the http://polygraph.ircache.net/Workloads/ page. The somewhat outdated http://polygraph.ircache.net/strategies.html page gives ideas about micro-level benchmarks you may want to run. Finally, read the IRCache cache-off reports for methodology insights and caveats.
See our Traffic Model page for details. The following [relatively old] conversation may help as well.
> I was wondering if you could give me a general description of what
> clients do in your simulation system. Presumably, they generate web
> requests - do they follow a Zipf distribution in the types of
> documents they are requesting?
URLs are generated according to the following structure:
http://origin_host/_obj_id/world_id/world_type
The URL generator depends on command line options.
World Id is a string constant configured using
--world_id option. World Type is a string constant
configured using --world_type option.
[World Id is an integer constant configured using
--world option in Poly 0.0, and URL Signature is 0.0
equivalent of World Type]
Object Id is a Zipf(1) distributed integer. [In Poly 0.0, the
distribution is set to Zipf(1) using --rnd option
(default) or to ``sequential'' using --seq option].
For Zipf, the ``world capacity'' (the total number of objects in
the ``world'') is set using --world_cap option and
defaults to (2^31-1) objects which is also the maximum supported
value. [--world and (2^30) in Poly 0.0]
Finally, you can also use --unique_urls option
to get Zipf distributed object ids but unique URLs:
http://origin_host/_obj_id/world_id/world_type/unique_id
Unique ids are useful to simulate miss-only workloads. Note that in
this mode, ``uniqueness'' is achieved by simply appending a counter to
a URL. Hence, uniqueness is not guaranteed across runs (on purpose!);
use World Id for that.
Note that object id is used on the server side to determine object
size. Thus, (--unique --rnd) is preferred to
(--seq).
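The URL layout above can be sketched in a few lines of Python. This is only an illustration, not Polygraph's code: the function names, the host name, the counter starting at 1, and the continuous 1/x approximation of Zipf(1) are all assumptions, as is treating the underscore before the object id as a literal character.

```python
import itertools
import random

WORLD_CAP = 2**31 - 1  # default and maximum --world_cap, per the text above


def zipf_object_id(rng, world_cap=WORLD_CAP):
    """Draw an approximately Zipf(1)-distributed integer in [1, world_cap].

    Assumption: continuous 1/x approximation; not Polygraph's actual generator.
    """
    return int(world_cap ** rng.random())


def make_url(host, obj_id, world_id, world_type, unique_id=None):
    """Build a URL in the layout described above."""
    # Assumption: the underscore before the object id is literal.
    url = f"http://{host}/_{obj_id}/{world_id}/{world_type}"
    if unique_id is not None:
        url += f"/{unique_id}"  # --unique_urls mode: append a plain counter
    return url


def url_stream(rng, host, world_id, world_type, unique=False):
    """Yield an endless stream of request URLs."""
    for counter in itertools.count(1):
        obj_id = zipf_object_id(rng)
        yield make_url(host, obj_id, world_id, world_type,
                       counter if unique else None)


rng = random.Random(0)
stream = url_stream(rng, "origin.example.com", "w1", "t1", unique=True)
for url in itertools.islice(stream, 3):
    print(url)
```

Because the unique id in this sketch is just a counter that restarts with every run, two runs with the same World Id would produce the same URLs, which is why the text recommends changing World Id to get uniqueness across runs.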
Do all robots generate the same exact streams? No. That would make most workloads pretty useless.
"Robots" share the random number generators.
Thus, the "total" stream of N robots is close to the
stream from a single robot with an N times higher submission rate.
This means that the most popular document for robot 1 is also the most
popular document for robot 2 and, continuing on, the i-th most popular document
for robot 1 is also the i-th most popular document for robot 2.
Also, note that experiments are repeatable if the same parameters are used. Because of network delays, there is no 100% guarantee that the order of requests will be exactly the same, but it should be very close.
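The effect of a shared generator can be illustrated with a small sketch. It assumes strict round-robin interleaving of robots (real network timing would perturb the order) and uses a stand-in Zipf-like draw; `zipf_id` and `merged_stream` are hypothetical names, not part of Polygraph.

```python
import random


def zipf_id(rng, world_cap=1000):
    # Approximate Zipf(1) draw; an illustrative stand-in, not Polygraph's code.
    return int(world_cap ** rng.random())


def merged_stream(seed, robots, requests):
    """Round-robin `robots` robots that all draw from one shared generator."""
    shared = random.Random(seed)
    return [(n % robots, zipf_id(shared)) for n in range(requests)]


# Object ids requested by one robot versus four robots sharing a generator.
one_robot = [oid for _, oid in merged_stream(seed=42, robots=1, requests=12)]
four_robots = [oid for _, oid in merged_stream(seed=42, robots=4, requests=12)]

print(one_robot == four_robots)  # the combined object-id sequence is identical
```

With the same seed and parameters, the sketch also reproduces the same stream on every run, which mirrors the repeatability noted above.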
For most caches, the actual requester of a popular object is not likely to affect the performance. Moreover, when the simulated robots come from the same IP, they probably look like one robot to a proxy.
With Poly 1.x, you can support different "kinds" of robots by starting several client processes (with different parameters like world_id), preferably on several machines. Poly 2.x allows you to simulate different robots within one polyclt process.
Reply sizes are distributed exponentially with a configurable mean
(--rep_size option) that defaults to 13 KB. There is an
upper limit that increases with object id (so the most popular object
will not end up being 1MB in size, by chance).
Because of object popularity and size limits, the actual mean is usually around 11 KB with the default settings. Polygraph reports actual size statistics and a histogram of reply sizes after the completion of an experiment.
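A rough sketch of this size model follows. The FAQ does not give the actual cap formula or say how the limit is enforced, so `size_cap`, its constants, clamping with `min()`, seeding a generator with the object id, and reading 13 KB as 13 * 1024 bytes are all assumptions made purely for illustration.

```python
import random

MEAN_REPLY_SIZE = 13 * 1024  # 13 KB default of --rep_size, taken as bytes


def size_cap(obj_id):
    # Hypothetical cap that grows with object id; the real formula is not
    # given in this FAQ.
    return 4 * 1024 + 64 * obj_id


def reply_size(obj_id, mean=MEAN_REPLY_SIZE):
    """Derive a repeatable reply size from the object id alone."""
    rng = random.Random(obj_id)  # same id -> same size on every request
    raw = rng.expovariate(1.0 / mean)  # exponential draw with the given mean
    return int(min(raw, size_cap(obj_id)))  # assumption: clamp to the cap


for obj_id in (1, 10, 100_000):
    print(obj_id, reply_size(obj_id))
```

Clamping the most popular (lowest) ids pulls their sizes down, which is one way the observed mean could end up below the configured one.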
> The system call trace of the polygraph client shows that there is
> no system call between a read and a close on a file descriptor. From
> this, I gather that the logic used for closing the connection on the
> client's side either depends on the content length or a search for
> delimiters. Could you explain the code/logic which the client uses to
> shut down the connection to the web server or proxy?
Polygraph uses no special end-of-content markers or delimiters. The size of
a reply is determined using the Content-Length HTTP header. Poly will
close a socket when the entire HTTP message has been received. Poly assumes
that requests have no body.
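The Content-Length logic can be sketched as follows. This is not Polygraph's code: `read_reply` is a hypothetical helper, and an in-memory stream stands in for the socket so the example runs without a network.

```python
import io


def read_reply(stream):
    """Read one HTTP reply from a binary stream, using Content-Length only."""
    status_line = stream.readline().decode("latin-1").rstrip("\r\n")
    headers = {}
    while True:
        line = stream.readline().decode("latin-1").rstrip("\r\n")
        if not line:
            break  # blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers["content-length"])
    body = stream.read(length)  # read exactly the advertised number of bytes
    return status_line, headers, body


raw = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhelloEXTRA"
status, headers, body = read_reply(io.BytesIO(raw))
print(status, body)
```

Once `read_reply` returns, the whole message has been received, so a client following this approach could close the connection right away without scanning for any delimiter.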
Polygraph is written with portability in mind. Most compilation problems are due to non-POSIX or non-conventional C interfaces (e.g. signed-ness of system call parameters). The development platform is FreeBSD. We have verified that Poly 1.0p0 compiles and works on simple tests on the following platforms:
The compilation warnings on Linux mentioned above are probably
harmless. You can edit include/config.h and
xstd/include/xstd.h to remove FD_SETSIZE redefinition.
We do not know how to eliminate those warnings without adding a
./configure script to the distribution (which we may do
eventually).
If you discover other portability problems, please let us know.
Sorry, we do not distribute binaries. If you want to provide binaries for others, we will be happy to put a link to your archive.
Note that a binary file compiled on one system may not work correctly on similar systems because of the differences in system wide settings like maximum number of file descriptors and such. Those discrepancies may produce subtle bugs that are very hard to identify.
Finally, installing a gcc/g++ distribution is quite doable in most environments.
Polygraph emulates HTTP/1.0 clients and servers. Supported HTTP/1.1
features include the Cache-Control message header field and
persistent connections. Polygraph clients do not pipeline
requests.
No. Poly-servers and clients talk exclusively HTTP. There are no plans to add ICP or FTP support in the foreseeable future.
Please check this FAQ and the User Manual first.
Bug reports should be e-mailed to polygraph-bugs@ircache.net. For compilation problems, please specify the version of your C++ compiler and operating system. A stack trace is essential for coredump reports.
The best place to discuss new features and possible applications of polygraph is our mailing list, polygraph@ircache.net.
Here is an incomplete list.
Poly 1.x can log statistics and other info in two places. First, high-level stats are dumped on the ``console'' (standard output). The verbosity level can be changed using the corresponding option. The console output can be redirected into a file.
If you need more than a simple request-per-second trace, you should enable the binary log (off by default). A lot of detailed stats will be dumped into the specified file in a compact binary format. All the comments displayed on the console will be stored in this log as well. Moreover, even the console output suppressed by the verbosity level option will be stored. Note that all information is kept in memory and is flushed to a file at the end of a run.
Use the Log Reader (lr) and Log Extractor (lx) programs from the Polygraph distribution to read binary logs.
Console output is convenient for run-time monitoring and debugging. The binary log should be used for report generation and long-term storage.
Increase the verbosity level using the --verb_lvl option. The default level is zero.
Working on it. The stats are logged, but we need to write more code to extract them.
These are common reasons for a flood of error messages during build:
Polygraph is known to compile and link with minimal effort on Solaris. Some users reported that using a recently patched box helps. Most problems occur for two reasons:
LDFLAGS = -lm -lxnet
Some people had better luck with adding the socket and nsl libraries instead of xnet.
The first release of Polygraph (0.0p0) had a bug that caused coredumps on Solaris shortly after the program started. The problem was fixed in version 0.0p1.
Polygraph version 0.0p2 uses the sysctlbyname() function, which is
not portable. The easiest fix is to replace calls to that function in
Gadgets.cc with -1. The bug is fixed in later
versions of Polygraph.
> In terms of the overall design, I saw that you implemented your own
> threads package (for platform independence, I would assume).
Yes and no. There are threads, but they have nothing to do with CPU scheduling. It was just a convenient base to implement the rest of Polygraph on. Essentially, a thread is a piece of some "logically concurrent" activity. For example, one "transaction" (e.g., send request and receive reply) is implemented using about six(!) threads.
From the OS point of view there is just one solid process, no fancy CPU scheduling...
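As an analogy only, the idea of "logically concurrent" activities inside a single process can be sketched with Python generators. This does not reproduce the old Polygraph threads package; `transaction` and `run` are hypothetical names, and the round-robin order is an assumption.

```python
from collections import deque


def transaction(name, log):
    """One logical activity: each yield hands control back to the scheduler."""
    log.append(f"{name}: send request")
    yield
    log.append(f"{name}: receive reply")
    yield
    log.append(f"{name}: done")


def run(activities):
    """Round-robin the activities inside a single OS process."""
    queue = deque(activities)
    while queue:
        activity = queue.popleft()
        try:
            next(activity)          # let the activity run to its next yield
            queue.append(activity)  # not finished yet, schedule it again
        except StopIteration:
            pass  # activity finished; drop it


log = []
run([transaction("A", log), transaction("B", log)])
print("\n".join(log))
```

The two activities interleave step by step, yet the operating system sees only one process and does no scheduling between them.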
We were not very excited about this design. Threads were eliminated
in Poly 1.0.
$Id: index.html,v 1.8 2001/05/29 19:25:17 wessels Exp $