PolyDocs: Run-time Messages

Web Polygraph

This page has been synchronized with Poly 2.2.8.

This page describes the run-time messages emitted by polyclt and polysrv. It covers human-readable text messages, not the cryptic "current statistics" lines that Polygraph also emits. The overall console output is described elsewhere.

Each message has its own minimum verbosity level at which it will be visible on the console. At the time of writing, the maximum verbosity level is 10.

Table of Contents

1. Types of messages
    1.1 Errors
    1.2 Warnings
    1.3 Informational messages
2. Message descriptions

1. Types of messages

There are three types of messages: errors, warnings, and informational messages.

1.1 Errors

Error messages are prepended with an error: tag. They have a low minimum verbosity level, usually 1.

Most errors have a special numeric identifier called an error number, or just errno. Identifiers smaller than 255 usually correspond to system errors that should be documented in your operating system environment. Errors with higher numbers are Polygraph-specific and are described in this document.
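The numbering rule above can be sketched in a few lines of Python. This is only an illustration, not Polygraph code: the function name describe_errno is hypothetical, and treating 255 itself as Polygraph-specific is an assumption, since the text only says that identifiers "smaller than 255" are usually system errors.

```python
import os
from typing import Optional, Tuple

# Assumed boundary: numbers below 255 are "usually" system errors.
# Where exactly 255 itself belongs is a guess made for this sketch.
SYSTEM_ERRNO_LIMIT = 255


def describe_errno(number: int) -> Tuple[str, Optional[str]]:
    """Return (origin, description) for an error number from an error message."""
    if number < 0:
        raise ValueError("error numbers are not expected to be negative")
    if number < SYSTEM_ERRNO_LIMIT:
        # Ask the local operating system how it describes this error.
        return ("system", os.strerror(number))
    # Polygraph-specific numbers are described in this document, not by the OS.
    return ("polygraph", None)
```

For a system error, the description comes from your operating system, so its wording differs between platforms.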

Most error messages should not be ignored. However, some errors are a part of the normal simulation. Please see individual message descriptions for details on a specific error.

1.2 Warnings

Warning messages are prepended with a warning: tag. They have a medium minimum verbosity level, usually 3 or higher.

It is better to pay attention to warning messages as they often indicate that some condition is getting worse. Please see individual message descriptions for details on a specific warning.

1.3 Informational messages

Informational messages are prepended with an fyi: tag (for-your-information) or have no tag at all. These messages have a medium minimum verbosity level, usually 3 or higher.

Most informational messages can be safely ignored. They are useful for monitoring the status and progress of an experiment.

2. Message descriptions

Below are descriptions of the most common messages. If a message you are interested in is not documented, please let us know.


Error: foreign request

A "foreign request" error indicates that a Polygraph server has received a request that does not look like anything a Polygraph client (or a proxy on behalf of a client) would produce. To detect a Polygraph client request, servers look for Poly-specific URL format and Poly-specific HTTP extension header fields.

You should not get "foreign request" errors on no-proxy runs. If you get this error with a proxy in the loop, you may want to investigate what requests the proxy is sending that the server does not recognize. Enabling the --dump errs option in polysrv may help.


Error: foreign reply

A "foreign reply" error indicates that a Polygraph client has received a reply that does not look like anything a Polygraph server (or a proxy on behalf of a server) would produce. To detect a Polygraph server response, clients look for Poly-specific HTTP extension header fields.

You should not get "foreign reply" errors on no-proxy runs. If you get this error with a proxy in the loop, you should investigate what responses the proxy is sending that the client does not recognize. Enabling the --dump errs option in polyclt may help.

The most common source of "foreign reply" errors is a proxy generating a proxy-specific error page. For example, a proxy may report server side connectivity errors or an overload condition. We have also seen products that send a company ad as a part of the response for the first request from a given IP address.


Error: foreign ICP request

An ICP request contains a valid URL, but that URL is not in Polygraph format. Specifically, Polygraph failed to extract an object identifier from the URL. Perhaps a client that is not Polygraph-aware is submitting ICP requests to a Polygraph agent.


Error: foreign ICP reply

An ICP reply contains a valid URL, but that URL is not in Polygraph format. Specifically, Polygraph failed to extract an object identifier from the URL. This should not happen because Polygraph requests only URLs of valid format.


Error: malformed HTTP request or response line

This error indicates that a Polygraph client received HTTP response headers but could not extract the protocol version or the response code from the headers. For example, an ``HTTP/1.1 200 OK'' response line indicates that the protocol version is ``1.1'' and the response code is ``200''.
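The extraction described above can be illustrated with a short Python sketch. It is not Polygraph's parser; the function name parse_status_line and the exact pattern are assumptions chosen for the example.

```python
import re
from typing import Optional, Tuple

# Matches the start of a response line such as "HTTP/1.1 200 OK".
STATUS_LINE = re.compile(r"^HTTP/(\d+\.\d+) (\d{3})(?: |$)")


def parse_status_line(line: str) -> Optional[Tuple[str, int]]:
    """Return (protocol_version, response_code), or None if the line is malformed."""
    match = STATUS_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

A line that does not start with a recognizable protocol version and a three-digit code, such as the first line of a bare HTML error page, yields None. That is the kind of input that would be reported as malformed.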

You should not get "malformed HTTP request or response line" errors during no-proxy runs. If you get this error with a proxy in the loop, you should investigate what responses the proxy is sending that the client cannot parse. Enabling the --dump errs option in polyclt may help.

At this time, Polygraph servers do not emit this error, but that may change.


Error: misdirected request

A Polygraph origin server received a request with a Host: header field that does not match the server's address(es).


Error: foreign host name

The request is for an object at an address (i.e., a host:port pair) that the receiving Polygraph process does not manage.


Error: failed to parse host name

Polygraph failed to parse the host component of the URL in an ICP message. At the time of writing, only IP addresses are recognized and FQDNs are not supported.


Error: premature end of msg body

Transmission of an HTTP message body terminated before the entire message was received. This usually means that the TCP connection was closed before polyclt read the entire response.


Error: premature end of msg header

Transmission of an HTTP message header terminated after some portion of the header was received but before the entire header was received.

The following only applies to versions that do not have ``connection closed before sending headers'' error:

The TCP connection was terminated while Polygraph was reading (client and server sides) or about to read (client side) a message header. This usually happens on the client side when a server or proxy closes the connection without transmitting the response. One reason for the latter is a race condition allowed by HTTP: a server (proxy) may close an idle persistent connection after the client (proxy) sent the request but before the request reached the other end. Various timeouts and overload conditions are other possible reasons.


Error: extra reply data

This message will be documented on demand.


Error: client closed conn w/o sending data

A client (proxy) connected to the server and then closed the connection before sending any data.


Error: connection closed before sending headers

The TCP connection was terminated when Polygraph tried to read the beginning of the next message header on a persistent connection. The most likely reason is a race condition allowed by HTTP: a server (proxy) may close an idle persistent connection after the client (proxy) sent the request but before the request reached the other end. Since this kind of error is normal for HTTP operation, you may ignore a small number of them.


Error: no Content-Length header

The client received a response with no Content-Length HTTP header field. All ``200 OK'' Polygraph responses have a Content-Length header; ``304 Not Modified'' Polygraph responses do not. However, at the time of writing, the client should not be receiving 304 replies because the client does not send If-Modified-Since requests.

The usual cause of this error is an error page generated by a proxy (see also the ``foreign reply'' error). Note that polyclt checks for the Content-Length header before it checks whether the response is ``foreign''.


Error: HTTP header is too big

Polygraph ran out of I/O buffer space (16KB) before an HTTP header terminated.
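The condition above can be pictured with the following Python sketch. It is an illustration under stated assumptions, not Polygraph's I/O code: the names read_header and HeaderTooBig are hypothetical, and counting the terminating blank line against the 16KB limit is a guess.

```python
HEADER_BUFFER_SIZE = 16 * 1024   # 16KB, the buffer size quoted above
HEADER_TERMINATOR = b"\r\n\r\n"  # the blank line that ends an HTTP header


class HeaderTooBig(Exception):
    """Raised when the buffer fills up before the header terminates."""


def read_header(chunks, limit=HEADER_BUFFER_SIZE):
    """Accumulate received chunks until the header ends; return the header bytes.

    Returns None if the input ends before a complete header is seen.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        end = buffer.find(HEADER_TERMINATOR)
        if end != -1 and end + len(HEADER_TERMINATOR) <= limit:
            return bytes(buffer[: end + len(HEADER_TERMINATOR)])
        if len(buffer) >= limit:
            # Out of buffer space and still no end of header within the limit.
            raise HeaderTooBig(f"no end of header within {limit} bytes")
    return None
```

A response whose header never terminates, or terminates only after the limit, raises HeaderTooBig in this sketch.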


Error: hit on uncachable object

A Polygraph request for an uncachable object was satisfied with a cached response.

Polysrv marks uncachable objects with the following HTTP headers:

	Cache-Control: private,no-cache
	Pragma: no-cache

At least one proxy is known to ignore these headers and sometimes return cached objects anyway.
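If you are inspecting dumped responses by hand, a check for the markers listed above might look like the following Python sketch. The function name is_marked_uncachable is hypothetical, and accepting either header alone as a marker is an assumption; polysrv sends both.

```python
def is_marked_uncachable(headers: dict) -> bool:
    """Check response headers for the uncachable markers listed above.

    `headers` maps header field names to values; names are compared
    case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # Split Cache-Control into its comma-separated directives.
    directives = {
        item.strip().lower()
        for item in lowered.get("cache-control", "").split(",")
        if item.strip()
    }
    pragma = lowered.get("pragma", "").strip().lower()
    # Assumption: either header is enough to treat the object as uncachable.
    return bool({"private", "no-cache"} & directives) or pragma == "no-cache"
```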


Error: hit on reload request

A Polygraph attempt to ``reload'' an object was satisfied with a cached response.

Polyclt marks ``reload'' requests with the following HTTP headers:

	Pragma: no-cache
	Cache-Control: no-cache

Many proxies are known to ignore some ``reload'' requests, especially under peak loads.


Error: false hit

The first polyclt request for a cachable object was satisfied with a cached response.

A typical cause is two concurrent requests for the same cachable object being satisfied in reverse order, resulting (from the polyclt perspective) in a false hit and a false miss.

This is not a real error in most environments, and its detection is not enabled by default.


Error: server had to terminate

This message will be documented on demand.


Error: log buffer is full

This message will be documented on demand.


Error: a server-advertised oid has not been requested for a while

Polygraph robots ``reserve'' object identifiers (oids) on servers. In other words, a server pre-allocates oids to be requested later by a robot. If a robot does not request a reserved object for a long time, the server complains.

Polygraph attempts to increase internal oid buffers to keep oid reservations longer, adapting to current run conditions. If you get just a few of these errors at the beginning of a run, ignore them. If you continue to get the errors despite a stable request rate, something is broken. Check that the reply rate is close to the request rate. That is, check that there is no backlog of unsatisfied requests.

Occasional errors of this kind are also unavoidable if you have other transaction errors. If a request is ``lost'' before reaching the server, polyclt will think it has requested the oid and might never request it again.


Error: a non-server-advertised oid has been requested

This message will be documented on demand.


Error: client ran out of new public oids

Polygraph robots ``reserve'' object identifiers (oids) on servers. In other words, a server pre-allocates oids to be requested later by a robot. If a server cannot keep up with robot reservations, you get this error.

Polygraph attempts to increase internal oid buffers to request more oids at a time, adapting to current run conditions. If you get just a few of these errors at the beginning of a run, ignore them. If you continue to get the errors despite a stable request rate, something is broken. Check that the reply rate is close to the request rate. That is, check that there is no backlog of unsatisfied requests.


Error: cannot send failed oids to servers fast enough

To support a globally shared URL space, polyclt reserves global object identifiers (oids) from server agents. The servers keep track of which reserved oids have been requested so that all polyclt processes can revisit the corresponding objects to produce a hit in the shared URL space. If a transaction fails, the server may never see the oid of the failed URL, preventing other polyclt processes from revisiting the URL. Thus, each polyclt attempts to report failed oids back to the server. To report a failed oid, polyclt piggybacks that oid onto the request headers of the next miss transaction.

When the transaction error rate is high, polyclt may not have enough successful transactions to report all failed oids, and some failed oids will remain unreported, generating ``cannot send failed oids to servers fast enough'' errors.

These errors are just a sign that something else is not working right (producing failed oids). When you fix the original problem, the errors will go away.


Error: server received too many requests for new oids

This message will be documented on demand.


Error: client received too many new oids

This message will be documented on demand.


Error: server new oid map cannot grow any more

Repeated ``a server-advertised oid has not been requested for a while'' errors caused internal buffers to grow beyond a reasonable limit. Something is broken and Polygraph cannot compensate on its own.


Error: client discovered server world id change

Server world id is a unique identifier attached to each server. The identifier is unique across simulations. It is reported back to robots using HTTP extension header fields.

If you restart polysrv while polyclt is running, the latter will notice the change and will complain. Polyclt should be able to recover on its own, but do not restart servers during production runs.


Error: foreign content <tag>

Polysrv inserts Polygraph-specific tags into the body of some responses. If those tags cannot be recognized by a robot, this error is reported.

Unless it is a Polygraph bug, the error means that the proxy is modifying content (i.e., response bodies) on-the-fly. This should not be happening.


Error: malformed content <tag>

Same as ``foreign content <tag>'' error, except a robot failed even to parse the tag (wrong tag syntax).


Error: open content <tag

This is a particular instance of the ``malformed content <tag>'' error. A tag is missing its closing bracket, ``>''.


Error: too many postponed xactions

The per-robot queue that holds transactions waiting for resources reached the wait_xact_lmt limit (a field in the robot's configuration) and cannot grow any more. Unless you have deliberately used a low wait_xact_lmt value, you need to decrease the request rate, increase the number of connections available to a robot, or otherwise resolve the bottleneck.

Transactions that exceed the limit are ignored (never executed).
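The behavior described above amounts to a bounded queue. The Python sketch below illustrates the idea only; the class WaitQueue and its methods are hypothetical, and the first-in, first-out order of resuming transactions is an assumption, not something this page specifies.

```python
from collections import deque


class WaitQueue:
    """Per-robot queue of transactions waiting for resources (illustrative only)."""

    def __init__(self, wait_xact_lmt: int):
        self.limit = wait_xact_lmt
        self.waiting = deque()
        self.dropped = 0  # transactions ignored because the queue was full

    def postpone(self, xaction) -> bool:
        """Queue a transaction; return False if it had to be ignored."""
        if len(self.waiting) >= self.limit:
            # This is the point at which the error would be reported.
            self.dropped += 1
            return False
        self.waiting.append(xaction)
        return True

    def resume(self):
        """Return the oldest waiting transaction, or None if none is waiting."""
        return self.waiting.popleft() if self.waiting else None
```

In this picture, a steadily growing dropped counter means the robot generates transactions faster than resources become available.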


Error: internal timers may be getting behind

Polygraph maintains many alarms and timers for internal scheduling purposes. If those timers start getting behind (i.e., events are executed late), you get this error. You are probably overloading the Polygraph process or the machine that the process runs on.


Error: violation of a sibling relationship

A Polygraph proxy was asked to serve an object that could not be served from the proxy's cache. At the time of writing, such requests are refused.


Error: unsupported HTTP status code

HTTP status code is a property of an HTTP response that determines how the response should be interpreted. HTTP defines many status codes. Polygraph robots support (and Polygraph servers emit) several status codes (e.g., ``200 OK'' and ``304 Not Modified''). Under normal operating conditions it is unlikely that other status codes will reach the client side of the bench.

The ``unsupported HTTP status code'' errors usually occur when the device under test attempts to report an unusual condition (such as a network misconfiguration or connectivity error) back to the Polygraph robot. To determine what that unusual condition is, use the ``--dump errs'' option on the client side.


Error: unsupported ICP opcode

An ICP reply contains an unsupported opcode. At the time of writing, only three opcodes are supported: hit, miss, and miss-no-fetch.


Error: unsupported ICP version

An ICP message has an unsupported ICP version number. Only version 2 of the protocol is supported at the time of writing.


Error: bad ICP message size

An ICP message has an invalid size. The size is either smaller than 2 bytes or does not match the message length header field.
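The opcode, version, and size checks from the last three entries can be pictured with the Python sketch below. It follows the ICP version 2 header layout from RFC 2186 and is not Polygraph's own code. The function name check_icp_reply and the order of the checks are assumptions. The sketch also requires the full 20-byte header before decoding anything, which is stricter than the 2-byte minimum mentioned above.

```python
import struct

# ICP v2 header (RFC 2186), network byte order: opcode, version,
# message length, request number, options, option data, sender address.
ICP_HEADER = struct.Struct("!BBHIII4s")
ICP_VERSION = 2
# Reply opcodes the text lists as supported (numeric values from RFC 2186).
SUPPORTED_REPLY_OPCODES = {2: "hit", 3: "miss", 21: "miss-no-fetch"}


def check_icp_reply(datagram: bytes) -> str:
    """Return the reply kind, or the text of the error the reply would trigger."""
    if len(datagram) < ICP_HEADER.size:
        return "bad ICP message size"
    opcode, version, length, _reqnum, _options, _option_data, _sender = (
        ICP_HEADER.unpack_from(datagram)
    )
    if length != len(datagram):
        # The length header field must match the size actually received.
        return "bad ICP message size"
    if version != ICP_VERSION:
        return "unsupported ICP version"
    if opcode not in SUPPORTED_REPLY_OPCODES:
        return "unsupported ICP opcode"
    return SUPPORTED_REPLY_OPCODES[opcode]
```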


Error: bad ICP reqnum

The reqnum field of an ICP reply is negative or is out of the range used by the Polygraph ICP client.


Error: ICP client may have too many outstanding requests

Reqnum conflicts in ICP client metadata may indicate that Polygraph cannot keep information about all pending ICP transactions. The same error may also indicate that an ICP reply was delivered very late, when the slot for the corresponding request was already occupied by another request.


Error: unexpected message to an ICP agent

An ICP server received an ICP reply or an ICP client received an ICP request. Check your ICP ports configuration.


Error: clocks out of sync

A Robot has detected a suspicious difference between the client and server side clocks. Specifically, a ``first hand'' response from a server had the Date: header more than a minute behind or ahead of the local time (the difference is displayed after the error message).

There are at least two possible reasons for this error message:

If a proxy changes the value of the server's Date: header on misses, replace the word ``server'' with ``proxy'' in the narration above.
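The comparison behind this error can be sketched in Python as follows. This is an illustration, not the robot's actual check: the function names are hypothetical, and the caller is assumed to pass the current local time as a timezone-aware value and to apply the check only to ``first hand'' responses.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW = timedelta(minutes=1)  # "more than a minute", as quoted above


def clock_skew(date_header: str, now: datetime) -> timedelta:
    """Return the Date: header time minus the local time `now`.

    `now` must be timezone-aware, e.g. datetime.now(timezone.utc).
    """
    return parsedate_to_datetime(date_header) - now


def clocks_out_of_sync(date_header: str, now: datetime) -> bool:
    """True if the Date: header is more than a minute behind or ahead of `now`."""
    return abs(clock_skew(date_header, now)) > MAX_SKEW
```

The sign of the value returned by clock_skew tells you which side is ahead: a negative value means the Date: header is behind the local clock.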


Error: unclassified error

Self-explanatory. It is usually emitted with some extra information to help you diagnose the problem. Often fatal.


FYI: agent[N] Kind starting on Host

Just an indication that an agent (robot or server) of the specified ``kind'' is ready to work on the specified host.

The message does not mean that a robot will start submitting requests immediately. Launch windows and other reasons may delay the first request.


FYI: server scan is probably X% completed with N out of S servers (Y%) ready to be hit

At startup, active robots are trying to contact all servers to make sure that any robot can re-visit a page on any server after the scan is completed. During this time, the servers allocate new object identifiers and report them to clients. The process is not deterministic due to possible delays and errors. That is why Polygraph cannot give you an exact progress indication or ETA. The scan will continue until N is equal to S.

Note that polyclt locks current (usually the first) phase until the scan is completed. Unfortunately, at the time of writing, polysrv has no clue that the initial scan is going on and does not lock the phase.


FYI: server scan completed with all R local robots ready to hit all S servers

The initial server scan is complete. Polyclt will start its normal mode of operation and unlock the current phase.


FYI: min `direct' objects in working set: global public: G local private: L

Polyclt reports its current knowledge of the Working Set Size (WSS). The counters are kept for direct objects only. That is, embedded objects are not accounted for in this message. (Note, however, that the size estimation uses the average object size from the ``fill'' statistics, which does include embedded objects.)

Two classes of objects are distinguished: public and private. The public URL space is shared among all robots. All polyclt processes should have similar values for public WSS at any given time (subject to synchronization delays among distributed polyclts).

Private objects are specific or local to every (Robot, Server) pair. Polyclt reports the sum of all local private working set sizes (i.e., the sum across all robots within the corresponding polyclt process).

How does one get the total WSS based on the FYI message above? Here is an imprecise formula that you may use:


	Objects_Per_Direct_Object = 1.6
	Working_Set_Count = Objects_Per_Direct_Object * (G + N*L)
	Working_Set_Size  = Mean_fill_object_size * Working_Set_Count
	
Where Objects_Per_Direct_Object is taken from PolyMix-2,3 workloads and may differ for other workloads. N is the number of identically configured polyclts. Mean_fill_object_size is usually about 11KB, but also depends on the workload; check your stats.
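The same imprecise formula, written as a small Python sketch. The function names are hypothetical, the default mean object size is the "about 11KB" figure quoted above, and the values in the usage note are made up for illustration.

```python
OBJECTS_PER_DIRECT_OBJECT = 1.6  # PolyMix-2,3 value; may differ for other workloads


def working_set_count(global_public: int, local_private: int, polyclt_count: int) -> float:
    """Estimated number of objects in the working set: 1.6 * (G + N*L)."""
    return OBJECTS_PER_DIRECT_OBJECT * (global_public + polyclt_count * local_private)


def working_set_size_kb(global_public: int, local_private: int, polyclt_count: int,
                        mean_fill_object_size_kb: float = 11.0) -> float:
    """Estimated working set size in KB; check your stats for the real mean size."""
    return mean_fill_object_size_kb * working_set_count(
        global_public, local_private, polyclt_count
    )
```

For example, with the made-up values G = 100000, L = 20000, and N = 4, the estimate is roughly 288,000 objects, or about 3.2 million KB at 11KB per object.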

At the time of writing, Report Generator does not report actual WSS, but we are working on it.

Caching just a WSS worth of data is not sufficient to achieve perfect hit ratios because the working set is not updated in an LRU, FIFO, or similar fashion.


Warning: PortMgr failed to bind to X:Y

The bind(2) system call failed.

If you are using the ephemeral port range (the default), consider either increasing that range using OS-specific tools or using an explicit port range (see the --ports option).

If you are using an explicit port range, consider increasing that range.

Occasional warnings of this kind are normal; those are probably due to some kernel race conditions and cannot be completely eliminated with any port mapping scheme.


Warning: buffer pool grew to N x B = S

Polygraph warns you that it had to allocate yet another chunk of memory to be used for I/O buffers.

Polygraph will need more memory if there are more connections ``stalled'' in a non-idle state. Check that you have enough memory to support the desired request rates. Make sure that the Polygraph process does not page.

Not all allocated memory will be used immediately. Process ``resident size'' may grow slower than the reported buffer pool level.


