| PolyDocs: Run-time Messages |
|---|
| Web Polygraph |
This page has been synchronized with Poly 2.2.8.
This page describes run-time messages used by polyclt and polysrv. It covers human-readable text messages rather than the cryptic "current statistics" lines also emitted by Polygraph. The overall description of the console output is available elsewhere.
Each message has its own minimum verbosity level at which it will be visible on the console. At the time of writing, the maximum verbosity level is 10.
1. Types of messages
1.1 Errors
1.2 Warnings
1.3 Informational messages
2. Message descriptions
There are three types of messages: errors, warnings, and informational messages.
Error messages are prepended with an error: tag. Errors have a low minimum verbosity level, usually 1.
Most errors have a special numeric identifier called an error number, or just errno. Identifiers smaller than 255 usually correspond to system errors that should be documented in your operating system environment. Errors with higher numbers are Polygraph-specific and are described in this document.
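The split between system and Polygraph-specific error numbers can be illustrated with a short Python sketch. It is not part of Polygraph: the function name is made up for this example, and classifying the value 255 itself as Polygraph-specific is an assumption, since the text only says "smaller than 255".

```python
import os

# Threshold taken from the text above. How the value 255 itself is
# classified is not stated; treating it as Polygraph-specific is an assumption.
SYSTEM_ERRNO_LIMIT = 255


def describe_errno(errno_value: int) -> str:
    """Return a rough classification of an error number (illustrative only)."""
    if errno_value < SYSTEM_ERRNO_LIMIT:
        try:
            # os.strerror() asks the local C library for the system error text.
            text = os.strerror(errno_value)
        except ValueError:
            # Some platforms reject error numbers they do not know.
            text = "unknown system error"
        return f"system error {errno_value}: {text}"
    return f"Polygraph-specific error {errno_value}: see the message descriptions"
```

For example, `describe_errno(2)` reports whatever text your operating system associates with error number 2, while `describe_errno(1000)` points back to this page.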
Most error messages should not be ignored. However, some errors are a part of the normal simulation. Please see individual message descriptions for details on a specific error.
Warning messages are prepended with a warning: tag. Warnings have a medium minimum verbosity level, usually 3 or higher.
Pay attention to warning messages, as they often indicate that some condition is getting worse. Please see individual message descriptions for details on a specific warning.
Informational messages are prepended with a fyi: tag (for-your-information) or have no tag at all. These messages have a medium minimum verbosity level, usually 3 or higher.
Most informational messages can be safely ignored. They are useful for monitoring the status and progress of an experiment.
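When post-processing console logs, the three message types can be told apart by their tags. The Python sketch below is a hypothetical helper, not a Polygraph tool. It assumes that the tag starts the message text, so any prefix the console puts before the message must be stripped first.

```python
from typing import Tuple

# Tags as described above; an untagged line is treated as informational.
TAGS = (("error:", "error"), ("warning:", "warning"), ("fyi:", "info"))


def classify_message(line: str) -> Tuple[str, str]:
    """Return (kind, text) for one console message (illustrative helper)."""
    text = line.strip()
    for tag, kind in TAGS:
        if text.lower().startswith(tag):
            # Drop the tag and the whitespace that follows it.
            return kind, text[len(tag):].strip()
    # No tag at all: the text above says such messages are informational.
    return "info", text
```

A log filter could then, for instance, keep only the lines whose kind is "error" or "warning".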
Below are descriptions of the most common messages. If a message you are interested in is not documented, please let us know.
A "foreign request" error indicates that a Polygraph server has received a request that does not look like anything a Polygraph client (or a proxy on behalf of a client) would produce. To detect a Polygraph client request, servers look for Poly-specific URL format and Poly-specific HTTP extension header fields.
You should not be getting "foreign request" errors on no-proxy runs. If you get this error with a proxy in the loop, you may want to investigate what requests the proxy is sending that the server does not recognize. Enabling the --dump errs option in polysrv may help.
A "foreign reply" error indicates that a Polygraph client has received a reply that does not look like anything a Polygraph server (or a proxy on behalf of a server) would produce. To detect a Polygraph server response, clients look for Poly-specific HTTP extension header fields.
You should not be getting "foreign reply" errors on no-proxy runs. If you get this error with a proxy in the loop, you should investigate what responses the proxy is sending that the client does not recognize. Enabling the --dump errs option in polyclt may help.
The most common source of "foreign reply" errors is a proxy generating a proxy-specific error page. For example, a proxy may report server side connectivity errors or an overload condition. We have also seen products that send a company ad as a part of the response for the first request from a given IP address.
An ICP request contains a valid URL, but that URL is not in Polygraph format. Specifically, Polygraph failed to extract the object identifier from the URL. Perhaps a non-Polygraph-aware client is submitting ICP requests to a Polygraph agent?
An ICP reply contains a valid URL, but that URL is not in Polygraph format. Specifically, Polygraph failed to extract the object identifier from the URL. This should not happen because Polygraph requests only URLs of valid format.
This error indicates that a Polygraph client received HTTP response headers but could not extract the protocol version or the response code from the headers. For example, an ``HTTP/1.1 200 OK'' response line indicates that the protocol version is ``1.1'' and the response code is ``200''.
You should not be getting "malformed HTTP request" errors during no-proxy runs. If you get this error with a proxy in the loop, you should investigate what responses the proxy is sending that the client cannot parse. Enabling the --dump errs option in polyclt may help.
At this time, Polygraph servers do not emit this error, but that may change.
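As a rough illustration of the check described above (not Polygraph's actual parser), the following Python sketch extracts the protocol version and the response code from a status line. The function name and the regular expression are assumptions made for this example.

```python
import re
from typing import Optional, Tuple

# "HTTP/<major>.<minor> <three-digit code>", optionally followed by a reason phrase.
STATUS_LINE = re.compile(r"^HTTP/(\d+\.\d+)\s+(\d{3})(?:\s|$)")


def parse_status_line(line: str) -> Optional[Tuple[str, int]]:
    """Return (version, code), or None if the line cannot be parsed."""
    match = STATUS_LINE.match(line)
    if match is None:
        # This is the situation the error above reports.
        return None
    return match.group(1), int(match.group(2))
```

A proxy error page that starts with HTML rather than a status line would make this function return None.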
A Polygraph origin server received a request with a Host: header field that does not match the server's address(es).
The request is for an object at an address (i.e., the host:port pair) that the receiving Polygraph process does not manage.
Polygraph failed to parse the host component of the URL in an ICP message. At the time of writing, only IP addresses are recognized and FQDNs are not supported.
Transmission of an HTTP message body terminated before the entire message was received. This usually means that the TCP connection was closed before polyclt read the response.
Transmission of an HTTP message header terminated after some portion of the header was received but before the entire header was received.
The following only applies to versions that do not have ``connection closed before sending headers'' error:
The TCP connection got terminated when Polygraph was reading (client and server sides) or about to read (client side) a message header. Usually happens on the client side when a server or proxy is closing the connection without transmitting the response. One reason for the latter is a race condition allowed by HTTP: A server (proxy) may close an idle persistent connection after the client (proxy) sent the request but before the request reached the other end. Various timeouts and overload conditions are other possible reasons.
This message will be documented on demand.
Client (proxy) has connected to the server and then closed the connection prior to sending any data.
The TCP connection got terminated when Polygraph tried to read the beginning of the next message header on a persistent connection. The most likely reason for the latter is a race condition allowed by HTTP: A server (proxy) may close an idle persistent connection after the client (proxy) sent the request but before the request reached the other end. Since this kind of error is normal for HTTP operation, you may ignore a small number of them.
Client received a response with no Content-Length HTTP header field. All ``200 OK'' Polygraph responses have a Content-Length header. ``304 Not modified'' Polygraph responses never do. However, at the time of writing, the client should not be receiving 304 replies because the client does not send If-Modified-Since requests.
The usual cause of this error is an error page generated by a proxy (see also: ``foreign reply'' error). Note that polyclt checks for the Content-Length header before it checks whether the response is ``foreign''.
Polygraph ran out of I/O buffer space (16KB) before an HTTP header terminated.
A Polygraph request for an uncachable object was satisfied with a cached response.
Polysrv marks uncachable objects with the following HTTP headers:
    Cache-Control: private,no-cache
    Pragma: no-cache
At least one proxy is known to ignore these headers and sometimes return cached objects anyway.
A Polygraph attempt to ``reload'' an object was satisfied with a cached response.
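To inspect dumped responses for the uncachable markers listed above, a check along the following lines may help. This is a minimal Python sketch, not Polygraph code: the function name is hypothetical, and treating any single marker as sufficient is an assumption.

```python
from typing import Dict, Set


def _directives(value: str) -> Set[str]:
    """Split a comma-separated header value into lower-case directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def is_marked_uncachable(headers: Dict[str, str]) -> bool:
    """Return True if the headers carry any of the markers listed above."""
    # HTTP header field names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_control = _directives(lowered.get("cache-control", ""))
    pragma = _directives(lowered.get("pragma", ""))
    # Assumption: one marker is enough to call the response uncachable.
    return bool({"private", "no-cache"} & cache_control) or "no-cache" in pragma
```

A response that passes this check yet was served from the proxy cache is the kind of response the error above complains about.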
Polyclt marks ``reload'' requests with the following HTTP headers:
    Pragma: no-cache
    Cache-Control: no-cache
Many proxies are known to ignore some ``reload'' requests, especially under peak loads.
Polyclt's first request for a cachable object was satisfied with a cached response.
A typical cause is when two concurrent requests for the same cachable object are satisfied in the reverse order, resulting (from polyclt's perspective) in a false hit and a false miss.
This is not a real error in most environments, and its detection is not enabled by default.
This message will be documented on demand.
This message will be documented on demand.
Polygraph robots ``reserve'' object identifiers (oids) on servers. In other words, a server pre-allocates oids to be later requested by a robot. If a robot does not request the reserved object for a long time, the server complains.
Polygraph attempts to increase internal oid buffers to keep oid reservations longer, to adapt to current run conditions. If you get just a few of these errors at the beginning of a run, ignore them. If you continue to get the errors despite a stable request rate, something is broken. Check that the reply rate is close to the request rate. That is, check that there is no backlog of unsatisfied requests.
Occasional errors of this kind are also unavoidable if you have other transaction errors. If a request is ``lost'' before reaching the server, polyclt would think it has requested an oid and might never request it again.
This message will be documented on demand.
Polygraph robots ``reserve'' object identifiers (oids) on servers. In other words, a server pre-allocates oids to be later requested by a robot. If a server cannot keep up with robot reservations, you get this error.
Polygraph attempts to increase internal oid buffers to request more oids at a time, to adapt to current run conditions. If you get just a few of these errors at the beginning of a run, ignore them. If you continue to get the errors despite a stable request rate, something is broken. Check that the reply rate is close to the request rate. That is, check that there is no backlog of unsatisfied requests.
To support a globally shared URL space, polyclt reserves global object identifiers (oids) from server agents. The servers keep track of which reserved oids have been requested so that all polyclt processes can revisit the corresponding objects to produce a hit in the shared URL space. If a transaction fails, the server may never see the oid of the failed URL, preventing other polyclt processes from revisiting the URL. Thus, each polyclt attempts to report failed oids back to the server. To report a failed oid, polyclt piggybacks that oid onto the request headers of the next miss transaction.
When the transaction error rate is high, polyclt may not have enough successful transactions to report all failed oids, and some failed oids will remain unreported, generating ``cannot send failed oids fast enough'' errors.
These errors are just a sign that something else is not working right (producing failed oids). When you fix the original problem, the errors will go away.
This message will be documented on demand.
This message will be documented on demand.
Repeated ``a server-advertised oid has not been requested for a while'' errors caused internal buffers to grow beyond a reasonable limit. Something is broken and Polygraph cannot compensate on its own.
Server world id is a unique identifier attached to each server. The identifier is unique across simulations. It is reported back to robots using HTTP extension header fields.
If you restart polysrv while polyclt is running, the latter will notice the change and will complain. Polyclt should be able to recover on its own, but do not restart servers during production runs.
Polysrv inserts Polygraph-specific tags into the body of some responses. If those tags cannot be recognized by a robot, this error is reported.
Unless it is a Polygraph bug, the error means that the proxy is modifying content (i.e., response bodies) on-the-fly. This should not be happening.
Same as ``foreign content <tag>'' error, except a robot failed even to parse the tag (wrong tag syntax).
This is a particular instance of the ``malformed content <tag>'' error. A tag is missing its closing bracket, ``>''.
The per-robot queue that keeps transactions waiting for resources has reached the wait_xact_lmt limit (a field in the robot's configuration) and hence cannot grow any more. Unless you have deliberately used a low wait_xact_lmt value, you need to decrease the request rate, increase the number of connections available to a robot, or do something else to resolve the bottleneck.
Transactions that exceed the limit are ignored (never executed).
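The behavior described above resembles a bounded queue that drops new entries once it is full. The Python sketch below only illustrates that idea. The class and attribute names are invented, and it makes no claim about how Polygraph implements its queue.

```python
from collections import deque


class WaitQueue:
    """Bounded per-robot wait queue (illustrative model only)."""

    def __init__(self, wait_xact_lmt: int) -> None:
        self.limit = wait_xact_lmt   # named after the configuration field above
        self.pending = deque()       # transactions waiting for resources
        self.ignored = 0             # transactions dropped at the limit

    def enqueue(self, xact) -> bool:
        """Queue a transaction; return False if it had to be ignored."""
        if len(self.pending) >= self.limit:
            # The queue cannot grow any more, so the transaction is never executed.
            self.ignored += 1
            return False
        self.pending.append(xact)
        return True
```

In this model, a steadily growing `ignored` counter corresponds to a steady stream of the error described above.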
Polygraph maintains many alarms and timers for internal scheduling purposes. If those timers start falling behind (i.e., the events are not executed on time), you get this error. You are probably overloading the Polygraph process or the machine that process runs on.
Polygraph proxy was requested to serve an object that could not be served from the proxy's cache. At the time of writing, such requests are refused.
HTTP status code is a property of an HTTP response that determines how the response should be interpreted. HTTP defines many status codes. Polygraph robots support (and Polygraph servers emit) several status codes (e.g., ``200 OK'' and ``304 Not Modified''). Under normal operating conditions it is unlikely that other status codes will reach the client side of the bench.
The ``unsupported HTTP status code'' errors usually occur when the device under test attempts to report an unusual condition (such as a network misconfiguration or connectivity error) back to the Polygraph robot. To determine what that unusual condition is, use the ``--dump errs'' option on the client side.
ICP reply contains unsupported opcode. At the time of writing, only three opcodes are supported: hit, miss, and miss-no-fetch.
ICP message has unsupported ICP version number. Only version 2 of the protocol is supported at the time of writing.
ICP message has invalid size. The size is either smaller than 2 bytes or does not match the message length header field.
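The three ICP checks above (opcode, version, size) can be sketched in Python using the 20-byte ICP version 2 header layout from RFC 2186. This is not Polygraph's code. The function name, the order of the checks, and the returned labels are assumptions. The minimum-size test uses the full 20-byte header rather than the 2-byte threshold quoted above.

```python
import struct

# RFC 2186 header: opcode, version, message length, request number,
# options, option data, sender host address (network byte order).
ICP_HEADER = struct.Struct("!BBHIII4s")

# Opcode numbers from RFC 2186 for the three reply types named above.
SUPPORTED_REPLY_OPCODES = {2: "hit", 3: "miss", 21: "miss-no-fetch"}


def check_icp_reply(datagram: bytes) -> str:
    """Return the reply type or an illustrative error label."""
    if len(datagram) < ICP_HEADER.size:
        return "invalid size"
    opcode, version, length, _reqnum, _opts, _optdata, _sender = (
        ICP_HEADER.unpack_from(datagram)
    )
    if length != len(datagram):
        # The message length header field must match the datagram size.
        return "invalid size"
    if version != 2:
        return "unsupported version"
    if opcode not in SUPPORTED_REPLY_OPCODES:
        return "unsupported opcode"
    return SUPPORTED_REPLY_OPCODES[opcode]
```

The labels returned here are shorthand for this example and do not reproduce the exact text of the Polygraph messages.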
The reqnum field of an ICP reply is negative or is out of the range used by the Polygraph ICP client.
Reqnum conflicts in ICP client metadata may indicate that Polygraph cannot keep information about all pending ICP transactions. The same error may also indicate that an ICP reply was delivered very late, when the slot for corresponding request was already occupied by another request.
An ICP server received an ICP reply or an ICP client received an ICP request. Check your ICP ports configuration.
A Robot has detected a suspicious difference between the client and server side clocks. Specifically, a ``first hand'' response from a server had the Date: header more than a minute behind or ahead of the local time (the difference is displayed after the error message).
There are at least two possible reasons for this error message:
Client and server side clocks are indeed out of sync. Run the date command on all hosts to check if they are in sync.
It took the Robot more than a minute to receive response headers after the response was issued on the server side. If a one-minute response time (for small misses) is not normal for your workload, you need to find the source of the delay and remove or fix it. You might be overloading the proxy or Polygraph; does the error occur when the load is significantly lighter?
If a proxy changes the value of the server's Date: header on misses, replace the word ``server'' with ``proxy'' in the narration above.
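The one-minute comparison described above can be reproduced offline, for example against dumped response headers. The Python sketch below uses only the standard library. The function names are invented for this example, and it does not claim to match how the Robot performs the check internally.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Threshold from the text above: more than a minute behind or ahead.
MAX_SKEW = timedelta(minutes=1)


def clock_skew(date_header: str, local_now: datetime) -> timedelta:
    """Return server time minus local time for an HTTP Date: header value."""
    server_time = parsedate_to_datetime(date_header)
    return server_time - local_now


def is_suspicious(date_header: str, local_now: datetime) -> bool:
    """Return True if the Date: header differs from local time by over a minute."""
    return abs(clock_skew(date_header, local_now)) > MAX_SKEW
```

Pass a timezone-aware `local_now`, such as `datetime.now(timezone.utc)`, because a Date: value ending in GMT is parsed into a timezone-aware datetime.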
Self-explanatory. It is usually emitted with some extra information to help you diagnose the problem. Often fatal.
Just an indication that an agent (robot or server) of the specified ``kind'' is ready to work on the specified host.
The message does not mean that a robot will start submitting requests immediately. Launch windows and other reasons may delay the first request.
At startup, active robots are trying to contact all servers to make sure that any robot can re-visit a page on any server after the scan is completed. During this time, the servers allocate new object identifiers and report them to clients. The process is not deterministic due to possible delays and errors. That is why Polygraph cannot give you an exact progress indication or ETA. The scan will continue until N is equal to S.
Note that polyclt locks current (usually the first) phase until the scan is completed. Unfortunately, at the time of writing, polysrv has no clue that the initial scan is going on and does not lock the phase.
The initial server scan is complete. Polyclt will start its normal mode of operation and unlock the current phase.
Polyclt reports its current knowledge of the Working Set Size (WSS). The counters are kept for direct objects only. That is, embedded objects are not accounted for in this message. (Note, however, that the size estimation uses average object size from the ``fill'' statistics that does include embedded objects.)
Two classes of objects are distinguished: public and private. Public URL space is shared among all Robots. All polyclt processes should have similar values for public WSS at any given time (subject to synchronization delays among distributed polyclts).
Private objects are specific or local to every (Robot, Server) pair. Polyclt reports the sum of all local private working set sizes (i.e., the sum across all robots within the corresponding polyclt process).
How does one get the total WSS based on the FYI message above? Here is an imprecise formula that you may use:
    Objects_Per_Direct_Object = 1.6
    Working_Set_Count = Objects_Per_Direct_Object * (G + N*L)
    Working_Set_Size = Mean_fill_object_size * Working_Set_Count
Where Objects_Per_Direct_Object is taken from PolyMix-2,3 workloads and may differ for other workloads. N is the number of identically configured polyclts. Mean_fill_object_size is usually about 11KB, but also depends on the workload; check your stats.
At the time of writing, Report Generator does not report actual WSS, but we are working on it.
Caching just WSS worth of data is not sufficient to achieve perfect hit ratios because WS is not updated in an LRU, FIFO, etc. fashion.
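The imprecise formula above translates directly into a few lines of Python. The parameter names are chosen for this sketch. It assumes that G is the public (global) object count and L is the private (local) object count reported by one polyclt, and that 11KB means 11 * 1024 bytes. Verify both against your own logs and stats.

```python
from typing import Tuple


def estimate_wss(
    public_count: int,                      # G (assumed: public objects)
    private_count: int,                     # L (assumed: private objects per polyclt)
    polyclt_count: int,                     # N, identically configured polyclts
    objects_per_direct_object: float = 1.6,     # PolyMix-2,3 value from the text
    mean_fill_object_size: float = 11 * 1024,   # bytes; "about 11KB" per the text
) -> Tuple[float, float]:
    """Return (Working_Set_Count, Working_Set_Size in bytes)."""
    working_set_count = objects_per_direct_object * (
        public_count + polyclt_count * private_count
    )
    working_set_size = mean_fill_object_size * working_set_count
    return working_set_count, working_set_size
```

The result is only as precise as the formula itself. Replace the two defaults with values from your own workload when they differ.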
bind(2) system call failed.
If you are using the ephemeral port range (default), consider either increasing that range using OS-specific tools or using an explicit port range (see the --ports option).
If you are using explicit port range, consider increasing that range.
Occasional warnings of this kind are normal; those are probably due to some kernel race conditions and cannot be completely eliminated with any port mapping scheme.
Polygraph warns you that it had to allocate yet another chunk of memory to be used for I/O buffers.
Polygraph will need more memory if there are more connections ``stalled'' in non-idle state. Check that you have enough memory to support desired request rates. Make sure that the Polygraph process does not page.
Not all allocated memory will be used immediately. Process ``resident size'' may grow slower than the reported buffer pool level.