The Second IRCache Web Cache Cache-off

The Official Report

Alex Rousskov, Duane Wessels, Glenn Chisholm
polyteam@ircache.net

A two-week benchmarking ``cache-off'' for Web proxy caches was held in January, 2000. Using the Web Polygraph benchmark, our group tested 22 proxy caches from 14 different organizations. In this report, we summarize the performance data collected during these tests and analyze the results.

Table of Contents

1. Introduction
    1.1 Timeline
    1.2 Terminology
    1.3 How (not) to Read this Report
    1.4 Where to find more information
2. Executive Summary
3. The Rules
4. Web Polygraph
    4.1 The Cache-off Workload: PolyMix-2
5. Benchmarking Environment
    5.1 Location
    5.2 Schedule
    5.3 Polygraph Machines
    5.4 Time Synchronization
    5.5 Network Configurations
6. Test Sequence
    6.1 MSL Test
    6.2 Filling the Cache
    6.3 PolyMix-2
    6.4 Downtime Test
7. Performance Details
    7.1 Normalized Throughput
    7.2 Hit Ratio
    7.3 Response Time
    7.4 Load dependency
    7.5 Downtime test
8. Product Configurations
9. This Cache-Off Controversies
    9.1 Delayed ACKs
    9.2 Working Set Size
    9.3 Filling the Cache
    9.4 ARP Traffic
10. Comments
    10.1 Polyteam Comments
    10.2 Vendor Comments

1. Introduction

IRCache cache-offs address the Web caching community's need for high-quality, independent performance data. The cache-off is a snapshot of the caching industry: every vendor who wants to test the performance of their products has an opportunity to do so under fair conditions. The background, rules, and results of this competition are discussed here in our official report.

1.1 Timeline

Preparations for the second cache-off began on August 8, 1999 with an organizational meeting in Boulder, Colorado. Thirteen companies attended the meeting and stated their intention to participate in the cache-off. Registration for the cache-off began in November and was open to everybody. Sixteen vendors registered 24 products.

The cache-off started on January 17, 2000. Eolian canceled their participation a few days before the test. CacheFlow simply did not show up for their test slot. The results for the other 14 vendors are discussed in this report.

1.2 Terminology

Throughout this report we use a few terms that have specific meaning for the cache-off. A vendor is an organization that has a caching product. To simplify the terminology, all commercial, non-profit, virtual, etc. organizations are labeled as ``vendors.'' A vendor is allowed to bring more than one product to the cache-off. Each product or entry that a vendor brings counts as one participant. We have one benchmarking harness (bench) for every participant.

1.3 How (not) to Read this Report

We strongly caution against drawing hasty conclusions from these benchmarking results. Our report contains a lot of performance numbers and configuration information; take advantage of it. Since the tested caches differ a lot, it is tempting to draw conclusions about participants based on a single performance graph or a column in a summary table. We believe such conclusions will virtually always be wrong. Here are a few recommendations to prevent misinterpretation of the results.

  1. Always read the Polyteam Comments and Vendor Comments sections.
  2. Compare several performance factors: throughput, response time, hit ratio, etc. Weight each factor based on your preferences.
  3. Do not overlook pricing information and price/performance analysis.

Our benchmark addresses only the performance aspects of Web cache products. Any given cache will have numerous features that are not addressed here. For example, we think that manageability, reliability, and correctness are very important attributes that should be considered in any buying decision.

1.4 Where to find more information

IRCache maintains the Official Cache-off Site, where this report and detailed Polygraph log files from the cache-off are stored. All cache-off information at the Official Site is freely available.

There are no other official sources of cache-off results.

Documentation, sources, independent test results, discussion mailing lists, and other information related to the Polygraph benchmark are available at the Web Polygraph site.

Only major performance measurements are discussed in this report. For more information, consider these sources:

The links above are also useful if you are afraid of being influenced by our interpretation of the results. We still recommend reading the report afterwards (as a ``second opinion'') because not all test rules and performance matters can be clear from the raw data.

2. Executive Summary

The ``Executive Summary'' table below summarizes the performance results. The ``Performance Details'' section contains an analysis of these measurements.

                     Total      Peak     Response Time   Hit Ratio            $1,000 Minutes Till   Cache
                     Price      Tput             (sec)         (%)           can buy        First     Age
Product              (USD) (req/sec)   Hit   All  Miss   Doc  Byte  req/sec  hit/sec   Miss   Hit  (hour)
Cisco CE-590        44,470       951  0.16  1.50  3.12  54.8  57.0       21       12    6.9   6.9    11.3
Cisco CE-7300      130,995      2304  0.17  1.48  3.04  54.5  56.7       18       10    4.2   4.2     8.6
Compaq-C2500        50,814      2400  0.27  1.45  2.77  52.7  54.9       47       25    9.2   9.2     6.5
Compaq-C900          7,877       475  0.07  1.72  2.77  38.7  39.9       60       23    7.8   8.0     1.7
Dell-1300            5,439       366  0.09  1.81  2.78  36.0  37.7       67       24    2.5   2.5     1.3
IBM-3500M10          5,234       376  0.08  1.78  2.78  37.1  39.1       72       27    3.0   3.0     1.3
IBM-5000            11,094       600  0.27  1.75  2.77  40.8  42.2       54       22    5.1   5.1     2.4
IBM-5600            24,998      1306  0.12  1.65  2.77  42.1  42.7       52       22    4.8   4.8     2.3
iMimic-5000         14,295      1453  0.07  1.35  2.80  53.2  47.2      102       54    2.9   3.3     7.2
InfoLibria-30i      23,805       600  1.63  3.08  3.77  32.3  31.9       25        8    0.1   3.2     3.2
InfoLibria-X1x3     50,480      1600  0.33  2.06  2.97  34.4  33.6       32       11    0.7   4.1     3.6
Lucent-100          20,390       675  0.54  1.71  2.93  51.1  53.4       33       17    1.8   1.8     4.3
Lucent-150a         27,390       771  0.40  1.58  2.98  54.2  56.7       28       15    1.8   1.8     8.4
Microbits            2,250       120  0.26  1.69  2.82  44.4  46.2       53       24    1.6   1.6     2.4
NetApp               6,122       151  0.04  1.44  2.87  50.3  53.1       25       12    1.2   2.0     4.7
Pionex-1000          4,738       111  0.39  1.51  2.78  52.9  54.8       23       12    2.2   2.3     7.6
Quantex-4000        20,828      1402  0.12  1.70  2.77  40.3  41.0       67       27    4.2   4.2     2.3
Squid-2.4            4,319       160  0.24  1.41  2.84  55.0  57.8       37       20    2.4   2.5    15.6
Stratacache F100     6,394       400  0.04  1.89  2.78  32.3  34.1       63       20    4.0   4.0     1.2
Swell-1000           2,339        77  0.17  1.44  2.99  55.0  52.7       33       18    n/a   n/a    19.1

Two cache-off entries, Dell-6350 and Stratacache M-500, did not produce successful results and are omitted from the baseline presentation. See the ``Polyteam Comments'' section for more information.

Links in column headers point to bar charts comparing the corresponding measurements. Product links point to auto-generated individual reports for the entries.

Short labels (in the ``Product'' column) are derived from full product names. The latter are available in the ``Product Configurations'' section at the end of the report.

The ``Total Price'' column is the sum of the list prices of all caching hardware, plus the cost of the networking gear (switches, routers) that the vendor used. Networking gear costs are included in the price/performance analysis to account for expensive equipment that might be used to achieve high performance numbers and/or to aggregate individual caches into clusters. If the reader already has the required network components in place, the price/performance ratios should be adjusted accordingly.

The ``Peak Tput'' (peak throughput) column shows the highest tested request rate for each product. The PolyMix-2 workload used for this cache-off includes phases with different request rates. Here we report the response rate during the 4-hour top2 phase, when the load is at its peak.

Mean response time is reported separately for cache hits and misses to emphasize performance differences on the two most important request paths. The ``All'' column depicts response time for all request classes.

The ``$1,000 can buy'' columns show performance/price ratios. Two performance measurements are used: request rate (the ``req/sec'' column) and hit rate, or number of hits per second (the ``hit/sec'' column). Both measurements are normalized by the total product price (in thousands of dollars). In other words, the data answers the question ``how much throughput or hit rate can one thousand dollars buy?''
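
The normalization can be reproduced from the table with a few lines of Python. This is a minimal sketch, assuming that the hit rate is the peak request rate multiplied by the document hit ratio; the function name per_thousand_dollars is ours, not part of any cache-off tool. Checked against the Cisco CE-590 row, it reproduces the published 21 req/sec and 12 hit/sec.

```python
def per_thousand_dollars(peak_tput, doc_hit_ratio_pct, total_price_usd):
    """Return (req/sec, hit/sec) that $1,000 of total price buys.

    Assumes hit rate = peak request rate * document hit ratio.
    """
    thousands = total_price_usd / 1000.0
    req_per_k = peak_tput / thousands
    hits_per_k = peak_tput * doc_hit_ratio_pct / 100.0 / thousands
    return req_per_k, hits_per_k


# Cisco CE-590 row: $44,470, 951 req/sec, 54.8% document hit ratio.
req, hits = per_thousand_dollars(951, 54.8, 44470)
print(round(req), round(hits))  # 21 12
```

The same call with the iMimic-5000 or Swell-1000 figures also matches their published columns, which supports the assumption that the document (not byte) hit ratio is used.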

The ``Cache Age'' column estimates the cache capacity in terms of hours of peak fill (i.e., cachable miss) traffic. For example, a reading of 10 means that, at the peak request rate, the cache becomes full after 10 hours, and must begin replacing objects.
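
A rough cache-age estimate can be sketched as follows. The report does not spell out its exact formula, so this is an assumption: cache capacity divided by the peak fill rate, where the fill rate is the request rate times the fraction of responses that are cachable misses times the mean reply size. The defaults are nominal PolyMix-2 values (80% cachable and a 55% offered hit ratio, leaving about 25% cachable misses, and roughly 11 KB per reply); the published column may be based on measured rather than nominal values.

```python
def cache_age_hours(capacity_bytes, peak_req_rate,
                    cachable_miss_fraction=0.80 - 0.55,
                    mean_reply_bytes=11 * 1024):
    """Hours of peak fill traffic needed to fill the cache (rough estimate).

    Defaults are nominal PolyMix-2 values; measured fill rates would be
    more accurate for a real entry.
    """
    fill_bytes_per_sec = peak_req_rate * cachable_miss_fraction * mean_reply_bytes
    return capacity_bytes / (fill_bytes_per_sec * 3600.0)


# Hypothetical 100 GB (10**11 bytes) cache driven at 1,000 req/sec.
print(round(cache_age_hours(100 * 10**9, 1000), 1))  # 9.9
```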

All published performance tests finished with less than 0.1% of failed transactions. Note that the rules disqualify a run with more than 3.0% of errors.

3. The Rules

The majority of the rules were defined and agreed upon by Polyteam and the participants during the organizational meeting on August 17th in Boulder, Colorado. A number of points were raised and subsequently discussed and finalized on the cache-off and Polygraph mailing lists prior to the first day of benchmarking.

The core set of rules is available in the rules document. Here, we highlight some of the more interesting and controversial provisions.

Availability

Following examples from other types of benchmarking, we adopted a rule that the tested cache products must be available to the public within three months from the end of the cache-off. This is a shorter time frame than the six month period used for the first cache-off. The products described here should be available to consumers by the end of April, 2000. Furthermore, the products should be available for the prices given in this report. Of course, both of these rules are difficult to enforce. If you are aware of a tested cache that is not available within this time period for the specified price, we would like to hear from you.

TIME-WAIT

Under certain common conditions, TCP connections enter a so-called ``TIME-WAIT'' state after they are closed by the application. A connection in the TIME-WAIT state ties up kernel resources but is not visible to the application. For busy servers, the duration of the TIME-WAIT state can adversely affect performance. For example, a kernel that handles 1,000 requests per second with a 60-second TIME-WAIT value would have to maintain up to 60,000 connections in the TIME-WAIT state (the maximum is reached when every request uses its own connection).
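
The 60,000 figure follows from Little's law: the number of connections sitting in TIME-WAIT equals the rate at which connections are closed times how long each one stays there. The sketch below assumes one connection per request (no persistent connections) and also encodes the standard TCP relation TIME-WAIT = 2 x MSL that the MSL test relies on. The function names are illustrative only.

```python
def time_wait_from_msl(msl_sec):
    """TCP keeps a closed connection in TIME-WAIT for twice the MSL."""
    return 2 * msl_sec


def time_wait_backlog(closes_per_sec, time_wait_sec):
    """Little's law: average occupancy = arrival rate * time spent in state."""
    return closes_per_sec * time_wait_sec


# 1,000 req/sec, one connection per request, MSL of 30 seconds.
print(time_wait_backlog(1000, time_wait_from_msl(30)))  # 60000
```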

For the cache-off, everyone agreed to use a 60-second TIME-WAIT value on the caches. We developed a tool to check a product's actual TIME-WAIT setting, and this tool was used as part of the official test sequence.

A more thorough description of the TIME-WAIT state can be found in RFC-793.

Full Caches

As with the first cache-off, we require that caches be full before running a measurement test. Unlike the first cache-off, we do not impose a time limit on filling the cache. Instead, we require the cache to be ``filled'' twice over with a specific workload. Caches must be emptied (flushed) before the fill process begins.

Repeats

Participants are allowed to run as many tests as time allows. Furthermore, participants are allowed to change their system configurations between test sequences. Given two or more successful sequences, the participant can later choose which one to use for this report. Only the results of the test chosen by the participant are available; all other results are discarded. This procedure allows vendors to troubleshoot problems, try different configurations, and search for the peak performance point.

Costs

All equipment that comprises an entry, including the networking equipment, is included in the participant's costs. This gives a more accurate picture of both the entry's performance and its total deployment cost, because some networking equipment can increase the throughput of an entry by allowing traffic to bypass the cache, by balancing the load, or simply by increasing the available bandwidth.

No Bail-out

We have removed the bail-out provision that was present in the first cache-off. We found enthusiastic support for removing the bail-out option at our organizational meeting. Once a vendor begins their first official test sequence, they are no longer able to bail out.

Referencing Cache-off Results

Any work that is derived from or uses any of the cache-off results or this report must include the following reference to our official site:

A. Rousskov, D. Wessels, G. Chisholm, The Second IRCache Web Cache Cache-off. Raw data and independent analysis at <http://cacheoff.ircache.net/>.

4. Web Polygraph

Web Polygraph is a high-performance proxy benchmark. Polygraph is capable of generating a whole spectrum of Web proxy workloads that either approximate real-world traffic patterns, or are designed to stress a particular proxy component. Designed with the cache-off needs in mind, Polygraph is able to generate complex, high request rate workloads with negligible overhead. Web Polygraph has been successfully used to debug, tune, and benchmark many caching products.

The Polygraph distribution includes two programs: polyclt and polysrv. Poly-client (-server) emits a stream of HTTP requests (responses) with given properties. The requested resources are called objects. URLs generated by Poly-client are built around object identifiers or oids. In short, oids determine many properties of the corresponding response, including response content length and cachability. These properties are usually preserved for a given object. In other words, a response for an object with a given oid will have the same content length and cachability status regardless of the number of earlier requests for that object.
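
The idea that every response property is a function of the oid alone can be sketched by seeding a pseudo-random generator with the oid. This is a hypothetical illustration, not Polygraph's actual generator: object_properties is our own name, the content-type mix and cachability probabilities come from the PolyMix-2 content-type table in Section 4.1, and we assume that logn(m,s) denotes a lognormal distribution with mean m and standard deviation s. Clamping sizes to the 300-byte to 5 MB range is also our assumption.

```python
import math
import random

# (type, share of requests, size model, mean KB, std dev KB, P(cachable))
CONTENT_TYPES = [
    ("Image", 0.650, "exp", 4.5, None, 0.80),
    ("HTML", 0.150, "exp", 8.5, None, 0.90),
    ("Download", 0.005, "logn", 300.0, 300.0, 0.95),
    ("Other", 0.195, "logn", 25.0, 10.0, 0.72),
]
MIN_SIZE, MAX_SIZE = 300, 5 * 1024 * 1024


def object_properties(oid):
    """Derive (content type, reply size in bytes, cachable) from the oid only."""
    rng = random.Random(oid)  # same oid -> same random sequence
    ctype, _, model, mean, std, p_cachable = rng.choices(
        CONTENT_TYPES, weights=[t[1] for t in CONTENT_TYPES])[0]
    if model == "exp":
        size_kb = rng.expovariate(1.0 / mean)
    else:
        # Convert the desired mean/std dev into the underlying normal's mu/sigma.
        sigma2 = math.log(1.0 + (std / mean) ** 2)
        size_kb = rng.lognormvariate(math.log(mean) - sigma2 / 2, math.sqrt(sigma2))
    size = min(max(int(size_kb * 1024), MIN_SIZE), MAX_SIZE)
    cachable = rng.random() < p_cachable
    return ctype, size, cachable


# Repeated requests for oid 12345 always see the same properties.
assert object_properties(12345) == object_properties(12345)
```

Because nothing but the oid feeds the generator, the number of earlier requests for an object cannot change its size or cachability, which is the property the text describes.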

As it runs, Polygraph collects and logs a lot of statistics, including response rate, response time and size histograms, achieved hit ratio, and number of transaction errors. Some measurements are aggregated at five-second intervals, while others are aggregated over an entire simulation phase.

For the cache-off tests, we used version 2.2.9 of Web Polygraph.

4.1 The Cache-off Workload: PolyMix-2

One difficult part of this benchmark (and indeed most) was to develop the proper workload. Real-world Web traffic is incredibly complicated, both to understand and to simulate. Many workload attributes are well understood by themselves, but not when combined with each other. For example, we have a clear idea of real-world object size distributions. But how does object size combine with popularity and content-type? Are popular HTML objects larger, smaller, or the same as unpopular ones?

Web Polygraph models several key aspects of real Web traffic. We realize that our model is not complete, but rather than wasting months or years up front on developing a perfect model, we feel it is better to model the important parameters now and add new attributes and features in future tests.

Our model addresses the following characteristics of Web traffic:

Noticeably absent from the Polygraph traffic model are:

Varying Offered Load
For previous tests, including the first cache-off, the offered load was essentially constant for the entire duration of the test. Now we have a model that varies the load over time, simulating two daily load peaks. In most installations, load peaks occur during working hours and are followed by periods of low or no activity. The picture below shows the load pattern of the PolyMix-2 workload.
PolyMix-2 load pattern

The entire test is about 14 hours long, and is divided into the following seven phases:

Phase   Test Hours   Activity
inc1    00 - 01      The load is increased during the first hour to reach its peak level.
top1    01 - 05      The period of peak ``daily'' load.
dec1    05 - 06      The load steadily goes down, reaching a period of relatively low load.
idle    06 - 08      The ``idle'' period with load level around 10% of the peak request rate.
inc2    08 - 09      The load is increased to reach its peak level again.
top2    09 - 13      The second period of peak ``daily'' load.
dec2    13 - 14      The load steadily goes down to zero.

Most measurements discussed in this report are taken from the top2 phase when the proxy is more likely to be in a steady state.
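
The phase table can be turned into a simple offered-load function. This is a sketch under assumptions of ours: the ramps are linear (the text only says the load changes ``steadily''), inc1 starts from zero, and the idle level is exactly 10% of peak. PHASES and offered_load are hypothetical names, not Polygraph configuration syntax.

```python
# (phase, start hour, end hour, load fraction at start, load fraction at end)
PHASES = [
    ("inc1", 0, 1, 0.0, 1.0),
    ("top1", 1, 5, 1.0, 1.0),
    ("dec1", 5, 6, 1.0, 0.1),
    ("idle", 6, 8, 0.1, 0.1),
    ("inc2", 8, 9, 0.1, 1.0),
    ("top2", 9, 13, 1.0, 1.0),
    ("dec2", 13, 14, 1.0, 0.0),
]


def offered_load(hour, peak_rate):
    """Return (phase name, offered request rate) at a given test hour."""
    for name, start, end, frac_start, frac_end in PHASES:
        if start <= hour < end:
            progress = (hour - start) / (end - start)
            return name, peak_rate * (frac_start + (frac_end - frac_start) * progress)
    return None, 0.0  # before the test starts or after it ends


print(offered_load(10.0, 1000))  # ('top2', 1000.0)
```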

Reply Sizes

Object reply size distributions are different for different content types (see the table below). Reply sizes range from 300 bytes to 5 MB with an overall mean of about 11 KB. The reply size depends only on the oid. Thus, the same object always has the same reply size, regardless of the number of requests for that object.

Cachable and Uncachable Replies

Some Polygraph responses are marked uncachable. The probability of a response being uncachable varies with content type (see the table below). Overall, the workload should result in about 80% of all responses being cachable. Real-world cachability varies from location to location; we have chosen 80% as a typical value that is close to many common environments.

A cachable response includes the following HTTP header field:

	Cache-Control: public

An uncachable response includes the following HTTP header fields:

	Cache-Control: private,no-cache
	Pragma: no-cache

Object cachability depends only on the oid. The same oid is always cachable, or always uncachable.

Life-cycle model

Web Polygraph is capable of simulating realistic object expiration and modification conditions using appropriate HTTP headers. However, PolyMix-2 uses a simple life-cycle model: All objects were last modified about one year ago and are set to expire one year into the future. These life-cycle settings make refreshing of cache content unnecessary. We used these simple values because the vendors could not agree on a more realistic model for this cache-off.

[Side note: Even with these very conservative life-cycle settings, one cache-off entry insisted on verifying object freshness on every request, significantly increasing hit response time. It turned out that the proxy's clock was set to the year 2004. The problem was detected and fixed before the start of the official test.]
Content Types

PolyMix-2 defines a mixture of content types. Each content type has the following properties:

The actual parameters for the first three properties are given in the table below.

Type      Percentage   Reply Size          Cachability
Image     65.0%        exp(4.5KB)          80%
HTML      15.0%        exp(8.5KB)          90%
Download   0.5%        logn(300KB,300KB)   95%
Other     19.5%        logn(25KB,10KB)     72%

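
As a sanity check, the table's parameters can be combined into workload-wide averages. The sketch assumes that the first argument of each distribution is its mean. Under that assumption the mix yields a mean reply size of roughly 10.6 KB and an overall cachability of 80.0%, consistent with the ``about 11 KB'' and ``about 80%'' figures quoted above.

```python
# type: (share of responses, mean reply size in KB, probability of being cachable)
MIX = {
    "Image": (0.650, 4.5, 0.80),
    "HTML": (0.150, 8.5, 0.90),
    "Download": (0.005, 300.0, 0.95),
    "Other": (0.195, 25.0, 0.72),
}

mean_reply_kb = sum(share * size for share, size, _ in MIX.values())
cachable_share = sum(share * p for share, _, p in MIX.values())
print(f"mean reply size {mean_reply_kb:.1f} KB, cachable {cachable_share:.1%}")
# mean reply size 10.6 KB, cachable 80.0%
```
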
Latency and Packet Loss
In the first cache-off, we used server-side ``think times'' to simulate delays in the network. Think time is a per-reply delay that is experienced just before each response is sent. For this cache-off, we improved our model by introducing realistic low-level per-packet network delays and packet loss.

To support per-packet delays and loss, we used FreeBSD's DummyNet feature. DummyNet allows us to specify packet delays, packet loss, and bandwidth limits. Unfortunately, we were unable to fully achieve our goal. DummyNet was not designed with our environment in mind and lacked reliable, high-performance support for hundreds of network ``pipes'' (bound to Polygraph robots and servers in our case). Thus, we were limited to using only the loss and delay features of DummyNet, with two pipes per machine.

In the organizational meeting, we discussed having two types of client-side environments. The ISP environment would have packet delays and bandwidth limits. The Corporate environment would have no delays or limits on clients. In the end, all participants chose the Corporate environment. Hence DummyNet pipes were not configured on Polygraph clients.

On the server side, we configured DummyNet with 40-millisecond delays (per packet, incoming and outgoing) and with a 0.05% probability of dropping a packet. We kept the normally distributed server-side think time but changed its parameters to a 2.5-second mean and a 1-second standard deviation. The server think time does not depend on the oid; instead, it is randomly chosen for every request.
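
The server-side randomness described above can be sketched as two independent draws. This is an illustration, not Polygraph or DummyNet code: the function names are hypothetical, and clipping negative think times at zero is our assumption (the text does not say how the negative tail of the normal distribution is handled). Separately, the 40 ms per-packet delay in each direction adds about 80 ms per round trip on top of the think time.

```python
import random

THINK_MEAN_SEC, THINK_STD_SEC = 2.5, 1.0
PACKET_LOSS_PROB = 0.0005  # 0.05%


def server_think_time(rng):
    """Per-request think time, drawn independently of the oid; clipped at zero."""
    return max(0.0, rng.gauss(THINK_MEAN_SEC, THINK_STD_SEC))


def packet_dropped(rng):
    """True if the simulated lossy link drops this packet."""
    return rng.random() < PACKET_LOSS_PROB


rng = random.Random(2000)
delays = [server_think_time(rng) for _ in range(5)]
```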

Cache Hits and Misses

As in previous tests, we designed this workload to have a 55% offered hit ratio. In the workload definition, this is actually specified through the recurrence ratio (i.e., the probability of revisiting a Web object). The recurrence ratio must account for uncachable responses. Given that only 80% of responses are cachable, a recurrence ratio of 68.75% gives the desired hit ratio of 55%.
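
The arithmetic behind the 68.75% figure is short: a repeat request can only be a hit if the object is cachable, so in this simplified model the offered hit ratio is the recurrence ratio times the cachable fraction. Solving for the recurrence ratio (the function name is ours):

```python
def recurrence_ratio(offered_hit_ratio, cachable_fraction):
    """Probability of revisiting an object needed to reach a target hit ratio.

    hit_ratio = recurrence_ratio * cachable_fraction, because repeat
    requests for uncachable objects are always misses.
    """
    return offered_hit_ratio / cachable_fraction


print(round(recurrence_ratio(0.55, 0.80), 4))  # 0.6875
```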

Polygraph enforces the desired hit ratio by requesting objects that have been requested before, and should have been cached. There is no guarantee, however, that the object is in the cache. Thus, our parameter (55%) is an upper limit. The hit ratio achieved by a proxy may be lower if a proxy does not cache some cachable objects, or purges previously cached objects before the latter are revisited.

Object Popularity

Object popularity was one of the more controversial aspects of this workload. Some vendors feel that the uniform model is unrealistic and does not provide an appropriate percentage of memory hits. They argue that a Zipf distribution is a better choice. Other vendors feel that Zipf gives too many memory hits and does not sufficiently stress disk subsystems.

After some debate, and with no hard data to support Zipf, we decided to continue using the uniform popularity distribution model. After that decision was made, we received more evidence that the uniform model, combined with other workload characteristics, may be as good an approximation of real object popularity as the Zipf model. We realize that more research is needed in this area to develop the best model.
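
To see why the choice matters for memory hits, compare how much of the request stream goes to the most popular objects under each model. In the sketch below, the object count (10,000), the ``top 100'' cut-off, and the Zipf exponent of 1 are arbitrary illustrative choices, not PolyMix-2 parameters.

```python
def top_share_uniform(n_objects, k):
    """Fraction of requests going to the k most popular objects (uniform)."""
    return k / n_objects


def top_share_zipf(n_objects, k, alpha=1.0):
    """Same fraction when the popularity of rank r is proportional to 1/r**alpha."""
    weights = [1.0 / rank ** alpha for rank in range(1, n_objects + 1)]
    return sum(weights[:k]) / sum(weights)


print(top_share_uniform(10_000, 100))         # 0.01
print(round(top_share_zipf(10_000, 100), 2))  # 0.53
```

Under these assumptions, 1% of the objects attract about half of all requests with Zipf but only 1% with the uniform model, so a small memory-resident hot set serves far more hits under Zipf. That is the crux of the disagreement over disk stress described above.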

Simulated Robots and Servers

A single Polygraph client machine supports many simulated robots. A robot can emulate various types of Web clients, from a human surfer to a busy peer cache. All robots in PolyMix-2 are configured identically, except that each has its own IP address, simulating a Web surfer. We limit the number of robots (and hence IP aliases) to 1000 per client machine.

A PolyMix-2 robot requests objects using a Poisson-like stream, except for embedded objects (images on HTML pages), which are requested in a way that simulates cache-less browser behavior. A limit on the number of simultaneously open connections is also supported and may affect the request stream.
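
A Poisson request stream is generated by drawing independent, exponentially distributed gaps between requests. The sketch below shows only that basic mechanism; it ignores the embedded-object bursts and the per-robot connection limit mentioned above, and the per-robot rate in the example (0.4 req/sec) is a made-up value, not a PolyMix-2 parameter.

```python
import random


def poisson_request_times(rate_per_sec, duration_sec, rng):
    """Request timestamps for one robot: exponential gaps with mean 1/rate."""
    times = []
    t = rng.expovariate(rate_per_sec)
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times


rng = random.Random(42)
times = poisson_request_times(0.4, 3600.0, rng)  # about 1,440 requests expected
```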

PolyMix-2 servers are configured identically, except that each has its own IP address.

Crosstalk

All PolyMix-2 robots talk to all simulated servers, preventing vendors from splitting the network into isolated segments to improve scalability. Such a split was possible during the first cache-off. Many vendors felt this is a very important addition to PolyMix-2. This feature is difficult to support because it requires a URL space that is shared among all robots.

Working Set of URLs

A Working Set is defined as the collection of URLs that may be requested at any given time. After the first 4 hours of the PolyMix-2 workload, Web Polygraph maintains a constant Working Set Size (WSS), measured in terms of the number of objects in the set. Although the size is frozen, the set of URLs is not. That is, after WSS is frozen, adding an object (a first-time miss) to the Working Set results in the eviction of some other object from the Working Set. The eviction policy is quite complex and simulates the interests of individual cache users (robots), rather than a global LRU queue.

Maintaining a constant WSS is important to preserve steady state conditions during an experiment. See Section 9.2 for some side effects of this workload feature.
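
The effect of freezing the Working Set Size can be illustrated with a fixed-capacity container in which every new object displaces an existing one. This is a deliberately simplified sketch: it evicts a random member, whereas Polygraph uses a more complex per-robot policy, and the WorkingSet class is hypothetical.

```python
import random


class WorkingSet:
    """URLs (oids) that may be requested; the size is frozen after warm-up."""

    def __init__(self, rng):
        self.rng = rng
        self.oids = []
        self.frozen_size = None

    def freeze(self):
        """Stop growing; later additions replace existing members."""
        self.frozen_size = len(self.oids)

    def add(self, oid):
        """Add a first-time-miss object, evicting another once frozen."""
        if self.frozen_size and len(self.oids) >= self.frozen_size:
            victim = self.rng.randrange(len(self.oids))  # random, not per-robot
            self.oids[victim] = oid
        else:
            self.oids.append(oid)

    def pick_repeat(self):
        """Choose an object to revisit (uniform popularity)."""
        return self.rng.choice(self.oids)
```

After freeze(), the number of candidate URLs stays constant while its membership keeps changing, which is what keeps the offered workload in a steady state.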

Persistent Connections

Polygraph supports persistent connections on both the client and server sides. PolyMix-2 robots close an ``active'' persistent connection right after receiving the N-th reply, where N is drawn from a Zipf(64) distribution. The robots close an ``idle'' persistent connection if the per-robot connection limit has been reached and connections to other servers must be opened. The latter mimics browser behavior.

PolyMix-2 servers use a Zipf(16) distribution to close active connections. The servers also time out idle persistent connections after 15 seconds of inactivity, just as many real servers do.
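
The active and idle closing rules can be sketched as follows. We read ``Zipf(64)'' as a Zipf-like distribution over the integers 1..64 with exponent 1; that reading, the exponent, and the class and function names are our assumptions, not Polygraph's documented definition. The same class models the client side (64, no idle timeout) and the server side (16, 15-second idle timeout); the robots' connection-limit rule is not modeled.

```python
import random


def bounded_zipf(rng, n_max, alpha=1.0):
    """Draw N from 1..n_max with P(N = k) proportional to 1 / k**alpha."""
    ranks = range(1, n_max + 1)
    weights = [1.0 / k ** alpha for k in ranks]
    return rng.choices(ranks, weights=weights)[0]


class PersistentConnection:
    def __init__(self, rng, n_max, idle_timeout_sec=None):
        self.max_replies = bounded_zipf(rng, n_max)  # close after this many
        self.replies = 0
        self.idle_timeout_sec = idle_timeout_sec

    def reply_done(self):
        """Count a reply; return True if the connection should now be closed."""
        self.replies += 1
        return self.replies >= self.max_replies

    def idle_expired(self, idle_sec):
        """True if an idle connection has reached its timeout (servers only)."""
        return self.idle_timeout_sec is not None and idle_sec >= self.idle_timeout_sec


rng = random.Random(1)
client_conn = PersistentConnection(rng, 64)
server_conn = PersistentConnection(rng, 16, idle_timeout_sec=15)
```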

Other details

A detailed treatment of many PolyMix-2 features is available on the Polygraph Web site, along with the copies of workload configuration files.

5. Benchmarking Environment

5.1 Location

The cache-off was held in a Compaq Computer Corporation facility, near Houston, Texas. Polyteam greatly appreciates Compaq's willingness to host this event. An excellent 50,000-square-foot facility met or exceeded all our needs for space, power, food, security, sleep, and remote control cars (the latter were, however, prohibited during the daytime as a ``health and safety'' hazard). Superb logistic coverage provided by Henry Guillen at Compaq helped us to run the cache-off smoothly. Under Henry's guidance, we also thoroughly enjoyed the food and drinks that Texas had to offer.

5.2 Schedule

The duration of the cache-off was two weeks. Half of the entries were tested in the first week, and the other half in the second week. Each vendor had five full days to set up and execute the required tests. Vendors had access to the cache-off facility from 9 AM until 8 PM. Tests were often queued to run overnight.

5.3 Polygraph Machines

We rented 120 PCs for use as Polygraph clients and servers. These machines were Compaq Deskpro EN systems, each with a 450 MHz Pentium II CPU, 256 MB of RAM, an Intel EtherExpress PRO/100+ Fast Ethernet card, and an IDE disk.

We used FreeBSD-3.3-RELEASE as the operating system for the Polygraph clients and servers. Several performance patches to the OS were required to support the desired workloads. The tuning procedure is described elsewhere.

The number of Polygraph machines varies for each participant's cache. Peak request rates vary a lot among caching products. Thus, each participant informed us how many Polygraph client-server pairs were needed to drive their cache at its maximum capacity.

During the cache-off, we never used more than 400 requests per second per machine for official tests.
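
Given that cap, a lower bound on the number of Polygraph client-server pairs a bench needs follows directly. The sketch assumes the 400 req/sec limit applies to each client machine and that clients and servers are used in pairs; the actual number of pairs was chosen by each participant and may have been higher.

```python
import math

MAX_REQ_RATE_PER_CLIENT = 400  # req/sec, cap used for official tests


def min_client_server_pairs(peak_req_rate):
    """Smallest number of client-server pairs that keeps each client <= cap."""
    return math.ceil(peak_req_rate / MAX_REQ_RATE_PER_CLIENT)


print(min_client_server_pairs(2400))  # 6
print(min_client_server_pairs(951))   # 3
```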

Each bench also had a monitoring PC connected to the harness network. That PC was used to start Polygraph runs, display run-time statistics, collect logs after the completion of a run, and generate Polygraph reports.

5.4 Time Synchronization

We ran an xntpd time server on the Polygraph machines and the monitoring PCs. The monitoring PCs were periodically synchronized with a designated server maintained by Polyteam. We ran xntpd on all machines rather than just synchronizing clients and servers before each test, as was done during the first cache-off. While running xntpd could introduce a small CPU overhead, we were concerned that, without periodic synchronization, local clocks might drift apart during a 14-hour test (the first cache-off used 1-hour tests).

5.5 Network Configurations

Each test bench consists of Polygraph machines, the monitoring PC, the participant's proxy cache(s), and a network to tie them together. The networking equipment falls under the participant's domain. That is, each participant is responsible for providing the networking equipment needed to connect the Polygraph machines to the caches. Furthermore, the networking equipment that the participant brings contributes to their costs in the price/performance results.

Each Polygraph machine requires a Fast Ethernet port, so the participant must have enough ports to connect all of the Polygraph machines in the participant's cluster. The monitoring PC must be able to talk to all clients and servers at the IP level.

We ran bidirectional netperf tests between each client-server pair to measure raw TCP throughput. We also selectively executed Polygraph ``no-proxy'' runs to ensure that clusters can generate enough throughput to sufficiently drive the cache under test.

With a few exceptions, netperf results showed 87-89 Mbps throughput. InfoLibria-30i showed a lower raw TCP throughput of 33-35 Mbps due to its network topology: all TCP traffic had to go through a single 100 Mbps link twice. The Microbits and Squid-2.4 benches reported 34 Mbps because all traffic went through a hub rather than a switch.

All ``no-proxy'' tests were successful, delivering desired throughput and negligible response time overheads.

6. Test Sequence

This section describes the official testing sequence. The complete sequence was executed at least once against all cache-off entries.

6.1 MSL Test

As we described earlier, all entries must have a Maximum Segment Lifetime (MSL) of 30 seconds, producing a TIME-WAIT state of 60 seconds. Any product which fails this test is disqualified.

To determine the MSL on each product, we probe its TCP stack and monitor connection requests. If the system accepts a new connection with the same sequence number in under 60 seconds, it fails the test.

All cache-off entries reported a TIME-WAIT state of 60 seconds.

6.2 Filling the Cache

The cache-off rules require that caches be full for the PolyMix-2 tests. Measuring performance for empty or partially-full caches during the cache-off is wrong because those are not steady-state conditions. Moreover, in the past, filling the cache just once proved to be insufficient to reach steady state. In our experience, after the first fill, many products have ``optimal'' or unusually even placement of objects on disks. The second fill is necessary for the replacement policy to kick in and introduce some natural randomness in object placement on disks.

While the objects from the fill test are not reused during the PolyMix-2 test, the placement of the former affects the future replacement operations. An optimal placement of fill objects may improve proxy performance in the beginning of the PolyMix-2 test, delaying the steady state conditions.

Caches are filled using the PolyFill-2 workload. PolyFill-2 uses a ``best effort'' request submission mechanism. That is, each robot submits the next request right after the previous reply is received. This mode does not require specifying a request rate for the fill. Also, it is more robust in case the proxy experiences performance problems during the fill (the actual request rate may decrease if a proxy is slowing down). However, some entries tried to take too much load and were either failing or changing their run-time heuristics to handle this kind of heavy traffic. Since vendors are prohibited from rebooting the proxy or changing proxy settings after the fill, some vendors believed that their proxies were in sub-optimal state at the beginning of the PolyMix-2 test.

The ``Full cache'' requirement (along with a limited time for the cache-off) has various implications for restarting failed runs, and dealing with crashes. A participant with a cache that has a slower filling rate (or with a larger cache) requires more time to refill the cache than a participant with a faster (or smaller) cache. Hence, the former participant has less time to rerun or finish a given number of experiments. This inequality might penalize products that are, performance-wise, comparable with or superior to their competitors.

While ``Filling the Cache'' is a part of the test sequence and is required before beginning ``PolyMix-2'', no results from this part of the sequence are reported. Again, fill objects are not reused in the subsequent tests.

6.3 PolyMix-2

The ``PolyMix-2'' test is the main performance test which generates the vast majority of the reported numbers. This test is discussed in the ``Cache-off Workload'' Section.

6.4 Downtime Test

The ``Downtime Test'' is performed only after a successful PolyMix-2 run. Only one client-server pair is used. During the first 10 minutes of the test, Polygraph creates a 3 req/sec load through the proxy. The power to all participant devices, including networking equipment, is then manually turned off. After about 5 seconds of ``downtime'', the power is turned back on, and the measurement phase begins. We measure the times until the first miss and the first hit. Polygraph continues to emit 3 requests per second during the entire test. The precision of this test is around 5 seconds.
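
The two downtime measurements can be computed from a transaction log, as sketched below. The record format, a list of (timestamp in seconds, outcome) pairs with outcomes ``hit'', ``miss'', or ``error'', is hypothetical and is not Polygraph's log format. Times are reported in minutes after power is restored, matching the ``Minutes Till First'' columns of the summary table.

```python
def minutes_till_first(records, power_on_sec):
    """Minutes from power-on until the first successful miss and first hit.

    records: iterable of (timestamp_sec, outcome) with outcome in
    {"hit", "miss", "error"}; failed transactions are ignored.
    """
    first = {"miss": None, "hit": None}
    for ts, outcome in sorted(records):
        if ts < power_on_sec or outcome not in first or first[outcome] is not None:
            continue
        first[outcome] = (ts - power_on_sec) / 60.0
    return first


log = [(0, "hit"), (600, "error"), (700, "error"), (900, "miss"), (1200, "hit")]
print(minutes_till_first(log, power_on_sec=600))  # {'miss': 5.0, 'hit': 10.0}
```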

It is important to note that the cache(s) and networking gear were plugged into power strips. We turn off the power strips and not the equipment boxes to simulate realistic conditions of an unexpected software, hardware, or power failure. Vendors are not allowed to assist the reboot process. UPS devices of any kind are not allowed during this test.

We realize that the downtime test and execution rules are simple, if not primitive. However, even this test provides very useful data to cache administrators. Depending on the installation environment and reported cache performance, one can decide whether to invest in UPS systems and/or redundant configurations. We will work on improving the workload for this test.

7. Performance Details

This section gives a detailed analysis of the major performance measurements. The PolyMix-2 workload has several phases. For the baseline presentation, we have selected the top2 phase, the second 4-hour phase with peak request rate. The first peak phase, top1, often yields unstable results (see, for example, the run-time traces of the Stratacache F100 and Dell-1300 entries). A cache is usually in steady state during the top2 phase, which explains our choice. Note that a few entries did not show steady performance even during the second peak phase (e.g., Compaq-C2500 and Quantex-4000). Such changes in performance can often be attributed to insufficient cache size: the cache becomes full too soon, starts deleting useful objects, and loses hit ratio.

The bar charts below are based on data averaged across the top2 phase. Averages are meaningful in situations where performance does not change with time, or when changes are smooth and predictable. Chart descriptions below point out exceptions where average values are poor representations of actual performance.

The variations in performance among different phases and during a single phase often give us an insight into proxy performance. We discuss these variations as well. Auto-generated Polygraph reports for individual entries (linked from the product label in the executive summary table) show the variations in detail.

As with any benchmark, Polygraph introduces its own overheads and measurement errors. We believe that the margin of error for most results discussed here is within 10%. In most cases, the reader should pay attention to patterns and relative differences in product performance rather than absolute figures.

The entries are charted in ascending order of request rate.

7.1 Normalized Throughput

Presenting throughput results is a tedious task. Due to tremendous differences in request rates, a simple graph with raw request rates from the ``Executive Summary'' table is not very informative. Moreover, comparing throughput of a $50K three-head cluster with a small $5K PC is usually not interesting.

Instead of charting raw data, we normalize the throughput results by the price of a product. Price is a universal, albeit imperfect, measure of product complexity and capability. The normalized graph not only provides a fair comparison but also answers an important question: ``How much throughput can one thousand dollars buy?''
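
The normalization is a single division. The sketch below expresses throughput as requests per second per $1,000 of list price. The function name is illustrative, and the request rates are hypothetical, chosen only to contrast the $50K cluster and the $5K PC mentioned above.

```python
def normalized_throughput(request_rate: float, price_usd: float) -> float:
    """Requests/sec bought by each $1,000 of product price."""
    if price_usd <= 0:
        raise ValueError("price must be positive")
    return request_rate / (price_usd / 1000.0)


# Hypothetical request rates for a $50K cluster and a $5K PC.
cluster = normalized_throughput(2000.0, 50000.0)  # 40.0 req/s per $1K
small_pc = normalized_throughput(300.0, 5000.0)   # 60.0 req/s per $1K
print(cluster, small_pc)
```

In this made-up example, the cluster handles far more traffic, but the smaller box returns more throughput per dollar.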

Normalized Throughput

In terms of throughput alone, the iMimic entry shows the best return on a dollar. Cisco finishes last. There appears to be no strong correlation between performance/price ratio and absolute throughput (or price): Products showing good return on a dollar can be found on both ends of the throughput scale. Products from a single vendor usually yield similar performance/price ratios, implying a consistent, performance-dependent pricing policy.


We have argued that raw throughput figures should be normalized for a meaningful comparison. Similar reasons make analyzing the (hit rate / price) ratio worthwhile, especially when high hit ratio is an important factor in a given environment. This interesting ratio may improve the relative position of entries with large caches or, to be precise, with better hit ratios.

Note that normalizing some of the performance measurements by product price is not needed and does not make much sense. For example, hit ratios and response times should approach some ``perfect'' level regardless of the request rate supported by the product. It is the closeness to that ``ideal'' that characterizes the quality of the product in this case, not the absolute value of the measurement. On the other hand, various throughput results, of course, do not have an ``ideal'' level and absolute throughput measurements are meaningful.

7.2 Hit Ratio

Hit ratio is a standard measurement of a cache's performance. The PolyMix-2 workload offers a hit ratio of 55% -- a cache cannot achieve a higher hit ratio. However, due to various overload conditions, insufficient disk space, deficiencies of object replacement policy, and other reasons, the actual or measured cache hit ratio may be smaller than the offered 55%.

Document Hit Ratio

The ``Document Hit Ratio'' chart shows how well each cache maintains its hit ratio under the highest load. The second cache-off results differ considerably from previous tests. Squid and Swell (a Squid-based product) lead the pack with perfect document hit ratios, matching the offered 55%. Most entries failed to sustain even a 50% hit ratio. Stratacache F100 and InfoLibria-30i finish last with a 32% hit ratio. Note that the cache-off rules disqualify runs completed with a hit ratio below 25%.

The Byte Hit Ratio (BHR) chart is less interesting because the PolyMix-2 workload does not accurately model the relationship between object size and popularity. Most entries have a BHR slightly higher than their DHR. The iMimic and Swell entries apparently did not cache very large objects and showed a noticeable loss in BHR compared to expected values. We know that the Swell entry was explicitly configured not to cache objects larger than 384KB.
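
The difference between the two ratios is easy to see on a toy trace. The sketch computes DHR as hits per request and BHR as hit bytes per total bytes. The `Transaction` type and the sizes are hypothetical; they mimic a cache that does not store a large object, similar to the size limit described for the Swell entry.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transaction:
    size_bytes: int
    hit: bool


def hit_ratios(txns: List[Transaction]) -> Tuple[float, float]:
    """Return (document hit ratio, byte hit ratio) as fractions."""
    if not txns:
        raise ValueError("need at least one transaction")
    total_bytes = sum(t.size_bytes for t in txns)
    hits = sum(1 for t in txns if t.hit)
    hit_bytes = sum(t.size_bytes for t in txns if t.hit)
    return hits / len(txns), hit_bytes / total_bytes


# Three small hits plus one large object that the cache did not store.
trace = [Transaction(10_000, True)] * 3 + [Transaction(500_000, False)]
dhr, bhr = hit_ratios(trace)
print(f"DHR={dhr:.2f} BHR={bhr:.3f}")  # DHR=0.75 BHR=0.057
```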

Cache ``Age''

The two primary reasons for losing hits are insufficient disk space and proxy overload conditions (various by-pass modes). The ``Cache Age'' chart below shows an estimated maximum age of an object purged from the cache (the objects are purged to free room for incoming fill traffic).

To estimate that maximum age, we divide cache capacity (as specified by the vendor) by the fill rate. The latter is the rate of the stream of cachable misses, as measured by the Polygraph client. Raw fill stream measurements can be found in the ``Stream Rates'' table on individual entry reports. We believe that our formula yields a ``good enough'', albeit imprecise, approximation of real-world measurements.
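
In code, the estimate is capacity divided by fill rate. This sketch assumes capacity in GB (10^9 bytes) and fill rate in Mbit/s (10^6 bits/s). Both the units and the 4 Mbit/s fill rate are assumptions for illustration, so check the units of the ``Stream Rates'' table before reusing it.

```python
def cache_age_hours(capacity_gb: float, fill_rate_mbps: float) -> float:
    """Estimated maximum age of purged objects: capacity / fill rate.

    Assumes capacity in GB (1e9 bytes) and fill rate in Mbit/s (1e6 bit/s).
    """
    if fill_rate_mbps <= 0:
        raise ValueError("fill rate must be positive")
    capacity_bits = capacity_gb * 1e9 * 8
    seconds = capacity_bits / (fill_rate_mbps * 1e6)
    return seconds / 3600.0


# An 8.1 GB cache filled at a hypothetical 4 Mbit/s holds about 4.5 hours.
print(round(cache_age_hours(8.1, 4.0), 2))  # 4.5
```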

Cache Age

The majority of entries have enough cache capacity to store just a few hours of peak fill traffic. In an effort to improve throughput/price ratios, many vendors designed or configured their products to deliver the highest request rate at minimum disk storage cost. Consequently, their boxes do not have enough storage space to keep objects longer than a few hours. Hit ratio traces illustrate the phenomenon (e.g., Stratacache F100, IBM-3500M10, and Compaq-C900). Some caches were able to hold objects longer but still missed some hits at the end of the test (e.g., Lucent-100). The latter can be attributed to the difficulty of predicting the exact Working Set Size (see the ``Working Set Size'' Section).

Many cache administrators believe that a production cache should store about 2-3 days of traffic. Note that Polygraph reports a similar measurement for peak fill traffic only (cachable misses). Since not all replies are cachable misses, the fill rate is about 25% of the request rate. Realistic object popularity distributions require extra cache capacity compared to a simple FIFO policy. Also, a typical proxy works at its peak level for less than 8 hours per day. Thus, the 2-3 day rule of thumb probably corresponds to 5-10 hours of fill traffic.
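
The same arithmetic can be run backwards to size a cache. The sketch below uses the roughly 25% fill fraction from the text. The function name, the peak request rate, and the 11 KB mean size of cachable objects are hypothetical inputs, so substitute measured values before relying on the result.

```python
def required_capacity_gb(request_rate: float, mean_object_bytes: float,
                         hours: float, fill_fraction: float = 0.25) -> float:
    """Disk space (GB, 1e9 bytes) needed to keep `hours` of peak fill traffic.

    Fill traffic is approximated as `fill_fraction` of all requests
    (cachable misses), each of `mean_object_bytes` on average.
    """
    fill_bytes_per_s = request_rate * fill_fraction * mean_object_bytes
    return fill_bytes_per_s * hours * 3600 / 1e9


# Hypothetical proxy: 500 req/s peak, 11 KB mean cachable object.
for target in (5, 10):
    print(target, "h ->", required_capacity_gb(500, 11_000, target), "GB")
# 5 h -> 24.75 GB
# 10 h -> 49.5 GB
```

This ignores the extra headroom that realistic popularity distributions require, so the results are a floor rather than a recommendation.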

The cache capacity requirement depends on your environment. When configuring a caching system based on our performance reports, make sure you get enough disk storage to keep sufficiently ``old'' traffic. You may have to increase the price and re-compute performance/price ratios if a product you are considering does not have enough storage. You should also check that the product is actually available with the additional disk space. These adjustments may significantly affect the choice of a price-aware buyer.

7.3 Response Time

The Mean Response Time chart (below) shows response times under peak load. To simulate real-world conditions, the PolyMix-2 workload introduces an artificial delay on the server side. The server delays are normally distributed with a 2.5 sec mean and 1 sec deviation. These delays play a crucial role in creating a reasonable number of concurrent ``sessions'' in the cache.
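
The sketch below draws such delays and shows why they create concurrency. By Little's law, the number of transactions waiting on servers is roughly the miss rate times the mean delay. Clipping negative samples at zero is our assumption about how a negative draw would be handled. The function names and the 1,000 req/s rate are hypothetical; the 0.45 miss ratio is simply one minus the offered 55% hit ratio.

```python
import random
from typing import List


def server_delays(n: int, mean_s: float = 2.5, stddev_s: float = 1.0,
                  seed: int = 1) -> List[float]:
    """Draw n server-side delays from Normal(mean, stddev), clipped at 0."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean_s, stddev_s)) for _ in range(n)]


def concurrent_server_sessions(request_rate: float, miss_ratio: float,
                               mean_delay_s: float) -> float:
    """Little's law: sessions in progress = arrival rate x time in system."""
    return request_rate * miss_ratio * mean_delay_s


delays = server_delays(100_000)
print(round(sum(delays) / len(delays), 2))          # close to 2.5
print(concurrent_server_sessions(1000, 0.45, 2.5))  # about 1125 sessions
```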

To simulate WAN server side connections, we introduce packet delays (80 msec round trip) and packet loss (0.05%). These delays increase miss response times and, more importantly, reward caches for using persistent connections (TCP connection setup phase includes sending several packets that also incur the delay).
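
Here is a back-of-the-envelope sketch of the persistent-connection reward. Opening a new TCP connection costs one extra round trip (SYN, SYN-ACK) before the request can be sent. The model ignores packet loss, transfer time, and slow start, so it is a simplification and not Polygraph's timing model. The function name is illustrative.

```python
def miss_latency_s(server_delay_s: float, rtt_s: float = 0.080,
                   new_connection: bool = True) -> float:
    """Rough miss latency over the simulated WAN link.

    A new connection pays one RTT for the TCP handshake; every request
    pays one RTT for request/response plus the server-side delay.
    """
    handshake_s = rtt_s if new_connection else 0.0
    return handshake_s + rtt_s + server_delay_s


fresh = miss_latency_s(2.5)                         # 2.66 s
reused = miss_latency_s(2.5, new_connection=False)  # 2.58 s
print(round(fresh, 2), round(reused, 2))
```

Over many misses, the 80 msec saved per reused connection adds up. That is why the delay rewards caches that keep server connections open.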

The delays, along with the hit ratio, affect transaction response time. Ideal mean response time for this test is impossible to calculate precisely because the model is too complex. We estimate the ideal mean response time at about 1.3 sec.

Mean Response Time

iMimic reports the best response time. Many entries show good response times below 1.5 sec. The two InfoLibria entries are noticeably worse, at about 2 sec and 3 sec.

Slight periodic irregularities on InfoLibria-30i traces (especially visible on the response time trace) are attributed to their bypass algorithm that was tuned ``just right'' and changes the flow of traffic to the cache quite often.

Both Cisco entries, Cisco CE-590 and Cisco CE-7300, show an interesting pattern of periodic response time spikes, about one hour apart, and less than a minute long. Those spikes are caused by proxy software checkpoints that do not allow the cache to reply promptly.


Hit ratios affect, but do not determine, response times. In an ideal scenario, it takes a negligible amount of time to deliver a cache hit to the client, and fast cache hits decrease average response times. In the same unrealistic scenario, it takes only ~2.6 sec to deliver a cache miss. In practice, both hits and misses may incur significant overheads.
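
A deliberately simplified two-point model makes the relationship concrete: mean response time is the hit-ratio-weighted blend of hit and miss times. With the offered 55% hit ratio, a zero-cost hit, and the ~2.6 sec miss, it gives about 1.17 sec. That is somewhat below the ~1.3 sec estimate of the ideal given earlier, which reflects a more complex model than this sketch, so treat the function (a hypothetical helper) only as an illustration.

```python
def mean_response_time(dhr: float, hit_time_s: float, miss_time_s: float) -> float:
    """Mean response time as a DHR-weighted blend of hit and miss times."""
    if not 0.0 <= dhr <= 1.0:
        raise ValueError("dhr must be a fraction in [0, 1]")
    return dhr * hit_time_s + (1.0 - dhr) * miss_time_s


print(round(mean_response_time(0.55, 0.0, 2.6), 2))  # 1.17
# Slow hits erode the benefit (1.6 s hits, as reported for InfoLibria-30i,
# combined here with a hypothetical 55% DHR):
print(round(mean_response_time(0.55, 1.6, 2.6), 2))  # 2.05
```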

The hit and miss response time charts help to explain the ``Mean Response Time'' values shown above. For example, InfoLibria-30i hits take 1.6 sec on average, providing relatively small improvement to overall response time.

A median response time chart is also available. Note that median response time should be interpreted with special care. Response time median is highly susceptible to average document hit ratio (DHR), essentially reporting response time for hits if DHR is higher than 50%.
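
The effect is easy to reproduce with a synthetic sample. In the sketch below, hit and miss times are fixed at hypothetical values of 0.1 sec and 2.6 sec. Once hits make up more than half the sample, the median is simply the hit time, while the mean still reflects the misses. The helper name is illustrative.

```python
import statistics
from typing import List


def synthetic_times(hits: int, misses: int, hit_s: float = 0.1,
                    miss_s: float = 2.6) -> List[float]:
    """A response-time sample with constant hit and miss times."""
    return [hit_s] * hits + [miss_s] * misses


high_dhr = synthetic_times(55, 45)  # DHR 55%
low_dhr = synthetic_times(45, 55)   # DHR 45%
print(statistics.median(high_dhr), statistics.median(low_dhr))  # 0.1 2.6
print(round(statistics.mean(high_dhr), 3))                      # 1.225
```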

7.4 Load dependency

The PolyMix-2 workload allows us to check whether proxy performance is load dependent. By comparing proxy performance during the idle phase with the performance during top2, we can measure how hit ratio and/or response time change with offered load.

Many proxies show no or negligible dependency (e.g., NetApp). Others have to decrease their hit ratio (e.g., InfoLibria-30i and IBM-5000) and/or slow down hits (e.g., Lucent-100 and InfoLibria-X1x3) to cope with higher load.
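
One simple way to quantify load dependency is the relative change of a metric between the idle and top2 phases. The sketch treats changes within the 10% margin of error quoted earlier as noise. That threshold, the function names, and the sample values are assumptions for illustration, not the method used for this report.

```python
def relative_change(idle: float, top2: float) -> float:
    """Fractional change of a metric from the idle phase to top2."""
    if idle == 0:
        raise ValueError("idle value must be non-zero")
    return (top2 - idle) / idle


def is_load_dependent(idle: float, top2: float, margin: float = 0.10) -> bool:
    """True if the change exceeds the assumed measurement margin."""
    return abs(relative_change(idle, top2)) > margin


# Hypothetical hit ratios (idle vs top2) for two proxies.
print(is_load_dependent(0.54, 0.53))  # False: within the 10% margin
print(is_load_dependent(0.50, 0.40))  # True: hit ratio dropped 20%
```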

7.5 Downtime test

The downtime test is designed to estimate the time it takes a product to recover from an unexpected condition such as power outage or software failure. Polygraph measures the time until the first miss (TTFM, see the chart below) and the time until the first hit (TTFH).

Time until first miss

From a user's point of view, the time until the first miss is somewhat more important. As soon as the caching system is able to deliver misses, the user is able to access the Web again. Delivering hits is also important to reduce outgoing bandwidth usage and from a quality-of-service point of view. Details of the test are described in the ``Downtime Test'' Section.

All entries except Swell were able to complete the test. The Swell product required manual intervention to reboot (pushing the power button) because its BIOS would not boot automatically after power was restored. Such intervention was prohibited by the cache-off rules.

Note the near-zero TTFM for the InfoLibria-30i entry, attributed to the DynaLink component of the product. InfoLibria-X1x3, the only product using an L4 switch at the cache-off, finishes second: TTFM for an L4-based cluster is the time it takes for the switch to boot and detect that a proxy is not yet ready to respond.

For systems with large amounts of RAM and/or many disks, TTFM depends primarily on how well various housekeeping operations are optimized and parallelized. For example, scanning 4 GB of RAM twice or spinning up all 10 SCSI disks sequentially significantly increases TTFM readings. We note that the Downtime test is a new requirement in the Polyteam test suite, and all vendors are likely to work on improving their results.

Most entries show negligible delay between the time until the first miss and the time until the first hit. Recall that the precision of this test is around five seconds.

8. Product Configurations

Here are the configuration details for all tested products.

Label | Full product name | Product version | Price (US$) | Availability date | Cache units | CPUs (n x MHz) | RAM (GB) | Cache disks (n x GB) | NICs (n x Mbps) | Switching gear | Cache (GB) | OS
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Cisco-590 | Cisco Cache Engine 590 | 2.3-CE590 beta | 42975 | May 01 | 1 | 1xPIII-600 | 1 | 7x18 | 1x100 | Cisco Catalyst 2912 XL | 108 | CE
Cisco-7300 | Cisco Cache Engine 7300 | CE7300 | 127500 | May 01 | 1 | 2xPIII-Xeon-733 | 2 | 21x18 | 1x1024 | Cisco Catalyst WS-C3524-XL-A | 200 | CE
Compaq-C2500 | Compaq TS-C2500 | 1.2 | 39500 | May 01 | 1 | 1xPIII-Xeon-800 | 4 | 18x9.1 | 2x1024 | Compaq SW-5450 | 169 | Novell ICS 1.2.76
Compaq-C900 | Compaq Task Smart C900 | 1.2 | 6955 | May 01 | 1 | 1xPII-500 | 0.5 | 2x9.1 | 1x100 | Compaq 5708 TX | 13 | Novell ICS 1.2.76
Dell-1300 | Novell ICS Powered By Dell PowerEdge 1300 | 1.2 | 6263 | Feb 01 | 1 | 1xPIII-500 | 0.25 | 2x9.1 | 1x100 | Netgear FS-105 | 8.1 | Novell ICS 1.2.76
IBM-3500M10 | IBM-3500M10 | 1.2 | 4999 | Feb 02 | 1 | 1xPIII-550 | 0.25 | 2x9.1 | 1x100 | Netgear FS-108 NA | 8.1 | Novell ICS 1.2.76
IBM-5000 | IBM-5000 | 1.2 | 9099 | Feb 02 | 1 | 1xPIII-600 | 0.5 | 3x9.1 | 1x100 | Bay BayStack 350T | 21.8 | Novell ICS 1.2.76
IBM-5600 | IBM-5600 | 1.2 | 19999 | Feb 02 | 1 | 1xPIII-600 | 1 | 6x9.1 | 1x1024 | Foundry FastIron WorkGroup w/ Gbit | 47.3 | Novell ICS 1.2.76
iMimic-5000 | iMimic DataReactor 5000 | 1.0beta | 12995 | May 01 | 1 | 1xAMD Athlon-700 | 0.75 | 8x18, 1x27 | 1x1024 | Netgear FS-518 | 144 | FreeBSD 4.0
InfoLibria-30i | DynaCache-30i | 3.0 beta | 23495 | May 01 | 1 | 1xPIII-600 | 1 | 2x18, 1x9 | 1x100 | DynaLink-2 | 36 | Proprietary
InfoLibria-X1x3 | Cluster of DynaCache-X1 | 3.0 beta | 40485 | May 01 | 3 | 3xPIII-500 | 3 | 6x18, 3x9.1 | 3x100 | Foundry ServerIron (16 port) | 108 | Proprietary
Lucent-100 | Lucent IPWorX WebCache 100 | 2.0 beta | 17995 | May 01 | 1 | 1xPIII-450 | 0.75 | 4x9.1, 1x10.2 | 1x100 | Lucent Cajun P120 | 32 | FreeBSD 3.3
Lucent-150a | Lucent IPWorX WebCache-150a | 2.0 beta | 24995 | May 01 | 1 | 1xPIII-600 | 1 | 4x18.2, 1x10.2 | 1x100 | Lucent Cajun P120 | 64 | FreeBSD 3.3
Microbits | Microbits Intelli-App Pizza Box | 1.0 | 2150 | Feb 01 | 1 | 1xCyrix-266 | 0.125 | 1x6.4 | 1x100 | SMC EZ Hub 100 5004TX | 4 | Novell ICS 1.2.76
NetApp | Network Appliance NetCache Appliance | 4.1 | 5950 | May 01 | 1 | 1xCeleron-433 | 0.25 | 1x9.0 | 1x100 | Netgear FS-108 | 8.1 | Proprietary
OCD-F100 | StrataCache F100 | 1.2 | 5995 | Feb 01 | 1 | 1xPIII-500 | 0.25 | 2x9.1 | 1x100 | 3Com Dual Speed Switch 16 ports | 8.4 | Novell ICS 1.2.76
Pionex-1000 | Pionex PCA-1000 | 1.0 | 4499 | Feb 01 | 1 | 1xAMD-K6-450 | 0.25 | 1x10 | 1x100 | Intel InBusiness 8 port 10/100 switch | 9 | Novell ICS 1.2.76
Quantex-4000 | Quantex WebXL-4000 | 1.2 | 18999 | Apr 15 | 1 | 1xPIII-Xeon-550 | 1 | 6x9.1 | 1x1024 | Intel Express 460T Standalone Switch | ???? | Novell ICS 1.2.76
Squid-2.4 | Squid | 2.4.DEVEL2 | 4210 | Feb 01 | 1 | 2x450 | 0.5 | 6x8 | 1x100 | 3Com TP800 | 24 | FreeBSD 3.3R
Swell-1000 | Swell Tsunami CPX-1000 | 0.99beta | 2239 | Feb 15 | 1 | 1xAMD-K6-500 | 0.5 | 3x13.5 | 1x100 | Netgear FS-105 | 17 | Linux 2.2.14

All numbers are totals for the tested configuration unless noted otherwise. See individual product pages for more details about box configurations.

9. This Cache-Off Controversies

9.1 Delayed ACKs

Perhaps the most controversial aspect of this cache-off revolved around TCP delayed ACKs and an error in our preparation documentation.

The phrase ``delayed ACKs'' refers to a feature of most TCP implementations. Instead of acknowledging every single data packet, a receiver can delay an ACK on the assumption that another data packet will arrive soon. This benefits the network because fewer packets are sent on the wire. In many implementations, ACKs are delayed no longer than 200 milliseconds. Delayed ACKs are particularly useful for telnet and similar interactive applications, where the ACK can often be combined with the echoed characters.
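
A toy model of the receiver side shows both effects: fewer ACK packets, and an ACK that can wait up to the timer when no second segment follows. It assumes the receiver acknowledges every second segment at once and otherwise waits for a 200 ms timer. Real stacks differ in detail (BSD-derived stacks, for example, run a periodic timer rather than a per-segment one), and the function name is hypothetical, so treat it as an illustration only.

```python
from typing import List, Tuple


def count_acks(arrivals: List[float], delayed: bool = True,
               timeout_s: float = 0.2) -> Tuple[int, float]:
    """Return (ACKs sent, worst ACK delay in seconds) for segment arrivals.

    Without delayed ACKs every segment is acknowledged immediately.
    With them, an ACK is sent on every second segment, or when a lone
    segment has waited `timeout_s` without a follow-up.
    """
    if not delayed:
        return len(arrivals), 0.0
    acks, worst, pending = 0, 0.0, None  # pending: arrival of unACKed segment
    for t in sorted(arrivals):
        if pending is not None and t - pending >= timeout_s:
            acks, worst, pending = acks + 1, max(worst, timeout_s), None
        if pending is None:
            pending = t
        else:
            acks, worst, pending = acks + 1, max(worst, t - pending), None
    if pending is not None:  # a final odd segment waits for the timer
        acks, worst = acks + 1, max(worst, timeout_s)
    return acks, worst


burst = [i * 0.001 for i in range(10)]   # 10 back-to-back segments
print(count_acks(burst))                 # 5 ACKs instead of 10
print(count_acks(burst, delayed=False))  # (10, 0.0)
print(count_acks([0.0, 0.001, 0.002]))   # (2, 0.2): last segment waits
```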

Delayed ACKs often have a noticeable effect on proxy cache performance in our tests. Delayed ACKs can increase mean response time, compared to the case without delayed ACKs. At the same time, however, delayed ACKs result in fewer packets on the network. We have observed that some products can achieve higher throughput (requests/sec) with delayed ACKs. Thus, delayed ACKs may present a tradeoff: Do you want higher throughput or lower response time?

In our experience, delayed ACKs usually cause mean response time to increase. However, disabling delayed ACKs does not always cause throughput to increase.

We raised this issue on the cache-off mailing list on November 29, 1999. After a week, we had received no input or opinions from cache-off participants. Our discussion on a FreeBSD mailing list showed that many believe delayed ACKs are usually harmful and should be disabled by default. Thus, we decided to continue disabling delayed ACKs.

The controversy arises from our documentation. We wrote a ``FreeBSD-3.3 Tuning'' document. This document (correctly) instructed people to disable TCP delayed ACKs, among other things. To tune FreeBSD-3.3 sufficiently for benchmarking, we also made some kernel source code modifications. To make it easier for people to get a correctly configured kernel, we made a source tree available via rdist. Note, however, that we did not disable delayed ACKs in the kernel source. Rather, this was done with a sysctl command in the boot scripts.

On December 7, 1999, we published a detailed document describing how to practice for the cache-off. This new document advised people to read the FreeBSD-3.3 Tuning document, OR to use rsync to get the kernel source code. This is our error. People who used only rsync, without reading the Tuning page, did not see the part about disabling delayed ACKs.

Many participants had set up FreeBSD-3.3 in their own labs, but did not have delayed ACKs disabled. At the cache-off, some of them found that they could not achieve the same throughput that they had seen in their own lab.

9.2 Working Set Size

The notion of Working Set Size was discussed in the ``Working Set of URLs'' Section. We asserted that maintaining a constant WSS was important to preserve steady-state conditions during an experiment. On the other hand, a vendor can try to estimate the minimum cache capacity required to store a complete Working Set. Such a cache would not waste storage on objects that are never requested, improving price/performance.

Estimating the minimum cache capacity is not trivial. Most caches use an LRU-like replacement scheme, while Polygraph maintains a large number of independent FIFO queues that ``move'' at different speeds. This difference makes simple calculations based on request rate useless. A Web object from a ``slow'' group may remain in the Working Set for a long time, while objects from ``fast'' groups are evicted sooner. It is not clear whether the multiple-FIFO or the LRU model is closer to reality; both can be justified using real-world phenomena.
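
The difference between the two disciplines shows up even in a single tiny cache. The sketch below is a toy and not Polygraph's multiple-FIFO working-set model. It replays one request sequence through an LRU cache and a FIFO cache of the same capacity. LRU refreshes a popular object on every hit, while FIFO evicts it by insertion age, so a capacity estimate that fits one discipline can be wrong for the other. The function name and trace are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def count_hits(requests: Iterable[str], capacity: int, policy: str) -> int:
    """Replay requests through a cache using "lru" or "fifo" eviction."""
    if policy not in ("lru", "fifo"):
        raise ValueError("policy must be 'lru' or 'fifo'")
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache: "OrderedDict[str, None]" = OrderedDict()
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(url)  # a hit makes the object most recent
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict least recent / first inserted
        cache[url] = None
    return hits


trace = ["A", "B", "A", "C", "A"]
print(count_hits(trace, 2, "lru"))   # 2: A survives because it was refreshed
print(count_hits(trace, 2, "fifo"))  # 1: A is evicted as the oldest insert
```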

Some vendors were puzzled by the complex relationship between WSS and cache capacity and suggested that an LRU-based model be used instead. Unfortunately, those suggestions came too late. The technical difficulties of implementing a global LRU scheme in a distributed environment did not allow us to consider an LRU model.

We believe that many cache-off entries had disproportionately small caches regardless of the WSS model in use. Real-world proxies must cache at least a few days of traffic, and many entries would not be able to do that at advertised request rates, even after adjusting for hours of non-peak load. The latter is a problem that cannot be solved completely in a benchmarking environment because we cannot run a single test for weeks at a time.

9.3 Filling the Cache

Cache-off rules required pre-filling the cache before performance tests using the PolyFill-2 workload. PolyFill-2 requires cache size as an input parameter. The rule said that ``This size is not affected by high watermarks that some caches may be configured with. S is simply the sum of disk blocks where a Web object may reside''.

At the cache-off, Polyteam used the physical size of disk media to calculate the cache size. Our argument was that it is impossible, in general, to have a reasonable guarantee that cache or OS data placement policy will not use all disk blocks (not necessarily at the same time) to store data. Our interpretation was simple to enforce because physical size of cache drives is a well-known proxy configuration parameter.

Several vendors configured their caches to use only part (e.g., 60%) of the available disk space. Some argued that they could guarantee that the data would at no time ``touch'' all disk blocks. These vendors asked us to reduce the cache size parameter of PolyFill-2 in order to speed up the fill. We refused in order to preserve identical fill conditions for all entries.
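
Under the Polyteam interpretation, S is the physical disk size, which can be computed from an ``n x GB'' disk list like those in the ``Product Configurations'' table (e.g., the Cisco-590 entry's 7x18). The parser below is a hypothetical helper. It also accepts ``*'' as the separator and shows how much smaller S would be if only 60% of the disks were counted.

```python
import re


def physical_disk_gb(spec: str) -> float:
    """Sum an 'n x GB' disk list such as '8x18 1x27' (also accepts '*')."""
    pairs = re.findall(r"(\d+)\s*[x*]\s*(\d+(?:\.\d+)?)", spec)
    if not pairs:
        raise ValueError(f"no 'n x GB' entries in {spec!r}")
    return sum(int(count) * float(size) for count, size in pairs)


s_physical = physical_disk_gb("7x18")  # 126.0 GB: S under the physical rule
s_vendor = 0.6 * s_physical            # 75.6 GB if only 60% were counted
print(s_physical, round(s_vendor, 1))
print(physical_disk_gb("8x18 1x27"))   # 171.0
```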

It remains unclear how to calculate the cache size so that both sides are satisfied. This will become a subject of future cache-off planning discussions.

9.4 ARP Traffic

The configuration of the Polygraph client and server machines (polyclts and polysrvs) required a number of IP aliases to be allocated to each physical interface. This created a large number of ARP requests and responses on the local segment, because a request is generated for each IP address bound to an interface. Therefore, the ARP component of the traffic is larger than it would be in a more typical environment. Cache servers must also maintain and update a much larger ARP table than normal, because all clients and servers are on the local segment rather than, as would normally be the case, reachable via one or more router interfaces. The Address Resolution Protocol is defined in RFC 826.

10. Comments

10.1 Polyteam Comments

Dell-6350

The Dell-6350 entry did not complete a PolyMix-2 test during the allotted time. Dell's pre-cache-off expectations were different due to an error in the cache-off preparation documentation, as described in the ``Delayed ACKs'' Section of the report.

Polyteam allows any vendor that attends a cache-off to test their products in the PolyLab after the cache-off. Polyteam will test Dell-6350 product in March 2000, and those results will be available on the Polygraph Web site.

Squid-2.4

Polyteam members acknowledge the fact that they have participated in the development of the Squid software. We have limited the interaction between Squid and Polygraph development activities to the extent possible. The configuration and tuning decisions for the Squid-2.4 entry were made well before we had any private knowledge about future cache-off entries. We have applied the same rules to the Squid-2.4 entry as to any other cache-off entry.

Stratacache M-500

The Stratacache M-500 entry did not complete a PolyMix-2 test during the allotted time. Stratacache's pre-cache-off expectations were different due to an error in the cache-off preparation documentation, as described in the ``Delayed ACKs'' Section of the report.

Polyteam allows any vendor that attends a cache-off to test their products in the PolyLab after the cache-off. Polyteam will test Stratacache M-500 product in March 2000, and those results will be available on the Polygraph Web site.

10.2 Vendor Comments

It is a Polyteam tradition to give cache-off participants a chance to comment on the results after they have seen the review draft. The comments below are verbatim vendor submissions. Polyteam has not verified any of the claims, promises, or speculations that these comments may contain.

Cisco
http://www.cisco.com/go/cache

Cisco would like to thank the IRCache PolyTeam for the opportunity to demonstrate that the Cisco Cache Engine 590 and 7300 have the industry's best overall performance as measured by throughput, latency, and byte hit ratio.

Cisco believes that customers deploy caching for two main reasons:

To this end, we believe that the Cisco Cache Engine 590 and 7300 as demonstrated in this cache-off are more than capable of achieving both -- in many cases where competing products are only achieving one of these goals, or, in some cases, neither.

In this cache-off, Cisco chose to bring prototype products that are designed for the high-end Service Provider market.. As such, while the products achieved exceptional overall performance, some aspects of the 'prototype' nature of the appliance have been highlighted by the PolyTeam. In particular, a "once an hour" latency spike was observed to occur every hour for up to 10 seconds. This was (as the PolyTeam have correctly hypothesized) as a result of the cache file-system checkpoint activities. The final shipping product will have cache file-system check-pointing without any noticeable side-effect on response time for the duration of the checkpoint. We look forward to benchmarking the final shipping product with the PolyTeam at a future date.

Finally, in interpreting the results of the cache-off, it is important to take into consideration the total cost of ownership of a network caching solution - the total cost beyond that of the initial hardware purchase, as well as the feature set and future direction of the product.

Cisco would like to thank the IRCache PolyTeam for once again raising the standard in public web-cache benchmarking. The Polymix-2 workload used in this benchmark has been the most accurate representation of "real" web traffic to date. Cisco looks forward to participating in future benchmarking events, and hopes that other vendors who chose to not participate will do so in future events.

Compaq
http://www.compaq.com/tasksmart

The outstanding performance of the TaskSmart C-Series enables Compaq customers to dramatically accelerate delivery of Web content, enhance web server farm efficiency, improve users' experience with their website, and delay costly network bandwidth upgrades. Compaq's TaskSmart appliance server technology is designed to enhance organizations' e-business solutions by allowing more of their customers and users to access the Internet smarter and faster than with any other caching solution available.

Dell-130B
http://www.dell.com/us/en/biz/topics/products_sltn_pedge_000_cache.htm

The Novell Internet Caching System - Powered By Dell (Dell ICS 130B) used in the Second IRCache Web Cache Cache-off tests, is currently available from Dell Computer. As an integral part of the Dell ICS cache appliance product offering, the Dell ICS 130B has enjoyed great success with a proven price/performance track record among our customers.

The price/performance success achieved at the IRCache Cache-off by the Dell ICS 130B - in the top three price/performance ratings - again speaks to the proven performance this product is able to provide towards accelerating your Internet access, and reducing network infrastructure costs.

The 90ms cache hit mean response time achieved by the Dell ICS 130B, validates the performance acceleration customers can expect from Dell ICS cache appliances.

Dell Computer is continuing efforts to improve the price/performance value of the Dell ICS cache appliance product offering, and will soon be introducing a complete line of rack dense cache appliance servers, that will feature improved remote management capabilities, even faster performance, and greater storage capacity.

Each Dell ICS system is available with Dell's BusinessCare support program including:

IBM-3500M10
http://www.ibm.com/

The IBM Netfinity 3500 M10 was the top performer in it's price range. With flawless performance it generated no spikes or downtime at 376 hits per second in the Web Polygraph 2 tests. This new generation, entry-level server with superior capacity, sets the standard for e-business in small to medium size businesses, or for a workgroup server in your enterprise network. The economical price for this server appliance is an excellent resource for providing caching services at remote offices or as part of a tiered cache cluster. Implement Novell's newest caching software today on the Netfinity 3500 M10.

IBM-5000
http://www.ibm.com/

To achieve the performance of the Netfinity 5000, ISPs or Enterprise IT managers would need to spend twice as much to gain similar 600 hits per second in the Web Polygraph 2 tests performance and still not achieve the reliability shown with no spikes or downtime. The Netfinity 5000 is another price performance caching appliance leader with an affordable blend of power and manageability. Implement Novell's newest caching software today on the Netfinity 5000.

IBM-5600
http://www.ibm.com/

For mission critical caching needs, the IBM Netfinity 5600 gives customers reliable availability with no spikes and/or downtime at 1300 hits per second in the Web Polygraph 2 tests recently run at the NLANR cache cacheoff. The IBM Netfinity 5600 offers a top tier compatibility tested Internet/Intranet cache solution to web enabled organizations, content publishers and ISPs where performance is key to delivering successful e-commerce and ensuring the success of any e-business. Implement Novell's newest caching software today on the Netfinity 5600.

iMimic-5000
http://www.imimic.com/

iMimic Networking is very pleased with the performance of the DataReactor 5000 web caching solution. The DataReactor 5000 has set a new price/performance record, achieving top-tier throughput at mid-range cost. Our product also provided the best overall response time, a key requirement for end-user experience. The DataReactor 5000 will be available for purchase on April 30, 2000.

We would like to thank Polyteam for their efforts in developing the Polygraph benchmarking software and coordinating the Cacheoff event. We look forward to showcasing more members of the DataReactor family of web caching products at future IRCache Cacheoffs, with the DataReactor 5000 serving as our mid-range offering. Please visit our web page for more information on iMimic Networking and our DataReactor web caching solutions.

InfoLibria
http://www.infolibria.com/

InfoLibria was pleased to participate again in the IRCache cacheoff. InfoLibria was one of the very first supporters of this cacheoff process, believing that all caching vendors should participate for the benefit of the user community.

Ten months after the first event, InfoLibria has managed to maintain its lead in transparent caching reliability, which is measured for the first time in this cacheoff's "downtime test" (section 6.5).

Reflecting InfoLibria's commitment to transparent caching, both of InfoLibria's entries uniquely tested in fully transparent mode. So, the InfoLibria prices shown in this report were the only ones to include the cost of the necessary L4 capability (a switch in one case, and InfoLibria's own DynaNIC/DynaLink in the other). An accurate apples-to-apples price/performance comparison should take this into consideration.

InfoLibria was the only vendor to test an L4-switch-based cache cluster in this cacheoff. Not only does this demonstrate performance scalability, but also the commitment to enter real-world configurations in the cacheoff.

In the course of the ten months between the first and this second cacheoff, InfoLibria has dramatically improved its cache performance, capacity and density. These improvements continue at the same pace in all aspects of software and hardware.

Caching, for InfoLibria, represents only one component in Internet Content Distribution and Delivery (CDD) systems. CDD systems possess many additional attributes that are not yet measured in the IRCache cacheoff. The CDD systems that InfoLibria builds today, based on its caching platform, focus on, and excel in the following key attributes:

InfoLibria would like future cacheoffs to start measuring some of these central quantities.

Lucent
http://www.lucent.com/ins/solutions/ipworx.html

Lucent appreciates the opportunity to participate in the second IRCache cache-off.

Overall, we are pleased with the results from these tests. The Lucent IPWorX products deliver a full range of performance characteristics at a competitive price. Our 675 request/second WebCache 100 unit is designed for use in clusters scaled to fit the user's network needs. The WebCache 100 includes 32 GB of available cache storage per unit in a 2U stackable form factor with two Fast Ethernet interfaces. Accurate caching rules for HTTP and FTP, address bypass, a 10-year MTBF, administrative control through both a secured web interface and a command-line interface, activity logging, and usage statistics complement its performance characteristics. All of this is backed by Lucent's sales and service experience.

The WebCache 100 and WebDirector 80 products will be generally available on May 1, 2000. Contact ipworx@lucent.com for more information.

Microbits
http://intelliapp.microbits.com.au/

Mission accomplished.
Microbits entered its first Cacheoff to introduce to the world the highest-performance, best value-for-money caching device available for the 95% of Internet users with access links of 10 Mbit/s or less. Our entry, the Pizza Box, was the lowest-cost entry, had the smallest form factor, and achieved the best price/performance of any entry under US$5000.

Microbits released its Intelli-App range of 6 Internet caching appliances, based on Novell's ICS, in August 1999. By our tests, our entry-level Pizza Box has the worst price/performance quotient, as measured in the Cacheoff, of any of our models. Our focus is on giving our customers the best value for money in a cache that will meet their current needs and scale with their needs. For 95% of organizations that is the Pizza Box, and that is why it was our entry in the Cacheoff.

Our other 5 models are specifically designed for large organizations, ISPs, and telcos that require large cache sizes and high-bandwidth handling capabilities. Rated from 300 requests per second upward, they all have Intel Validated Server reliability, an optional rack form factor, and redundancy, and they are very keenly priced.

NetApp
http://www.netapp.com/products/netcache/

These results demonstrate that Network Appliance clearly leads in delivering the best end-user experience. NetCache's fast data architecture and advanced caching algorithms achieved the best median response time, the best mean hit response time, and an unparalleled Time Till First Miss (TTFM), equating to very fast access and high availability for the end user. It should be noted that our 1.2-minute TTFM is not the time it took for the switch to reboot, but, impressively, the actual recovery time of the NetCache itself. Equally significant, NetCache delivered these fast response times and a greater than 50% hit rate without requiring a large number of drives. Compared to our competition in this segment, NetCache delivered up to a 56% higher hit rate and up to 15 times faster response time. NetCache's unique ability to support all 3 major streaming media protocols, as well as HTTP, FTP, and NNTP, gives our customers the best user experience and the most bandwidth savings.

Network Appliance thanks Polyteam for their continued efforts in evolving this benchmarking tool and looks forward to enhancements in HTTP and streaming media benchmarks.

Pionex-1000
http://www.pionexelite.com/pca

Pionex would like to extend its appreciation to IRCache's PolyTeam and all the vendors that made the 2nd Web Polygraph cacheoff a success.

Designed from the ground up to be a true *plug-and-accelerate* caching appliance, the Pionex Elite PCA-1000 proves to be a well-balanced caching solution. The PCA-1000 tested well in all categories, providing excellent cache hit rates and response times as well as cache age numbers that were 30% better than the average of all tested caches. The PCA-1000 did not sacrifice one performance area to get better results in others. In determining a well-balanced caching solution, consideration should be given not only to performance and price, but also to manageability, scalability, ease of use, and total cost of ownership. Based on Novell ICS, the Pionex Elite PCA-1000 excels in each of these other areas, providing web-based manageability, clustering capabilities, and unparalleled ease of use.

Quantex-4000
http://www.quantex.com/webxl

Quantex would like to thank IRCache's PolyTeam for their superb performance and impartial execution of the 2nd Web Cache Cache-off, maintaining the integrity of such an event while providing valuable information to help the caching user community better understand the multitude of available cache offerings and options. Quantex is extremely pleased with the performance results generated by its beta version of the WebXL 4000 caching appliance, powered by Novell ICS.

Although the test generated several good performance metrics (i.e. throughput, response time, hit ratio), the Quantex WebXL 4000 also provides other strong attributes that need to be considered in selecting the optimal caching solution. Included in this list are ease of use and implementation, robust web-based remote management and performance monitoring features, scalability and availability features like clustering, rack optimized form factor, and additional value-added services like URL filtering/blocking. With this combination of price/performance and features, the Quantex WebXL 4000 will be one of the best caching values on the market.

Stratacache F-100
http://www.stratacache.com/

The Stratacache Flyer F-100 Model represents an excellent choice for entry level caching with an aggressive mix of high performance caching (400 tps), superior product management and a favorable product price. Leveraging the power of Novell ICS in concert with a high performance caching hardware architecture, we feel that the Flyer offers a superior caching solution in a slim 1U rackmount case.

In our actual Polymix test runs, our lower-than-average hit ratio (32%) is a result of our decision to test the Flyer with our standard shipping configuration of 256 MB of RAM. We feel that the configuration of the system with 256 MB of RAM provides the best cost/performance ratio and overall customer value for use in the typical Corporate, ISP/ASP, or Carrier environment. However, in a Polymix-2 test environment, 256 MB of memory does not provide our cache the optimal amount of memory for the sustained Polymix-2 workload. We had the option to use 512 MB of memory in the system to "alter" the memory pool size and improve cache performance for the purposes of the test, but felt that, since our standard shipping configuration uses 256 MB, it was in the best interest of our customers to provide results on a standard Flyer model rather than on a unit specially enhanced for the Cacheoff. Unlike some other configurations tested, we test and ship the same product models without alteration.

Please note the exceptionally fast Mean Hit response time (41 msec). We feel that the 41 msec Mean Hit response time and the 400 req/sec rate, combined with the 63 req/sec/$1k, make our system a standout in the field.

Please check the IRCache web site over the next several months as we continue to release updated performance results from Official Polymix testing of further product enhancements.

The Stratacache team would like to thank the Cacheoff team for the opportunity to participate in the Second Cacheoff, and we support the use of Polygraph as a tool to provide objective performance results for caching technology products.

Stratacache M-500
http://www.stratacache.com/

The Stratacache M500 entry was not able to complete a full Polymix-2 test during the allotted time due to differences in testing environments and test preparation, including the "Delayed ACK" issue described by Polyteam (section 9.1). The configuration in our internal Stratacache test labs differed from the Polymix-2 testing environment in Houston.

Our lab testing environment differed from the actual Cacheoff environment in that we did not have the same delayed ACK configuration in our labs as was used at the Cacheoff. Our internal lab test configuration also included a router. The use of the router in our internal testing environment masked the significance of the ACK configuration error on our Polygraph stations in our internal tests. Since the Cacheoff, enhancements have been made to our proxy module to reduce or eliminate the problem surrounding the ACK configuration differences, and we have also properly updated the configuration in our testing lab. We were free to bring a router to the Cacheoff but chose not to, in order to provide the most aggressive cost/performance testing results possible. Unfortunately, the differences in testing environments prevented us from achieving a successful Polymix-2 test run.

The goal of testing in our in-house simulated network environment is to provide performance results that are accurate to the "real world" environment of our customers. That "real world test" includes a routed architecture in nearly every customer scenario, which produces a different traffic architecture from the one we tested at the Cacheoff.

In the next several weeks, we will be retesting our Metroliner M500 series system at the NLANR Labs in Boulder and we will be providing updated results on the Polygraph web site.

If you are considering the purchase of a very high performance caching system providing 2000+ TPS, it would be well worth the time to check out our updated results on the Polygraph site.

The Stratacache team would like to thank the Cacheoff team for the opportunity to participate in the Second Cacheoff, and we support the use of Polygraph as a tool to provide objective performance results for caching technology products.

Swell-1000
http://www.swelltech.com/

Swell Technology would like to thank the Polyteam for providing a fair and thorough test of caching products. It was executed admirably and we look forward to the next cache-off, where we plan to display a wider range of our caching products.

Though we experienced many minor difficulties that prevented us from completing more than one run, we're very excited about the results we achieved. The CPX-1000's nearly ideal hit ratio and large cache size add up to unmatched bandwidth savings. Further, the CPX showed response times that were among the best even under heavy loads and unequaled by any entry under $4000.

Overall, the cache-off results were very nearly what we expected, and we're happy to have had an opportunity to prove our products in a controlled environment under heavy load. We regret not being able to perform a downtime test, due to a BIOS bug that was not fixed by the time of the cache-off. We expect our shipping product to exhibit excellent downtime recovery.

In summary, the Swell Tsunami CPX-1000 has proven itself to be a fast and reliable performer in the low end market. We feel that this, combined with a friendly administration interface and a full range of access and content control features, makes the CPX an excellent choice in the low to mid range web caching proxy category.



$Id: index.sml,v 1.38 2000/02/29 00:49:08 rousskov Exp $