Results: CacheFlow DataComm-1

Web Polygraph

In November 1999, we benchmarked two CacheFlow products: a single CF-5000 unit and a cluster of five CF-5000s. The caches were tested in our Lab using the DataComm-1 workload. We followed the DataComm rules, methodology, and presentation format to ease comparison of the results.

DataComm tests were first executed for Data Communications magazine in June 1999. A detailed description of the DataComm-1 workload and the June tests is available elsewhere. As many of our readers know, CacheFlow participated in the June tests with a two-head CF-5000 cluster. Since June, CacheFlow has changed the CF-5000's configuration and improved its performance under the DataComm-1 workload. We label the June results in this presentation as ``CF-June'' to ease the comparison. See the ``Box Configurations'' section for important configuration details of the tested products.

As always, we do not give purchasing advice. We provide performance measurements and expect the reader to use this information, along with other important factors, in their decision-making process.

1. Executive Summary

The ``Executive Summary'' table below summarizes the performance results. Complete Polygraph logs are also available for those who want to reproduce the results or extract measurements not presented here.

Box                          Total     Throughput  Mean Response      Hit     Price/Perf   Persistent
                             Price     (req/sec)   Time (sec)         Ratio   (req/sec     Connections
                             ($)                   Hit   All   Miss   (%)     per $1000)   Clt   Srv
CF-June (two beta CF-5000s)  191,990    2000       0.27  2.05  3.90   50.91   10.4         yes   no
Unit (one CF-5000)           179,990    2300       0.08  1.49  3.21   54.98   12.7         yes   yes
Cluster (five CF-5000s)      805,985   13000       0.08  1.51  3.21   54.67   16.1         yes   yes

The ``Total Price'' column is the sum of the list prices of the cache(s) and the network gear in the test setup. In our price/performance analysis, we use the total price rather than the cost of the cache alone to account for expensive equipment that may be required to achieve high performance numbers and/or aggregate individual caches into clusters. If the reader already has the required network components in place, the price/performance ratios should be adjusted accordingly.

Mean response time is reported separately for hits and misses to show the proxy savings and overheads on the two most important request paths. The ``All'' column depicts response time for all request classes.

The ``Price/Perf'' column shows throughput (in req/sec) normalized by the total Box price (in thousands of dollars). In other words, ``how much throughput can one thousand dollars buy?'' To be precise, the column should have been named ``performance/price'', but we decided to use the traditional wording.
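The normalization described above can be sketched in a few lines of Python. This is only an illustration: the function name price_performance is ours, and the inputs are the rounded throughput and total price figures from the ``Executive Summary'' table, so the computed ratios can differ from the published column in the last digit.

```python
def price_performance(throughput_rps: float, total_price_usd: float) -> float:
    """Return throughput per thousand dollars (req/sec per $1000)."""
    if total_price_usd <= 0:
        raise ValueError("total price must be positive")
    return throughput_rps / (total_price_usd / 1000.0)


# (throughput in req/sec, total price in $) as listed in the summary table.
boxes = {
    "CF-June": (2000, 191_990),
    "Unit": (2300, 179_990),
    "Cluster": (13000, 805_985),
}

for name, (rps, price) in boxes.items():
    print(f"{name}: {price_performance(rps, price):.2f} req/sec per $1000")
```

A reader who already owns the network gear could pass the cache-only price instead of the total price to obtain the adjusted ratio.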

The ``Persistent Connections'' column shows whether a box supports HTTP/1.1 persistent connections on the client (Clt) and server (Srv) sides of the proxy.

All runs finished with at most 0.1% of failed transactions. Note that the rules disqualify a run with more than 3.0% of errors.
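As a minimal sketch, the disqualification rule can be written as a simple check. The function names and the transaction counts in the usage lines are hypothetical; only the 3.0% threshold comes from the rules quoted above.

```python
MAX_ERROR_PCT = 3.0  # a run with more than this share of errors is disqualified


def error_pct(failed: int, total: int) -> float:
    """Percentage of failed transactions in a run."""
    if total <= 0:
        raise ValueError("total transaction count must be positive")
    return 100.0 * failed / total


def run_qualifies(failed: int, total: int, max_error_pct: float = MAX_ERROR_PCT) -> bool:
    """True if the run stays within the allowed error percentage."""
    return error_pct(failed, total) <= max_error_pct


# Hypothetical counts, for illustration only.
print(run_qualifies(failed=1, total=1000))   # 0.1% of errors
print(run_qualifies(failed=31, total=1000))  # 3.1% of errors
```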

2. Performance Details

This section gives a detailed analysis of major performance measurements.

2.1 Throughput

The chart below shows request rates from the ``Executive Summary'' table. Considering that CF-June is a cluster of two CF-5000 units (about 1000 req/sec per unit), the request rate supported by a CF-5000 has more than doubled compared to a similar unit tested in June. Interestingly, a cluster of five identical caches supports a higher per-unit request rate than a single unit (2600 req/sec versus 2300 req/sec).
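The per-unit comparison is plain division of the total request rate by the number of caches. A short sketch, using the throughput figures from the ``Executive Summary'' table and the unit counts from the ``Box Configurations'' section (the names below are ours):

```python
def per_unit_rate(total_rps: float, units: int) -> float:
    """Request rate per cache for a configuration of identical units."""
    if units <= 0:
        raise ValueError("unit count must be positive")
    return total_rps / units


# (total throughput in req/sec, number of CF-5000 units)
configs = {
    "CF-June": (2000, 2),
    "Unit": (2300, 1),
    "Cluster": (13000, 5),
}

rates = {name: per_unit_rate(rps, n) for name, (rps, n) in configs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} req/sec per unit")

# Improvement of the November single unit over a June unit.
print(f"Unit vs CF-June: {rates['Unit'] / rates['CF-June']:.1f}x")
```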

Raw Throughput

The second throughput chart normalizes the raw request rate by the price of a Box. This chart is helpful in comparing the performance of caches from different price and throughput ranges. Price is a universal, albeit imperfect, measure of Box complexity and ability. This chart answers an important question: ``How much throughput can one thousand dollars buy?'' It is often a good place to start your price/performance analysis.

Normalized Throughput

Throughput-wise, the clustered solution shows the best return on a dollar. This is not a surprise given the lower per-unit price and higher per-unit throughput of the cluster.

2.2 Hit Ratio

Hit ratio is a standard measurement of a cache's performance. The DataComm-1 workload offers a hit ratio of 55%, so a cache cannot achieve a higher hit ratio. However, due to various overload conditions, insufficient disk space, deficiencies of the object replacement policy, and other reasons, the actual (measured) cache hit ratio may be smaller than the offered 55%.

Document Hit Ratio

The ``Document Hit Ratio'' chart shows how well a cache maintains its hit ratio under the highest load. In the June tests, CacheFlow demonstrated sub-optimal hit ratios around 50%. Clearly, the November version of the CF-5000 does not suffer from hit ratio degradation and maintains a close-to-ideal hit ratio, with a marginal degradation of 0.6% for the clustered solution. CacheFlow attributed past hit ratio problems to L4-switch limitations. The caches tested in November used different switching gear.
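One way to read the degradation figure is as the shortfall of the measured hit ratio relative to the offered 55%. The sketch below makes that assumption explicit; the function name is ours and the measured values are taken from the ``Executive Summary'' table.

```python
OFFERED_HIT_RATIO_PCT = 55.0  # hit ratio offered by the DataComm-1 workload


def relative_shortfall_pct(measured_pct: float,
                           offered_pct: float = OFFERED_HIT_RATIO_PCT) -> float:
    """Shortfall of the measured hit ratio, as a percentage of the offered one.

    Assumption: degradation is expressed relative to the offered hit ratio,
    not in absolute percentage points.
    """
    return 100.0 * (offered_pct - measured_pct) / offered_pct


measured = {"CF-June": 50.91, "Unit": 54.98, "Cluster": 54.67}
for name, hit_ratio in measured.items():
    print(f"{name}: {relative_shortfall_pct(hit_ratio):.1f}% below offered")
```

Under this reading, the cluster's 54.67% corresponds to a shortfall of about 0.6%.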

2.3 Response Time

The ``Mean Response Time'' chart below shows response times under peak load. To simulate real-world conditions, the DataComm-1 workload introduces an artificial delay on the server side. The delays are normally distributed with a 3 sec mean and a 1.5 sec deviation. They play a crucial role in creating a reasonable number of concurrent ``sessions'' in the cache. The delays, along with the 55% hit ratio, also affect transaction response time. The ideal response time for this test would be 1.35 sec (45% of the 3 sec mean), corresponding to zero-delay hits and 3 sec delay misses. This ideal time is shown on the chart with a black horizontal line.
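The ideal response time is the expected value over hits and misses. A minimal sketch of that arithmetic, assuming (as the text does) that hits cost nothing and misses cost exactly the mean server delay; the function name and the hit_delay_sec parameter are ours:

```python
def ideal_mean_response_time(hit_ratio: float,
                             mean_miss_delay_sec: float,
                             hit_delay_sec: float = 0.0) -> float:
    """Expected response time for a given hit ratio and per-class delays."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * hit_delay_sec + (1.0 - hit_ratio) * mean_miss_delay_sec


# DataComm-1: 55% offered hit ratio, 3 sec mean server-side delay.
ideal = ideal_mean_response_time(hit_ratio=0.55, mean_miss_delay_sec=3.0)

# "All" mean response time of the single CF-5000 from the summary table.
measured_unit_all = 1.49

print(f"ideal: {ideal:.2f} sec")
print(f"single-unit overhead over ideal: {measured_unit_all - ideal:.2f} sec")
```

The same function reproduces the measured ``All'' column reasonably well when fed the measured hit ratio and the measured hit and miss times, which is a quick sanity check on the table.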

Mean Response Time

Based on response time data, CacheFlow achieves good scale with no degradation of performance due to clustering. Past response time problems are also solved (probably due to improved hit ratio).

Less than 100 msec (about 6%) separates the CF-5000 from the average response time leaders of the June tests. The difference may be attributed, in part, to short periods of performance degradation observed during the 4 hour tests (see the ``Raw Traces'' section). We do not know the exact reason for those response time bumps. However, since the problems

we speculate that the caches were at fault.

3. Raw Traces

The graphs below show document hit ratio and response time traces averaged at 5 minute intervals. The red lines correspond to a single Polygraph client, while blue lines are the averages across all clients. When the variance is small, blue lines may be hard to see because averages are very close to individual traces.
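The averaging behind such traces can be sketched as follows. This is an illustration only: the (time, value) sample format and both function names are hypothetical and do not reflect the actual Polygraph log format or tools.

```python
from collections import defaultdict

INTERVAL_SEC = 5 * 60  # traces are averaged at 5 minute intervals


def interval_means(samples, interval_sec=INTERVAL_SEC):
    """Average (time_sec, value) samples of one client per interval.

    Returns a dict mapping interval start time (sec) to the mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for time_sec, value in samples:
        start = int(time_sec // interval_sec) * interval_sec
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


def average_across_clients(per_client_means):
    """Average the per-interval means of several clients (the blue lines)."""
    merged = defaultdict(list)
    for means in per_client_means:
        for start, value in means.items():
            merged[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(merged.items())}


# Hypothetical response time samples (sec) from two clients.
client_a = interval_means([(0, 1.0), (100, 2.0), (300, 4.0)])
client_b = interval_means([(10, 2.5), (310, 2.0)])
print(average_across_clients([client_a, client_b]))
```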

The traces are plotted with the same scale to ease the comparison. Both warm-up (first 24 minutes) and measurement phases are shown.




4. Box Configurations

Here are the configuration details for the tested products. Note that despite the same name and software version number, the CF-June configuration differs substantially from the CF-5000 tested in November. The caches tested in November are currently available for purchase.

Product and version tested           Total     Unit list  Units  RAM   Cache  Disks  Switching gear                NICs        OS
                                     price     price             (GB)  (GB)                                        (Mbps x n)
                                     (US$)     (US$)
CF-5000/2.1 June edition (cluster)   191,990    79,995    2       8     237    14    Foundry ServerIron, FastIron  100 x 6     CacheOS
CF-5000/2.1 November ed (unit)       179,990   150,000    1       4     243    27    Two Alteon-180                1000 x 2    CacheOS
CF-5000/2.1 November ed (cluster)    805,985   150,000    5      20    1215   135    ArrowPoint CS-800             1000 x 10   CacheOS

All numbers are totals for the tested configuration unless noted otherwise. CF-5000 is a single CPU unit. Photographs of the configurations are available.

5. Vendor Comments

It is a Polyteam tradition to give vendors a chance to comment on the tests after they have seen the review draft. The comments below are a verbatim vendor submission. Polyteam has not verified any of the claims, promises, or speculations that these comments may contain.

In November 1999, CacheFlow engaged Polyteam to follow-up on the June bake-off with updated Polygraph testing on the CacheFlow 5000 Internet caching appliance. Our thanks go to Polyteam for working these tests into their schedule.

With the results documented in this report, the CacheFlow 5000 has assumed the Polygraph performance leadership position for single unit and cluster configurations. It is important to note that CacheFlow produces a full range of caching appliances (110, 500, 3000, 5000) for varying sets of requirements. All products are based on patent-pending CacheOS, the only purpose-built caching operating system in the industry, so all products will generate the high throughput and low response times validated in these tests.

In addition to outstanding performance, CacheFlow appliances running CacheOS deliver other key advantages, including content freshness, content control, ease of management, and high reliability. CacheFlow's technology recently received important recognition from Information Week and PC Magazine, who both selected the CacheFlow appliance as one of their "Best Products of 1999".

Please visit our Web site at www.cacheflow.com to learn more about our technology and successes with E-commerce, Enterprise, and Service Provider customers.

6. Polyteam Comments

The June and November performance results are especially interesting because they show how the performance of the same product line changes with time. The significant performance improvement of the CF-5000 under the DataComm-1 workload is clear. We, however, caution the reader against interpreting the improvement as just the result of benchmark- or workload-specific optimizations. While Polyteam cannot verify whether such modifications took place, earlier CacheFlow statements, including the ``vendor comments'' for the June tests, suggest that the positive change in performance can be attributed to an overall improvement of the product. While the June version of the CF-5000 was essentially a pre-beta release, the November edition is available for purchase and has been working at many customer sites.

The Web Polygraph benchmark often drives the development of caching products. While this keeps Polyteam in business, we understand that some benchmark-driven optimizations may not be important in real-world environments and, vice versa, some optimizations that are noticeable in real-world conditions may not be stressed by the benchmark. This problem is, of course, not specific to Polygraph. The only practical solution is to constantly improve and verify the workloads. The second IRCache bake-off and our new PolyMix-2 workload are steps in that direction.



$Id: index.sml,v 1.5 2000/01/10 17:03:48 rousskov Exp rousskov $