Computerworld

Cache Panache Cuts WAN Costs

FRAMINGHAM (04/10/2000) - Effectively placed caching hardware can help you deliver fresh Web pages to end users, improve the efficiency of your Web site, cut WAN access costs and even shore up security against outside hacker attacks.

ISPs and carriers have been quick to cash in on the technology's back-haul savings, but new freshness algorithms, which allow cache devices to better anticipate which Web objects to store, are making caching more attractive to businesses as well.

Here's the basic idea: Storing Web data close to end users allows you to conserve WAN bandwidth and save money, because it's faster and cheaper to retrieve objects from the edge of your network than from the bowels of the Internet.

Reverse caching, or placing a cache between your Web servers and visitors to your site, can pump up Web server performance while adding Web-site traffic surge protection. To gain the most benefit from caching technology, you can combine the two types of caches, edge caches near your users and reverse caches near your servers, installing the devices at e-commerce Web server farms, in remote offices, at your corporate headquarters and in front of intranet Web servers.

While a caching device can work with a firewall, it also serves as its own security layer by limiting outside access to internal corporate network resources. Because all requests for Web pages go through the cache, which forwards a request only when it can't find the page itself, the cache provides another layer of protection against outside hackers who want to gain access to internal IP addresses.

There are many variables in determining which type of cache deployment is best for you. Some factors you need to consider are deployment price vs. raw performance speed, hit rate vs. cache location, the types of freshness algorithms employed and placement of the cache with respect to end users.

To analyze cache deployment prices against raw performance speeds, see our review table of the products tested in IRCache's bake-off. IRCache combined total price, including cost of installation, with raw performance speed to come up with a figure that represents how much performance $1,000 can buy.
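As a rough illustration of how such a figure can be computed, here is a minimal Python sketch. It assumes raw performance is expressed as requests per second; the function name and the sample numbers are hypothetical and are not IRCache results.

```python
def performance_per_thousand_dollars(requests_per_second: float,
                                     total_price_usd: float) -> float:
    """Return the throughput bought by each $1,000 of total price.

    total_price_usd is assumed to include installation, as in the
    combined price described above.
    """
    if total_price_usd <= 0:
        raise ValueError("total price must be positive")
    return requests_per_second / (total_price_usd / 1000.0)


# Hypothetical appliance: $5,000 installed, sustaining 100 requests/second.
print(performance_per_thousand_dollars(100.0, 5000.0))
```

A higher result means more performance for the money, which lets a cheap, slower box be compared directly with an expensive, faster one.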

Multimedia features

Of course, there are other features to look at besides the price/performance.

For example, Network Appliance Inc.'s NetCache is a high-end device targeted at ISPs, carriers and large enterprise customers. It has a feature that allows you to distribute, as opposed to store, Apple's QuickTime, Microsoft's Windows Media or RealNetworks' G2 multimedia formats. The cache acts as a transparent proxy, allowing live content to be split and shared by nearby end-user clients.

Moving the multimedia distribution to the network edge can drastically reduce origin server load and WAN bandwidth consumption.

InfoLibria's DynaCache, like NetCache, is targeted at ISPs and enterprise networks. DynaCache's failover pass-through allows your network to avoid traffic interruption when power is lost - its electromechanical bridge lets traffic bypass the cache. But configuring DynaCache presents a steep learning curve.

Conversely, Cobalt Networks' CacheRaQ 2 offers a jump-start into caching for little money and expertise. An LCD and three buttons let administrators completely configure the unit with IP information and have it running in less than 15 minutes. The Web-based interface can be used to monitor the size of the cache, providing information about overall performance, transmit and receive rates, and caching efficiency.

Also targeted at small and midsize companies is Quantex Microsystems' WebXL, a Novell ICS engine cache appliance designed for rapid deployment in an ISP or enterprise environment. WebXL is in its first release, so technical support is still spotty. WebXL features installation wizards with easy menus to simplify the process.

If you're looking for variety, CacheFlow offers products that range from the entry-level CacheFlow 110 to the top-of-the-line CacheFlow 5220. Each device offers an optimized operating system called CacheOS. CacheOS compares each URL and indexes Web objects according to type and similarity to speed retrieval - for example, storing all sports-related Web objects in the same area on the hard drive.

CacheOS also identifies and fetches popular content in advance so that it will be fresh when the next request arrives. Called the adaptive refresh algorithm, the feature monitors the frequency of use and change for each stored Web object.

Using a heuristic, it then predicts when the object will become stale and proactively refreshes the object before the next hit is likely to happen.
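The general idea of refreshing an object before its next likely hit can be sketched in a few lines of Python. This is not CacheOS's actual algorithm, which isn't described in detail here; the class, function names and the simple average-interval prediction are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectHistory:
    """Observed history for one cached object (timestamps in seconds)."""
    request_times: List[float] = field(default_factory=list)
    change_times: List[float] = field(default_factory=list)


def mean_gap(times: List[float]) -> Optional[float]:
    """Average gap between consecutive events, or None with too little data."""
    if len(times) < 2:
        return None
    return (times[-1] - times[0]) / (len(times) - 1)


def should_refresh_early(history: ObjectHistory, fetched_at: float) -> bool:
    """Guess whether the cached copy will go stale before the next request.

    Assumed heuristic: predict the next change and the next hit from the
    average gaps seen so far, and refresh proactively if the copy fetched
    at fetched_at is expected to change before that hit arrives.
    """
    change_gap = mean_gap(history.change_times)
    request_gap = mean_gap(history.request_times)
    if change_gap is None or request_gap is None:
        return False  # not enough history to predict anything
    predicted_change = history.change_times[-1] + change_gap
    predicted_hit = history.request_times[-1] + request_gap
    return fetched_at < predicted_change <= predicted_hit


# Hypothetical object that changes every ~100 s and is requested every ~20 s.
busy = ObjectHistory(request_times=[250.0, 270.0, 290.0],
                     change_times=[0.0, 100.0, 200.0])
print(should_refresh_early(busy, fetched_at=200.0))
```

A production cache would weigh object popularity and bandwidth cost as well; the sketch only shows how use and change frequency can feed a prediction.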

If you're really concerned about price and are willing to invest time and skills, there's NLANR Squid, a free open source cache product. Squid requires considerable technical expertise to configure, run and maintain.

Cache comparisons

Once you've scoped out available products, you can evaluate further, focusing on hit rate and freshness algorithms.

First consider the hit rate: the percentage of requests that a cache serves itself rather than passing on to the origin server. Much of the hyperbole in vendor advertising is about hit rate, but here's the kicker:

Caches used on the edge of a carrier or ISP network, where users access millions of pages of data from the Internet, rarely see hit rates above 50 percent. However, if the cache sits in front of a company's Web site where server information and the number of data pages is finite and easy to predict, hit rates of 80 percent or higher are possible.
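To check a vendor's claim against your own traffic, hit rate is simple to compute from cache counters. The Python sketch below is illustrative only; the function name and the request counts are hypothetical.

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """Percentage of requests served from the cache rather than the origin."""
    total = hits + misses
    if total == 0:
        return 0.0  # no traffic yet, so report zero rather than divide by zero
    return 100.0 * hits / total


# Hypothetical counters: an ISP edge cache versus a reverse cache in front
# of a single Web site with a finite, predictable set of pages.
print(hit_rate_percent(hits=450, misses=550))
print(hit_rate_percent(hits=850, misses=150))
```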

Fresh or stale

Besides hit rate, you should consider the freshness feature of the caching hardware. Because caches store objects that change over time, the devices must determine the freshness of each object and replace outdated objects as they change. Vendors differentiate cache devices with freshness algorithms that automatically update a Web object. Algorithms can define refresh time, send "get if modified" requests, predict object life expectancy and perform active caching.

One algorithm lets you lengthen the cache refresh time, which limits how often the cache has to go back to the origin server.

Another algorithm can send a "get if modified" request to the origin server. Each time a client browser requests an object, the cache checks with the origin server and replaces its stored copy only if the object has changed.

A third algorithm lets administrators set controls by estimating the life expectancy of each object based on the time elapsed since the object was last modified.
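These two approaches can be sketched in Python. The "get if modified" check corresponds to HTTP's If-Modified-Since request header, which the origin server answers with status 304 (Not Modified) when the object is unchanged. The life-expectancy estimate below treats an object as fresh for a fraction of its age at fetch time; the 10 percent factor and all function names are assumptions for illustration, not any vendor's implementation.

```python
from email.utils import formatdate

AGE_FACTOR = 0.1  # assumed, administrator-tunable fraction of object age


def estimated_lifetime(fetched_at: float, last_modified: float,
                       factor: float = AGE_FACTOR) -> float:
    """Estimate seconds of freshness from how old the object was when fetched."""
    return max(0.0, (fetched_at - last_modified) * factor)


def is_fresh(now: float, fetched_at: float, last_modified: float,
             factor: float = AGE_FACTOR) -> bool:
    """True while the cached copy is younger than its estimated lifetime."""
    return (now - fetched_at) < estimated_lifetime(fetched_at, last_modified, factor)


def conditional_get_headers(last_modified: float) -> dict:
    """Build the request header for a 'get if modified' check."""
    return {"If-Modified-Since": formatdate(last_modified, usegmt=True)}


def apply_validation(status_code: int, cached_body: bytes,
                     response_body: bytes) -> bytes:
    """Keep the cached copy on 304 Not Modified; otherwise take the new body."""
    return cached_body if status_code == 304 else response_body


# An object last modified 1,000 seconds before it was fetched is treated
# as fresh for 100 seconds under the assumed 10 percent factor.
print(estimated_lifetime(fetched_at=1000.0, last_modified=0.0))
print(conditional_get_headers(0))
```

An object that hasn't changed in months gets a long estimated lifetime, while one edited minutes ago is rechecked quickly, which is the intuition behind basing life expectancy on time since last modification.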

Cashing in on location

Once you get past freshness and hit rate, it's all about location. As a rule, always store frequently accessed content as close as possible to the user.

For large organizations, caching at the network edge reduces WAN traffic by 25 percent to 30 percent. Besides decreasing costs, it enhances response rates and improves the quality of service to users accessing the Web site.

In fact, companies can deploy proxy caches, in which each browser is manually pointed to the cache, and transparent caches, in which a switch diverts HTTP requests to the caching device, in tandem across the Internet and intranet to save money.

High-speed remote access for people working at home requires caching at the ISP point of presence because of the slowness of the public switched network. ISPs use transparent caches so they don't have to reconfigure each browser, and they save on back-haul connection costs for each Web object retrieved from the cache.

Caches can also be a cost-efficient alternative to replication at mirror sites.

The quality of service dramatically improves for remote or international offices when a cache server is placed at each site. Long distances for back-hauled traffic, multiple router hops and points of congestion are speed killers. Putting an accelerator in a far-flung office negates the latency issues and keeps down access costs.

Proxy caches are particularly useful on enterprise intranets. They act as firewalls to shield servers against Internet attacks. A proxy also can act as policy enforcer for a company's Web access, because browsers are configured to send requests for Web objects directly to the proxy cache.

Additionally, reverse proxy caching in front of your Web server farm can decrease the number of origin servers needed. It also acts as a buffer to protect Web servers against spikes in traffic and helps keep Web object display fast.

Cache management

Like any device, caches need to be installed and managed. Only two vendors, Cobalt and Quantex, offer caching hardware that installs in 15 minutes flat.

While Squid and some other products are free, the real costs are in setup and management, plus the time spent learning more about Linux and Unix. Also consider each product's management capabilities. Notable features include SNMP Management Information Base II (MIB-II) support, e-mail or pager event notification, and the ability to remotely monitor and configure the cache device.

To ease administration, Cobalt Networks' CacheRaQ 2 offers browser-based and command line interfaces, so the device is simple to configure and easy to manage. CacheFlow's and Quantex Microsystems' devices provide an additional resource by employing a Java-based graphical user interface that links to indexed online documentation. This FAQ index offers technical advice for increasing hit rates and improving cache freshness. Because a cache must be tuned for each network, administrators benefit from this management and statistics reference when optimizing the cache setup.

On the management side, InfoLibria's DynaCache offers improved capability - not only does it log an event, but an administrator can set the cache to automatically invoke a bypass mode or completely shut down the cache.

Cluster management

Network Appliance's NetCache provides tools for managing cache clusters, which can be set up in virtual and logical configurations.

Managing a cluster requires administrators to designate a master cache along with slave caches in which information gets checked against a battery of stored Web objects. Administrators designate settings for keeping a set number of object copies, storing like information contiguously and setting cluster table updates.

With good price/performance, hit rate, algorithms and a strategy to locate caches, an object saved to cache could be money earned.

Clegg is founder and principal of Beacon Strategies, a Mountain Green, Utah, company that specializes in competitive analysis for the Internet backbone and telecommunications industry. He can be reached at sclegg@beaconstrategies.com.