Computerworld

Copying grows along with data, driving attempts to rein it in

There are reasons to duplicate data, but it can get out of hand, analysts say

All that new data flowing into enterprises can bring along an expensive partner: multiple copies.

For better or worse, many types of data are copied multiple times for multiple purposes, including backup, archiving and development work, according to IDC analysts Ashish Nadkarni and Laura DuBois. And once created, those extra bits usually hang around.

"We're really a pack-rat society," DuBois said. "Deletion of data is not really a concept that's frequently practiced."

Bryon Bua, vice president and enterprise technology manager at Admirals Bank in Providence, Rhode Island, has seen that problem firsthand. Before the bank adopted a copy management system throughout its infrastructure, individual departments increased the storage load without really thinking about it.

"The applications guys, the development group, they would make copies of servers," Bua said. "They needed to do this kind of thing for their own testing, but they would never get rid of it, it would be just laying around, using extra space."

Actifio, a startup that specializes in copy data management, has helped draw attention to the issue with products designed to control duplication of data across all enterprise systems. Other vendors have different approaches to handling copies, and the market for technology in this area may soon grow broader.

In a white paper released this week, sponsored by Actifio, IDC attempted to scope out the size of the problem. Its findings are based partly on an international survey of just over 700 enterprise IT managers. Among the survey's findings, a majority of respondents said that less than 25 percent of their storage spending currently goes toward copies of data. But when they looked toward the next 12 months, that figure flipped, with a majority expecting to put 25 percent or more toward copies.

Saving copies of data isn't inherently wasteful, and each type of enterprise has its own requirements for holding onto information, Nadkarni said. But in many cases, data is copied by different applications and storage platforms, and by employees within their own departments, without any central oversight, he said. This can produce unneeded copies that take up costly storage capacity.

Another analyst, Jason Buffington, of Enterprise Strategy Group, holds a similar view.

"Every copy that gets created, it's typically created for the right reasons," Buffington said. But taken together, those good deeds can add up to a huge amount of data that's not needed. Meanwhile, two of the top three areas of capital investment in storage relate to copies and not primary data, Buffington said. "If you can be smarter on how you do this, then that reduces one of the largest capex investments in IT."

"I don't think we have the data yet to say what is too much," IDC's Nadkarni said. But the results of the survey indicated respondents made between 10 and 120 copies of some of their data, he said. "Some people go to the extreme." The key is for enterprises to manage their copies and copying procedures with an eye to the whole organization's storage costs and efficiency, according to IDC.

Actifio was an early entrant to this space, introducing its object-based distributed file system in 2010, but bigger names are starting to take notice, Nadkarni said. Last year, Hitachi Data Systems introduced a data-instance manager to administer data copies across its own storage platforms, and it's likely other major storage players will step in, he said.

There are already a variety of products for making data protection more efficient, such as NetApp's FlexClone software and CommVault's Simpana system. Data deduplication, which detects data that has already been stored so that only new or changed bits are written, is used in many storage platforms.
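To illustrate the basic idea behind deduplication, here is a toy sketch. It is not how NetApp, CommVault, EMC or any other vendor named here actually implements the feature; the fixed 4KB chunk size, the SHA-256 fingerprints and the DedupStore class are all assumptions made for illustration. Data is split into chunks, each chunk is fingerprinted, and a chunk is stored only if its fingerprint hasn't been seen before.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; many real systems use variable-size chunks


class DedupStore:
    """Hypothetical content-addressed store: each unique chunk is kept once,
    keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def write(self, data: bytes) -> list:
        """Split data into chunks, store only unseen chunks, and return the
        list of digests (the 'recipe') needed to rebuild the data."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a duplicate chunk is not stored again
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually held, after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

In this sketch, backing up four distinct chunks and then backing up the same data again with one chunk changed stores five chunks in total rather than eight, because the three unchanged chunks are recognized and skipped.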

Actifio's technology, which is available in appliances for internal use and indirectly through service providers, is designed to let enterprises use the same copy of a given piece of data for many different purposes. When an employee or an application requests data that's been backed up, Actifio can make a virtual copy of it without duplicating the entire chunk of data and putting that on primary storage, said Ash Ashutosh, Actifio's CEO.
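The general notion of a virtual copy can be sketched with a copy-on-write view. This is an illustration of the broad technique, not Actifio's design, which the company hasn't detailed here; the VirtualCopy class and the block-number-to-bytes model of a backup image are hypothetical. Each virtual copy reads from one shared "golden" image and keeps only the blocks it changes, so handing a copy to a developer or a test system doesn't duplicate the whole data set.

```python
class VirtualCopy:
    """Hypothetical copy-on-write view over a shared, read-only base image.
    Reads fall through to the base; writes land in a private overlay."""

    def __init__(self, base: dict):
        self.base = base    # shared image: block number -> bytes (never modified here)
        self.overlay = {}   # blocks this copy has changed

    def read(self, block: int) -> bytes:
        # Prefer this copy's own version of the block, else the shared one.
        if block in self.overlay:
            return self.overlay[block]
        return self.base[block]

    def write(self, block: int, data: bytes) -> None:
        # Only changed blocks consume new space; the base stays untouched.
        self.overlay[block] = data

    def extra_blocks(self) -> int:
        """Number of blocks this copy stores beyond the shared image."""
        return len(self.overlay)
```

Under these assumptions, several copies made from the same image each cost only as much space as the blocks they modify.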

The system isn't limited to creating just that one copy, Ashutosh said. Customers can prescribe any number of copies of a given piece of data, including extra copies on tape or in remote locations, he said. IT administrators can use Actifio to set specific service levels for each type of data. The system is compatible with any vendor's storage equipment and any operating system or virtualization platform, though not with mainframes, he said. To make copies, it works directly with applications rather than with primary storage platforms, Ashutosh said.

EMC, the dominant provider of purpose-built backup appliances, thinks old backup practices have been the main culprit in producing excess copy data. Newer backup systems have largely solved the problem just by using deduplication, said Rob Emsley, senior director of product marketing for EMC's Backup Recovery Systems Division. EMC's own backup systems work with other vendors' platforms for primary storage, Emsley said.

Though deduplication can save some storage capacity, EMC's approach perpetuates the inefficiencies of conventional backup, said Andrew Gilman, Actifio's senior director of marketing. Because it creates copies in backup that need to be shifted to primary storage when needed, EMC's technology drives the need for more storage capacity, he said.

Some users of Actifio say it has cut down on the share of their storage capacity that's dedicated to copy data. That's important to them because the amount of underlying data from which those copies are made continues to grow at a rapid pace.

Bua, at Admirals Bank, said the portion of his storage capacity that's devoted to copy data has fallen from more than 30 percent to less than 5 percent since the bank adopted Actifio almost a year ago. Yet Bua forecasts his company will add about 24TB of capacity this year just as it has in the past few years.

"We love data. We collect data like crazy," Bua said. Because it specializes in commercial real estate, Admirals Bank constantly collects data about buildings in its region, including ones it might want to acquire, he said.

Instituting a new data copying regimen can take some IT effort, according to Robert Reeder, CIO at Rezolve Group, a college financial-aid services company. Rezolve implemented copy management with Actifio at the same time it switched from direct-attached storage to a SAN (storage area network), and employees supported both efforts "philosophically," Reeder said.

"But we said, in doing that, 'it also means you can't just prolifically add multiple copies,' and [defined] the procedure. We just had to be responsive" when employees needed the new technology set up or explained, Reeder said.

"It's mostly people and process, more than it's technology," ESG's Buffington said. However, introducing a copy data management system can be the catalyst for better practices in an enterprise, he said. "This, I think, is a case of, 'If you build it, they will come.'"

Stephen Lawson covers mobile, storage and networking technologies for The IDG News Service. Follow Stephen on Twitter at @sdlawsonmedia. Stephen's e-mail address is stephen_lawson@idg.com