Prognosys Biosciences

Zmanda protects biotechnology data

Colossal amounts of molecular data are critical inventory for Prognosys Biosciences, a biotechnology company based in La Jolla, CA. The data, which is collected from DNA sequencers, resides on a RAID storage server that has approximately 20TB of usable storage space.

“The data is critically important to the company, and it needs to be protected from equipment failure and other risks,” says Mike Thompson, Ph.D., of Prognosys.

The human genome has three billion base pairs. The sequence of bases determines differences between people, and the information about these differences is used to improve medicine and to understand the effect of specific drugs. Prognosys uses a sequencer called the Illumina Genome Analyzer II, which generates results for discoveries in genomics, epigenomics, gene expression analysis, and protein-nucleic acid interactions. Each run of the sequencing instrument results in almost a terabyte of data. Once collected, the raw data doesn’t change. Dynamic data results from calculations performed on the raw data.

Open source backup moves in
Recently, as Prognosys ramped up both internal projects and its sequencing services operation, it became clear that the company needed a solid backup and recovery system. Prognosys evaluated proprietary backup and recovery products alongside Zmanda’s open source software for backing up both the unchanging raw sequencing data and the dynamic data derived from computational analysis.

The deciding factors in choosing Zmanda’s Amanda Enterprise Backup Server software and Solaris Client were price (Amanda Enterprise Backup Server costs 80 percent less than proprietary software), the ability to encrypt data on the client or the server, access to the open source code, and the promise of customer support.

“We compress the data at around 70 percent before it’s written to tape. If anything happens to the Zmanda software down the road and we aren’t able to use it, we can use regular gzip to uncompress the data,” says Thompson. “Zmanda remotely installed the software on the backup server and storage server, tested it, and made sure we were comfortable with the product. The interaction we had with them for the price was unbeatable.”
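
The gzip fallback Thompson describes can be illustrated with a minimal sketch. It assumes the compressed dump has already been copied from tape into an ordinary file and that its payload is a standard gzip stream. The `recover_gzip_stream` helper, the file paths, and the optional `header_bytes` skip are hypothetical names chosen for illustration and are not taken from Zmanda's documentation.

```python
import gzip
import shutil


def recover_gzip_stream(src_path, dst_path, header_bytes=0):
    """Decompress a gzip stream stored in src_path into dst_path.

    header_bytes is a hypothetical knob: if the backup tool wrote a
    fixed-size header in front of the gzip data, skip that many bytes.
    """
    with open(src_path, "rb") as raw:
        raw.seek(header_bytes)  # jump past any non-gzip header
        with gzip.GzipFile(fileobj=raw, mode="rb") as compressed:
            with open(dst_path, "wb") as out:
                # Stream in chunks so large dumps never sit in memory.
                shutil.copyfileobj(compressed, out)
```

Because only the Python standard library is involved, a recovery of this kind would not depend on the backup product being installed, which is the point of the quote.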

During the installation, Zmanda’s technicians helped Prognosys overcome some network challenges and back up raw data sets more efficiently. They helped Prognosys increase the bandwidth between the backup server and the storage server so that the company could make fuller use of the software’s smart scheduling capabilities, which keep network traffic low while machines collect data.

“The static backup of raw data sets was new to Zmanda,” says Thompson. “They wrote a script to automate their backup and were not only knowledgeable in what they were used to doing. Zmanda’s engineers went out of their way to develop a solution that worked for us.”

Zmanda tames static and dynamic data
The Prognosys Biosciences network consists of Linux, Solaris, and Mac OS X servers and clients. Scientific instruments connect to Windows clients.

Computation occurs on a Linux server, which analyzes data from the Illumina Genome Analyzer II and aligns sequence data against human and other genomes. The Genome Analyzer II generates 50 million 36-base reads per run.
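
To put the read counts in perspective, the short calculation below multiplies out the figures quoted in this article. It is a rough sketch only: the function names are illustrative, and the ratio to the human genome is a raw base count that ignores read overlap, alignment, and quality filtering.

```python
READS_PER_RUN = 50_000_000          # reads per run, as quoted above
BASES_PER_READ = 36                 # read length, as quoted above
HUMAN_GENOME_BASES = 3_000_000_000  # "three billion base pairs"


def bases_per_run(reads=READS_PER_RUN, read_length=BASES_PER_READ):
    """Total bases sequenced in one run."""
    return reads * read_length


def genome_fraction(total_bases, genome_size=HUMAN_GENOME_BASES):
    """Raw base count expressed as a fraction of the genome size."""
    return total_bases / genome_size


total = bases_per_run()
print(total)                   # 1800000000
print(genome_fraction(total))  # 0.6
```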

A Dell PowerEdge 2950 serves as the backup server. The 2950 has quad-core Xeon processors and 1.5TB capacity, and it runs Red Hat Enterprise Linux 5. It connects to a Sun Fire X4500 RAID storage server running Solaris, which backs up the data daily and dumps it when full to a Dell PowerVault ML600 tape library.

The Dell library has 36 slots for 800GB LTO tapes. Dynamic data populates four tapes that recycle in a four-week rotation. Other tapes are dedicated to the backup of raw data. After Prognosys finishes a run with the Illumina Genome Analyzer II, it writes the data to tapes. Backup tapes are sent to offsite secure storage.
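
The slot and tape figures above translate into a simple capacity estimate. The sketch below assumes every slot holds a data tape, that 800GB is the uncompressed capacity of each tape, and that 1TB equals 1,000GB. These are assumptions made for illustration rather than details confirmed by Prognosys. Because tapes are sent offsite, the slot count bounds only what is loaded at one time, not the total archive.

```python
import math

SLOTS = 36          # library slots, as quoted above
TAPE_GB = 800       # capacity per LTO tape, as quoted above
DYNAMIC_TAPES = 4   # tapes in the four-week dynamic-data rotation


def library_capacity_gb(slots=SLOTS, tape_gb=TAPE_GB):
    """Capacity of the tapes that fit in the library at one time."""
    return slots * tape_gb


def tapes_needed(data_gb, tape_gb=TAPE_GB):
    """Whole tapes required to hold data_gb without compression."""
    return math.ceil(data_gb / tape_gb)


print(library_capacity_gb())                       # 28800
print(library_capacity_gb(SLOTS - DYNAMIC_TAPES))  # 25600 left for raw data
print(tapes_needed(1000))                          # 2 tapes for a 1TB run
```

Compression would reduce the tape count, so these numbers are best read as an upper bound.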

The average weekly backup size for raw data sets is one terabyte, and the average daily backup size for dynamic data is around 500GB. The average time for weekly full backups to tape is seven hours. Incremental backups of dynamic data to disk average 2.5 hours.
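
These sizes and durations imply an approximate sustained throughput. The sketch below assumes that the seven-hour full backup corresponds to the one-terabyte weekly raw data set and that the 2.5-hour incremental corresponds to the 500GB of daily dynamic data. The article does not state that pairing explicitly, so the results are illustrative only. Decimal units are used (1GB = 1,000MB).

```python
def throughput_mb_per_s(size_gb, hours):
    """Average throughput in MB/s for size_gb moved in the given hours."""
    return size_gb * 1000 / (hours * 3600)


# Weekly full backup of raw data to tape (assumed pairing).
print(round(throughput_mb_per_s(1000, 7), 1))    # 39.7

# Daily incremental backup of dynamic data to disk (assumed pairing).
print(round(throughput_mb_per_s(500, 2.5), 1))   # 55.6
```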

Prognosys Biosciences leverages Zmanda for peace of mind
Since implementing Zmanda’s software, Prognosys has been able to scale up operations without risking data loss. The Amanda Enterprise solution allows Prognosys to grow its environment and continue adding clients as needed. In the near future, Prognosys plans to add more compute capacity and possibly more storage servers.

Prognosys has backed up approximately 20TB since the implementation. Thompson has also tested restoration. To restore data that has been dumped to tape, the user selects the restore tab in Amanda Enterprise, which maintains a database of all the directories and files that have been backed up. The user requests restoration from a specific time period, and Amanda Enterprise tells the user which tape to load. Server load has improved, as has the bandwidth between the server and client.

“Another thing we really like about the Zmanda backup software is that it’s a Web application,” says Thompson. “We can monitor the progress of the backup from any machine on the network.”

In addition, the Web application interface allows for backups to be administered by non-technical staff. This is a plus for companies without dedicated storage administrators.

For companies that process massive amounts of data, robust and dependable backup and recovery are required for success. Companies from a wide variety of vertical markets are successfully protecting their data assets with a nimble open source solution rather than over-engineered, expensive proprietary software.

