IS COPY DATA MANAGEMENT A REPLACEMENT FOR BACKUP? (EASY!)
I invite you to read the article below; I have included the source and the full text for your reference.
Data management has deserved an overhaul for a long time now, and copy data management as a subject is one of those new approaches. However, it is still missing a few pieces.
First, what is copy data management?
You have data; that data requires copies, and those copies serve multiple purposes: backup, redundancy, test and dev, and QA. Each of these types of copies usually has a specific requirement.
Let’s review some of these requirements as examples:
QA = quality assurance. Depending on the application, clients require the same I/O performance when spinning up a copy for QA, given that this is not a functionality-only type of QA.
Backup: clients need to be able to search the objects being protected and restore those objects granularly.
Test and dev = testing and development. As with QA, the same I/O performance capability is required, since application response times can themselves come under test.
Redundancy = replication. Tight RTOs and RPOs, sometimes near zero, are required. This is a live or replica copy kept for redundancy purposes, and it also requires the same I/O performance.
The graphic above was found on the Actifio site a few months ago.
Let's remember that copies also live on mobile devices, laptops and desktops.
Furthermore, as the article below mentions, be aware of copy data management's limitations.
There are companies that have a LONG history in the market and do copy data management already, but do not market themselves in this way. My suggestion: get with the program, CMOs!
Quick example: NetApp! And no, there is no PERFECT copy data management platform! No matter what anyone tells you, there is no perfection.
When you look at copy data management platforms, ask the following questions and see if they meet your requirements.
- What are the shortest RTO and RPO it can deliver?
- Start with your toughest issue; if you can address that, by default you will be able to address any less critical issue as far as RTOs and RPOs are concerned.
- Is data separated from the live environment onto a protection environment?
- Best practice for protecting data is to separate platforms.
- For this critical application, how many copies of data will you eliminate?
- How many copies of data can this copy data management platform hold for this particular critical application?
- Considering other, less critical applications in combination with this critical app, how many copies of data can the platform hold?
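To make the copy-count questions concrete, here is a minimal sketch comparing what full copies cost versus snapshot-style copies. The numbers (a 100 GB source, 5% change per copy) are made-up illustrations, not vendor figures:

```python
def copies_capacity_gb(base_gb, num_copies, change_rate=0.05):
    """Estimate capacity needed to hold N copies of a data source.

    Full copies each cost the full base size; snapshot-style copies
    store one baseline plus only the changed blocks per copy.
    """
    full_copies = base_gb * num_copies
    snapshot_copies = base_gb + base_gb * change_rate * num_copies
    return full_copies, snapshot_copies

# 100 GB source, 10 copies: 1000 GB as full copies vs 150 GB as snapshots.
full, snap = copies_capacity_gb(100, 10)
```

The exact change rate varies wildly by workload; the point is that the answer to "how many copies can it hold?" depends heavily on whether the platform makes full copies or pointer-based ones.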
My personal recommendation is: know your current storage platforms first. Do they take snapshots? Can they replicate snapshots to a different tier of storage on a separate controller? Can that controller or tier of storage give you the I/O you require? Do you have enhanced snapshot management tools that quiesce the database or application so you capture a restore point in an application-consistent form? How many snapshots can you take?
You might find that you are already in possession of a copy-data-management-capable technology; you just needed a little help identifying it!
Actifio was the first to talk about copy data management as a subject, followed by HDS.
– – – Start of article info – – – –
I found the following article published at: http://searchdatabackup.techtarget.com/answer/Is-copy-data-management-a-replacement-for-backup
Is copy data management a replacement for backup?
Copy data management can be a backup replacement in certain situations, but organizations need to be aware of its limitations.
A number of vendors have copy data management offerings, and each has its own philosophy about how the technology should be used. That said, copy data management is more about storage efficiency than data protection.
Generally speaking, redundancy accounts for a large percentage of an organization’s storage cost. According to some estimates, the average enterprise creates eight to 10 copies of every production data source. These redundant copies are used for a wide variety of purposes, including dev/test, support, end-user training and reporting. If these estimates are true, a relatively modest 100 GB database could ultimately account for up to a terabyte of storage consumption. When you factor in the number of production data sources that exist within the average enterprise environment, you can begin to see how quickly redundant data can consume the available storage.
Copy data management allows everyone to work from a common copy of a data source. Rather than a development team making a full-blown copy of a production database, the copy data management software might instead use snapshot technology to provide the development team with an isolated development environment that perfectly mimics the production environment. In other words, the dev team is using the production database, but in a way that protects the integrity of the production data.
Although copy data management could conceivably be used to create data recovery points, the software never creates a true backup copy of the data source. Any redundancy exists only at the storage level. With the proper level of redundancy, copy data management might be able to act as a backup replacement, but it is not a good solution for organizations that require an offline (tape) copy of their data.
This was first published in March 2015
– – – End of article info – – – –
Hope you enjoyed my comments and recommendations. I urge you to contact KIO Networks and explore how we can help you reduce copies of data in your environment, independent of any particular technology. Sometimes new practices can help you streamline a bunch!
Have a wonderful week.
Do you have a storage strategy? Are you getting the most out of all your storage platforms? Are you underspending with substandard results, or are you overspending for merely OK results?
There are multiple forces driving you to a choice, and to help you narrow it down to the basic aspects, here is a picture. In the center we have our users. Our users want it all: reliability, performance and low cost. Well, low-cost, highly available, high-performance storage doesn't exist! You will have to choose a compromise between these key aspects. In a nutshell, you have to choose how much of each aspect you get, and there is no way to get it all for low cost.
In this triangle we show three layers of technology: media, access type and storage features. In the media layer, as we change media we lower costs. In the access layer we show some basic types you will recognize: NFS, SMB/CIFS, WebDAV, S3, and block storage over FC or iSCSI. Yes, there are many more, such as block over InfiniBand and protocols like FTP and SFTP. In the storage feature layer we start to see levels of data protection and higher availability, such as local or remote snapshots, replication, offsite backups and storage clustering; there are also RAID levels and numbers of controllers that increase availability.
To dig deeper into the media types, let's look at the technologies involved. For the highest performance we have flash (SSD); then we lower performance with SAS, then SATA, and finally the lowest performance sits in object storage. As we lower performance we also lower cost. These are just general examples; we all know there are other media types, such as tape with LTFS, that could lower costs further. You, as a storage administrator, have to choose what to give your users, given that you actually have it.
So, if you provide storage as a service, you most likely already have a compromise between all the aspects, what you might call the sweet spot. However, because this is a sweet spot across all key aspects, it does not respond to the wide spectrum of needs. Some workloads require super high availability (replication, clustering, 100% uptime), others require low cost via object storage, and then there are LOCAL flash cards for high-performance needs. After all, most high-performance I/O requirements come from local workloads like TMP, SCRATCH and SWAP (this data does not require high durability, as it is recreated on the fly as the OS and applications run).
What we need is a solution that does not force you into a compromise. Much easier said than done. Example below.
As you can gather from the graphic, via a single point of access you receive a slice across three aspects: reliability, performance and cost. If you provide storage solutions, you know this is very expensive and complex, because you have to acquire, configure and manage all the technologies previously covered; you must then choose a level of performance along with data protection and reliability; and finally there is a cost associated with each platform and configuration.
So, what is out there to help you with this? Well, take INFINIDAT, a centralized storage platform with higher levels of redundancy and good performance. Look at Pure Storage, an all-SSD solution running in deduplicated mode with good density and TCO over a longer-than-usual period of time. You should also look at SolidFire, which can assign a number of IOPS to the storage you provision, helping you align performance to workload. Then look at XtremIO from EMC for great performance, and at NetApp for all-inclusive data storage and data protection features, by far the most mature in the market at combining storage and protection. Look at new players such as Actifio and Rubrik for copy data management, and at the earlier block storage virtualization engines such as IBM's SVC or the virtualization engine from HDS, also OEMed by HPE. There are many more technologies that I am leaving out. The list is HUGE! And your time could be consumed just learning about each one of these.
Here at KIO Networks we do this for you! This helps us offer higher value and the service that is just right for you. We have expertise in a wide spectrum of technologies across key practices.
So, next time you look for storage on demand, look at KIO Networks. In the spirit of helping you learn something new, I would like to introduce a new subject that promises to help businesses align the right storage to the application at the right price.
There is a storage player out in the market promising to address this complex challenge. It's called Primary_Data.
What will data virtualization do for you?
In the simplest terms, it allows you to move data heterogeneously across multiple platforms.
Hope you enjoy the set of videos; I know I did.
The Launch of Primary_Data
Introduction to Primary_Data
What is data virtualization: I think the video is overcomplicated, but it is a good one.
Use cases for Data Virtualization
Primary_Data User Experience demo
Hope you enjoyed the videos. As with all cool technology, the devil is in the details. From my side, I will make sure to take a deep dive into the technology and report back to you.
Have a great week!
The article below was posted by a great professional in the data protection arena, Rolland Miller; his info can be found here: http://www.rubrik.com/blog/author/rolland-miller/
Rolland is currently investing his time at Rubrik, and I am sure they will absolutely make it big! Here is the original post location: http://www.rubrik.com/blog/unlimited-replication-with-rubrik-2-0/#.VdXN-vu9esk.linkedin
— Start of Snip Original Post —
Today we announced Rubrik 2.0, which is packed with exciting new features. I’ve been working in the storage industry for the past 16 years, the majority of time spent working on backup and DR solutions for companies. This isn’t my first rodeo–I’ve seen a lot, advised a lot of customers in how to architect their backup and disaster recovery infrastructure. Needless to say, I haven’t been this thrilled for a long time—our engineers are building something truly innovative that will simplify how recovery is done on-site or off-site for the better.
Why is Rubrik Converged Data Management 2.0 so interesting?
Our 2.0 release is anchored by Unlimited Replication. There are no limits to how many snapshot replicas you can have. There is zero impact on your production systems as replication occurs since this isn’t array-based replication. This is asynchronous, deduplicated, masterless, SLA driven replication that can be deployed any way you like, many-to-one, many-to-many, one-to-one, uni-directionally or bi-directionally. In the past, replication has always been engineered with a master-slave architecture in mind because systems have always had an active-passive view of control. Our Converged Data Management platform is fundamentally a distributed architecture that allows you to share nothing, but do everything—each node is a master of its domain. Our engineers apply the same building principles to replication. Hub and spoke? Check. Bi-directional? Check. Dual-hub, multi-spoke, and archived to the cloud. Check! Check! Check!

A key property of Converged Data Management is instant data access. Data is immediately available, regardless of locality, for search and recovery. Using Rubrik for replication allows you to recover directly on the Rubrik appliance since applications can be mounted directly. Files can be found instantly with our Global Real-Time Search. There’s no need to copy files over to another storage system. We’ll give you near-zero RTO.
In this release, we extend our SLA policy engine concept into the realm of replication. You can define near-continuous data replication on a per-VM basis within the same place that backup policies are set. There’s no need to individually manage replication and backup jobs—instead, you’ve freed up your time by managing SLA policies instead of individual replication targets. Once you specify a few parameters, the engine automates schedule execution. For more on managing SLA policies, see Chris Wahl’s Part 1 and Part 2 posts.
No SLA policy is complete without measurement. In 2.0, we’re releasing beautifully simple reporting that helps you validate whether your backup snapshots are successful and whether they’re meeting the defined SLA policies. Our reporting will help you keep an eye on system capacity utilization, growth, and runway—so you’ll never be caught short-handed.
A New Addition to the Family
Finally, we’re welcoming the new r348, our smartest dense machine yet. We’re doubling the capacity within the same 2U form factor, while maintaining the fast, flash-optimized performance for all data operations, from ingest to archival.
Catch Us at VMworld
In less than two weeks, we’ll be at VMworld. Make sure to stop by our Booth 1045 to see a live demo. Arvind “Nitro” Nithrakashyap and Chris Wahl will be leading a breakout session on Wednesday, 9/2 at 10 am and giving away epic Battle of Hoth LEGO sets.
— End of Snip Original Post —
WOW! Let me quote Rolland, “There is zero impact on your production systems as replication occurs since this isn’t array-based replication. This is asynchronous, deduplicated, masterless, SLA driven replication that can be deployed any way you like, many-to-one, many-to-many, one-to-one, uni-directionally or bi-directionally.”
WOW! KICK ASS! Love it. If you are a techno geek like I am, you will be as super excited about this as I am.
Just imagine the ramifications! This could be a true base platform for a service provider. Without limits, you really could service a huge pool of clients and their needs. Obviously, you still need to size properly for your forecasted use and growth. Anyway, hopefully I can join you all at VMworld, where I am sure Rubrik will WOW all of you!
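To picture the topologies Rolland lists, a replication layout can be modeled as a set of directed (source, target) pairs. This little sketch uses hypothetical site names to show many-to-one fan-in and a bi-directional pair:

```python
# Replication topologies as directed (source, target) edges.
hub_and_spoke = {("dc-west", "hub"), ("dc-east", "hub"), ("branch", "hub")}
bidirectional = {("site1", "site2"), ("site2", "site1")}

def fan_in(topology, node):
    """How many sources replicate into `node` (the many-to-one degree)."""
    return sum(1 for src, dst in topology if dst == node)

def is_bidirectional(topology, a, b):
    """True when sites a and b replicate to each other."""
    return (a, b) in topology and (b, a) in topology
```

A masterless design means any node can appear on either side of an edge, which is what makes many-to-many and dual-hub layouts possible without a dedicated controller.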
Enjoy! @KIO Networks we love cool technology!
As a Senior Product Manager for Data at KIO Networks, I am always in search of technologies that will enable us to provide additional value around data protection and storage.
I have been looking at Rubrik for a year now, and after a recent review with their Director of Presales, this is what I understand. Hopefully you will find it useful.
As with any great building, you must have a strong foundation, and in my personal opinion Rubrik has solid technical founders. Here are the initial key founders (obviously everyone in a company like Rubrik is key; for a full list follow this link: http://www.rubrik.com/company/).
Bipul Sinha (CEO)
Founding investor in Nutanix. Partner at Lightspeed (PernixData, Numerify, Bromium). IIT, Kharagpur (BTech), Wharton (MBA).
Arvind Nithrakashyap (Engineering)
Co-founder of Oracle Exadata and Principal Engineer on Oracle Cluster. Led real-time ad infra at RocketFuel. Silk Road trekker. IIT, Madras (BTech), University of Massachusetts, Amherst (MS).
Arvind Jain (Engineering)
Google Distinguished Engineer. Founding engineer at Riverbed. Chief Architect at Akamai. Chipotle advocate. IIT, Delhi (BS). University of Washington (PhD Dropout).
These folks came from the Googles, Facebooks, Data Domains and VMwares of the world. These are the key players in technology and services worldwide. Not bad, right?
Put your money where your mouth is!
That's what the initial investors did: Lightspeed Venture Partners, as well as industry luminaries John W. Thompson (Microsoft Chairman, former Symantec CEO), Frank Slootman (ServiceNow CEO, former Data Domain CEO) and Mark Leslie (Leslie Ventures, Veritas founding CEO).
You can read more about their Series A funding at: http://www.rubrik.com/blog/press-release/rubrik-invents-time-machine-for-cloud-infrastructure-to-redefine-47-billion-data-management-market/
This initial funding was followed by a quick Series B of $41 million. You can read about it here: http://fortune.com/2015/05/26/rubrik-archive-data/
Now, in both funding blog references you will read key descriptors such as time machine, archive, backup, etc. I think these are either catchy names or references to well-known technology. By the end of my post you will see that Rubrik is more than a catchy name and should not be compared to previously known technologies. It should be known as a new technology that removes the need for the old.
What is Rubrik trying to change?
The fundamental change is to converge all backup software, deduplicated storage, catalog management and data orchestration in a single piece of software packaged within an appliance. Rubrik seamlessly scales to manage data sets of any size, from 10 TB or 50 TB up to 1 PB.
Expandability: so, what do we do with archive and long-retention data sets?
With an S3 tie-in, data is sent over to object storage.
If you are interested in lowering your running costs, you need a long-term retention storage pool, and when it comes to lowering cost there is nothing better than object storage.
Put a question to your data protection operation: what time frame do most restores come from? The answer is yesterday, or within the last week.
In primary storage we create tiers of storage with different characteristics: performance, reliability and cost. So, with our continued data growth, it makes sense to do the same in our data protection platforms! Tier your data protection vaults and storage pools to best align with your restore requirements and your initial data copy (backup) requirements. Ingest quickly, restore quickly, and for the unlikely restore from a year ago, wait a bit longer to pull data from the cloud or your own object storage pool.
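The tiering idea above can be sketched as a simple age-based policy. The retention windows and pool names here are illustrative assumptions, not a reference design:

```python
from datetime import timedelta

# Hypothetical policy: restore points of a given age are served from a pool.
POLICY = [
    (timedelta(days=7), "fast local pool"),
    (timedelta(days=90), "capacity disk pool"),
    (timedelta(days=3650), "cloud/object pool"),
]

def pool_for(restore_age):
    """Pick the storage pool expected to serve a restore of this age."""
    for max_age, pool in POLICY:
        if restore_age <= max_age:
            return pool
    return "out of retention"
```

Since most restores come from the last week, the expensive fast pool stays small while the cheap object pool absorbs the long tail.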
Ease of deployment: this results in lower costs. In the case of Rubrik, the promise is 50 minutes or less from installation to the first backup of a VM. Now, I should mention that, as publicly available today, Rubrik integrates with VMware environments. You have physical, Hyper-V, OpenStack and more? Never fear! Rubrik has plans to expand into other platforms. My recommendation, as with any technology: if you don't find a capability anywhere else and you need it, reach out to the vendor with an opportunity and ask them to consider your priorities in their roadmap. It doesn't hurt to ask! For me, my priorities are OpenStack integration and Microsoft Hyper-V.
Now, back to what Rubrik can do for your VMware environment: granular restore from archive. Yes, the VM is protected and at the same time it is indexed.
After you have protected a VM in Rubrik, no data is moved to restore it! Instead, the VM is presented to VMware and you can BOOT the machine. As a VMware admin, you will see this VM running in a datastore. So, how will it perform? Well, how does 30,000 IOPS per appliance sound to you? Cool, right? This number came from an IOMeter-driven test running 4K blocks under a 50% read / 50% write random load.
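To put that IOPS figure in perspective, converting IOPS at a given block size into raw throughput is simple arithmetic (the 30,000 and 4K figures come from the test described above; the conversion itself is generic):

```python
def throughput_mb_s(iops, block_kb):
    """Convert an IOPS figure at a fixed block size into MB/s."""
    return iops * block_kb / 1024

# 30,000 IOPS of 4K blocks is roughly 117 MB/s of mixed random I/O.
mixed_random = throughput_mb_s(30000, 4)
```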
So, why is this so cool? Well, let's look at the contrast. Traditional backup is used to restore, meaning you copy the data out, then copy the data back to its origin in case of data loss. In the Rubrik case, you copy the data, keep versions, and each version is now usable in parallel with the original without the need to move the data back, removing the painful restore window.
The idea is to do more with less; you don't have to wait for data loss to boot up a VM. You could use this for test and dev workflows: boot a new machine from Rubrik, test in parallel, and then tear down the instance.
Combine this functionality with replication and you could have your test and dev offshore.
A famous slogan from Data Domain comes to mind: tape is dead, get over it. Yet in 2015 I kept hearing requirements for actual physical tape. However, I have been able to convert most of them over to disk, and I am sure you are hearing the same. Rubrik is going to make that conversation a much easier one. The tape-out requirement in this case can go to the cloud, achieving off-site, long retention at low cost.
In my opinion, Rubrik is fulfilling its promise to remove all the traditional components of a backup appliance; however, it is focused on VMware. I look forward to seeing Rubrik have the same impact on Hyper-V, OpenStack and Docker, and to seeing it expand its long-retention or tape-out option to Google and Azure. It just makes sense!
If you are tech savvy: they have REST APIs and are built on HTML5. I challenge you to go build your own products and services around them!
My personal favorite aspects: Rubrik can provide a DR solution with replication, supporting both bi-directional and hub-and-spoke topologies. Imagine the architecture: two Rubriks, site 1 and site 2, cross-replicating so that each site holds both its primary data set and the replica from the secondary site. The cherry on top? All licensing is capacity-based, to keep it simple. Yes, you won't be surprised by an additional license required to get your flux capacitor working. There's no additional cost for replication either; it's included free of charge.
In a nutshell, Rubrik has a great technical foundation, and its current focus on VMware is the best move for them, as it will help them gain a footprint in most environments. The features, functions and ease of use, with the ability to be more than just backup, are a big WIN for the consumer. The way I see it, this is a great start to a new wave of data protection. Rubrik can finally provide additional value to backup and reach RTOs and RPOs out of reach for other platforms.
If you are interested in my previous write up on Rubrik visit: https://tinyurl.com/yblscdgq or https://tinyurl.com/yb8y4fj5
Have a great day!
I remember when double parity was provided only by NetApp, a long time ago. NetApp back then seemed to have the experience of concentrating more data, and had probably been impacted by dual drive failures. It was called RAID-DP; you can find its details here: http://www.netapp.com/us/communities/tech-ontap/tot-back-to-basics-raid-dp-1110-hk.aspx
Dual parity nowadays is a basic RAID level offered by most vendors. As we continue to grow disk sizes and the amount of data, we concentrate ever more important aspects of our business in these technology units (storage devices).
The issue with lesser parity in RAID groups is the stress put on the remaining drives in the group. During the stressful task of rebuilding after a drive failure, you hopefully do not lose another drive.
Now, think about this: all drives come with an MTBF, a mean time between failures, and when you ordered your storage it came with most of the disks you are using today. So, if the error rate is high enough that your storage chooses to take a drive down, or the drive actually dies, what is the likelihood that the other drives from the same batch behave the same way? On top of that, you add stress to those same pre-existing drives. Obviously, there are a lot more variables to account for, but I think you get the point.
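As a back-of-the-envelope sketch of that risk, here is a toy model that treats the surviving drives as independent with a fixed annualized failure rate (AFR). Real-world batch correlation, as described above, makes the true risk higher than this estimate:

```python
import math

def p_second_failure(surviving_drives, afr, rebuild_hours):
    """Probability that at least one surviving drive fails during the
    rebuild window, assuming independent exponential lifetimes with
    annualized failure rate `afr` (e.g. 0.02 for 2% per year)."""
    hourly_rate = -math.log(1.0 - afr) / (365 * 24)
    p_one_survives = math.exp(-hourly_rate * rebuild_hours)
    return 1.0 - p_one_survives ** surviving_drives

# Longer rebuilds and bigger RAID groups both raise the risk.
risk_24h = p_second_failure(7, 0.02, 24)
risk_72h = p_second_failure(7, 0.02, 72)
```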
Needless to say, the industry needs more data protection. Here is a URL for a paper by Atul Goel and Peter Corbett, both NetApp members: https://atg.netapp.com/wp-content/uploads/2012/12/RTP_Goel.pdf. This algorithm will protect you from triple drive failures. I can't wait to see it as an option in all storage arrays!
In the meantime, what can you do to protect against dual drive failures? Some folks say mirror, mirror, mirror and forget RAID 5 or 6. Yes, triple mirror. Sure, that's an approach! The claim that triple mirror is cheaper? It depends on volume and what business you run. If you are looking to protect against double failures and do not want to do RAID 6, you might be better off just mirroring your data set from a volume on one RAID 5 group to a volume on a different RAID 5 group. When you want to protect against dual disk failures, the idea is to have more copies of the data, and that can be achieved in different ways.
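To weigh these options, it helps to compare usable capacity. This sketch assumes an 8-drive RAID group (a common but arbitrary choice) and ignores spares and metadata:

```python
def usable_fraction(scheme, group_size=8):
    """Usable capacity as a fraction of raw capacity for a few layouts.

    group_size = total drives per RAID group (data + parity).
    """
    if scheme == "raid5":
        return (group_size - 1) / group_size
    if scheme == "raid6":
        return (group_size - 2) / group_size
    if scheme == "triple-mirror":
        return 1 / 3
    if scheme == "mirrored-raid5":  # two RAID 5 groups mirroring each other
        return (group_size - 1) / group_size / 2
    raise ValueError(f"unknown scheme: {scheme}")

# RAID 6 keeps 75% usable; mirrored RAID 5 ~44%; triple mirror ~33%.
```

So mirrored RAID 5 buys dual-failure protection at a capacity cost between RAID 6 and triple mirror, which is why the right answer depends on your volume and business.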
There are also other blogs and articles I reviewed before I put this one together. Here they are for your review: http://www.tomshardware.com/reviews/ssd-reliability-failure-rate,2923-9.html http://www.techrepublic.com/blog/the-enterprise-cloud/raid-5-or-raid-6-which-should-you-select/
Above is a graph I put together to depict how, as drive sizes continue to expand, our time to recover also continues to expand. The time it takes to recover is our time of risk. Now, with technologies such as deduplication and compression capable of storing 3X, or in some cases 30X, the original data size, we concentrate more data (more risk) in the same storage footprint, making recovery times more critical than ever before. Keep in mind that your clients might be implementing compression and deduplication at the application level, and you may unknowingly be assuming a higher-risk mode of operations.
In a nutshell, RAID 5 was effective protection for smaller drives, and the rebuild impact was not that long given the amount of data you could store. Now that data sets are larger and single drives are reaching the tens of TBs, imagine your data-loss risk! You should already be using RAID 6, and when we start using these much larger drives we should start using triple mirror. I recommend you look at RAID technologies that work at the chunk level, so that all your drives take data and the RAID chunks are spread across all drives as well. Examples: EMC XtremIO XDP, Huawei RAID 2.0, HP 3PAR chunklet-based RAID. Using these technologies spreads the rebuild load across many more disks, lowering the burden per disk and increasing rebuild speeds.
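The rebuild-time trend behind the graph can be approximated with simple division. The 50 MB/s sustained rebuild rate below is an illustrative assumption (real rates vary with load and controller):

```python
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Naive full-drive rebuild time: capacity divided by sustained rate."""
    return drive_tb * 1024 * 1024 / rebuild_mb_per_s / 3600

# A 1 TB drive takes ~5.8 hours at 50 MB/s; a 10 TB drive takes ~58 hours.
small = rebuild_hours(1, 50)
large = rebuild_hours(10, 50)
```

Chunk-level (declustered) RAID attacks exactly this number: with the rebuild work spread over many drives, the effective rate multiplies and the risk window shrinks accordingly.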
Hope you found this informative, have a great day and weekend!
Julio Calderon, Global Senior Product Manager @KIO Networks