Is Public Cloud Storage Good Enough?

Customers who use the Amazon Web Services (AWS) Simple Storage Service (S3) experienced an outage that kept them from their data for four hours until AWS corrected the problem. The S3 outage took place several weeks ago in the AWS US-East-1 Region in Virginia, which is the oldest and largest AWS data center. Four hours may not seem like a long time except for AWS customers who depend on S3 to keep their data available.

S3 is a public storage cloud operated by AWS. You own the data, but AWS controls the infrastructure, and you pay fees both to keep your data at AWS and to access it. Consider the scale at which AWS operates its storage service: S3 contains trillions of data objects stored in "bit barns" throughout the world. Customers use S3 for web content distribution, media file storage, backup data, shared data, log data, sensor data, and archived data. You can store any unstructured data you want in S3.

Amazon's S3 was one of the first services AWS offered, launching in March 2006. S3 is Object-Based Storage (OBS) technology accessed through the AWS S3 Application Programming Interface (API). The success of S3 in the marketplace validated the importance of Object-Based Storage and put it on the map, creating a market for S3-compliant storage. The S3 API is now the de facto standard for storing and accessing unstructured data in public and private clouds.

So when S3 sneezes, hundreds of thousands of AWS customers and millions of Internet users catch colds. The S3 outage also broke the AWS status dashboard, so the company resorted to Twitter to provide status information to customers. Give AWS a failing grade for how it debugged a problem with the S3 billing system, which shut down S3 and caused other AWS services that depend on S3 to stop functioning in the US-East-1 Region.

The good news is that customers did not lose data. But is S3 good enough? If AWS fails to keep data accessible, is it wise to keep putting all of your data in S3? Your response might be to rely less on AWS by using a competing storage service from Google or Microsoft. Alternatively, you might use a private, S3-compatible storage cloud to keep all or part of your data on premises or in a third-party data center.

It is important to understand that AWS is not your storage architect or storage administrator. AWS operates a public cloud storage service on a pay-as-you-go basis. How to deploy S3 to store and protect your data is up to you. Using "the cloud" as a convenient place to dump data does not relieve you of the responsibility to manage your data, which includes how to keep your data available during an outage.

The AWS S3 outage is an opportunity to rethink your data storage strategy. If you already have data storage and management applications that support S3, you have the choice of using them with an S3-compliant private cloud on your premises. You can keep all of your data in a private storage cloud, or you can migrate data from a private storage cloud to a public storage cloud based on what works best for your organization.

Here are a few things to keep in mind. Public cloud storage is not inherently better than private cloud storage, nor is it necessarily cheaper. Object-Based Storage is not rocket science, and you do not need a storage administrator for every 500TB of data. Start with the storage capacity you need today and expand your private cloud when you require more storage; there is no rip-and-replace to increase your private cloud storage capacity. Object-Based Storage is something any organization can use, even if you do not have petabytes of data to protect.

What is the value of AWS S3 to your organization? Well, the network effects of AWS are huge. Hundreds of third-party vendors leverage S3 in their products or services, including vendors who can provide you with S3-compliant storage on your premises. The more people who use AWS S3, the more valuable it becomes to everyone.

If you want to build or buy your own private storage cloud, choose an Object-Based Storage vendor that can meet your requirements. More than a dozen companies are in the OBS market. With some research, you can whittle that number down to the three most likely to fit your environment. Every vendor will tell you they are an industry-leading provider of cloud storage. They will promote their strengths and downplay their weaknesses. Ignore vendor appeals based on marketing and make your selection by putting the candidates through a Proof-of-Concept based on a data storage project you are considering.

Remember that having S3-compliant storage is important because the S3 partner ecosystem is a valuable resource. Don’t forget that data is "sticky," and this has implications for its availability, durability, and cost. Armed with your Proof-of-Concept results and cost data, decide what mix of public cloud storage, private cloud storage, or hybrid cloud storage will best meet your needs for the long term. Public cloud storage is good enough for many use cases; just recognize the circumstances under which it is an appropriate choice for your organization.

What is holding back private cloud storage?

Well, it would be hard to make a case for Object-Based Storage (OBS) if it weren't lower in cost than traditional SAN and NAS storage, extremely scalable, and able to provide high levels of data durability. Features like metadata search, storage policies, and support for legacy data access protocols like NFS, SMB, and FTP are available from most OBS vendors. To the list of good OBS features to have, you can add data deduplication, data compression, data encryption, and data tiering from private OBS to public OBS providers.

To help grow the market for private OBS, vendors are focusing on ease of deployment and management. These consumability factors are important: an OBS cluster does not require weeks to become operational, nor does it require someone with years of experience as a storage administrator to manage it.

OBS software vendors have partnered with storage hardware vendors like Dell, HPE, Lenovo, QCT, and Supermicro to offer fully supported OBS cluster deployments. Some OBS software vendors also provide private label storage appliances for a turnkey customer experience.

It appears that the only thing commercial OBS software vendors lack is a large customer base. Why is it taking so long for organizations to embrace OBS as the new platform for storing their unstructured data? The question is apropos because many organizations already entrust some of their unstructured data to public OBS providers like AWS, Google, and Microsoft. So why are they hesitant to do it themselves?

The answer is that data storage is conservative by nature. Traditional storage is traditional because it has been around a long time, and storage is more conservative than application development. When an application "crashes," people just relaunch it; if the problem is bad enough, an update or fix will be released for it. If your traditional storage system fails to perform, there is widespread panic in the organization because data becomes unavailable.

Traditional storage systems are engineered to provide high levels of performance and durability, but this comes at a cost that is getting harder to justify in an era of continuous data growth. Scalability is the Achilles heel of traditional storage systems. Scaling up traditional storage is expensive. Expanding or upgrading traditional storage systems to keep pace with data growth is financially unsustainable. Traditional storage systems are no longer appropriate for meeting today's need for unstructured data storage.

So why haven't IT departments built or bought OBS clusters to solve their data storage pain points?  The answer is FUD, which stands for fear, uncertainty, and doubt.  OBS can make IT professionals fearful because it is relatively new, which creates uncertainty about using it, and doubt about the benefits of switching to it from traditional storage.

In addition to the FUD factor, the psychology of previous investments in traditional storage has habituated decision-making to what has worked in the past.  It is understandable, but as a practical matter the demands organizations are making on their data, and the growing amount of data needing storage will force the change from traditional storage to OBS.

OBS is a foundation technology for new storage architectures.  OBS is extremely scalable, uses RESTful APIs for application support, and provides high levels of data durability.  OBS also represents storage simplification as opposed to the complexity of managing traditional data storage systems.

New storage architectures require new knowledge and experience.  The lack of both is holding back OBS deployments in organizations. It is the reason commercial OBS vendors number their customers in the hundreds and not thousands. Commercial OBS vendors need to "bust a move" and get their software in the hands of people who need to know about it but are not yet planning OBS deployments in their organizations. 

Traditional storage was suited for a predictable and stable storage environment. The growth in unstructured data triggered a seismic shift in the requirements for data storage. There is a tsunami of data washing over many organizations.  Organizations can either drown in their increasing volume of unstructured data or learn to swim in it by deploying OBS solutions for smarter and more flexible data management.

Resisting OBS is not a viable data management strategy. But jumping into OBS deployment without some knowledge and experience is unwise.  Organizations need to learn to swim by starting in the shallow end of the OBS data pool.  Start small with a Proof-of-Concept (POC) that addresses one of your data storage pain points. Learn how to use OBS to solve a data storage problem. The experience and confidence gained from a successful OBS deployment will make it easy to identify other use cases for OBS in your organization.

There are hundreds of applications and data management solutions available in the OBS ecosystem. MonadCloud can work with you to leverage the OBS ecosystem so you can do smarter data management and meet your data storage requirements in a cost-efficient manner.  You can do this.  MonadCloud and Cloudian can help.

10 Things to Know About Public Cloud Storage

Public cloud storage providers tout their ease of use, low cost, and virtually unlimited storage capacity. Here are 10 things you should consider when using a public cloud to store your data.

1. Parking your data is different from accessing your data. Those low prices per gigabyte (GB) per month for parking your data don't reflect the additional charges you incur when touching your data. Avoid unpleasant surprises in your monthly bill. Understand your storage provider's charges for accessing your data.

2. Ending your business relationship with a storage provider can be expensive and difficult. If you call it quits with your storage provider you will have to pay to get your data back. It will also be time-consuming to move tens or hundreds of terabytes (TB) of data using your Internet connection. Find out if your storage provider has a bulk download service so you don't spend the next year downloading your data.

3. Anticipate that your storage provider could shutter their service on short notice. A storage provider who goes out of business could leave your data stranded in their storage cloud. Ask if the storage provider has an insurance policy or an escrow fund that will keep their storage service operating until customers move all their data. If not, then have a legal agreement with the storage provider that allows you or a designated third party to retrieve your data before they can shut down their storage servers.

4. Know where your data is stored in the public cloud. If you are required to know where your data is stored for regulatory compliance or governance reasons, make sure you are storing it in the proper geographic location. Even though the provider has physical control of your data, it is still your data, so pay attention to where you are keeping it.

5. Understand how storage providers protect data. Data stored in public clouds is not equally protected. Providers offer multiple ways to protect data. Choose appropriate data protection methods based on the types of data you plan to keep in the public cloud. The methods you choose can affect the storage cost and availability of your data.

6. Provision enough Internet bandwidth on your premises. You will most likely need a faster Internet connection when storing and accessing your data in the public cloud. Consider using two Internet connections from different Internet Service Providers to avoid service interruptions. The availability and cost of broadband Internet service can vary considerably based on your location.

7. Monitor your use of public cloud storage. Enumerate your cloud storage use cases. Identify which use cases deal with frequently accessed hot or warm data and which use cases deal with seldom accessed cold or archive data. Check the numbers for storing your hot, warm, cold, and archive data in the public cloud. Extrapolate what your monthly expense might be over a range of data growth rates for the next five years. It can be more cost-efficient to store your cold and archive data in your own private storage cloud while using the public cloud for your hot and warm data.

8. Create life-cycle management plans for your data. It is easy to fall prey to keeping too much data for too long on your primary storage systems (DAS, SAN, and NAS). Data gets created for many purposes, and a lot of it becomes cold rather quickly. Analyze the quantities and types of data you have on your primary storage systems. Reduce the management and storage costs for these systems by moving seldom used data to a public or private storage cloud where you can set policies on how long to retain this data.

9. Identify which data needs to be encrypted when stored in a public cloud. It is not necessary to encrypt all of your data, but you should encrypt data that contains intellectual property, personally identifiable information, medical records, and financial data. Who manages the encryption keys, and who performs the encryption (you or the storage provider), must be clearly understood.

10. Compare the benefits and costs of public cloud and private cloud data storage. Using public cloud storage is not always the best or least expensive way to store data. You can achieve many of the same benefits attributed to public cloud storage with a private storage cloud on your premises. Do the comparisons and do the math in order to make data storage decisions that align with the long-term interests of your business or organization.

MonadCloud builds public, private and hybrid storage clouds using Cloudian HyperStore, which offers the highest degree of compliance with AWS S3, the de facto standard for cloud data storage.  S3 your data center with Cloudian, and run any data access and management solution available in the AWS S3 ecosystem...guaranteed.

What temperature is your data?

When it comes to data, most people don't equate it with having a temperature. In reality, data doesn't have a temperature that you can measure in degrees Fahrenheit or Celsius, but there are valid reasons to characterize your data by its temperature.

How do you characterize the temperature of data? An easy way to think about it is to classify data as being hot, warm or cold. Hot data is data that is frequently accessed. Warm data may have been hot at one time, but is less frequently accessed. Cold data is not accessed very often or not at all.

When thinking about data storage, data temperature is important because data in most organizations is created and stored on expensive primary storage like Direct-Attached Storage (DAS), Storage Area Networks (SAN) and Network Attached Storage (NAS). Retaining all of your hot, warm and cold data on primary storage systems increases your management and storage costs because of the constant demand for additional storage capacity.

The challenge of using primary storage systems to provide storage capacity for all of your hot, warm and cold data is exacerbated by declining budgets for data storage. IT departments are being tasked to store more data with reduced budgets and fewer people. A cost-efficient solution is needed for dealing with the year-over-year increase in demand for data storage.

So how can the temperature of your data help solve this problem?

Statistically speaking, the older data becomes, the less frequently it is likely to be accessed and the colder it becomes. As your data becomes warm and less frequently accessed or cold and hardly ever accessed, it doesn't belong on primary storage systems.

Warm and cold data belongs on a secondary storage system designed to be "cheap and deep" in terms of its cost and capacity. The solution is to automate the movement of warm and cold data from your primary storage systems to a less costly secondary storage system.

To do this, you need to monitor changes in data temperature over time. This is possible because data files have associated metadata that indicates when they were created and when they were last accessed. A storage management application or service can use file metadata to take specific actions on data files depending on how old and cold they are or when they were last accessed.
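A minimal Python sketch of that idea follows. It assumes last-access time (`st_atime`) is a usable temperature signal and uses hypothetical thresholds of 30 and 180 days; `temperature` and `classify_tree` are illustrative names, not part of any product. Many file systems are mounted with `relatime` or `noatime`, which makes access times coarse or stale, so a real tool might fall back to modification time (`st_mtime`).

```python
import time
from pathlib import Path

# Hypothetical thresholds (in days); tune them to your own access patterns.
HOT_DAYS = 30
WARM_DAYS = 180
SECONDS_PER_DAY = 86_400


def temperature(last_access: float, now: float) -> str:
    """Classify data as hot, warm, or cold from its last-access timestamp."""
    age_days = (now - last_access) / SECONDS_PER_DAY
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= WARM_DAYS:
        return "warm"
    return "cold"


def classify_tree(root: str) -> dict:
    """Group regular files under `root` by temperature using st_atime."""
    now = time.time()
    groups = {"hot": [], "warm": [], "cold": []}
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[temperature(path.stat().st_atime, now)].append(path)
    return groups
```

A storage management tool built along these lines would hand the "warm" and "cold" lists to whatever process moves data to secondary storage; the sketch only does the classification.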

Why bother storing cold data at all? Can't it just be deleted?

It can, but there are circumstances where deleting cold data is not lawful or advisable. Regulatory compliance, internal governance policies and legal e-discovery demands can require that particular types of data be retained for certain lengths of time before it can be defensibly deleted. The uncertainty about what data to keep and what data to delete has created a "save everything" policy in many organizations.

Saving everything can be financially sustainable, if you deploy a less costly secondary storage system to work in conjunction with your primary storage systems. A private storage cloud running object-based storage software on commodity storage hardware is a less costly secondary storage system. It is ideal for storing your warm and cold data.

The benefits of using a private storage cloud to implement smart data management in your organization or business include:

  • Avoiding the cost of expanding or replacing primary storage systems to accommodate the growth in data storage, which improves the ROI of your primary storage systems
  • Avoiding over-provisioning of new primary data storage systems, which wastes money and results in increased maintenance and support costs over time
  • Reducing the management costs of storing warm and cold data: a single storage administrator can manage up to 10 petabytes (10,000 terabytes) of private cloud storage, but typically only about 350 terabytes of primary storage
  • Providing ready access to your warm and cold data when it is needed without having to restore anything from a tape or waiting hours or days to make it accessible.
  • Reducing the time required to back up your data, because a private storage cloud doesn't use traditional backup methods to protect your warm and cold data
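The staffing comparison in the third bullet is simple arithmetic. This sketch uses the 350TB and 10PB per-administrator figures from the text and rounds up to whole people; `admins_needed` is a hypothetical helper, and real staffing depends on far more than raw capacity.

```python
import math

# Per-administrator ratios taken from the list above.
PRIMARY_TB_PER_ADMIN = 350            # primary storage (DAS, SAN, NAS)
PRIVATE_CLOUD_TB_PER_ADMIN = 10_000   # 10 PB of private cloud storage


def admins_needed(capacity_tb: float, tb_per_admin: float) -> int:
    """Round up, because you cannot hire a fraction of an administrator."""
    return math.ceil(capacity_tb / tb_per_admin)


for capacity_tb in (500, 2_000, 10_000):
    primary = admins_needed(capacity_tb, PRIMARY_TB_PER_ADMIN)
    cloud = admins_needed(capacity_tb, PRIVATE_CLOUD_TB_PER_ADMIN)
    print(f"{capacity_tb:>6,} TB: {primary} primary-storage admins, "
          f"{cloud} private-cloud admin(s)")
```

At 10,000TB the ratio works out to roughly 29 administrators for primary storage versus one for a private storage cloud.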

A private storage cloud is an integral part of a smart data storage and management solution that can take the "temperature" of your data and move warm and cold data so it is stored in a highly durable and cost-efficient manner.

Storing all your data in a public cloud is a mistake

One of the tenets of public cloud storage has been how cheap it is to use. After all, a few pennies per gigabyte per month sounds affordable, so why not make use of it?

True, the initial cost of storing data with public cloud providers like AWS, Google or Microsoft is low, and it is easy to get started. That said, few organizations consider the long-term implications of keeping significant amounts of their data in a public cloud.

So what is there to worry about?

Using the public cloud for data storage is like having a basement you never have to clean out. You can just keep putting more data in the public cloud but it never fills up. What started as a few terabytes stored in the public cloud can grow to tens and then hundreds of terabytes over a few years. Over time, that pennies-per-gigabyte monthly charge for storing your data in the public cloud becomes a significant monthly expense. The reality is you will have this monthly expense for as long as you park your data in a public cloud.
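To put numbers on that drift, this sketch projects storage-only charges over five years at several growth rates. The $0.02 per GB-month price, the 10TB starting point, and the monthly compounding are assumptions for illustration; `project_storage_costs` is a hypothetical helper, and access charges are left out.

```python
def project_storage_costs(start_tb: float, annual_growth: float,
                          price_per_gb_month: float, years: int = 5):
    """Return (last_monthly_bill, cumulative_spend) for storage charges only.

    Assumes capacity compounds monthly at the rate equivalent to
    `annual_growth`, a flat per-GB price, and 1 TB = 1,000 GB.
    """
    monthly_growth = (1 + annual_growth) ** (1 / 12) - 1
    tb = start_tb
    bill = 0.0
    cumulative = 0.0
    for _ in range(years * 12):
        bill = tb * 1_000 * price_per_gb_month  # this month's storage charge
        cumulative += bill
        tb *= 1 + monthly_growth                # data grows before next month
    return bill, cumulative


# Hypothetical inputs: 10 TB today at $0.02 per GB-month.
for growth in (0.0, 0.5, 1.0):
    last, total = project_storage_costs(10, growth, 0.02)
    print(f"{growth:>4.0%} growth: last month ${last:>8,.0f}, "
          f"five-year total ${total:>10,.0f}")
```

Under these assumptions, 10TB that doubles every year reaches about 320TB after five years, which is the "few terabytes to hundreds of terabytes" pattern described above.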

How can you prevent this from happening?

Knowing what data to keep in a public cloud and what data to store in your own private cloud can help you avoid accumulating the wrong types of data in the public cloud.

What type of data is best stored in a public cloud?

Hot or transactional data is the type of data you want to keep in the public cloud, assuming your applications are also running there. Data that is warm, cold or archival is the type of data you want to keep on premises in a private storage cloud.

Your hot or transactional data is what enables your organization or business to function, and it needs to be stored where your applications are running.  Your warm, cold or archive data can be stored in a more versatile and cost-efficient manner using a private storage cloud.

Doesn't this run counter to what everyone thinks about storing their warm, cold and archive data in the public cloud? Yes, it does. If you adopt the conventional wisdom and use the public cloud for all your data storage, by the time you have several hundred terabytes in the public cloud, you will understand the flaw in this approach. You will have placed a large amount of static data in a public cloud built for running applications that use your hot or transactional data.

Public cloud storage providers know that data storage is "sticky" and you are not going to touch your warm, cold or archive data very often. So they are happy to keep charging you for every month you keep it in their storage cloud. And when you need to touch your data you will incur additional charges depending on how much of your data you touch during the month.
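The split between "parking" and "touching" charges can be sketched in a few lines of Python. The `PriceSheet` fields and every price used here are hypothetical placeholders, not any provider's actual rates; substitute the figures from your provider's published price list.

```python
from dataclasses import dataclass


@dataclass
class PriceSheet:
    """Unit prices for one storage provider (all values are placeholders)."""
    storage_per_gb_month: float  # "parking" charge
    get_per_1000: float          # per 1,000 read requests
    put_per_1000: float          # per 1,000 write requests
    egress_per_gb: float         # per GB transferred out to the Internet


def monthly_bill(prices: PriceSheet, stored_gb: float, gets: int,
                 puts: int, egress_gb: float) -> dict:
    """Split an estimated monthly bill into parking and touching charges."""
    parking = stored_gb * prices.storage_per_gb_month
    touching = (gets / 1000 * prices.get_per_1000
                + puts / 1000 * prices.put_per_1000
                + egress_gb * prices.egress_per_gb)
    return {"parking": round(parking, 2),
            "touching": round(touching, 2),
            "total": round(parking + touching, 2)}


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
prices = PriceSheet(storage_per_gb_month=0.02, get_per_1000=0.0004,
                    put_per_1000=0.005, egress_per_gb=0.09)
bill = monthly_bill(prices, stored_gb=10_000, gets=5_000_000,
                    puts=1_000_000, egress_gb=2_000)
print(bill)
```

With these made-up prices, reading 2,000GB back out in one month adds about $187 on top of $200 for parking 10,000GB, nearly doubling the bill.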

The solution is to build your own private storage cloud on premises to avoid having your warm, cold and archive data held hostage in the public cloud. If you haven't already gone down the road to using public cloud storage for all of your data, this should give you something to think about. However, if you have a non-trivial amount of data stored in a public cloud, you should make plans to get it back and keep it in your own private storage cloud. You will have to pay a price to do it, but you will avoid a monthly expense that never ends.
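Getting your data back costs time as well as money. This sketch estimates how many days a transfer takes over a given Internet link; the 80 percent efficiency factor is an illustrative guess for protocol overhead and contention, not a measured value, and `transfer_days` is a hypothetical helper.

```python
def transfer_days(terabytes: float, link_mbps: float,
                  efficiency: float = 0.8) -> float:
    """Estimate days to move `terabytes` over a `link_mbps` megabit/s link.

    `efficiency` is the assumed fraction of the nominal link rate achieved.
    """
    bits = terabytes * 1e12 * 8                    # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400


for tb in (10, 100):
    for mbps in (100, 1_000):
        print(f"{tb:>4} TB over {mbps:>5} Mbps: "
              f"{transfer_days(tb, mbps):6.1f} days")
```

Under these assumptions, 100TB over a 100 Mbps connection takes close to four months, which is why a bulk download service matters if you plan to leave.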

Let's ballpark the cost of storing 300 terabytes of data in the public cloud compared to storing 300 terabytes of data in a private cloud over five years.

Using AWS S3 (Simple Storage Service) standard storage with a reasonable amount of activity, you could expect to be charged $600,000 in operating costs over the course of five years, or $10,000 per month. Using a private AWS S3-compliant storage cloud, you could expect capital and operating expenses of $270,000 over the course of five years, or $4,500 per month. Which check would you prefer to write each month?
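The monthly figures above follow from dividing each five-year total by 60 months. This sketch reproduces that arithmetic and also backs out the implied all-in cost per GB-month, assuming the 300TB stays constant over the period (the text does not break the estimates down further).

```python
MONTHS = 5 * 12
CAPACITY_GB = 300 * 1_000  # 300 TB, assumed constant over the five years

estimates = {
    "public (AWS S3 standard)": 600_000,  # operating costs, from the text
    "private (S3-compatible)": 270_000,   # capital + operating costs, from the text
}

for name, five_year_total in estimates.items():
    per_month = five_year_total / MONTHS
    per_gb_month = per_month / CAPACITY_GB
    print(f"{name}: ${per_month:,.0f}/month, "
          f"about ${per_gb_month:.4f} per GB-month all-in")

savings = estimates["public (AWS S3 standard)"] - estimates["private (S3-compatible)"]
print(f"Five-year difference: ${savings:,}")
```

The implied rates come out to roughly $0.033 versus $0.015 per GB-month, a five-year difference of $330,000.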

OK, a private storage cloud looks good financially, but won't building a private storage cloud be too complicated for most organizations? Not necessarily, because you can start with a handful of commodity storage servers plus software, and scale your storage as you need the capacity. You can also make use of storage appliances, which can be deployed to quickly build your private storage cloud.

The name for this storage architecture is Software-Defined Storage. The benefits of using it to create your own private storage cloud include:

  • No need to buy storage capacity that you won't be using for years
  • No vendor-driven obsolescence that requires time-consuming data migrations
  • No proprietary firmware or hardware needed in the storage servers
  • No need to buy all your storage servers from the same vendor
  • No need for each storage server to have the same amount of storage capacity

The benefits of having your own private storage cloud are real, and it can mean substantial savings on data storage costs in your organization or business. MonadCloud is ready to work with you to build a private storage cloud that meets your requirements for affordable and scalable data storage.

Replicate or Erasure Code?

When using object-based storage in a private or hybrid cloud, you usually have a choice between using replication or erasure coding to protect your data. Replication creates multiple copies to protect your data objects. Erasure coding splits your data objects into fragments and adds “parity” fragments for data protection. Whether to replicate or erasure code--that’s the question.

Replication typically defaults to making three copies or replicas of the object being stored, which will require 200 percent more storage space than used by the original object before replication.  For example, if the size of the object is 100MB, then having 3 replicas to protect the object would require a total of 300MB of storage, or 200MB more than the original size of the 100MB object. In object-based storage there is no concept of an “original” copy of the object. All of the copies are the same and they are all called replicas of the object.

Erasure coding, commonly based on Reed-Solomon error-correcting codes, typically requires just 20 to 50 percent more storage space than the original object. For example, if the size of the object is 100MB, 4+2 erasure coding would split the object into 4 data fragments of 25MB each and calculate 2 parity fragments of the same size, for a total of 6 fragments (150MB). Any 4 of the 6 fragments (data or parity) are sufficient to retrieve the object. In this example, erasure coding adds 50 percent in storage overhead, compared to the 200 percent storage overhead required when using 3x replication.
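The overhead figures for replication and erasure coding reduce to two small formulas: N-way replication adds N - 1 copies, and k+m erasure coding adds m/k. This sketch models capacity only, not the Reed-Solomon encoding math itself; the function names are hypothetical, and the 10+2 scheme is an illustrative example of the 20 percent end of the range.

```python
def replication_overhead(copies: int) -> float:
    """Extra capacity beyond the object itself (2.0 == 200 percent)."""
    return float(copies - 1)


def erasure_overhead(k: int, m: int) -> float:
    """Extra capacity for k data plus m parity fragments (0.5 == 50 percent)."""
    return m / k


def erasure_layout(object_mb: float, k: int, m: int) -> dict:
    """Describe how a k+m erasure code lays out one object."""
    fragment_mb = object_mb / k
    return {
        "fragments": k + m,
        "fragment_mb": fragment_mb,
        "total_mb": fragment_mb * (k + m),
        "fragments_needed_to_read": k,
        "fragment_losses_tolerated": m,
    }


print(f"3x replication: {replication_overhead(3):.0%} overhead")
print(f"4+2 erasure coding: {erasure_overhead(4, 2):.0%} overhead")
print(f"10+2 erasure coding: {erasure_overhead(10, 2):.0%} overhead")
print(erasure_layout(100, 4, 2))
```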

So, how do you decide between using object replication or erasure coding to protect your data?  Well, a “rule of thumb” is to use replication for data objects that are “warmer” or likely to be accessed, and to use erasure coding for data objects that are “colder” or not likely to be accessed.  However, some types of data objects, like image file backups, are more likely to be erasure coded, and whether they are considered “warm” or “cold” may not matter.

The size of your data objects can give you an indication whether to replicate or erasure code them.  Small data objects, say less than 1MB in size, may not be good candidates for erasure coding because it requires additional computation on the nodes in the storage cluster.  If you have lots of very small data objects, it will take more compute resources in the storage cluster to erasure code them, and more time to read them compared to replicating them.
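The rule of thumb and the size guideline can be combined into a simple per-object policy. This is a minimal sketch, not a vendor feature: `choose_protection` is a hypothetical name, the 1MB cutoff follows the example in the text, and the temperature labels are assumed to come from some separate classification step.

```python
# Hypothetical policy: the 1MB cutoff follows the example in the text;
# it is not a default from any particular object storage product.
SMALL_OBJECT_BYTES = 1_000_000


def choose_protection(size_bytes: int, temperature: str) -> str:
    """Return "replicate" or "erasure-code" for an object."""
    if size_bytes < SMALL_OBJECT_BYTES:
        # Erasure coding tiny objects costs extra compute and read time.
        return "replicate"
    if temperature in ("hot", "warm"):
        # Data likely to be accessed benefits from whole-object replicas.
        return "replicate"
    # Large, cold objects: favor the lower overhead of erasure coding.
    return "erasure-code"
```

A real policy would likely add exceptions, such as erasure coding image file backups regardless of temperature, as noted above.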

Another consideration is whether your data objects will be “dispersed” over more than one physical location.  An object-based storage cluster can “disperse” replicated objects over multiple physical locations.  This can be very useful in positioning data objects close to where they will be needed or to provide a remote data protection location.

If your Internet bandwidth between physical locations is not particularly fast, then using erasure coding to disperse fragments of data objects to multiple physical locations may result in poor read performance.  It also increases the chance of not being able to access your data if one of the physical locations becomes inaccessible.  That said, there is a solution to this problem.  You can erasure code data objects in one physical location, and replicate the erasure coded data objects to another physical location.  The result is replicated, erasure coded objects that are stored in different physical locations.

Also, think about the amount of data you need to protect, because the totals have a direct bearing on the size of your cluster and may influence whether you choose replication or erasure coding. For example, if you have 1PB of data to protect and use 3x replication, you will need 3PB of storage capacity. The same 1PB of data protected by 4+2 erasure coding would need 1.5PB of storage. Remember that you can use replication for some “buckets” and erasure coding for other “buckets,” which gives you flexibility in how you protect your data and in the amount of physical storage required to do it.
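Mixing schemes per bucket lands the raw capacity between the two extremes. The bucket names and sizes below are hypothetical, chosen to total 1PB (1,000TB), and `raw_capacity_tb` is an illustrative helper.

```python
def raw_capacity_tb(logical_tb: float, scheme: tuple) -> float:
    """Physical capacity needed for `logical_tb` of data.

    `scheme` is ("replicate", copies) or ("ec", k, m).
    """
    if scheme[0] == "replicate":
        return logical_tb * scheme[1]
    if scheme[0] == "ec":
        _, k, m = scheme
        return logical_tb * (k + m) / k
    raise ValueError(f"unknown scheme: {scheme!r}")


# Hypothetical 1PB split across buckets with different policies.
buckets = {
    "active-projects": (200, ("replicate", 3)),
    "image-backups": (500, ("ec", 4, 2)),
    "archive": (300, ("ec", 4, 2)),
}

mixed = sum(raw_capacity_tb(tb, scheme) for tb, scheme in buckets.values())
print(f"All 3x replication: {raw_capacity_tb(1_000, ('replicate', 3)):,.0f} TB")
print(f"All 4+2 erasure coding: {raw_capacity_tb(1_000, ('ec', 4, 2)):,.0f} TB")
print(f"Mixed buckets: {mixed:,.0f} TB")
```

In this made-up mix, replicating only the 200TB of active data brings the total to 1.8PB, compared with 3PB for replicating everything.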

Are all these replication and erasure coding features common to every object-based storage software vendor? Most object storage software vendors support both replication and erasure coding; the exceptions are several vendors who only erasure code data. That said, you still need to investigate how each vendor implements these data protection schemes and how they are managed.

If you want to make short work of your investigation, consider that the above mentioned replication and erasure coding features are available today in Cloudian’s HyperStore 5.2 appliances and software. As a Cloudian Preferred Partner, MonadCloud is ready to work with you to design and build a Cloudian-powered private storage cluster that meets your requirements for data protection and capacity storage.

Why Hybrid Cloud Storage?

One of the definitions of the word “hybrid” is a thing made from two different elements.  Hybrid cloud storage is a thing made from two different types of cloud storage...local or private cloud storage and remote or public cloud storage.  So, just how do you combine the two to get this thing called hybrid cloud storage?

Well, first you need a local or private storage cloud that has the ability to tier data to a remote or public storage cloud based on rules or policies.  Second, the data that is tiered to a remote or public storage cloud should be retrieved automatically when it needs to be accessed.  Third, the operation of the hybrid storage cloud should be transparent to users, and easily managed.
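The three requirements above (rule-based tiering, automatic retrieval, and transparency to the user) can be modeled in a toy Python class. Both tiers here are in-memory dicts standing in for real object stores, and `HybridStore`, `tier`, and `get` are hypothetical names; real products implement tiering differently, for example by leaving a stub behind or using lifecycle rules.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HybridStore:
    """Toy hybrid storage cloud: a local tier plus a remote (public) tier."""
    cold_after_seconds: float
    local: dict = field(default_factory=dict)   # key -> (data, last_access)
    remote: dict = field(default_factory=dict)  # key -> data

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.time() if now is None else now

    def put(self, key: str, data: bytes, now: Optional[float] = None) -> None:
        self.local[key] = (data, self._now(now))

    def tier(self, now: Optional[float] = None) -> list:
        """Rule: move objects idle longer than cold_after_seconds to remote."""
        now = self._now(now)
        moved = []
        for key, (data, last_access) in list(self.local.items()):
            if now - last_access > self.cold_after_seconds:
                self.remote[key] = data
                del self.local[key]
                moved.append(key)
        return moved

    def get(self, key: str, now: Optional[float] = None) -> bytes:
        """Transparent read: recall the object from the remote tier if needed."""
        if key in self.local:
            data, _ = self.local[key]
        else:
            data = self.remote.pop(key)  # KeyError if in neither tier
        self.local[key] = (data, self._now(now))
        return data
```

The caller of `get` never needs to know which tier held the object, which is the transparency the text describes.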

OK, why would you want your own hybrid storage cloud? Well, you may be under legal requirements to keep some of your data directly under your control within certain geographic boundaries. Your internal data governance policy may include mitigating the risk of keeping all your data in a public storage cloud. You may not have the budget or Internet bandwidth to keep pace with the growth and expense of storing your data in a public storage cloud.

So, how does having your own hybrid storage cloud address these issues?  If your data must be stored within certain geographic boundaries, then having your own hybrid storage cloud allows you to keep your “restricted” data local, and directly under your control.  That said, your “unrestricted” data can either be kept local or tiered to a remote or public storage cloud, which could be appropriate for storing your “colder” or archive data.

If you need to avoid the risk of keeping all your data in a public storage cloud, then having your own hybrid storage cloud allows you to keep “warmer” or more frequently accessed data local, and tier it to a remote or public storage cloud as it becomes “colder” or is no longer being accessed.  Data tiered to a public storage cloud can also be compressed, which will lower the cost of keeping it there for the long term.

Aside from the risk of keeping all of your unstructured data in a public storage cloud, there is also the expense of keeping it there, and the associated cost of the bandwidth you need to access it.  Data tends to be “sticky” and “fork-lifting” it from one place to another usually isn’t done.  While the cost of public cloud storage is very low, “touching” your data does increase the cost of keeping it there.

Vendors of private storage clouds typically charge license fees based on how much data you store, plus annual maintenance and/or support fees, but they do not charge you for touching your data. This lets you keep your more important data local and tier data that is not being used to public cloud storage. Doing so saves you the capital cost and operational expense of “infinitely” expanding your local or private storage cloud, and it leaves you in control of where your data is stored.

Hybrid cloud storage has capabilities you don’t get when using only private cloud storage or public cloud storage.  Having the ability to tier data to a remote or a public cloud storage service, like AWS S3 or Glacier, gives you flexibility in determining where your data is stored, and more granular control over the cost of storing your data.

So who can do hybrid cloud storage at scale? The answer is Cloudian, which understands that hybrid cloud storage is essential for anyone building a private storage cloud. MonadCloud is a Cloudian Preferred Partner and is ready to work with you on designing and building your own hybrid storage cloud.

10TB for Starters

Public cloud storage providers are very large, and so is the number of "objects" they store. For example, the Amazon Web Services (AWS) Simple Storage Service (S3) stores more than 2 trillion objects, but it did not get that large all at once. The growth of S3 storage has accelerated since the service was launched in 2006. Today, AWS S3 is the de facto standard for cloud data storage, with an "ecosystem" of 350+ compatible applications. So, if you want your own AWS S3-compatible private storage cloud, how small can you start? The answer is 10TB.

When designing and building "capacity" storage for a private storage cloud, you have to start somewhere, and 10TB is the smallest requirement you should consider. And just what types of data can you store in your own private storage cloud? The answer is a wide range of "unstructured" data.

Okay, what is "unstructured" data?  Unstructured data includes document files, worksheet files, presentation files, audio files, video files, archive files, backup files, snapshots of VMs, scanned image files, web content files, log files, diagnostic files, sensor files, and machine-generated files. That last category, machine-generated files, is interesting because machines are now generating more data than people.  All of this "unstructured" data needs to be stored, and it is growing 10x to 50x faster than the "structured" data contained in your database systems.

In reality, coming up with 10TB of data in your organization should be easy to do once you locate where all of your unstructured data currently resides on your primary storage devices. Primary storage devices include Direct Attached Storage (DAS), Storage Area Networks (SAN), and Network Attached Storage (NAS). You may also have unstructured data parked on external USB disk drives scattered throughout your organization. The bad news is the discovery process for locating all your unstructured data might take some time to complete. The good news is you can start moving your unstructured data to your private storage cloud as you locate it.

As you examine your "silos" of unstructured data, you will typically find multiple copies of the same file. Finding multiple copies of the same file in different locations is not unusual, but do you really need to keep multiple copies of the same file? The answer is no, because your private storage cloud provides a very robust level of data durability. You can also choose to "de-duplicate" your data files before they are moved to your private storage cloud.
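To find duplicate files, you usually need to compare their contents rather than their names. Here is a minimal Python sketch, assuming the files are reachable under a single local directory tree. It hashes each file with SHA-256 and groups the paths by digest. Files are read in 1 MiB chunks, so large files do not have to fit in memory. `find_duplicates` is a hypothetical helper, not part of any MonadCloud or Cloudian tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by SHA-256 content hash; return only groups with >1 file."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files are never loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Each group in the result holds files with identical content. You would keep one copy from each group and move it to the private storage cloud.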

You may wonder why you should put all of your unstructured data in a private storage cloud. The answer is so you can manage it, make better use of it, and spend less on storing it. Managing and providing access to unstructured data costs more when it is stored in multiple silos on your primary data storage systems. Having your own MonadCloud private storage cloud, powered by Cloudian, allows you to create a large "pool" of capacity storage that you can more easily manage and access using a wide variety of applications that are AWS S3-compatible.

With your own private storage cloud, you do not need to guess how much storage you are going to need in the future. You can add storage nodes to your cluster as you need them. There is no overspending on storage capacity you may not need for several years. You also never need to schedule time-consuming "data migration weekends" when you add new storage nodes or replace older ones.

Your MonadCloud private storage cloud, powered by Cloudian, delivers continuous up-time access to your data, and is a proven way to simplify the management and use of your unstructured data.  And 10TB of data is all you need to get started.

Why Local Cloud Storage Makes Sense

When people use public data storage, they usually don't know where the provider's "bit barns" are located.  Not knowing may be OK for storing photos of your cat, but your perspective may change a bit when it concerns storing your business data.  Why?  Well, business data generally has a higher economic value than photos of your cat, and depending on the type of data, there may be a plethora of security, governance and compliance requirements that need to be met when storing that data.

Not to single out anyone, but a survey by Perspecsys of 125 attendees at the recent RSA security conference revealed that fifty-seven percent of the respondents "don't have a complete picture of where their sensitive data is stored." Forty-eight percent said they "don't have a lot of faith in their cloud providers to protect their data." Only sixteen percent actually knew where all of their "sensitive structured data" is stored. And just seven percent knew the location of all of their "sensitive unstructured data."

The latter two categories are interesting to note.  Structured data is usually contained in database systems where there are presumed to be adequate access controls.  Structured data typically represents less than twenty percent of the data stored in any organization.  Unstructured data includes emails, media files, photos, presentations, office-type documents, log files and other machine-generated data.  Unstructured data represents the bulk of data in any organization and it is growing 10-to-50 times faster than structured data.

Poor understanding of the risks, lack of planning and a certain amount of ignorance regarding the practices of public cloud storage providers could create a really bad scenario for some organizations.  So, how do you avoid the downside while being able to utilize the benefits of public cloud data storage?  The answer is to cloud local, except when you decide it is appropriate to use public cloud storage.

Cloud local sounds like an oxymoron. After all, the global public storage providers can offer very low-cost storage by leveraging their tremendous economies of scale. So how can SMB or enterprise organizations build their own private cloud storage economically? The answer is to start with appropriately designed storage clusters from MonadCloud.

Organizations that already use public cloud storage may not realize they can operate their own private cloud storage, which can scale from tens of terabytes to hundreds of petabytes on a pay-as-you-grow basis. The software to build private storage clouds has been available for a few years from vendors who specialize in "object-based storage," the type of cloud storage offered by Amazon Web Services (AWS). Object-based storage has many advantages. It can easily scale out as more storage is needed. It is simpler to manage than file and block storage like NAS and SAN. And it is cost efficient, which means you don't have to buy more than you need to get started.

So, does this mean you must choose between public and private cloud storage? No, because MonadCloud clusters are true "hybrid" storage clouds. A "hybrid" storage cluster allows you to keep your data local until you decide to tier some of it to a public cloud storage provider. Your local data can also be encrypted and compressed before it is tiered, which provides greater security and economy when storing data in a public cloud.

MonadCloud storage clusters, powered by Cloudian, are true "hybrid" storage clouds. Cloudian has placed a bet that Amazon Web Services (AWS) will remain the dominant public cloud storage provider.  And with over 2 trillion objects stored, the AWS Simple Storage Service (S3) is the de facto standard for storing data in the public cloud.

When you "cloud local" with your MonadCloud storage cluster, you reap the benefits of operating your own low-cost, private storage cloud that is fully compatible with AWS S3 and Glacier.  This gives you the ability to choose from among the hundreds of applications that support AWS S3 and Glacier.  You also have the ability to utilize S3, Glacier and other S3-compatible storage as low-cost remote storage tiers for data that can be stored in the public cloud.  Local control of your data, plus the "hybrid" option to use public cloud storage can be yours with MonadCloud clusters powered by Cloudian.

Why Cloudian?

MonadCloud is all about building object-based storage for its customers. But from the customer's perspective, object storage is a means to solve their data storage growth and management problems. So while MonadCloud sells object storage clusters as a business, we are really selling what you can do with them to solve your data storage growth and management problems. Solution selling is a practice that is well understood by vendors and their channel partners. Choosing the right partner to work with was an important decision for MonadCloud.

MonadCloud's message is that "storage simplification" can help solve your data storage growth and management problems. MonadCloud understands that Network Attached Storage (NAS) and Storage Area Networks (SAN) are complex and need skilled management. They are also expensive and must be replaced or upgraded every 3 to 5 years as vendors bring out new models and "end of life" their old storage systems. More importantly, NAS (file) and SAN (block) storage systems were not designed to accommodate the explosive growth of "unstructured data" being generated by users and machines. If you think your data storage situation is bad now, wait (actually, don't wait) until the whole Internet of Things (IoT) catches on.

When MonadCloud was researching storage software vendors to partner with, it became clear that some vendors focused their products on the storage needs of certain niche markets or "verticals." Cloudian set itself apart from these vendors in several important ways. First, Cloudian's software supports storage providers, small and medium businesses (SMB), and enterprise customers. This broad market approach comes from Cloudian's heritage of developing carrier-grade storage software to support mobile data services. Second, Cloudian storage clusters can start small, at tens of TBs, and grow to hundreds of PBs. This allows Cloudian to meet the needs of service providers, SMB, and enterprise customers. Third, Cloudian offers an appliance-based cluster "building block" that is manufactured for Cloudian and supported by Cloudian. This is the classic "turnkey" approach to making storage clusters easy to deploy and support in the SMB and enterprise markets. Fourth, Cloudian is laser-focused on full compatibility with the Amazon Web Services (AWS) Simple Storage Service (S3), which is the de facto standard for cloud storage. AWS has the largest third-party "ecosystem" for cloud storage-related solutions. Anything that works with AWS S3 will work with Cloudian. Cloudian is also unique in its ability to automatically tier data from a Cloudian-powered MonadCloud storage cluster to AWS S3 and Glacier or another S3-compatible storage service.

MonadCloud chose to partner with Cloudian because it has developed a software-based cloud storage system that simplifies data storage and addresses a broad range of use cases, from backup to big data. Cloudian understands the importance of AWS compatibility and is committed to delivering its Cloudian HyperStore software and appliances through its Cloudian authorized and preferred partners. MonadCloud is proud to be designated a Cloudian Preferred Partner, the first in New England.

Are you guilty of Digital Hoarding?

Now that hoarding has become fodder for reality television programming, it might be appropriate to address “digital” hoarding and how MonadCloud can help you cope with the avalanche of data accumulating around you in your organization.

It wasn’t too long ago that only large corporations accumulated a terabyte (1,000 gigabytes) of data. Today, laptops and PCs can be purchased with a terabyte disk drive installed. With so much storage available, just what are people doing with it?

The answer is they are using their storage devices to accumulate “unstructured” data. Unstructured data for most individuals includes emails, photos, videos, and music files. Organizations also accumulate unstructured data in the form of emails, media files, scanned documents, images, test data, log files, backup data and office files like text documents, worksheets, drawings and presentations. Unstructured refers to the fact that the data files are not “structured” or contained in a database. They are just files sitting on spinning disks and there are a lot of them.

Currently, unstructured data accounts for 80 percent of the data that resides within an organization. Unstructured data is estimated to grow 10-to-50 times faster than structured data. With that kind of growth rate, something has got to give. But wait a second, what about all of those cheap terabyte disk drives? Yes, a bare disk drive is relatively cheap, but everything else that goes along with managing your data is not. Disk drives are only one part of a system to store and manage data.

Historically, the more sophisticated and expandable the storage system, the more it costs. IT staff must be trained and experienced in the management of things like disk drive arrays (RAID), Storage Area Networks (SANs) or Network Attached Storage (NAS) devices. Primary data storage systems require floor space and must be supplied with power and located in a suitable environment. And every 3-to-5 years they will need to be upgraded and eventually replaced at a non-trivial cost. Yes, disk drives are relatively cheap but storage systems and storage networks are not.

So how does MonadCloud offer relief from the treadmill of primary data storage expansion needed to contain all of this growth in unstructured data? First, your MonadCloud storage cluster simplifies storage of your unstructured data. Object-based storage is simpler to use and manage than legacy block and file storage. Second, your MonadCloud cluster has one of the lowest Total Costs of Ownership (TCO) in the storage industry: around $0.01/GB per month. Third, your MonadCloud cluster is “infinitely” expandable. You don't have to estimate how much storage you will need over the next 3-to-5 years. Just add storage servers when you need them to meet your requirements. MonadCloud storage clusters can be expanded from tens of terabytes to hundreds of petabytes of storage, and from a handful of storage servers to hundreds of storage servers in a single cluster.
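The $0.01/GB per month figure is easy to turn into a budget. Here is a back-of-the-envelope Python sketch. It assumes that rate holds flat at every capacity and uses decimal units (1 TB = 1,000 GB). Actual TCO depends on hardware, power, floor space and staffing.

```python
# Assumption: the "around $0.01/GB per month" figure quoted above, applied
# uniformly at every capacity, with decimal units (1 TB = 1,000 GB).
RATE_PER_GB_MONTH = 0.01


def monthly_cost(terabytes, rate_per_gb=RATE_PER_GB_MONTH):
    """Approximate monthly storage cost in dollars for a given capacity."""
    return terabytes * 1_000 * rate_per_gb


for tb in (10, 100, 1_000):
    cost = monthly_cost(tb)
    print(f"{tb:>5} TB -> ${cost:,.2f}/month, ${12 * cost:,.2f}/year")
```

At that rate, the 10TB starting point works out to about $100 per month, or $1,200 per year.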

MonadCloud works with its customers to design, build, maintain, monitor and support their clusters.  You can locate your MonadCloud cluster on your premises or in a secure co-location facility, which has adequate power, air-handling, fire-suppression and emergency power generation. Your connection to your MonadCloud storage cluster can be over your internal network or over the Internet.

So why else would you consider deploying a MonadCloud cluster for your data storage? What are some additional reasons? Your MonadCloud storage cluster is a good place to keep copies of your PC and server backups or snapshots as part of a disaster recovery plan. Cloud storage is a good place to “park” data you don’t need very often but may need to retrieve at random, which is something tape backup is not particularly good at. Cloud storage reduces your need to keep expanding your primary data storage systems. By some estimates, an organization regularly accesses only 15 percent of the data it stores. So why continue to fill up your expensive primary storage systems with infrequently accessed data when you can securely and reliably store it on your MonadCloud cluster for less? Cloud storage is also a good place to archive data you may need to keep for compliance or governance reasons.

Cloud storage has evolved to become business-class storage that can help you cope with the rapid growth of unstructured data in your organization.  MonadCloud can work with you to design and build a storage cluster that best meets your requirements.

Backup or Archive?

When you consider using the cloud for data storage there is a meaningful distinction between backing up data and archiving data that can help you determine the type of cloud storage that will work best for you.

Everyone understands the importance of backing up their data. After all, backups are all that stand between having your data and losing it due to mistakes, hardware failures and acts of nature. Your backup method might be a simple utility program or a full-featured application designed to back up data files to tape or to image disk partitions to a local disk drive.

When you make data backups, you should be aware of the 3-2-1 Backup Principle, which states that you need to make 3 copies of your data, store 2 of the copies on different storage devices, and keep 1 copy off-site. That is good advice, and a MonadCloud storage cluster can help you implement the 3-2-1 Backup Principle. How? By default, a MonadCloud cluster writes 3 copies of your backup data. If your logical cluster spans two different locations, then 2 copies can be kept in one location and the third copy in the other, physically remote from the first two.
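You can check the placement argument above mechanically. The Python sketch below encodes the 3-2-1 test as this article states it: 3 copies, at least 2 different storage devices, and at least 1 copy off-site. The `Copy` type and the node and site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    device: str  # storage server (node) holding the copy
    site: str    # physical location of that server


def meets_3_2_1(copies, primary_site):
    """3 copies, on at least 2 different devices, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.device for c in copies}) >= 2
        and any(c.site != primary_site for c in copies)
    )


# Hypothetical clusters writing three replicas per object.
two_sites = [Copy("node1", "site-a"), Copy("node2", "site-a"), Copy("node3", "site-b")]
one_site = [Copy("node1", "site-a"), Copy("node2", "site-a"), Copy("node3", "site-a")]
print(meets_3_2_1(two_sites, "site-a"))  # True
print(meets_3_2_1(one_site, "site-a"))   # False
```

A single-site cluster still writes three copies, but it fails the off-site test. That is why spanning two locations matters.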

One problem with data backups is they tend to become progressively less useful over time due to changes being made to your data.  One solution to the problem is to implement a backup media rotation scheme to reduce the amount of backup media required, while providing the desired retention time for your data.
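One widely used rotation scheme is grandfather-father-son (GFS). It keeps the last few daily backups, the newest backup from each of the last few weeks, and the newest backup from each of the last few months. GFS is offered here only as an example, not as the scheme the author had in mind, and the retention counts (7 daily, 4 weekly, 12 monthly) are assumptions. The sketch below computes which nightly backups a GFS policy would retain.

```python
from datetime import date, timedelta


def gfs_retain(backup_dates, daily=7, weekly=4, monthly=12):
    """Grandfather-father-son rotation: return the set of backup dates to keep."""
    newest_first = sorted(backup_dates, reverse=True)
    keep = set(newest_first[:daily])  # sons: the most recent daily backups
    fathers, grandfathers = {}, {}
    for d in newest_first:
        # Iterating newest first, the first date seen for a period is its newest backup.
        fathers.setdefault(tuple(d.isocalendar())[:2], d)  # per ISO (year, week)
        grandfathers.setdefault((d.year, d.month), d)       # per calendar month
    keep.update(list(fathers.values())[:weekly])
    keep.update(list(grandfathers.values())[:monthly])
    return keep


# One nightly backup for every day of 2017.
nightly = [date(2017, 12, 31) - timedelta(days=i) for i in range(365)]
kept = gfs_retain(nightly)
print(len(kept))  # 21
```

Instead of 365 nightly backups, the policy keeps 21: the 7 daily backups, plus 3 weekly and 11 monthly backups that were not already kept. It still lets you reach back a full year at monthly granularity.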

Another problem with data backups is managing the backup media so that the backups you are likely to need are on your premises. People expect to use their backups when they encounter a problem like a missing file, so having the right backup available to restore is important. To help here, MonadCloud’s storage clusters support a number of backup software products, including Arcserve, Ctera, CommVault, CloudBerry Lab, PHD Virtual and Veeam.

Archiving data is different from backing up data because archiving means keeping data for a long time. Archiving is used to retain your data for specific periods of time or, in certain cases, “forever.” For example, archiving financial records for five to seven years is typical for organizations that file tax returns. However, healthcare organizations may need to archive patient diagnostic data and test results for the lifetime of the patient.

Within an organization, internal data governance policies and outside regulatory requirements may dictate that categories of data, such as emails, accounting data, project data and employee records, be retained for specific periods of time. Archived data usually will not be updated, but it may need to be referenced in the future, so it must be stored in a very durable manner.
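An archive policy enforces retention periods like these. Here is a minimal Python sketch built on a hypothetical schedule keyed by category. Financial records are kept for seven years, the upper end of the range above. Patient records have no fixed expiry, since they are kept for the patient's lifetime. An item becomes eligible for disposal only after its full retention period has elapsed.

```python
from datetime import date

# Hypothetical retention schedule in years; None means no fixed expiry
# (e.g. patient records kept for the lifetime of the patient).
RETENTION_YEARS = {"financial": 7, "patient": None}


def retention_expired(category, archived_on, today):
    """True once an archived item has been held for its category's full period."""
    years = RETENTION_YEARS[category]
    if years is None:
        return False
    try:
        expires = archived_on.replace(year=archived_on.year + years)
    except ValueError:  # archived on Feb 29 and the expiry year is not a leap year
        expires = archived_on.replace(year=archived_on.year + years, day=28)
    return today >= expires


print(retention_expired("financial", date(2010, 3, 1), date(2017, 3, 1)))  # True
print(retention_expired("patient", date(1990, 1, 1), date(2017, 3, 1)))    # False
```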

The problem with archiving data is the amount of effort and expense required to select, store and retrieve the data.  Fortunately, there is “information lifecycle management” or “enterprise data management” software available to help you do this in a policy-driven, automated manner.

MonadCloud's storage clusters function as storage for your data backups, as well as any “unstructured” data that you don’t want to keep on your primary data storage systems. MonadCloud’s storage clusters can function as the “target” for your backup software, including software designed to image or “snapshot” your virtual and physical servers.

MonadCloud’s storage clusters are also suitable for archival purposes and meet the cost, performance and durability requirements for storing archive data.  Your archived data can be reliably stored on MonadCloud’s storage clusters for long-term retention and rapid access.

Depending on your requirements, MonadCloud’s storage clusters can also tier your archive data to AWS S3 or Glacier or any S3-compatible storage service that is remote to your MonadCloud cluster.  The difference between backup and archive is a distinction worth remembering.

Why MonadCloud?

My initial thinking and writing about creating a regional cloud storage provider was undertaken in response to the build-out of a fiber-optic network in southwestern New Hampshire called New Hampshire FastRoads.

I began a conversation with a colleague and friend who works for a local economic development corporation, which created New Hampshire FastRoads.  In our conversations we discussed various cloud services that could make use of the New Hampshire FastRoads fiber-optic network.  Eventually we focused on storage as something we thought would be a good, regionally salable, cloud service to provide.  

MonadCloud was conceived as a New England regional provider of scalable cloud storage that is compatible with the Amazon Web Services (AWS) Simple Storage Service (S3). I've written more about why AWS S3 compatibility is a key requirement, but suffice it to say that S3 represents a de facto standard for creating object-based storage architectures. Object-based storage is different from legacy storage systems based on file and block storage protocols. MonadCloud uses object-based storage software running on a cluster of storage servers. Object-based storage is capable of storing petabytes of data, up to hundreds of them. To give you a sense of how large a petabyte is, consider that it takes 1,000 gigabytes to equal 1 terabyte and 1,000 terabytes to equal 1 petabyte. I think you get the idea. Object-based storage is massively scalable.
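Here is the unit arithmetic above in a few lines of Python. The sketch assumes decimal (SI) units, as the text does. Binary units (1 TiB = 1,024 GiB) differ by about 2.4 percent at each step.

```python
# Decimal (SI) storage units as used above: 1 TB = 1,000 GB and 1 PB = 1,000 TB.
UNITS_IN_GB = {"GB": 1, "TB": 1_000, "PB": 1_000_000}


def to_gb(amount, unit):
    """Convert an amount in GB, TB or PB to gigabytes."""
    return amount * UNITS_IN_GB[unit]


print(to_gb(1, "PB"))                       # 1000000
print(to_gb(100, "PB") // to_gb(10, "TB"))  # 10000
```

So a cluster that starts at 10TB and grows to 100PB has expanded by a factor of 10,000.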

So, if you can already do this kind of data storage with AWS S3, why would anyone want to store their data on a MonadCloud cluster? Good question, and one that we thought about before moving forward with our project. First, customers using a MonadCloud cluster can leverage recently built fiber-optic networks in New England to deliver high-speed, low-latency network connections to their employees and/or customers. Another MonadCloud advantage is, as they say in real estate, location, location, location. Because MonadCloud clusters are local, the people using them will always know where their data is being stored and who is storing it. Deploying your own MonadCloud cluster won't put AWS S3 out of business, but it does give you the benefit of keeping your data closer than, say, Virginia or Oregon.

Finally, there is growing awareness of the importance of building a regional economy by developing and supporting local and/or regional producers and providers of goods and services. We have a lot of talented and entrepreneurial business people in New England. Building local data storage clusters for customers is something we can do by ourselves and for ourselves. MonadCloud can help strengthen the regional economy by keeping the data we depend on closer at hand. MonadCloud can design, build, monitor, maintain and support data storage clusters for its customers on their premises or in a service provider's colocation site. Our motto at MonadCloud is "cloud local." MonadCloud is ready to introduce New England customers to the next big thing in data storage technology, one that reduces the cost of storage and increases the durability of their data.

The Importance of AWS S3 Compatibility

Well, when you are small and your competitor is large, it is useful to tap into what has made them successful and use it to your advantage. When Amazon Web Services (AWS) created their Simple Storage Service (S3) for storing data objects in a scale-out storage server architecture, they became very successful. Their Application Programming Interface, or API, became a de facto industry standard for storing data in the cloud. By learning how to implement the S3 API, hundreds of third parties were able to develop software and hardware that works with AWS S3. By extension, if you could implement a storage service that was compatible with AWS S3, then the same third-party software and hardware that works with AWS S3 would also work with your AWS S3-compatible storage service.

When you hear that something is compatible with something else, it doesn't necessarily mean that the two are compatible in every way.  And so it is with the AWS S3 API.  A number of object based storage providers claim AWS S3-compatibility, but it might not extend beyond a basic or moderate level of functionality.  Some providers of object based storage software are more S3 compatible than others.

In choosing a suitable object based storage software vendor to partner with, MonadCloud chose Cloudian because it was committed to full AWS S3-compatibility. By selecting Cloudian, MonadCloud's customers will be able to choose from hundreds of hardware and software data management solutions that work with AWS S3.

Is AWS S3 the only cloud data storage standard out there? No, but it is the most widely supported in the market by third parties and customers. MonadCloud expects emerging standards, like SNIA's Cloud Data Management Interface (CDMI), to be adopted and implemented by more object-based storage software providers over the next few years. But for the foreseeable future, AWS S3 will continue to dominate the public cloud storage market. Everyone loves standards in information technology, which is probably why we have so many of them.