Big Data Meets Data Fabric and Multi-cloud – Forbes blog by John Webster

Sunday, January 14th 2018

Tags: avere, Big Data, blog, cloud, data fabric, elastifile, Forbes, John Webster, Mapr, Microsoft, multi cloud, nasuni, NetApp, syncsort

As cloud computing progresses, an age-old problem for IT is resurfacing: how to integrate and secure data stores that are dispersed geographically and across a growing diversity of applications. Data silos, which have always limited an organization’s ability to extract value from all of its data, become even more isolated. Consider the mainframe, with its stores of critical data going back decades, as the original data silo. Mainframe users still want to leverage this data for other applications such as AI but must overcome accessibility and formatting barriers in order to do so. It’s a task made easier by software vendors like Syncsort, whose Ironstream moves data from the mainframe in a way that is easily digestible by Splunk applications, for example.

But as cloud computing progresses, the siloed data issue becomes even more apparent as IT executives try to broaden their reach to encompass and leverage their organization’s data stored in multiple public clouds (AWS, Azure, GCP) along with all that they store on site. The cloud-world solution to this problem is what is now becoming known as the Data Fabric.

Data Fabrics—essentially information networks implemented on a grand scale across physical and virtual boundaries—focus on the data aspect of cloud computing as the unifying factor. To conceive of distributed, multi-cloud computing only in terms of infrastructure would miss a fundamental aspect of all computing technology – data. Data is integral and must be woven into multi-cloud computing architectures. The concept that integrates data with distributed and cloud-based computing is the Data Fabric.

The reason for going the way of the Data Fabric, at least on a conceptual basis, is to break down the data silos that are inherent in isolated computing clusters and on- and off-premises clouds. Fabrics allow data to flow and be shared by applications running in both private and public cloud data centers. They move data to where it’s needed at any given point in time. In the context of IoT, for example, they enable analytics to be performed in real time on data being generated by geographically dispersed sensors “on the edge.”

Not surprisingly, the Data Fabric opportunity presents fertile ground for storage vendors. NetApp introduced their conceptual version of it four years ago and have been instantiating various aspects of it ever since. More recently, a number of Cloud Storage Services vendors have put forward their Data Fabric interpretations that are generally based on global file systems. These include Elastifile, Nasuni, and Avere—recently acquired by Microsoft.

Entries are also coming from less expected sources. One is from the ever-evolving Big Data space. MapR’s Converge-X Data Fabric is an exabyte-scale, globally distributed data store for managing files, objects, and containers across multiple edge, on-premises, and public cloud environments.

At least two-thirds of all large enterprise IT organizations now see hybrid clouds, which are really multi-clouds, as their long-term IT future. Operating in this multi-cloud world requires new data management processes and policies and, most of all, a new data architecture. Enterprise IT will be increasingly called upon to assimilate a widening range of applications directed toward mobile users, data practitioners, and outside business partners. In 2018, a growing and increasingly diverse list of vendors will offer their interpretations of the Data Fabric.