The Enterprise Data Lake

The Definition of a Data Lake

The term Data Lake has only been around for 10 years and refers to a collection of data organized by user-designed patterns. Organizations have outgrown individual data repositories built around business functions and now need a unified view of their data. This approach gives users the ability to explore structured and unstructured data and discover insights across both. With the right tools in place across the data pipeline, a data lake can enable self-service data exploration.

The Value of Creating a Data Lake

The value of creating a modern Data Lake comes from the data that organizations continue to create, which can give them more insight into their business and uncover new opportunities that, if executed properly, increase revenue and reduce cost. New technologies require skill sets that did not exist 5 years ago and are not taught in universities fast enough because the landscape of emerging concepts keeps changing. That makes it challenging for organizations to maintain the appropriate skills and to find the specific use cases that yield the highest return on investment (ROI). One of the initial use cases for creating a data lake is offloading data from existing, more expensive data repositories.

Enterprise Data Warehouse Off-Load

With the arrival of exciting new unstructured and semi-structured data sources (Facebook, Twitter, machine sensor data…), many companies are looking at inexpensive data platforms for storage to reduce processing time and cost. Once the initial data is loaded and organized in Hadoop, individual data products can be delivered to end users through traditional BI platforms for further visual analytics.

Infrequently used data, which has typically been stored on mainframes, is moving from the old warehouses into Hadoop. The current warehouse then processes only query-ready structured data and delivers better performance to the end user. Hadoop storage costs much less than typical warehouse storage, which is why the EDW off-load use case is the most common project with which companies start their Big Data journey.
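As a rough illustration of the off-load decision described above, the sketch below flags warehouse tables that are queried rarely and have sat idle for a while. The table names, the `TableStats` fields and the thresholds (`max_queries`, `idle_days`) are hypothetical and chosen only for the example; real criteria would come from your own warehouse usage statistics.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TableStats:
    # Hypothetical usage statistics for one warehouse table.
    name: str
    last_queried: date
    queries_last_90_days: int


def offload_candidates(tables, today, max_queries=5, idle_days=60):
    """Return names of tables that look cold enough to move to cheaper storage.

    A table qualifies when it was queried at most `max_queries` times in the
    last 90 days AND its last query is older than `idle_days` days.
    Both thresholds are illustrative assumptions, not recommendations.
    """
    cutoff = today - timedelta(days=idle_days)
    return [
        t.name
        for t in tables
        if t.queries_last_90_days <= max_queries and t.last_queried < cutoff
    ]


# Made-up sample data for the example.
tables = [
    TableStats("sales_daily", date(2017, 5, 30), 420),
    TableStats("claims_archive_2009", date(2016, 11, 2), 1),
    TableStats("sensor_raw_2014", date(2017, 1, 15), 3),
]

print(offload_candidates(tables, today=date(2017, 6, 1)))
# ['claims_archive_2009', 'sensor_raw_2014']
```

Frequently queried tables such as `sales_daily` stay in the warehouse, which matches the idea that the warehouse keeps only query-ready data that end users actually hit.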

Don't Let Your Data Lake Turn into a Data Swamp

Not all data is worth saving. Storing data in Hadoop can offset huge licensing costs for the enterprise, but without a quantifiable use case that is supported by senior leadership, every lake will be treated as a science project and will eventually turn into a data swamp.
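One simple way to act on this is to refuse to treat a dataset as part of the lake until it has a named sponsor and a stated use case. The sketch below assumes a hypothetical catalog of `Dataset` records; the field names and sample entries are invented for illustration and do not refer to any particular catalog product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    # Hypothetical catalog entry for one dataset in the lake.
    name: str
    business_sponsor: Optional[str]
    use_case: Optional[str]


def swamp_risks(datasets):
    """Return names of datasets that lack a sponsor or a stated use case."""
    return [
        d.name
        for d in datasets
        if not d.business_sponsor or not d.use_case
    ]


# Made-up sample catalog for the example.
catalog = [
    Dataset("web_clickstream", "VP Marketing", "campaign attribution"),
    Dataset("legacy_dump_2011", None, None),
    Dataset("sensor_raw", "Plant Operations", ""),
]

print(swamp_risks(catalog))
# ['legacy_dump_2011', 'sensor_raw']
```

Running a check like this before each ingest is one possible governance gate; which fields count as a "quantifiable use case" is a decision for the business sponsor, not for the script.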

A successful Big Data initiative in any company in 2017 will have a business sponsor who supports the initiative financially and is willing to bridge the gap between IT and the business. Hadoop is still much too complex for a general business analyst to digest. Limited non-technical access to the data lake makes it even harder to empower business users to get meaningful insight from the data through self-service analytics tools that run natively on Hadoop with response times measured in seconds.

System Soft’s Reference Architecture

Through our experience implementing data lakes, we have sifted through hundreds of open-source Big Data products to assemble a reference architecture that is pre-integrated, pre-tested, and engineered end-to-end for immediate time-to-value in new implementations.

To accommodate a variety of customer needs, we have included multiple storage formats for both structured and unstructured data, leveraging tools such as Hortonworks Data Platform, Cloudera Data Hub and Elasticsearch.

Open Source, Open System

System Soft’s reference architecture is built end-to-end on open source products with zero license fees. In addition, all analytics created by System Soft are available for customers to inspect, expand or enhance.

Not every component is required for every implementation. If your business scenario does not require both structured and unstructured data storage, you don’t need to implement both. Already have an enterprise standard BI tool? Already have Cloudera or Hortonworks? No problem, System Soft can accommodate virtually any existing infrastructure you might have.

How System Soft Can Help You

System Soft Technologies is a full-service, turnkey solution provider. We excel at implementing projects from start to finish: strategy, design, development, implementation and support. We also have many customers leveraging specific discrete services, such as:

  • Big Data Strategy
  • Data Science
  • Data Engineering
  • Data Migration
  • Data Warehousing
  • Master Data Management
  • Data Governance
  • System Support and Management

Support and Management

As a full-service solution provider, we excel at ongoing support and maintenance. We will be your single point of contact for supporting the entire solution stack. We also provide upgrade and migration services as technologies change in this fast-paced market.

Critical Thinking. Collaboration. Success.

Copyright ©2017 System Soft Technologies. All rights reserved.