Data Lakes Capabilities and Features

Introduction

In this age of data-driven decision making, it is important to keep track of every piece of information. This information comes from disparate sources and arrives in great variety, volume and velocity. In this context, a data repository that can store massive volumes of data and also support analytics has become essential.

In this post, let us try and understand a little more about Data Lakes along with some of their features.

What is a Data Lake?

According to Nick Heudecker of Gartner, “Data lakes are enterprise-wide data management platforms for analysing disparate sources of data in its native format.”

A data lake is a very large repository that allows us to store, manage and analyse huge volumes of structured and unstructured data.

Here are some of the salient features of a Data Lake:

  1. A cost-effective way to store huge volumes of structured and unstructured data
  2. Huge volumes of data can be stored without major scaling concerns
  3. Supports predictive analytics to generate meaningful insights from huge volumes of data
  4. Stores data in different formats without requiring a predefined structure or schema
  5. Allows schema-less writes, with a schema applied when the data is read (schema-on-read)
  6. Varied data such as call logs, emails, social media posts and XML files can be stored in a single location
  7. Enables near-real-time analytics
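
To make the idea of schema-less write and schema-based read more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the "lake" is a temporary local directory standing in for real storage, and the helper names (`write_raw`, `read_with_schema`, `demo`) and the sample call-log fields are hypothetical, not part of any data lake product.

```python
import json
import tempfile
from pathlib import Path


def write_raw(lake_dir: Path, name: str, payload: str) -> Path:
    """Store the payload exactly as received; no schema is checked on write."""
    path = lake_dir / name
    path.write_text(payload, encoding="utf-8")
    return path


def read_with_schema(path: Path, schema: dict) -> list:
    """Apply a schema while reading: keep only the named fields and cast them."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Fields missing from a record become None instead of failing.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


def demo() -> list:
    # Hypothetical lake: a temporary directory standing in for real storage.
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        # Records with different shapes land in the same file untouched.
        raw = "\n".join([
            json.dumps({"caller": "A", "duration": "42", "tower": "T1"}),
            json.dumps({"caller": "B", "duration": "7"}),
            json.dumps({"caller": "C", "note": "dropped call"}),
        ])
        path = write_raw(lake, "call_logs.jsonl", raw)
        # The schema is chosen by the reader, at read time.
        return read_with_schema(path, {"caller": str, "duration": int})


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The write step never inspects the records, so differently shaped records can sit side by side. A different reader could apply a different schema to the same file without rewriting the stored data.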

Data Lake vs. Data Warehouse

Often, people consider data lakes to be an extension of the traditional enterprise data warehouses. However, there are many structural differences between them. Here is a quick comparison between a data warehouse and a data lake:

 

| Aspect | Data Lake | Data Warehouse |
|---|---|---|
| Data | Stores all formats of data including structured, unstructured and semi-structured | Allows storage of well-structured and processed data |
| Storage | Cost-effective storage solution for huge volumes of data | Data warehouses turn out to be an expensive option for huge volumes |
| Schema | Schema-on-read | Schema-on-write |
| Agility | Configurable and reconfigurable as and when needed | Fixed configuration |
| Analytics Support | Offers excellent support for predictive analytics | Limited analytical support |
| Data Format | No data format is required. Stores data in its native format | Data is modelled before ingestion |
| Accessibility | No standard way to access the data. Highly accessible and available to application developers and database administrators | Data accessible only with the help of SQL and BI tools |
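
The schema difference in the comparison above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular warehouse or lake product works: `WAREHOUSE_SCHEMA`, `load_schema_on_write`, `load_schema_on_read` and the sample records are all hypothetical names and data chosen for the example.

```python
# Fixed schema, assumed purely for illustration.
WAREHOUSE_SCHEMA = {"caller": str, "duration": int}


def load_schema_on_write(records, schema=WAREHOUSE_SCHEMA):
    """Warehouse-style load: accept only records that match the schema exactly."""
    accepted, rejected = [], []
    for record in records:
        conforms = (
            set(record) == set(schema)
            and all(isinstance(record[key], kind) for key, kind in schema.items())
        )
        (accepted if conforms else rejected).append(record)
    return accepted, rejected


def load_schema_on_read(records):
    """Lake-style load: keep everything as-is; interpretation is left to the reader."""
    return list(records)


def sample_records():
    return [
        {"caller": "A", "duration": 42},          # matches the schema
        {"caller": "B", "duration": "7"},         # wrong type for duration
        {"caller": "C", "note": "dropped call"},  # different shape altogether
    ]


if __name__ == "__main__":
    accepted, rejected = load_schema_on_write(sample_records())
    print("warehouse accepted:", len(accepted), "rejected:", len(rejected))
    print("lake stored:", len(load_schema_on_read(sample_records())))
```

In this sketch the schema-on-write loader turns away anything that was not modelled in advance, while the schema-on-read loader keeps every record and defers the question of structure to whoever reads it later.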