Data Lakes Capabilities and Features
Introduction
In this age of data-driven decision making, it is important to keep track of every piece of information. This information comes from disparate sources and arrives in great variety, volume and velocity. A data repository that can store massive volumes of data and also support analytics has therefore become essential.
In this post, let us try and understand a little more about Data Lakes along with some of their features.
What is a Data Lake?
According to Nick Heudecker of Gartner, “Data lakes are enterprise-wide data management platforms for analysing disparate sources of data in its native format.”
A data lake is a humongous repository that allows us to store, manage and analyse huge volumes of structured and unstructured data.
Here are some of the salient features of a Data Lake:
- A cost-effective way to store huge volumes of structured and unstructured data
- Huge volumes of data can be stored easily without worrying about scaling issues
- Provides predictive analytics capabilities to generate meaningful insights from huge volumes of data
- Allows us to store different formats of data without worrying much about structure or schema
- Allows schema-less writes, with a schema applied on read at the time of extraction
- Varied kinds of data such as call logs, emails, social media posts and XML files can be stored in a single location
- Gives us the power to generate near-real-time analytics
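The schema-less write and schema-on-read idea from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `lake` list, the `write_raw` and `read_calls` helpers, and the sample records are all hypothetical, and a real data lake would sit on distributed or object storage rather than a Python list.

```python
import json

# Hypothetical in-memory "lake": raw payloads are kept exactly as received.
lake = []

def write_raw(source, payload):
    """Schema-less write: store the payload untouched, tagged with its source."""
    lake.append({"source": source, "raw": payload})

def read_calls(records):
    """Schema-on-read: impose a (caller, seconds) structure only at extraction."""
    rows = []
    for rec in records:
        if rec["source"] != "call_log":
            continue  # other formats stay in the lake, untouched
        caller, seconds = rec["raw"].split(",")
        rows.append({"caller": caller.strip(), "seconds": int(seconds)})
    return rows

# Different formats land in one place without any upfront modelling.
write_raw("call_log", "alice, 120")
write_raw("email", "Subject: hello\n\nSee you at 5.")
write_raw("social", json.dumps({"user": "bob", "text": "nice post"}))
write_raw("call_log", "carol, 45")

# The structure is decided here, by the reader, not at write time.
calls = read_calls(lake)
total_seconds = sum(row["seconds"] for row in calls)
```

The point of the sketch is that `write_raw` never inspects the payload, so a new format can be stored without changing anything, while each consumer decides which schema to apply when it reads.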
Data Lake vs. Data Warehouse
Often, people consider data lakes to be an extension of the traditional enterprise data warehouses. However, there are many structural differences between them. Here is a quick comparison between a data warehouse and a data lake:
| | Data Lake | Data Warehouse |
| --- | --- | --- |
| Data | Stores all formats of data, including structured, unstructured and semi-structured | Allows storage of well-structured and processed data |
| Storage | Cost-effective storage solution for huge volumes of data | Turns out to be an expensive option for huge volumes |
| Schema | Schema-on-read | Schema-on-write |
| Agility | Configurable and reconfigurable as and when needed | Fixed configuration |
| Analytics Support | Offers excellent support for predictive analytics | Limited analytical support |
| Data Format | No data format is required. Stores data in its native format | Data is modelled before ingestion |
| Accessibility | No standard way to access the data. Highly accessible and available to application developers and database administrators | Data accessible only with the help of SQL and BI tools |
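
To make the warehouse side of the comparison concrete, here is a minimal sketch of schema-on-write using Python's built-in `sqlite3` module as a stand-in for a warehouse. The `calls` table, its columns and the `insert_call` helper are assumptions made for illustration only; real warehouses enforce far richer models.

```python
import sqlite3

# An in-memory SQLite database stands in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")

# Schema-on-write: the structure is fixed before any data is loaded.
conn.execute(
    "CREATE TABLE calls (caller TEXT NOT NULL, seconds INTEGER NOT NULL)"
)

def insert_call(caller, seconds):
    """Try to load one record; return False if it violates the schema."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO calls (caller, seconds) VALUES (?, ?)",
                (caller, seconds),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = insert_call("alice", 120)  # conforms to the schema
rejected = insert_call(None, 45)      # missing caller violates NOT NULL
row_count = conn.execute("SELECT COUNT(*) FROM calls").fetchone()[0]
```

Here the record with a missing caller never reaches the table, which is the "data is modelled before ingestion" behaviour described in the comparison. A data lake would instead keep that record as-is and leave the decision to whoever reads it later.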