Guest blog post by Kumar Chinnakali
In our last blog post we looked at the key benefits of a Data Lake; now let's dive into the internals of a Data Lake by discussing its key considerations and composition.
For any solution it is practically difficult to arrive at a one-size-fits-all architecture, and a Data Lake is no exception. The architecture considerations for a Data Lake therefore depend entirely on the following factors:
- Data Ingestion – Real-time, micro-batch, macro-batch, and batch
- Storage Layer – Raw and structured
- Structured Data Storage – SQL, key-value, document, columnar, and graph
- Metadata Management in the Data Lake
- Data Governance
- Data Search
- Data Access – Internal and external
- Data Insights
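One way to picture the difference between the ingestion modes in the first bullet is by how many records are gathered before they are processed. The Python sketch below makes that contrast concrete under a simplifying assumption: each mode is modelled only by a batch size. The sizes and the `BATCH_SIZE`, `chunks` and `ingest` names are hypothetical. Real pipelines usually trigger micro-batches on a time interval rather than a record count, and add scheduling, buffering and fault tolerance on top.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical batch sizes, chosen only to contrast the ingestion modes.
# None means "take everything that has accumulated" (classic batch load).
BATCH_SIZE: Dict[str, Optional[int]] = {
    "real-time": 1,    # each record is processed as soon as it arrives
    "micro-batch": 3,  # small groups, processed frequently
    "macro-batch": 6,  # larger groups, processed less often
    "batch": None,     # one large load of everything collected
}


def chunks(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records, preserving order."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def ingest(records: Iterable[T], mode: str) -> List[List[T]]:
    """Group incoming records according to the chosen ingestion mode."""
    items = list(records)
    # For "batch", use one chunk holding all items (at least size 1 so an
    # empty input simply produces no chunks).
    size = BATCH_SIZE[mode] or max(len(items), 1)
    return list(chunks(items, size))


if __name__ == "__main__":
    events = list(range(10))
    for mode in BATCH_SIZE:
        print(mode, [len(b) for b in ingest(events, mode)])
```

Run over ten events, this prints the batch sizes per mode, for example `[3, 3, 3, 1]` for micro-batch and `[10]` for batch; the records themselves arrive in the same order in every mode, only the grouping changes.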
A Data Lake architecture is composed of three layers and three tiers. The layers provide horizontal functionality that cuts across all the tiers, while the tiers provide vertical functionality.
The three layers are:
- Information Lifecycle Management Layer
  - It ensures that there are rules governing what we can and cannot store.
  - Over time, the value of data tends to decrease while the risk associated with storing it increases.
- Metadata Layer
  - It captures vital information about the data; in short, it is data about data.
  - It is foundational to making data more accessible and to extracting value from the Data Lake.
  - It supports the following: patterns and trends, identification, data lineage, stewardship, data versioning, entities and attributes, distributions, and quality.
- Data Governance and Security Layer
  - It assigns responsibility for governing appropriate data access and the right to modify the data.
  - It ensures a documented process for change tracking and change data capture.
  - It provides access control and authentication.
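To make the three layers a little more concrete, here is a toy Python sketch with one small piece per layer: a lifecycle rule for what may be stored and when data expires, a metadata record carrying stewardship, versioning and lineage, and a role check plus audit log for governance. Everything in it (the classification labels, the 365-day retention period, the role names, `DatasetMetadata`, `modify`) is hypothetical and chosen only for illustration; the change log here is simple change tracking, not full change data capture, and a real Data Lake would typically rely on dedicated catalog, lifecycle and access-control services rather than in-process dictionaries.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple

# --- Information Lifecycle Management layer (hypothetical rules) ---
ALLOWED_CLASSIFICATIONS: Set[str] = {"public", "internal"}  # e.g. "restricted" is refused
RETENTION = timedelta(days=365)                              # assumed retention period


def may_store(classification: str) -> bool:
    """Rule governing what we can and cannot store in the lake."""
    return classification in ALLOWED_CLASSIFICATIONS


def is_expired(created: date, today: date) -> bool:
    """True once data outlives the retention period and becomes a
    candidate for archival or deletion (value down, risk up)."""
    return today - created > RETENTION


# --- Metadata layer: data about data ---
@dataclass
class DatasetMetadata:
    name: str
    steward: str                                      # who is accountable for the data
    created: date
    version: int = 1                                  # data versioning
    lineage: List[str] = field(default_factory=list)  # upstream datasets


# --- Data Governance and Security layer ---
PERMISSIONS: Dict[str, Set[str]] = {  # hypothetical role -> allowed actions
    "analyst": {"read"},
    "engineer": {"read", "write"},
}
# Simple audit trail: (date, user, dataset, new version)
change_log: List[Tuple[date, str, str, int]] = []


def modify(meta: DatasetMetadata, user: str, role: str, today: date) -> None:
    """Bump the dataset version if the role may write, and record the change."""
    if "write" not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{user} ({role}) may not modify {meta.name}")
    meta.version += 1
    change_log.append((today, user, meta.name, meta.version))


if __name__ == "__main__":
    today = date(2024, 1, 1)
    sales = DatasetMetadata("sales_clean", steward="finance",
                            created=date(2022, 1, 1), lineage=["sales_raw"])
    print(may_store("restricted"))           # False
    print(is_expired(sales.created, today))  # True: 730 days > 365
    modify(sales, "dana", "engineer", today)
    print(sales.version, change_log)
    try:
        modify(sales, "lee", "analyst", today)
    except PermissionError as err:
        print(err)
```

The point of the sketch is the separation of concerns: each layer answers a different question (may we keep this, what is this, who may change it) about the same dataset, which is why the layers cut across every tier rather than belonging to one.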