Many organizations are setting up Data Lake initiatives. The duration and cost of these initiatives put them within reach of large corporations only. Indeed, they should be regarded as programs rather than projects.
Beyond the financial aspects, Data Lakes raise three structural issues:
- Firstly, Data Lakes, which allow data to be stored in its original format regardless of volume, often resemble gigantic warehouses from which value can hardly be extracted. In most cases, data is neither prioritized nor enriched. Yet data is only useful if it helps us make the right decision at the right time.
- The second problem is the latency these systems introduce. The distance separating the centralized data from its sources or from its consumers, together with poor management of updates (which often results in the data being overwritten in full), makes this data neither easy to update nor readily available, and therefore incompatible with operational requirements.
- Lastly, data security management is rarely a fundamental part of Data Lake designs, which creates a risk of data leakage. Feeding a Data Lake with data that should be protected by a rigorous security policy can lead to unlawful use of that data, for example in breach of the GDPR.
Implementing a Data Lake initiative therefore requires the capacity to address these issues, which are critical for any organization.