Overview
A data lakehouse combines the data management and transaction features of data warehouses with the flexibility and cost-efficiency of data lakes, enabling business intelligence and machine learning on a single open data platform (Armbrust, 2021). This approach aligns with the currently popular "data mesh" concept (Dehghani, 2019), which emphasizes decentralized teams taking ownership of specific data products rather than relying on a centralized data team.
The decentralized architecture of a data lakehouse increases the importance of effective data governance. However, data lakes process the metadata required for data governance slowly compared to databases. The data governance layer must therefore be designed for efficient metadata management to compensate for this limited query performance (Jain, 2023).
We aim to investigate how such a data governance layer should be designed.