Lakehouse

Yik San Chan, datanotes

This summary is based on the Lakehouse paper by Databricks. Many thanks to Ruben Berenguel, whose write-up helped me sort it out.

Data warehousing has evolved a lot, as shown in the following figure copied from the paper.

[Figure: evolution of data platform architectures, from the Lakehouse paper]

First Generation: Data Warehouses

Pros:

Cons:

Second Generation: two-tier lake + warehouse architecture

Data first lands in the lake and is then synced to warehouses.
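To make the two-tier flow concrete, here is a minimal sketch, not how any real system does it. It assumes a local temp directory stands in for the lake and an in-memory SQLite database stands in for the warehouse. The names `land_in_lake`, `sync_to_warehouse`, and the `events` table are made up for illustration.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def land_in_lake(lake_dir: Path, name: str, records: List[Dict]) -> Path:
    """Tier 1: write raw records to the lake as a JSON-lines file."""
    path = lake_dir / name
    with path.open("w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def sync_to_warehouse(lake_dir: Path, conn: sqlite3.Connection) -> int:
    """Tier 2: copy every lake file into a typed warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, amount REAL)"
    )
    loaded = 0
    for path in sorted(lake_dir.glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                record = json.loads(line)
                conn.execute(
                    "INSERT INTO events (user_id, amount) VALUES (?, ?)",
                    (record["user_id"], record["amount"]),
                )
                loaded += 1
    conn.commit()
    return loaded


def run_demo() -> Tuple[int, float]:
    """Land two files in the lake, sync them, and query the warehouse."""
    with tempfile.TemporaryDirectory() as tmp:
        lake_dir = Path(tmp)
        land_in_lake(
            lake_dir,
            "day1.jsonl",
            [{"user_id": 1, "amount": 10.0}, {"user_id": 2, "amount": 5.5}],
        )
        land_in_lake(lake_dir, "day2.jsonl", [{"user_id": 1, "amount": 4.5}])
        conn = sqlite3.connect(":memory:")
        loaded = sync_to_warehouse(lake_dir, conn)
        total = conn.execute("SELECT SUM(amount) FROM events").fetchone()[0]
        conn.close()
        return loaded, total
```

Note that the warehouse only sees what the last sync copied over, and every row now lives in two places. The sync step here is also naive: running it twice would load the same rows again.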

Pros:

Cons:

Third Generation: Lakehouse architecture

Best of both worlds! Here's how:
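One ingredient the paper describes is a transactional metadata layer sitting on top of plain files in the lake. Below is a toy sketch of that idea, assuming a local directory instead of an object store. The `ToyTable` class, the `_log` directory, and the `{"add": ...}` entry format are all made up for illustration and are not Delta Lake's actual format.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """Toy table: data files plus an append-only log of commits."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.log_dir = root / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _versions(self) -> List[int]:
        return sorted(int(p.stem) for p in self.log_dir.glob("*.json"))

    def commit(self, rows: List[Dict]) -> int:
        """Write a data file, then publish it by adding one log entry."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        data_file = f"part-{version}.jsonl"
        (self.root / data_file).write_text(
            "".join(json.dumps(row) + "\n" for row in rows)
        )
        # Mode "x" fails if the entry already exists, so two writers
        # cannot both claim the same version on a local filesystem.
        with (self.log_dir / f"{version:08d}.json").open("x") as f:
            json.dump({"add": data_file}, f)
        return version

    def read(self, as_of: Optional[int] = None) -> List[Dict]:
        """Read only files named in the log, optionally up to a version."""
        rows: List[Dict] = []
        for version in self._versions():
            if as_of is not None and version > as_of:
                break
            entry = json.loads(
                (self.log_dir / f"{version:08d}.json").read_text()
            )
            for line in (self.root / entry["add"]).read_text().splitlines():
                rows.append(json.loads(line))
        return rows


def run_demo() -> Tuple[List[Dict], List[Dict]]:
    """Commit twice, drop in an uncommitted file, then read two versions."""
    with tempfile.TemporaryDirectory() as tmp:
        table = ToyTable(Path(tmp))
        table.commit([{"id": 1}])
        table.commit([{"id": 2}, {"id": 3}])
        # A file that was written but never committed stays invisible.
        (Path(tmp) / "part-orphan.jsonl").write_text('{"id": 99}\n')
        return table.read(), table.read(as_of=0)
```

Readers go through the log rather than listing the directory, so a half-finished write never shows up, and reading as of an older version comes almost for free.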

Conclusion

Note that Lakehouse is more of a specification, while Delta Lake is an implementation by Databricks. Go check it out!

