♯ Gradual Onboard

## Metadata

**Kind**:: #paralet
**PARA**:: [[4 Archive]]
**Status**:: #x
**Zettel**:: #zettel/fleeting
**Created**:: [[2026-06-23]]

## Learning

- [x] Go through Architecting an Apache Iceberg Lakehouse [manning.com](https://www.manning.com/books/architecting-an-apache-iceberg-lakehouse)
  - [x] Read Chapter 7 Implementing the catalog layer
  - [x] Read Chapter 8 Designing the federation layer
  - [x] Read 11.1 Orchestrating the lakehouse
- [x] Read Crack Any Codebase with AI - How to use AI to understand legacy code.
- [x] Study [Project Nessie](https://projectnessie.org/), a Git-inspired data lake catalog.
- [x] Go through [[♯ Kleppmann - Designing data-intensive applications|DDIA, 2nd edition]]
  - [x] Read Chapter 5 Encoding and Evolution

### Products to Try

- [x] Dremio (Lakehouse)
- [x] Nessie (Catalog, [https://projectnessie.org/](https://projectnessie.org/))

## Reading Legacy Code

```
Write me a learning note for this codebase. Simple markdown I can keep editing.
Be aggressively short. Assume I know Python but not machine learning.
Two sections: "Concepts I need to know" (max 8 items, plain English, where each
shows up in the code) and "The important files" (max 8 items, what each does,
plus a dependency chain).
```

## Team

- USA: Data engineering, Integration
- China: Business

## People

- Josh, community
- Lam, Allan's wife, designer

## Areas

**Stage**: Features ready, Expansion

### Data Analysis

- User 360
- Insights
- Data enrichment
- Data valuable for clients
- BI for clients, business needs
- Data uniformity
- Product feedback, requirements
- ETL: efficient, bespoke pipelines

### Troubles

- Meeting reliability
- Security
- Redundancy and legacy code

## Tech Stack

- ClickHouse
  - It has added native pipe support for PostgreSQL and MongoDB
- MongoDB via cloud service
- Business-heavy frontend