I've always been curious about the internals of databases, so I decided to build my own Log-Structured Merge-Tree (LSM-Tree) engine in Go to understand the "magic" behind write-optimized storage.
Go-LSM is a persistent Key-Value engine that manages the full lifecycle of data from volatile RAM to immutable disk storage.
TECHNICAL HIGHLIGHTS:
1. THE DURABILITY LAYER (WAL)
To ensure zero data loss, I implemented an append-only Write-Ahead Log.
• Custom binary protocol: [Type][KeyLen][ValLen][Key][Value]
• File.Sync() to force kernel flushes to physical hardware
• Ensures absolute durability on system crashes
2. SKIPLIST MEMTABLE
Instead of a standard tree, I used a SkipList for the in-memory layer. • Provides O(log n) search and insertion without the rebalancing complexity of Red-Black trees
• Keeps keys lexicographically sorted for the eventual SSTable flush
• Enables efficient memory-to-disk transitions
3. SSTABLE FOOTER-BASED INDEXING
My SSTables are binary-searchable on disk using a tail-first reading strategy: • Jump to the last 8 bytes of a file to find the Index Block pointer
• Avoid full file scans by reading directly to the index
• Execute binary search on sorted keys for O(log n) disk lookups
4. MAINTENANCE LAYER: COMPACTION
Built a K-Way Merge compaction engine that handles performance optimization: • Processes multiple SSTable layers and merges them into single, optimized files
• Handles "Read Amplification" by reducing the number of files to check per query
• Processes "Tombstones" to finalize deletions and reclaim disk space
5. TOOLING & DEBUGGING
Custom binary parsers transform raw binary files into human-readable tables: • lsm-dump: View sorted SSTable contents
• lsm-wal-dump: Inspect unflushed Write-Ahead Log entries
• Enables deep inspection of internal storage layers
KEY ENGINEERING LESSONS:
Moving from standard application development to systems programming required a fundamental shift in how I think about memory and I/O: • ENDIANNESS LOGIC: Handling Big-Endian vs. Little-Endian conversions for cross-platform compatibility
• FILE OFFSET MANAGEMENT: Manually managing byte offsets and file pointer positioning
• CONCURRENCY & THREAD SAFETY: Implementing thread-safe mechanisms for MemTable flushing
• BINARY PROTOCOL DESIGN: Creating efficient, compact data encodings for durability
REPOSITORY:
https://github.com/Jyotishmoy12/LSM-Tree-in-Golang