Skip to the content.

Log structured storage

Solution 1 (Simply append)

Solution 1 Analysis

Solution 1 Improve on read

Hash Index: Maintain a in-memory hash table. Key is the index key and Value is the offset of the value in file. So each time we want to read the data, we could get the file offset from hash table and find the starting offset where the latest data is stored.

On read, get the offset from the hash table then read the data starting from the offset until \n

On write, append new record into log file, update the hash table to reflect the latest offset. So there will be two operations: update in-memory hash table and append to log file

Solution 1 Improve on write

On write, append new record into current log file (update current hash table) or into new log file (create a new hash table). Each log segement has its own in-memory hash table, because new records are appended into log file.

On read, search in the most recent in-memory hash table first and then the second most recent until find the key.

Compacting and merging log segements are performed in the background on previous log files and in-memory hash tables. Because current ones are still serving the write operation.

why we need compaction and merging

data-file-compaction

Compaction means throwing away duplicate keys in the log, and keeping only the most recent update for each key.

data-file-merging

since compaction often makes segments much smaller (assuming that a key is overwritten several times on average within one segment), we can also merge several segments together at the same time as performing the compaction,

Solution 1 Crash recovery

in-memory hash table, it could be either rebuilt on system reboot (takes long time if log file is large) or restoring the hash table from snapshot.

Crash could happend in between of the disk IO. File system crash recovery solution could help: https://web.stanford.edu/~ouster/cgi-bin/cs140-winter13/lecture.php?topic=recovery

Remaining issues

Solution 2 (SSTables and LSM-Trees)

Solution 2 Analysis

Solution 2 On write

Solution 2 On read

Solution 2 Improve on read

If the key we are looking for does not exist, we have to check the current in-memory red-black tree first and then scan the SSTable on disk from most recent to oldest. Storage engines usually use Bloom filters (wiki) to improve the performance.

Solution 2 Improve on compaction

Solution 2 Crash recovery

It is possible that system crashes before writing in-memory red-black or AVL tree onto disk as the SSTable. We could have a write ahead logging like journaling file system to track the changes before writing data into the in-memory red-black or AVL tree.