# Fabian Reinartz - Writing a Time Series Database From Scratch (Highlights)

## Metadata
**Cover**:: https://readwise-assets.s3.amazonaws.com/static/images/article3.5c705a01b476.png
**Source**:: #from/readwise
**Zettel**:: #zettel/fleeting
**Status**:: #x
**Authors**:: [[Fabian Reinartz]]
**Full Title**:: Writing a Time Series Database From Scratch
**Category**:: #articles #readwise/articles
**Category Icon**:: 📰
**Document Tags**:: #database #performance
**URL**:: [fabxc.org](https://fabxc.org/tsdb/)
**Host**:: [[fabxc.org]]
**Highlighted**:: [[2021-02-21]]
**Created**:: [[2022-09-26]]
## Highlights
- Each data point is a tuple of a timestamp and a value.
- the timestamp is an integer and the value any number.
- the blog series “Coding for SSDs” series is a an excellent resource
#further-reading #rl
[“Coding for SSDs” series](http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/)
- the only batches we get are vertical sets of data points across series.
- ideally, samples for the same series would be stored sequentially so we can just scan through them with as few reads as possible.
- There’s obviously a strong tension between the ideal pattern for writing collected data to disk and the layout that would be significantly more efficient for serving queries.
- We create one file per time series that contains all of its samples in sequential order.
- we batch up 1KiB chunks of samples for a series in memory and append those chunks to the individual files
- It also enables incredibly efficient compression formats, based on the property that a given sample changes only very little with respect to the previous sample in the same series
- Facebook’s paper on their Gorilla TSDB describes a similar chunk-based approach and introduces a compression format that reduces 16 byte samples to an average of 1.37 bytes.
#further-reading #rl
http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
- keeping a separate file for each series is troubling the V2 storage for various reasons:
- We actually need a lot more files than the number of time series we are currently collecting data for.
- several thousands of chunks per second are completed and ready to be persisted.
- It’s infeasible to keep all files open for reads and writes.
- Eventually, old data has to be deleted and data needs to be removed from the front of millions of files.
- Chunks that are currently accumulating are only held in memory.
- If the application crashes, data will be lost.
- In the Prometheus context, we use the term series churn to describe that a set of time series becomes inactive
- The current V2 storage of Prometheus has an index based on LevelDB for all series that are currently stored.
- lacks a scalable way to combine results from different label selections.
- We partition our horizontal dimension, i.e. the time space, into non-overlapping blocks. Each block acts as a fully independent database containing all time series data for its time window.
- Every block of data is immutable.
- all new data is written to an in-memory database
- To prevent data loss, all incoming data is also written to a temporary write ahead log,
- This layout allows us to fan out queries to all blocks relevant to the queried time range.
- When completing a block, we can persist the data from our in-memory database by sequentially writing just a handful of larger files.
- recent chunks, which are queried most, are always hot in memory.
- we are also no longer bound to the fixed 1KiB chunk size to better align data on disk.
- Deleting old data becomes extremely cheap and instantaneous.
- meta.json file. It simply holds human-readable information about the block
- a handful of larger allows us to keep all files open with little overhead. This unblocks the usage of mmap(2)
- keeping each block reasonably short
- avoid accumulating too much data in memory
- When querying multiple blocks, we have to merge their results into an overall result.
- Compaction describes the process of taking one or more blocks of data and writing them into a, potentially larger, block.
- we need a more capable inverted index.