Wouldn’t it be amazing if we didn’t have to waste so much precious space on our expensive and sensitive SSDs to run an Ethereum node, and could rather move at least some of the data onto a cheap and durable HDD?
With the v1.9.0 release, Geth separated its database into two parts (done by Péter Szilágyi, Martin Holst Swende and Gary Rong):
- Recent blocks, all state and accelerations structures are kept in a fast key-value store (LevelDB) as until now. This is meant to be run on top of an SSD as both disk IO performance is crucial.
- Blocks and receipts that are older than a cutoff threshold (3 epochs) are moved out of LevelDB into a custom freezer database, that is backed by a handful of append-only flat files. Since the node rarely needs to read these data, and only ever appends to them, an HDD should be more than suitable to cover it.
A fresh fast sync at block 7.77M placed 79GB of data into the freezer and 60GB of data into LevelDB.
By default Geth will place your freezer inside your
chaindata folder, into the
ancient subfolder. The reason for using a sub-folder was to avoid breaking any automated tooling that might be moving the database around or across instances. You can explicitly place the freezer in a different location via the
--datadir.ancient CLI flag.
When you update to v1.9.0 from an older version, Geth will automatically being migrating blocks and receipts from the LevelDB database into the freezer. If you haven’t specified
--datadir.ancient at that time, but would like to move it later, you will need to copy the existing
ancient folder manually and then start Geth with
--datadir.ancient set to the correct path.
Since the freezer (cold data) is stored separately from the state (hot data), an interesting question is what happens if one of the two databases goes missing?
- If the freezer is deleted (or a wrong path specified), you essentially pull the rug from underneath Geth. The node would become unusable, so it explicitly forbids doing this on startup.
- If, however, the state database is the one delete, Geth will reconstruct all its indices based on the frozen data; and then do a fast sync on top to back-fill the missing state.
Essentially, the freezer can be used as a guerrilla state pruner to periodically get rid of accumulated junk. By removing the state database, but not the freezer, the node will do a fast sync to fetch the latest state, but will reuse all the existing block and receipt data already downloaded previously.
You can trigger this via
geth removedb (plus the
--datadir.ancient flags if you used custom ones); asking it to only remove the state database, but not the ancient database.
Be advised, that reindexing all the transactions from the ancient database can take over an hour, and fast sync will only commence afterwards. This will probably be changed into a background process in the near future.