Big picture: SQLite itself is not “randomly broken”; the daemon is creating too many concurrent write attempts against a database that can only have one writer at a time.
What the debug page is showing:
Only one SQLite writer can run
Every WithTx uses BEGIN IMMEDIATE. That means: “I want the write lock now.”
If another writer has it, SQLite waits up to ~10s.
After that, it returns SQLITE_BUSY.
Many callers are trying to write at once
Main write sources:
blob.(*Index).PutMany-range1: indexing downloaded blobs
syncing.(*Server).loadStore: sync/discovery SQL work
hmnet.(*Node).connect: peer connection bookkeeping
hmnet.(*Node).onLibp2pIdentification / peerWriter: peer table updates
blob.(*DomainStore).*: domain cache updates
Some writes are heavy
PutMany and sync/indexing work can hold the writer for hundreds of ms. No single hold comes close to 10s, but during sync bursts it happens repeatedly.
Some writes are tiny but numerous
hmnet.(*Node).connect writes only a small peer row update. But there can be many concurrent connects. Each one waits in SQLite for the writer lock. If the queue is long enough, they time out at 10s.
So connect is mostly a victim, not the original hog
Debug shows:
connect hold time is tiny
connect wait time is ~10s
therefore it is waiting behind other writers
but because there are many connect attempts, it also amplifies the storm
The existing fixes help, but don’t fully solve fairness
Existing fixes reduce some heavy holders and batch identify-event peer writes. But direct connect() still does its own synchronous WithTx, and the system still lets many goroutines race into BEGIN IMMEDIATE.
Global diagnosis:
sync/indexing writes hold SQLite writer
↓
many peer/domain writes queue behind them
↓
SQLite busy handler waits up to 10s per goroutine
↓
some callers hit SQLITE_BUSY
↓
connect/domain bookkeeping creates more noise and contention
Best conceptual fix:
Treat SQLite writes as a single-lane road.
Don’t let every goroutine independently race for the lane.
Put low-priority/best-effort writes behind a Go-level queue/batcher.
Make heavy writers release fairly between batches.
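One way a heavy writer could release the lane between batches is to split its work into several short transactions. The sketch below assumes a WithTx-like helper with a simplified signature (the real one is not shown here). putInChunks and withTxFunc are hypothetical names.

```go
package main

// withTxFunc is a stand-in for the daemon's WithTx helper. The real
// signature is an assumption here; it likely also takes a context or
// a connection.
type withTxFunc func(fn func() error) error

// putInChunks writes items in chunks of at most chunkSize, using one
// transaction per chunk. The write lock is released after every chunk,
// which gives queued writers a chance to run in between.
func putInChunks[T any](withTx withTxFunc, items []T, chunkSize int, put func(T) error) error {
	if chunkSize < 1 {
		chunkSize = 1
	}
	for start := 0; start < len(items); start += chunkSize {
		end := start + chunkSize
		if end > len(items) {
			end = len(items)
		}
		chunk := items[start:end]
		err := withTx(func() error {
			for _, item := range chunk {
				if err := put(item); err != nil {
					return err
				}
			}
			return nil
		})
		if err != nil {
			return err // earlier chunks stay committed
		}
	}
	return nil
}
```

The trade-off is atomicity: a failure midway leaves earlier chunks committed, so this only fits work that can be retried or resumed safely.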
Immediate concrete fix:
Move the hmnet.(*Node).connect peer-row update into peerWriter, same as identify events.
Durable fix:
Add app-level writer admission/fairness around WithTx, especially for PutMany, peer writes, and domain writes.