The Problem
Since tag 2026.4.1, the DomainStore feature (commit 34eda1a19) polls /hm/api/config on every tracked domain every 30 minutes, regardless of whether the data has changed, whether the domain is reachable, or whether anyone is using it. This creates several compounding problems:
Server overload
Every node in the network independently polls every domain it has ever encountered. A popular server gets N requests every 30 minutes, where N is the number of nodes that have ever synced its content. As the network grows, this multiplies into sustained load that scales with total network size rather than actual usage.
Wasted bandwidth
The /hm/api/config endpoint returned no HTTP caching headers (no ETag, no Cache-Control). Every poll transferred the full JSON response even when nothing changed. The config data (PeerID, addresses, account UID) rarely changes -- typically only when a server restarts or moves.
Retry amplification
Unreachable domains were retried 3 times every 30 minutes, forever, with no backoff. A dead domain generated 6 failed HTTP requests per node per hour in perpetuity.
Unbounded growth
Domains are auto-tracked whenever the daemon syncs content that has a siteUrl in its metadata. They are never removed. Over time the polling list grows monotonically -- a node that has synced from 200 sites will poll 200 endpoints every 30 minutes even if the user only actively uses 5 of them.
Evidence this was too aggressive
The DomainStore already needed three emergency follow-up fixes after launch:
Connection pool starvation (too many concurrent checks exhausted SQLite connections)
Goroutine deduplication (duplicate checks for the same domain)
Shutdown handling (background goroutines outliving the process)
These are symptoms of a fundamentally too-aggressive polling design.
The Solution
Replace "poll everything on a timer" with a layered caching strategy that minimizes network traffic while keeping data fresh for actively-used domains.
1. Server-side HTTP caching (ETag + Cache-Control)
File: backend/hmnet/http_hm_api_config.go
The /hm/api/config handler now:
Serializes the response to bytes and computes a SHA-256 ETag
Checks If-None-Match request header -- returns 304 Not Modified if the ETag matches
Sets Cache-Control: public, max-age=300 (5 minutes)
This eliminates >95% of response body transfers since config data rarely changes.
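The pattern can be sketched as follows. This is an illustrative stand-in for the real handler, not the actual http_hm_api_config.go code: the payload, handler name, and `computeETag` helper are placeholders.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// computeETag derives a strong ETag from the serialized response bytes.
func computeETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `"` + hex.EncodeToString(sum[:]) + `"` // quoted, per RFC 9110
}

// configHandler is a hypothetical stand-in for the /hm/api/config handler:
// serialize once, compare If-None-Match, answer 304 when nothing changed.
func configHandler(w http.ResponseWriter, r *http.Request) {
	body := []byte(`{"peerId":"placeholder","addresses":[]}`) // placeholder payload

	etag := computeETag(body)
	w.Header().Set("ETag", etag)
	w.Header().Set("Cache-Control", "public, max-age=300") // 5 minutes

	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified) // 304: skip the body entirely
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(configHandler))
	defer srv.Close()

	// First fetch: full 200 response carrying the ETag.
	resp1, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	resp1.Body.Close()

	// Replay the ETag: the server answers 304 with no body.
	req, _ := http.NewRequest("GET", srv.URL, nil)
	req.Header.Set("If-None-Match", resp1.Header.Get("ETag"))
	resp2, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp2.Body.Close()

	fmt.Println(resp1.StatusCode, resp2.StatusCode) // prints "200 304"
}
```

Hashing the serialized bytes makes the ETag strong and content-derived, so any change to the config (new address, new PeerID) invalidates it automatically.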
2. Poll only recently-used domains
File: backend/blob/domain_store.go
Instead of checking ALL tracked domains every 30 minutes, we now:
Track last_accessed timestamp (touch-on-read) for every domain lookup
Only poll domains accessed in the last 24 hours
Poll every 2 hours instead of 30 minutes
A node with 200 tracked domains but 5 recently used drops from 200 checks per 30 minutes (400/hour) to 5 checks per 2 hours (2.5/hour) -- a 160x reduction.
3. Client-side conditional requests (If-None-Match)
File: backend/blob/site_peer_resolver.go
The sitePeerResolver now:
Caches ETags per siteURL in an in-memory LRU
Sends If-None-Match with the cached ETag on every fetch
On 304: returns the cached config without deserializing a new response
Stores ETags persistently in the domains table for cross-restart survival
4. Adaptive backoff for failing domains
File: backend/blob/domain_store.go
Domains that fail to respond now use exponential backoff:
0 failures: 2h (normal interval)
1 failure: 4h
2 failures: 8h
3 failures: 16h
6+ failures: ~128h (~5 days, capped)
consecutive_failures resets to 0 on any successful check.
5. Domain eviction
File: backend/blob/domain_store.go
Domains not accessed in 30+ days are automatically deleted. Eviction runs once daily. This prevents the polling list from growing unbounded.
6. LRU/DomainStore coordination
Files: backend/blob/domain_store.go, backend/blob/site_peer_resolver.go
Two improvements to how the in-memory LRU cache and the persistent DomainStore work together:
Write-through: CheckDomain now populates the LRU cache after a successful fetch, so subsequent getConfig() calls get an instant hit.
Read-through: On LRU miss, getConfig() now checks the DomainStore before hitting the network. This means after a daemon restart, the first request for a recently-used domain resolves from the persistent cache instead of making a network call.
Resolution order: LRU (fast) -> DomainStore (warm) -> Network (slow) -> DomainStore fallback (offline).
7. Frontend silent replacement on domain change
Files: frontend/apps/desktop/src/app-grpc.ts, frontend/apps/desktop/src/components/editor.tsx, frontend/apps/desktop/src/utils/window-events.ts
When a background domain check detects that a domain now points to a different account UID:
The main process broadcasts a domainIdChanged event to all windows
Each open editor scans its ProseMirror document for hm://oldUid/... references
Link marks, embed blocks, and inline-embeds are silently updated to hm://newUid/...
Domain changes are rare, so the O(n) document scan is acceptable.
8. Web app domain resolver
Files: frontend/apps/web/app/routes/hm.api.resolve-domain.tsx, frontend/apps/web/app/domain-resolver.client.ts, frontend/apps/web/app/web-resource-page.tsx, frontend/packages/ui/src/resource-page-common.tsx
The web app's domainResolver (previously created but unused) is now wired in:
A new /hm/api/resolve-domain API route proxies domain resolution from the browser to the daemon
A client-side webDomainResolver function calls this route
The CollaboratorsPage now receives it, enabling users to add collaborators by domain name
Why This Approach
Why not push notifications / WebSockets?
Servers actively notifying clients when config changes would eliminate polling entirely. But it requires a new protocol, persistent connection management, firewall traversal, and fundamentally changes the architecture. The config data changes so rarely (maybe once a month per server) that caching + conditional requests achieve the same effect with zero new infrastructure.
Why not DNS-based service discovery?
Publishing peer info via DNS TXT/SRV records would let DNS TTL handle caching naturally. But it requires DNS infrastructure changes, zone file management, and doesn't work for sites behind CDNs or shared hosting. HTTP caching is universally supported.
Why not libp2p peer events?
When a peer reconnects, its addresses are fresh in the peerstore. We could update the DomainStore from peer events instead of HTTP polling. This would require cross-package wiring (hmnet -> blob) and a peerID -> domain reverse index. Valuable but complex -- good candidate for a follow-up.
Why not remove the DomainStore entirely?
The persistent cache is valuable for offline resolution and fast cold starts. The problem was the polling strategy, not the cache itself. After this refactor, the DomainStore is a net positive -- it reduces network traffic rather than creating it.
Why not server-side rate limiting?
The fix belongs on the client side (poll less, cache more). Server-side rate limiting would break legitimate first-time resolutions and would need to be deployed across all servers in the network. Client-side efficiency is the right lever.
Impact Summary
For a typical node with 100 tracked domains and 5 actively used: requests drop from 200/hour to ~2.5/hour, and the vast majority of those return 304 with no body.
DB Schema Change
Single migration (2026-04-16.010000) adding three columns to domains:
ALTER TABLE domains ADD COLUMN last_accessed INTEGER;
ALTER TABLE domains ADD COLUMN last_etag TEXT;
ALTER TABLE domains ADD COLUMN consecutive_failures INTEGER NOT NULL DEFAULT 0;

All columns are nullable or defaulted, so existing rows are unaffected.