LLM Wiki
Problem
A Seed space accumulates documents over months -- meeting notes, specs, discussion threads, imported articles, personal
notes. But the knowledge stays fragmented. There's no layer connecting them. You wrote about a decision in three
different places over two months and you don't even remember the first one. Two sources contradict each other and nobody
noticed because nobody re-read both.
The existing chat assistant can search and read documents on demand, but every conversation starts cold. It pieces
together fragments from raw documents the same way a RAG system does -- re-deriving context from scratch on every query.
Ask it a subtle question that spans five documents and it has to find and synthesize all five every time. Nothing
accumulates. Nothing is built up.
Users don't have time to manually maintain cross-references, and nobody ever will. The linking problem isn't laziness --
it's that the maintenance work scales faster than the content itself. Twenty documents need maybe ten cross-references.
Two hundred documents need thousands. No human keeps up with that.
What we need is a persistent, compounding knowledge layer that sits between the user and their raw documents. One that
the LLM builds and maintains, not the user. The user sources, explores, and asks questions. The LLM does the grunt work:
summarizing, cross-referencing, filing, flagging contradictions, keeping the map current as the territory changes.
Solution
Wiki document tree
The wiki is a set of regular Seed documents under a /_wiki/ path prefix in each space. No new document types, no new storage layer. Every wiki page is created via the existing CreateDocumentChange gRPC endpoint and flows through normal blob indexing -- FTS, embeddings, resource_links, everything.
The tree looks like this:
/_wiki/ # Index page: auto-generated table of contents, recent changes
/_wiki/entities/ # Directory listing all entity pages
/_wiki/entities/person-name # An entity page
/_wiki/entities/project-alpha # Another entity page
/_wiki/topics/ # Directory listing all topic pages
/_wiki/topics/architecture # A topic page
/_wiki/topics/auth-migration # Another topic page
/_wiki/contradictions # Contradiction log
/_wiki/_config # Wiki instruction document
Each wiki document carries metadata attributes (set via SetAttribute change operations on the document root):
wiki:type -- one of "entity", "topic", "index", "contradiction-log", "config"
wiki:sources -- JSON array of source document IRIs that contributed to this page
wiki:source-versions -- JSON object mapping source IRIs to the version string that was last processed
wiki:last-processed -- ISO 8601 timestamp of the last LLM processing run
wiki:human-curated -- boolean, flipped to true when a human (non-bot) author edits the page
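The attribute list above can be sketched as a typed structure. A minimal TypeScript sketch, assuming document attributes arrive as a flat string-keyed record (the attribute names come from this proposal; the record shape and parser are illustrative assumptions, not the actual Seed API):

```typescript
// Typed view of the wiki:* metadata attributes described above.
type WikiType = "entity" | "topic" | "index" | "contradiction-log" | "config";

interface WikiMeta {
  type: WikiType;
  sources: string[];                       // source document IRIs
  sourceVersions: Record<string, string>;  // IRI -> last processed version
  lastProcessed: string;                   // ISO 8601 timestamp
  humanCurated: boolean;                   // true after a non-bot edit
}

// Parse the raw attribute map a document root might carry into typed metadata.
// JSON-valued attributes (sources, source-versions) are stored as strings.
function parseWikiMeta(attrs: Record<string, string>): WikiMeta {
  return {
    type: attrs["wiki:type"] as WikiType,
    sources: JSON.parse(attrs["wiki:sources"] ?? "[]"),
    sourceVersions: JSON.parse(attrs["wiki:source-versions"] ?? "{}"),
    lastProcessed: attrs["wiki:last-processed"] ?? "",
    humanCurated: attrs["wiki:human-curated"] === "true",
  };
}
```

The JSON-in-string encoding keeps the attributes representable with plain SetAttribute operations while still giving the pipeline structured data to diff against.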
Entity pages
One page per notable person, project, concept, or thing that appears across multiple source documents. "Notable" means
it shows up in more than one source with enough context that a standalone page is useful. An entity mentioned once in
passing doesn't get its own page -- it gets a line in a topic page.
Content of an entity page: what the entity is, key facts extracted from sources, a timeline of mentions (chronologically
ordered by source document dates), and inline hm:// links back to every source document that discusses it. The entity
page is the single place you go to understand everything the space knows about that entity.
The LLM decides what qualifies as a notable entity. This is governed by the wiki instruction document
(/_wiki/_config), which can define rules like minimum mention count, types of entities to track (people, projects,
tools, decisions), and entities to explicitly include or exclude.
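Concretely, the notability rules in /_wiki/_config might read like this (every threshold, name, and list entry below is invented for illustration, not a shipped default):

```
## Notable entities
- Track: people, projects, tools, decisions.
- Minimum mentions: 2 distinct source documents.
- Always include: "Project Alpha" (even if mentioned once).
- Never include: calendar boilerplate, email signatures, generic tool names.
- An entity below the threshold gets a line in the nearest topic page instead.
```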
Topic pages
One page per recurring theme, decision area, or subject that spans multiple sources. Where entity pages are about nouns,
topic pages are about ideas. "How we handle authentication" is a topic. "The migration from REST to gRPC" is a topic.
Content: a synthesized overview of the theme, the key arguments or positions from different sources (attributed with
links), open questions that haven't been resolved, and links to relevant entity pages. Topic pages are where the LLM's
synthesis work is most visible -- it's not just listing facts, it's connecting them into a narrative.
Contradiction log
A single document at /_wiki/contradictions. Each entry is a section with:
The specific claims that are in tension (quoted or paraphrased from sources)
Links to the source documents that conflict
Links to the wiki pages affected by the contradiction
A timestamp for when the contradiction was detected
When the LLM processes a new source and finds that it says something different from what's already in the wiki, it does
two things: updates the relevant wiki page to reflect the tension (e.g., "Source A says X, but Source B says Y"), and
adds an entry to the contradiction log.
The contradiction log is a Seed document like any other. It shows up in search, it syncs via P2P, and its
:activity/citations view links back to the wiki pages and source documents involved. Humans can review contradictions
and resolve them -- by editing source documents, by editing wiki pages, or by adding a new source that clarifies the
conflict. When a contradiction is resolved, the LLM removes the entry on the next processing run (it re-evaluates the
sources and sees that the tension no longer exists).
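A single contradiction entry following the shape above might look like this (the claims, timestamp, and hm:// paths are all invented for illustration):

```
## 2024-03-12T09:40:00Z -- Launch date conflict

Claims in tension:
- "GA is targeted for June" (hm://space/planning-notes)
- "Launch has slipped to Q4" (hm://space/standup-2024-03-10)

Affected wiki pages:
- hm://space/_wiki/topics/launch-planning
- hm://space/_wiki/entities/project-alpha
```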
Cross-references
Wiki pages link to each other and to source documents using inline hm:// links -- standard Seed annotations with type=Link. This is the existing annotation system, nothing new.
Because Seed already indexes all links in the resource_links table and computes backlinks via the ListEntityMentions
gRPC (which powers the :activity/citations view), the cross-reference graph builds itself automatically. When the LLM
creates a wiki page about "Project Alpha" and links to three source documents and two other wiki pages, those links are
indexed during normal blob processing. Open any of those source documents, check :activity/citations, and you'll see
the "Project Alpha" wiki page listed as a citation.
This is bidirectional by default. Source documents cite wiki pages (because the wiki links to them). Wiki pages cite
each other (because topics reference entities and vice versa). The existing citation-based authority ranking in search
(applyAuthorityRanking in entities.go) means wiki pages with many inbound links rank higher in search results --
which is exactly right, because a well-connected wiki page is more useful than an isolated one.
No new cross-reference infrastructure is needed. The existing resource_links table, ListEntityMentions RPC, and
citation graph do all the work.
The wiki instruction document
The /_wiki/_config document is the brain of the wiki pipeline. It's not a system prompt string stuffed into an API
call. It's a comprehensive reference manual -- like the skill documents we use for agent workflows in this repo -- that
teaches the LLM everything it needs to know to build and maintain the wiki.
The pipeline reads /_wiki/_config and passes its full content as context to the LLM on every processing run. This
document encodes all the decisions that shape wiki behavior:
What the wiki is and what it's for in this particular space
The document tree structure and what goes where
How to identify notable entities vs. passing mentions (frequency thresholds, entity types to track, explicit
include/exclude lists)
How to structure entity pages vs. topic pages (section order, level of detail, what to include and what to skip)
When to create a new page vs. merge content into an existing one
How to detect and record contradictions (what counts as a contradiction vs. a nuance, how to phrase entries)
How to format cross-references using hm:// URLs and Seed's annotation system
What writing voice and style to use (a personal knowledge base might want casual, a team site might want structured
and formal)
How to handle updates to existing pages -- when to patch incrementally vs. rewrite a section
What to do when sources are deleted (remove dependent content, flag as unverified, or leave as-is)
What NOT to do (don't invent facts, don't remove content without justification, don't link to external URLs unless the
source does)
The Seed team ships a default version of this document. When the pipeline first runs on a space and /_wiki/_config
doesn't exist, it creates it with the default content. Users can then edit it -- it's a regular Seed document, so it's
versioned, synced, and editable by anyone with write capabilities on the space.
Different spaces can have radically different wiki behaviors just by editing their config. The Seed team can evolve the
default by updating the hardcoded fallback. Users who haven't customized their config get improvements automatically.
Users who have customized it keep their version.
The processing pipeline
The pipeline runs in the Electron main process as a background loop, same environment as the existing chat assistant. It
uses the same Vercel AI SDK infrastructure -- the same createProviderModel() factory, the same provider configuration,
the same API keys or ChatGPT login session. No Go daemon code for LLM calls. The daemon is involved only as a data
layer: gRPC calls to read documents, write changes, and query the index.
Where the chat uses streamText (because it needs to show tokens appearing in real time), the wiki pipeline would use
generateObject from the AI SDK -- it sends the LLM a prompt and gets back a typed, structured response describing
exactly what pages to create or update. No streaming to a UI needed.
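The structured response could be modeled as plain types plus a small ordering step. A sketch under assumptions: the field names below are illustrative, and in the real pipeline the schema would be declared with the AI SDK's schema support (e.g. zod) and passed to generateObject rather than hand-written interfaces:

```typescript
// Illustrative shape of the structured response the pipeline requests.
interface BlockContent {
  id: string;
  text: string;
}

interface WikiAction {
  kind: "create" | "update" | "log-contradiction";
  path: string; // e.g. "/_wiki/entities/project-alpha"
  blocks: BlockContent[];
}

// Apply creations first so new pages exist before updates link to them,
// then updates, then contradiction log entries that reference both.
function orderActions(actions: WikiAction[]): WikiAction[] {
  const rank = { create: 0, update: 1, "log-contradiction": 2 };
  return [...actions].sort((a, b) => rank[a.kind] - rank[b.kind]);
}
```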
The loop runs on a configurable interval (default maybe 5 minutes). Each cycle:
Read /_wiki/_config from the space via the read gRPC (or use the hardcoded default if the config document doesn't
exist yet).
List all documents in the space and compare against the wiki's processing state. The state is tracked in the wiki
pages themselves -- each wiki page's wiki:source-versions metadata records which version of each source was last
processed. A new document (not referenced in any wiki page's metadata) or an updated document (version mismatch) gets
queued for processing.
For each changed or new source document, read its full content via gRPC.
Read the current state of any wiki pages that reference this source (identified by their wiki:sources metadata).
Build the LLM request: the instruction document (/_wiki/_config) as system context, the source document content,
and the current wiki pages that might need updating. The LLM sees the full picture -- what the source says and what
the wiki currently says about it.
The LLM returns a structured response (via generateObject) specifying: which existing wiki pages to update (with
new block content), which new pages to create (with path, type, and content), which contradiction entries to add.
Apply changes via CreateDocumentChange gRPC calls for each affected wiki page. Each edit is a new Change blob that
flows through normal indexing -- FTS gets updated, embeddings get queued, resource_links are recomputed. The wiki
integrates with everything else automatically.
Update the wiki:source-versions metadata on each affected wiki page to record the version that was just processed.
The processing state lives in the wiki pages themselves rather than a separate tracking table. This is intentional --
the wiki pages are the source of truth for what's been processed. If a wiki page says it was built from version X of a
source, and the source is now at version Y, that page needs reprocessing. No separate state to get out of sync.
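The mismatch detection in step two reduces to one comparison per document. A minimal sketch, assuming the pipeline has merged every wiki page's wiki:source-versions into one map (the names and shapes are illustrative assumptions):

```typescript
interface SourceDoc {
  iri: string;
  version: string; // current version string from the space listing
}

// iri -> version last processed, merged from all wiki pages' metadata.
type ProcessedState = Record<string, string>;

// A source is queued when it is new (no recorded version) or changed
// (version mismatch). Wiki pages themselves are never treated as sources.
function findStaleSources(docs: SourceDoc[], state: ProcessedState): SourceDoc[] {
  return docs.filter(
    (d) => !d.iri.includes("/_wiki/") && state[d.iri] !== d.version
  );
}
```

Because the state is derived from the wiki pages on every cycle, a crash between writing a page and recording its version just means one redundant reprocess, never silent drift.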
Ownership and permissions
Wiki pages are owned by the space owner -- they're created with the space owner's signing key (or an AGENT capability
delegated to the wiki bot). The permission model is the same as any other Seed document:
Personal spaces: The user's LLM generates and maintains the wiki. Only they can write to it. Others who subscribe
to the space can read the wiki but not modify it (standard Seed behavior -- you need a write capability to push
changes).
Collaborative spaces: Anyone with write capabilities on the space can edit wiki pages manually. The wiki respects
the same permission boundaries as any other document under the space's path hierarchy.
The /_wiki/ path prefix is a convention, not a permission boundary. But you could issue path-scoped capabilities restricted to /_wiki/ if you wanted to give someone wiki-editing rights without access to the rest of the space. The existing capability model supports this -- no_recursive and is_exact flags control path scoping.
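A path-scope check along those lines could be sketched as below. The semantics assumed here -- is_exact grants only the exact path, no_recursive grants the path and its direct children but nothing deeper -- are an illustrative reading of the flag names, not of the actual Go implementation:

```typescript
interface Capability {
  path: string;       // e.g. "/_wiki/"
  isExact: boolean;   // grant applies to this exact path only
  noRecursive: boolean; // grant stops at direct children
}

function allowsWrite(cap: Capability, target: string): boolean {
  if (cap.isExact) return target === cap.path;
  if (!target.startsWith(cap.path)) return false;
  if (cap.noRecursive) {
    // Direct child only: no further "/" after the granted prefix.
    const rest = target.slice(cap.path.length);
    return !rest.replace(/\/$/, "").includes("/");
  }
  return true; // recursive grant covers the whole subtree
}
```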
Self-healing
Three mechanisms keep the wiki accurate over time:
Version tracking. Each wiki page records which version of each source document it was built from (in
wiki:source-versions metadata). When a source document gets updated -- a new Ref blob with different heads -- the
version changes. The next pipeline run detects the mismatch: the wiki page says it processed version A, but the source
is now at version B. The pipeline re-reads the source and sends it to the LLM along with the current wiki page, asking
it to update the page to reflect the new version. Only the changed content needs re-evaluation, not the entire wiki.
Deletion handling. When a source document is tombstoned (via CreateRef with a tombstone target, which is how Seed
handles deletion), the pipeline detects this on its next run. It finds all wiki pages that list the deleted source in
their wiki:sources metadata. For each affected page, it sends the current page content to the LLM with instructions to
remove or flag content that relied solely on the deleted source. If all sources for a page are deleted, the page itself
gets tombstoned.
Periodic regeneration. Incremental patching accumulates drift. The LLM keeps adding sentences, restructuring
paragraphs, and after twenty incremental updates a page can become incoherent -- repetitive, poorly organized,
internally contradictory. To prevent this, every N processing cycles (configurable via /_wiki/_config, maybe every 20
cycles), the pipeline fully regenerates a wiki page from all its current sources rather than patching incrementally. The
LLM reads all sources fresh and writes the page from scratch. This is more expensive (more tokens, more API calls) but
prevents quality decay. The regeneration schedule could be staggered -- regenerate a few pages per cycle rather than all
at once.
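One way to stagger the schedule is to hash each page path into a bucket so regeneration spreads evenly across cycles. A sketch with illustrative assumptions (the hash function and the modulo scheme are not from the proposal):

```typescript
// Deterministically assign a page to one of n regeneration buckets.
function pathBucket(path: string, n: number): number {
  let h = 0;
  for (const ch of path) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % n;
}

// A page is fully regenerated when the cycle counter lands on its
// bucket; every other cycle it is patched incrementally.
function dueForRegeneration(path: string, cycle: number, everyN: number): boolean {
  return cycle % everyN === pathBucket(path, everyN);
}
```

With everyN = 20, roughly one twentieth of the pages regenerate per cycle, and every page still gets a full rewrite once per 20 cycles.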
Discarding wiki content
When should wiki content be removed?
All sources deleted. If every source document contributing to an entity or topic page has been tombstoned, the
wiki page itself gets tombstoned. An entity with no sources is an entity that doesn't exist.
Human deletion. If a human deletes a wiki page (standard Seed document deletion), the pipeline respects that. The
deleted path gets added to an implicit exclusion list -- the pipeline won't recreate a page at a path that was
explicitly tombstoned by a human. The human's intent takes priority over the LLM's.
Relevance decay. If an entity drops below the relevance threshold (it's no longer mentioned in any active source
document, or it was only ever mentioned once and that mention was minor), the LLM can fold its content into a broader topic page and tombstone the standalone entity page. This keeps the wiki from bloating with stale, low-value pages. The rules for this are defined in /_wiki/_config.
Human editing
Wiki pages are regular Seed documents. Anyone with write capabilities can edit them. The question is what happens when
the LLM next processes that page.
The rule: when a human edits a wiki page, the wiki:human-curated metadata attribute gets set to true. The pipeline
detects this by comparing the latest Change blob's author against the wiki bot's signing key -- if the author is someone
else, a human edited it. When this flag is set, the LLM switches to append-only mode for that page. It can add new
information below the human-edited content (clearly demarcated), but it won't overwrite, restructure, or remove anything
the human wrote.
If you want the LLM to take full control of a page again -- maybe you made a quick fix that the LLM should integrate
properly -- clear the wiki:human-curated flag (set it to false or remove it). The next processing run will treat the
page as fully LLM-managed and may restructure it.
This gives humans a clean override mechanism. Edit a wiki page and it becomes yours. The LLM will still add to it, but
it won't mess with your work.
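The override rule above boils down to one check per page. A minimal sketch, assuming change authorship is exposed as a signer key per Change blob (the key format and field names are illustrative):

```typescript
interface ChangeInfo {
  author: string; // public key that signed the Change blob
}

type EditMode = "full-control" | "append-only";

// Append-only if the wiki:human-curated flag is set, or if any recent
// change was signed by someone other than the wiki bot.
function editModeFor(
  recentChanges: ChangeInfo[],
  botKey: string,
  humanCuratedFlag: boolean
): EditMode {
  const humanEdited = recentChanges.some((c) => c.author !== botKey);
  return humanCuratedFlag || humanEdited ? "append-only" : "full-control";
}
```

Checking both the flag and the change authors means a human edit is respected even if the pipeline crashed before it could set wiki:human-curated.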
Accessing the wiki
Once generated, the wiki doesn't need any new access infrastructure. It plugs into everything that already exists
because wiki pages are just documents.
Browsing. Navigate to /_wiki/ in the desktop app. It's a document tree. The index page has a table of contents. Follow links to entity pages, topic pages, the contradiction log. Check :activity/citations on any page to see what else references it. Check :directory to list child pages.
Full-text search. Wiki pages are indexed in the FTS5 virtual table like any other document. Search for
"authentication" and you'll get both raw source documents and the wiki's topic page on authentication. The wiki pages
might actually rank better because they're more structured and keyword-dense -- the synthesis concentrates relevant
terms.
Semantic search. If the embedding indexer is enabled, wiki pages get embedded alongside regular documents. The
synthesized, cross-referenced wiki content produces denser embeddings that may match queries better than raw source
fragments.
Chat assistant. The existing assistant can search and read wiki pages through its search and read tools -- no
changes needed. When a user asks the assistant a question, it can find and reference wiki pages as context. This is
where the wiki's value compounds: instead of the assistant piecing together five raw documents, it finds the wiki page
that already synthesized those five documents. Better answers with fewer tool calls.
Citations. Open any source document and check :activity/citations. You'll see which wiki pages reference it. This creates bidirectional navigation: sources lead to wiki pages, wiki pages lead to sources. The existing ListEntityMentions RPC and resource_links table power this with zero new code.
Usage model
The wiki is LLM-written, human-read. You add source documents to your space -- notes, imports, meeting records, whatever content you work with. The LLM maintains the wiki. You browse, search, and read it. You can edit wiki pages if you want -- the LLM will respect your changes. You can customize how the wiki behaves by editing /_wiki/_config.
The wiki is not a replacement for source documents. Sources are ground truth. The wiki is the map. It tells you what's
in the territory, how things connect, and where the conflicts are. When the territory changes (sources are added,
updated, or deleted), the map updates itself.
Scope
This feature depends on two capabilities that don't exist yet.
Prerequisite: Document imports
The wiki is only useful if there are enough source documents to synthesize. Today, getting content into Seed is manual
-- you write documents in the editor or use the WordPress XML importer. For the wiki to deliver real value, users need
to import content from various sources: web pages, PDFs, notes from other apps, bookmarks, articles.
What the wiki needs from this: a reliable way to ingest external content as Seed documents with proper metadata. At
minimum, each imported document should carry its source URL (or origin identifier) and import date so the wiki can
attribute information back to its source. The import pipeline doesn't need to be perfect -- the wiki LLM can work with
rough text extraction and messy formatting -- but it needs to exist. Without it, most users won't have the document
volume that makes a wiki meaningful. A wiki over three documents isn't a wiki; it's a summary.
Prerequisite: LLM content creation
The wiki pipeline needs to programmatically create and edit Seed documents from LLM output. Today, the backend handles
document creation via the CreateDocumentChange gRPC, but the path from "LLM generates structured content" to "that
content becomes a Seed document" doesn't exist as a reusable layer.
What the wiki needs from this: a service in the frontend (TypeScript, same environment as the chat assistant) that takes
structured LLM output -- something like "create a document at path /_wiki/entities/project-alpha with these blocks,
these inline links, and these metadata attributes" -- and translates it into CreateDocumentChange requests with proper
block IDs, MoveBlock + ReplaceBlock + SetAttribute operations, and signing. This is the bridge between
LLM-generated content and Seed's document model.
The AGENT capability role already exists in the proto definitions (proto/documents/v3alpha/access_control.proto) but
isn't fully wired into a content-creation workflow. The wiki bot would be the first real consumer of AGENT capabilities
-- a signing key that's authorized to write documents on behalf of the space owner, used by the pipeline to create and
update wiki pages without requiring the owner's key for every change.
No-Gos
Not replacing the existing chat assistant. The wiki provides better context for the assistant, not a different
interaction model.
Not building a new search system. The wiki uses existing FTS and semantic search.
Not supporting real-time wiki updates as the user types. Processing is batched on a timer.
Not handling non-text media (images, audio, video) as source material in v1.
Not supporting user-defined wiki page schemas or templates in v1 beyond what's configurable in /_wiki/_config.
Not modifying or deleting source documents. The wiki only reads sources, never touches them.
Not addressing self-hosted site wiki generation. This proposal covers local desktop app wiki generation only.
Self-hosted sites are a separate problem that involves running the pipeline server-side without an Electron process.