LLM Wiki

      Problem

        A Seed space accumulates documents over months -- meeting notes, specs, discussion threads, imported articles, personal
        notes. But the knowledge stays fragmented. There's no layer connecting them. You wrote about a decision in three
        different places over two months and you don't even remember the first one. Two sources contradict each other and nobody
        noticed because nobody re-read both.

        The existing chat assistant can search and read documents on demand, but every conversation starts cold. It pieces
        together fragments from raw documents the same way a RAG system does -- re-deriving context from scratch on every query.
        Ask it a subtle question that spans five documents and it has to find and synthesize all five every time. Nothing
        accumulates. Nothing is built up.

        Users don't have time to manually maintain cross-references, and nobody ever will. The linking problem isn't laziness --
        it's that the maintenance work scales faster than the content itself. Twenty documents need maybe ten cross-references.
        Two hundred documents need thousands. No human keeps up with that.

        What we need is a persistent, compounding knowledge layer that sits between the user and their raw documents. One that
        the LLM builds and maintains, not the user. The user sources, explores, and asks questions. The LLM does the grunt work:
        summarizing, cross-referencing, filing, flagging contradictions, keeping the map current as the territory changes.

      Solution

        Wiki document tree

          The wiki is a set of regular Seed documents under a /_wiki/ path prefix in each space. No new document types, no new storage layer. Every wiki page is created via the existing CreateDocumentChange gRPC endpoint and flows through normal blob indexing -- FTS, embeddings, resource_links, everything.

          The tree looks like this:

          /_wiki/                         # Index page: auto-generated table of contents, recent changes
          /_wiki/entities/                # Directory listing all entity pages
          /_wiki/entities/person-name     # An entity page
          /_wiki/entities/project-alpha   # Another entity page
          /_wiki/topics/                  # Directory listing all topic pages
          /_wiki/topics/architecture      # A topic page
          /_wiki/topics/auth-migration    # Another topic page
          /_wiki/contradictions           # Contradiction log
          /_wiki/_config                  # Wiki instruction document

          Each wiki document carries metadata attributes (set via SetAttribute change operations on the document root):

            wiki:type -- one of "entity", "topic", "index", "contradiction-log", "config"

            wiki:sources -- JSON array of source document IRIs that contributed to this page

            wiki:source-versions -- JSON object mapping source IRIs to the version string that was last processed

            wiki:last-processed -- ISO 8601 timestamp of the last LLM processing run

            wiki:human-curated -- boolean, flipped to true when a human (non-bot) author edits the page
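
          As a sketch, these attributes could be modeled in the pipeline's TypeScript like this. The attribute names come from the list above; the interface shape, the toAttributePairs helper, and the exact JSON encoding are illustrative, not an existing API:

          ```typescript
          // Illustrative model of the wiki metadata attributes described above.
          type WikiPageType = "entity" | "topic" | "index" | "contradiction-log" | "config";

          interface WikiMetadata {
            "wiki:type": WikiPageType;
            "wiki:sources": string[];                       // source document IRIs
            "wiki:source-versions": Record<string, string>; // IRI -> last processed version
            "wiki:last-processed": string;                  // ISO 8601 timestamp
            "wiki:human-curated": boolean;
          }

          // Flatten the metadata into (key, value) pairs a SetAttribute operation
          // could carry. Arrays and objects are JSON-encoded, matching the
          // "JSON array" / "JSON object" wording above.
          function toAttributePairs(meta: WikiMetadata): Array<[string, string]> {
            return Object.entries(meta).map(([key, value]): [string, string] => [
              key,
              typeof value === "string" ? value : JSON.stringify(value),
            ]);
          }

          const example: WikiMetadata = {
            "wiki:type": "entity",
            "wiki:sources": ["hm://space/doc-1", "hm://space/doc-2"],
            "wiki:source-versions": { "hm://space/doc-1": "v3" },
            "wiki:last-processed": "2024-05-01T12:00:00Z",
            "wiki:human-curated": false,
          };
          ```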

        Entity pages

          One page per notable person, project, concept, or thing that appears across multiple source documents. "Notable" means
          it shows up in more than one source with enough context that a standalone page is useful. An entity mentioned once in
          passing doesn't get its own page -- it gets a line in a topic page.

          Content of an entity page: what the entity is, key facts extracted from sources, a timeline of mentions (chronologically ordered by source document dates), and inline hm:// links back to every source document that discusses it. The entity
          page is the single place you go to understand everything the space knows about that entity.

          The LLM decides what qualifies as a notable entity. This is governed by the wiki instruction document (/_wiki/_config), which can define rules like minimum mention count, types of entities to track (people, projects,
          tools, decisions), and entities to explicitly include or exclude.

        Topic pages

          One page per recurring theme, decision area, or subject that spans multiple sources. Where entity pages are about nouns,
          topic pages are about ideas. "How we handle authentication" is a topic. "The migration from REST to gRPC" is a topic.

          Content: a synthesized overview of the theme, the key arguments or positions from different sources (attributed with
          links), open questions that haven't been resolved, and links to relevant entity pages. Topic pages are where the LLM's
          synthesis work is most visible -- it's not just listing facts, it's connecting them into a narrative.

        Contradiction log

          A single document at /_wiki/contradictions. Each entry is a section with:

            The specific claims that are in tension (quoted or paraphrased from sources)

            Links to the source documents that conflict

            Links to the wiki pages affected by the contradiction

            A timestamp for when the contradiction was detected

          When the LLM processes a new source and finds that it says something different from what's already in the wiki, it does
          two things: updates the relevant wiki page to reflect the tension (e.g., "Source A says X, but Source B says Y"), and
          adds an entry to the contradiction log.

          The contradiction log is a Seed document like any other. It shows up in search, it syncs via P2P, and its :activity/citations view links back to the wiki pages and source documents involved. Humans can review contradictions
          and resolve them -- by editing source documents, by editing wiki pages, or by adding a new source that clarifies the
          conflict. When a contradiction is resolved, the LLM removes the entry on the next processing run (it re-evaluates the
          sources and sees that the tension no longer exists).
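
          A minimal sketch of one log entry in the pipeline's TypeScript. The ContradictionEntry field names and the renderEntry helper are hypothetical, but the fields map one-to-one onto the list above:

          ```typescript
          // Hypothetical shape of one entry in /_wiki/contradictions.
          interface ContradictionEntry {
            claims: string[];        // the claims in tension, quoted or paraphrased
            sourceLinks: string[];   // hm:// links to the conflicting sources
            wikiPageLinks: string[]; // hm:// links to affected wiki pages
            detectedAt: string;      // ISO 8601 detection timestamp
          }

          // Render one entry as a section of the contradiction log document.
          function renderEntry(e: ContradictionEntry): string {
            return [
              `Detected ${e.detectedAt}`,
              ...e.claims.map((c) => `Claim: ${c}`),
              ...e.sourceLinks.map((s) => `Source: ${s}`),
              ...e.wikiPageLinks.map((w) => `Affects: ${w}`),
            ].join("\n");
          }

          const sample: ContradictionEntry = {
            claims: [
              "Source A says the migration finished in March",
              "Source B says it slipped to May",
            ],
            sourceLinks: ["hm://space/meeting-notes", "hm://space/status-update"],
            wikiPageLinks: ["hm://space/_wiki/topics/auth-migration"],
            detectedAt: "2024-05-01T12:00:00Z",
          };
          ```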

        Cross-references

          Wiki pages link to each other and to source documents using inline hm:// links -- standard Seed annotations with type=Link. This is the existing annotation system, nothing new.

          Because Seed already indexes all links in the resource_links table and computes backlinks via the ListEntityMentions gRPC (which powers the :activity/citations view), the cross-reference graph builds itself automatically. When the LLM creates a wiki page about "Project Alpha" and links to three source documents and two other wiki pages, those links are indexed during normal blob processing. Open any of those source documents, check :activity/citations, and you'll see
          the "Project Alpha" wiki page listed as a citation.

          This is bidirectional by default. Source documents cite wiki pages (because the wiki links to them). Wiki pages cite each other (because topics reference entities and vice versa). The existing citation-based authority ranking in search (applyAuthorityRanking in entities.go) means wiki pages with many inbound links rank higher in search results --
          which is exactly right, because a well-connected wiki page is more useful than an isolated one.

          No new cross-reference infrastructure is needed. The existing resource_links table, ListEntityMentions RPC, and
          citation graph do all the work.

        The wiki instruction document

          The /_wiki/_config document is the brain of the wiki pipeline. It's not a system prompt string stuffed into an API
          call. It's a comprehensive reference manual -- like the skill documents we use for agent workflows in this repo -- that
          teaches the LLM everything it needs to know to build and maintain the wiki.

          The pipeline reads /_wiki/_config and passes its full content as context to the LLM on every processing run. This
          document encodes all the decisions that shape wiki behavior:

            What the wiki is and what it's for in this particular space

            The document tree structure and what goes where

            How to identify notable entities vs. passing mentions (frequency thresholds, entity types to track, explicit include/exclude lists)

            How to structure entity pages vs. topic pages (section order, level of detail, what to include and what to skip)

            When to create a new page vs. merge content into an existing one

            How to detect and record contradictions (what counts as a contradiction vs. a nuance, how to phrase entries)

            How to format cross-references using hm:// URLs and Seed's annotation system

            What writing voice and style to use (a personal knowledge base might want casual, a team site might want structured and formal)

            How to handle updates to existing pages -- when to patch incrementally vs. rewrite a section

            What to do when sources are deleted (remove dependent content, flag as unverified, or leave as-is)

            What NOT to do (don't invent facts, don't remove content without justification, don't link to external URLs unless the source does)

          The Seed team ships a default version of this document. When the pipeline first runs on a space and /_wiki/_config
          doesn't exist, it creates it with the default content. Users can then edit it -- it's a regular Seed document, so it's
          versioned, synced, and editable by anyone with write capabilities on the space.

          Different spaces can have radically different wiki behaviors just by editing their config. The Seed team can evolve the
          default by updating the hardcoded fallback. Users who haven't customized their config get improvements automatically.
          Users who have customized it keep their version.
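
          The fallback logic is simple enough to sketch. Here a Map stands in for the space's document store and loadWikiConfig is a hypothetical helper; the real pipeline would use gRPC reads and CreateDocumentChange:

          ```typescript
          // Placeholder for the default config content the Seed team would ship.
          const DEFAULT_CONFIG = "Wiki instructions (default shipped by the Seed team)";

          // Return the space's config if it exists; otherwise materialize the
          // default as a regular document so users can edit it later.
          function loadWikiConfig(store: Map<string, string>): string {
            const existing = store.get("/_wiki/_config");
            if (existing !== undefined) return existing; // user may have customized it
            store.set("/_wiki/_config", DEFAULT_CONFIG);
            return DEFAULT_CONFIG;
          }
          ```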

        The processing pipeline

          The pipeline runs in the Electron main process as a background loop, same environment as the existing chat assistant. It uses the same Vercel AI SDK infrastructure -- the same createProviderModel() factory, the same provider configuration,
          the same API keys or ChatGPT login session. No Go daemon code for LLM calls. The daemon is involved only as a data
          layer: gRPC calls to read documents, write changes, and query the index.

          Where the chat uses streamText (because it needs to show tokens appearing in real time), the wiki pipeline would use generateObject from the AI SDK -- it sends the LLM a prompt and gets back a typed, structured response describing
          exactly what pages to create or update. No streaming to a UI needed.
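
          A sketch of the structured response type the pipeline might ask generateObject for. The WikiUpdatePlan shape and field names are assumptions; in practice the AI SDK would enforce the shape via a schema (e.g. zod):

          ```typescript
          // Hypothetical structured response describing one processing step's output.
          interface WikiUpdatePlan {
            updatePages: Array<{ path: string; content: string }>;
            createPages: Array<{ path: string; type: "entity" | "topic"; content: string }>;
            contradictions: Array<{ claims: string[]; sources: string[] }>;
          }

          // A plan as the LLM might return it after processing one source document.
          const plan: WikiUpdatePlan = {
            updatePages: [
              { path: "/_wiki/topics/auth-migration", content: "Updated timeline." },
            ],
            createPages: [
              { path: "/_wiki/entities/project-alpha", type: "entity", content: "Overview." },
            ],
            contradictions: [],
          };

          // Total number of page writes this plan implies.
          function writeCount(p: WikiUpdatePlan): number {
            return p.updatePages.length + p.createPages.length;
          }
          ```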

          The loop runs on a configurable interval (default maybe 5 minutes). Each cycle:

            Read /_wiki/_config from the space via the read gRPC (or use the hardcoded default if the config document doesn't exist yet).

            List all documents in the space and compare against the wiki's processing state. The state is tracked in the wiki pages themselves -- each wiki page's wiki:source-versions metadata records which version of each source was last processed. A new document (not referenced in any wiki page's metadata) or an updated document (version mismatch) gets queued for processing.

            For each changed or new source document, read its full content via gRPC.

            Read the current state of any wiki pages that reference this source (identified by their wiki:sources metadata).

            Build the LLM request: the instruction document (/_wiki/_config) as system context, the source document content, and the current wiki pages that might need updating. The LLM sees the full picture -- what the source says and what the wiki currently says about it.

            The LLM returns a structured response (via generateObject) specifying: which existing wiki pages to update (with new block content), which new pages to create (with path, type, and content), and which contradiction entries to add.

            Apply changes via CreateDocumentChange gRPC calls for each affected wiki page. Each edit is a new Change blob that flows through normal indexing -- FTS gets updated, embeddings get queued, resource_links are recomputed. The wiki integrates with everything else automatically.

            Update the wiki:source-versions metadata on each affected wiki page to record the version that was just processed.

          The processing state lives in the wiki pages themselves rather than a separate tracking table. This is intentional --
          the wiki pages are the source of truth for what's been processed. If a wiki page says it was built from version X of a
          source, and the source is now at version Y, that page needs reprocessing. No separate state to get out of sync.
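
          The staleness check can be sketched as a pure function. findStaleSources is hypothetical; it treats a source as needing reprocessing when no wiki page recorded it, or when a page recorded a version that differs from the source's current one:

          ```typescript
          // Illustrative staleness check over wiki:source-versions metadata.
          function findStaleSources(
            currentVersions: Record<string, string>,           // source IRI -> current version
            pageSourceVersions: Array<Record<string, string>>, // wiki:source-versions per page
          ): string[] {
            const processed = new Map<string, string>();
            for (const page of pageSourceVersions) {
              for (const [iri, version] of Object.entries(page)) processed.set(iri, version);
            }
            // Stale = never processed, or processed at a different version.
            return Object.keys(currentVersions).filter(
              (iri) => processed.get(iri) !== currentVersions[iri],
            );
          }
          ```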

        Ownership and permissions

          Wiki pages are owned by the space owner -- they're created with the space owner's signing key (or an AGENT capability
          delegated to the wiki bot). The permission model is the same as any other Seed document:

            Personal spaces: The user's LLM generates and maintains the wiki. Only they can write to it. Others who subscribe to the space can read the wiki but not modify it (standard Seed behavior -- you need a write capability to push changes).

            Collaborative spaces: Anyone with write capabilities on the space can edit wiki pages manually. The wiki respects the same permission boundaries as any other document under the space's path hierarchy.

          The /_wiki/ path prefix is a convention, not a permission boundary. But you could issue path-scoped capabilities restricted to /_wiki/ if you wanted to give someone wiki-editing rights without access to the rest of the space. The existing capability model supports this -- no_recursive and is_exact flags control path scoping.
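
          As an illustration of how such a scope check could behave -- the flag semantics here are an assumption based on the description above, and the real logic lives in the daemon's capability checks:

          ```typescript
          // Illustrative path-scope check. With is_exact, only the exact path is
          // covered; otherwise the capability covers the path and everything under it.
          function pathInScope(capPath: string, docPath: string, isExact: boolean): boolean {
            if (isExact) return docPath === capPath;
            const prefix = capPath.endsWith("/") ? capPath : capPath + "/";
            return docPath === capPath || docPath.startsWith(prefix);
          }
          ```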

        Self-healing

          Three mechanisms keep the wiki accurate over time:

          Version tracking. Each wiki page records which version of each source document it was built from (in wiki:source-versions metadata). When a source document gets updated -- a new Ref blob with different heads -- the
          version changes. The next pipeline run detects the mismatch: the wiki page says it processed version A, but the source
          is now at version B. The pipeline re-reads the source and sends it to the LLM along with the current wiki page, asking
          it to update the page to reflect the new version. Only the changed content needs re-evaluation, not the entire wiki.

          Deletion handling. When a source document is tombstoned (via CreateRef with a tombstone target, which is how Seed handles deletion), the pipeline detects this on its next run. It finds all wiki pages that list the deleted source in their wiki:sources metadata. For each affected page, it sends the current page content to the LLM with instructions to
          remove or flag content that relied solely on the deleted source. If all sources for a page are deleted, the page itself
          gets tombstoned.

          Periodic regeneration. Incremental patching accumulates drift. The LLM keeps adding sentences, restructuring paragraphs, and after twenty incremental updates a page can become incoherent -- repetitive, poorly organized, internally contradictory. To prevent this, every N processing cycles (configurable via /_wiki/_config, maybe every 20
          cycles), the pipeline fully regenerates a wiki page from all its current sources rather than patching incrementally. The
          LLM reads all sources fresh and writes the page from scratch. This is more expensive (more tokens, more API calls) but
          prevents quality decay. The regeneration schedule could be staggered -- regenerate a few pages per cycle rather than all
          at once.
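
          One way to stagger the schedule is to hash each page path into a slot within the regeneration period, so roughly 1/N of the pages regenerate each cycle. The shouldRegenerate helper below is illustrative:

          ```typescript
          // Each page regenerates exactly once per `period` cycles; a stable hash
          // of the page path spreads pages across cycles instead of rebuilding
          // the whole wiki at once. The hash and default period are illustrative.
          function shouldRegenerate(pagePath: string, cycle: number, period: number = 20): boolean {
            let hash = 0;
            for (const ch of pagePath) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
            return cycle % period === hash % period;
          }
          ```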

        Discarding wiki content

          When should wiki content be removed?

            All sources deleted. If every source document contributing to an entity or topic page has been tombstoned, the wiki page itself gets tombstoned. An entity with no sources is an entity that doesn't exist.

            Human deletion. If a human deletes a wiki page (standard Seed document deletion), the pipeline respects that. The deleted path gets added to an implicit exclusion list -- the pipeline won't recreate a page at a path that was explicitly tombstoned by a human. The human's intent takes priority over the LLM's.

            Relevance decay. If an entity drops below the relevance threshold (it's no longer mentioned in any active source document, or it was only ever mentioned once and that mention was minor), the LLM can fold its content into a broader topic page and tombstone the standalone entity page. This keeps the wiki from bloating with stale, low-value pages. The rules for this are defined in /_wiki/_config.

        Human editing

          Wiki pages are regular Seed documents. Anyone with write capabilities can edit them. The question is what happens when
          the LLM next processes that page.

          The rule: when a human edits a wiki page, the wiki:human-curated metadata attribute gets set to true. The pipeline
          detects this by comparing the latest Change blob's author against the wiki bot's signing key -- if the author is someone
          else, a human edited it. When this flag is set, the LLM switches to append-only mode for that page. It can add new
          information below the human-edited content (clearly demarcated), but it won't overwrite, restructure, or remove anything
          the human wrote.
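
          The detection and append-only behavior can be sketched as follows; the key constant and the demarcation text are placeholders, not the real bot identity:

          ```typescript
          // Placeholder for the wiki bot's signing key identity.
          const WIKI_BOT_KEY = "wiki-bot-signing-key";

          // A human edited the page iff the latest Change blob's author isn't the bot.
          function isHumanEdit(latestChangeAuthor: string): boolean {
            return latestChangeAuthor !== WIKI_BOT_KEY;
          }

          // Append-only mode: keep everything the human wrote, add new material
          // below a clearly demarcated boundary.
          function appendOnly(existing: string, addition: string): string {
            return existing + "\n\n--- Added by the wiki bot ---\n" + addition;
          }
          ```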

          If you want the LLM to take full control of a page again -- maybe you made a quick fix that the LLM should integrate properly -- clear the wiki:human-curated flag (set it to false or remove it). The next processing run will treat the
          page as fully LLM-managed and may restructure it.

          This gives humans a clean override mechanism. Edit a wiki page and it becomes yours. The LLM will still add to it, but
          it won't mess with your work.

        Accessing the wiki

          Once generated, the wiki doesn't need any new access infrastructure. It plugs into everything that already exists
          because wiki pages are just documents.

          Browsing. Navigate to /_wiki/ in the desktop app. It's a document tree. The index page has a table of contents. Follow links to entity pages, topic pages, the contradiction log. Check :activity/citations on any page to see what else references it. Check :directory to list child pages.

          Full-text search. Wiki pages are indexed in the FTS5 virtual table like any other document. Search for
          "authentication" and you'll get both raw source documents and the wiki's topic page on authentication. The wiki pages
          might actually rank better because they're more structured and keyword-dense -- the synthesis concentrates relevant
          terms.

          Semantic search. If the embedding indexer is enabled, wiki pages get embedded alongside regular documents. The
          synthesized, cross-referenced wiki content produces denser embeddings that may match queries better than raw source
          fragments.

          Chat assistant. The existing assistant can search and read wiki pages through its search and read tools -- no
          changes needed. When a user asks the assistant a question, it can find and reference wiki pages as context. This is
          where the wiki's value compounds: instead of the assistant piecing together five raw documents, it finds the wiki page
          that already synthesized those five documents. Better answers with fewer tool calls.

          Citations. Open any source document and check :activity/citations. You'll see which wiki pages reference it. This creates bidirectional navigation: sources lead to wiki pages, wiki pages lead to sources. The existing ListEntityMentions RPC and resource_links table power this with zero new code.

        Usage model

          The wiki is LLM-written, human-read. You add source documents to your space -- notes, imports, meeting records, whatever content you work with. The LLM maintains the wiki. You browse, search, and read it. You can edit wiki pages if you want -- the LLM will respect your changes. You can customize how the wiki behaves by editing /_wiki/_config.

          The wiki is not a replacement for source documents. Sources are ground truth. The wiki is the map. It tells you what's
          in the territory, how things connect, and where the conflicts are. When the territory changes (sources are added,
          updated, or deleted), the map updates itself.

      Scope

        This feature depends on two capabilities that don't exist yet.

        Prerequisite: Document imports

          The wiki is only useful if there are enough source documents to synthesize. Today, getting content into Seed is manual
          -- you write documents in the editor or use the WordPress XML importer. For the wiki to deliver real value, users need
          to import content from various sources: web pages, PDFs, notes from other apps, bookmarks, articles.

          What the wiki needs from this: a reliable way to ingest external content as Seed documents with proper metadata. At
          minimum, each imported document should carry its source URL (or origin identifier) and import date so the wiki can
          attribute information back to its source. The import pipeline doesn't need to be perfect -- the wiki LLM can work with
          rough text extraction and messy formatting -- but it needs to exist. Without it, most users won't have the document
          volume that makes a wiki meaningful. A wiki over three documents isn't a wiki -- it's a summary.

        Prerequisite: LLM content creation

          The wiki pipeline needs to programmatically create and edit Seed documents from LLM output. Today, the backend handles document creation via the CreateDocumentChange gRPC, but the path from "LLM generates structured content" to "that
          content becomes a Seed document" doesn't exist as a reusable layer.

          What the wiki needs from this: a service in the frontend (TypeScript, same environment as the chat assistant) that takes structured LLM output -- something like "create a document at path /_wiki/entities/project-alpha with these blocks, these inline links, and these metadata attributes" -- and translates it into CreateDocumentChange requests with proper block IDs, MoveBlock + ReplaceBlock + SetAttribute operations, and signing. This is the bridge between
          LLM-generated content and Seed's document model.

          The AGENT capability role already exists in the proto definitions (proto/documents/v3alpha/access_control.proto) but
          isn't fully wired into a content-creation workflow. The wiki bot would be the first real consumer of AGENT capabilities
          -- a signing key that's authorized to write documents on behalf of the space owner, used by the pipeline to create and
          update wiki pages without requiring the owner's key for every change.

      No Gos

          Not replacing the existing chat assistant. The wiki provides better context for the assistant, not a different interaction model.

          Not building a new search system. The wiki uses existing FTS and semantic search.

          Not supporting real-time wiki updates as the user types. Processing is batched on a timer.

          Not handling non-text media (images, audio, video) as source material in v1.

          Not supporting user-defined wiki page schemas or templates in v1 beyond what's configurable in /_wiki/_config.

          Not modifying or deleting source documents. The wiki only reads sources, never touches them.

          Not addressing self-hosted site wiki generation. This proposal covers local desktop app wiki generation only. Self-hosted sites are a separate problem that involves running the pipeline server-side without an Electron process.
