Pragmatic Roadmap Towards Onyx Semantic Layer

>>>

Pragmatic Roadmap Towards Onyx Semantic LayerFrom #implicit capabilities buried in code to an explicit, discoverable, usable platform for third-party developers.

Sacha

13 February 2026, 10:50

ContentPeopleComments4

## Table of Contents

0. Preamble

0.1 Single guiding principle: "Does this make it easier for a third-party developer, or non-tech product builder, to ship an application on Seed Hypermedia than it was yesterday?"

0.2 Rich Hickey's philosophy applied: what we take, what we leave behind

0.3 Frugality constraints: zero new runtime dependencies, zero triple stores, zero rewrites

0.4 What the semantic layer is not

1. Stepping Back Before a Hello World — Hypothesis Verification Protocol

1.1 The hypotheses

H1 — Describability without breakage

H2 — Discoverability without source code

H3 — Community extension without coordination

H4 — Build-time generation is complete

H5 — Zero runtime overhead is achievable

H6 — The vocabulary is useful, not just correct

1.2 Experiment ordering and dependencies

1.3 The Hello World artifact

1.4 Success and failure criteria for the POC

1.5 What we are not testing in the POC

1.6 Decision gate

2. Foundation Sprint — Vocabulary & Minimal Tooling (Weeks 1–2)

2.1 Mapping the existing landscape

2.1.1 Inventory of block types in Go code (structs) and TypeScript (interfaces)

2.1.2 Extraction of implicit relationships between types

2.1.3 Identification of the 5–7 priority block types for the initial vocabulary

2.1.4 Survey of current naming conventions (CBOR fields, JSON keys, gRPC names)

2.2 Canonical SKOS vocabulary

2.2.1 YAML source file as canonical reference (human-readable, version-controllable)

2.2.2 Generation of core-skos.ttl from YAML

2.2.3 Why SKOS — the 80/20 solution for RDF

2.2.4 SKOS-XL for rich label management

2.2.5 Future-proof evolution path

2.2.6 Publication of the vocabulary as a standalone open artifact (dedicated IPFS CID)

2.3 Embedded JSON-LD context

2.3.1 Dual context: JSON-LD version for web consumers, CBOR version for the backend

2.3.2 Mapping of existing short keys to qualified URIs ("type" → hm:blockType)

2.3.3 Context embedded statically (never resolved over HTTP at runtime)

2.4 Auto-generated documentation

2.4.1 pyLODE pipeline: core-skos.ttl → docs/index.html

2.4.2 Upgrade path: Widoco

2.4.3 Exploration tool: Ontospy

2.4.4 CI integration: re-generation on every commit to the vocabulary/ directory

2.4.5 Collaborative editing (future): VoCol

2.5 Week 2 deliverables

3. Developer Experience Transformation (Weeks 3–4)

3.1 @seed/vocabulary npm package

3.1.1 TypeScript types generated from the SKOS vocabulary

3.1.2 JSON-LD context embedded as a static import

3.1.3 Lightweight builders (~2KB) for each block type

3.1.4 Type-safe Schema.org integration

3.1.5 Zero runtime dependency on any JSON-LD library

3.2 Go backend integration — "JSON-LD as native JSON" option

3.2.1 Option 1 (recommended): Go consumes JSON-LD as native JSON

3.2.2 Direct CBOR parsing in Go without going through an RDF layer

3.2.3 Option 2 (if RDF processing needed in Go): lightweight libraries

3.2.4 Option 3 (hybrid): Python tooling + Go runtime

3.2.5 Option 4 (advanced): CGo bindings to Rust

3.2.6 Serialization flexibility

3.3 Build-time generation pipeline

3.3.1 Single script: YAML → SKOS Turtle → JSON-LD context → TypeScript types → Go struct tags → HTML docs

3.3.2 Integration into the existing Makefile or justfile

3.3.3 Non-regression tests: vocabulary compiles, types are valid, examples parse

3.4 Third-party developer validation (proof of concept)

3.4.1 Scenario: an external developer creates a custom block type

3.4.2 The developer journey

3.4.3 Success criteria

4. User Experience Transformation (Weeks 5–6)

4.1 Capability discoverability in the interface

4.1.1 The SKOS vocabulary feeds a navigable block type registry in the editor

4.1.2 Community extensions visible alongside native types

4.1.3 Analogous role to Schema.org

4.2 Self-describing documents

4.2.1 Every published document carries its own JSON-LD context (no external resolution)

4.2.2 Minimal DCAT metadata embedded: title, author, license, date

4.3 Access fluidity: private → shared → open

4.3.1 Architectural principle: transitions are metadata operations, never document reformatting

4.3.2 Three levels, one architecture

4.3.3 Transition table

4.3.4 Open vocabulary, controlled content

4.4 Native multilingual labels

4.4.1 SKOS native multilingual support

4.4.2 Interface adapts to the user's language without ad hoc translation work

4.4.3 Community-extensible languages

5. Data Transformation — Interoperability & Compliance (Weeks 7–10)

5.1 FAIR compliance by construction

5.1.1 FAIR principles already satisfied by existing architecture (9 of 15)

5.1.2 EU standards mapping (EIF → DCAT → CPSV → SKOS)

5.1.3 FAIR compliance across access levels (private / shared / open)

5.1.4 CPSV — Core Public Service Vocabulary

5.2 DCAT catalog

5.2.1 Static generation of a catalog JSON-LD endpoint from published documents

5.2.2 Catalog lists metadata, not content (compatible with selective sharing)

5.2.3 Harvesting testable with a local CKAN instance or the EU sandbox

5.3 Exploring existing data with SPARQL Anything

5.3.1 Query any format as RDF without conversion

5.3.2 Practical use cases

5.3.3 Strictly tooling / prototyping use (not production)

5.4 Declarative format mappings: FaCade-X and RML

5.4.1 The problem with hardcoded conversions

5.4.2 Declarative mappings as data

5.4.3 Implementation options

5.4.4 Recommended progression

5.5 Optional SHACL validation

5.5.1 Shapes for 2–3 critical block types (proof of concept)

5.5.2 Validation in CI

5.5.3 Adoption conditional on demonstrated value, not systematic

5.6 Handling arrays and ordering

5.6.1 The problem

5.6.2 Solution: JSON-LD @list

5.6.3 Alternative: split responsibilities

6. TypeScript Frontend Optimization — Zero-Overhead JSON-LD (Parallel Track)

6.1 Build-time generation (primary strategy)

6.2 Never resolve @context in the browser

6.3 In-memory caching (if processing needed)

6.4 Centralized helpers — no duplication

6.5 CI/CD validation of generated JSON-LD

6.6 Complete frontend architecture

7. Consolidation & Ecosystem (Months 3–6)

7.1 Community extensions

7.1.1 Concept scheme template for third-party extensions

7.1.2 Guide: "Publish a Custom Block Type in 15 Minutes"

7.1.3 Discovery mechanism: SKOS resolution via IPFS

7.2 External vocabulary alignment

7.2.1 Schema.org mapping

7.2.2 ActivityPub alignment

7.3 Conditional complexity escalation

7.3.1 SKOS → RDFS threshold

7.3.2 RDFS → OWL threshold

7.3.3 Architecture Decision Records

7.4 EU & data spaces (if strategically relevant)

7.4.1 Vocabulary registration on EU Joinup

7.4.2 CPSV alignment

7.4.3 High-Value Datasets initiative

8. Cross-Cutting Principles (Running Thread)

8.1 Frugality

8.2 Build-time, not runtime

8.3 Single source of truth

8.4 Reversibility

8.5 RDF as specification, not runtime

8.6 Content is infrastructure

Appendices

A. Concern mitigation matrix

B. Key quotes

C. Glossary: SKOS, DCAT, SHACL, JSON-LD, DAG-CBOR, CID, FAIR, EIF, CPSV, RML, SPARQL Anything, FaCade-X, OWL, RDFS

D. References---

0. Preamble

0.1 Single Guiding Principle

Every technical decision in this roadmap is filtered through one question:

Does this make it easier for a third-party developer, or non-tech product builder, to ship an application on Seed Hypermedia than it was yesterday?

If the answer is no, we simplify. If the answer is yes, we ship.

"Ship" is intentional. Not "build," not "prototype" — ship. The full loop: create, test, publish, make available to others. This forces every decision to account for the publish-to-IPFS step, the discoverability story, the community extension mechanism, not just the local development experience. And "non-tech product builder" is intentional too: if the semantic layer only serves TypeScript developers, it has failed. The whole point of making capabilities explicit, discoverable, and self-describing is that you should not need to read Go source code to assemble a working application.

This is not a documentation problem. It is an information architecture problem: the platform's capabilities — block types, document relationships, metadata, possible extensions — live in code today. They are implicit, buried in Go structs and TypeScript interfaces. For a developer who wants to build something new, the path is: read the source, reverse-engineer the conventions, and hope nothing breaks.

The semantic layer turns that situation into one where a third-party developer can query the platform to find out what exists, build custom interfaces without depending on the core team's documentation, and extend the vocabulary with their own types — compatible with the existing ecosystem, without prior coordination. Without a semantic layer, every new application requires a conversation with the Seed team. With one, the ecosystem can grow in a decentralized fashion — exactly like the web itself.

0.2 Rich Hickey's Philosophy Applied: What We Take, What We Leave Behind

Our approach draws directly from Rich Hickey's nuanced position on RDF (thanks for that @Alexandr), synthesized from five public talks and interviews:

"Effective Programs — 10 Years of Clojure" (2017, Clojure/conj) — primary source, contains the explicit "RDF got it right" slide

"Deconstructing the Database" (2012, JaxConf) — RDF triples vs. Datomic datoms

Cognicast Episode 103: Clojure Spec (2016) — RDF as prior art for property-level specs

"Datomic Ions" — universal relation (EAVT) as triples-plus-time

The Datomic Information Model (InfoQ article, 2013) — closed-world vs. open-world

What We Take

URI-based naming — solving parochialism. Hickey's strongest endorsement. In "Effective Programs" he introduces "parochialism" — the problem where each system defines names inside its own little world (a class, a table, an ADT). He calls RDF's use of URIs for names "a fantastically good idea" and models Clojure's namespace-qualified keywords directly on this principle. The core problem: when System A defines name inside Person and System B defines name inside MailingList, merging is brutal. Large enterprises end up introducing a third database — often an RDF database — as a federation point just to resolve these collisions. Global, URI-based naming prevents naming conflicts across independently developed systems. This is foundational for decentralized cooperation.

Atomic facts (subject/predicate/object). Hickey considers the triple structure "generally a good idea" because "it seems atomic" and "we really do want atomic facts." The triple decomposition breaks what he calls the "tyranny of the container" — information is not owned by some aggregate structure (table, class, document). Datomic's datom (entity/attribute/value/transaction) is directly derived from this primitive.

Properties independent of aggregates. From the Cognicast interview: "I would definitely point to RDF as prior art in terms of thinking about properties, they called them, independent of aggregates. Property definitions have stood on their own." This is the philosophical foundation for clojure.spec defining specs at the keyword level rather than at the struct/aggregate level. Properties (predicates/attributes) are first-class, globally named entities — not things owned by a container. Define semantics at the property/attribute level, not the record/class level. This enables composition and evolution.

Data mergeability across schemas. On a slide titled "Parochialism — names," Hickey states flatly: "RDF got it right." RDF facilitates data merging even when underlying schemas differ. It supports schema evolution over time without requiring all consumers to change. He contrasts this with elaborate type systems that create "parochial types" — "The more elaborate your type system is, the more parochial your types are." Interoperability and mergeability are features of the naming and decomposition strategy, not of complex type machinery.

Namespace qualification. Clojure's convention of reversed-domain namespace-qualified keys (:com.example/email) is explicitly modeled on RDF's URI-based naming. Hickey notes this makes all Clojure names conflict-free not only with other Clojure names but with Java names too. The namespace convention from RDF is one of the most practically valuable ideas, independent of the rest of the RDF stack.

Schema-per-attribute (not schema-per-table). Datomic defines schema at the attribute level, not via ontologies or table schemas. This is the right granularity for evolving systems.

What We Leave Behind

No temporal dimension — triples are not enough. Hickey's primary and most repeated critique. RDF triples represent that something is, but not when it became true or stopped being true: "I've argued that that's not enough, because it doesn't let you represent facts, because it doesn't have any temporal aspect." From the Datomic Information Model: "Without a temporal notion or proper representation of retraction, RDF statements are insufficient for representing historical information." Datomic uses datoms (EAVT: entity, attribute, value, transaction) rather than triples (SPO). "It is a relational system, but it has essentially one relation: EAVT. Entity, Attribute, Value, and Time. If you ever worked with RDF or anything like that, there are triples. So this is like triples plus time."

Open-world assumption. RDF and the semantic web assume an open world — anything not stated might still be true. Datomic deliberately adopts the closed-world assumption: "Being oriented toward business information systems, Datomic adopts the closed-world assumption, avoiding the challenges of universal naming, open-world, shared semantics etc. of the semantic web." For situated programs dealing with information you control, you need to know what you don't know. The open-world assumption makes practical validation and reasoning extremely difficult. It is appropriate for the global semantic web vision but impractical for most application-level information systems.

Properties as "just names" without additional machinery. From the Cognicast: "In RDF, without combining it with RDF schema or something else, the properties are just names." Raw RDF gives you naming and linking but not validation, constraints, or operational semantics. You need RDFS, OWL, or SHACL layered on top — and each layer adds complexity that may not be justified. The naming convention is valuable on its own; the rest of the semantic stack should be adopted only when justified by actual requirements.

Full stack complexity (OWL, reasoners, SPARQL, triple stores). Hickey never produces a systematic "bad RDF" list, but his design choices with Datomic speak clearly. He took what he liked and deliberately dropped the rest. Kept: atomic facts, URI-named attributes, universal relation, schema-per-attribute. Dropped: triple stores, SPARQL, OWL reasoning, blank nodes, open-world semantics, verbose serialization. Datomic uses Datalog (not SPARQL) for queries. It uses a universal relation (not a graph store). It defines schema at the attribute level (not via ontologies). You can capture the essential value of RDF's ideas without adopting the RDF infrastructure.

Blank nodes. The Datomic model requires that all entities have stable identifiers. This is a deliberate rejection of RDF's blank node concept — anonymous, non-addressable nodes that cause well-documented problems with merging, querying, and canonicalization. Prefer named resources. Blank nodes are a complexity trap.

Summary: Take vs. Leave

Take From RDFLeave BehindURI-based naming (global, conflict-free)Open-world assumptionAtomic facts (subject/predicate/object)Triples without timeProperties independent of aggregatesOWL reasoning / RDFS rigidityData mergeability across schemasTriple stores & SPARQL complexityNamespace qualificationBlank nodesSchema-per-attribute (not schema-per-table)Verbose serialization (XML/Turtle)

How Hickey's Philosophy Maps to Seed

Use SKOS + JSON-LD for naming and interoperability — this captures the "good parts" (URIs, namespaces, self-describing properties, mergeability) without the infrastructure overhead.

Keep runtime data in native formats (CBOR/Go) — just as Datomic stores datoms in its own efficient format while borrowing RDF's conceptual model, Hypermedia can keep DAG-CBOR as the storage/wire format while using RDF as a specification language.

RDF as specification, not runtime — Hickey essentially did this with Datomic: took the ideas of RDF (atomic facts, global naming, properties-as-first-class) and implemented them in a system that has zero RDF dependencies at runtime.

Add complexity only when justified — Hickey's progression (simple data → Clojure spec → Datomic schema) mirrors the proposed SKOS → RDFS → OWL upgrade path. Start minimal, formalize when patterns stabilize.

Closed-world for your system, open-world for federation — internally, validate with closed-world assumptions (SHACL). Externally, publish SKOS concepts that anyone can extend without coordination. This dual posture is exactly what Datomic achieves: strict internally, composable externally.

0.3 Frugality Constraints

Three hard constraints that apply to every section of this roadmap:

Zero new runtime dependencies. The Go backend keeps reading CBOR and JSON with encoding/json. No RDF library is imported in production Go code. No JSON-LD processing library ships in the browser bundle. RDF is a specification language, not a runtime system.

Zero triple stores. We are not adding an RDF database to the architecture. No Jena, no Virtuoso, no GraphDB. Python tools (rdflib, pyshacl, pyLODE) run at development/build time only. The Go runtime never touches RDF. The approach is a virtual semantic layer: in-memory Python processing for development and tooling, generating static artifacts consumed by Go and TypeScript at runtime. No persistent RDF database anywhere in the stack.

Zero rewrites. Existing formats (DAG-CBOR, gRPC) do not change. The semantic layer overlays: it describes what already exists, then enables extending it. The vocabulary is not a secondary artifact — it is the public specification of what the platform can do. Explicit and discoverable mean nothing without usable: the semantic layer provides the stable contracts developers need — TypeScript types generated from the vocabulary (autocomplete and compile-time checking), embedded JSON-LD contexts (zero network calls, zero runtime dependencies), and validation shapes (SHACL, clear error messages when an extension violates the contract). The goal is not for developers to "learn RDF." It is for them to npm install a package and have everything work.

0.4 What the Semantic Layer Is Not

It is not a triple store. We are not adding an RDF database to the architecture. The Go backend keeps reading CBOR and JSON with encoding/json. RDF is a specification language, not a runtime system.

It is not a rewrite. Existing formats (DAG-CBOR, gRPC) do not change. The semantic layer overlays: it describes what already exists, then enables extending it.

It is not heavyweight ontology engineering. We are using SKOS — the simplest W3C standard — not OWL, not reasoners, not automated inference. Complexity gets added only when concrete use cases justify it.

It is not an academic exercise. Success is measured by third-party developers shipping applications, not by theoretical elegance.

1. Stepping Back Before a Hello World — Hypothesis Verification Protocol

Before writing a single line of SKOS Turtle, before generating a single TypeScript type, we need to name our assumptions, design the cheapest possible experiments to test them, and define what failure looks like. This section is the engineering discipline that prevents the roadmap from becoming a plan we execute on faith.

The semantic layer is a bet. It bets that formalizing Seed's implicit capabilities into a structured vocabulary will unlock third-party development. That bet rests on a chain of hypotheses. If any link in the chain breaks, the entire value proposition collapses — or, more usefully, needs rethinking. The Hello World POC exists to stress-test this chain before committing to a multi-week sprint.

1.1 The Hypotheses

Each hypothesis is stated as a falsifiable claim. For each, we define the experiment, the success criterion, the failure signal, and what we learn from failure.

H1 — Describability Without Breakage

Claim: An existing Seed document (DAG-CBOR, published on IPFS) can be described by a SKOS vocabulary and a JSON-LD context without changing the document format, without breaking existing consumers, and without altering its CID.

Why it matters: If adding semantic metadata requires reformatting existing documents, the entire "overlay, don't rewrite" premise collapses. Every existing document on IPFS would need to be re-published.

Experiment: Take one real Seed document in DAG-CBOR. Write a minimal JSON-LD context that maps its existing short keys ("type", "text", "url") to qualified URIs. Feed the original CBOR (decoded to JSON) plus the context to a JSON-LD processor (jsonld.js or rdflib). Verify that the expanded output contains valid RDF triples. Verify that the original CBOR bytes are unchanged (same CID). Verify that the Go backend parses the document identically with and without the context.

Success criterion: The document is valid JSON-LD without byte-level changes. The Go json.Unmarshal call produces identical output. The IPFS CID is preserved.

Failure signal: The existing key names cannot be cleanly mapped (naming collisions, reserved JSON-LD keywords like @id conflicting with existing fields). The CBOR encoding includes structures that JSON-LD cannot represent (raw binary, CBOR tags).

What failure teaches us: If CBOR structures resist JSON-LD mapping, we need a translation layer between the CBOR wire format and the JSON-LD interoperability format, rather than treating them as the same artifact. This changes the architecture from "overlay" to "projection" — still viable, but different.

H2 — Discoverability Without Source Code

Claim: A developer who has never seen the Seed codebase can, given only the published SKOS vocabulary and JSON-LD context, correctly identify the available block types, their properties, and their relationships — and produce a valid document.

Why it matters: This is the core value proposition. If the vocabulary is not self-sufficient for understanding, it is documentation with extra steps, not a semantic layer.

Experiment: Write the minimal SKOS vocabulary for 3 block types (Paragraph, Image, Heading). Publish the vocabulary and JSON-LD context as standalone files. Give these files to a person who has not worked on Seed (an intern, a colleague from a different project, a friend). Ask them to: (a) list the available block types, (b) describe the properties of an Image block, (c) produce a JSON document that conforms to the vocabulary. Time them. Record where they get stuck.

Success criterion: The test subject can complete all three tasks in under 30 minutes using only the vocabulary files and the generated HTML documentation. They produce a valid document without asking a question about the Seed codebase.

Failure signal: The test subject cannot determine block properties from the SKOS vocabulary alone (SKOS describes concepts and hierarchies but not data properties — skos:prefLabel tells you the name, not the fields). The generated documentation lacks actionable examples. The JSON-LD context is incomprehensible without a tutorial.

What failure teaches us: SKOS alone may not be sufficient for the developer experience use case. We may need RDFS properties or JSON Schema alongside SKOS earlier than planned. The vocabulary needs example documents, not just definitions. The documentation generator (pyLODE) may need customization or supplementation.

H3 — Community Extension Without Coordination

Claim: A third party can define a new block type that extends the core vocabulary (via skos:broader) and produce a document using that type, without any change to the Seed codebase, without permission, and without coordination with the core team.

Why it matters: Decentralized extension is the difference between "a platform" and "a protocol." If extensions require a pull request or a registry update, the ecosystem cannot grow organically.

Experiment: After H2, ask the test subject to invent a new block type ("Quiz Block," "Map Block," whatever they want). They must: (a) define it as a SKOS concept with skos:broader pointing to a core type, (b) write a JSON-LD context for their extension, (c) produce a document that uses both core and custom block types, (d) publish it (even if just as a file, not on IPFS). Verify that standard RDF tools can parse the combined document and correctly resolve both core and custom types.

Success criterion: The combined document is valid JSON-LD. An RDF tool (rdflib, jsonld.js) can expand it and identify both the core types and the custom type. The custom type's skos:broader relationship to the core type is traversable.

Failure signal: JSON-LD context merging fails (conflicting term definitions between core and extension contexts). The custom block type cannot reference core types across namespace boundaries. The tooling cannot resolve multiple contexts in one document.

What failure teaches us: We may need a context-merging strategy (JSON-LD supports multiple contexts via arrays, but edge cases exist). We may need to publish the core context at a stable URL that extension contexts can reference. The "just publish on IPFS" story may need a discovery layer.

H4 — Build-Time Generation Is Complete

Claim: A single YAML source file can generate all downstream artifacts — SKOS Turtle, JSON-LD context, TypeScript types, Go struct tags, HTML documentation — without manual intervention, and the generated artifacts are correct and usable.

Why it matters: If the generation pipeline is lossy or requires manual fixups, the "single source of truth" principle is compromised. Drift between artifacts is the failure mode this architecture is designed to prevent.

Experiment: Write the YAML for the 3 test block types from H2. Run the generation script. Verify: (a) the Turtle is valid SKOS (parse with rapper or rdflib), (b) the JSON-LD context correctly maps short keys to URIs (expand a test document), (c) the TypeScript types compile (tsc --noEmit), (d) the Go struct tags produce correct parsing (json.Unmarshal round-trip), (e) the HTML documentation contains all concepts with correct labels, definitions, and hierarchies.

Success criterion: All five artifact types are generated from one YAML file, all pass validation, and none require manual editing.

Failure signal: The YAML schema cannot express something needed by one of the output formats (e.g., CBOR integer codes have no natural place in YAML). The TypeScript type generator cannot infer field types from SKOS (SKOS describes concepts, not data shapes). The pyLODE output is missing critical information or is unreadable.

What failure teaches us: The YAML schema may need to be richer than pure SKOS metadata — it may need to include property definitions (field names, types, cardinality) that go beyond what SKOS expresses. This is fine — the YAML is our canonical source, not a SKOS file. SKOS is an output format, not the input format.

H5 — Zero Runtime Overhead Is Achievable

Claim: The semantic layer adds zero measurable overhead to document parsing in Go and zero bytes to the browser JavaScript bundle.

Why it matters: If the semantic layer degrades performance, it will be removed. Runtime cost is a hard constraint, not a trade-off.

Experiment: Benchmark Go json.Unmarshal on a test document with and without @context and @type fields. Benchmark CBOR decoding with and without integer prefix codes. Measure the npm package size (should be <5KB gzipped). Verify that no runtime JavaScript code from the vocabulary package executes in the browser.

Success criterion: Go parsing overhead is <1% (within measurement noise). The npm package ships zero runtime code (types only, plus static JSON). Browser bundle size delta is zero (JSON-LD strings are static, not processed).

Failure signal: The @context field in JSON adds non-trivial bytes to every document. CBOR prefix tables add decoding complexity. The TypeScript types import something that pulls in a dependency chain.

What failure teaches us: We may need to separate the interoperability context (shipped alongside the document for external consumers) from the wire format (CBOR, no context overhead). The npm package structure may need stricter tree-shaking boundaries.

H6 — The Vocabulary Is Useful, Not Just Correct

Claim: The generated HTML documentation, the TypeScript autocomplete, and the SKOS vocabulary actually help a developer make decisions — not just describe what already exists.

Why it matters: A vocabulary that restates the source code in a different syntax is overhead, not value. The value is in the navigability, the discoverability, the cross-referencing, the examples.

Experiment: Give the H2 test subject a task: "You have a document with paragraphs and images. You want to add a table of contents. Using only the vocabulary and documentation, figure out how." Observe whether the vocabulary helps them find the HeadingBlock type, understand its relationship to document structure, and determine what properties it needs.

Success criterion: The vocabulary and documentation provide enough context for the subject to solve the task without reading source code or asking for help. The SKOS hierarchy, the skos:scopeNote annotations, and the examples are the critical assets.

Failure signal: The subject finds the type definitions but cannot figure out how types compose into a document. The vocabulary describes atomic blocks but not their assembly into documents. The documentation is a flat list, not a navigable resource.

What failure teaches us: We may need to model document-level composition (how blocks assemble into documents) in the vocabulary, not just block-level taxonomy. We may need RDFS properties (hm:hasBlock, hm:position) earlier than planned. The generated documentation may need custom templates with usage-oriented guides, not just reference pages.

1.2 Experiment Ordering and Dependencies

The hypotheses have dependencies. Testing them in the wrong order wastes effort.

H1 (Describability)
 │
 ├──→ H4 (Build-Time Generation)
 │     │
 │     └──→ H5 (Zero Overhead)
 │
 └──→ H2 (Discoverability)
       │
       ├──→ H3 (Community Extension)
       │
       └──→ H6 (Usefulness)

Phase A (Day 1–2): H1 — Can we describe what exists?

If the existing CBOR documents resist JSON-LD mapping, nothing else matters. This is the load-bearing hypothesis. One real document, one context file, one round-trip test. Half a day of work, maximum.

Phase B (Day 3–4): H4 — Can we generate everything from YAML?

If the generation pipeline works, we have the artifacts for all subsequent tests. If it does not, we know which output formats need special handling. Two days of scripting, producing concrete files.

Phase C (Day 5): H5 — Does it cost anything at runtime?

Quick benchmarks. If performance is fine (expected), move on. If not, we learn early and adjust the architecture before building further.

Phase D (Day 6–8): H2 + H6 — Does it help a real person?

The human test. This is the most important phase. All the technical correctness in the world means nothing if a developer cannot navigate the vocabulary. Give the artifacts to a test subject. Watch what happens. Take notes.

Phase E (Day 9–10): H3 — Can someone extend without us?

The decentralization test. If H2 passed, this should follow naturally. If H3 fails despite H2 passing, the problem is in context merging or namespace resolution, not in the vocabulary itself.

1.3 The Hello World Artifact

The Hello World is not a demo. It is the minimal artifact that exercises all six hypotheses simultaneously. Concretely, it is:

One YAML file (hello-world.yaml) defining three block types: Paragraph, Image, Heading.

One generation run producing: hello.ttl (SKOS Turtle), hello-context.jsonld (JSON-LD context), hello-types.ts (TypeScript), hello-types.go (Go structs), hello-docs/index.html (pyLODE documentation).

One real Seed document (hello-doc.cbor) taken from the existing codebase, described by the generated context, parsed by Go with and without the context, expanded by a JSON-LD processor.

One extension (hello-extension.ttl) defining a custom block type by a "third party" (us pretending), producing a combined document, verifying RDF tool interoperability.

One test subject (not from the core team) attempting the H2 and H6 tasks using only the generated artifacts.

1.4 Success and Failure Criteria for the POC

The POC succeeds if:

All six hypotheses pass (even with minor caveats or known limitations)

The test subject completes the discoverability task without source code access

The generation pipeline runs end-to-end from YAML to all artifacts

Zero runtime overhead is confirmed by benchmarks

The combined core+extension document is valid and traversable

The POC fails usefully if:

H1 fails → we know the overlay model needs a projection layer (architecture change, not a showstopper)

H2 fails → we know SKOS alone is insufficient and RDFS properties are needed earlier (scope change)

H3 fails → we know context merging needs explicit design (technical fix, not architecture change)

H4 fails → we know the YAML schema needs richer property metadata (schema redesign, bounded effort)

H5 fails → we know interop context and wire format must be separated (already anticipated as a possibility)

H6 fails → we know documentation needs usage guides, not just reference pages (editorial work)

The POC fails catastrophically if:

Existing CBOR documents are fundamentally incompatible with JSON-LD (e.g., CBOR tags that have no JSON-LD equivalent) AND the cost of a projection layer exceeds the value of the semantic layer

The generation pipeline requires manual intervention that cannot be automated

The core team concludes that the vocabulary adds complexity without observable value for their use cases

A catastrophic failure is a valid outcome. It means the semantic layer, as designed, is wrong for this project. Better to learn that in 10 days than in 10 weeks.

1.5 What We Are Not Testing in the POC

Equally important: what is explicitly out of scope for the Hello World.

Not testing FAIR/DCAT compliance. EU interoperability is a downstream benefit, not a core hypothesis. If the vocabulary works for developers, DCAT metadata is a 15-line addition later.

Not testing SHACL validation. Validation is an optional layer. The POC tests whether the vocabulary is useful, not whether it can be enforced.

Not testing SPARQL Anything integration. Querying existing CBOR as virtual RDF is a powerful tool, but it is independent of the vocabulary's value proposition. Test it separately.

Not testing community scale. One test subject is not a community. The POC tests the mechanism (can one person extend?), not the sociology (will people want to?).

Not testing production deployment. CI/CD pipelines, npm publishing, GitHub Pages hosting — all important, all orthogonal to the core hypotheses. Build the pipeline after the value is confirmed.

1.6 Decision Gate

At the end of the POC (Day 10), the team makes an explicit go/no-go decision:

Go: Proceed to Section 2 (Foundation Sprint). The hypotheses hold. The architecture is sound. The value is observable.

Go with adjustments: Proceed, but modify the roadmap based on POC learnings. Document each adjustment as an Architecture Decision Record. Typical adjustments: "YAML schema needs property metadata, not just SKOS concepts" or "context merging requires a documented strategy."

No-go: Stop. The semantic layer is not the right investment for Seed right now. Document why. Revisit when the preconditions change (e.g., when the block type model stabilizes, when a third-party developer actually asks for formal types, when an EU compliance requirement materializes).

A no-go is not a failure of the project. It is a success of the POC: we spent 10 days instead of 10 weeks learning that the timing is wrong.

2. Foundation Sprint — Vocabulary & Minimal Tooling (Weeks 1–2)

2.1 Mapping the Existing Landscape

2.1.1 Inventory of Block Types in Go Code (Structs) and TypeScript (Interfaces)

Walk the codebase and extract every block type currently defined. In Go, these are structs with CBOR/JSON tags. In TypeScript, these are interfaces or type aliases. The goal is a flat list of every type the platform currently supports, with its fields, field types, and any implicit constraints (required fields, allowed values, references to other types).

2.1.2 Extraction of Implicit Relationships Between Types

Identify inheritance, composition, and reference relationships that exist in code but are not formalized. A ParagraphBlock and a HeadingBlock might both embed a TextContent struct — that is an implicit skos:broader relationship to a TextBlock concept. A DocumentBlock might reference a list of child blocks — that is a hm:hasBlock relationship waiting to be named.

2.1.3 Identification of the 5–7 Priority Block Types for the Initial Vocabulary

Select the block types that cover the most common use cases and that third-party developers are most likely to need. Candidates: Paragraph, Heading, Image, Video, Code, Embed, Comment. The exact list is a question for Eric and the core team.

2.1.4 Survey of Current Naming Conventions

Document every naming convention currently in use: CBOR field names, JSON key names, gRPC service and message names, TypeScript interface names. The vocabulary must map cleanly to these existing conventions — short keys in CBOR that expand to qualified URIs in JSON-LD, without breaking any existing consumer.

2.2 Canonical SKOS Vocabulary

2.2.1 YAML Source File as Canonical Reference

The single source of truth for the vocabulary is a YAML file. It is human-readable, version-controllable, diffable, and trivially parseable by any language. Every other artifact — Turtle, JSON-LD, TypeScript types, Go struct tags, HTML documentation — is generated from this file.

# block-types.yaml — canonical source
concept_scheme:
  id: hm:BlockTypes
  label:
    en: "Hypermedia Block Types"
    fr: "Types de blocs Hypermedia"
  description:
    en: "Taxonomy of content block types in Seed Hypermedia"

concepts:
  - id: hm:Block
    top_concept: true
    label:
      en: "Block"
      fr: "Bloc"
    definition:
      en: "Base concept for all content blocks"

  - id: hm:ParagraphBlock
    broader: [hm:Block]
    label:
      en: "Paragraph Block"
      fr: "Bloc Paragraphe"
    alt_label:
      en: ["Text Block"]
    definition:
      en: "Block containing plain or rich text content"
    example:
      en: "A paragraph of text with optional formatting"

  - id: hm:ImageBlock
    broader: [hm:Block]
    label:
      en: "Image Block"
      fr: "Bloc Image"
    definition:
      en: "Block containing an image (IPFS CID or HTTP URL)"
    scope_note:
      en: "Supports PNG, JPEG, GIF, WebP, SVG formats"

  - id: hm:MediaBlock
    broader: [hm:Block]
    label:
      en: "Media Block"
      fr: "Bloc Média"
    definition:
      en: "Block containing media content"

  - id: hm:VideoBlock
    broader: [hm:MediaBlock]
    label:
      en: "Video Block"
      fr: "Bloc Vidéo"

  - id: hm:AudioBlock
    broader: [hm:MediaBlock]
    label:
      en: "Audio Block"
      fr: "Bloc Audio"

  - id: hm:HeadingBlock
    broader: [hm:Block]
    label:
      en: "Heading Block"
      fr: "Bloc Titre"

  - id: hm:CodeBlock
    broader: [hm:Block]
    label:
      en: "Code Block"
      fr: "Bloc Code"

2.2.2 Generation of core-skos.ttl from YAML

A build script reads the YAML and produces valid SKOS Turtle. The generated output for the above YAML:

@prefix hm: <https://hypermedia.foundation/ns/core#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Concept Scheme (the vocabulary itself)
hm:BlockTypes a skos:ConceptScheme ;
    skos:prefLabel "Hypermedia Block Types"@en ;
    skos:prefLabel "Types de blocs Hypermedia"@fr ;
    dcterms:description "Taxonomy of content block types in Seed Hypermedia"@en .

# Top concepts
hm:Block a skos:Concept ;
    skos:inScheme hm:BlockTypes ;
    skos:topConceptOf hm:BlockTypes ;
    skos:prefLabel "Block"@en ;
    skos:prefLabel "Bloc"@fr ;
    skos:definition "Base concept for all content blocks"@en .

# Specific concepts
hm:ParagraphBlock a skos:Concept ;
    skos:inScheme hm:BlockTypes ;
    skos:broader hm:Block ;
    skos:prefLabel "Paragraph Block"@en ;
    skos:prefLabel "Bloc Paragraphe"@fr ;
    skos:altLabel "Text Block"@en ;
    skos:definition "Block containing plain or rich text content"@en ;
    skos:example "A paragraph of text with optional formatting"@en .

hm:ImageBlock a skos:Concept ;
    skos:inScheme hm:BlockTypes ;
    skos:broader hm:Block ;
    skos:prefLabel "Image Block"@en ;
    skos:prefLabel "Bloc Image"@fr ;
    skos:definition "Block containing an image (IPFS CID or HTTP URL)"@en ;
    skos:scopeNote "Supports PNG, JPEG, GIF, WebP, SVG formats"@en .

hm:MediaBlock a skos:Concept ;
    skos:inScheme hm:BlockTypes ;
    skos:broader hm:Block ;
    skos:prefLabel "Media Block"@en ;
    skos:prefLabel "Bloc Média"@fr ;
    skos:definition "Block containing media content"@en .

hm:VideoBlock a skos:Concept ;
    skos:inScheme hm:BlockTypes ;
    skos:broader hm:MediaBlock ;
    skos:prefLabel "Video Block"@en .

hm:AudioBlock a skos:Concept ;
    skos:inScheme hm:BlockTypes ;
    skos:broader hm:MediaBlock ;
    skos:prefLabel "Audio Block"@en .

2.2.3 Why SKOS — The 80/20 Solution for RDF

SKOS (Simple Knowledge Organization System) is the simplest W3C semantic standard. It provides 80% of RDF benefits with 20% of the complexity.

Weak semantics, strong flexibility. SKOS properties like skos:broader (more general) and skos:narrower (more specific) create hierarchies without rigid formal constraints. The semantics are intentionally weak — they express that one concept is "somehow more general" than another, without being overly strict about the exact nature of the hierarchical relationship. A concept can have any number of broader concepts — poly-hierarchies are explicitly allowed. This matches how real-world terminology and block type taxonomies evolve organically in decentralized communities.

Self-describing categories for decentralized cooperation. Each concept carries its own labels, definitions, and relationships. No central terminology registry required. Community can publish SKOS concept schemes independently. Relationships discoverable through graph traversal. Multiple languages supported natively (skos:prefLabel@en, skos:prefLabel@fr).

Proven at massive scale. SKOS is battle-tested in production systems managing billions of concepts:

Library of Congress Subject Headings (~400K concepts)

Getty Art & Architecture Thesaurus (~370K concepts)

EU Vocabularies (EUROVOC: ~7K concepts, 24 languages)

AGROVOC (FAO): ~40K agricultural concepts, 40 languages

Wikidata uses SKOS for category hierarchies

If it works for the Library of Congress, it will work for Hypermedia block types.

Natural fit for block type taxonomy. The block type hierarchy maps perfectly to SKOS:

Block (skos:ConceptScheme)
├── TextBlock (skos:Concept)
│   ├── ParagraphBlock (skos:narrower)
│   │   └── RichParagraphBlock (skos:narrower)
│   └── HeadingBlock (skos:narrower)
├── MediaBlock (skos:Concept)
│   ├── ImageBlock (skos:narrower)
│   ├── VideoBlock (skos:narrower)
│   └── AudioBlock (skos:narrower)
└── InteractiveBlock (skos:Concept)
    ├── CommentBlock (skos:narrower)
    └── PollBlock (skos:narrower)

Community extensions just add new skos:Concept instances with skos:broader relationships. No permission needed, no schema updates, no breaking changes.

Poly-hierarchy example for community extensions:

# Base concept (core team defines)
hm:MediaBlock a skos:Concept ;
    skos:prefLabel "Media Block"@en ;
    skos:definition "Block containing media content"@en .

# Community extension (poly-hierarchy)
community:InteractiveVideo a skos:Concept ;
    skos:broader hm:MediaBlock ;           # It's a media block
    skos:broader community:InteractiveContent ;  # AND interactive content
    skos:prefLabel "Interactive Video"@en ;
    skos:altLabel "Clickable Video"@en ;
    skos:related community:HotspotTechnology .

No single inheritance restriction. No need for complex class modeling. Just navigable relationships.

2.2.4 SKOS-XL for Rich Label Management

When you need multilingual support, historical label variants, versioned terminology, or label provenance, SKOS-XL extends SKOS with reified labels (labels as first-class entities):

# Standard SKOS (labels as literals)
hm:ImageBlock skos:prefLabel "Image Block"@en .

# SKOS-XL (labels as resources)
hm:ImageBlock skosxl:prefLabel [
    a skosxl:Label ;
    skosxl:literalForm "Image Block"@en ;
    dcterms:created "2024-01-15"^^xsd:date ;
    dcterms:creator <https://seed.team/alice> ;
    skosxl:labelRelation [
        skosxl:literalForm "Picture Block"@en ;
        rdfs:comment "Deprecated term, use 'Image Block'"
    ]
] .

This enables sophisticated terminology management — tracking that "Paragraph Block" was renamed to "Text Block" and then to "RichText Block" — without complex ontology machinery. Deploy SKOS-XL only when the label management use case is concrete, not speculatively.

2.2.5 Future-Proof Evolution Path

SKOS → RDFS → OWL is a well-trodden upgrade path:

Stage 1 (Now): SKOS concepts with weak hierarchies. Simple, flexible, decentralized. Community can extend freely.

Stage 2 (If needed): Add RDFS properties. Define formal properties (e.g., hm:hasAuthor). Still compatible with SKOS.

Stage 3 (If needed): Upgrade to OWL. Add stricter constraints. Enable reasoning (if justified by use case).

You are not locked in. SKOS is the safe starting point that does not close doors. Each escalation is documented via an Architecture Decision Record.

CriterionSKOS AdvantageComplexitySimplest RDF standard, ~10 properties totalFlexibilityPoly-hierarchies, weak semanticsDecentralizationSelf-describing, no central registryScaleProven with billions of conceptsToolingExcellent support (rdflib, SPARQL Anything)Future-proofEasy upgrade path to RDFS/OWL

2.2.6 Publication as a Standalone Open Artifact

The vocabulary is published as its own IPFS CID, independent of any document's access level. This is a non-negotiable architectural constraint. The vocabulary is the shared language — block types, property definitions, concept hierarchies. If the vocabulary is gated, no third-party developer can build compatible extensions without prior access. Open vocabulary, controlled content.

2.3 Embedded JSON-LD Context

2.3.1 Dual Context: JSON-LD for Web, CBOR for Backend

Two representations of the same context, generated from the same YAML source:

JSON-LD context for web consumers, TypeScript, documentation, and external interoperability:

{
  "@context": {
    "hm": "https://hypermedia.foundation/ns/core#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "@type": "hm:blockType",
    "text": "hm:text",
    "imageUrl": "hm:imageUrl",
    "altText": "hm:altText",
    "blocks": {
      "@id": "hm:hasBlock",
      "@container": "@list"
    }
  }
}

CBOR context for the Go backend and IPFS storage, using integer vocabulary codes for dramatic size reduction:

CBOR Prefix Table:
  1 → "https://hypermedia.foundation/ns/core#"
  2 → "http://www.w3.org/2004/02/skos/core#"
  3 → "http://purl.org/dc/terms/"
  4 → "http://www.w3.org/ns/dcat#"

Field codes:
  10 → blockType
  11 → text
  12 → imageUrl
  13 → altText
  14 → hasBlock

Same semantics, different wire formats. The JSON-LD context and the CBOR prefix table are both generated from the canonical YAML.

2.3.2 Mapping Existing Short Keys to Qualified URIs

The critical design constraint: existing CBOR data uses short keys like "type", "text", "url". These must map cleanly to qualified URIs without breaking any existing consumer. The JSON-LD @context is precisely this mapping:

{
  "@context": {
    "type": "@type",
    "text": "hm:text",
    "url": "hm:imageUrl",
    "alt": "hm:altText"
  }
}

Existing JSON remains valid JSON-LD. A tool that understands JSON-LD can expand the short keys to full URIs. A tool that does not understand JSON-LD still reads the same JSON it always did. Zero breakage.

2.3.3 Context Embedded Statically — Never Resolved Over HTTP at Runtime

This is a critical performance and reliability constraint. JSON-LD libraries, by default, fetch @context URLs over HTTP at runtime. This is slow, unreliable, and unnecessary for our use case. The context is a static artifact. It ships with the document, embedded in the IPFS payload. It ships with the npm package, imported as a local JSON file. It is never fetched over the network.

IPFS content-addressing makes this even more natural: the context has its own CID. It is immutable. It is the perfect cache candidate — CID = cache key, cache never invalidated.

2.4 Auto-Generated Documentation

2.4.1 pyLODE Pipeline

pyLODE is the recommended tool for Phase 1 documentation. It generates clean HTML documentation from RDF ontologies with zero configuration.

GitHub: https://github.com/RDFLib/pyLODE

Installation: pip install pylode

Usage: pylode core-skos.ttl -o documentation.html

What it generates:

Class hierarchy with visual diagrams

Property tables with domain/range

Examples embedded inline

Cross-references between concepts

Multi-language support (labels in all languages)

Output structure:

documentation.html
├── Metadata (title, version, authors)
├── Table of Contents
├── Classes
│   ├── Block (with properties)
│   ├── ImageBlock (with inheritance)
│   └── ParagraphBlock
├── Properties
│   ├── creator (with domain/range)
│   └── text
└── Examples

Characteristics: Zero configuration, works out-of-box. Pure Python, no Java dependencies. Fast generation (<1 second for typical ontology). Works with SKOS, RDFS, OWL. Used by Australian Government, CSIRO, EU projects.

Limitation: Single-page output (can be large for huge ontologies). Limited customization of styling. Best for quick documentation of small-to-medium vocabularies (5–100 concepts).

2.4.2 Upgrade Path: Widoco

When the vocabulary grows beyond ~50 concepts or professional multi-page documentation is needed:

GitHub: https://github.com/dgarijo/Widoco

Installation: Docker: docker pull dgarijo/widoco

Usage: java -jar widoco.jar -ontFile ontology.ttl -outFolder docs/

What it generates: Multi-page website (index, classes, properties, examples). Visual diagrams (WebVOWL integration). Metadata sections (authors, license, versioning). Provenance tracking (who created what, when). Evaluation reports (ontology quality metrics). Used by W3C Community Groups, EU Horizon projects.

docker run -v $(pwd):/data dgarijo/widoco \
  -ontFile /data/core-skos.ttl \
  -outFolder /data/docs \
  -includeAnnotationProperties \
  -webVowl

2.4.3 Exploration Tool: Ontospy

For internal development and ontology exploration:

GitHub: https://github.com/lambdamusic/Ontospy

Installation: pip install ontospy

Usage: ontospy scan ontology.ttl --viz

Features: Searchable class/property listings. Graph visualizations (D3.js interactive). Statistics dashboard (class count, property usage). Export to various formats (CSV, JSON). Great for exploring third-party ontologies during alignment work.

from ontospy import Ontospy

model = Ontospy("core-skos.ttl")
for cls in model.all_classes:
    print(f"{cls.label}: {len(cls.descendants())} subclasses")
model.export_visualization("docs/")

2.4.4 CI Integration: Auto-Generation on Every Commit

Documentation always in sync with vocabulary. Zero manual maintenance.

# .github/workflows/docs.yml
name: Generate Documentation

on:
  push:
    paths:
      - 'vocabulary/*.ttl'
      - 'src/block-types.yaml'

jobs:
  generate-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install pyLODE
        run: pip install pylode

      - name: Generate Documentation
        run: pylode vocabulary/core-skos.ttl -o docs/index.html

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs

2.4.5 Collaborative Editing (Future): VoCol

If the vocabulary becomes community-driven with 10+ contributors:

GitHub: https://github.com/vocol/vocol

Features: Web-based editor for non-technical contributors. Git version control integrated. Documentation generation built-in. Change tracking and approval workflows.

Trade-off: Complex setup (full web application). Overkill for small teams. Deploy only when the contributor base justifies it.

ToolComplexityBest Use CaseOutputpyLODELowQuick docs for small vocabulariesSingle HTMLWidocoMediumProfessional multi-page siteWebsiteOntospyMediumInteractive explorationHTML + JSONVoColHighCommunity collaborationWeb app

2.5 Week 2 Deliverables

Target file structure:

hypermedia-vocab/
├── src/
│   └── block-types.yaml          # Canonical source
├── generated/
│   ├── core-skos.ttl             # SKOS Turtle
│   ├── context.jsonld            # JSON-LD context
│   ├── context.cbor              # CBOR context (integer prefixes)
│   └── types.ts                  # Generated TypeScript types
├── docs/
│   └── index.html                # pyLODE documentation
├── examples/
│   ├── basic-document.jsonld
│   └── basic-document.cbor
└── scripts/
    └── generate.py               # YAML → all artifacts

3. Developer Experience Transformation (Weeks 3–4)

3.1 @seed/vocabulary npm Package

3.1.1 TypeScript Types Generated from the SKOS Vocabulary

The build pipeline reads the canonical YAML and generates TypeScript interfaces. These are not hand-maintained — they are derived artifacts, always in sync with the vocabulary.

// types.ts (auto-generated from block-types.yaml)
export interface Block {
  '@context': 'https://hypermedia.foundation/ns/core';
  '@type': string;
}

export interface ParagraphBlock extends Block {
  '@type': 'ParagraphBlock';
  text: string;
}

export interface ImageBlock extends Block {
  '@type': 'ImageBlock';
  imageUrl: string;
  altText?: string;
}

export interface VideoBlock extends Block {
  '@type': 'VideoBlock';
  videoUrl: string;
  duration?: number;
}

export type AnyBlock = ParagraphBlock | ImageBlock | VideoBlock;

3.1.2 JSON-LD Context Embedded as Static Import

The context ships as a JSON file inside the npm package. Never fetched over HTTP.

// Usage in application code
import HM_CONTEXT from '@seed/vocabulary/context.json';
import type { ImageBlock } from '@seed/vocabulary';

// The context is a local file, bundled at build time
// Zero network calls, zero runtime dependencies

3.1.3 Lightweight Builders (~2KB)

Instead of importing a full JSON-LD library (~100KB+ minified), the package ships tiny, type-safe builder functions:

// jsonld-builders.ts (~2KB total)

export function buildImageBlock(data: {
  imageUrl: string;
  altText?: string;
}): string {
  return JSON.stringify({
    "@context": "https://hypermedia.foundation/ns/core",
    "@type": "ImageBlock",
    "imageUrl": data.imageUrl,
    "altText": data.altText
  });
}

export function buildParagraphBlock(data: {
  text: string;
}): string {
  return JSON.stringify({
    "@context": "https://hypermedia.foundation/ns/core",
    "@type": "ParagraphBlock",
    "text": data.text
  });
}

// Fluent alternative
export class JsonLdBuilder {
  private data: Record<string, any> = {};

  context(url: string): this {
    this.data['@context'] = url;
    return this;
  }

  type(type: string): this {
    this.data['@type'] = type;
    return this;
  }

  prop(key: string, value: any): this {
    this.data[key] = value;
    return this;
  }

  build(): string {
    return JSON.stringify(this.data);
  }
}

Contrast with the heavy approach (do not do this):

// BAD: 150KB+ for basic structured data
import jsonld from 'jsonld';         // ~100KB minified
import { schema } from 'jsonld-schema';  // More KB

3.1.4 Type-Safe Schema.org Integration

For block types that map to Schema.org (Article, ImageObject, Person), use the schema-dts package — zero runtime cost, just TypeScript type definitions:

import { Article, ImageObject } from 'schema-dts';

interface HypermediaArticle extends Article {
  '@context': 'https://schema.org';
  '@type': 'Article';
  headline: string;
  author: {
    '@type': 'Person';
    name: string;
  };
  image: ImageObject;
}

function buildArticleJsonLd(meta: ArticleMeta): HypermediaArticle {
  return {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: meta.title,
    author: {
      '@type': 'Person',
      name: meta.authorName
    },
    image: {
      '@type': 'ImageObject',
      url: meta.imageUrl
    }
  };
}

TypeScript compiler catches mistakes. IDE autocomplete for JSON-LD. Predictable structure equals easier optimization.

Libraries specifically for TypeScript + JSON-LD + SEO: schema-dts (type definitions for Schema.org, zero runtime cost), next-seo (Next.js SEO with JSON-LD support), @artsy/fresnel (responsive JSON-LD for metadata).

3.1.5 Zero Runtime Dependency on Any JSON-LD Library

The entire npm package contains: generated TypeScript types (zero runtime cost), a JSON file (the context), and ~2KB of builder functions. No jsonld.js, no rdf-ext, no n3. The browser ships static strings.

3.2 Go Backend Integration — "JSON-LD as Native JSON" Option

3.2.1 Option 1 (Recommended): Go Consumes JSON-LD as Native JSON

JSON-LD is valid JSON. The Go backend treats it as standard JSON. No RDF-specific libraries needed in production:

// types.go (auto-generated from block-types.yaml, no RDF imports)
package hypermedia

type ImageBlock struct {
    Context  string `json:"@context"`
    Type     string `json:"@type"`
    ImageURL string `json:"imageUrl"`
    AltText  string `json:"altText,omitempty"`
}

func ParseBlock(data []byte) (*ImageBlock, error) {
    var block ImageBlock
    err := json.Unmarshal(data, &block) // stdlib only
    return &block, err
}

Zero RDF-specific dependencies. Idiomatic Go code. Full performance (no parsing overhead). Interoperability (JSON-LD is valid RDF for other tools).

3.2.2 Direct CBOR Parsing in Go Without RDF Layer

The Go backend reads DAG-CBOR directly. The CBOR prefix table (integer codes mapping to URI prefixes) is a lookup table in Go, not an RDF processing step. CBOR field code 10 maps to hm:blockType. This is a map lookup, not graph processing.

3.2.3 Option 2 (If RDF Processing Needed in Go): Lightweight Libraries

If actual RDF graph operations are needed in Go at some future point:

Available libraries:

piprate/json-gold: Pure Go JSON-LD processor. Actively maintained (2023+ commits). Implements W3C JSON-LD spec. Good for expansion/compaction.

knakk/rdf: Basic RDF support. Simple triple manipulation. Turtle/N-Triples parsing. Used in production systems.

underlay/go-*: Emerging ecosystem. Modern approach to knowledge graphs. Growing community.

// Example using piprate/json-gold (only if needed)
import "github.com/piprate/json-gold/ld"

proc := ld.NewJsonLdProcessor()
expanded, err := proc.Expand(jsonLdDoc, nil)

3.2.4 Option 3 (Hybrid): Python Tooling + Go Runtime

Development time: Python tools (rdflib, pyLODE) generate artifacts. Runtime: Go consumes generated JSON-LD contexts and types. Build step: Python generates Go code from RDF vocabulary.

# Build time (Python)
python generate_go_types.py core-skos.ttl > types.go

# Runtime (Pure Go)
# Uses generated types, no RDF library needed

3.2.5 Option 4 (Advanced): CGo Bindings to Rust

If high-performance RDF processing becomes necessary: Oxigraph (Rust triple store) has excellent performance. CGo integration possible. Only if requirements justify the complexity. This is a last resort, not a starting point.

Recommendation: Start with Option 1. Evaluate based on actual needs. The JSON-LD design allows starting simple (pure JSON) and adding RDF processing later if justified.

3.2.6 Serialization Flexibility

Same semantics, different syntax. Choose based on context:

Development: Turtle (human-readable, for editing vocabularies)

Runtime/API: JSON-LD (Go-native JSON parsing)

Storage: CBOR (compact, IPFS-friendly, compatible with existing DAG-CBOR)

Content negotiation is a native RDF capability. A document published on IPFS can be served in any serialization from an edge gateway.

3.3 Build-Time Generation Pipeline

3.3.1 Single Script: YAML → Everything

One script, one source of truth, all artifacts:

block-types.yaml
    ├── → core-skos.ttl        (SKOS Turtle)
    ├── → context.jsonld        (JSON-LD context)
    ├── → context.cbor          (CBOR prefix table)
    ├── → types.ts              (TypeScript types)
    ├── → types.go              (Go struct tags)
    └── → docs/index.html       (HTML documentation via pyLODE)

# generate.py — the single build script
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import SKOS, RDF, DCTERMS
import yaml, json

# Load canonical source
with open("src/block-types.yaml") as f:
    vocab = yaml.safe_load(f)

# Build RDF graph
g = Graph()
HM = Namespace("https://hypermedia.foundation/ns/core#")
g.bind("hm", HM)
g.bind("skos", SKOS)

# Generate concept scheme
scheme_uri = URIRef(str(HM) + vocab["concept_scheme"]["id"].split(":")[1])
g.add((scheme_uri, RDF.type, SKOS.ConceptScheme))
for lang, label in vocab["concept_scheme"]["label"].items():
    g.add((scheme_uri, SKOS.prefLabel, Literal(label, lang=lang)))

# Generate concepts
for concept in vocab["concepts"]:
    concept_uri = URIRef(str(HM) + concept["id"].split(":")[1])
    g.add((concept_uri, RDF.type, SKOS.Concept))
    g.add((concept_uri, SKOS.inScheme, scheme_uri))
    for lang, label in concept["label"].items():
        g.add((concept_uri, SKOS.prefLabel, Literal(label, lang=lang)))
    # ... broader, altLabel, definition, example, scopeNote

# Write Turtle
g.serialize("generated/core-skos.ttl", format="turtle")

# Generate JSON-LD context
context = {"@context": {"hm": str(HM)}}
# ... build context from concepts
with open("generated/context.jsonld", "w") as f:
    json.dump(context, f, indent=2)

# Generate TypeScript types
with open("generated/types.ts", "w") as f:
    for concept in vocab["concepts"]:
        name = concept["id"].split(":")[1]
        f.write(f"export interface {name} extends Block {{\n")
        f.write(f"  '@type': '{name}';\n")
        # ... fields from properties
        f.write("}\n\n")

# Generate Go struct tags
with open("generated/types.go", "w") as f:
    f.write("package hypermedia\n\n")
    for concept in vocab["concepts"]:
        name = concept["id"].split(":")[1]
        f.write(f"type {name} struct {{\n")
        f.write(f'\tType string `json:"@type"`\n')
        # ... fields from properties
        f.write("}\n\n")

3.3.2 Integration into Existing Build System

# Makefile addition
.PHONY: vocab

vocab: generated/core-skos.ttl generated/context.jsonld generated/types.ts generated/types.go docs/index.html

generated/core-skos.ttl generated/context.jsonld generated/types.ts generated/types.go: src/block-types.yaml
    python scripts/generate.py

docs/index.html: generated/core-skos.ttl
    pylode generated/core-skos.ttl -o docs/index.html

3.3.3 Non-Regression Tests

# The vocabulary compiles (valid Turtle)
rapper -c generated/core-skos.ttl

# The TypeScript types are valid
npx tsc --noEmit generated/types.ts

# The Go types compile
go build ./generated/...

# The examples parse with the generated context
node -e "JSON.parse(require('fs').readFileSync('examples/basic-document.jsonld'))"

# The JSON-LD context is valid
python -c "from rdflib import Graph; g = Graph(); g.parse('examples/basic-document.jsonld', format='json-ld')"

3.4 Third-Party Developer Validation (Proof of Concept)

3.4.1 Scenario: External Developer Creates a Custom Block Type

A developer who has never read the Seed source code wants to create an "AI-Generated Image" block type:

@prefix community: <https://example.org/hypermedia/concepts#> .
@prefix hm: <https://hypermedia.foundation/ns/core#> .

community:AIGeneratedImage a skos:Concept ;
    skos:broader hm:ImageBlock ;  # Links to core
    skos:prefLabel "AI-Generated Image"@en ;
    skos:definition "Image created by AI model"@en ;
    skos:related community:PromptEngineering .

No coordination needed. Just IPFS publication + discovery.

3.4.2 The Developer Journey

# 1. Install the package
npm install @seed/vocabulary

# 2. Import types and context
import type { ImageBlock } from '@seed/vocabulary';
import { buildImageBlock } from '@seed/vocabulary/builders';
import HM_CONTEXT from '@seed/vocabulary/context.json';

# 3. Build a document with a custom block type
const block = {
  ...buildImageBlock({ imageUrl: 'ipfs://Qm...', altText: 'AI sunset' }),
  '@type': 'AIGeneratedImage',  // Custom type
  'community:prompt': 'A sunset over mountains'
};

# 4. Publish to IPFS
# The document is self-describing, carrying its own context

3.4.3 Success Criteria

The developer can build and publish a working block type using only the npm package, the published documentation, and the SKOS vocabulary. They never read the Seed source code. They never contact the Seed team. Their custom block type is discoverable by any tool that understands SKOS.

4. User Experience Transformation (Weeks 5–6)

4.1 Capability Discoverability in the Interface

4.1.1 SKOS Vocabulary as Block Type Registry

The SKOS vocabulary feeds a navigable block type registry in the editor. When a user opens the block selector, the available types come from the vocabulary, not from a hardcoded list. Labels and definitions in the user's language come from skos:prefLabel and skos:definition. Hierarchies from skos:broader/narrower organize the selector into logical groups.

4.1.2 Community Extensions Visible Alongside Native Types

Community-published block types appear in the same registry as core types. The mechanism is identical: a skos:Concept with a skos:broader relationship to a core type. The editor does not distinguish between "official" and "community" — it reads the vocabulary graph.

4.1.3 Analogous Role to Schema.org

On the traditional web, Schema.org lets search engines understand page content. In the Seed ecosystem, the semantic layer plays an analogous but more ambitious role: it allows any piece of software to discover what a document contains, what it can do with it, and how to interact with it.

A document published on IPFS carries with it the description of its own structure. A third-party tool can traverse relationships between documents without knowing the internal implementation. Community extensions are discoverable through the same mechanism as native types.

4.2 Self-Describing Documents

4.2.1 Every Published Document Carries Its Own JSON-LD Context

No external resolution needed. The context is embedded in the IPFS payload. A third-party tool can inspect any Seed document and understand its structure without documentation, without network access, without the Seed codebase.

4.2.2 Minimal DCAT Metadata Embedded

Every document carries basic discoverability metadata alongside its block content:

{
  "@context": {
    "hm": "https://hypermedia.foundation/ns/core#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@type": "hm:Document",
  "dct:title": "Document title",
  "dct:description": "Document summary",
  "dct:publisher": {
    "@type": "foaf:Agent",
    "foaf:name": "Author or organization"
  },
  "dct:issued": "2026-02-09",
  "dct:license": "https://creativecommons.org/licenses/by/4.0/",
  "dcat:distribution": {
    "@type": "dcat:Distribution",
    "dcat:mediaType": "application/ld+json",
    "dcat:accessURL": "ipfs://bafyrei..."
  },
  "hm:blocks": {
    "@list": [
      {"@type": "hm:ParagraphBlock", "hm:text": "Hello world"}
    ]
  }
}

This is additive — it goes alongside the existing block type vocabulary in the same context, not in a separate system. Adding DCAT metadata properties to the JSON-LD context is roughly 15–20 lines of JSON-LD. No new dependencies, no new tooling, no architectural changes.

4.3 Access Fluidity: Private → Shared → Open

4.3.1 Architectural Principle

The transition between visibility levels is a metadata operation on catalog entries, not a document reformatting. The JSON-LD context, the SKOS types, the DCAT metadata are identical at all three levels. Moving from private to shared to open is toggling a publication flag, not reformatting data.

FAIR does not mean open. The "A" in FAIR is Accessible, not Available: metadata must be retrievable even when the data itself is not. This distinction is critical for Seed, where users need fluid control over visibility.

4.3.2 Three Levels, One Architecture

Private — the document exists on IPFS, self-described with full JSON-LD context and DCAT metadata, but the CID is not published to any catalog. FAIR-compliant in structure, invisible in practice. The owner has a fully interoperable document that nobody else can discover.

Selectively shared — the DCAT catalog lists the document's metadata (title, description, topic, SKOS types) but the distribution points to a gated endpoint. A harvester or collaborator sees that the document exists and what it contains, but cannot retrieve it without authorization. This is exactly how EU member states handle sensitive datasets — the catalog entry is public, the data access is controlled.

Open — full DCAT distribution with an IPFS gateway URL. Anyone can discover, retrieve, and reuse. The document participates fully in the open data ecosystem.

4.3.3 Transition Table

TransitionWhat ChangesWhat Does Not ChangePrivate → SharedMetadata appears in catalog, gated distribution URL addedDocument content, CID, JSON-LD context, SKOS typesShared → OpenDistribution URL switches to public IPFS gatewayDocument content, CID, JSON-LD context, SKOS types, catalog entryOpen → SharedDistribution URL switches to gated endpointDocument content, CID, JSON-LD context, SKOS types, catalog entryShared → PrivateCatalog entry removedDocument content, CID, JSON-LD context, SKOS types

Every transition is a metadata operation. The document is immutable across all states.

4.3.4 Open Vocabulary, Controlled Content

The separation that makes selective sharing work without coordination:

The vocabulary tells you what kinds of things exist and how to describe them — always public

The catalog tells you what specific documents exist and what they contain — selectively visible

The documents themselves contain the actual content — access-controlled

A developer can build a Seed-compatible application using only the public vocabulary, without ever accessing a single private document. A data portal can index catalog metadata without retrieving gated content. A user can share a document with a collaborator by granting access to the distribution endpoint, without publishing anything to a public catalog.

4.4 Native Multilingual Labels

4.4.1 SKOS Native Multilingual Support

hm:ImageBlock
    skos:prefLabel "Image Block"@en ;
    skos:prefLabel "Bloc Image"@fr ;
    skos:prefLabel "Bildblock"@de ;
    skos:prefLabel "Bloque de Imagen"@es .

4.4.2 Interface Adapts to User Language

The block selector, tooltips, and documentation render in the user's language without ad hoc translation work. The vocabulary carries the translations natively.

4.4.3 Community-Extensible Languages

A community member can add Japanese labels to the core vocabulary by publishing a SKOS extension:

hm:ImageBlock skos:prefLabel "画像ブロック"@ja .

No coordination with the core team. No pull request. Just IPFS publication.

5. Data Transformation — Interoperability & Compliance (Weeks 7–10)

5.1 FAIR Compliance by Construction

5.1.1 FAIR Principles Already Satisfied by Existing Architecture

A Seed document on IPFS that carries its JSON-LD context with DCAT metadata satisfies:

F1 (globally unique identifier) — IPFS CID

F2 (rich metadata) — JSON-LD context with DCAT properties

F3 (metadata includes identifier) — CID in the distribution URL

F4 (registered in searchable resource) — any DCAT harvester can index it

A1 (retrievable by identifier) — IPFS gateway

I1 (formal knowledge representation) — JSON-LD / RDF

I2 (FAIR vocabularies) — SKOS + DCAT

I3 (qualified references) — URI-based linking

R1 (rich metadata with attributes) — DCAT + SKOS descriptions

That is 9 of the 15 FAIR principles met by the existing architecture plus a small context extension.

5.1.2 EU Standards Mapping

The EU interoperability stack (EIF → DCAT → CPSV → SKOS) maps directly to Seed's architecture:

EIF LayerWhat It MeansSeed EquivalentLegalLicensing, data rightsCreative Commons on documentsOrganizationalBusiness processes, governanceCommunity extensions modelSemanticShared vocabularies, controlled termsSKOS vocabulary for block typesTechnicalProtocols, formats, APIsIPFS + DAG-CBOR + JSON-LD

We already cover three of the four layers. The semantic layer — the one the EU cares about most — is exactly what we are building.

Norway's data.norge.no is the reference implementation: FAIR principles enforced through DCAT catalogs, SKOS concept schemes, and machine-readable metadata. Every EU member state data portal follows the same pattern via data.europa.eu.

5.1.3 FAIR Compliance Across Access Levels

FAIR PrinciplePrivateSharedOpenF1 Unique identifier (CID)YesYesYesF2 Rich metadataYes (local)Yes (catalog)Yes (catalog)F3 Metadata includes IDYesYesYesF4 Searchable registryNoYesYesA1 Retrievable by IDOwner onlyAuthorizedAnyoneA2 Metadata accessible even if data is notN/AYesYesI1 Formal representationYesYesYesI2 FAIR vocabulariesYesYesYesR1 Rich attributesYesYesYes

The key row is A2: "Metadata are accessible even when the data are no longer available." DCAT handles this natively — the catalog entry persists and describes the dataset even if the distribution endpoint requires authentication or the data has been retracted. This is how Seed can offer selective sharing without violating FAIR principles.

5.1.4 CPSV — Core Public Service Vocabulary

CPSV is more specialized (government services), but the pattern is relevant: it describes capabilities — what a service can do, what inputs it requires, what outputs it produces. This maps to Seed's block type model. A block type is a capability (what the platform can do). Its properties are inputs (what you need to provide). Its rendered output is the result. If Seed ever targets public sector users or EU-funded projects, CPSV compatibility comes nearly for free from the existing SKOS + DCAT foundation.

5.2 DCAT Catalog

5.2.1 Static Generation of a Catalog Endpoint

A DCAT catalog endpoint is a static JSON-LD file listing published documents — generated at build time, same as the HTML documentation. Any DCAT-compliant harvester (data.europa.eu, national portals) can index Seed documents if they expose this endpoint.

5.2.2 Catalog Lists Metadata, Not Content

The catalog entry describes a dataset (title, author, topics, license, distribution URL) without including the content itself. For selectively shared documents, the distribution URL points to a gated endpoint. For open documents, it points to an IPFS gateway.

5.2.3 Harvesting Testable with Local CKAN or EU Sandbox

Before registering with a production EU portal, test harvesting locally:

Set up a local CKAN instance

Point it at the generated DCAT catalog endpoint

Verify that catalog entries are indexed correctly

Verify that gated distribution URLs are handled correctly (metadata visible, content requires auth)

5.3 Exploring Existing Data with SPARQL Anything

5.3.1 Query Any Format as RDF Without Conversion

SPARQL Anything (https://github.com/SPARQL-Anything/sparql.anything) uses a facade architecture. It exposes any data source through a virtual RDF graph. No conversion step. Query on the fly.

PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX hm: <https://hypermedia.foundation/ns/core#>

-- Query document.cbor without converting it to RDF first
SELECT ?blockType ?text
WHERE {
  SERVICE <x-sparql-anything:location=document.cbor> {
    ?doc xyz:type "document" ;
         xyz:blocks ?block .

    ?block xyz:type ?blockType ;
           xyz:text ?text .
  }
}

5.3.2 Practical Use Cases

Find all image blocks in a document:

SELECT ?imageUrl
WHERE {
  SERVICE <x-sparql-anything:location=document.cbor> {
    ?block xyz:type "image" ;
           xyz:imageUrl ?imageUrl .
  }
}

Find blocks with missing required fields (accessibility audit):

SELECT ?block
WHERE {
  SERVICE <x-sparql-anything:location=document.cbor> {
    ?block xyz:type "image" .
    FILTER NOT EXISTS { ?block xyz:altText ?alt }
  }
}

Extract all text content for full-text indexing:

SELECT ?text
WHERE {
  SERVICE <x-sparql-anything:location=document.cbor> {
    ?block xyz:text ?text .
  }
}

5.3.3 Strictly Tooling and Prototyping

SPARQL Anything is for development-time exploration and prototyping. Performance is limited (virtual graph, not indexed). Not suitable for production validation. Use it to understand what is in your CBOR data and to prototype queries before building production tooling.

Recommended workflow:

Development/Exploration:
  SPARQL Anything → Query existing CBOR directly
  (Fast prototyping, zero conversion)

Production/Validation:
  FaCade-X mappings → Convert CBOR to RDF
  pyshacl → Validate against SHACL shapes
  (When you need formal validation)

5.4 Declarative Format Mappings: FaCade-X and RML

5.4.1 The Problem with Hardcoded Conversions

Hard-coding format conversions (CBOR→RDF, JSON→RDF) creates maintenance burden. Every schema change requires updating converter code. Multiple output formats multiply the work.

5.4.2 Declarative Mappings as Data

Define once how your format maps to RDF, apply everywhere:

# map-cbor-to-rdf.yaml
source:
  format: dag-cbor
  root: "$"

target:
  format: rdf-turtle
  base: "https://hypermedia.foundation/ns/core#"

mappings:
  - source_path: "$.type"
    target_predicate: "rdf:type"
    transform: 
      type: "prefix_namespace"
      namespace: "hm:"

  - source_path: "$.creator"
    target_predicate: "hm:creator"
    target_type: "uri"
    transform:
      type: "ensure_scheme"
      scheme: "did:key:"

  - source_path: "$.blocks[*]"
    target_predicate: "hm:hasBlock"
    iterate: true
    nested_mappings:
      - source_path: "type"
        target_predicate: "rdf:type"
        transform:
          type: "concat"
          parts: ["hm:", "$value", "Block"]

      - source_path: "text"
        target_predicate: "hm:text"
        target_type: "literal"
        language: "en"

Benefits: Mappings are versioned data. Multiple output formats from same source. Easier to maintain than code.

5.4.3 Implementation Options

RML (RDF Mapping Language) — W3C standard, most mature, best tooling (https://rml.io/)

YARRRML — RML in YAML (human-friendly RML authoring, https://rml.io/yarrrml/)

Custom — Define your own mapping DSL (more control, more work)

5.4.4 Recommended Progression

Week 1-2:  SPARQL Anything (ad-hoc exploration)
Week 3-4:  Simple Python converter (100-200 lines)
Month 2+:  RML mappings (when mappings get complex)

Do not over-engineer early. Start simple, formalize when patterns stabilize. Move to RML when you have 5+ different source formats, when mapping logic exceeds 500 lines, or when you need version control of mappings independently from application code.

5.5 Optional SHACL Validation

5.5.1 Shapes for 2–3 Critical Block Types

# Enforce that all concepts have English labels
hm:ConceptShape a sh:NodeShape ;
    sh:targetClass skos:Concept ;
    sh:property [
        sh:path skos:prefLabel ;
        sh:minCount 1 ;
        sh:languageIn ( "en" ) ;
    ] .

# Image blocks must have alt text for accessibility
hm:ImageBlockShape a sh:NodeShape ;
    sh:targetClass hm:ImageBlock ;
    sh:property [
        sh:path hm:altText ;
        sh:minCount 1 ;
        sh:message "Image blocks must have alt text for accessibility" ;
    ] .

5.5.2 Validation in CI

# validate.py
from rdflib import Graph
from pyshacl import validate

data_graph = Graph().parse("examples/basic-document.jsonld", format="json-ld")
shapes_graph = Graph().parse("shapes.ttl", format="turtle")

conforms, results, text = validate(data_graph, shacl_graph=shapes_graph)

if not conforms:
    print(text)  # User-friendly error messages
    exit(1)

5.5.3 Adoption Conditional on Demonstrated Value

SHACL validation is optional. SKOS works fine without it. Adopt it only when the validation use case is concrete — when extension developers are producing invalid block types and need clear error messages. Do not add SHACL shapes speculatively.

5.6 Handling Arrays and Ordering

5.6.1 The Problem

RDF triples have no intrinsic order. This is annoying for ordered lists of blocks in a document.

5.6.2 Solution: JSON-LD @list

JSON-LD extends RDF with ordered collections:

{
  "@context": "https://hypermedia.foundation/ns/core",
  "@type": "Document",
  "blocks": {
    "@list": [
      {"@type": "ParagraphBlock", "text": "First"},
      {"@type": "ImageBlock", "url": "ipfs://..."},
      {"@type": "ParagraphBlock", "text": "Second"}
    ]
  }
}

@list preserves order in RDF serialization.

5.6.3 Alternative: Split Responsibilities

Do not model ordered lists in RDF. Keep them in your CBOR layer, reference from RDF:

# RDF (unordered, for semantics)
:doc1 hm:hasBlock :block1, :block2, :block3 .

# CBOR (ordered, for rendering)
{
  "blocks": [
    {"id": "block1", ...},
    {"id": "block2", ...},
    {"id": "block3", ...}
  ]
}

RDF handles what blocks exist. CBOR handles how to render them. Choose based on whether ordering is semantically meaningful or purely presentational.

6. TypeScript Frontend Optimization — Zero-Overhead JSON-LD (Parallel Track)

This section runs in parallel with sections 3–5. It addresses the "verbose" and "friction" concerns about JSON-LD while keeping all RDF interoperability benefits.

6.1 Build-Time Generation (Primary Strategy)

Do not process JSON-LD in the browser. Generate it at build time.

Traditional approach (heavy — do not do this):

// Runtime - SLOW
import jsonld from 'jsonld';

async function renderBlock(block: Block) {
  const expanded = await jsonld.expand(block);  // HTTP fetch @context
  const framed = await jsonld.frame(expanded, frame);  // Processing
  return render(framed);  // Finally render
}

Optimized approach (fast — do this):

// Build time - FAST
import type { Block } from '@seed/vocabulary';

function buildBlockJsonLd(block: Block): string {
  return JSON.stringify({
    "@context": "https://hypermedia.foundation/ns/core",
    "@type": block.type,
    "text": block.text,
  });
}

// At build time (Next.js SSG/ISR, Vite, etc.)
export async function getStaticProps() {
  const blocks = await fetchBlocks();
  const jsonld = blocks.map(buildBlockJsonLd);

  return {
    props: { jsonld }  // Pre-generated, static
  };
}

Result: Zero runtime JSON-LD processing. Just static <script type="application/ld+json"> in HTML. Browser does not parse anything RDF-related. SEO benefits without performance cost.

For Hypermedia Protocol: Build JSON-LD once when document published. Store in IPFS alongside DAG-CBOR. Frontend just embeds pre-built string. Zero overhead.

6.2 Never Resolve @context in the Browser

JSON-LD libraries fetch @context URLs at runtime by default. This is slow and unreliable.

Bad:

const doc = {
  "@context": "https://hypermedia.foundation/ns/core",
  "@type": "ImageBlock",
};
await jsonld.expand(doc);  // Fetches @context over network!

Good (if you must process client-side):

import HM_CONTEXT from './contexts/hypermedia-core.json';

const CONTEXTS = new Map([
  ['https://hypermedia.foundation/ns/core', HM_CONTEXT],
  ['https://schema.org', SCHEMA_ORG_CONTEXT]
]);

const documentLoader = (url: string) => {
  const context = CONTEXTS.get(url);
  if (!context) throw new Error(`Unknown context: ${url}`);
  return { contextUrl: null, document: context, documentUrl: url };
};

await jsonld.expand(doc, { documentLoader });  // Local, no HTTP

Best (do not process client-side at all):

const jsonLdString = buildBlockJsonLd(block);  // Pre-built string

For Next.js specifically:

import { NextSeo } from 'next-seo';

export function BlockPage({ block }) {
  const jsonLd = buildBlockJsonLd(block);

  return (
    <>
      <NextSeo
        additionalMetaTags={[
          {
            tagName: 'script',
            innerHTML: jsonLd,
            type: 'application/ld+json'
          }
        ]}
      />
      {/* Rest of page */}
    </>
  );
}

For Hypermedia: Package core.json context in npm package. Import as static JSON. Zero network calls.

6.3 In-Memory Caching (If Processing Needed)

If client-side JSON-LD processing is unavoidable (advanced data apps):

class JsonLdCache {
  private expandedCache = new Map<string, any>();
  private contextCache = new Map<string, any>();

  async expand(doc: any): Promise<any> {
    const key = JSON.stringify(doc);
    if (this.expandedCache.has(key)) {
      return this.expandedCache.get(key);
    }
    const expanded = await jsonld.expand(doc, {
      documentLoader: this.cachedDocumentLoader.bind(this)
    });
    this.expandedCache.set(key, expanded);
    return expanded;
  }
}

For Hypermedia: Content-addressed data means perfect caching. IPFS CID = cache key. Cache never invalidates (immutable data). Serve from edge (Cloudflare Workers, Vercel Edge).

6.4 Centralized Helpers — No Duplication

Bad (duplicated across pages):

const organizationJsonLd = {
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Seed Hypermedia",
  // ... 50 lines repeated everywhere
};

Good (centralized):

// jsonld/organization.ts
export const ORGANIZATION_JSONLD = {
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Seed Hypermedia",
  "url": "https://seed.hyper.media",
} as const;

// layouts/_document.tsx — defined once, all pages inherit
export default function Document() {
  return (
    <Html>
      <Head>
        <script
          type="application/ld+json"
          dangerouslySetInnerHTML={{
            __html: JSON.stringify(ORGANIZATION_JSONLD)
          }}
        />
      </Head>
    </Html>
  );
}

6.5 CI/CD Validation of Generated JSON-LD

// scripts/validate-jsonld.ts
describe('JSON-LD Generation', () => {
  it('generates valid ImageBlock JSON-LD', () => {
    const block = { type: 'ImageBlock', imageUrl: 'ipfs://Qm...' };
    const jsonld = buildBlockJsonLd(block);

    expect(() => JSON.parse(jsonld)).not.toThrow();

    const parsed = JSON.parse(jsonld);
    expect(parsed['@context']).toBeDefined();
    expect(parsed['@type']).toBe('ImageBlock');
  });
});

# .github/workflows/validate-jsonld.yml
name: Validate JSON-LD
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
      - run: npm install
      - run: npm test -- jsonld

6.6 Complete Frontend Architecture

1. Author RDF Vocabulary (Python - one time)
   block-types.yaml → generate.py → core-skos.ttl → pyLODE → documentation

2. Generate TypeScript Types (Build step)
   block-types.yaml → generate.py → types.ts

3. Implement Builders (TypeScript)
   types.ts → jsonld-builders.ts (2KB helpers)

4. Build-time Generation (SSG/ISR)
   buildBlockJsonLd(block) → static HTML with <script type="application/ld+json">

5. Runtime (Browser)
   Zero JSON-LD processing
   Just static structured data for SEO/crawlers

6. Validation (CI/CD)
   Tests ensure generated JSON-LD is valid

ConcernMitigationVerboseBuild-time generation = zero runtime costTypeScript integrationschema-dts + custom types = full type safetyPerformanceStatic strings, no JSON-LD processingBundle size2KB helpers vs 100KB+ librariesComplexityCentralized builders, DRY principlesValidationAutomated tests + CI/CD

7. Consolidation & Ecosystem (Months 3–6)

7.1 Community Extensions

7.1.1 Concept Scheme Template for Third-Party Extensions

Provide a copy-paste-ready template:

@prefix myext: <https://example.org/hypermedia/concepts#> .
@prefix hm: <https://hypermedia.foundation/ns/core#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

myext:MyBlockTypes a skos:ConceptScheme ;
    skos:prefLabel "My Custom Block Types"@en .

myext:MyCustomBlock a skos:Concept ;
    skos:inScheme myext:MyBlockTypes ;
    skos:broader hm:Block ;  # Links to core vocabulary
    skos:prefLabel "My Custom Block"@en ;
    skos:definition "Description of what this block does"@en .

7.1.2 Guide: "Publish a Custom Block Type in 15 Minutes"

Step-by-step guide: define a SKOS concept, reference a core type via skos:broader, publish to IPFS, register in a community index (optional). No permission required, no schema updates, no breaking changes.

7.1.3 Discovery Mechanism: SKOS Resolution via IPFS

Third-party vocabularies are discoverable through IPFS CID resolution. A tool that encounters an unknown block type can resolve its SKOS concept and learn its label, definition, hierarchy, and relationships — all from the vocabulary, without contacting the extension author.

7.2 External Vocabulary Alignment

7.2.1 Schema.org Mapping

Map core block types to Schema.org where applicable: hm:ParagraphBlock → schema:Article (or schema:CreativeWork), hm:ImageBlock → schema:ImageObject, hm:VideoBlock → schema:VideoObject. This enables search engine discoverability without additional work.

7.2.2 ActivityPub Alignment

If federation becomes relevant, map Seed document types to ActivityPub activity types. A published document is a Create activity. A shared document is an Announce. SKOS concepts can carry skos:related links to ActivityPub types.

7.3 Conditional Complexity Escalation

7.3.1 SKOS → RDFS Threshold

Move to RDFS when: patterns have stabilized across multiple community extensions, formal properties (domain, range) are needed for validation, and the core team needs machine-enforceable constraints beyond SHACL shapes.

7.3.2 RDFS → OWL Threshold

Move to OWL when: a specific reasoning use case is justified (e.g., automated classification of block types based on their properties), the performance cost of reasoning is acceptable, and the complexity is justified by concrete user value.

7.3.3 Architecture Decision Records

Every escalation is documented via an ADR: what changed, why, what alternatives were considered, what the trade-offs are, and how to revert if needed.

7.4 EU & Data Spaces (If Strategically Relevant)

7.4.1 Vocabulary Registration on EU Joinup

Register Seed's SKOS vocabulary with the EU Joinup platform. This makes it discoverable by EU institutions and member state data portals.

7.4.2 CPSV Alignment

If public sector use cases materialize, map block types to CPSV concepts. A Seed document describing a public service maps naturally: the document is the service description, block types are service components, properties are service attributes.

7.4.3 High-Value Datasets Initiative

The EU mandates FAIR compliance for publicly funded data through the Open Data Directive and the Data Governance Act. Organizations that produce FAIR-compliant data get priority access to EU funding and procurement. A decentralized publishing platform that produces FAIR-compliant documents by default — without requiring users to learn anything about DCAT or SKOS — is a strong differentiator.

The pitch becomes: "Publish on Seed, and your documents are automatically discoverable by every EU data portal — or keep them private with the same data quality, and open them up whenever you are ready, with one click."

8. Cross-Cutting Principles (Running Thread)

8.1 Frugality

Every addition must pass the test: "Does this simplify the work of a third-party developer?" No tooling that is not used within the week it is added. Runtime dependency budget: zero new RDF libraries in Go or in the browser bundle.

The "good parts of RDF" philosophy in one sentence: maximum interoperability, minimum complexity.

8.2 Build-Time, Not Runtime

JSON-LD is a compiled artifact, never processed dynamically. IPFS + content-addressing = perfect caching (CID = cache key, cache never invalidated). The browser parses nothing RDF-related: static strings in <script type="application/ld+json">.

This mirrors Hickey's Datomic approach: take the ideas of RDF (atomic facts, global naming, properties-as-first-class) and implement them in a system that has zero RDF dependencies at runtime.

8.3 Single Source of Truth

The canonical YAML generates everything: Turtle, JSON-LD, TypeScript, Go, HTML. A change to the YAML propagates everywhere through the build pipeline. No drift between documentation, types, and runtime.

block-types.yaml (single source of truth)
    │
    ├── core-skos.ttl        → pyLODE → docs/index.html
    ├── context.jsonld        → npm package → frontend
    ├── context.cbor          → Go backend → IPFS storage
    ├── types.ts              → npm package → TypeScript consumers
    └── types.go              → Go backend → CBOR parsing

8.4 Reversibility

Every technical choice is reversible without a rewrite. SKOS does not preclude RDFS later. JSON-LD does not preclude Turtle. CBOR does not preclude JSON. The build pipeline can regenerate all artifacts from the YAML source at any time, in any format.

Closed-world internally (strict validation via SHACL shapes), open-world externally (free extensions via SKOS concepts that anyone can publish without coordination).

8.5 RDF as Specification, Not Runtime

We use RDF as a specification language for interoperability, not as a runtime database system. This is the "good parts" philosophy in practice:

Use these (good parts): Namespaces (avoid naming conflicts). URIs (stable, decentralized identifiers). Linked data (documents reference each other). SKOS (simple, flexible taxonomies). JSON-LD (RDF without the syntax pain).

Avoid these (complexity traps): OWL reasoning (unless you have specific use case). RDFS domain/range (too rigid for decentralized extension). Triple stores (not needed for vocabulary management). SPARQL endpoints (not needed initially). Blank nodes (prefer named resources).

The Go backend stays lightweight. RDF semantics are in the specification, not in runtime dependencies. You choose how much RDF processing (if any) happens in Go based on actual requirements.

8.6 Content Is Infrastructure

In agentic systems, content that does not describe itself does not exist. An agent cannot reverse-engineer intent from a Go struct. It cannot infer constraints from a TypeScript interface. It either finds intent, constraints, and applicability encoded in the asset — or it moves on.

This moves content strategy upstream of architecture — into architecture. The SKOS vocabulary, the JSON-LD context, the canonical YAML — these are not companion artifacts that describe the platform from the outside. They are the platform's public contract, its self-description mechanism, its compilation source. Documentation explains; infrastructure carries. When an agent resolves a block type through a SKOS hierarchy, it is not reading documentation. It is traversing the system itself.

The vocabulary is load-bearing. Remove it, and nothing is discoverable, nothing is composable, nothing is extensible without a phone call. Content is infrastructure — and building it is not a writing task. It is the collaboration between content strategists and decision-graph architects, encoding intent directly into the asset, at build time, before any agent ever touches it.

Appendix A: Concern Mitigation Matrix

ConcernMitigationOver-complicatedSKOS only (simplest standard, ~10 properties), no reasonersVerboseJSON-LD + CBOR, not XML/Turtle at runtimeNo arrays/orderJSON-LD @list for ordered collectionsFriction with triplesJSON-LD is native JSON, Go stdlib handles itTriple store overheadVirtual layer (in-memory Python at dev time), no databaseLimited Go supportGenerate Go code from RDF, no runtime RDF dependenciesTypeScript frontendBuild-time generation, zero runtime overhead, 2KB bundleMaintenance burdenAuto-generated docs (pyLODE), CI/CD validation, single YAML source

"In RDF, without combining it with RDF schema or something else, the properties are just names." — Rich Hickey, Cognicast Episode 103 (2016)

Appendix C: Glossary

SKOS — Simple Knowledge Organization System. The simplest W3C standard for organizing concepts into hierarchies with labels, definitions, and relationships. ~10 properties total.

DCAT — Data Catalog Vocabulary. W3C standard for describing datasets for discovery. Used by every EU member state data portal.

SHACL — Shapes Constraint Language. W3C standard for validating RDF data against structural constraints. Like JSON Schema, but for graphs.

JSON-LD — JSON for Linking Data. A JSON-based serialization of RDF. Valid JSON that can also be interpreted as RDF. The bridge between web developers and the semantic web.

DAG-CBOR — Directed Acyclic Graph Concise Binary Object Representation. Binary serialization used by IPFS for content-addressed data. Compact, fast, and deterministic.

CID — Content Identifier. IPFS's content-addressed identifier. A hash of the content. Immutable: the same content always has the same CID. The perfect cache key.

FAIR — Findable, Accessible, Interoperable, Reusable. A set of 15 principles for scientific data management. Increasingly mandated by EU funding agencies.

EIF — European Interoperability Framework. EU framework defining four layers of interoperability: legal, organizational, semantic, technical.

CPSV — Core Public Service Vocabulary. EU vocabulary for describing public services and their components.

RML — RDF Mapping Language. W3C standard for declarative mappings from any data format to RDF.

SPARQL Anything — A tool that queries any data format (JSON, CSV, XML, CBOR) as if it were RDF, without conversion.

FaCade-X — An architecture pattern for exposing non-RDF data through a virtual RDF interface.

OWL — Web Ontology Language. The heavyweight formal ontology standard. Enables automated reasoning but adds significant complexity. Use only when justified.

RDFS — RDF Schema. A middle-ground formalization layer between SKOS (lightweight) and OWL (heavyweight). Defines formal properties with domain and range constraints.

Appendix D: References

SKOS specification: https://www.w3.org/2004/02/skos/

pyLODE: https://github.com/RDFLib/pyLODE

Widoco: https://github.com/dgarijo/Widoco

Ontospy: https://github.com/lambdamusic/Ontospy

VoCol: https://github.com/vocol/vocol

SPARQL Anything: https://github.com/SPARQL-Anything/sparql.anything

FaCade-X paper: https://ceur-ws.org/Vol-2456/paper37.pdf

RML: https://rml.io/

YARRRML: https://rml.io/yarrrml/

piprate/json-gold (Go JSON-LD): https://github.com/piprate/json-gold

schema-dts (TypeScript Schema.org types): https://github.com/google/schema-dts

data.europa.eu (EU data portal): https://data.europa.eu/

data.norge.no (Norway reference implementation): https://data.norge.no/