A work-in-progress proposal for schema features in the Hypermedia protocol

    Problem

      Onyx is highly pragmatic: we are not introducing formality for its own sake.

      We aim to solve the following problems:

          People Need to Understand the Hypermedia Protocol

            For the protocol to gain adoption, developers need to understand the data that gets signed and distributed. We also need to understand and document the RPCs used to communicate between p2p nodes, and the RPCs a single node may expose to its consumers (GUIs, APIs, command lines, agentic access).

            I suggest a unified system to describe both the verifiable signed data and the APIs, because they share overlapping schemas. For example, the "block" type describes chunks of content, such as paragraphs and images, in a document; it is useful both for describing the node RPCs and for describing the format of the verifiable signed data.

          The Hypermedia Protocol Must be Extensible

            There are many use cases which the core team cannot support, or does not want to impose on the community. For example, if someone in the community wants to experiment with a new form of media, such as interactive video, they should be able to write software that interacts with the network safely, without breaking other software.

            On the core team, we may also want to develop our own experiments and internal tools while keeping the core documents+comments functionality of the protocol stable.

          Programmatic Access to Data and RPCs

            With a formal schema, we can deliver some useful tools. For example, we could build a GUI that allows us to safely experiment with the raw RPCs and data.

            In the verifiable signed data, for example, an IPFS file URL may be encoded as a string (or a raw IPLD CID), but when we expose that field to the user we can offer a file picker. If the schema implies that the field should be an image, we can display the image and ensure the user does not select, say, a zip file.
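            As a sketch of this idea, a UI could choose an input widget from schema metadata along these lines. All schema keys here ("format", "accept") are hypothetical and not part of any finalized HM schema language:

```python
# Sketch: choosing a UI widget from schema metadata.
# The field names ("format", "accept") are hypothetical examples,
# not a finalized part of the HM schema language.

def widget_for_field(field_schema: dict) -> str:
    """Map a schema field description to a UI widget name."""
    if field_schema.get("type") != "string":
        return "generic-input"
    if field_schema.get("format") == "ipfs-url":
        # The schema may further constrain the media type of the file.
        if field_schema.get("accept", "").startswith("image/"):
            return "image-picker"  # show a preview, reject non-images
        return "file-picker"
    return "text-input"

print(widget_for_field({"type": "string", "format": "ipfs-url", "accept": "image/*"}))  # image-picker
print(widget_for_field({"type": "string"}))  # text-input
```

            The point is only that the same schema annotation serves both validation and presentation.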

            Another useful application of a robust schema system is automatically creating type-safe SDKs in many languages, lowering the barrier to entry for developers in different ecosystems to participate in the same data universe.

            Also, our APIs may be more robust with a schema system. When a user sends data to an API, we can validate it against schemas and provide detailed errors, telling the user what they did incorrectly.
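            As an illustration of such detailed errors, a validator can report path-qualified messages rather than a bare rejection. The schema shape below is a stand-in, not the HM schema language:

```python
# Sketch: validating a payload against a minimal schema and returning
# detailed, path-qualified errors. The schema format is a hypothetical
# stand-in, not the (still undecided) HM schema language.

def validate(value, schema, path="$"):
    errors = []
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object, got {type(value).__name__}"]
        for name, sub in schema.get("fields", {}).items():
            if name not in value:
                if sub.get("required"):
                    errors.append(f"{path}.{name}: missing required field")
            else:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    elif expected == "string" and not isinstance(value, str):
        errors.append(f"{path}: expected string, got {type(value).__name__}")
    return errors

comment_schema = {"type": "object", "fields": {
    "type": {"type": "string", "required": True},
    "body": {"type": "string", "required": True},
}}

print(validate({"type": "Comment", "body": 42}, comment_schema))
# ['$.body: expected string, got int']
```

            A user submitting a malformed comment would learn exactly which field is wrong instead of receiving a generic failure.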

          Protocol Robustness

            Currently, "verification" of the verifiable data only checks the signatures and the permissions/capabilities of the writer, ensuring they are allowed to edit the resource. It does not validate the structural integrity of a resource: if a user uploads arbitrary data inside the content of a comment, the system will accept that comment even if it is nonsensical, and the UI will break as a result.

    Solution

      Before we implement schemas, we must decide how to balance the Open World Assumption against the Closed World Assumption. We must also support existing data in the HM network that does not yet have a schema. A further goal is to keep the barrier to entry low for new implementers of software on the network.

      So schemas are optional: HM is "Open World" by default. We include fallback schemas for the existing types in HM25. A schema provides utility but is not meant to break software; at the most extreme, software will warn users when content does not conform to the stated schema.

      What are the hard requirements for data in the HM ecosystem? All HM content must:

        Be parsable. We use DAG-CBOR-encoded IPFS blobs; if this parsing fails, the data is considered garbage. Otherwise, we can present it to the user in some capacity.
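        The parse-or-garbage rule can be sketched as follows. Real HM blobs are DAG-CBOR, but JSON stands in here so the sketch stays dependency-free:

```python
# Sketch of the "parsable or garbage" rule. Real HM blobs are DAG-CBOR;
# JSON stands in here as a self-describing stand-in format.
import json

def parse_blob(raw: bytes):
    """Return the decoded blob, or None if the bytes are garbage."""
    try:
        return json.loads(raw)
    except (ValueError, UnicodeDecodeError):
        return None  # unparsable -> treated as garbage, never displayed

print(parse_blob(b'{"type": "Comment"}'))
print(parse_blob(b"\xff\xfe not cbor"))
```

        Anything that survives this step can be shown to the user in some form, even without a schema.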

        Have correct signatures, trust, and capabilities. All software in the HM ecosystem must build upon our trust primitives to ensure that data is owned and controlled by the correct keys. This is a prerequisite of Onyx Hypermedia.

      Note: it is important for users to see what data exists inside the content they are referencing and signing. If the software encounters unknown fields in the data, they must be presented to the user in the best way possible. This will prevent users from accidentally referencing and signing "invisible" content which may be undesirable or weighty. Unexpected data that shows up in any blob (Profiles, Changes, Refs, Doc Change Blocks) must be surfaced to the user.
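      A minimal sketch of that rule: compare a blob's keys against the fields the schema knows about and surface the rest. The blob contents and the "x-experiment" field are invented for illustration:

```python
# Sketch: detecting fields a blob carries that the known schema does not
# describe, so the UI can surface them before the user signs or references
# the content. The blob and field names are illustrative only.

def unknown_fields(blob: dict, known: set) -> dict:
    """Return the sub-dict of fields not covered by the schema."""
    return {k: v for k, v in blob.items() if k not in known}

change_blob = {"type": "Change", "ops": [], "x-experiment": {"huge": "..."}}
extras = unknown_fields(change_blob, known={"type", "ops"})
print(extras)  # the UI must render these, not silently drop them
```

      Whatever shape the final schema language takes, this check is what keeps "invisible" content from being signed unseen.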

      Schema Definitions

        Still undecided. See "Schema Decision-Making" below.

      Fallback Schemas

        To handle existing content, we will fall back to our own published schemas, selected according to the "type" field defined in the blob.

        New content signed by well-behaved software will include the appropriate schema metadata.
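        A sketch of this fallback resolution follows. The "$schema" key, the hm:// URLs, and the registry contents are assumptions for illustration, not a finalized design:

```python
# Sketch of fallback-schema resolution: if a blob does not declare schema
# metadata, pick a published fallback by its "type" field. The "$schema"
# key and the registry contents are assumptions, not a finalized design.

FALLBACKS = {
    "Comment": "hm://schemas/hm25/comment",
    "Change": "hm://schemas/hm25/change",
    "Ref": "hm://schemas/hm25/ref",
}

def resolve_schema(blob: dict):
    if "$schema" in blob:  # new content: explicit schema metadata
        return blob["$schema"]
    return FALLBACKS.get(blob.get("type"))  # legacy content: fallback by type

print(resolve_schema({"type": "Comment", "body": "hi"}))
print(resolve_schema({"type": "Unknown"}))  # None -> open-world: show, warn
```

        A blob whose type has no fallback is still accepted (open world); software simply has less to validate and display.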

    Schema Decision-Making

      A work in progress

    Inspirations and References

      Assorted technologies we considered, many of which helped inspire or influence this proposal

        RDF

          RDFS

          OWL

          SHACL + DASH

          SKOS

        XRPC

        GraphQL

        IPLD schemas

        XSD (XML Schema)

        FHIR