Research paper from ChatGPT 4o-mini-high "deep research"
We are embarking on a challenging project to create a new hypermedia web protocol. This protocol isn’t a simple single-use API – it spans multiple roles and contexts. Each node in the network can play different roles (for example, acting as a server, a client, a paid service provider, an archiver, or an indexer), and depending on its role, it will expose different APIs. Our goal is to define a unified way to describe all these APIs and the data they handle, so that implementers (in TypeScript, Go, and other languages) can all follow the same strict specification.
In essence, we need a robust type system and an API protocol layer that covers all aspects of our hypermedia network. This system should cover both how data is structured and how it’s exchanged (including real-time updates). The challenge is that our network touches many mediums and patterns:
Multiple Roles, Multiple APIs: A node’s behavior and available endpoints differ if it’s a client, server, indexer, etc. We need a way to specify each role’s API while reusing common data types across them. A unified schema will prevent inconsistencies, since the same data types appear across these different APIs.
Shared Data Types for TypeScript and Go: We have a codebase in both TS and Go. We want to define data models and interfaces once and generate or enforce them in both languages. A single source of truth for types will keep front-end and back-end in sync and reduce errors.
Strictly Structured, Signed Data: Our network stores permanent data in IPFS (using DAG-CBOR encoding), and this data is cryptographically signed. This means we need a well-defined schema for these stored objects – every piece of content should conform to a schema for validity. The data format should be self-describing if possible, so that anyone inspecting an object can figure out its type and validate it against the proper schema. Ideally, one should be able to follow a reference from the data to its schema or documentation easily.
Self-Documenting and Readable: We want the format and protocol to be easy for developers to read and write. This implies using human-readable structures (e.g. JSON-like schema or descriptors) and including metadata that makes the data self-documenting. For instance, if an object has a certain type or field, one should be able to discover what that means (through an ID or link to its schema or an explanation).
Real-Time “Push” Communication: This is not just a request/reply web API. The system needs real-time capabilities – servers and nodes must be able to push data updates immediately to others. Whether it’s via web sockets, subscriptions, or another mechanism, our protocol layer must support streaming updates and asynchronous events. This is true both for node-to-node communication (e.g. propagating new content across the network) and for client-server communication (e.g. live updates to a user’s app).
Automated Documentation: Given the complexity, we want to generate documentation automatically from the schemas. If we define our types and endpoints formally, we should be able to produce human-friendly docs (web pages, READMEs, etc.) that explain the hypermedia data formats and APIs. This will greatly help developers understand and use the protocol.
Browser-Friendly and Offline-Friendly: Client-side web support is a must – browsers should be able to use this protocol (meaning we may need HTTP or WebSocket compatibility, not a binary that only native apps can use). Also, considering the decentralized nature (IPFS), nodes might be offline or data might be fetched from peers. We might even consider embedding schema information into IPFS itself so that if you encounter some data and you’re offline, you could still retrieve its schema from the network. In other words, the system should play well with a distributed environment where not everything is served from a central server.
Current Stack Constraints: We currently use IPLD (InterPlanetary Linked Data) with DAG-CBOR for data serialization and storage. DAG-CBOR is a binary, JSON-like format whose deterministic encoding keeps content addressable (the same data always hashes to the same CID), and it is a preferred codec in IPFS. This choice is essentially fixed for our permanent data storage – we aren’t planning to change the data encoding away from DAG-CBOR. We also use gRPC today for communication between front-end and back-end components. gRPC gives us a way to define services and uses Protocol Buffers for data encoding. However, it’s not necessarily the best fit for a web client (which might have trouble with gRPC’s binary protocol or need special proxies), and it doesn’t inherently provide the kind of self-documenting hypermedia interface we’re aiming for. We will examine whether to continue with gRPC or switch to something else.
Considering all the above, we’re essentially looking at a very meta project – one that defines how all other pieces talk and what data they exchange. It’s a bit hard to explain because it’s a layer of abstraction above the actual features. One way to see it is that we’re designing an Interface Definition Language (IDL) and a protocol specification for our hypermedia network. This will serve as the foundation for bringing documentation, robustness, and sanity to the evolving system.
What should we call this? It could be described as a schema-driven hypermedia protocol framework. We haven’t decided on a catchy name yet, but it might help to think of it as the “hypermedia schema and API layer” for our IPFS+libp2p network. (For now, we’ll focus on choosing the right technology; a good name can come once we know what it’s built on!)
Evaluating Potential Solutions and Ecosystems
Given these requirements, the big question is: Is there an existing ecosystem or standard that fulfills these needs, or do we need to create a custom solution? We will examine several candidate technologies and approaches one by one:
The AT Protocol Lexicon – A schema and API system from the Bluesky/AT Protocol project, which might align closely with our needs.
GraphQL – A popular query language and type system known for strong typing, flexibility, and real-time support via subscriptions.
gRPC and Protocol Buffers – Our current RPC framework, offering strict types and streaming, and how it might be adapted or improved (e.g. via new tools) to meet our goals.
OpenAPI (Swagger) and JSON Schema – The mainstream way to specify RESTful APIs and data models, with a huge ecosystem of tools (and the possibility of pairing it with AsyncAPI for real-time aspects).
IPLD Schemas or a Custom IDL – Rolling our own schema language, possibly building on IPLD’s existing schema system, to tailor exactly to our environment.
Combination or Other Niche Approaches – (For completeness, considering mixing strategies or other lesser-known frameworks).
Let’s break down how each option stacks up against our requirements:
AT Protocol Lexicons (Bluesky’s Schema System)
One promising existing solution is the AT Protocol’s Lexicon system. The AT Protocol (developed by Bluesky) is a decentralized social networking protocol, and it introduced Lexicons as a way to define both the data types (records) and the API methods (procedures/queries/subscriptions) in a single schema language. This approach sounds very much like what we need: it’s designed for an open network where different parties need to agree on data formats and behaviors.
What Lexicon Offers: Lexicons are written in JSON and are similar to JSON Schema or OpenAPI, but with extensions specific to the AT Protocol’s needs (atproto.com). Each lexicon has a unique ID (a namespaced identifier) and can define: record types (the schema for stored objects), procedures (RPC endpoints for actions, usually POST), queries (read-only endpoints, typically GET), and subscriptions (real-time event streams over WebSockets) (atproto.com). In other words, a single lexicon file can describe a piece of the protocol – for example, a lexicon might define a data model for a “Post” record and also the “getPost” API to fetch it, etc.
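For illustration, here is a rough sketch of what one of these lexicon documents might look like, shown as a TypeScript constant for readability (on disk it would be a plain JSON file; the com.example.post NSID and its fields are invented for this example):

```ts
// Hypothetical lexicon document for a "post" record (com.example.post is an invented NSID).
const postLexicon = {
  lexicon: 1,
  id: "com.example.post",
  defs: {
    main: {
      type: "record",
      key: "tid",
      record: {
        type: "object",
        required: ["text", "createdAt"],
        properties: {
          text: { type: "string", maxLength: 3000 },
          createdAt: { type: "string", format: "datetime" },
        },
      },
    },
  },
} as const;
```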
Crucially, Lexicon is designed so that data can be self-describing. In AT Protocol, objects often carry a $type field that tells you which schema (lexicon) they conform to (atproto.com). This means if one node hands some data to another, the receiving side can look at $type and know how to interpret and validate that object (by referring to the lexicon definition of that type) (atproto.com). This is exactly the kind of self-documenting approach we envision – any data can point to an explanation of what it is. Lexicon’s design explicitly notes that records should include the $type field, since records might be circulated outside of their original context and “need to be self-describing” (atproto.com).
For APIs, Lexicon defines a lightweight RPC mechanism called XRPC (basically RESTful calls under the hood). Endpoints are identified by names like com.example.getProfile, and they correspond to HTTP paths (e.g. GET /xrpc/com.example.getProfile) (atproto.com). The lexicon schema for an endpoint specifies its parameters, input schema, output schema, and even possible error codes (atproto.com). There’s also support for subscriptions (event streams over WebSockets), where a lexicon can define the message types that stream out (atproto.com). This aligns with our real-time requirement.
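Because XRPC methods map onto ordinary HTTP paths, calling a query from TypeScript needs nothing more than fetch. A minimal sketch, reusing the illustrative com.example.getProfile name above (the host and parameter names are assumptions):

```ts
// Query methods are plain GET requests under /xrpc/<nsid>, with parameters in the query string.
async function getProfile(host: string, actor: string): Promise<unknown> {
  const url = `${host}/xrpc/com.example.getProfile?actor=${encodeURIComponent(actor)}`;
  const res = await fetch(url);
  if (!res.ok) {
    // Lexicon-defined errors come back as a JSON body naming the error.
    throw new Error(`XRPC call failed: ${res.status} ${await res.text()}`);
  }
  return res.json(); // the shape is whatever the lexicon's output schema declares
}
```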
Why it Aligns with Our Needs: The Lexicon system was built to solve interoperability in a decentralized network – it’s meant to allow different implementations to agree on behavior (atproto.com). That is very much our problem too. It’s also not as over-generalized as something like RDF; Lexicon is meant to be pragmatic and even supports code generation for static types and validation (atproto.com). In fact, the Bluesky team states that lexicons “enable code-generation with types and validation, which makes life much easier” (atproto.com). Indeed, they have built tools to generate TypeScript interfaces and client libraries directly from lexicon files (atproto.com). For example, if you have a lexicon for com.example.getProfile, you can generate a TS client method so that calling it feels like a normal function call with typed return values (atproto.com). This is great for our goal of having type-safe interfaces in TS and Go – we could write schemas once and generate stubs/clients in both languages.
Another huge plus: Lexicon builds on IPLD and uses DAG-CBOR for binary representation. The AT Protocol data models can be represented in JSON or in CBOR (content-addressable) form (atproto.com). Bluesky’s repo sync and events actually use content-addressable records and CAR files (which are from the IPFS world). This means lexicon is natively compatible with the idea of storing data in IPFS. Our current storage (IPFS DAG-CBOR) would fit right in, since lexicon-defined objects can be encoded as DAG-CBOR and carry their $type for interpretation (atproto.com). They even have a special cid-link type to represent IPFS content addresses (atproto.com). All this suggests that adopting Lexicon could let us keep using IPFS for data and have the schemas to validate those DAG-CBOR objects.
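To make the storage side concrete, here is a sketch of how a lexicon-style record carrying $type could be encoded as DAG-CBOR and content-addressed with the standard IPLD JavaScript libraries (the record fields are the invented ones from the example above):

```ts
import * as dagCbor from "@ipld/dag-cbor";
import { CID } from "multiformats/cid";
import { sha256 } from "multiformats/hashes/sha2";

// A record that carries its own type identifier, lexicon-style.
const record = {
  $type: "com.example.post",
  text: "hello hypermedia",
  createdAt: new Date().toISOString(),
};

// DAG-CBOR encoding is deterministic, so identical records always hash to the same CID.
const bytes = dagCbor.encode(record);
const digest = await sha256.digest(bytes);
const cid = CID.create(1, dagCbor.code, digest);

console.log(cid.toString()); // the content address under which this block would be stored in IPFS
```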
Tooling and Ecosystem: The AT Protocol ecosystem provides some tooling already. For instance, there’s a TypeScript lexicon parser and code generator (lex-cli) that Bluesky uses to produce its client libraries (docs.bsky.app, atproto.blue). They also have a Python SDK that was auto-generated from lexicons: it includes models, XRPC client, and even a “firehose” (streaming) client – and it’s explicitly built to allow custom lexicons, not just Bluesky’s (atproto.blue). In the Python SDK docs, they encourage using the code generator for your own lexicon schemas and mention that the SDK provides utilities for CID, NSID, AT URIs, DAG-CBOR, CAR files, etc. (atproto.blue). This indicates a mature approach where a lot of the heavy lifting (parsing schemas, creating data structures, handling content addressing) is already handled. For Go, there might not be an official lexicon codegen yet, but since Bluesky’s reference implementation had components in Go, it’s possible similar tools exist or can be built.
Potential Downsides: Lexicon is quite new and specific to the AT Protocol. It’s essentially a bespoke solution for that ecosystem. Adopting it would mean learning its schema definition style and perhaps extending or modifying it for any unique needs of our project. The community around it is smaller compared to something like GraphQL or OpenAPI. However, the concepts in lexicon (JSON schema-like definitions) are familiar enough, and it’s stated that lexicons could be translated to JSON Schema or OpenAPI if needed (atproto.com). The main question is whether it covers everything we need. From what we see: it covers data records, it covers RPC endpoints, it covers real-time streams, it supports content-addressable data, and it was built with codegen and multi-language use in mind. That checks almost all our boxes.
In summary, AT Proto’s Lexicon provides a ready-made “schema and API language” tailored for a decentralized, content-addressed network with multi-language support. It yields self-describing data ($type fields) and has existing tools for code generation and documentation. This could be an excellent starting point or even the foundation of our system, sparing us from reinventing a schema language from scratch.
GraphQL
Next, let’s consider GraphQL, a very popular technology for APIs. GraphQL is essentially a query language and schema definition system for APIs that was open-sourced by Facebook. It lets you define types (with fields and their types) and operations (queries, mutations, subscriptions) in a schema. Clients can request exactly the data they need with flexible queries, and the system can provide powerful tooling thanks to its strict schema and introspection capabilities.
Strong Typing and Single Schema: GraphQL’s type system could give us a unified view of our data. We can define object types that represent our records (e.g., a Post type with fields like id, content, createdAt), and also define the entry points for operations. GraphQL schemas typically have a “Query” type for read operations and a “Mutation” type for write operations. They also support subscriptions for real-time updates (typically implemented over WebSocket). For instance, we could allow a subscription like onNewPost that pushes new posts to subscribers. This matches our need for push-based data flow.
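A rough sketch of how this could look in GraphQL’s schema language, using the illustrative Post and onNewPost names above and embedded as a TypeScript template string the way most JavaScript GraphQL servers consume it:

```ts
// GraphQL SDL for the example types and operations discussed above.
export const typeDefs = /* GraphQL */ `
  type Post {
    id: ID!
    content: String!
    createdAt: String!
    cid: String # an IPFS CID carried as an opaque string
  }

  type Query {
    getPost(id: ID!): Post
  }

  type Mutation {
    createPost(content: String!): Post!
  }

  type Subscription {
    onNewPost: Post! # pushed to subscribers, typically over a WebSocket
  }
`;
```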
One of GraphQL’s biggest strengths is how self-documenting it is. The schema is part of the server, and GraphQL includes an introspection system that allows clients (or tools) to query the schema itself. In practice, this means you can ask the server “what queries do you support, what types do you have, what fields do they have, and what do those fields mean?” (adhithiravi.medium.com). GraphQL APIs are required to provide this introspection, making them effectively self-documenting APIs. Developers can use tools like GraphiQL or Apollo Explorer to browse the API and see descriptions. This addresses our automated documentation goal: with GraphQL, documentation UIs can be generated on the fly from the live schema, and tools can even do autocompletion and code generation based on the introspected schema (adhithiravi.medium.com). In short, GraphQL’s introspection “provides a way for clients to discover the resources (types, queries, etc.) available in the schema,” which saves time and enables rich IDE features (adhithiravi.medium.com).
Real-Time via Subscriptions: GraphQL has built-in support for subscriptions, which are essentially long-lived operations where the server can push back data whenever certain events occur. Implementation-wise, this often uses WebSocket connections. For example, Apollo GraphQL uses a websocket sub-protocol to send subscription data. The GraphQL spec’s take is that “subscriptions are long-lasting operations that can change their result over time,” and that they maintain an active connection (commonly via WebSocket) so the server can push updates to the client (apollographql.com). This directly addresses our need for real-time updates – GraphQL can notify clients of new data in real time without polling.
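On the client side, consuming such a subscription over WebSocket is a few lines with the graphql-ws library; a minimal sketch (the endpoint URL and the onNewPost field are the illustrative names used here):

```ts
import { createClient } from "graphql-ws";

const client = createClient({ url: "wss://node.example.com/graphql" });

// The connection stays open; the server pushes a message every time onNewPost fires.
const unsubscribe = client.subscribe(
  { query: "subscription { onNewPost { id content createdAt } }" },
  {
    next: (msg) => console.log("new post", msg.data),
    error: (err) => console.error("subscription error", err),
    complete: () => console.log("stream closed"),
  },
);
```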
Multi-language and Codegen: GraphQL is language-agnostic for the server (there are GraphQL server libraries in many languages, including Go) and for the client (many client libraries as well). For our TypeScript front-end, GraphQL is a natural fit; there are tools like Apollo Client, and we can use GraphQL Code Generator to produce TS types for the results of queries, making the front-end strongly typed. For Go, there are libraries like graphql-go or gqlgen that help define a schema and resolve it with Go functions, or to generate Go types from a schema. It might not be as straightforward as gRPC code generation, but it’s doable. Moreover, because GraphQL has the schema in a standard format, we could potentially generate documentation or even stub resolvers automatically.
Content-Addressed Data & Offline Issues: GraphQL was originally designed for client-server scenarios rather than purely peer-to-peer or content-addressed networks. It doesn’t have a concept of content hashes or IPFS built in. However, that doesn’t mean it can’t be used; we could incorporate CIDs (content IDs) as fields (e.g., a field in GraphQL of type ID or String to carry an IPFS hash). The GraphQL schema itself could in theory be stored or shared via IPFS (after all, it’s just a schema definition text), but typically GraphQL assumes the client can reach the server to get introspection. In offline situations where you only have data, GraphQL doesn’t include a mechanism for the data to tell you what it is (no $type field akin to lexicon). So if an object floats around on IPFS, having a GraphQL schema won’t automatically validate it unless you know which type it’s supposed to be. We could mitigate this by establishing conventions (like always including a __typename or a type indicator in the data). In fact, GraphQL does have an automatic field __typename that you can query to get the runtime type of an object in a response (especially useful for union or interface types) (graphql.org). But that presupposes you got the data via the GraphQL API. On the flip side, GraphQL doesn’t forbid including a type field in an object; it’s just not mandated by the system the way lexicon mandates $type. We might need to design our GraphQL types to have something like a type field if we want self-description in the raw data.
Another challenge is that GraphQL is typically a single schema for a service. Our network has multiple roles and potentially multiple services (e.g., maybe one service for serving content, one for indexing, etc., or different node types exposing different subsets). We could still model this in GraphQL by perhaps having a very large schema that includes types and operations for all roles, but a given node would only implement a subset (unsupported operations could return errors or be disabled). Alternatively, each role might expose a different GraphQL endpoint. That would mean multiple schemas to maintain, though they could share type definitions. This is not impossible, just something to manage.
Performance and Efficiency: GraphQL’s flexibility (the client can ask for many fields at once) can lead to expensive, hard-to-predict queries on the server side if not optimized, but it solves the over-fetching problem on the client side (the client only gets what it asks for). In our scenario, if clients are retrieving content by queries, GraphQL could be fine. For node-to-node, GraphQL might be less common – nodes might prefer exchanging data in bulk rather than through a query language. But nothing prevents using GraphQL between nodes if one node exposes a GraphQL API to others.
GraphQL vs Our Requirements Summary: GraphQL scores well on type safety, multi-language support, and real-time updates. It also shines in documentation and introspection – GraphQL APIs are inherently self-documenting and come with a rich ecosystem of tools for exploring the API (adhithiravi.medium.com). Where GraphQL is a bit less aligned is the content-addressable/offline aspect (no built-in content hash or schema distribution mechanism) and possibly the complexity of implementing resolvers for everything (we’d need to write the logic to serve the data as GraphQL, whereas something like lexicon might allow more direct object passing if data is already structured). GraphQL also doesn’t natively enforce that stored data conforms to a schema unless that data only ever comes through the GraphQL API. We might still need separate validation if data is injected via other means (like if nodes sync data over libp2p, not through GraphQL, we must validate it against the GraphQL schema manually).
In conclusion, GraphQL is a strong candidate especially for the client-server boundary due to its developer-friendliness, introspection, and real-time support. It might require some adaptation to fit a fully decentralized model (we’d need to decide how GraphQL schemas are distributed or referenced, and how to use it for node-to-node communication if we want that). It is a proven technology with lots of community support and could give us a quick win on documentation and type-safe usage in TypeScript. We would have to weigh its lack of inherent content-addressing awareness against the benefits.
gRPC and Protocol Buffers
Our current implementation already uses gRPC (with Protocol Buffers as the IDL/encoding) between the front-end and backend. gRPC is a framework designed by Google for high-performance RPC, and it generates code for many languages from a .proto interface definition file. It’s worth evaluating if we can stick with gRPC/Protobuf and perhaps extend or tweak our usage to meet the goals, or if gRPC is fundamentally mismatched for what we want.
What gRPC/Protobuf Provides: With Protocol Buffers, you define message types and service RPC calls in .proto files. This gives a strict schema for your data (with field types, required/optional fields, etc.) and for your APIs (the RPC methods and their request/response types). From these, gRPC can auto-generate code in multiple languages – out of the box, there’s support for Go, C++, Java, Python, Ruby, C#, NodeJS, etc. (cloud.google.com). This is a big plus for the multi-language requirement: we could generate Go structs and TS classes from the same schema (though TypeScript support may rely on community tools or newer developments, as Google’s official support for JS/TS results in JavaScript stubs; however, there are projects to generate TS typings).
gRPC also has excellent support for real-time streaming. Unlike traditional REST, gRPC (built on HTTP/2) allows server streaming, client streaming, and bidirectional streaming natively (medium.com). That means a single RPC call can be kept open and messages can flow continuously in either or both directions, fulfilling our push requirements. In practice, we could define a streaming RPC like SubscribeUpdates(stream UpdateRequest) returns (stream UpdateEvent) and both sides can send messages arbitrarily. This is very powerful for implementing real-time data flows (and indeed gRPC is used in many microservices for exactly that purpose) (medium.com).
Performance-wise, gRPC is efficient: Protocol Buffers are a compact binary format, and gRPC runs over HTTP/2 with multiplexing (so multiple calls can share one connection, and streams can interleave) (medium.com). If raw efficiency and low overhead are priorities, gRPC excels. Also, Protobuf messages are well-suited for large numbers of small messages or high throughput, which might be relevant in a busy network of nodes.
Type Safety and Codegen: Protobuf enforces strict types (e.g., an int32 or a string or a custom message type). If a message doesn’t match the schema, it can’t be decoded. We would have one set of .proto files that define all our data structures and RPC calls, and from that we generate Go code (structures and interface stubs) and TypeScript code (perhaps using a tool like Buf’s Connect or grpc-web for the browser). This gives a single source of truth for types, similar to what we want.
In fact, with gRPC and the proto IDL we would get automatic code generation of clients/servers in multiple languages, which is a known advantage (teams often cite avoiding writing boilerplate thanks to this) (medium.com). As a simple example, if we define a service MyService with an RPC method GetItem(ItemRequest) returns (ItemResponse), the tooling will generate a Go interface or base class for a server to implement GetItem, and a Go and TS client method to call GetItem easily. This is analogous to how lexicon or OpenAPI codegen would provide stubs.
Where gRPC Falls Short for Us: The main issues with gRPC relate to self-documentation, web support, and flexibility:
Self-Documentation & Discoverability: gRPC does not have a built-in introspection or discovery mechanism that a web client can easily use. There is something called server reflection in gRPC, but it’s more for programmatic usage (and not always enabled). It’s not like GraphQL where you can ask the server about its schema. Typically, developers rely on the .proto files as documentation or use tools to generate static docs. We could generate HTML or markdown from proto comments, but it’s not as dynamic or rich as GraphQL’s introspection or OpenAPI’s Swagger-UI. So, if “self-documenting hypermedia” is a goal, gRPC is not ideal. The client must already know what RPC calls exist; you can’t just fetch a schema on the fly easily (unless you distribute the proto or use reflection clients).
Browser and Client-Side Support: Browsers cannot directly open arbitrary TCP connections or speak HTTP/2 the same way server environments can, which complicates using gRPC directly in web apps. There is gRPC-Web, a variant that works over HTTP/1.1 and some limited streaming (it allows server-streaming but not client-streaming, due to browser limitations). Using gRPC in a web page often requires an HTTP proxy that translates between the browser-friendly format and true gRPC on the server. This adds complexity. Recently, projects like Buf Connect have created libraries to generate TypeScript clients that use HTTP+JSON or gRPC-Web under the hood, so that you can call gRPC services from a browser without pain (buf.build, news.ycombinator.com). Connect essentially lets you serve a gRPC service in multiple protocols (binary for native clients, JSON for web clients) with the same interface; a rough sketch of what that looks like from the browser follows after this list. This is promising if we wanted to keep gRPC, but it’s extra technology to adopt. Without something like that, using gRPC from a JavaScript front-end means using the limited gRPC-Web client or switching to a different approach for the web.
Tight Coupling vs Hypermedia: gRPC by nature is RPC – you call a method and you get a response. It doesn’t encourage hypermedia or self-descriptive resource representations. For example, if you fetch an object via gRPC, you just get the protobuf message back (in binary or as decoded object) without any links or type indicators beyond what’s in the proto definition known at compile time. There’s no built-in notion of including a schema reference in the message (except using Google’s Any type which can include a type URL, but that’s not commonly used for hypermedia style; it’s more for polymorphism). This means it’s less suited to a scenario where data might be floating around and needs to explain itself. We could embed things in our proto (like a field that carries the schema name or version), but that’s up to us to enforce.
Integration with IPFS/Data at Rest: If all data exchange was via gRPC calls, protobuf would impose a schema and we’d be fine. But our project involves data being stored in IPFS as DAG-CBOR. There’s a mismatch here: Protocol Buffers is a different encoding. We wouldn’t store proto-encoded data in IPFS (at least, IPFS usually works with IPLD formats like DAG-CBOR or DAG-PB). We could theoretically design proto and IPLD schemas to mirror each other, but that means maintaining two parallel definitions for each type (one in .proto and one in, say, IPLD Schema or code). That introduces potential inconsistencies and extra work. Another path is to stop using proto for data encoding and only use it for the interface, but then we lose some benefit of gRPC or have to convert to/from CBOR on every call. Alternatively, we could define our data types in proto and then use a special IPFS codec (there is a thing called “DAG-PB” for the old protobuf-based IPLD, or even “DAG-COSMOS” which is a protobuf-based IPLD for Cosmos SDK). However, we have largely standardized on DAG-CBOR which is JSON-like. So there’s friction between using Protobuf schemas and using CBOR data formats. It might not be insurmountable, but it’s a design complexity. (Some projects do use protobuf for defining data and then store it, but we’d have to ensure deterministic encoding for the hashes – standard proto has some issues with field ordering unless carefully handled, whereas DAG-CBOR is naturally deterministic for hashing.)
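As referenced above, here is a sketch of the Connect-style browser client that keeps gRPC on the server. The MyService, GetItem, and SubscribeUpdates names come from the earlier examples; the import path, request shapes, and exact API names (which vary between connect-es versions) are assumptions:

```ts
import { createConnectTransport } from "@connectrpc/connect-web";
import { createPromiseClient } from "@connectrpc/connect";
// Generated from the .proto files by Buf's protoc-gen-es / protoc-gen-connect-es;
// the import path below is a placeholder for this sketch.
import { MyService } from "./gen/my_service_connect";

const transport = createConnectTransport({ baseUrl: "https://node.example.com" });
const client = createPromiseClient(MyService, transport);

// Unary call: an ordinary HTTP POST (JSON or binary), no translating proxy required.
const item = await client.getItem({ id: "some-item-id" });
console.log(item);

// Server streaming surfaces as an async iterable (browsers cannot do client/bidi
// streaming, so SubscribeUpdates is sketched here as server-streaming only).
for await (const update of client.subscribeUpdates({ topic: "posts" })) {
  console.log("update", update);
}
```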
In summary, gRPC/Protobuf does meet a number of our needs (strict typing, multi-language codegen (medium.com), streaming real-time comms (medium.com), high performance). If our primary goal was performance and strong API contracts between known services, gRPC would be a top choice. However, for open ecosystems and hypermedia-style self-description, it’s not as strong. It shines in closed, internal microservice contexts where you control both ends and can share the proto files. In a more open network, distributing proto definitions or evolving them can be trickier (though tools like Buf’s schema registry could help, it’s still a centralized approach).
We could potentially continue to use gRPC internally for certain communications (especially node-to-node or backend-to-backend, where web clients aren’t involved), while using another mechanism for client-facing APIs or for describing data schemas. For example, we might decide to keep some gRPC services but auto-generate OpenAPI or GraphQL on top for external consumption. But doing multiple layers might defeat the purpose of unification.
Another note: there is nothing in gRPC out-of-the-box for automatically generating documentation in a nice way (no equivalent of Swagger UI without converting the proto to an OpenAPI spec using a tool). We’d have to use comments in proto and generate static docs or something. This is doable (protoc plugins exist to output markdown docs), but it’s not interactive or as accessible to developers as, say, GraphQL’s introspection or an OpenAPI UI.
To conclude on gRPC, it’s a robust technology that covers type-safe interfaces, multi-language stubs, and streaming, but it lags in schema flexibility, self-description, and web-friendliness. With additional tooling (like Connect for browser support, or protoc-gen-openapi for docs), we can patch some gaps. If we prioritize staying close to current practice and maximizing performance, we might consider sticking to gRPC but enhancing it. If we prioritize broad accessibility, easy use by third-party developers, and alignment with IPFS content addressing, we might need to move to a different solution.
OpenAPI (Swagger) and JSON Schema
Another well-established path for defining APIs and data models is OpenAPI (formerly known as Swagger). OpenAPI is essentially a specification to describe RESTful HTTP APIs, including their endpoints, request/response schemas, and authentication methods. It heavily uses JSON Schema to describe the structure of request and response bodies. Many modern web APIs use OpenAPI to publish their interface, and there’s a rich ecosystem of tools around it (for codegen, documentation, testing, etc.).
What OpenAPI/Swagger Provides: An OpenAPI document (typically a YAML or JSON file) lists all the API endpoints (grouped by paths and HTTP methods) and for each, the expected parameters, the request body schema (if any), and the response schema. The schemas are defined using JSON Schema vocabulary (types, properties, etc.), which ensures we have a precise specification of the data format. This directly addresses our need for a strict protocol and clear documentation of data structures.
One of the biggest strengths here is tooling: We can feed an OpenAPI spec into a code generator to get client libraries and server stubs in dozens of languages. For example, Swagger Codegen and OpenAPI Generator support generating clients in over 40 languages (including TypeScript, Go, Python, etc.) and server stubs in many languages as well (swagger.io). This means if we define our API in OpenAPI, we could auto-generate a TypeScript client SDK and a Go server implementation outline, ensuring consistency. The Swagger Codegen site advertises generation of server code in 20+ languages and client SDKs in 40+ languages (swagger.io), which shows how broad the support is. With a few commands, developers can get a starting point and focus on the actual logic rather than writing request/response handling code (swagger.io).
Documentation and Developer Experience: OpenAPI specs can be plugged into tools like Swagger UI or Redoc which produce interactive API docs. This is great for making our hypermedia protocol accessible – developers can read the documentation in a nicely formatted way, try out endpoints (Swagger UI allows making test calls), and see all the models and fields explained. In many ways, this achieves the “self-documenting” goal, although it’s not the data itself carrying the docs – instead, the OpenAPI spec is the source of truth for documentation. Still, it’s automated and always up-to-date if the spec is updated. We can host the docs or even embed them in developer portals easily. Many open APIs use this approach to reduce friction for integrators.
JSON Schema for Data Validation: Since OpenAPI leverages JSON Schema for defining data models, we inherently get a way to validate our IPFS-stored data too. JSON Schema can specify required fields, data types, formats, etc. We could maintain a library of JSON Schemas for our content types. When data comes in from IPFS, we can identify its type (perhaps by a field or context) and validate it against the corresponding JSON Schema. This ensures that signed data indeed conforms to the protocol. JSON Schema isn’t as strongly typed as Protobuf or GraphQL (it’s more about validation than about generating code types with perfect fidelity), but there are tools to generate TypeScript types from JSON Schema, for example. In Go, one might use a library like github.com/alecthomas/jsonschema (which generates JSON Schema from Go structs) to keep schemas and structs in sync, or simply write Go structs that match the schema and run a validator library against incoming data.
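A sketch of what that validation step could look like in TypeScript with a standard JSON Schema validator (Ajv); the post schema here is a made-up stand-in for one of our content types:

```ts
import Ajv from "ajv";

// A stand-in JSON Schema for one of our content types.
const postSchema = {
  type: "object",
  required: ["id", "content", "createdAt"],
  properties: {
    id: { type: "string" },
    content: { type: "string" },
    createdAt: { type: "string" },
  },
  additionalProperties: false,
} as const;

const ajv = new Ajv();
const validatePost = ajv.compile(postSchema);

// Reject any object (e.g. freshly decoded from DAG-CBOR) that doesn't match the schema.
export function assertValidPost(data: unknown): void {
  if (!validatePost(data)) {
    throw new Error("invalid post: " + JSON.stringify(validatePost.errors));
  }
}
```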
Real-Time Extensions (AsyncAPI): By default, OpenAPI focuses on request-response HTTP APIs. It doesn’t natively describe push or streaming over WebSocket (HTTP itself is request-response). However, there is a parallel specification called AsyncAPI which is designed to document and design event-driven APIs (like pub/sub systems, WebSocket APIs, MQTT, etc.). AsyncAPI is essentially to messaging what OpenAPI is to REST. We can use AsyncAPI to define channels for events, the message schemas for those events, and how clients subscribe or publish. In fact, AsyncAPI schema definitions are a superset of JSON Schema, so it aligns well with the OpenAPI data models (asyncapi.com). We could, for instance, have an AsyncAPI spec that says “there is a WebSocket channel at /stream, and if you subscribe, you might receive messages of type X, Y, or Z, defined by these schemas.” The AsyncAPI Initiative provides a generator too, which can produce documentation or even code from an AsyncAPI spec (asyncapi.com). They have template-based generators for multiple languages (JavaScript, Java, Go, TypeScript, etc.) to help create server or client stubs for handling the events (asyncapi.com). So, by combining OpenAPI for the regular RPC calls and AsyncAPI for the realtime streams, we could cover both halves of our protocol in a formal way. Many tooling platforms (like SwaggerHub or Postman) are starting to support AsyncAPI alongside OpenAPI, reflecting the need to document event-driven aspects of APIs.
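To make the AsyncAPI half concrete, here is a minimal sketch of such a spec for the /stream channel mentioned above, shown as a TypeScript object (it would normally live in a YAML file; the channel and message names are invented):

```ts
// Minimal AsyncAPI 2.x document describing one WebSocket channel that pushes
// "post created" events to subscribers.
const asyncapiSpec = {
  asyncapi: "2.6.0",
  info: { title: "Hypermedia event stream", version: "0.1.0" },
  channels: {
    "/stream": {
      subscribe: {
        summary: "Receive an event whenever a new post is created.",
        message: {
          name: "postCreated",
          payload: {
            type: "object",
            required: ["id", "content"],
            properties: {
              id: { type: "string" },
              content: { type: "string" },
            },
          },
        },
      },
    },
  },
} as const;
```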
Integration with IPFS and Offline: OpenAPI/AsyncAPI specs are just files (JSON/YAML), so we certainly could put them on IPFS for distribution. There’s no built-in mechanism for a piece of data to declare “here’s my schema, go fetch it”, but we could invent conventions (like including a content hash of the schema in the data, which someone could use to retrieve the schema from IPFS). JSON Schema itself supports $ref links, which could theoretically be URLs or IPFS URIs. For example, a data object could include a $schema property with an ipfs://… link to its schema. That’s not standard, but it’s an idea to explore if we want truly self-contained data+schema.
Even without that, maintaining the OpenAPI/JSON Schema definitions in the project and versioning them would give a clear contract. If a node is offline but has some data and the schemas pinned, it can still validate. OpenAPI doesn’t automatically enforce data validity at runtime; it’s more a design-time artifact. We would need to use validators in code to enforce that e.g. incoming data or outgoing data matches the schema (this is something we could integrate into our codegen or runtime).
Development Workflow Considerations: If we use OpenAPI, we have two main approaches: design-first or code-first. In design-first, we’d manually write the OpenAPI spec and then generate code stubs. In code-first, we’d annotate our Go code (or use something like gRPC with protobuf annotations) to generate an OpenAPI spec from it. Many teams prefer design-first for a new protocol because it forces clarity up front. The risk is duplication of effort (keeping spec and code in sync), but with codegen, one can largely eliminate one side of that (i.e., generate code from spec or generate spec from code, but not maintain both manually).
Given our scenario of wanting a unified type system, we could define all our data models as JSON Schemas in an OpenAPI Components section, and reuse them across endpoints. This is similar to how lexicon does it, just using the OpenAPI vocabulary. If we ever needed to share these schemas externally (say others building services in our ecosystem), JSON Schema is a well-understood format they could use, and OpenAPI is a standard for APIs that many know how to read.
Pros vs Cons: OpenAPI’s pros are wide adoption, lots of tools, easy documentation generation, and use of JSON (which aligns with our DAG-CBOR JSON-like data). It can definitely model our domain. It’s not inherently tied to content addressing or P2P, but it’s flexible enough that we can incorporate identifiers or references in our models. The con might be that OpenAPI by itself doesn’t cover streaming; we’d be bringing in AsyncAPI for that, which is a slightly separate document/spec (though we can integrate them conceptually). Also, OpenAPI doesn’t auto-generate code at runtime – it’s a build-time artifact (unlike GraphQL introspection which is runtime). So if our network adds new methods, we’d have to distribute an updated spec and regenerate clients; whereas GraphQL or lexicon could allow dynamic querying of what’s available. However, in a versioned protocol, that’s expected – we’d version our API and issue new specs for new features.
Another consideration: writing and maintaining a huge OpenAPI spec could become cumbersome. It’s verbose, especially if we have many roles and endpoints. It might actually be comparable in complexity to maintaining lexicon JSON files, though. Either way, we need to manage a bunch of definitions. At least it’s text-based and could be split into multiple files for organization.
Finally, automated documentation is a big win here: not only can we create nice docs sites from OpenAPI, but the spec itself can contain descriptions for every field and endpoint, which ensures our documentation doesn’t drift from the implementation. This helps the “robustness and sanity” goal by having one place to look for what everything means.
In summary, OpenAPI (with JSON Schema) is a very viable approach to define our hypermedia protocol. It brings the benefit of huge ecosystem support (codegen in TS/Go, documentation UIs, validators). By supplementing it with AsyncAPI, we can incorporate the real-time event streams as well. The major difference from something like lexicon or GraphQL is that it’s a bit more static (you generate artifacts from it rather than query it at runtime) and it doesn’t inherently enforce or include data type references at runtime. We can work around those issues with conventions if needed. This approach would be more familiar to many developers and might be easier to adopt incrementally (for example, we could start by documenting what we have with OpenAPI/JSON Schema and gradually enforce it more strictly).
IPLD Schemas and Custom Solutions
Considering we are using IPLD (InterPlanetary Linked Data) for our data, it’s worth examining IPLD Schemas themselves. IPLD Schemas are a type system and schema language provided by the IPFS/IPLD project to describe data structures that live in content-addressed storage (ipld.io). They allow one to define types (structs, enums, maps, etc.) which map onto IPLD’s data model (which is basically a superset of JSON’s data model with support for binary and links). An IPLD Schema can be used to validate data, guide code generation, and ensure that data across the network follows a certain structure (ipld.io).
Structural Typing vs. Tagged Data: One interesting design decision in IPLD Schemas is that they use structural typing, meaning the schema is usually not embedded in the data; rather, data is valid if it “fits” the shape defined by the schema (ipld.io). This contrasts with the AT Protocol lexicon approach (which uses a $type field for nominal typing). The advantage of structural typing is that you can validate “any old data” against a schema even if the data itself doesn’t declare what it is – useful if you have data at rest and you want to see if it matches a known version of a schema. The downside is the data isn’t self-describing; you need external context to know what schema to apply. For our purposes, we probably prefer the data to carry a type hint (to be self-documenting), so we might combine approaches (IPLD Schema for the structure + including a type field in the actual data instances).
Code Generation: The IPLD project has been working on codegen tools. For example, there’s go-ipld-prime which can generate Go types from an IPLD Schema so that you can easily marshal data to/from those types and validate them (github.com). There’s also a js-schema-gen for generating TypeScript interface definitions from an IPLD Schema (github.com). This is promising, because it directly addresses our multi-language type safety for data. If we define an IPLD schema for, say, a “Profile” or a “Post”, we could generate Go struct definitions and TS interfaces for those. They would natively handle links (CIDs) as a type, which is nice since IPFS links are first-class in IPLD (something general JSON Schema doesn’t have a notion of, except as strings).
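For a feel of what such generated TypeScript could look like, a hypothetical sketch (the Post shape and its fields are invented; the point is that IPLD links surface as CIDs rather than plain strings):

```ts
import { CID } from "multiformats/cid";

// Roughly what a generator might emit for an IPLD Schema along the lines of:
//   type Post struct {
//     author String
//     content String
//     attachments [Link]
//   }
export interface Post {
  author: string;
  content: string;
  attachments: CID[]; // links are a first-class type in IPLD, unlike plain JSON Schema
}
```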
Custom RPC Layer: If we go this route purely, we’d still need to define the API calls. IPLD Schemas cover the data format and validation, but they don’t specify “RPC methods” or how data is communicated. We would have to design our own protocol for that – perhaps using libp2p’s request/response or pubsub. We could invent a small RPC mechanism (maybe using JSON over HTTP or using GraphQL-like or JSON-RPC, or even reuse gRPC with an IPLD codec). This becomes a build-your-own scenario: we’d be picking and choosing components and integrating them ourselves.
Is Building Our Own Worth It? The benefit of a custom solution is ultimate flexibility – we tailor the type system to exactly what we want, and we integrate tightly with IPFS/libp2p. For instance, we could decide that every message in the network is actually an IPLD block (CBOR) that includes a type and payload, and we define a set of message types for various actions. We could then just send those blocks over libp2p streams or pubsub. This would be very “native” to IPFS. In fact, libp2p allows defining custom protocols (with protocol IDs) so we could create, say, a /myprotocol/x/0.1.0 for requests. We might not need HTTP at all between nodes (only for web clients we’d provide an HTTP gateway or similar).
However, building a custom schema and protocol layer is a lot of work. We’d have to implement:
The schema compiler (or use IPLD’s) and maintain the schemas.
Our own codegen or runtime libraries for TS and Go (though IPLD provides some basis, we might need to extend it).
A mechanism for automated documentation (likely we’d have to write our own doc generator from the schema definitions, since it’s not mainstream like OpenAPI).
A whole custom client library for browsers if we don’t use an existing pattern (whereas if we used GraphQL or OpenAPI, many client tools already exist).
Effectively, this could become an entire standards project by itself, which might be overkill given our resources. The AT Protocol Lexicon is actually an example of someone doing this kind of heavy lifting (they made a new schema language, new networking patterns like XRPC and CAR file sync, etc., but they did it so others don’t have to redo it).
One idea would be to use IPLD Schemas underneath something else. For instance, we might embrace lexicon or OpenAPI for the top-level API, but use IPLD Schema definitions for the data structures to ensure compatibility with IPFS. Lexicon’s data model is actually very IPLD-like already. If we were to create our own, it might end up looking a lot like lexicon. For example, lexicon even calls out that it’s not using full RDF and is simpler, and it can be converted to JSON Schema/OpenAPI (atproto.com). That suggests lexicon is kind of a custom layer built on JSON Schema logic with extras for atproto. We could do something similar – e.g., start with JSON Schema or IPLD Schema as a base and add a notion of RPC and streams. But again, that’s reinventing what lexicon already did.
Offline Schemas in IPFS: If we rolled our own, one neat thing is we could design it such that schema files themselves are content-addressed (which they would be if stored in IPFS) and we reference them by CID or a human-readable key that can be resolved. For example, a data object might say "$schema": "ipfs://bafy...someCID" pointing to the schema document. This would allow a truly peer-to-peer resolution of “what does this data mean.” It’s an appealing idea for decentralization. The AT Protocol considered something similar with NSIDs (namespaced IDs) for lexicons, which could potentially be fetched from a well-known location or registry. It’s not trivial to implement global schema discovery in a P2P network, but IPFS could host the schemas and a mapping from names to CIDs.
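A sketch of how that resolution could work in practice; the $schema convention, the gateway URL, and the field names here are assumptions, not an existing standard:

```ts
import Ajv from "ajv";

// Given a record that links to its own schema by CID, fetch the schema
// (via any IPFS gateway or a local node) and validate the record against it.
async function validateAgainstLinkedSchema(
  record: { $schema: string } & Record<string, unknown>,
): Promise<{ ok: boolean; errors?: unknown }> {
  const cid = record.$schema.replace("ipfs://", "");
  const res = await fetch(`https://ipfs.io/ipfs/${cid}`);
  const schema = await res.json();

  const { $schema, ...data } = record; // validate the payload, not the schema pointer itself
  const validate = new Ajv().compile(schema);
  return validate(data) ? { ok: true } : { ok: false, errors: validate.errors };
}
```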
Complexity vs Benefit: A custom approach would give us a chance to create exactly what we want. But we have to question if the ecosystem benefits outweigh the cost. We wouldn’t be able to leverage as many existing libraries or community knowledge. Every new developer or implementor would have to learn our home-grown system. If there is an existing solution (or combination of solutions) that covers ~80-90% of our needs, it might be wiser to adopt that and perhaps extend it slightly, rather than start from zero.
For completeness, one could also consider other niche ecosystems:
Cap’n Proto (another IDL like protobuf, with an RPC layer) – very fast, but even less web-friendly and not widely used with IPFS.
JSON-LD/Hydra (linked-data APIs) – allows data to carry context for semantics (self-describing via linked vocabularies). This is hypermedia in the pure REST sense, but it doesn’t solve codegen or type-safety in the way we want, and it can be quite complex.
gRPC with custom codecs – possibly using an IPLD codec so that what’s sent over gRPC is DAG-CBOR data. This is an exotic hybrid that could let us use gRPC streaming and services but still send data in the exact binary form we store it. It would, however, confuse the normal expectations of gRPC (which usually expects proto-encoded data). Not impossible, but unusual.
Given all these, the custom route is powerful but likely the most effort with uncertain payoff. We should lean toward it only if no existing framework can be adapted to meet the requirements. At this point, it looks like Lexicon or a mix of OpenAPI/AsyncAPI (or GraphQL) can fulfill what we need with far less invention.
Recommendation and Next Steps
After examining the options, we have a clearer picture:
The AT Protocol Lexicon approach appears to tick most of our boxes: it’s built for a network of peers with varying roles, it defines both data types and API methods (including real-time streams) in one schema language, it encourages self-describing data (with $type fields) (atproto.com), and it has existing tools for code generation (especially for TypeScript, and potentially adaptable to Go) (atproto.blue). It aligns with our use of IPFS/DAG-CBOR and offers a pathway to automatically generate documentation or at least keep schemas as the source of truth (atproto.com). The downside is it’s a newer ecosystem, but being relatively new could also mean we have room to influence or extend it to our needs. If we choose lexicon, we could likely bootstrap quickly by defining our own lexicons for our protocol and using Bluesky’s lexicon parser/generator to produce initial libraries. We’d also gain compatibility with some existing ideas (for example, if one day we want to interoperate with AT Protocol or leverage their infra, we’d speak a similar language).
GraphQL is a strong candidate for client-server interactions and would make the API very accessible to developers (with tools like GraphiQL, etc.). It provides real-time via subscriptions and excellent documentation via introspection (adhithiravi.medium.com). However, it is less naturally suited for peer-to-peer node interactions and doesn’t inherently use content addressing or have a notion of our DAG-CBOR data model. We could certainly use GraphQL for the client API (exposing queries and mutations that internally fetch from IPFS data stores), but for node-to-node, a simpler replication protocol might still be needed (Bluesky, for instance, uses a custom “firehose” stream for nodes rather than GraphQL for that part). GraphQL could still be part of our stack (maybe as a layer on top for developers), but alone it won’t solve schema distribution or low-level data validation of offline content. It could complement an underlying schema system: e.g., we define GraphQL types that correspond to IPLD schema types, and use GraphQL just as an access mechanism.
gRPC/Protobuf could be continued for internal communications. With new tools like Buf’s Connect, we can improve browser support and even serve a JSON endpoint for the same proto service (buf.build). Yet, gRPC lacks the human-friendly, hypermedia feel we’re aiming for. It’s great for tightly-coupled microservices but not as great for an open protocol. If we were to use gRPC heavily, we’d need to invest in additional layers: generating OpenAPI from proto for documentation, handling schema evolution carefully, etc. This double work makes us lean away from gRPC as the primary face of the protocol (though using it under the hood is not off the table for performance-critical channels).
OpenAPI + AsyncAPI is an attractive alternative to Lexicon if we prefer established standards. By writing an OpenAPI spec (for our RPC calls) and an AsyncAPI spec (for our streams), we can get broad tool support. Developers are generally familiar with Swagger-style APIs, and the learning curve might be lower than something like lexicon. We would, however, need to enforce that the JSON Schemas defined in OpenAPI correspond strictly to what’s stored in IPFS. This is doable – we essentially treat those schemas as canonical. We’d likely build or use validators to check incoming data. The documentation story is excellent here, and multi-language codegen is readily available (swagger.io). The main drawback is that OpenAPI/AsyncAPI aren’t inherently designed for content-addressed networks. We’d have to introduce conventions (maybe include content identifiers in our models, or store the specs on IPFS for discovery). Another minor drawback is fragmentation: we’d have one spec for REST endpoints and one for events, whereas lexicon combines them in one. But that’s not too bad.
Our Recommendation: Given all factors, the best path seems to be building upon an existing schema ecosystem that closely matches our needs, rather than starting from scratch. In particular, adopting the AT Protocol’s Lexicon framework (or a variant of it) is highly recommended. Lexicon was essentially created to solve the exact problem we have – “a way to agree on behaviors and semantics in an open network” (atproto.com) – and it avoids us having to design a new IDL from zero. We can leverage their work to define our own lexicons for our protocol’s specific domains. By doing so, we get a lot for free: a clear schema language, existing codegen for TypeScript (which we can use and possibly extend to Go), a mechanism for embedding types in data, and compatibility with DAG-CBOR and content addressing (atproto.blue). We also stay aligned with a growing ecosystem in decentralized tech, which could mean more community support down the line.
Concretely, the steps would be:
Define our schema using Lexicon JSON files: Write lexicons for each major component of our system (for example, one for “Archive Node API”, one for “Indexing Service”, etc., each with their records and procedures). Use the lexicon syntax for records, queries, procedures, subscriptions as needed.
Use or build code generators: Utilize Bluesky’s lex-cli or the @atproto/lexicon npm package (npmjs.com) to generate TypeScript types and clients. For Go, we might write a small generator that reads the lexicon JSON and outputs Go type definitions and interface stubs (we can model this after how the TS one works).
Incorporate lexicon runtime: Use the concept of $type in our data objects so that wherever we serialize data to IPFS, we include the type reference (atproto.com). We might assign our own NSID namespaces (like com.ourproto.*) for our record types and methods.
Schema distribution: Publish our lexicon files in a well-known repository (and maybe on IPFS for permanence). Developers or nodes can retrieve them to know how to validate data. We might also include the lexicon definitions in our code repos for easy reference.
Automated docs: Since lexicon is convertible to JSON Schema/OpenAPI (atproto.com), we can create a build step to transform our lexicon files into an OpenAPI spec or nice Markdown documentation. This way we get human-readable docs without writing them by hand. Even if there’s no off-the-shelf converter yet, writing a script for it is feasible (Lexicon JSON is fairly straightforward to map to an OpenAPI structure).
Real-time: Implement subscription endpoints as defined in lexicon, likely using WebSockets or server-sent events. We can use the lexicon schema to validate the messages that go through these channels.
Type safety and testing: With generated types in TS and Go, our implementations and clients will catch mismatches early. We’ll also use the lexicon schemas in tests to validate that our serialized data (DAG-CBOR objects) indeed conforms to the schema definitions (this ensures our IPFS-stored data is always protocol-compliant); a minimal sketch of such a check follows below.
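As a sketch of that test-time check, tying $type dispatch and schema validation together (the com.ourproto.post NSID, the validator table, and its contents are hypothetical; in practice the validators would be compiled from the lexicon definitions):

```ts
import * as dagCbor from "@ipld/dag-cbor";

// Hypothetical registry mapping NSIDs to validation functions.
const validators: Record<string, (data: unknown) => boolean> = {
  "com.ourproto.post": (data) =>
    typeof (data as { text?: unknown })?.text === "string",
};

// Test helper: every DAG-CBOR block we emit must carry a known $type
// and satisfy that type's schema.
export function checkBlock(bytes: Uint8Array): void {
  const obj = dagCbor.decode(bytes) as { $type?: string };
  if (!obj.$type) throw new Error("record is missing $type");
  const validate = validators[obj.$type];
  if (!validate) throw new Error(`unknown $type: ${obj.$type}`);
  if (!validate(obj)) throw new Error(`record does not conform to ${obj.$type}`);
}
```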
In doing all this, we essentially adopt a solution that provides the “documentation, robustness, and sanity” we strive for. We’ll have one canonical source for what each piece of data means and what each API call does, and all code and documentation will derive from that source.
As a secondary recommendation, if for some reason lexicon proved too limiting or if we want broader adoption, OpenAPI/JSON Schema would be the next best choice. We could achieve much of the same goals with it, at the cost of slightly more manual integration work for real-time features and perhaps not having self-describing data by default. However, it leverages a huge ecosystem and might be easier for outsiders to adopt. It’s a valid approach especially if we think lots of third parties will integrate and we want to meet them in their comfort zone (HTTP+JSON). Even in this scenario, we could still use JSON Schema to validate IPFS content and maybe add a field in objects to indicate which schema version they use.
GraphQL is an excellent tool, but we would likely use it in a more limited capacity (e.g., as a convenient query mechanism for clients on top of the underlying protocol). It doesn’t on its own address content addressing or offline schema needs as directly as lexicon or OpenAPI with IPFS does. If we do adopt lexicon fully, we might not need GraphQL at all; the client SDK generated from lexicon would provide similar convenience (remote calls that look like local function calls).
In conclusion, to bring coherence and reliability to our new hypermedia protocol, building on an existing schema-driven approach is the way to go. The AT Protocol’s Lexicon stands out as a tailored solution for networks like ours, offering a unified type system and API description language that covers records, RPC, and streaming with content-addressable support (atproto.com). By leveraging such technology, we can save effort, ensure consistency across implementations, and provide future developers with clarity (via generated docs and code) on how to interact with our system. It will give our project a solid, extensible foundation.
All that remains is to dive in and start modeling our protocol in the chosen schema language. As we do so, we should keep the schemas under version control, perhaps even versioned on IPFS for permanence. Over time, this will become the single source of truth for our hypermedia protocol, and any change will be clearly reflected (and must maintain backward compatibility rules, which schema systems like lexicon and OpenAPI encourage). With this approach, we’ll effectively introduce a “protocol lexicon” for our network, bringing the much-needed meta-layer to explain and validate everything that happens in our IPFS+libp2p hypermedia world.