
The Future of search

Semantic search using vector databases and MCP

    One of the key parts of any knowledge base is being able to find similar ideas so that the creator can expand on them, refute them, or build on top of them.

    The problem is that oftentimes the idea is fuzzy and too complex to distill into a search query. That's why keywords exist: the creator puts some important words at the top of the article to hint readers about the content. However, this has flaws, since it requires effort from the original creator, and even with that effort it is impossible to summarize complex concepts in just a few words.

    That is why the search has to be semantic, taking into account the meaning of words. To do that we translate the words into a series of vectors, so that we can later measure their relative distance from the (also vectorized) query.

    Vectorize data

    First we have to vectorize the texts using an embedding model. These models translate text into a numeric representation so that we can later apply a distance algorithm against a query (which also needs to be vectorized). This vectorizing process is multilingual, so we can support the major languages: texts expressing the same idea end up with vectors that are close to each other regardless of the language used. This is one of the benefits over traditional search: no matter what language you use in the query, you will find results with similar ideas in any language.
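    For intuition, the sketch below shows the kind of distance measure involved: cosine similarity between two embedding vectors. This is a minimal, model-agnostic example; a real implementation would operate on the vectors produced by the embedding model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two embedding vectors.

    Values close to 1.0 mean the two texts carry similar meaning;
    values close to 0.0 mean they are unrelated.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```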

      We can use llama.cpp to generate the embeddings. This comes in handy because we can ship a llama server in the main daemon and call it locally through an OpenAI-compatible API. Once we have the vectors we can store them in the database. A good enough embedding model would be BGE-M3. The drawback of this approach is that, apart from llama.cpp, we would also need to ship the GGUF file along with the binary (the lower-resolution model is ~340MB).
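      As a rough sketch, the call could look like the following, assuming a local llama-server running with its OpenAI-compatible embeddings endpoint enabled (the port and model name below are placeholders, not part of the actual setup):

```python
import json
import urllib.request

# Hypothetical local endpoint; llama-server exposes an OpenAI-compatible
# /v1/embeddings route when started in embedding mode.
LLAMA_URL = "http://127.0.0.1:8080/v1/embeddings"

def embed(text: str) -> list[float]:
    """Ask the local llama.cpp server for the embedding of `text`."""
    payload = json.dumps({"input": text, "model": "bge-m3"}).encode("utf-8")
    req = urllib.request.Request(
        LLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response: a list of embedding objects under "data".
    return body["data"][0]["embedding"]
```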

      Another solution would be to use the ONNX Runtime and an ONNX embedding model. Those are typically lighter, but we would have to write some glue logic since they do not offer an out-of-the-box embedder: tokenize → build input tensors (ids, mask) → run the session → pool the token embeddings (most sentence-transformers exports include a pooling op).
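      A minimal sketch of that glue logic, assuming the onnxruntime and tokenizers packages and a sentence-transformers-style export that we mean-pool ourselves (the file names and the exact input/output names depend on how the model was exported):

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Placeholder paths: whatever model/tokenizer files ship with the binary.
tokenizer = Tokenizer.from_file("tokenizer.json")
session = ort.InferenceSession("model.onnx")

def embed(text: str) -> np.ndarray:
    """Tokenize, run the ONNX session, then mean-pool the token embeddings."""
    enc = tokenizer.encode(text)
    input_ids = np.array([enc.ids], dtype=np.int64)
    attention_mask = np.array([enc.attention_mask], dtype=np.int64)

    # Most sentence-transformer exports take these two tensors;
    # some also expect token_type_ids.
    outputs = session.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask}
    )
    token_embeddings = outputs[0]  # assumed: [batch, seq_len, hidden]

    # Mean pooling over the tokens that are not padding.
    mask = attention_mask[..., None].astype(np.float32)
    pooled = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)
    return pooled[0]
```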

    We can go with option number one while minimizing the impact of shipping a multilingual GGUF. The Granite Embedding 107m is good enough: it supports 12 major languages with a minimal footprint (~120MB). The initial vectorization can be done at any time, since the model just requires the plain text without prefixes, which we already have in the database (the fts5 table). However, in every indexing pass we can afford to spend a bit more time on embedding, so we need to differentiate a reindex from a normal indexing.
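    A rough sketch of that initial backfill, assuming the hypothetical embed() helper from the earlier sketches, a plain-text column in the full-text table, and sqlite-vec as the vector store (the table and column names here are made up for illustration):

```python
import sqlite3
import sqlite_vec  # pip install sqlite-vec

db = sqlite3.connect("daemon.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# Hypothetical vector table; the dimension must match the embedding model
# (384 is used here as an example).
db.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS doc_embeddings "
    "USING vec0(embedding float[384])"
)

# Backfill: walk the existing plain text and embed each document.
for doc_id, text in db.execute("SELECT rowid, raw_content FROM fts"):
    vector = embed(text)  # embed() is the helper sketched earlier
    db.execute(
        "INSERT OR REPLACE INTO doc_embeddings(rowid, embedding) VALUES (?, ?)",
        (doc_id, sqlite_vec.serialize_float32(vector)),
    )
db.commit()
```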

    Integration steps

      Sqlite-vec
      Init llama.cpp with the model
      Indexing process
      Reindexing process
      Glue Logic
      Tests
      Semantic search API (see the sketch below)
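    Tying the pieces together, the query side of the semantic search API could look roughly like this, reusing the hypothetical embed() helper and the doc_embeddings table from the sketches above:

```python
import sqlite3
import sqlite_vec

def semantic_search(db: sqlite3.Connection, query: str, limit: int = 10):
    """Embed the query and return the closest documents by vector distance."""
    query_vec = sqlite_vec.serialize_float32(embed(query))
    return db.execute(
        "SELECT rowid, distance FROM doc_embeddings "
        "WHERE embedding MATCH ? AND k = ? ORDER BY distance",
        (query_vec, limit),
    ).fetchall()
```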