Decentralization: ketl's unorthodox dance

G'day Internet! In this blog post, I'll explain how we embraced data decentralization at ketl and how we pulled it off. We'll get fairly technical, so buckle up. For a higher-level look at how ketl works, read "Tea, tech, and secrets: how ketl onboards founders and VCs with zk." LFG 🚀

What's up with decentralization?

Most decentralized social networks take one of the following approaches:

  1. Build a centralized version first and then upgrade the pieces to be more and more decentralized until the network is fully decentralized.
  2. Build a new protocol/client/node software that enables decentralization.

At Big Whale Labs, we initially took the former approach when building Dosu (our previous rodeo in zk-social). However, after tinkering enough with zero-knowledge cryptography and the EVM, we realized there was no point in building a brand-new social network on top of a centralized layer that would already be legacy by launch day.

As for the federation route with custom decentralization software, the issue here is that you need to get enough people to run the nodes. And even when you get enough people, federation falls short of complete decentralization. Most of the time, your social graph (friends, followers, subscriptions) stays on one of the servers instead of being stored everywhere. God forbid you insult a node/server admin, and they wipe your data!

When building ketl, we followed a different strategy. The whole storage layer comes down to two technologies that are already decentralized enough to have no single point of failure: IPFS and EVM. In principle, users could run their own EVM nodes (Ethereum mainnet, a testnet, Polygon, Optimism, etc.) and IPFS nodes, and keep using ketl even if some server owner cut them off from the managed nodes. Basically, we're piggybacking on existing, mature technologies to achieve censorship resistance.

I don't know why nobody did it this way before us. Let's dive deeper into how it all works. All of the code is open source, and I'll reference parts of it throughout the article.

Storage

True to our initial goal of letting humans store whatever they want and making machines (clients) parse it, we started by running an IPFS node and letting users store any JSON.

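  @Post('/')
  async upload(@Body({ required: true }) file: unknown) {
    // Serialize whatever JSON the client sent and add it to our IPFS node
    const { cid } = await ipfs.add(JSON.stringify(file))
    // Pin the content so our node never garbage-collects it
    await ipfs.pin.add(cid)
    // Hand the CID back to the client, which later posts it on-chain
    return {
      cid: cid.toString(),
    }
  }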

As you can see, there's no authentication here: for now, anyone can store anything on our IPFS node. In the future, we'll add a check that requires a signature from a ketl user. That signature would only link the file to a ketl user, which happens anyway once they post the file to the network. After a user uploads JSON to our IPFS node, we automatically "pin" it. In IPFS lingo, pinning means our node will never garbage-collect the file (at least until it's unpinned). In return, the client gets a CID (content identifier), which uniquely identifies the file on IPFS.

What can go into these JSON files? Anything, really! The client must verify the format and render these files correctly. You can try to post a "malicious" file CID on the storage smart contract right now — the ketl app will simply filter it out. The main principle here (unlike other web3 social networks) is to allow humans to store chaos and make machines (clients) figure out how to display it. This also makes the app "future-proof" because we might store whatever today and add support for rendering it tomorrow!

Once a file is uploaded and pinned on our IPFS node, any node on the IPFS network can, in theory, fetch it. Here's what a post on ketl looks like:

{
  "text": "Any distance ceo",
  "extraText": "It seems the CEO (Luke Beard) of Anydistance has a reputation for being involved in inappropriate relationships with his employees, only to dismiss them afterwards. \n\nhttps://twitter.com/taylrn/status/1684411758709178370?s=46&t=-Ig6ajz0q7jqQ2z_dL_1Rg",
  "id": "4f2a12ea-edd6-4c50-a9bb-dddb64c051bb"
}
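Here is a minimal TypeScript sketch of the filtering idea, in which clients simply drop anything malformed. It assumes that the three fields shown above (text, extraText, id) make up the whole post format and that extraText is optional. The names `KetlPost`, `isKetlPost`, and `parsePosts` are hypothetical and do not come from the ketl codebase.

```typescript
// Hypothetical shape of a ketl post, inferred from the JSON example above
interface KetlPost {
  text: string
  extraText?: string // assumption: treated as optional
  id: string
}

// Type guard: accept only plain objects with the expected field types
function isKetlPost(value: unknown): value is KetlPost {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return false
  }
  const v = value as Record<string, unknown>
  return (
    typeof v.text === 'string' &&
    typeof v.id === 'string' &&
    (v.extraText === undefined || typeof v.extraText === 'string')
  )
}

// Parse raw IPFS payloads and silently drop anything that isn't a valid post
function parsePosts(payloads: string[]): KetlPost[] {
  const posts: KetlPost[] = []
  for (const raw of payloads) {
    try {
      const parsed: unknown = JSON.parse(raw)
      if (isKetlPost(parsed)) posts.push(parsed)
    } catch {
      // Not JSON at all: skip it like any other "chaos"
    }
  }
  return posts
}
```

Invalid entries are skipped on the client rather than rejected on-chain. A newer client could therefore widen `isKetlPost` later and render formats that older clients ignore.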

Since most of our data is unrelated, we could have safely used a non-relational database. But non-relational databases are so web2, and we want decentralization and censorship resistance! Permissionlessly replicating a MongoDB sounds like a fun weekend project, but it has so many downsides and quirks that pulling it off is a startup idea of its own. So where could we store unrelated CIDs? If you've been in web3 long enough, you know the most obvious answer: screw it, just store it all on a blockchain!

Obviously, storing data on a blockchain is expensive. Enter layer 2! Storing a small piece of info (basically a bytes32) there is so cheap that $1 worth of gas can last a user a long time. We went with Polygon Mumbai for testing, but we're open to experimenting with other layer 2 solutions, including Base and Optimism.
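To put "$1 worth of gas can last a long time" into numbers, here is a back-of-the-envelope calculator. Every input in the usage lines is a hypothetical placeholder, not a ketl measurement: gas used per post, gas price, and the token's USD price all vary by chain and by day. The formula itself only relies on 1 gwei being 10^-9 of the chain's native token.

```typescript
// Rough cost model: USD = gas used * gas price (gwei) / 1e9 * token price (USD)
const GWEI_PER_TOKEN = 1e9

function txCostUsd(gasUsed: number, gasPriceGwei: number, tokenUsd: number): number {
  return ((gasUsed * gasPriceGwei) / GWEI_PER_TOKEN) * tokenUsd
}

// How many transactions of this size fit into a given USD budget
function txPerBudget(budgetUsd: number, gasUsed: number, gasPriceGwei: number, tokenUsd: number): number {
  return Math.floor(budgetUsd / txCostUsd(gasUsed, gasPriceGwei, tokenUsd))
}

// Hypothetical inputs: 200,000 gas per post, 100 gwei, token at $0.60
console.log(txCostUsd(200000, 100, 0.6).toFixed(4)) // 0.0120
console.log(txPerBudget(1, 200000, 100, 0.6)) // 83
```

Gas per post is dominated by storage writes. Writing a fresh 32-byte slot costs at least 20,000 gas for the SSTORE alone, so it's worth plugging in real numbers for your chain before drawing conclusions.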

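However, we can't post plain CIDs to the EVM because the types don't match: a CID is a base58-encoded string. A CIDv0 (the "Qm..." strings you typically get back from ipfs.add) is a multihash, meaning a hash-function code, the digest length, and the digest itself. That lets us break it down into an EVM type like this: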

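struct CID {
  bytes32 digest;     // the hash itself
  uint8 hashFunction; // multihash code of the hash function (0x12 = sha2-256)
  uint8 size;         // digest length in bytes (0x20 = 32)
}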
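Converting between the "Qm..." string returned by IPFS and these three fields only takes a base58 decode. Below is a self-contained TypeScript sketch. It assumes CIDv0 (sha2-256 with a 32-byte digest, which fits in a bytes32) and assumes hashFunction and size hold the two multihash prefix bytes. The helper names are hypothetical rather than taken from ketl's code.

```typescript
// Bitcoin-style base58 alphabet, which IPFS uses for CIDv0 strings
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

// Mirrors the Solidity struct; digest is the 0x-prefixed hex of a bytes32
interface CIDStruct {
  digest: string
  hashFunction: number
  size: number
}

function base58Encode(bytes: Uint8Array): string {
  const digits: number[] = [] // base-58 digits, least significant first
  for (let k = 0; k < bytes.length; k++) {
    let carry = bytes[k]
    for (let i = 0; i < digits.length; i++) {
      carry += digits[i] * 256
      digits[i] = carry % 58
      carry = Math.floor(carry / 58)
    }
    while (carry > 0) {
      digits.push(carry % 58)
      carry = Math.floor(carry / 58)
    }
  }
  let out = ''
  // Each leading zero byte is encoded as a leading '1'
  for (let k = 0; k < bytes.length && bytes[k] === 0; k++) out += '1'
  for (let i = digits.length - 1; i >= 0; i--) out += ALPHABET[digits[i]]
  return out
}

function base58Decode(str: string): Uint8Array {
  const bytes: number[] = [] // bytes, least significant first
  for (let k = 0; k < str.length; k++) {
    let carry = ALPHABET.indexOf(str[k])
    if (carry < 0) throw new Error(`invalid base58 character: ${str[k]}`)
    for (let i = 0; i < bytes.length; i++) {
      carry += bytes[i] * 58
      bytes[i] = carry & 0xff
      carry >>= 8
    }
    while (carry > 0) {
      bytes.push(carry & 0xff)
      carry >>= 8
    }
  }
  // Restore leading zero bytes encoded as leading '1's
  for (let k = 0; k < str.length && str[k] === '1'; k++) bytes.push(0)
  return new Uint8Array(bytes.reverse())
}

// "Qm..." string -> struct fields to pass to the contract
function cidToStruct(cid: string): CIDStruct {
  const bytes = base58Decode(cid)
  // CIDv0 multihash: <0x12 sha2-256><0x20 length><32-byte digest>
  if (bytes.length !== 34 || bytes[0] !== 0x12 || bytes[1] !== 0x20) {
    throw new Error('not a CIDv0 (sha2-256, 32-byte) multihash')
  }
  let digest = '0x'
  for (let i = 2; i < bytes.length; i++) digest += ('0' + bytes[i].toString(16)).slice(-2)
  return { digest, hashFunction: bytes[0], size: bytes[1] }
}

// Struct read back from the contract -> "Qm..." string for IPFS
function structToCid({ digest, hashFunction, size }: CIDStruct): string {
  const hex = digest.replace(/^0x/, '')
  const bytes = new Uint8Array(2 + size)
  bytes[0] = hashFunction
  bytes[1] = size
  for (let i = 0; i < size; i++) bytes[2 + i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16)
  return base58Encode(bytes)
}
```

Storing the prefix bytes separately keeps the digest in a single bytes32 slot. It also leaves room for other hash functions later without changing the struct.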

Now we're talking! We can now store CIDs in Solidity, for example in a CID[] array, and that's essentially what we did, with each CID wrapped in a Post struct:

struct Post {
  address author;
  CID metadata;
  uint timestamp;
  uint threadId;
  uint replyTo;
  uint numberOfComments;
}

The main superclass for post collections in ketl is somewhat complicated, and you can read through the whole smart contract here. Essentially, we wanted both shared feeds where multiple users can post (e.g., t/startups and t/ketlTeam) and separate profile feeds for each user. Profile feeds aren't enabled yet, and we might even remove them later depending on our business direction. Overall, though, post collections allow users to:

  • Post a post to a collection of posts where users are allowed to post (another "post" for good measure; actually, take some more: "post," "post," and "post").
  • Change their profile data (a simple CID pointing to an uploaded profile JSON)
  • Pin posts (available to the ketl team on shared feeds and to users on their profile feeds)
  • Add comments to posts (comments are also a type of post)
  • Add reactions to posts and comments
  • Fetch all the added data through accessors and events

For instance, here's how you add a post:

function addPost(
  address sender,
  PostRequest calldata postRequest
)
  external
  onlyAllowedCaller
  onlyKetlTokenOwners(sender)
  onlyAllowedFeedId(postRequest.feedId)
{
  uint feedId = postRequest.feedId;
  // Get current post id
  uint currentPostId = lastPostIds[feedId].current();
  // Create the post
  Post memory post = Post(
    sender,
    postRequest.postMetadata,
    block.timestamp,
    currentPostId,
    currentPostId,
    0
  );
  // Add the post
  posts[feedId].push(post);
  // Add the participants
  participants[feedId][currentPostId].push(sender);
  participantsMap[feedId][currentPostId][sender] = true;
  // Emit the event
  emit PostAdded(feedId, lastPostIds[feedId].current(), post, sender);
  // Increment current post id
  lastPostIds[feedId].increment();
}

Let's break it down:

  • It's an external function, so it can be called from outside the smart contract.
  • However, the onlyAllowedCaller modifier means that only the parent OBSS storage contract can call it. That contract holds references to the other contracts and authorizes users.
  • Only senders that own a ketl attestation token can post (more on this later, when we talk about authentication).
  • Only allowed feeds can be posted to (currently the hidden t/dev plus the public t/startups and t/ketlTeam, though we're considering adding more feeds).
  • We bundle the data for a new post into a PostRequest struct that looks like this (it simplifies handling posts in multiple places):
struct PostRequest {
  uint feedId;
  CID postMetadata;
}
  • We create the post, push it to storage, record the sender as a participant of the new post, emit the PostAdded event, and increment the feed's post counter.
  • After the post is added, clients receive the post event and refresh the local cache of posts.
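To see how addPost and the client cache fit together, here is a small in-memory TypeScript model. It is a hypothetical sketch, not the contract. It skips the onlyAllowedCaller and onlyKetlTokenOwners checks, uses plain numbers for ids and timestamps, and replaces on-chain events with a listener callback. `FeedStorage`, `PostCache`, and `participantsOf` are illustrative names.

```typescript
// TypeScript mirrors of the Solidity structs (digest kept as a hex string)
interface CID {
  digest: string
  hashFunction: number
  size: number
}

interface Post {
  author: string
  metadata: CID
  timestamp: number
  threadId: number
  replyTo: number
  numberOfComments: number
}

interface PostAddedEvent {
  feedId: number
  postId: number
  post: Post
  sender: string
}

// Hypothetical in-memory model of the addPost flow
class FeedStorage {
  private posts = new Map<number, Post[]>()
  private lastPostIds = new Map<number, number>()
  private participants = new Map<string, string[]>() // key: "feedId:postId"
  private listeners: Array<(e: PostAddedEvent) => void> = []

  constructor(private allowedFeedIds: number[]) {}

  onPostAdded(listener: (e: PostAddedEvent) => void): void {
    this.listeners.push(listener)
  }

  participantsOf(feedId: number, postId: number): string[] {
    return this.participants.get(`${feedId}:${postId}`) ?? []
  }

  addPost(sender: string, feedId: number, metadata: CID, timestamp: number): void {
    // Plays the role of the onlyAllowedFeedId modifier
    if (this.allowedFeedIds.indexOf(feedId) === -1) throw new Error('feed not allowed')
    const postId = this.lastPostIds.get(feedId) ?? 0
    // As in the contract, threadId and replyTo are set to the post's own id
    const post: Post = { author: sender, metadata, timestamp, threadId: postId, replyTo: postId, numberOfComments: 0 }
    const feedPosts = this.posts.get(feedId) ?? []
    feedPosts.push(post)
    this.posts.set(feedId, feedPosts)
    this.participants.set(`${feedId}:${postId}`, [sender])
    // "Emit" the event, then bump the counter, in the same order as the contract
    for (const listener of this.listeners) listener({ feedId, postId, post, sender })
    this.lastPostIds.set(feedId, postId + 1)
  }
}

// Client-side cache refreshed from PostAdded events
class PostCache {
  private byFeed = new Map<number, Map<number, Post>>()

  apply(event: PostAddedEvent): void {
    let feed = this.byFeed.get(event.feedId)
    if (!feed) {
      feed = new Map<number, Post>()
      this.byFeed.set(event.feedId, feed)
    }
    // Keyed by post id, so replaying the same event is harmless
    feed.set(event.postId, event.post)
  }

  count(feedId: number): number {
    const feed = this.byFeed.get(feedId)
    return feed ? feed.size : 0
  }
}
```

Keying the cache by post id makes event handling idempotent. That helps when a client re-fetches overlapping block ranges after a reconnect.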

This is it, folks! Just like that, you can store data in a decentralized place (IPFS) and structure it on a decentralized ledger (EVM). No formalities, no limitations, and full liberty, as Satoshi envisioned.

Authentication

ketl was built with pseudonymity in mind: everyone is anonymous but verified. For instance, you can post as a YC founder, but no one can tell precisely which YC founder you are. Authentication consists of two main parts:

  1. Minting a ketl attestation token to get verified with ZK (we'll dig deeper into this in an upcoming blog post).
  2. Authenticating the author of a post before the post can be added to the blockchain.

The latter is, fortunately, handled by the EVM itself! See, to post to ketl, you need an EVM-compatible private key. Every transaction you submit is signed with that key. The contract can therefore read msg.sender (or _msgSender() when the transaction is relayed through OpenGSN) and know it's the authenticated, verified sender of the transaction! We let the EVM do its magic and reap the rewards.
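The _msgSender() part is worth unpacking. OpenGSN recipients follow ERC-2771. When a call arrives from the trusted forwarder, which has already checked the user's signature, the user's 20-byte address is appended to the calldata, and the recipient reads it back from the last 20 bytes. Otherwise, msg.sender is used as-is. Below is a TypeScript model of that rule operating on hex strings, purely to illustrate the logic; `resolveSender` and `appendSender` are hypothetical names, not OpenGSN APIs.

```typescript
// Strip an optional 0x prefix and normalize case
function strip0x(hex: string): string {
  return hex.replace(/^0x/, '').toLowerCase()
}

// Forwarder side (after verifying the user's signature): append the signer's address
function appendSender(calldata: string, signer: string): string {
  return '0x' + strip0x(calldata) + strip0x(signer)
}

// Recipient side: the ERC-2771 rule behind _msgSender()
function resolveSender(msgSender: string, calldata: string, trustedForwarder: string): string {
  const data = strip0x(calldata)
  const fromForwarder = strip0x(msgSender) === strip0x(trustedForwarder)
  // 20 bytes = 40 hex characters
  if (fromForwarder && data.length >= 40) {
    return '0x' + data.slice(-40)
  }
  // Direct call (or an untrusted relayer): msg.sender is the answer
  return msgSender
}
```

The appended address is trusted only because the forwarder is trusted. Any other caller that tacks an address onto its calldata is simply treated as msg.sender.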

Ecksuseme, ackchyually...

"But wait," you'd say, "who pays for the gas?" Smart question! Big Whale Labs does. The gas needed to support this whole structure is minuscule compared to the cost of traditional web2 infrastructure, so I honestly can't see a rational reason to do it any other way. We use OpenGSN to cover all the gas fees. This is essential because everybody on ketl uses burner wallets, and funding a burner wallet is crazy cumbersome.

The best part about ketl is that you don't notice it's web3 when using it! There are no signature prompts, popups, or anything. The only hint that the app uses a blockchain for storage is that your login is the burner wallet's seed phrase. Users don't notice the cumbersomeness of zero knowledge either. It's all hidden, yet as powerful as ever! ZK in ketl warrants a technical post of its own, so I won't go into the details here.

"Isn't blockchain slow, though?" Another valid question! Funny thing: before implementing OBSS (Open Blockchain Storage System, which is what we call the storage layer), we had heated debates about whether it would ever match the speed of web2. It turns out that indexers like The Graph, or simply managed RPCs like Alchemy or Infura, do some weird caching magic on event queries. That speeds up data fetching so much that, at some point, we joked that someone on the team had replaced the decentralized backend with a good old relational database.

Most of the drawbacks of storing data on a blockchain seem to have evaporated by 2023, which might explain why we're the first to build a viable social network on EVM without restricting the format of stored data.

"But is this scalable?" Well, the truth is, we don't know. So far, so good, and I expect we'll be fine until we hit around 10,000 DAU (daily active users). At that point, we'll absolutely have to give users multiple RPC and IPFS nodes and let them specify their own. However, the underlying principles will stay the same and stay straightforward. I don't expect ketl or OBSS to get much more complicated.

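Here is a minimal sketch of the plan for multiple RPC and IPFS nodes plus user-specified ones. User endpoints are tried first, then managed ones, and the first one that answers wins. It is written synchronously for brevity; real RPC and IPFS calls are async, but the same loop works with await. Every function name and URL in the sketch is a placeholder, not a ketl endpoint.

```typescript
// User-specified endpoints take priority; duplicates are dropped, order is kept
function orderEndpoints(userEndpoints: string[], managedEndpoints: string[]): string[] {
  const ordered: string[] = []
  for (const url of userEndpoints.concat(managedEndpoints)) {
    if (ordered.indexOf(url) === -1) ordered.push(url)
  }
  return ordered
}

// Try each endpoint in order and return the first successful result
function withFallback<T>(endpoints: string[], attempt: (url: string) => T): T {
  const errors: string[] = []
  for (const url of endpoints) {
    try {
      return attempt(url)
    } catch (err) {
      // Remember why this endpoint failed, then move on to the next one
      errors.push(`${url}: ${String(err)}`)
    }
  }
  throw new Error(`all endpoints failed:\n${errors.join('\n')}`)
}
```

Putting the user's own nodes first keeps the censorship-resistance promise. If the managed nodes disappear, nothing changes for someone who has configured their own.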
Conclusion

The solution described above might sound too simple to be true. You could poke so many holes in it that it would never get built. However, being rebels at Big Whale Labs, we said, "screw it, let's do it!" We tried, and, amazingly, it worked. No costly infrastructure and no hard-to-maintain federated nodes. We mostly took off-the-shelf components, sprinkled on a bunch of zero knowledge, added our cryptography expertise, and voilà!

The sheer number of people already using ketl gives me anxiety. We've come a long way since founding Big Whale Labs, and it seems like all the experiments are finally paying off! ketl is no longer a proof of concept but a functioning social network built on IPFS and EVM.

Want to try it out yourselves? Go lurk at ketl.xyz. And who knows where this journey will take us next?