What is the HyVM and why is it so powerful for DeFi?

The HyVM is a smart contract written in Huff that acts as an EVM emulator built on the EVM. Find out how it will change DeFi.

Rudy Kadoch

Oct 18, 2023 • 8 min read

Welcome to the first of our series of technical articles exploring the inner workings of the Mass app and what makes it so unique. First up we have the HyVM - the beating heart of Mass.

TLDR: If you're only interested in a technical explanation of how the HyVM is implemented, jump straight to the "Deep dive into the HyVM" section.

🚀 Some context

There are plenty of amazing DeFi protocols out there. DeFi-natives know about them and use them on a daily basis. BUT:

Interacting with them requires you to learn how they work.
Interacting with multiple protocols in a single transaction is often tough, if not impossible.
Once you've set up a complex strategy, it is tedious to reverse-engineer what you've done and get out of it manually.
For non-DeFi natives, complex strategies are simply out of reach, even when the scheme is explained clearly by someone they trust. There is just too much to learn. Too many platforms to understand to gain the confidence to set off on their own.

At Mass, we aim to provide a one-stop-shop for all DeFi protocols and to abstract that complexity away.

This leads us to wonder:

How do we provide maximum flexibility and features, without having to audit, deploy and maintain nightmarish smart contracts that are essentially supposed to do everything?

That is why we built the HyVM.

❓ What is the HyVM?

The HyVM is a smart contract that acts as a virtual machine. It is able to dynamically execute instructions (EVM bytecode), without having to deploy a dedicated contract to execute them.

In simpler terms, the HyVM avoids the need to create contracts for temporary & evolving use cases while providing more flexibility!

It is particularly well adapted for contracts with infrequent function calls.

The HyVM solves a big issue many dApps face every single day.

why does multicall have to actually exist on-chain?

worded differently:

how can we execute arbitrary non-state-changing code in a live EVM context?
— z80 (@0xz80) July 20, 2023

Of course, allowing users to run arbitrary bytecode on a contract that multiple users share would not be a feasible solution due to security risks. That is why you'll need a security model based on user segregation (which prevents using such an approach for protocols requiring all users' assets to be stored in a single contract, for instance).

If, however, your use case allows for a configuration where users have their own private contract, you will gain maximum flexibility with minimal on-chain maintenance by implementing the HyVM.

🔐 Mass’ security model

We've achieved user asset segregation with a simple approach. When a user wants to interact with Mass, here is what happens:

Each user creates their own segregated storage, called a Mass Smart Account (MSA). Each MSA acts as a wallet owned by one user.
Each user deploys their own copy of the HyVM, which is a proxy to a single implementation.

The freshly created copy of the HyVM can only be called by its associated MSA.

The HyVM is the only contract that the MSA is authorized to call via delegatecall.

Let’s now look at a practical example:

🔁 Without the HyVM vs with the HyVM

Without The HyVM:

When not using an MSA running the HyVM, a user can call contract functions directly or do a multicall with the assistance of a helper contract. However, both approaches come with limitations.

For example, in this scenario, the user wants to perform the following actions:

1. Approve UniswapV3 router to spend half of my USDT.
2. Swap USDT into WETH.
3. Approve Morpho’s contracts to supply WETH.
4. Supply WETH into Morpho.

The user has two options for executing these actions without the HyVM:

Make 4 separate calls - 4 transactions (2 approvals + 2 calls).
Make a single delegate call to a contract that holds the logic for all four steps - 2 transactions to sign (1 approval + 1 delegate call).

Now, without the HyVM, if we want to make this transaction again but want to spend 1/4 of our USDT at step 1, we would need to update and deploy a new contract to hold the updated logic.

With The HyVM

This is where the HyVM coupled with the MSA is so powerful. The user is able to give the logic as a parameter.

The user can pass pre-compiled code as a parameter to the function rather than deploying a new contract. As a result, interacting with smart contracts through the MSA becomes extremely flexible: changing a strategy only means passing different bytecode.

📚 Some prerequisites about the EVM

EVM, contracts, languages, bytecode

If you learned how to program smart contracts, you likely did so using Solidity. But smart contracts and Solidity ARE NOT the same thing.

You might have encountered the barbaric acronym "EVM" (standing for Ethereum Virtual Machine)… this is the smart-contract execution engine that powers all EVM compatible blockchains.

Solidity is just a programming language (like Vyper or Huff, to name a few) that is compiled into "bytecode."

This "bytecode" is run by the EVM, which doesn't know anything about Solidity.

Let's not dive into how to deploy a contract (i.e. the constructor in Solidity, which executes code during the deployment). To put it simply, to be run, this bytecode needs to be deployed to a static address. Anyone calling this address later is then able to run it.

How the EVM execution works

nb: If you know how other VMs work, like the CLR (the VM behind languages like C# or F#) or the JVM (behind Java or Kotlin), you'll feel right at home: EVM bytecode is very similar to MSIL or JVM bytecode, although much more straightforward.

A bytecode is a stream of instructions, each one represented by one byte (except push1...32 which take 2…33 bytes).
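To make the "one byte per instruction, except push" rule concrete, here is a minimal disassembler sketch in Python. It is not part of the HyVM; the `disassemble` helper and its tiny opcode table are illustrative names, and only a handful of opcodes are covered. The opcode values themselves (0x60 to 0x7f for push1 to push32, 0x01 for add) are the standard ones listed on evm.codes.

```python
# Toy disassembler: only knows the few opcodes mentioned in this article.
OPCODE_NAMES = {0x00: "stop", 0x01: "add", 0x51: "mload", 0x52: "mstore", 0x58: "pc"}


def disassemble(bytecode: bytes) -> list[str]:
    """Turn raw EVM bytecode into a list of human-readable instructions."""
    instructions = []
    i = 0
    while i < len(bytecode):
        op = bytecode[i]
        if 0x60 <= op <= 0x7F:
            # push1..push32: the opcode is followed by 1..32 immediate bytes,
            # so the whole instruction takes 2..33 bytes.
            n = op - 0x5F
            operand = bytecode[i + 1 : i + 1 + n]
            instructions.append(f"push{n} 0x{operand.hex()}")
            i += 1 + n
        else:
            # Every other instruction is exactly one byte long.
            instructions.append(OPCODE_NAMES.get(op, f"unknown(0x{op:02x})"))
            i += 1
    return instructions


print(disassemble(bytes.fromhex("6028600201")))
```

The five bytes `60 28 60 02 01` decode to three instructions: two pushes (2 bytes each) and one add (1 byte).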

You can find a complete list of all instructions at evm.codes

Each instruction has a well-defined behavior, which can do one or more of three things:

Consume values from "the stack."
Push a value on "the stack."
Affect the environment: read/write data from "the memory", read/write data from "storage", read data from call data, call other contracts, get data from other sources (We won't go into too much detail).

"the stack" is, as its name suggests, just a stack of 32-byte values that is THE thing you manipulate with instructions. You give instructions on what to do through this stack. For instance, computing 40+2 would be done via this stream of instructions:

push1 40 👉 will push the value "40" on stack
push1 2  👉 will push the value "2" on stack
add  👉  will pop two values from stack, and push the result of their addition on stack
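The three instructions above can be simulated in a few lines of Python. This is a toy sketch that only understands push1 and add; the `run_stack` name is made up for this example.

```python
WORD = 2**256  # stack values are 32 bytes wide, so additions wrap modulo 2**256


def run_stack(bytecode: bytes) -> list[int]:
    """Execute a stream of push1/add instructions and return the final stack."""
    stack = []
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == 0x60:  # push1: the next byte is the value to push
            stack.append(bytecode[pc])
            pc += 1
        elif op == 0x01:  # add: pop two values, push their sum
            a = stack.pop()
            b = stack.pop()
            stack.append((a + b) % WORD)
        else:
            raise ValueError(f"opcode 0x{op:02x} is not supported by this sketch")
    return stack


# push1 40, push1 2, add
print(run_stack(bytes([0x60, 40, 0x60, 2, 0x01])))  # [42]
```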

"the memory" is simply what you would expect: a linear space where you can store things to be retrieved later in the execution. Several instructions allow writing to memory.

"the storage" is a key-value store where you can save things. It persists across executions.
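The difference between memory and storage can be sketched like this in Python. It is a simplified model with hypothetical names (`ToyContract`, `mstore`, `mload`): memory starts empty on every call, while storage survives between calls. Real EVM memory also grows in 32-byte words and costs gas, which this sketch ignores.

```python
def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write a 32-byte word to memory, growing it if needed."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    memory[offset:end] = value.to_bytes(32, "big")


def mload(memory: bytearray, offset: int) -> int:
    """Read a 32-byte word from memory (unwritten bytes read as zero)."""
    return int.from_bytes(memory[offset : offset + 32].ljust(32, b"\0"), "big")


class ToyContract:
    def __init__(self) -> None:
        self.storage: dict[int, int] = {}  # persists between calls

    def call(self, slot: int, value: int) -> int:
        memory = bytearray()  # fresh, empty memory for every execution
        mstore(memory, 0x00, value)
        previous = self.storage.get(slot, 0)  # unset slots read as zero
        self.storage[slot] = mload(memory, 0x00)
        return previous


contract = ToyContract()
print(contract.call(1, 7))  # 0: nothing was stored in slot 1 yet
print(contract.call(1, 9))  # 7: the value written by the previous call
```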

This section does not aim to be a fully-fledged tutorial about the internals of the EVM. I'll leave it at that. It should be enough to give you hints about how it works or relates to other VMs you already know. I'd suggest digging deeper into dedicated resources if you want to learn more.

Now let's dig into the real stuff.

🔍 Deep dive into the HyVM

👉 TLDR: The main contract you'll want to explore is here.

The HyVM is a VM written in Huff.

When dealing with VMs, we often encounter the following concepts, which will be explained in this section.

The "host" program itself is responsible for running the virtualized environment. In our case, it is the HyVM contract (running itself inside the EVM).
The "guest" program is the contract that the host runs without it being able to notice that it is not running in a "normal" environment (it won't notice that it is not a deployed contract). Here, our guest program is passed as call data to the HyVM.
Host & guest memory segregation: the host and the guest need to store information in memory, but they can't know about each other.

A word about memory

A classical EVM contract will be able to allocate memory addresses from 0 to infinity. Meaning that when running as a guest in our VM, it MUST be able to do this if we want it to run as expected.

That said, the host also needs to store things in memory, which must not be accessible to the guest. This begs the question: Where do we put our host's memory when guests need to have [0; +∞] accessible?

To solve this, we have to know how much memory space our host needs (this must be a fixed quantity) and offset the guest's memory by that amount. In other words: The 0x00 address of the guest will, in reality, be stored at a higher address. Then, we'll fix addresses pushed to instructions accessing memory by that amount (more on that later).
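Here is a small Python sketch of that offsetting idea. The value of `HOST_MEMORY_SIZE` is a hypothetical placeholder (the real HyVM reserves its own fixed amount), and `fix_memoffset`, `write_word` and `read_word` are illustrative helpers, not the HyVM's actual code.

```python
HOST_MEMORY_SIZE = 0x400  # hypothetical: number of bytes reserved for the host


def fix_memoffset(guest_address: int) -> int:
    """Translate a guest memory address into the real memory address."""
    return guest_address + HOST_MEMORY_SIZE


def write_word(memory: bytearray, address: int, value: int) -> None:
    """Write a 32-byte word at a real address, growing memory if needed."""
    end = address + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    memory[address:end] = value.to_bytes(32, "big")


def read_word(memory: bytearray, address: int) -> int:
    """Read a 32-byte word at a real address (unwritten bytes read as zero)."""
    return int.from_bytes(memory[address : address + 32].ljust(32, b"\0"), "big")


memory = bytearray()
write_word(memory, 0x00, 0xAAAA)                 # host data at real address 0x00
write_word(memory, fix_memoffset(0x00), 0xBBBB)  # guest writes to "its" 0x00

print(hex(read_word(memory, 0x00)))                 # 0xaaaa: host data is untouched
print(hex(read_word(memory, fix_memoffset(0x00))))  # 0xbbbb: what the guest sees at 0x00
```

The guest still believes it owns addresses starting at 0x00, while the host keeps the lower region for itself.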

Fun fact: OSs (Linux, Windows, etc.) isolate kernel memory from user memory using the same kind of technique.

Breaking down the HyVM codebase

There are several noticeable parts:

MAIN() macro… the entry point
CONTINUE() macro… the "run the next opcode" macro
All the instruction implementation labels (labels beginning with op_)
FIX_MEMOFFSET() macro… responsible for fixing the memory addresses of the guest

Let's dive into this.

1️⃣ MAIN() does only two things:

Set up the host memory by loading a so-called "jump-table", which is a mapping between opcodes and their implementation labels
Start the execution by calling the first CONTINUE()

2️⃣ CONTINUE() is responsible for running the next operation

It loads the "execution pointer" from memory (i.e., our execution cursor = the address of the next instruction to be executed in our bytecode)
Then, it moves this execution pointer to the next instruction
Finally, it jumps to the right "instruction implementation" using the jump-table loaded by MAIN()
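The fetch / advance / dispatch cycle described above can be sketched in Python as follows. This is a loose analogy, not a translation of the Huff code: the names (`run_guest`, `continue_`, `jump_table`) are made up, only three opcodes are supported, and a `while` loop stands in for the fact that each HyVM instruction implementation ends by running CONTINUE() itself.

```python
def run_guest(guest_code: bytes) -> list[int]:
    """Run guest bytecode with a jump-table dispatch loop; return the final stack."""
    stack: list[int] = []
    state = {"exec_ptr": 0, "running": True}  # the host's bookkeeping

    def op_stop() -> None:
        state["running"] = False

    def op_add() -> None:
        a, b = stack.pop(), stack.pop()
        stack.append((a + b) % 2**256)

    def op_push1() -> None:
        # The immediate byte follows the opcode, so skip over it afterwards.
        stack.append(guest_code[state["exec_ptr"]])
        state["exec_ptr"] += 1

    # Mapping between opcodes and their implementations (unsupported opcodes
    # simply raise a KeyError in this sketch).
    jump_table = {0x00: op_stop, 0x01: op_add, 0x60: op_push1}

    def continue_() -> None:
        opcode = guest_code[state["exec_ptr"]]  # 1. load the execution pointer
        state["exec_ptr"] += 1                  # 2. move it to the next instruction
        jump_table[opcode]()                    # 3. jump to the implementation

    # Running off the end of the code behaves like stop.
    while state["running"] and state["exec_ptr"] < len(guest_code):
        continue_()
    return stack


# push1 40, push1 2, add, stop
print(run_guest(bytes([0x60, 40, 0x60, 2, 0x01, 0x00])))  # [42]
```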

3️⃣ The instruction implementations

This part is where the actual guest execution will take place.

The key thing to understand is that when jumping to an instruction implementation, the host will have left no value of its own on the stack => All the values on stack will have been pushed by the guest contract.

Implementing most operations will be as simple as just running the corresponding EVM instruction and then jumping to the next one. That's why most operation implementations look stupidly simple and are almost always the same. For instance, add:

op_add:
	add   👈  just adds the two values on stack
	CONTINUE() 👈 jump to next instruction

That said, as stated in the "a word about memory" section, all instructions that read from or write to memory need to be fixed to account for the host's memory. This is done by running the FIX_MEMOFFSET() macro on all stack values that refer to a memory pointer.

For instance, mload:

op_mload:
	FIX_MEMOFFSET()   👈 fixes the memory pointer on top of stack
	mload  👈 executes the actual mload instruction on fixed address
	CONTINUE() 👈 jump to next instruction

Likewise, other less common instructions like PC or codecopy are implemented differently because their underlying value is virtualized: PC accesses the program counter, which is actually stored in memory, and codecopy has to load code from call data rather than from the actual code, which is the HyVM's own code.
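To illustrate what "virtualized" means here, below is a hedged Python sketch of how a guest-side PC and codecopy could behave. It assumes the execution pointer has already been advanced past the opcode when the implementation runs, and it reuses a hypothetical `HOST_MEMORY_SIZE` offset; the actual HyVM implementation may track these values differently.

```python
HOST_MEMORY_SIZE = 0x400  # hypothetical: number of bytes reserved for the host


def guest_pc(exec_ptr_after_fetch: int) -> int:
    """Value a guest `pc` should see: the position of the pc opcode itself.

    Assumes the host already moved its execution pointer one byte forward.
    """
    return exec_ptr_after_fetch - 1


def guest_codecopy(memory: bytearray, guest_code: bytes,
                   dest: int, offset: int, size: int) -> None:
    """Copy guest code (received as call data) into guest memory.

    Bytes requested past the end of the code are zero-filled, as with the
    native codecopy, and the destination is shifted past the host's memory.
    """
    chunk = guest_code[offset : offset + size]
    chunk = chunk + bytes(size - len(chunk))
    start = dest + HOST_MEMORY_SIZE
    end = start + size
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    memory[start:end] = chunk


guest_code = bytes.fromhex("6028600201")
memory = bytearray()
guest_codecopy(memory, guest_code, dest=0, offset=3, size=4)
print(memory[HOST_MEMORY_SIZE : HOST_MEMORY_SIZE + 4].hex())  # 02010000
```

The important point is the source of the copy: the guest gets its own bytecode back, never the bytecode of the HyVM that is actually executing.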

4️⃣ FIX_MEMOFFSET()

Once you understand everything above, its implementation is pretty straightforward, so I won't dive into more details.

Got Questions?

We will soon be releasing our DeFi as a Service (DaaS) offer, where other projects can use our entire stack to build lego finance protocols requiring the highest degree of composability.

Here are some protocols that could benefit from the HyVM:

Dex aggregators (Paraswap, 0x, 1inch, etc.)
Perps aggregators (Mux)
Facilitators (InstaDapp, DeFiSaver, Rhino)
Yield protocols (StakeDAO, Convex, Yearn)

Everything will, of course, be open-source.

Want to know more about the HyVM? Would you like to collaborate? Want to help change DeFi?

We’d love to chat! Swing by our Discord server here and say Hi on the #Mass-Tech-Talk channel!