Thoughts on virtual machines

June 27, 2023

I think about VM's a lot.

I don't mean the kind you run an OS in (a system virtual machine), I'm talking about process virtual machines.

Maybe you do too, or you wanna know more.

So, I'm going to dump the things I've discovered into a blog post, to help inform others how the sausage is made.

And yes, I have made VM's before. I have made both stack and register-based VM's. I'm actually thinking about making a JIT'ed VM, just to brush up my low-level platform skills.

Register vs stack-based VM's

For those who don't know, there are two main kinds of VM's: stack-based and register-based.

Stack-based VM's

Stack-based VM's manipulate a stack to store all data. It's not the most efficient way, but it works and is very simple to implement. CPython, PyPy, Perl, Ruby, CLR, PostScript, the JVM, and most others you'll encounter are stack-based.

Register-based VM's

Register-based VM's use registers to store data. They're not quite as common because they can be quite complex (more on that in a bit). A few register-based VM's you may know of are LLVM, BEAM (used by Elixir and Erlang), Dalvik, and Lua.

Side note

Many register-based VM's do have a stack to spill registers. Likewise, with a JIT, stack-based VM's often do register allocation. The difference can be quite muddied and sometimes not always so clear.

Personal opinion

You know, I really like register-based VM's in concept, but I can kinda see why register VM's aren't as popular.

I like registers much more than I like a stack. This is just a personal preference, but I've always found registers more easy to reason about than stacks.

Register-based VM's tend to be faster, because they're closer to how the machine actually works. However, they're harder to write a good JIT for.

JIT for a stack-based VM

A stack-based VM JIT is relatively straightforward. You can just “cache” hot stack elements in a register and operate on them, without having to repeatedly manipulate the stack. You know how many registers the hardware has, so you know how much you can cache. And when you need to spill back to the stack, it's relatively straightforward.

JIT for a register-based VM

This isn't so clear.

You often have thousands of registers to contend with, and you need to decide what needs to be spilled and what doesn't.

You get 6-7 registers on x86_32 to play with, so register allocation is obviously much more annoying on this platform. x86_64 has 16, and most RISC-y architectures have at least 32, so it's not as big of a problem on these.

Deciding where things should go on the actual hardware stack, what to keep in registers, etc. is not as straightforward. It's not like, intractable, it's just much more of an art (although optimal register allocation is an NP-complete problem, you don't need perfection, you just need it to be good enough).

Conclusion

I don't really have a conclusion for anything. I think stack-based and register-based VM's both have a place in this world. I think they're both neat! I have my biases, but I think that VM's are fascinating and I enjoy thinking and reasoning about them.

I hope I've learnt you some stuff about them too! 💜

If you have any questions, comments, corrections, etc. drop me a line on fedi. This blog is also available on Fedi at @elizafox@blog.elizafox.space.

— Elizabeth Ashford (Elizafox) Fedi (elsewhere): @Elizafox@social.treehouse.systems Tip jar: PayPal || CashApp || LiberaPay