LuaVela - the LuaJIT fork I've worked on

Publication date:

Recently, IPONWEB open sourced its fork of LuaJIT called LuaVela. The original announcement post can be found here, and it goes into a lot of detail about the project. For those who don’t know - my real name is Ilya Daylidyonok, and I’m mentioned in the announcement!

I’ve worked on LuaVela for the last 7 months and now I want to talk about my experience.

Intro

Lua has been dear to me for a long time. It’s an amazing language. It’s easy to integrate with C++, it’s fast, and it’s a joy to write code in. I’ve used it in my games and other projects for the last 6 years, so it’s my “mother tongue” as much as C++ at this point.

Another thing that makes Lua dear to me is that the articles I’ve written about it were received very well and this has given me a lot of motivation to write more. If you Google “Lua C++”, you’ll see my blog somewhere on the first page (maybe it’ll even be the first result!). That’s how popular the articles have gotten.

I’ve been fascinated by compilers for a long time, and I’ve always wanted to do some work in that field. This became possible when I started working at IPONWEB.

LuaJIT’s 2GB problem

You don’t usually stumble upon companies working on their own compilers (or even forks of them). One of the reasons IPONWEB chose to do so is that it hit an (in)famous limitation of LuaJIT: its 2GB RAM limit.

LuaJIT was originally written with 32-bit architectures and pointers in mind. When you ran it on 64-bit platforms, you had a limitation: you could only address 1GB of RAM (because of mmap limitations), so this was LuaJIT’s memory limit. In newer Linux kernels the limit was raised to 2GB, but it still wasn’t enough for some of IPONWEB’s projects. This became a serious problem by 2015. LuaJIT 2.1 wasn’t stable enough for production use at that point, and the other possible solutions to the problem just weren’t good enough. So people at IPONWEB decided to fork LuaJIT.
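To see why pointer width turns into a memory limit, here is a minimal sketch of what happens when object references are stored in 32 bits. This is not LuaJIT’s actual code: `Ref32` and the helper names are made up for illustration. Anything that lives above the 4GB mark simply can’t be referenced, and, as described above, the way memory was obtained through mmap narrowed the usable range even further.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit object reference, as a VM designed for
 * 32-bit pointers might store it. */
typedef uint32_t Ref32;

/* True if a 64-bit address survives being squeezed into 32 bits. */
bool addr_fits_ref32(uint64_t addr) {
    return addr <= UINT32_MAX;
}

/* Pack an address into a reference. The caller is expected to check
 * addr_fits_ref32() first: the high 32 bits are silently dropped. */
Ref32 ref32_pack(uint64_t addr) {
    return (Ref32)addr;
}

/* Recover the address from a reference. */
uint64_t ref32_unpack(Ref32 ref) {
    return (uint64_t)ref;
}
```

A low address round-trips through `ref32_pack` and `ref32_unpack` unchanged, while an address above 4GB comes back pointing somewhere else entirely, which is why the allocator has to guarantee low addresses.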

Forking LuaJIT

The Lua community is one of the most fragmented communities I’ve ever seen. LuaJIT can be partially blamed for that. A lot of people stayed somewhere between Lua 5.1 and Lua 5.2 because they used LuaJIT. LuaJIT gave them huge performance gains, so migrating to Lua 5.2 and Lua 5.3 was not an option.

LuaJIT also has a lot of forks. People add optimizations which work well for them, but don’t work that well in the general case. People fix bugs, but the fixes can’t be easily ported upstream because of its cross-platform support and the very high standards to which patches must conform (which are justified!).

When LuaVela (called uJIT internally until the release) became yet another fork, the people who started it wanted it to conform to standard (vanilla) Lua as much as possible. A lot of tests were added to ensure standard conformance. LuaVela is a “drop-in” replacement for Lua 5.1 and LuaJIT. It’s likely that if you replace the Lua/LuaJIT headers in your code, LuaVela will just work with it and you might see performance benefits immediately.

Another thing that was done early on was dropping cross-platform support. We used LuaVela for projects which ran on x86-64 Linux only and it was difficult for our small team to try to support all the other platforms.

What I’ve found interesting about LuaJIT

During my first days at IPONWEB, I started digging into LuaVela’s and LuaJIT’s codebases. There is an in-depth e-mail by Mike Pall (the author of LuaJIT) which explains some of how LuaJIT works and why it is so fast and good at generating assembly. I’ll go over a few things I’ve found fascinating about LuaJIT’s implementation.

First of all, LuaJIT’s interpreter is written in assembly (in DynASM, to be precise) and it can run 2-4 times faster than vanilla Lua 5.1. One of the reasons for that is the incredible performance optimizations done in handcrafted assembly to reduce the number of RAM loads and stores. Most of the time a function’s parameters and local variables are kept in CPU registers, so a lot of computations are done without accessing RAM. There are also some optimizations (like expression folding) which are done during the script’s initial conversion to bytecode (when your module is loaded by Lua).

Another cool thing is that LuaJIT and C share the same stack and LuaJIT honors the C ABI when doing function calls. This is one of the reasons why Lua/C calls are so cheap there - they’re almost identical to C function calls.

DynASM, the language LuaJIT’s interpreter is written in, is a higher-level assembly: it lets you write “macros”, define constants and do other things that make writing assembly faster and safer, and the result easier to read. You can find LuaVela’s improved interpreter here. A lot was done to refactor and document LuaJIT’s original interpreter, so I think that it’s a useful learning resource even if you don’t plan to use LuaVela in your project.

When it comes to the JIT compilation part, LuaJIT shines again: it uses a huge number of optimizations to make the generated assembly fast. It generates linear “traces”, assembly without branches and jumps. The only jumps in there are exit conditions: a trace is compiled under some invariants and assumptions, e.g. that some variable stays constant or has a certain type. When such an assumption gets broken, you exit the trace and either find another trace or just continue execution in the interpreter.
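Here’s a toy model of that idea in C. It is only a sketch: the `Value` type and the `trace_sum`, `interp_step` and `run_sum` functions are made up for illustration and have nothing to do with LuaJIT’s real data structures. The “trace” is a fast path specialized for numbers, the type check in it plays the role of a guard, and a failed guard hands control back to a generic “interpreter” step.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { TAG_NUMBER, TAG_STRING } Tag;

/* Hypothetical dynamically typed value. */
typedef struct {
    Tag tag;
    double num;        /* valid when tag == TAG_NUMBER */
    const char *str;   /* valid when tag == TAG_STRING */
} Value;

/* "Trace": fast path compiled under the assumption that every value is
 * a number. Returns the index where it stopped: n if it ran to
 * completion, or the index of the value that broke the assumption. */
size_t trace_sum(const Value *vals, size_t n, size_t i, double *acc) {
    for (; i < n; i++) {
        if (vals[i].tag != TAG_NUMBER) {
            return i;               /* guard failed: exit the trace */
        }
        *acc += vals[i].num;
    }
    return n;
}

/* "Interpreter": generic slow path that can handle any value. */
void interp_step(const Value *v, double *acc) {
    if (v->tag == TAG_NUMBER) {
        *acc += v->num;
    } else {
        *acc += strtod(v->str, NULL);   /* coerce the string to a number */
    }
}

/* Driver: stay on the trace while its assumptions hold, fall back to
 * the interpreter for one step when they don't, then try again. */
double run_sum(const Value *vals, size_t n) {
    double acc = 0.0;
    size_t i = 0;
    while (i < n) {
        i = trace_sum(vals, n, i, &acc);
        if (i < n) {
            interp_step(&vals[i], &acc);
            i++;
        }
    }
    return acc;
}
```

Summing the values 1, 2, "3", 4 with `run_sum` runs the first two numbers on the fast path, leaves it at the string, handles that one element generically and then goes back to the fast path for the rest.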

All function calls in a trace are inlined, which also gives a considerable performance boost. There is also a huge number of “fold” optimizations which transform things like 2 + 2 + x + x into 4 + 2*x (even function calls, especially calls to math functions, can sometimes be folded!).
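To make the 2 + 2 + x + x example concrete, here is a deliberately tiny sketch of folding a sum. It is not how LuaJIT’s fold engine is implemented: the `Term` and `Folded` types and all function names are invented for this example, and it only understands sums of constants and a single variable x.

```c
#include <stddef.h>

typedef enum { TERM_CONST, TERM_X } TermKind;

/* One operand of a sum: either a constant or the variable x. */
typedef struct {
    TermKind kind;
    double value;   /* used only when kind == TERM_CONST */
} Term;

/* Folded form of the sum: constant + coeff * x. */
typedef struct {
    double constant;
    double coeff;
} Folded;

/* Fold at "compile time": add up constants, count occurrences of x. */
Folded fold_sum(const Term *terms, size_t n) {
    Folded out = { 0.0, 0.0 };
    for (size_t i = 0; i < n; i++) {
        if (terms[i].kind == TERM_CONST) {
            out.constant += terms[i].value;
        } else {
            out.coeff += 1.0;
        }
    }
    return out;
}

/* What the unoptimized code does at run time: one addition per term. */
double eval_unfolded(const Term *terms, size_t n, double x) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += (terms[i].kind == TERM_CONST) ? terms[i].value : x;
    }
    return sum;
}

/* What the folded code does at run time: one multiply and one add. */
double eval_folded(Folded f, double x) {
    return f.constant + f.coeff * x;
}
```

For the terms 2, 2, x, x, `fold_sum` produces a constant of 4 and a coefficient of 2, so the work left for run time is 4 + 2*x. Keep in mind that reordering floating-point arithmetic isn’t always exactly equivalent, so a real compiler has to be more careful than this toy.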

There are also built-ins like string.find or math.abs which are either written in C/asm, or are written as C functions which tell LuaJIT which IRs to emit, so the final trace ends up with very efficient assembly.

For example, calling math.abs essentially turns into a few instructions in a trace. There’s no table lookup into the math table, and there isn’t even a call to C’s abs function!
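To give a feeling for how little work an absolute value needs, here’s a sketch in C of taking it without any function call or branch: you just clear the sign bit. This assumes IEEE 754 doubles, and `abs_by_mask` is a name I made up. The exact instructions LuaVela emits for math.abs may differ, so treat this as an illustration of the idea, not as a dump of the generated code.

```c
#include <stdint.h>
#include <string.h>

/* Absolute value of a double by clearing its sign bit.
 * Assumes the IEEE 754 binary64 layout: the sign lives in the top bit. */
double abs_by_mask(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the bytes */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);    /* drop the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With the value already sitting in a register, the whole thing boils down to a single bitwise AND with a constant mask.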

LuaJIT’s IR is linear: it’s laid out contiguously in memory. It’s one of the reasons why optimizations and code generation happen quickly and don’t have noticeable performance overhead in most cases.
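Here’s what a linear IR can look like in its simplest form. This is a sketch with invented names (`IrIns`, `OP_KNUM`, `ir_eval` and so on), not LuaJIT’s real instruction format. The instructions sit in a plain array and refer to their operands by the index of an earlier instruction, so a pass over the IR is just a loop over contiguous memory, with no pointer chasing through a tree or graph.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_KNUM, OP_ADD, OP_MUL } IrOp;

/* One IR instruction. Operands are indices of earlier instructions
 * in the same array, not pointers. */
typedef struct {
    IrOp op;
    uint16_t a, b;   /* operand indices (unused for OP_KNUM) */
    double k;        /* constant value (used only for OP_KNUM) */
} IrIns;

/* Evaluate the IR in a single forward pass. results[i] receives the
 * value of instruction i; the value of the last one is returned.
 * Requires n > 0 and room for n doubles in results. */
double ir_eval(const IrIns *ir, size_t n, double *results) {
    for (size_t i = 0; i < n; i++) {
        switch (ir[i].op) {
        case OP_KNUM:
            results[i] = ir[i].k;
            break;
        case OP_ADD:
            results[i] = results[ir[i].a] + results[ir[i].b];
            break;
        case OP_MUL:
            results[i] = results[ir[i].a] * results[ir[i].b];
            break;
        }
    }
    return results[n - 1];
}
```

A four-instruction program (constant 4, constant 3, add #0 and #1, multiply #2 by #2) computes (4 + 3) * (4 + 3) in one pass. An optimization or code generation pass would walk the same array in the same way.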

LuaVela’s new features

What makes LuaVela different from other LuaJIT forks? I’ll quote the original announcement post:

My contributions to LuaVela

Here’s some of the stuff I did in the 7 months that I’ve worked on LuaVela:

local t = {}
for token in ujit.string.split("a,b,c", ",") do
    table.insert(t, token)
end
-- t == { "a", "b", "c" }

struct some_type* ptr = &some_obj;
printf("pointer: %p", (void*)ptr);
//                    ^^^^^^^

void f() {
    return g();
}

Future

At the moment, the development of LuaVela is finished. We’ll fix critical bugs, but we felt that LuaVela is close to being feature complete and fast enough for most of our use cases, so we’ve moved on to other projects. LuaVela was open sourced as a “thank you” to the Lua and LuaJIT community and developers. I hope that some of the unique things LuaVela has will later be ported to other active forks and make the software which uses them even faster and better.

I’ve enjoyed working on LuaVela: I’ve learned a lot about Lua and JIT compilation. I also got some real-life C programming experience. It was great.

Thanks for reading!