Linear algebra with linear types… not great

When writing 3D software, it is quite common to use some linear algebra in performance-critical code. In most languages I’ve used, combining easily readable mathematics with performant code is practically impossible. This is not going to be any different (yet) in Idris.

Consider the formula

vout = vin * vs + vt

Ideally, we want code to look exactly like this formula, though I will make a concession for the lack of bold font or arrows to denote vectors.

In languages without operator overloading, like Java or JavaScript, we are immediately reduced to much less named functions

let vOut = Vec3.add(Vec3.mult(vIn, vScale), vTrans);

But performance of this code will be poor if your language is pass-by-reference. Both Vec3.mul and Vec3.add, if used in this manner, will allocate a new value, which is a relatively slow operation and not something we wish to do in performance-critical code. That is why fast linear algebra libraries, like glMatrix, will support output vectors to store the result. We are left with:

//Assume vTemp and vOut were already created
Vec3.mult(vTemp, vIn, vScale);
Vec3.add(vOut, vTemp, vTrans);

Ugly but relatively fast.

For languages that support pass-by-value, like C++, we can pass the vectors as values and get fairly nice looking code (using operator overloading, which C++ does support):

vOut = vIn * vScale + vTrans

Now it’s not perfect, but it will do for simple linear algebra (though I still hold that Latex-level formula would be a great for readability e.g. when debugging Chebyshev’s inequality to create VSM shadows and all you ASCII punks need to suck it).

The downside of the pass-by-value approach is that with larger types (longer vector or matrices) copying values up and down the stack can be more expensive than passing a reference, so this readability trick only really works for sufficiently small values.

Functional languages are even more awkward. Since values are immutable, and we often do not know when they go out of scope, we must use allocation for simple vector math. As I understand, allocation can be much faster in functional languages with a moving garbage collector like Haskell, but those will pause execution, which is probably undesired for 3D applications.

Linear types (the fact that they share the “linear” denotation with linear algebra is coincidence) can help us here, because they allow us to express mutation, but using them in Idris has been hell for code clarity. The types for addition and multiplication are as follows:

add : (1 _ : Vec3) -> (1 _ : Vec3) -> (1 _ : Vec3) -> LPair Vec3 (LPair Vec3 Vec3)
mult : (1 _ : Vec3) -> (1 _ : Vec3) -> (1 _ : Vec3) -> LPair Vec3 (LPair Vec3 Vec3)

which result in:

  vTemp # (vIn # vScale) = Vec3.add vTemp vIn vScale
  vOut # (vTemp # vTrans) = Vec3.mult vOut vTemp vTrans

The need to consume and recreate linear types is really hurting legibility of the code. The particular way in which Idris deals with FFI also makes me really nervous about performance.

Back in my 4th dev log for this project, I mentioned how an “in” keyword, particular in the context of FFI, could greatly improve code clarity and even performance when handling linearity:

add : (1 _ : Vec3) -> (in _ : Vec3) -> (in _ : Vec3) -> Vec3
mult : (1 _ : Vec3) -> (in _ : Vec3) -> (in _ : Vec3) -> Vec3

-- [...]
    vTemp = Vec3.add vTemp vIn vScale
    vOut = Vec3.mult vOut vTemp vTrans

We could do something similar for an implicit out argument that’s syntactic sugar for passing an argument as input that has the same name as the output:

add : {out _ : Vec3} -> (in _ : Vec3) -> (in _ : Vec3) -> Vec3
mult : {out _ : Vec3} -> (in _ : Vec3) -> (in _ : Vec3) -> Vec3

-- [...]
    vTemp = Vec3.add vIn vScale
    vOut = Vec3.mult vTemp vTrans

And we might go even further, but this “fix” ultimately misses the point.

The real problem here is that we’re failing at separation of concerns. The same code/language that expresses what has to be computed must also worry about memory management. With our current programming languages this is probably inevitable. OOP and FP both promise this separation of concerns, but we can see above that both have failed to deliver, because we must adapt our specification to account for performance. We are still, essentially, writing machine instructions, even though the machine may not be fully specified.

I think the reason this issue wasn’t as relevant when I made e.g. programs linear, is that operations on programs are machine instructions, there isn’t really a what, just a how.

I’m toying with the idea of replacing vector values with vector generators, where e.g. v1 + v2 is not evaluated to a new vector, but to a vector program. This is similar to the approaches of Accelerate and TensorFlow. On the flip side, I don’t think I could get rid of the overhead, and I expect much smaller computation loads than aforementioned libraries, so overheads could be very significant. The added benefit of using vector generators is that the generator could not only be evaluated, but also be turned into a Latex formula.