Planning my own programming language.



During my years of writing software, I’ve come across a few issues with how we write software. Since I’ve covered my reasoning before, I will only summarize them here: type safety is nice, but doesn’t extend past the memory of a single machine. I therefore want to introduce “holistic programing”: instead of writing individual programs, we write the entirety of the multi-machine system.

There’s a few more things I want for this language.

Additionally, I want to adopt unison’s hash-based function identifiers. Rather than using names to identify functions, it generates hashes based on its contents. This way there is a single source of truth for function references, and human-readable formatting is decoupled from program specification. Hash-based identifiers should make some things a lot easier, with modest up-front development time, as the people who made unison already did the hard work of figuring out what to do. Among other things, it should simplify modules, and once I get to it, database changes.

So, with the feature set of the core language broadly laid out, where do I start?

I decided not to fork Idris 2. Not just because I’m a sucker for peer pressure, but also because Idris 2 is a bit too far from where I want to go, and the compiler seems a tad too complicated to boot.

That said, I will use it to build my language. I sorely missed dependent types the last time I decided to make a functional programming language and there are not many languages that support them. Plus, none of the problems I’ve encountered with Idris in the past would be relevant for a compiler. That’s no coincidence, since Idris 2 is self-hosted, so any such problems are likely caught by the developers.

Since I’m starting from scratch, I should begin by making an AST (Abstract Syntax Tree, the type representation of the language syntax), followed by an evaluator and a parser. I will also need to research & implement a type calculus (likely quantitative type theory), and I will need to implement hash-generation. That seems like a solid beginning. Concurrently, there is a fundamental issues I need to resolve: “real” async interop, particularly concurrent database use.

With real async interop, I mean operations that are fundamentally asynchronous. A Turing-complete language may be able to compute anything that is computable, but that does not include real-world interactions, nor does it mean there is a clear mapping from/to asynchronous and functional representation. For the most part, I am fine with the worst case scenario where I just cannot do this, with one exception: concurrent database use.

Databases are pretty common, and so is having multiple users/processes accessing one concurrently. Unfortunately I am not aware of any calculus that expresses concurrent database read/write in such a way that meaningful analysis and optimization can be performed. Linear types can sort-of model database changes, but not concurrent ones.

someWriteOp :: TableTable 
someWriteOp (row : rows) =
    row' = doSomethingToRow row
    (row' : rows)

The above may be correct, but consumes the entire table for just 1 row. Native lenses may be of help, but I rather stay away from anything native.