Planning my own programming language.

drBearhands

2021-11-4

During my years of writing software, I’ve come across a few issues with how we write software. Since I’ve covered my reasoning before, I will only summarize them here: type safety is nice, but doesn’t extend past the memory of a single machine. I therefore want to introduce “holistic programing”: instead of writing individual programs, we write the entirety of the multi-machine system.

There’s a few more things I want for this language.

Accessible: nothing about a language matters if nobody can use it because it is too complicated.
Functional, dependently typed: this is the current standard for type safety. Dependent typing is not difficult, it is just very poorly supported by documentation and examples. It also requires tearing down the mental barriers between types and terms if you have prior experience in strongly typed programming.
Support for linear values: linearity is the best way to handle resources and communication that I know of. In my experience, it is not particularly hard, removes the need for other complicated language structures, and catches more problems that traditional approaches like the with keyword. Unfortunately, few languages support it well.
Architecture agnostic: when creating systems the traditional way, we must architect the solution before writing code, because we need to identify what programs to write before we can write them. That requirement disappears with hollistic (purely declarative) programming. By omitting considerations tied to an architecture or backend from the code, we can postpone optimization and architecting to a later moment. This includes all performance considerations.

Additionally, I want to adopt unison’s hash-based function identifiers. Rather than using names to identify functions, it generates hashes based on its contents. This way there is a single source of truth for function references, and human-readable formatting is decoupled from program specification. Hash-based identifiers should make some things a lot easier, with modest up-front development time, as the people who made unison already did the hard work of figuring out what to do. Among other things, it should simplify modules, and once I get to it, database changes.

So, with the feature set of the core language broadly laid out, where do I start?

I decided not to fork Idris 2. Not just because I’m a sucker for peer pressure, but also because Idris 2 is a bit too far from where I want to go, and the compiler seems a tad too complicated to boot.

That said, I will use it to build my language. I sorely missed dependent types the last time I decided to make a functional programming language and there are not many languages that support them. Plus, none of the problems I’ve encountered with Idris in the past would be relevant for a compiler. That’s no coincidence, since Idris 2 is self-hosted, so any such problems are likely caught by the developers.

Since I’m starting from scratch, I should begin by making an AST (Abstract Syntax Tree, the type representation of the language syntax), followed by an evaluator and a parser. I will also need to research & implement a type calculus (likely quantitative type theory), and I will need to implement hash-generation. That seems like a solid beginning. Concurrently, there is a fundamental issues I need to resolve: “real” async interop, particularly concurrent database use.

With real async interop, I mean operations that are fundamentally asynchronous. A Turing-complete language may be able to compute anything that is computable, but that does not include real-world interactions, nor does it mean there is a clear mapping from/to asynchronous and functional representation. For the most part, I am fine with the worst case scenario where I just cannot do this, with one exception: concurrent database use.

Databases are pretty common, and so is having multiple users/processes accessing one concurrently. Unfortunately I am not aware of any calculus that expresses concurrent database read/write in such a way that meaningful analysis and optimization can be performed. Linear types can sort-of model database changes, but not concurrent ones.

someWriteOp :: Table ⊸ Table 
someWriteOp (row : rows) =
  let
    row' = doSomethingToRow row
  in
    (row' : rows)

The above may be correct, but consumes the entire table for just 1 row. Native lenses may be of help, but I rather stay away from anything native.