Hello, monad!

Intro

In this chapter we’re going to write a hello world program. Doing so in Haskell requires monads, a concept from category theory. You might think this is overly complicated for something as simple as printing some output, but you’d be wrong: printing output is not simple at all. Just think about buffers, encoding, concurrency… If you judge a language by the ease with which you can write helloworld, you’re going to pick a language that hides important “details”.

We’re first going to look at pure functions, i.e. functions without any side-effects, and their types. Purity is the default in Haskell, but does not allow writing output, which is a (side) effect.

We will then cover the IO monad, which is how Haskell handles necessary effects. This will allow us to finally write helloworld.

This chapter is probably the hardest part of this tutorial and what makes it for madmen.

Pure functions

In essence, every function in Haskell has exactly 1 input and 1 output. We can still use “multiple” inputs by making that output itself a function. Let’s look at how this works with addition:

(+) 3 7

Writing two values separated by a space is function application: f a applies the function f to the argument a.

Function application in Haskell is left-associative, so the code above is equivalent to:

((+) 3) 7

Here, we have a function (+), which we apply to 3. The result is a new function that takes a number and adds 3 to it. We then apply this new function to 7.

Now it’s time to talk about types. As mentioned, a function has only one input and one output, and we type it using an arrow. A function that takes some type a as input and outputs some b has type a -> b. Consequently, a function that takes an a and produces a second function with type b -> c has type a -> (b -> c). For our function (+), restricted to Int to keep things simple (its real type is more general and works for any numeric type), that means the type must be:

(+) :: Int -> (Int -> Int)

(Note: in Haskell, :: indicates a type declaration)

Since arrows in type declarations are right-associative, the parentheses are superfluous in this case, and we can also write:

(+) :: Int -> Int -> Int

Because we know functions cannot have side-effects, the type declaration tells us pretty much exactly what the function is going to do: it will compute an Int given two other Ints. Nothing else will happen, and it won’t matter when we evaluate this function.

We don’t have to apply all arguments at once. The following is perfectly valid:

add3 :: Int -> Int
add3 = (+) 3

ten :: Int
ten = add3 7

The (+) function is a bit special. A name made up of symbols is an infix operator; wrapping it in parentheses, as we have done so far, lets us use it like an ordinary prefix function. We can therefore also write a much more natural-looking sum:

ten :: Int
ten = 3 + 7
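As a quick check that these spellings really are the same computation, here is a minimal sketch that puts them side by side. The names prefixSum and explicitSum are made up for this illustration, and print is only used to show the results; how main works is covered later in this chapter.

```haskell
module Main where

-- Prefix application of the operator, both arguments at once.
prefixSum :: Int
prefixSum = (+) 3 7

-- The same call with the implicit left-associative parentheses written out.
explicitSum :: Int
explicitSum = ((+) 3) 7

-- Partial application: only the first argument is supplied.
add3 :: Int -> Int
add3 = (+) 3

-- The infix spelling of the same sum.
ten :: Int
ten = 3 + 7

main :: IO ()
main = print [prefixSum, explicitSum, add3 7, ten]
```

All four expressions evaluate to 10, so the program prints a list of four tens.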

Finally, a note about lazy evaluation. In the above examples, ten does not have the value 10, rather it is an expression that will result in the value 10. Thanks to referential transparency, the compiler can decide not to compute the value of ten right away, but just pass (a reference to) its body (3 + 7) and postpone evaluation to whenever it’s actually needed. Not only does this let us avoid unnecessary computations, but it also lets us use infinite data structures. An example:

infiniteListOf1s :: [Int]
infiniteListOf1s = 1 : infiniteListOf1s

(Note: : is a “prepend” operator with type a -> [a] -> [a], e.g. 1 : [2,3] = [1, 2, 3])

This list is infinite! Because Haskell is non-strict, this is fine. The compiler will make sure we only compute the part of the list we actually need by using lazy evaluation. So as long as we don’t try to read the whole list, it won’t enter an infinite loop.
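To see this in action, the sketch below reads only a finite prefix of the infinite list using take from the prelude. The name firstFive is invented for this example.

```haskell
module Main where

infiniteListOf1s :: [Int]
infiniteListOf1s = 1 : infiniteListOf1s

-- take 5 only demands the first five cells of the list,
-- so the rest of the infinite list is never computed.
firstFive :: [Int]
firstFive = take 5 infiniteListOf1s

main :: IO ()
main = print firstFive
```

By contrast, something like length infiniteListOf1s has to walk the whole list and would never finish.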

Effects

Let us now, armed with the above knowledge, look at the project generated by the command stack new.

In app/Main.hs we find the following:

module Main where

import Lib

main :: IO ()
main = someFunc

This should confuse you. So let’s look at the individual parts.

module Main where declares the name of the module. “Main” is a special name, as you probably expect.

import Lib imports the Lib module and adds all the symbols Lib exports to the current namespace. This includes someFunc, but we can’t tell that from the import statement. We will be changing this line to something more informative in a bit.

main :: IO () declares that the type of the value called main is IO (). This would quite rightly confuse you about now. Why isn’t this a function? What are these mysterious parentheses? What is an IO?

Why isn’t main a function? As mentioned, functions compute one value from another, but this isn’t really the nature of computer programs. Rather, we want a program that does stuff, whereas a function is just an inert formula. main should be a list of actions / effects, such as writing to stdout or listening for http requests.

Then what is ()? This is a special type called unit. Types have inhabitants: a boolean has inhabitants True and False, and an unsigned 8-bit integer is inhabited by the numbers 0 through 2^8 - 1 (that is, 255). Unit has only 1 inhabitant, the unit. This entails that a value with type unit carries no information. Having a pure function that outputs a unit would be pointless, because we could simply substitute the answer without ever needing to evaluate said function. However, the unit type acts a lot like the number 1, and has a lot of uses when combined with other types. One of those is as used here. Both the type and the inhabitant of unit are written as () in Haskell.

As for IO: if you’re particularly observant, you may have noticed we seem to have applied it just like a function, but in the type declaration. Just like values have types, types have kinds. The kind of Int, (), String, Char, etc. is *, the kind of IO is * -> *. Just as with function application, the kind of IO (), that is IO applied to (), is therefore *. As for the “meaning” of IO, it turns a type that is computed with pure functions into one that is computed using effects. We will see how to use impure functions in a bit.

Putting all that together, main, by its type IO () is a unit computed using side effects. Since () contains no information, IO () is just a series of effects / instructions “without” output. That is pretty much what we generally look for in a program!

Fixing the import statement

After all that stuff about the type of main, we turn to look at the next line, the term of main: main = someFunc. This is rather disappointing, as it merely references a single other value. someFunc is exported by the Lib module. We pretty much have to guess that, because with the current import notation, it is not explicit. This becomes a problem once you have more modules. So let’s get ahead of ourselves and change that import to the following:

import Lib (someFunc)

This will expose only someFunc from Lib, and tells us exactly where symbols are coming from. We can also make the import qualified, which forces us to mention the module name explicitly:

import qualified Lib
main = Lib.someFunc

Plus, it’s totally valid to do both:

import qualified Lib
import Lib (someFunc)

This will expose someFunc directly, while still allowing you to access other symbols through the qualified notation from above.

Combining effects with monads

Now it’s time to look at what is happening in someFunc. But where can we find the Lib module that defines it? If you haven’t changed anything, it should be src/Lib.hs (module names must match relative path names). You can change which directories are part of your project in package.yaml.

The file should look like this:

module Lib
    ( someFunc
    ) where

someFunc :: IO ()
someFunc = putStrLn "someFunc"

We’ve seen the module declaration before. This one also specifies which symbols to export, someFunc in this case.

someFunc is apparently defined as putStrLn applied to the String "someFunc". putStrLn is part of the standard prelude, which is implicitly imported in every file. Its type is String -> IO () and, as you might expect, what it does is write a string to stdout (without flushing buffers).

But how can we have multiple effects? Is there some function with type IO () -> IO () -> IO () that combines the effects in both arguments and that we have to tediously add everywhere? Well, such a function exists, but there are better ways to go about it.
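For the curious, the prelude operator (>>) can play the role of that combining function. The sketch below pins it down to the type from the question; its real type is more general. The names andThen and twoLines are invented for this illustration.

```haskell
module Main where

-- (>>) from the prelude, specialised to the type asked about above.
-- `andThen` is a hypothetical name used only in this sketch.
andThen :: IO () -> IO () -> IO ()
andThen = (>>)

-- Runs the first effect, then the second.
twoLines :: IO ()
twoLines = andThen (putStrLn "first effect") (putStrLn "second effect")

main :: IO ()
main = twoLines
```

Running this writes "first effect" and then "second effect", each on its own line. Chaining more than a handful of effects this way quickly becomes tedious, which is the motivation for what follows.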

As we’ve seen, IO has kind * -> *. When a type of kind * -> * follows certain rules, we say that it is a monad. In particular, for any monad m, the following functions must exist:

return :: a -> m a
fmap :: (a -> b) -> m a -> m b
(>>=) :: m a -> (a -> m b) -> m b

IO is a monad. Substituting IO for m, we know that we have at least the following functions:

return :: a -> IO a
fmap :: (a -> b) -> IO a -> IO b
(>>=) :: IO a -> (a -> IO b) -> IO b

return simply pretends a pure computation is impure. Similarly, fmap turns a function over pure values into one over impure values. If, for instance, we read some input string, that string has type IO String, but our pure functions only work for String! fmap remedies that problem by lifting functions to IO. (>>=) is a bit more complicated. You can think of it a bit like unix pipes. It takes some impurely computed value, and feeds that value to the next impure computation.

Let’s look at a few examples. Take the time to let this sink in:

computeHelloWorld :: IO String
computeHelloWorld = return "Hello, World!"

computeHelloWorldLength :: IO Int
computeHelloWorldLength = fmap length computeHelloWorld

greetTheWorld :: IO ()
greetTheWorld = computeHelloWorld >>= putStrLn

(putStrLn is the output-writing function with type String -> IO ())

For a slightly more useful example we’ll use getLine from the prelude. It has type IO String and will get a line from stdin. We can now do:

nameToGreeting :: String -> String
nameToGreeting name = "Hello " ++ name ++ ", I am monad."

greetPerson :: IO ()
greetPerson =
  fmap nameToGreeting getLine >>= putStrLn

This is not very easy to read, and it would get a lot worse if we were to also write to stdout before reading from stdin. Luckily, Haskell has some syntactic sugar for monads in the form of do-blocks.

greetPerson :: IO ()
greetPerson =
  do
    putStrLn "I am monad, what is your name?"
    personName <- getLine
    putStrLn ("Hello, " ++ personName ++ "! What shall we *do* together?")

The type of a do-block will be the type of its last element.
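To make the sugar less magical, here is a sketch of roughly what the do-block above stands for when written with (>>) and (>>=) directly, followed by a second block whose last element has type IO Int. The names greetPersonDesugared and readLineLength are invented for this example.

```haskell
module Main where

-- The do-block from above, written with the operators it is sugar for.
-- A statement without an arrow is chained with (>>); a statement with
-- `name <- action` becomes `action >>= \name -> ...`.
greetPersonDesugared :: IO ()
greetPersonDesugared =
  putStrLn "I am monad, what is your name?" >>
    (getLine >>= \personName ->
      putStrLn ("Hello, " ++ personName ++ "! What shall we *do* together?"))

-- The last element is `return (length line)`, which has type IO Int,
-- so the whole do-block has type IO Int.
readLineLength :: IO Int
readLineLength = do
  line <- getLine
  return (length line)

main :: IO ()
main = do
  greetPersonDesugared
  n <- readLineLength
  print n
```

The program reads two lines from stdin: the first is used as the name in the greeting, and the length of the second is printed.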

Side-note

Suppose we have the type IO (IO a). By (>>=), we know that it can be collapsed to IO a: for any foo :: IO (IO a) we can do bar = foo >>= id, resulting in a bar :: IO a. And return bar gives us a value of type IO (IO a) again! In other words, it doesn’t matter how often you apply a monad to a type, you can always flatten it back to a single application of that monad. This is what the famous phrase “monads are just monoids in the category of endofunctors” means (sort-of). A monoid being something that stays the same (type) under repeated application of a certain operation.
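The flattening trick with (>>=) and id is available in the standard library as join from Control.Monad. The sketch below shows both spellings on a small nested value; the names nested, flatWithBind and flatWithJoin are invented for this illustration.

```haskell
module Main where

import Control.Monad (join)

-- A computation that, when run, produces another computation.
nested :: IO (IO String)
nested = return (return "flattened")

-- Collapsing the two layers with (>>=) and id, as described above.
flatWithBind :: IO String
flatWithBind = nested >>= id

-- The same operation under its library name.
flatWithJoin :: IO String
flatWithJoin = join nested

main :: IO ()
main = do
  a <- flatWithBind
  b <- flatWithJoin
  putStrLn a
  putStrLn b
```

Both versions produce the same string, so the program prints "flattened" twice.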

Conclusion

You’ve learned how effects can be represented in a functionally pure language by using monads. It is usually better to avoid using the IO monad when possible. Trivially, we could write our entire program inside IO, but we would lose many advantages of programming in Haskell.

I would personally argue that an IO monad is not a good way to represent effects, but it’s the current standard for generic functionally pure programming languages. Some interesting alternatives include uniqueness types (used in Clean), free monads (Haskell) and model-update systems (Elm).

You’ve also been introduced to lazy evaluation. There are both advantages and disadvantages to it, but that is outside the scope of this chapter.

Final code branch for this chapter

