Hello, monad!
Intro
In this chapter we’re going to write a hello world program. Doing so in Haskell requires monads, a concept from category theory. You might think this is overly complicated for something as simple as printing some output, but you’d be wrong: printing output is not simple at all. Just think about buffers, encoding, concurrency… If you judge a language by the ease with which you can write helloworld, you’re going to pick a language that hides important “details”.
We’re first going to look at pure functions, i.e. functions without any side-effects, and their types. Purity is the default in Haskell, but does not allow writing output, which is a (side) effect.
We will then cover the IO monad, which is how Haskell handles necessary effects. This will allow us to finally write helloworld.
This chapter is probably the hardest part of this tutorial and what makes it for madmen.
Pure functions
In essence, every function in Haskell has exactly 1 input and 1 output. We can still use “multiple” inputs by making that output itself a function. Let’s look at how this works with addition:
(+) 3 7
Writing two values separated by a space is function application: f a applies the function f to the argument a.
Function application in Haskell is left-associative, so the code above is equivalent to:
((+) 3) 7
Here, we have a function (+), which we apply to 3. The result is a new function that takes a number and adds 3 to it. We then apply this new function to 7.
Now it’s time to talk about types. As mentioned, a function only has one input and one output, and we type it using an arrow. A function that takes some type a as input and outputs some b has type a -> b. Consequently, a function that takes an a and produces a second function with type b -> c has type a -> (b -> c). For our function (+), used here on Int values, that means the type is:
(+) :: Int -> (Int -> Int)
(Note: in Haskell, :: indicates a type declaration)
Since arrows in type declarations are right-associative, the parentheses are superfluous in this case, and we can also write:
(+) :: Int -> Int -> Int
Because we know functions cannot have side-effects, the type declaration tells us pretty much exactly what the function is going to do: it will compute an Int given two other Ints. Nothing else will happen, and it won’t matter when we evaluate this function.
We don’t have to apply all arguments at once. The following is perfectly valid:
add3 :: Int -> Int
add3 = (+) 3

ten :: Int
ten = add3 7
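To make the idea of partial application a little more tangible, here is a minimal sketch. The names add and shifted are made up for illustration; they are not part of the generated project.

```haskell
-- A "two-argument" function is really a function that returns a function.
add :: Int -> Int -> Int
add x y = x + y

-- Supplying only the first argument yields a new one-argument function.
add3 :: Int -> Int
add3 = add 3

-- A partially applied function can be passed around like any other value.
shifted :: [Int]
shifted = map add3 [1, 2, 3]

main :: IO ()
main = print shifted
```

Here add3 is used as an argument to map, which applies it to every element of the list.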
The (+) function is a bit special. A function whose name consists of symbols is an infix operator; wrapping it in parentheses, as we have done so far, lets us use it as an ordinary prefix function. We can therefore also write a much more natural looking sum:
ten :: Int
ten = 3 + 7
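The conversion between prefix and infix notation works in both directions. The sketch below shows an operator used as a prefix function and a named function (div, from the prelude) used infix with backticks. The names prefixSum and infixQuotient are made up for this example.

```haskell
-- An operator wrapped in parentheses behaves like an ordinary prefix function.
prefixSum :: Int
prefixSum = (+) 3 7

-- A named function wrapped in backticks can be used as an infix operator.
infixQuotient :: Int
infixQuotient = 7 `div` 2

main :: IO ()
main = print (prefixSum, infixQuotient)
```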
Finally, a note about lazy evaluation. In the above examples, ten does not have the value 10; rather, it is an expression that will result in the value 10. Thanks to referential transparency, the compiler can decide not to compute the value of ten right away, but just pass (a reference to) its body (3 + 7) and postpone evaluation to whenever it’s actually needed. Not only does this let us avoid unnecessary computations, but it also lets us use infinite data structures. An example:
infiniteListOf1s :: [Int]
infiniteListOf1s = 1 : infiniteListOf1s
(Note: : is a “prepend” operator with type a -> [a] -> [a], e.g. 1 : [2,3] = [1, 2, 3])
This list is infinite! Because Haskell is non-strict, this is fine. The compiler will make sure we only compute the part of the list we actually need by using lazy evaluation. So as long as we don’t try to read the whole list, it won’t enter an infinite loop.
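As a small sketch of what “only the part we need” looks like in practice, we can ask for a finite prefix of the infinite list with take from the prelude. The name firstThree is made up for this example.

```haskell
-- An infinite list: the definition refers to itself.
infiniteListOf1s :: [Int]
infiniteListOf1s = 1 : infiniteListOf1s

-- take only forces as many elements as it is asked for,
-- so this terminates even though the list never ends.
firstThree :: [Int]
firstThree = take 3 infiniteListOf1s

main :: IO ()
main = print firstThree
```

Something like length infiniteListOf1s, on the other hand, would need the whole list and never finish.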
Effects
Let us now, armed with the above knowledge, look at the project generated by the command stack new. In app/Main.hs we find the following:
module Main where

import Lib

main :: IO ()
main = someFunc
This should confuse you. So let’s look at the individual parts.
module Main where declares the name of the module. “Main” is a special name, as you probably expect.
import Lib imports the Lib module and adds all the symbols Lib exports to the current namespace. This includes someFunc, but we can’t tell from the import statement. We will be changing this line to something more informative in a bit.
main :: IO () declares that the type of the value called main is IO (). This would quite rightly confuse you about now. Why isn’t this a function? What are these mysterious parentheses? What is an IO?
Why isn’t main a function? As mentioned, functions compute one value from another, but this isn’t really the nature of computer programs. Rather, we want a program that does stuff, whereas a function is just an inert formula. main should be a list of actions / effects, such as writing to stdout or listening for HTTP requests.
Then what is ()? This is a special type called unit. Every type has one or more inhabitants: a boolean has inhabitants True and False, and an unsigned 8-bit integer is inhabited by the numbers \(0\) through \(2^8 - 1\). Unit has only 1 inhabitant, the unit. This entails that a value with type unit carries no information. Having a pure function that outputs a unit would be pointless, because we could simply substitute the answer without ever needing to evaluate said function. However, the unit type acts a lot like the number 1, and has a lot of uses when combined with other types. One of those is as used here. Both the type and the inhabitant of unit are written as () in Haskell.
As for IO: if you’re particularly observant, you may have noticed that we seem to have applied it just like a function, but in the type declaration. Just like values have types, types have kinds. The kind of Int, (), String, Char, etc. is *, while the kind of IO is * -> *. Just as with function application, the kind of IO (), that is IO applied to (), is therefore *. As for the “meaning” of IO, it turns a type that is computed with pure functions into one that is computed using effects. We will see how to use impure functions in a bit.
Putting all that together, main, by its type IO (), is a unit computed using side effects. Since () contains no information, IO () is just a series of effects / instructions “without” output. That is pretty much what we generally look for in a program!
Fixing the import statement
After all that stuff about the type of main, we turn to look at the next line, the term of main: main = someFunc. This is rather disappointing, as it merely references a single other value. someFunc is exported by the Lib module. We pretty much have to guess that, because with the current import notation, it is not explicit. This becomes a problem once you have more modules. So let’s get ahead of ourselves and change that import to the following:
import Lib (someFunc)
This will expose only someFunc from Lib, and tells us exactly where symbols are coming from. We can also make the import qualified, which forces us to mention the module name explicitly:
import qualified Lib

main = Lib.someFunc
It’s also totally valid to do both:
import qualified Lib
import Lib (someFunc)
This will expose someFunc directly, while still allowing you to access other symbols through the qualified notation from above.
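Since the Lib module only exists inside the generated project, here is a self-contained sketch of the same two import styles using Data.Char from the standard library as a stand-in. The functions shout and isShouting are made up for this example.

```haskell
-- Qualified import: everything Data.Char exports, reachable only as Data.Char.<name>.
import qualified Data.Char
-- Explicit import list: toUpper is additionally available without a prefix.
import Data.Char (toUpper)

-- Uses the unqualified name brought in by the import list.
shout :: String -> String
shout = map toUpper

-- isUpper is not in the import list, so it needs the module prefix.
isShouting :: String -> Bool
isShouting = all Data.Char.isUpper

main :: IO ()
main = print (shout "hello", isShouting (shout "hello"))
```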
Combining effects with monads
Now it’s time to look at what is happening in someFunc. But where can we find the Lib module that defines it? If you haven’t changed anything, it should be src/Lib.hs (module names must match relative path names). You can change which directories are part of your project in package.yaml.
The file should look like this:
module Lib
    ( someFunc
    ) where

someFunc :: IO ()
someFunc = putStrLn "someFunc"
We’ve seen the module declaration before. This one also specifies which symbols to export, someFunc in this case.
someFunc is apparently defined as the application of putStrLn to the String "someFunc". putStrLn is part of the standard prelude, which is implicitly imported in every file. Its type is String -> IO () and, as you might expect, what it does is write a string, followed by a newline, to stdout (without flushing buffers).
But how can we have multiple effects? Is there some function with type IO () -> IO () -> IO () that combines the effects in both arguments and that we have to tediously add everywhere? Well, such a function exists, but there are better ways to go about it.
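One function that fits this description is the prelude’s (>>) operator. Its general type is m a -> m b -> m b for any monad m, so it can be used at type IO () -> IO () -> IO (). A minimal sketch, with twoLines being a made-up name:

```haskell
-- (>>) runs the action on its left, discards that action's result,
-- and then runs the action on its right.
twoLines :: IO ()
twoLines = putStrLn "first" >> putStrLn "second"

main :: IO ()
main = twoLines
```

Chaining many effects this way quickly gets tedious, which is what the rest of this section tries to remedy.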
As we’ve seen, IO has kind * -> *. When a type of kind * -> * follows certain rules, we say that it is a monad. In particular, for any monad m, the following functions must exist:
return :: a -> m a
fmap :: (a -> b) -> m a -> m b
(>>=) :: m a -> (a -> m b) -> m b
IO is a monad. Substituting IO for m, we know that we have at least the following functions:
return :: a -> IO a
fmap :: (a -> b) -> IO a -> IO b
(>>=) :: IO a -> (a -> IO b) -> IO b
return simply pretends a pure computation is impure. Similarly, fmap turns a function over pure values into one over impure values. If, for instance, we read some input string, that string has type IO String, but our pure functions only work for String! fmap remedies that problem by lifting functions to IO. (>>=) is a bit more complicated. You can think of it a bit like unix pipes. It takes some impurely computed value, and feeds that value to the next impure computation.
Let’s look at a few examples. Take the time to let this sink in:
computeHelloWorld :: IO String
computeHelloWorld = return "Hello, World!"

computeHelloWorldLength :: IO Int
computeHelloWorldLength = fmap length computeHelloWorld

greetTheWorld :: IO ()
greetTheWorld = computeHelloWorld >>= putStrLn
(putStrLn is the output-writing function with type String -> IO ())
For a slightly more useful example we’ll use getLine from the prelude. It has type IO String and will get a line from stdin. We can now do:
nameToGreeting :: String -> String
nameToGreeting name = "Hello " ++ name ++ ", I am monad."

greetPerson :: IO ()
greetPerson =
  fmap nameToGreeting getLine >>= putStrLn
This is not very easy to read, and it would get a lot worse if we were to also write to stdout before reading from stdin. Luckily, Haskell has some syntactic sugar for monads in the form of do-blocks.
greetPerson :: IO ()
greetPerson = do
  putStrLn "I am monad, what is your name?"
  personName <- getLine
  putStrLn ("Hello, " ++ personName ++ "! What shall we *do* together?")
The type of a do-block will be the type of its last element.
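To see that a do-block really is just sugar, here is a sketch of a small do-block next to a hand-written version using (>>) and (>>=). The names greeting, greetPersonDo and greetPersonDesugared are made up for this comparison, and the prompt text is shortened.

```haskell
-- Pure helper, kept separate so the two IO versions stay short.
greeting :: String -> String
greeting name = "Hello, " ++ name ++ "!"

-- Version using do-notation.
greetPersonDo :: IO ()
greetPersonDo = do
  putStrLn "What is your name?"
  personName <- getLine
  putStrLn (greeting personName)

-- The same program written with the monad operators directly:
-- a statement without "<-" becomes (>>), a statement with "<-" becomes
-- (>>=) followed by a lambda that binds the name.
greetPersonDesugared :: IO ()
greetPersonDesugared =
  putStrLn "What is your name?" >>
    (getLine >>= \personName ->
      putStrLn (greeting personName))

main :: IO ()
main = greetPersonDesugared
```

Both versions have type IO () because their last statement, a call to putStrLn, has that type.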
Side-note
Suppose we have the type IO (IO a). By (>>=), we know that it can be collapsed to IO a, because for any foo :: IO (IO a) we can do bar = foo >>= id, resulting in a bar :: IO a. Conversely, return bar gives us a value of type IO (IO a) again. In other words, it doesn’t matter how often you apply a monad to a type: it can always be flattened back to a single application of that monad. This is what the famous phrase “monads are just monoids in the category of endofunctors” means (sort-of). A monoid being something that stays the same (type) under repeated application of a certain operation.
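The flattening step described here is available in the standard library as join from Control.Monad. The sketch below compares it with the foo >>= id formulation; the names nested, flattenedWithBind and flattenedWithJoin are made up for this example.

```haskell
import Control.Monad (join)

-- An IO action that, when run, produces another IO action.
nested :: IO (IO Int)
nested = return (return 5)

-- Flattening by feeding the inner action to id.
flattenedWithBind :: IO Int
flattenedWithBind = nested >>= id

-- The library function that does the same thing for any monad.
flattenedWithJoin :: IO Int
flattenedWithJoin = join nested

main :: IO ()
main = do
  a <- flattenedWithBind
  b <- flattenedWithJoin
  print (a, b)
```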
Conclusion
You’ve learned how effects can be represented in a functionally pure language by using monads. It is usually better to avoid using the IO monad when possible. Trivially, we could write our entire program in IO, but we would lose many advantages of programming in Haskell.
I would personally argue that an IO monad is not a good way to represent effects, but it’s the current standard for generic functionally pure programming languages. Some interesting alternatives include uniqueness types (used in Clean), free monads (Haskell) and model-update systems (Elm).
You’ve also been introduced to lazy evaluation. There are both advantages and disadvantages to it, but that is outside the scope of this chapter.