Tuesday, December 12, 2017

Functors According to Haskell

Functors are talked about a lot these days in blogs and videos. But the explanations tend to vary, and the quality and accuracy of information seems dubious. I decided instead to go straight to the source, the Haskell documentation. Here's what I learned.

tl;dr In Haskell, "Functor" is the name of an interface. Types that implement that interface are said to be an instance of Functor and can be mapped polymorphically.

And now the long version. Even though my target audience is JavaScript developers, I'm going to relate Haskell back to C++, because unfortunately JavaScript simply doesn't have comparable language features. So strap in; we're going to learn some C++ too.

C++ Templates

Normally, C++ classes and functions are statically typed. For example:

C++
class Box {
    public:
        int value;
};

Box{42}.value; // ok; int 42
Box{"Hello"}.value; // type error

This Box class can only ever hold an int. If we try to initialize it with a string, we get an error.

But C++ templates, on the other hand, let us write classes and functions with placeholders for the types.

C++
template<typename Placeholder_type>
    class Box {
        public:
            Placeholder_type value;
    };

Here, Box isn't a class; it's a class template. We can generate a real class from this template by providing a real type for the placeholder, using an angle-bracket syntax to pass types like parameters. Box<int>, for example, is the name of a real class, generated from our template, where Placeholder_type is replaced with int. Likewise, Box<string> is the name of a real class, generated from our template, where Placeholder_type is replaced with string.

C++
Box<int>{42}.value; // ok; int 42
Box<string>{"Hello"}.value; // ok; string "Hello"

This produces the same result as if we had written the same class twice with the two different types.

C++
class Box_int {
    public:
        int value;
};

class Box_string {
    public:
        string value;
};

Box_int{42}.value; // ok; int 42
Box_string{"Hello"}.value; // ok; string "Hello"

Note: I named my template parameter Placeholder_type because I wanted to prioritize clarity before brevity, but in C++ it's customary to name template parameters T -- just the single letter "T" -- and you'll see that often in code elsewhere.

Haskell Type Constructors

Haskell type constructors are like C++ templates. Normally, Haskell data types and functions are statically typed. For example:

Haskell
data Box = Box { value :: Int }

-- ok; int 42
value (Box 42)

-- type error
value (Box "Hello")

This Box data type can only ever hold an Int. If we try to initialize it with a string, we get an error.

But Haskell type constructors, on the other hand, let us write data types and functions with placeholders for the types.

Haskell
data Box placeholderType = Box { value :: placeholderType }

-- ok; int 42
value (Box 42)

-- ok; string "Hello"
value (Box "Hello")

Note: I named my type parameter placeholderType because I wanted to prioritize clarity before brevity, but in Haskell it's customary to name type parameters a -- just the single letter "a" -- and you'll see that often in code elsewhere.

C++ Concepts

So far, our templates will accept any type for its parameter, but there's usually some implicit requirements on the type we receive. For example:

C++
template<typename Placeholder_type>
    bool is_equal(Placeholder_type value_a, Placeholder_type value_b) {
        return value_a == value_b;
    }

is_equal<int>(42, 14); // ok; bool false
is_equal<Box<int>>(Box<int>{42}, Box<int>{14}); // error; no == operator for type Box<int>

Here, the function template is_equal will compare two values of the same type for equality. But to do so, it implicitly requires that the == operator be meaningful for that type. The == operator is meaningful for ints, but it isn't meaningful for Boxes.

C++ concepts define an interface for type parameters. They let us express formal constraints so that the requirements on a type parameter are explicit. (Note: C++ concepts are still a proposal.)

C++
template<typename Placeholder_type>
    concept EqualityComparable = requires(Placeholder_type value_a, Placeholder_type value_b) {
        { value_a == value_b } -> bool;
    };

Here, the concept EqualityComparable requires that a type parameter support an == operation that returns a boolean. We can use this and rewrite our is_equal function template like so:

C++
template<EqualityComparable Placeholder_type>
    bool is_equal(Placeholder_type value_a, Placeholder_type value_b) {
        return value_a == value_b;
    }

We replaced the keyword typename with EqualityComparable. When we used the keyword typename, any type parameter was let through, but now that we're using the named concept EqualityComparable, it accepts only types where the expression value_a == value_b compiles and returns a boolean.

Though, even after this rewrite, is_equal<int>(42, 14) will still succeed, and is_equal<Box<int>>(Box<int>{42}, Box<int>{14}) will still fail. Using concepts made our requirements on the type explicit, but our type Box<int> still doesn't satisfy those requirements. To satisfy those requirements, we'll need to define an == function for our type.

C++
template<typename Placeholder_type>
    bool operator==(Box<Placeholder_type> box_a, Box<Placeholder_type> box_b) {
        return box_a.value == box_b.value;
    }

Now Box<int> satisfies the EqualityComparable concept, and the is_equal function template will accept it.

C++
is_equal<int>(42, 14); // ok; bool false
is_equal<Box<int>>(Box<int>{42}, Box<int>{14}); // ok; bool false

Haskell Typeclasses

Haskell typeclasses are like C++ concepts. So far, our type constructors will accept any type for its parameter, but there's usually some implicit requirements on the type we receive. For example:

Haskell
isEqual valueA valueB = valueA == valueB

-- ok; bool false
isEqual 42 14

-- error; no == operator for type Box Int
isEqual (Box 42) (Box 14)

Here, the function isEqual will compare two values of variable types for equality. But to do so, it implicitly requires that the == operator be meaningful for those types. The == operator is meaningful for Ints, but it isn't meaningful for Boxes.

Haskell typeclasses define an interface for type parameters. They let us express formal constraints so that the requirements on a type parameter are explicit.

Haskell
class EqualityComparable placeholderType where  
    (==) :: placeholderType -> placeholderType -> Bool

Here, the typeclass EqualityComparable requires that a type parameter support an == operation that returns a boolean, and we can now add a type signature to our isEqual function like so:

Haskell
isEqual :: (EqualityComparable placeholderType) => placeholderType -> placeholderType -> Bool
isEqual valueA valueB = valueA == valueB

Though, even after this rewrite, isEqual 42 14 will still succeed, and isEqual (Box 42) (Box 14) will still fail. The typeclass made our requirements on the type explicit, but our type Box still doesn't satisfy those requirements. To satisfy those requirements, we'll need to define an == function for our type.

Haskell
instance (EqualityComparable placeholderType) => EqualityComparable (Box placeholderType) where  
    Box valueA == Box valueB = valueA == valueB

Now Box satisfies the EqualityComparable typeclass, and the isEqual function will accept it.

Haskell
-- ok; bool false
isEqual 42 14

-- ok; bool false
isEqual (Box 42) (Box 14)

The Functor Typeclass

In Haskell, "Functor" is the name of a typeclass. As we now know, a typeclass is an interface that defines requirements on a type parameter, and types that satisfy those requirements are said to be an instance of that typeclass. The Functor typeclass defines one function, fmap, and any type that implements fmap is an instance of Functor.

The phrase "instance of Functor" can get tiring to say, so a type that is an instance of Functor is often just called a functor itself. I'll try to distinguish between these two usages with capitalization. If I say "Functor" with a capital "F", then I'm referring to the named typeclass, and if I say "functor" with a lower "f", then I'm referring to an instance of Functor.

Here's what Haskell's definition of the Functor typeclass looks like:

Haskell
class Functor f where  
    fmap :: (a -> b) -> f a -> f b

Here, the names f, a, and b are all type variables, the sort of thing I previously named placeholderType. Notice that the type variables a and b are passed as parameters to the type variable f. That means f can't be just any type; it must be a type constructor. A type constructor, remember, takes types as parameters to produce new types, just as how Box Int is a type produced by passing the type Int as a parameter to our type constructor Box.

For a concrete example, let's imagine an instance where f is assigned our Box type, a is assigned the String type, and b is assigned the Int type. For that combination, fmap's signature would be a function that, for its first argument, accepts another function that takes a String and returns an Int, and for its second argument, accepts a Box String, and finally returns a Box Int.

Not Quite Functors

A Haskell functor isn't simply something that can be mapped over. After all, we could define, for example, a boxMap function for our Box type just like so:

Haskell
boxMap f (Box a) = Box (f a)

Now our Box type is mappable, and yet in Haskell, it's objectively not a functor. To be a functor, our type must implement a specific interface, the one defined by the Functor typeclass and that requires that our type implement fmap.

Haskell
instance Functor Box where  
    fmap f (Box a) = Box (f a)

Only now is our Box type a functor. But what did we gain from that? If our type was already mappable with boxMap, then what do we gain by being mappable with fmap?

Polymorphism.

In the absence of a common interface, if we wanted to write a function that operates on any of Tree or List or Box values, for example, that would be difficult because we'd need to invoke different functions with different names (treeMap vs listMap vs boxMap) for the different types. But when a wide variety of types share a common interface, then we can write a function that accepts any of those types and operates on them polymorphically. That's what the Functor typeclass gets us. It defines the interface that types must implement if they want to be polymorphically mappable.

Functors in JavaScript

Assuming we wanted to implement Haskell's Functor in JavaScript, the question of whether we even could very much depends on how exactly we want to mirror Haskell's implementation. Haskell's Functor, remember, is an interface on type parameters. But neither interfaces nor type parameters exist in JavaScript. Game over?

Let's deviate from Haskell just a little and see how close we can get. We can forgo the formal, explicitly defined and explicitly enforced, interface and instead rely on implicit duck interfaces. Now we have to decide what that interface will be. Should we expect types to implement a free standing function named fmap? That would most closely match Haskell, but it would be impractical in JavaScript. Defining the same function name multiple times but with different argument types (that is, function overloading) works in Haskell because it has static types, but JavaScript, of course, has dynamic types. We can't overload JavaScript functions by argument type because the arguments have no types.

Let's deviate from Haskell again. Rather than a free standing function named fmap, we could instead expect types to implement a method named fmap. That solves the same problem the Functor typeclass was meant to solve. If multiple types all implement a method named fmap, then we can operate on objects from any of those types polymorphically.

We could stop here and be done, but maybe we should deviate from Haskell once more. The name fmap is a bit arbitrary. We could pick something different for our implicit duck interface and still solve the original problem of mapping polymorphically. We could instead name it "transform" or "convert" -- or simply "map". That last option, "map", fits best with existing JavaScript style. If we choose that as our common interface, only then could we say that JavaScript's Array, for example, is a functor -- using the term loosely at this point, since by now we've deviated from Haskell in several ways.

Addendum

I've tried to be diligent to say "Haskell functor" rather than simply "functor", because nothing I've said here is applicable to the mathematical concept of a functor. The mathematical concept may have served as inspiration, but at the end of the day, the Haskell Functor bears no more resemblance to the mathematical category theory functor than the C++ class does to the mathematical set theory class, and telling programmers that a functor is a homomorphism between categories is no more useful than telling them that a class is a collection of sets.