Wednesday, February 7, 2018

The Blurry Line Between Closures and Not

What is a closure? It's a question that shows up all the time in blogs about interview questions. But in JavaScript, the line between what is and isn't a closure is a bit blurry.

Here are three questions to consider when deciding if a function is a closure:

  1. Is the function conceptually a closure? That is, does the function refer to non-local variables (also called free variables)?
  2. Under the hood, does the function use the implementation technique of a closure? That is, is the function really a record storing a plain function and an environment?
  3. And finally, does the function need to use the implementation technique of a closure? That is, could an interpreter quietly omit the record storing an environment and still get a correct result?

Let's start with something that is unambiguously a closure:

function f() {
    const name = "Mozilla";
    function displayName() {
        console.log(name);
    }
    return displayName;
}

f()();

Here we have an inner function that refers to a non-local variable defined in an outer function, and we execute the inner function after the outer function has finished and returned. The inner function is conceptually a closure since it refers to a non-local variable. It also uses the implementation technique of a closure, since it's really a function object that stores an environment. And finally, it needs to use the implementation technique of a closure, otherwise the non-local variable name would have been destroyed before the inner function was executed.

Next, let's look at an example where the third criteria from the list above doesn't apply.

function f() {
    const name = "Mozilla";
    function displayName() {
        console.log(name);
    }
    displayName();
};

f();

Like before, we have an inner function that refers to a non-local variable defined in an outer function, but this time we execute the inner function before the outer function has finished and returned. The inner function is still conceptually a closure since it refers to a non-local variable. It also still uses the implementation technique of a closure since it's really a function object that stores an environment. They key difference is this function doesn't need to use the implementation technique of a closure. The inner function is only ever executed when the outer function's stack frame and all its variables still exist. Which means the inner function could just be a plain function -- no stored environment -- and it would still be able to safely access the outer function's variables. In fact, to prove this is true, we can write this same code in C and it would work.

Let's look at another scenario:

const globalName = "Mozilla";

function f() {
    function displayName() {
        console.log(globalName);
    }
    return displayName;
}

f()();

Like the first example, this inner function is still conceptually a closure since it refers to a non-local variable. It also still uses the implementation technique of a closure since it's really a function object that stores an environment. This inner function is even executed after its containing function has finished and returned. But still, this inner function doesn't need to use the implementation technique of a closure because this time the non-local variable is global. Global variables, of course, exist for as long as the program exists. We don't have to worry about them going out of scope and being destroyed. This too could be implemented in C and it would Just Work.

Let's look at one last example.

const globalName = "Mozilla";

function displayName() {
    console.log(globalName);
}

displayName();

This time there's no more nested functions. Just an ordinary global variable and an ordinary global function. Conceptually, this function is still a closure since it refers to a non-local variable. And this being JavaScript, it also still uses the implementation technique of a closure since it's really a function object that stores an environment. But -- obviously -- it doesn't need to use the implementation technique of a closure.

So what, then, defines a closure? Is it the implementation? Is a record storing the combination of a plain function and an environment the defining characteristic of a closure? If so, then every function in JavaScript -- literally every single one -- is a closure. But however technically true that may be, it isn't useful in day-to-day work and conversation. If you were in a JavaScript interview and said an ordinary global function was a closure, the interviewer would probably look at you funny. So then do we ignore the underlying implementation and instead say a function is a closure only if it couldn't be implemented as a plain function? That's what we in the JavaScript community have historically done, but it muddies the terminology in ways that wouldn't be accectable outside the world of JavaScript. If you were in a C++ interview, for example, and said display_name = [] () { cout << global_name; }; wasn't a closure because it didn't need to be, then that interviewer would look at you funny too.

So which is the right way to use the term? I don't have an easy answer for you here. Most times in conversation, I'll skirt the issue with phrases like "technically a closure" or "a closure is useful when..." to stay technically correct without digressing.

Tuesday, December 12, 2017

Functors According to Haskell

Functors are talked about a lot these days in blogs and videos. But the explanations tend to vary, and the quality and accuracy of information seems dubious. I decided instead to go straight to the source, the Haskell documentation. Here's what I learned.

tl;dr In Haskell, "Functor" is the name of an interface. Types that implement that interface are said to be an instance of Functor and can be mapped polymorphically.

And now the long version. Even though my target audience is JavaScript developers, I'm going to relate Haskell back to C++, because unfortunately JavaScript simply doesn't have comparable language features. So strap in; we're going to learn some C++ too.

C++ Templates

Normally, C++ classes and functions are statically typed. For example:

C++
class Box {
    public:
        int value;
};

Box{42}.value; // ok; int 42
Box{"Hello"}.value; // type error

This Box class can only ever hold an int. If we try to initialize it with a string, we get an error.

But C++ templates, on the other hand, let us write classes and functions with placeholders for the types.

C++
template<typename Placeholder_type>
    class Box {
        public:
            Placeholder_type value;
    };

Here, Box isn't a class; it's a class template. We can generate a real class from this template by providing a real type for the placeholder, using an angle-bracket syntax to pass types like parameters. Box<int>, for example, is the name of a real class, generated from our template, where Placeholder_type is replaced with int. Likewise, Box<string> is the name of a real class, generated from our template, where Placeholder_type is replaced with string.

C++
Box<int>{42}.value; // ok; int 42
Box<string>{"Hello"}.value; // ok; string "Hello"

This produces the same result as if we had written the same class twice with the two different types.

C++
class Box_int {
    public:
        int value;
};

class Box_string {
    public:
        string value;
};

Box_int{42}.value; // ok; int 42
Box_string{"Hello"}.value; // ok; string "Hello"

Note: I named my template parameter Placeholder_type because I wanted to prioritize clarity before brevity, but in C++ it's customary to name template parameters T -- just the single letter "T" -- and you'll see that often in code elsewhere.

Haskell Type Constructors

Haskell type constructors are like C++ templates. Normally, Haskell data types and functions are statically typed. For example:

Haskell
data Box = Box { value :: Int }

-- ok; int 42
value (Box 42)

-- type error
value (Box "Hello")

This Box data type can only ever hold an Int. If we try to initialize it with a string, we get an error.

But Haskell type constructors, on the other hand, let us write data types and functions with placeholders for the types.

Haskell
data Box placeholderType = Box { value :: placeholderType }

-- ok; int 42
value (Box 42)

-- ok; string "Hello"
value (Box "Hello")

Note: I named my type parameter placeholderType because I wanted to prioritize clarity before brevity, but in Haskell it's customary to name type parameters a -- just the single letter "a" -- and you'll see that often in code elsewhere.

C++ Concepts

So far, our templates will accept any type for its parameter, but there's usually some implicit requirements on the type we receive. For example:

C++
template<typename Placeholder_type>
    bool is_equal(Placeholder_type value_a, Placeholder_type value_b) {
        return value_a == value_b;
    }

is_equal<int>(42, 14); // ok; bool false
is_equal<Box<int>>(Box<int>{42}, Box<int>{14}); // error; no == operator for type Box<int>

Here, the function template is_equal will compare two values of the same type for equality. But to do so, it implicitly requires that the == operator be meaningful for that type. The == operator is meaningful for ints, but it isn't meaningful for Boxes.

C++ concepts define an interface for type parameters. They let us express formal constraints so that the requirements on a type parameter are explicit. (Note: C++ concepts are still a proposal.)

C++
template<typename Placeholder_type>
    concept EqualityComparable = requires(Placeholder_type value_a, Placeholder_type value_b) {
        { value_a == value_b } -> bool;
    };

Here, the concept EqualityComparable requires that a type parameter support an == operation that returns a boolean. We can use this and rewrite our is_equal function template like so:

C++
template<EqualityComparable Placeholder_type>
    bool is_equal(Placeholder_type value_a, Placeholder_type value_b) {
        return value_a == value_b;
    }

We replaced the keyword typename with EqualityComparable. When we used the keyword typename, any type parameter was let through, but now that we're using the named concept EqualityComparable, it accepts only types where the expression value_a == value_b compiles and returns a boolean.

Though, even after this rewrite, is_equal<int>(42, 14) will still succeed, and is_equal<Box<int>>(Box<int>{42}, Box<int>{14}) will still fail. Using concepts made our requirements on the type explicit, but our type Box<int> still doesn't satisfy those requirements. To satisfy those requirements, we'll need to define an == function for our type.

C++
template<typename Placeholder_type>
    bool operator==(Box<Placeholder_type> box_a, Box<Placeholder_type> box_b) {
        return box_a.value == box_b.value;
    }

Now Box<int> satisfies the EqualityComparable concept, and the is_equal function template will accept it.

C++
is_equal<int>(42, 14); // ok; bool false
is_equal<Box<int>>(Box<int>{42}, Box<int>{14}); // ok; bool false

Haskell Typeclasses

Haskell typeclasses are like C++ concepts. So far, our type constructors will accept any type for its parameter, but there's usually some implicit requirements on the type we receive. For example:

Haskell
isEqual valueA valueB = valueA == valueB

-- ok; bool false
isEqual 42 14

-- error; no == operator for type Box Int
isEqual (Box 42) (Box 14)

Here, the function isEqual will compare two values of variable types for equality. But to do so, it implicitly requires that the == operator be meaningful for those types. The == operator is meaningful for Ints, but it isn't meaningful for Boxes.

Haskell typeclasses define an interface for type parameters. They let us express formal constraints so that the requirements on a type parameter are explicit.

Haskell
class EqualityComparable placeholderType where  
    (==) :: placeholderType -> placeholderType -> Bool

Here, the typeclass EqualityComparable requires that a type parameter support an == operation that returns a boolean, and we can now add a type signature to our isEqual function like so:

Haskell
isEqual :: (EqualityComparable placeholderType) => placeholderType -> placeholderType -> Bool
isEqual valueA valueB = valueA == valueB

Though, even after this rewrite, isEqual 42 14 will still succeed, and isEqual (Box 42) (Box 14) will still fail. The typeclass made our requirements on the type explicit, but our type Box still doesn't satisfy those requirements. To satisfy those requirements, we'll need to define an == function for our type.

Haskell
instance (EqualityComparable placeholderType) => EqualityComparable (Box placeholderType) where  
    Box valueA == Box valueB = valueA == valueB

Now Box satisfies the EqualityComparable typeclass, and the isEqual function will accept it.

Haskell
-- ok; bool false
isEqual 42 14

-- ok; bool false
isEqual (Box 42) (Box 14)

The Functor Typeclass

In Haskell, "Functor" is the name of a typeclass. As we now know, a typeclass is an interface that defines requirements on a type parameter, and types that satisfy those requirements are said to be an instance of that typeclass. The Functor typeclass defines one function, fmap, and any type that implements fmap is an instance of Functor.

The phrase "instance of Functor" can get tiring to say, so a type that is an instance of Functor is often just called a functor itself. I'll try to distinguish between these two usages with capitalization. If I say "Functor" with a capital "F", then I'm referring to the named typeclass, and if I say "functor" with a lower "f", then I'm referring to an instance of Functor.

Here's what Haskell's definition of the Functor typeclass looks like:

Haskell
class Functor f where  
    fmap :: (a -> b) -> f a -> f b

Here, the names f, a, and b are all type variables, the sort of thing I previously named placeholderType. Notice that the type variables a and b are passed as parameters to the type variable f. That means f can't be just any type; it must be a type constructor. A type constructor, remember, takes types as parameters to produce new types, just as how Box Int is a type produced by passing the type Int as a parameter to our type constructor Box.

For a concrete example, let's imagine an instance where f is assigned our Box type, a is assigned the String type, and b is assigned the Int type. For that combination, fmap's signature would be a function that, for its first argument, accepts another function that takes a String and returns an Int, and for its second argument, accepts a Box String, and finally returns a Box Int.

Not Quite Functors

A Haskell functor isn't simply something that can be mapped over. After all, we could define, for example, a boxMap function for our Box type just like so:

Haskell
boxMap f (Box a) = Box (f a)

Now our Box type is mappable, and yet in Haskell, it's objectively not a functor. To be a functor, our type must implement a specific interface, the one defined by the Functor typeclass and that requires that our type implement fmap.

Haskell
instance Functor Box where  
    fmap f (Box a) = Box (f a)

Only now is our Box type a functor. But what did we gain from that? If our type was already mappable with boxMap, then what do we gain by being mappable with fmap?

Polymorphism.

In the absence of a common interface, if we wanted to write a function that operates on any of Tree or List or Box values, for example, that would be difficult because we'd need to invoke different functions with different names (treeMap vs listMap vs boxMap) for the different types. But when a wide variety of types share a common interface, then we can write a function that accepts any of those types and operates on them polymorphically. That's what the Functor typeclass gets us. It defines the interface that types must implement if they want to be polymorphically mappable.

Functors in JavaScript

Assuming we wanted to implement Haskell's Functor in JavaScript, the question of whether we even could very much depends on how exactly we want to mirror Haskell's implementation. Haskell's Functor, remember, is an interface on type parameters. But neither interfaces nor type parameters exist in JavaScript. Game over?

Let's deviate from Haskell just a little and see how close we can get. We can forgo the formal, explicitly defined and explicitly enforced, interface and instead rely on implicit duck interfaces. Now we have to decide what that interface will be. Should we expect types to implement a free standing function named fmap? That would most closely match Haskell, but it would be impractical in JavaScript. Defining the same function name multiple times but with different argument types (that is, function overloading) works in Haskell because it has static types, but JavaScript, of course, has dynamic types. We can't overload JavaScript functions by argument type because the arguments have no types.

Let's deviate from Haskell again. Rather than a free standing function named fmap, we could instead expect types to implement a method named fmap. That solves the same problem the Functor typeclass was meant to solve. If multiple types all implement a method named fmap, then we can operate on objects from any of those types polymorphically.

We could stop here and be done, but maybe we should deviate from Haskell once more. The name fmap is a bit arbitrary. We could pick something different for our implicit duck interface and still solve the original problem of mapping polymorphically. We could instead name it "transform" or "convert" -- or simply "map". That last option, "map", fits best with existing JavaScript style. If we choose that as our common interface, only then could we say that JavaScript's Array, for example, is a functor -- using the term loosely at this point, since by now we've deviated from Haskell in several ways.

Addendum

I've tried to be diligent to say "Haskell functor" rather than simply "functor", because nothing I've said here is applicable to the mathematical concept of a functor. The mathematical concept may have served as inspiration, but at the end of the day, the Haskell Functor bears no more resemblance to the mathematical category theory functor than the C++ class does to the mathematical set theory class, and telling programmers that a functor is a homomorphism between categories is no more useful than telling them that a class is a collection of sets.

Friday, December 8, 2017

JavaScript / C++ Rosetta Stone

I'm going to reproduce the behavior of the JavaScript language in C++. Dynamic duck typing, objects out of nothing, prototypal inheritance, lexical closures -- all of it -- in C++. The goal is to gain insights into JavaScript's inner workings. For JavaScript developers, this exercise can help to understand what the language is doing under the hood and let us talk about its behavior in concrete terms rather than in metaphors. And for developers coming from other languages and new to JavaScript, this exercise can help to understand the concrete operations that lie beneath such a dynamic language.
  1. Dynamic types
    1. Variant
    2. Any
  2. Objects
  3. Arrays
  4. Prototypal inheritance
  5. Variadic functions
    1. Mixed-type arguments
  6. "This"
  7. Closures
  8. Function objects
  9. Lexical environment and scope chains
  10. Capture by reference vs by value
    1. let and capturing by value
  11. Garbage collection
  12. Classes
    1. Factory functions
    2. Prototype chains
    3. Constructor functions
  13. That's all folks! (...for now)

Dynamic types

First up, dynamic types. JavaScript, of course, is dynamically typed. A variable can be assigned and reassigned any type of value. C++, on the other hand, is statically typed. If you declare a variable to be an int, then C++ will allocate a fixed-size block of memory, maybe 4 bytes, which means that later trying to reassign an 8-byte double or a 32-byte string into that variable wouldn't work because there simply isn't enough space.

Variant

One way around that problem is to allocate as much space as the largest of several types. C++ supports this with a feature called unions. The union of a 4-byte int, an 8-byte double, and a 32-byte string would yield a 32-byte data type -- as large as the largest member. This means a variable of that union type could be reassigned at any time an int or a double or a string.
It's dangerously easy, however, to assign an int, for example, and then try to read and interpret that binary data as if it were a string, so for safety, we wrap and control the usage of a union with a well behaved abstraction, a user-defined type that we traditionally name variant.
JavaScript
let x;

x = true;
x = 42;
x = "Hello";
C++
variant<bool, int, string> x;

x = true;
x = 42;
x = "Hello"s;

Any

But because the size of a union is at least as large as its largest member, space is wasted. A variable that needs only 1 byte for a bool might nonetheless allocate 32 bytes just in case we later wanted to assign a string into it. But also as bad, we have to know about and list every type that might be assigned to the variant. Ideally we'd like to be able to assign any arbitrary type. That brings us to another user-defined type named, unsurprisingly, any. It works by allocating the value on the free store and using it through a pointer. If all we need is a 1-byte bool, then it dynamically allocates and points to just 1 byte. Later if we reassign a string into that variable, then it frees the 1-byte bool and dynamically allocates and points to 32 bytes for the string. Now we don't need to know about every type ahead of time, and it can store (nearly) any arbitrary type we want.
JavaScript
class SomeArbitraryType {}

let x;

x = true;
x = 42;
x = "Hello";
x = new SomeArbitraryType();
C++
class Some_arbitrary_type {};

any x;

x = true;
x = 42;
x = "Hello"s;
x = Some_arbitrary_type{};

Objects

In JavaScript, an object is a collection of key-value pairs. In other languages, a collection of key-value pairs is instead known as an associative array, or a map, or a dictionary, and it's often implemented as a hash table. Which means we can reproduce JavaScript's objects simply by instantiating a hash table. In C++, the standard hash table type is named unordered_map.
JavaScript
let myCar = {
    make: "Ford",
    model: "Mustang",
    year: 1969
};
C++
unordered_map<string, any> my_car {
    {"make", "Ford"s},
    {"model", "Mustang"s},
    {"year", 1969}
};
I'll admit this revelation in particular ruined some of JavaScript's mystique for me. We in the JavaScript community often tout that JavaScript can create objects ex nihilo ("out of nothing"), something we're told few languages can do. But I've realized this isn't a technical achievement; it's a branding achievement. Every language I'm aware of can create hash tables ex nihilo, and it seems JavaScript simply re-branded hash tables as the generic object.

Arrays

In JavaScript, an array is itself an object -- in the JavaScript sense of the word. Which is to say, a JavaScript array is a hash table where the string keys we insert just happen to look like integer indexes.
JavaScript
let fruits = ["Mango", "Apple", "Orange"];
C++
unordered_map<string, any> fruits {
    {"0", "Mango"s},
    {"1", "Apple"s},
    {"2", "Orange"s}
};
And, of course, since arrays are objects, we can still assign string keys that don't look like integer indexes into our array.
JavaScript
let fruits = ["Mango", "Apple", "Orange"];
fruits.model = "Mustang";
C++
unordered_map<string, any> fruits {
    {"0", "Mango"s},
    {"1", "Apple"s},
    {"2", "Orange"s}
};
fruits["model"] = "Mustang"s;

Prototypal inheritance

Next up, the now famous prototypal inheritance. I'm assuming my audience here already knows that JavaScript objects delegate to other objects. Since JavaScript's objects are ultimately hash tables, we can likewise describe JavaScript's prototypal inheritance as hash tables delegating element access to other hash tables.
This time, unfortunately, C++ does not have a ready-made type for this task. We'll have to make it ourselves. We in the JavaScript community sometimes say it's easy for prototypal to emulate classical but hard for classical to emulate prototypal, but implementing prototypal inheritance turned out to be far simplier than even I originally expected. Up to this point, we've been using unordered_map as our equivalent to JavaScript's objects. Now we're going to extend that type to add the so-called [[Prototype]] link, and we're going to override the element access operator so it will delegate access misses to that link.
C++
class Delegating_unordered_map : private unordered_map<string, any> {
    public:
        Delegating_unordered_map* __proto__ {};

        auto find_in_chain(const string& key) {
            // Check own property
            auto found_value = find(key);
            if (found_value != end()) return found_value;

            // Else, delegate to prototype
            if (__proto__) {
                auto found_value = __proto__->find_in_chain(key);
                if (found_value != __proto__->end()) return found_value;
            }

            return end();
        }

        any& operator[](const string& key) {
            auto found_value = find_in_chain(key);
            if (found_value != end()) return found_value->second;

            // Else, super call, which will create and return an empty `any`
            return unordered_map<string, any>::operator[](key);
        }

        // Borrow constructor
        using unordered_map<string, any>::unordered_map;
};
With that done, let's take a look at prototypal inheritance side-by-side in JavaScript and C++.
JavaScript
let o = {a: 1, b: 2};
let oProto = {b: 3, c: 4};
Object.setPrototypeOf(o, oProto);

// Is there an "a" own property on o? Yes, and its value is 1.
o.a; // 1

// Is there a "b" own property on o? Yes, and its value is 2.
// The prototype also has a "b" property, but it's not visited.
o.b; // 2

// Is there a "c" own property on o? No, check its prototype.
// Is there a "c" own property on o.[[Prototype]]? Yes, its value is 4.
o.c; // 4

// Is there a "d" own property on o? No, check its prototype.
// Is there a "d" own property on o.[[Prototype]]? No, check its prototype.
// o.[[Prototype]].[[Prototype]] is null, stop searching.
// No property found, return undefined.
o.d; // undefined
C++
Delegating_unordered_map o {{"a", 1}, {"b", 2}};
Delegating_unordered_map o_proto {{"b", 3}, {"c", 4}};
o.__proto__ = &o_proto;

// Is there an "a" own property on o? Yes, and its value is 1.
o["a"]; // 1

// Is there a "b" own property on o? Yes, and its value is 2.
// The prototype also has a "b" property, but it's not visited.
o["b"]; // 2

// Is there a "c" own property on o? No, check its prototype.
// Is there a "c" own property on o.__proto__? Yes, its value is 4.
o["c"]; // 4

// Is there a "d" own property on o? No, check its prototype.
// Is there a "d" own property on o.__proto__? No, check its prototype.
// o.__proto__.__proto__ is null, stop searching.
// No property found, return undefined.
o["d"]; // undefined (an empty `any`)
We've finally settled on the finished object structure to reproduce JavaScript's behavior (or as close as we'll get here, anyway), so for convenience in the rest of the samples, I'm going to alias that object type to a shorter, friendlier name.
C++
using js_object = Delegating_unordered_map;

Variadic functions

Variadic functions are functions that take a variable number of arguments. In JavaScript, every function is variadic. You can call any function with any number of arguments, and they will all be available in a special array-like variable called arguments. C++, on the other hand, needs to know the exact number and types of arguments to be able to call a function correctly. But as it turns out, it's simple to reproduce JavaScript's behavior. Our C++ functions could accept just a single parameter, an array of anys, and we can name that parameter arguments. In C++, the standard dynamically-sized array type is named vector.
JavaScript
function plusAll() {
    let sum = 0;
    for (let i = 0; i < arguments.length; i++) {
        sum += arguments[i];
    }
    return sum;
}

plusAll(4, 8); // 12
plusAll(4, 8, 15, 16, 23, 42); // 108
C++
any plus_all(vector<any> arguments) {
    auto sum = 0;
    for (auto i = 0; i < arguments.size(); ++i) {
        sum += any_cast<int>(arguments[i]);
    }
    return sum;
}

plus_all({4, 8}); // 12
plus_all({4, 8, 15, 16, 23, 42}); // 108
Admittedly both of these code samples were written to highlight the similarities, but neither adheres to what each language would consider good style. The JavaScript sample should use rest parameters and reduce, and the C++ sample should use iterators and standard algorithms.
JavaScript
function plusAll(...args) {
    return args.reduce((accumulator, currentValue) => {
        return accumulator + currentValue;
    }, 0);
}
C++
any plus_all(vector<any> arguments) {
    return accumulate(
        arguments.begin(), arguments.end(), 0,
        [] (auto accumulator, auto current_value) {
            return accumulator + any_cast<int>(current_value);
        }
    );
}

Mixed-type arguments

You may have noticed that the C++ version assumes every item in the argument list is an int. In JavaScript, the arguments could have been a mix of strings and numbers, but the C++ version, as currently written, will only add numbers. How does JavaScript pull this off? Turns out the addition operator does some type checking. Every time we use the + operator, JavaScript will check if either operand is a string, and if so, it will do concatention. Otherwise, it will do numeric addition.
JavaScript
function plusAll(...args) {
    return args.reduce((accumulator, currentValue) => {
        return accumulator + currentValue;
    }, 0);
}

plusAll(4, 8, "!", 15, 16, 23, 42); // "12!15162342"
C++
any js_plus(const any& lval, const any& rval) {
    // If either operand is a string...
    if (
        lval.type() == typeid(string) ||
        rval.type() == typeid(string)
    ) {
        // Convert both operands to a string and do concatenation
        auto lval_str = (
            lval.type() != typeid(string) ?
                to_string(any_cast<int>(lval)) :
                any_cast<string>(lval)
        );
        auto rval_str = (
            rval.type() != typeid(string) ?
                to_string(any_cast<int>(rval)) :
                any_cast<string>(rval)
        );

        return lval_str + rval_str;
    }

    // Else, numeric addition
    return any_cast<int>(lval) + any_cast<int>(rval);
}

any plus_all(vector<any> arguments) {
    return accumulate(
        arguments.begin(), arguments.end(), any{0},
        [] (auto accumulator, auto current_value) {
            return js_plus(accumulator, current_value);
        }
    );
}

plus_all({4, 8, "!"s, 15, 16, 23, 42}); // "12!15162342"

"This"

Although this is handled separately from other parameters, nonetheless in a lot of ways, JavaScript's this behaves just like a parameter, albeit one that is often passed and received implicitly. We can reproduce JavaScript's behavior by adding one extra parameter before our arguments list, and we can name that parameter this_. The trailing underscore is there because this is a reserved name in C++.
JavaScript
function add(c, d) {
    return this.a + this.b + c + d;
}

let o = {a: 1, b: 3};

// The first parameter is the object to use as
// "this"; subsequent parameters are passed as
// arguments in the function call
add.call(o, 5, 7); // 1 + 3 + 5 + 7 = 16

// The first parameter is the object to use as
// "this"; the second is an array whose
// elements are used as the arguments in the function call
add.apply(o, [10, 20]); // 1 + 3 + 10 + 20 = 34
C++
any add(any this_, vector<any> arguments) {
    return (
        any_cast<int>(any_cast<js_object>(this_)["a"]) +
        any_cast<int>(any_cast<js_object>(this_)["b"]) +
        any_cast<int>(arguments[0]) +
        any_cast<int>(arguments[1])
    );
}

js_object o {{"a", 1}, {"b", 3}};

// The first parameter is the object to use as
// "this"; the second is an array whose
// elements are used as the arguments in the function call
add(o, {5, 7}); // 1 + 3 + 5 + 7 = 16
add(o, {10, 20}); // 1 + 3 + 10 + 20 = 34
The way we now call our C++ functions exactly matches JavaScript's apply, where the first argument is the this value and the second argument is an array that lists all other arguments. We're sometimes told that using this is inherently stateful, but at the end of the day, this is just a parameter. If we want our object methods to be pure, for example, then we can choose to not mutate this just like we would choose to not mutate any other parameter.

Closures

Conceptually, a closure is a function with non-local (free) variables. The concrete technique we use to implement that concept is a structure containing a function and an environment record. Or, put another way, a closure is an object with a single member function and private data. When we execute that function, it can access the "captured" variables from the environment stored in the closure object's private data.
Depending on the language, we can make the closure object itself callable. You may have already seen callable objects in other languages. Python, for example, lets you define a __call__ method, and PHP has a special __invoke method. In C++, we define an operator() member function. And it's these functions that are executed when an object is called as if it were a function.
As you read these descriptions, keep in mind that C++'s notion of a function is not the same as JavaScript's notion of a function. In C++, a function can be distinct from a closure and not associated with any environment. But in JavaScript, everyfunction has an associated environment. What JavaScript calls a function isn't the function part of a closure. Rather, what JavaScript calls a function is the closure itself -- a callable object that contains an environment.
JavaScript
function outside(x) {
    function inside(y) {
        return x + y;
    }

    return inside;
}

let fnInside = outside(3);
fnInside(5); // 8

outside(3)(5); // 8
C++
auto outside(int x) {
    // A class that privately stores "x" and
    // can be called as if it were a function
    class Inside {
        int x_;

        public:
            Inside(int x) : x_ {x} {}

            auto operator()(int y) {
                return x_ + y;
            }
    };

    // This is our closure, an instance of the above class, a callable object
    // that is constructed with and stores a value from its environment
    Inside inside {x};

    return inside;
}

auto fn_inside = outside(3);
fn_inside(5); // 8

outside(3)(5); // 8
This C++ sample demonstrates what's ultimately happening under the hood, but admittedly it's verbose. In 2011, C++ introduced a lambda expression syntax, which produces a closure object and is syntactic sugar for defining and instantiating a callable class, same as we did above.
C++
auto outside(int x) {
    // This is our closure, a callable object that
    // stores a value from its environment
    auto inside = [x] (int y) {
        return x + y;
    };

    return inside;
}

auto fn_inside = outside(3);
fn_inside(5); // 8

outside(3)(5); // 8
In these C++ samples, I defined only inside as a callable object and I left outside as a plain function because that's all that was necessary to reproduce JavaScript's behavior for this particular code sample. But in truth, every function in JavaScript is a callable object. Every function is a closure.

Function objects

In fact, JavaScript's functions aren't just objects in the C++ sense of the word, they're also objects in the JavaScript sense of the word. Which means JavaScript's functions are callable hash tables. We can both invoke them as a function and assign to them key-value pairs. To reproduce this behavior in C++, we'll extend the Delegating_unordered_map we made earlier and specialize it to also be callable.
C++
class Callable_delegating_unordered_map : public Delegating_unordered_map {
    function<any(any, vector<any>)> function_body_;

    public:
        Callable_delegating_unordered_map(function<any(any, vector<any>)> function_body) :
            function_body_ {function_body}
        {}

        auto operator()(any this_ = {}, vector<any> arguments = {}) {
            return function_body_(this_, arguments);
        }
};
With that done, let's take a look at callable hash tables.
JavaScript
function square(number) {
    return number * number;
}

square.make = "Ford";
square.model = "Mustang";
square.year = 1969;

square(4); // 16
C++
Callable_delegating_unordered_map square {[] (any this_, vector<any> arguments) {
    return any_cast<int>(arguments[0]) * any_cast<int>(arguments[0]);
}};

square["make"] = "Ford"s;
square["model"] = "Mustang"s;
square["year"] = 1969;

square(nullptr, {4}); // 16
We've finally settled on the finished function object structure to reproduce JavaScript's behavior (or as close as we'll get here, anyway), so for convenience in the rest of the samples, I'm going to alias that object type to a shorter, friendlier name.
C++
using js_function = Callable_delegating_unordered_map;

Lexical environment and scope chains

The C++ closure examples from earlier do all we need to capture individual variables from any scope level, but that's not quite how JavaScript works. In JavaScript, what we see as local variables are actually entries in an object called an environment record, which pairs variable names with values. And each environment delegates accesses to the environment record of the next outer scope.
That sounds familiar. We've seen this pattern before. As it turns out, the Delegating_unordered_map we made to reproduce prototypal inheritance is exactly what we need to also reproduce JavaScript's scope chains.
JavaScript
let globalVariable = "xyz";

function f() {
    let localVariable = true;

    function g() {
        let anotherLocalVariable = 123;

        // All variables of surrounding scopes are accessible
        localVariable = false;
        globalVariable = "abc";
    }
}
C++
Delegating_unordered_map global_environment;
global_environment["globalVariable"] = "xyz"s;

global_environment["f"] = js_function{[&] (any this_, vector<any> arguments) {
    Delegating_unordered_map f_environment;
    f_environment.__proto__ = &global_environment;

    f_environment["localVariable"] = true;

    f_environment["g"] = js_function{[&] (any this_, vector<any> arguments) {
        Delegating_unordered_map g_environment;
        g_environment.__proto__ = &f_environment;

        g_environment["anotherLocalVariable"] = 123;

        // All variables of surrounding scopes are accessible
        g_environment["localVariable"] = false;
        g_environment["globalVariable"] = "abc"s;

        return any{};
    }};

    return any{};
}};

Capture by reference vs by value

In C++, a closure's environment can store either a copy of values in scope or a reference to values in scope. But JavaScript's closures will only ever capture by reference. The difference is especially apparent in the famous problem with creating closures inside a loop.
JavaScript
let functions = [];

for (var i = 0; i < 3; i++) {
    // Every closure we push captures a reference to the same "i"
    functions.push(() => {
        console.log(i);
    });
}

// 3, 3, 3
functions.forEach(fn => {
    fn();
});
C++
vector<js_function> functions_by_value;
vector<js_function> functions_by_ref;

for (auto& i = *(new int{0}); i < 3; ++i) {
    // This closure will capture the value of "i"
    // at the moment the closure is created
    functions_by_value.push_back(js_function{[i] (any this_, vector<any> arguments) {
        cout << i;
        return any{};
    }});

    // This closure will capture a reference to the same "i"
    functions_by_ref.push_back(js_function{[&i] (any this_, vector<any> arguments) {
        cout << i;
        return any{};
    }});
}

// 0, 1, 2
for_each(functions_by_value.begin(), functions_by_value.end(), [] (auto fn) {
    fn();
});

// 3, 3, 3
for_each(functions_by_ref.begin(), functions_by_ref.end(), [] (auto fn) {
    fn();
});

let and capturing by value

Recently JavaScript added the let statement, which brings block scoping to the language. But when it comes to for-loops, then let does more than mere block scoping. At every iteration of the loop, JavaScript creates a brand new copy of the loop variable. That's not normal block scoping behavior. Rather, this seems to emulate capture by value in a language that otherwise only supports capture by reference. A JavaScript closure within the loop will still capture by reference, but it won't capture the loop variable itself; it will capture a per-iteration copy of the loop variable.
JavaScript
let functions = [];

for (let i = 0; i < 3; i++) {
    // Every closure we push captures a reference to a per-iteration copy of "i"
    functions.push(() => {
        console.log(i);
    });
}

// 0, 1, 2
functions.forEach((fn) => {
    fn();
});
C++
vector<js_function> functions;

for (auto i = 0; i < 3; ++i) {
    // Create a per-iteration copy of "i"
    auto& i_copy = *(new int{i});

    // Every closure we push captures a reference to a per-iteration copy of "i"
    functions.push_back(js_function{[&i_copy] (any this_, vector<any> arguments) {
        cout << i_copy;
        return any{};
    }});
}

// 0, 1, 2
for_each(functions.begin(), functions.end(), [] (auto fn) {
    fn();
});

Garbage collection

One problem with capturing by reference is we now have to be conscientious of memory allocation lifetimes. Normally the lifetime of a value in memory is tied to a particular block, such as a for-loop or a function, but since the closures in the last sample will outlive their block, then any values they capture by reference will also need to outlive their block. In the last C++ sample, we avoided that issue by allowing the per-iteration copies to stay alive forever. But in JavaScript, memory is garbage collected. The lifetime of any value isn't necessarily tied to any particular block. If something somewhere retains a reference to a value, then that value will be kept alive.
In C++, a guiding principle for the language is you don't pay for what you don't use, so there are several garbage collection strategies available -- such as unique_ptrshared_ptr, and deferred_ptr -- each with incrementally more features but also with more overhead. Since we're reproducing the behavior of JavaScript, we'll go straight to deferred_ptr for garbage collection.
Here's the last C++ sample again but this time with a garbage collector to alleviate the object lifetime issue.
C++
deferred_heap my_heap;

vector<js_function> functions;

for (auto i = 0; i < 3; ++i) {
    // Create a per-iteration copy of "i"
    auto i_copy = my_heap.make<int>(i);

    // Every closure we push captures a reference to a per-iteration copy of "i"
    functions.push_back(js_function{[i_copy] (any this_, vector<any> arguments) {
        cout << *i_copy;
        return any{};
    }});
}

// 0, 1, 2
for_each(functions.begin(), functions.end(), [] (auto fn) {
    fn();
});

// Destroy and deallocate any unreachable objects
my_heap.collect();
In the sample above, only the per-iteration copy is garbage collected. If we want all of our objects, arrays, and function objects to also be garbage collected, then we'll have to make a few changes. First, we need to update the Delegating_unordered_map and change the raw __proto__ pointer to the garbage collector deferred_ptr.
C++
class Delegating_unordered_map : private unordered_map<string, any> {
    public:
        deferred_ptr<Delegating_unordered_map> __proto__ {};

        // ...
};
And second, when we want a new js_object, we'll have to ask the garbage collector to make it for us by calling, for example, my_heap.make<js_object>( js_object{{"a", 1}, {"b", 2}} ). And we'll have to refer to those garbage collected object types as deferred_ptr<js_object>. That can be just a little tedious and verbose, so let's define a couple helper functions and a couple type aliases to be a touch cleaner.
C++
auto make_js_object(const js_object& obj = {}) {
    return my_heap.make<js_object>(obj);
}

auto make_js_function(const js_function& func) {
    return my_heap.make<js_function>(func);
}

using js_object_ref = deferred_ptr<js_object>;
using js_function_ref = deferred_ptr<js_function>;
You may notice I aliased a type name with the suffix "ptr" to a name with the suffix "ref". Admittedly the terminology here gets a bit awkward. What JavaScript calls a reference behaves like what C++ calls a pointer. C++ also has a feature it calls a reference, but it's different than what JavaScript calls a reference.

Classes

JavaScript has a multitude of styles for creating objects, but despite the variety and how different the syntax for each may look, they build on each other in incremental steps.

Factory functions

The first and simplest way to create a large set of objects that share a common interface and implementation is a factory function. We just create a js_object "ex nilo" inside a function and return it. This lets us create the same type of object multiple times and in multiple places just by invoking a function.
JavaScript
function thing() {
    return {
        x: 42,
        y: 3.14,
        f: function() {},
        g: function() {}
    };
}

let o = thing();
C++
auto thing = make_js_function({[] (any this_, vector<any> arguments) {
    return make_js_object({
        {"x", 42},
        {"y", 3.14},
        {"f", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})},
        {"g", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})}
    });
}});

auto o = (*thing)();
But there's a drawback. This approach can cause memory bloat because each object contains its own unique copy of each function. Ideally we want every object to share just one copy of its functions.

Prototype chains

We can take advantage of JavaScript's prototypal inheritance and change our factory function so that each object it creates contains only the data unique to that particular object, and delegate all other property requests to a single, shared object.
JavaScript
let thingPrototype = {
    f: function() {},
    g: function() {}
};

function thing() {
    let o = {
        x: 42,
        y: 3.14
    };

    Object.setPrototypeOf(o, thingPrototype);

    return o;
}

let o = thing();
C++
auto thing_prototype = make_js_object({
    {"f", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})},
    {"g", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})}
});

auto thing = make_js_function({[=] (any this_, vector<any> arguments) {
    auto o = make_js_object({
        {"x", 42},
        {"y", 3.14}
    });

    o->__proto__ = thing_prototype;

    return o;
}});

auto o = (*thing)();
In fact, this is such a common pattern in JavaScript that the language has built-in support for it. We don't need to create our own shared object (the prototype object). Instead, a prototype object is created for us automatically, attached to every function under the property name prototype, and we can put our shared data there.
JavaScript
function thing() {
    let o = {
        x: 42,
        y: 3.14
    };

    Object.setPrototypeOf(o, thing.prototype);

    return o;
}

thing.prototype.f = function() {};
thing.prototype.g = function() {};

let o = thing();
C++
js_function_ref thing = make_js_function({[=] (any this_, vector<any> arguments) {
    auto o = make_js_object({
        {"x", 42},
        {"y", 3.14}
    });

    o->__proto__ = any_cast<js_object_ref>((*thing)["prototype"]);

    return o;
}});

(*thing)["prototype"] = make_js_object({
    {"f", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})},
    {"g", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})}
});

auto o = (*thing)();
But there's a drawback. This is going to result in some repetition during object creation in every such delegating-to-prototype-factory-function.

Constructor functions

JavaScript has some built-in support to isolate the repetition. The new keyword will create an object that delegates to some other arbitrary function's prototype. Then it will invoke that function to perform initialization with the newly created object as the this argument. And finally it will return the object.
JavaScript
function Thing() {
    this.x = 42;
    this.y = 3.14;
}

Thing.prototype.f = function() {};
Thing.prototype.g = function() {};

let o = new Thing();
C++
auto js_new(js_function_ref constructor, vector<any> arguments = {}) {
    auto o = make_js_object();
    o->__proto__ = any_cast<js_object_ref>((*constructor)["prototype"]);

    (*constructor)(o, arguments);

    return o;
}

auto Thing = make_js_function({[] (any this_, vector<any> arguments) {
    (*any_cast<js_object_ref>(this_))["x"] = 42;
    (*any_cast<js_object_ref>(this_))["y"] = 3.14;

    return any{};
}});

(*Thing)["prototype"] = make_js_object({
    {"f", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})},
    {"g", make_js_function({[] (any this_, vector<any> arguments) { return any{}; }})}
});

auto o = js_new(Thing);

That's all folks! (...for now)

I hope you found this look under the hood just as interesting and enlightening as I did.
I'd like this to become a living and growing document. I'm always looking to make the explanations better and the code clearer, as well as to lift the hood of even more dark corners of the language. I welcome and appreciate any contributions to make it better.
This article was originally published at https://github.com/Jeff-Mott-OR/javascript-cpp-rosetta-stone.