Published on

# Implementing Ruby’s `Array#flatten` in Haskell

Ruby’s `Array#flatten` is on occasion a very handy method. It reduces `n`-dimensional arrays to `1`-dimensional arrays, containing all the elements of the top-level-array and its sub-arrays. Therefore, it is said to be “flattening” the `n`-dimensional array.

``````2.1.4 :001 > [ ].flatten
=> []
2.1.4 :002 > [ [] ].flatten
=> []
2.1.4 :003 > [ 1, 2, 3 ].flatten
=> [1, 2, 3]
2.1.4 :004 > [ 1, 2, [3] ].flatten
=> [1, 2, 3]
2.1.4 :005 > [ [1], 2, [3, 4] ].flatten
=> [1, 2, 3, 4]
2.1.4 :006 > [ 1, [2], [[3, 4], 5] ].flatten
=> [1, 2, 3, 4, 5]
2.1.4 :007 > [ 1, 2, [2, 3] ].flatten # retains duplicates
=> [1, 2, 2, 3]
2.1.4 :008 > [ 2, 1, [4, 3] ].flatten # retains order
=> [2, 1, 4, 3]``````

As you can see, `#flatten` is conceptually quite simple. The same functionality can be implemented in Haskell. But, as usual, the solution requires a little bit of abstract thinking.

## Naive Approach

Let’s start with `1`-dimensional lists and work our up to `n` dimensions. The first dimension is trivial: All we need is the identity function.

``````flatten1 :: [a] -> [a]
flatten1 = id -- flattening a 1-dim list is a no-op``````

For the second dimension, we could do something like the following:

``````flatten2 :: [[a]] -> [a]
flatten2 []     = []
flatten2 (x:xs) = (flatten1 x) ++ (flatten2 xs)``````

This isn’t really what we want, because we’ve told the compiler that the argument has exactly `2` dimensions. For that reason, we have to define another function to support three dimensions:

``````flatten3 :: [[[a]]] -> [a]
flatten3 []     = []
flatten3 (x:xs) = (flatten2 x) ++ (flatten3 xs)``````

By now, you can probably see the pattern. We have to define yet another and another and another function for each additional dimension. Thus, the naive approach doesn’t get us very far…

## Generic Approach

An instance of `Array` in Ruby doesn’t restrict the types of its elements in any way (i.e. it’s just a bunch of references to instances of `Object`). Therefore, the elements of an `Array` instance can…

1. … have different types. For example, `[1, "abc"]` is completely valid in Ruby (the first element is an instance `Fixnum` and the second element is an instance of `String`).
2. … reference other `Array` instances (all classes in Ruby, including `Array`, are sub-classes of `Object`). Consequently, `Array`s can be arbitrarily nested.

Haskell is statically-typed, whereas Ruby is dynamically-typed. Due to restrictions imposed by the static type system, there are certain things that cannot be done in Haskell.

Okay, that’s enough background information for now. Let’s flatten some stuff :)

Regarding (1): There is no counterpart for Ruby `Array`s with mixed elements in Haskell. All elements in a list, must be values of the same type. That being said, we could define a custom data type to circumvent this issue. However, that only works for types we know about at compile-time.

Regarding (2): Unless we know `n` at compile-time, the concept of `n`-dimensional lists doesn’t make any sense in Haskell. Instead, we have to tell the compiler more explicitly what we really want. Namely, a `n`-ary tree. Fortunately, Haskell makes it very easy for us define such trees with algebraic data types:

``````data Tree a = Node [Tree a]
| Leaf a``````

What does this mean? Well, `Tree a` is a algebraic data type with two constructors. The first one, `Node`, encapsulates a list `[Tree a]`. The second one, `Leaf`, encapsulates a value of type `a`.

Next, we are going to illustrate our new data type by translating the Ruby array `foo = [1, 2, [3, 4], []]` to Haskell:

``````foo :: Tree Integer
foo = Node [ Leaf 1
, Leaf 2
, Node [ Leaf 3, Leaf 4 ]
, Node [ ] ]``````

The last thing we still need, is the implementation of `flatten` itself. However, that’s the easy part: The function simply has to recursively traverse the given tree and collect the all values from its `Leaf`s.

``````flatten :: Tree a -> [a]
flatten ( Leaf v  ) = [v]
flatten ( Node ts ) = concat \$ fmap flatten ts``````

We can use the tree we’ve defined before (i.e. `foo`) to test the function in GHCi:

``````~ ➜ ghci
Prelude> :l Flatten.hs
[1 of 1] Compiling Flatten          ( Flatten.hs, interpreted )
*Flatten> let foo = Node [ Leaf 1, Leaf 2, Node [ Leaf 3, Leaf 4 ], Node [] ]
*Flatten> flatten foo
[1,2,3,4]
*Flatten>``````

`[1,2,3,4]` is of course exactly the result we’ve expected.

## Conclusion

If you are not familiar with Haskell, you might think now that this is a lot of overkill. Personally, I’m a big proponent for Haskell, but I agree that `#flatten` is easier to implement in Ruby.

However, I would argue that this a consequence of Haskell’s static type system and not its functional nature. If you disagree, I encourage you to implement `#flatten` in a statically-typed and object-oriented language (e.g. Java) – you’ll see what I mean.

What’s the benefit of functional programing and static typing? Safety and by extension confidence in your code. Dynamic typing is convenient, that’s for sure. But, saying that the price we are paying for that convenience is high, would be an understatement. There’s approximately one gazillion ways in which your Haskell program can blow up. With Ruby and dynamic typing you get two gazillion…

While writing this post, I’ve actively tried to break Ruby’s `#flatten` to substantiate my point. This is what I came up with after a couple of minutes:

``````2.1.4 :001 > a = [1, 2, 3]
=> [1, 2, 3]
2.1.4 :002 > a << a
=> [1, 2, 3, [...]]
2.1.4 :003 > a.flatten
ArgumentError: tried to flatten recursive array
from (irb):3:in `flatten'
from (irb)34
from /Users/cs/.rvm/rubies/ruby-2.1.4/bin/irb:11:in `<main>'``````

It never ceases to amaze me how easy it is to break Ruby programs.

Notes
Software: MRI Ruby 2.1.4-p265, Glasgow Haskell Compiler 7.8.3.