Thanks to the Enum
module, in Elixir we can trivially remove duplicates from a list.
In the following example, we take a list of integers and pass it to the Enum.uniq/1
function which removes duplicates from the list without altering the original order of the remaining elements.
list = [1, 2, 2, 3, 3, 1, 2, 4]
Enum.uniq(list) # Returns [1, 2, 3, 4]
If you are trying to only remove consecutive duplicate elements, then there is Enum.dedup/1
:
list = [1, 2, 2, 3, 3, 1, 2, 4]
Enum.dedup(list) # Returns [1, 2, 3, 1, 2, 4]
(Note: We append /1
simply as a notation indicating the arity of a function, that is how many arguments it accepts. my_func/1
accepts one argument, my_func/2
two, and so on.)
Enum is full of helpful functions when working with collection data types that implement the Enumerable
protocol (e.g., lists, maps, ranges, streams, etc.) and it’s worth getting acquainted with.
Removing duplicates using recursion
Alright, Elixir does the heavy lifting for us in this case, but how would we go about removing duplicates from a list in Elixir without using Enum.uniq/1
? I mean from scratch, simply using recursion without relying on Enum
, sets, :lists.usort/1
, etc.
It is worth asking such a question to both exercise our recursion muscle (something that doesn’t come naturally to most programmers) and so that we’re ready to handle conceptually similar problems that do not have pre-made functions but could benefit from a recursive solution.
There are likely a few ways to implement this, but this what sprang to mind when I thought about it:
defmodule MyList do
def uniq([]), do: []
def uniq([head | tail]) do
[head | for(x <- uniq(tail), x != head, do: x)]
end
end
Calling MyList.uniq(list)
will then return the same list without duplicates as Enum.uniq(list)
did. (Although, it’s worth noting, that we implemented a List
-specific version of the uniq/1
function).
Let’s see how this works. If the list is empty (i.e., []
) we obviously return an empty list, as there is nothing to remove. This is our base case for the recursion.
If the list is not empty, it will have a head and a tail, and we use Elixir’s pattern matching to bind the first element of the list passed to the function to head
and the rest of the elements to the list tail
.
Note that a proper list with a single element will simply have an empty list as its tail. So writing [3]
is equivalent to writing [3|[]]
where 3
is the head, []
is the tail, and |
is the cons operator (short for constructor operator, as it’s used to construct lists).
So far so good. Here is where things get a little trickier. Let’s analyze this line:
[head | for(x <- uniq(tail), x != head, do: x)]
The code is wrapped in square brackets [...]
which means that we are returning a list. Then you’ll notice the |
cons operator. So we are constructing a list that has head
as its first element and whatever the rest of that line of code does, as its tail.
This makes sense if you think about it. Sure, the list might have duplicates, but the first element will always be included. If a duplicate of the first element exists, that’s the one that is going to be removed and not the first element.
So we are building a list and the first element of the original list is also the first element of our deduplicated list. What goes into the rest of the list?
Comprehensions
We see a for
. Unlike many programming languages, for
is not a loop keyword in Elixir. Rather, it is used for comprehensions (a form of syntax sugar to generate lists from existing collections). Syntax, which is not too different from mathematical notation.
Here is a simple example of how to use them:
for x <- [1, 2, 3, 4], do: x + x # Returns [2, 4, 6, 8]
“For each element x
in [1, 2, 3, 4]
do x + x
and put the result in a list.”
It also accepts filters, which allows us to specify a condition:
for x <- [1, 2, 3, 4], x < 3, do: x + x # Returns [2, 4]
In this example, the condition is that x
is smaller than 3
, so only the first two elements, which are lesser than 3
, get doubled and added to the resulting list.
Recursing our way to the base case
OK, back to our “cryptic” line:
[head | for(x <- uniq(tail), x != head, do: x)]
head
is our first element and then we are using a comprehension to generate a list without duplicates.
We are saying, for each x
in a deduplicated tail
, add x
to the list if it’s different from our first element head
.
The part that gets people weirded out about is recursively calling uniq(tail)
. We can get away with this because we have a base case that ensures we don’t recurse forever.
At each call of uniq(tail)
we are making the tail shorter by one element.
For example, executing MyList.uniq([1, 2, 3, 3])
will make the following recursive calls:
MyList.uniq([1, 2, 3, 3])
MyList.uniq([2, 3, 3])
MyList.uniq([3, 3])
MyList.uniq([3])
MyList.uniq([])
When we eventually get to the tail being []
, which is our base case,[]
is returned and MyList.uniq/1
is no longer called.
Recursion can be hard to grasp at first, but it’s a powerful tool and a staple of functional programming, so it’s well worth practicing.
As pointed out in the comment section, this implementation is quite illustrative but not very efficient. In production, you’d want to opt for the built-in functions or implement a tail-recursive version that leverages MapSet
. And although tail recursion is faster in this case, it’s worth noting that even that is not a silver bullet.
Get more stuff like this
Subscribe to my mailing list to receive similar updates about programming.
Thank you for subscribing. Please check your email to confirm your subscription.
Something went wrong.
Nice demonstration of comprehensions, but: since it compares against the rest of the list, for each element, wouldn’t that be pretty bad performance, at O(n^2)? You could do a lot better just iterating over the list once with a MapSet to check what’s in it, and a list as an accumulator. Or if you don’t care about order, just make a MapSet out of it.
For sure, Dave. I wanted to demonstrate recursion with something simple but still a bit more complex than the factorial or Fibonacci. In production, one would use built-in functions or a tail-recursive version that leverages MapSet.