Solving Algorithmic Problems: Find a Duplicate in an Array
Problem
- Find a duplicate in an array
Given an array of n + 1 integers between 1 and n, find one of the duplicates.
If there are multiple possible answers, return one of the duplicates.
If there is no duplicate, return -1.
Example:
Input: [1, 2, 2, 3]Output: 2
Input: [3, 4, 1, 4, 1]Output: 4 or 1
Category: Arrays
Solving Process
Before to see the solutions, let’s talk a bit about the problem. We get an array of n + 1
element with integers between 1 and n
.
For example, with an array of 5 integers, it implies that each integer will have a value between 1 and 4 (included). This means there will automatically be at least
The only exception is with an array of size 1. This is the only case where we should return a -1.
Brute Force
The brute force solution is to implement two nested loops this way:
for i = 0; i < size(a); i++ { for j = i+1; j < size(a); j++ { if(a[i] == a[j]) return a[i] }}
O(n²) time complexity and O(1) space complexity.
Count Iterations
Another solution is to have a data structure to count the number of iterations of each integer. This solution could be implemented either with an array or a hash table.
An implementation in Java:
The value of index i
represent the number of iterations of i+1
.
This solution is O(n) time but O(n) space as we need an extra structure.
Sorted Array
If we apply the simplification technique, we could try to find a solution with a sorted array.
In this case, we would just have to compare each element with its right neighbor.
In Java:
O(1) in space but O(n log(n)) in time as we need to sort the collection up front.
Sum of the Elements
A direction we may think about is to sum the elements of the array and to compare it with 1 + 2 + … + n.
Let’s see an example:
Input: [1, 4, 3, 3, 2, 5]Sum = 18
As in this example, we have n = 5:Sum of 1 to 5 = 1 + 2 + 3 + 4 + 5 = 15
=> 18 - 15 = 3 so 3 is the duplicate
In this example, we are able to find the solution in O(n) time and O(1) space. Yet, it is working solely in the case where we have one duplicate.
A counter-example:
Input: [1, 2, 3, 2, 3, 4]Sum = 15
As in this example we have n = 5,Sum of 1 to 5 = 1 + 2 + 3 + 4 + 5 = 15
/!\ Not working
This direction is going nowhere. But sometimes we need to try stuff to find the most optimal solution.
Marker
There is something interesting to mention. So far, our solutions are not really making use of the fact that each integer is between 1 and n
. Due to this interesting constraint, each value has its own corresponding index in the array.
The solution is to consider this particular array as a sort of linked list. Any index is pointing to the value of that index.
We iterate over each element and mark its corresponding index by setting its sign to minus. If we already marked it as negative, it means its index is a duplicate.
Let’s see a concrete step by step example:
Input: [2, 3, 3, 1]
* Iteration 0: Absolute value = 2 Put a minus sign to a[2] => [2, 3, -3, 1]
* Iteration 1: Absolute value = 3 Put a minus sign to a[3] => [2, 3, -3, -1]
* Iteration 2: Absolute value = 3 As a[3] is already negative, it means 3 is a duplicate
In Java:
This solution is O(n) time and O(1) space. Yet, it requires to mutate the input list.
Runner Technique
There is another solution which also considers the given array as a sort of linked list (again, this is possible because of the constraint that each integer is between 1 and n
).
Let’s analyze the example [1, 2, 3, 4, 2]
:
With this representation, we can simply say that a duplicate exists when a cycle does exist. Moreover, the duplicate is the entry point of the cycle (in this case, the second element).
If we take inspiration from Floyd’s cycle-finding algorithm, we can derive the following algorithm:
- Initiate two pointers
slow
andfast
- For each step, slow is moving one step at a time with
slow = a[slow]
, whereas fast is moving two steps at a time withfast = a[a[fast]]
- When
slow == fast
, we are in a cycle
Is the algorithm completed? Not yet. The entry point of this cycle will be the duplicate. We have to reset slow
and move both pointers step by step until they are equals again.
A possible implementation in Java:
This solution is O(n) space and O(1) time.