287. Find the Duplicate Number
Problem Description
In this problem, we're given an array called nums
which contains n + 1
integers, where every integer is within the range of 1
to n
, both inclusive. There's an important constraint in the problem: there is exactly one number in the array nums
that appears more than once, and our task is to find that one repeated number. The challenge is to solve this problem under the following two conditions: we are not permitted to alter the original array, and we have to solve it using only a constant amount of space, which eliminates the possibility of using additional data structures that can grow with the size of the input.
Intuition
We can approach this problem with a binary search technique despite the fact that the array is not sorted. This might seem counterintuitive at first because binary search usually requires a sorted array. However, the key insight here is to use binary search not on the elements of the array itself, but rather on the range of numbers between 1
to n
.
The intuition is based on the Pigeonhole Principle which states that if you have more pigeons than pigeonholes, at least one pigeonhole must contain more than one pigeon. In this context, if there are more numbers in the array than the range it covers (n
numbers in the range 1
to n
), one of the numbers must be a duplicate.
We start by considering the entire range of numbers from 1
to n
. Then, we use binary search to split this range into two halves: the first half (from 1
to mid
) and the second half (from mid + 1
to n
). The helper function, f
, calculates how many numbers in the array are less than or equal to a given middle value, x
. If the count is greater than x
, we know the duplicate number must be in the first half; otherwise, it's in the second half.
By repeatedly halving the search space, we can eventually narrow down the range to a single number, which is the duplicate we're looking for. The bisect_left
function in Python assists us in performing this binary search, and the key f
, is passed to determine whether we should go left or right in our search.
Learn more about Two Pointers and Binary Search patterns.
Solution Approach
The solution approach for finding the duplicate number in the array leverages binary search, which is an efficient algorithm for finding an item from a sorted list by repeatedly dividing the search interval in half. Although the array itself is not sorted, we use binary search on the range of possible numbers (1
to n
) to find the duplicate.
Let's walk through the implemented solution step by step:
-
Define the Helper Function
f(x)
: This function takes an integerx
and returnsTrue
if the number of elements innums
less than or equal tox
is greater thanx
itself. Otherwise, it returnsFalse
. This function is essential because it determines whether the duplicate number lies in the lower half (1
tox
) or the upper half (x + 1
ton
) of the current search interval. -
Binary Search with
bisect_left
: We use Python'sbisect_left
function from thebisect
module to apply binary search. Thebisect_left
function takes three arguments:- The range to perform the search on, which is
range(len(nums))
. Note that this range goes from0
ton
inclusive since the arraynums
hasn + 1
elements. - A boolean value that we are trying to find in the hypothetical sorted array of booleans,
True
in this case since we are looking for the point wheref(x)
transitions fromFalse
toTrue
. - The
key
function, which in our solution is the helper functionf
. This function is applied to the middle value in the current search interval to guide the binary search process.
- The range to perform the search on, which is
-
Finding the Duplicate Number: The binary search proceeds by checking the middle of the current interval. If
f(mid)
isTrue
, it means there are more numbers innums
that are less than or equal tomid
than there should be, indicating that the duplicate number must be less than or equal tomid
. Iff(mid)
isFalse
, it means that the duplicate is greater thanmid
and we shift our search to the upper half. This process continues until the algorithm converges on the duplicate number, which will be the point at whichf(x)
changes fromFalse
toTrue
.
By repeatedly narrowing the search interval, we eventually find the duplicate number with O(log n)
search iterations, with each iteration involving an O(n)
operation to calculate the sum within the helper function. Overall, the solution thus takes O(n log n)
time with constant space complexity, as we do not use any additional data structures that are dependent on the size of the input.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider an example to understand the solution approach. Imagine we have an array nums
with the size of n + 1
, and it looks like this: [1, 3, 4, 2, 2]
. The n
in this case is 4
since the range of numbers is from 1
to 4
.
Here's how we would walk through the problem step by step:
-
Define the Helper Function
f(x)
:- We need to implement a function
f(x)
that returnsTrue
if the count of numbers innums
that are less than or equal tox
is greater thanx
. For example,f(3)
would count how many numbers innums
are<= 3
. In our example,f(3)
would returnTrue
because there are 4 numbers that fit the condition (1, 2, 2, and 3), which is greater than 3.
- We need to implement a function
-
Binary Search with
bisect_left
:- Now we initiate a binary search on the range of numbers from
1
ton
(1
to4
in our example). - The
bisect_left
function will effectively split this range and use our functionf
to decide whether to look in the lower half or the upper half. - In the first iteration, the middle value
mid
between1
and4
is2
. We calculatef(2)
, which isFalse
since there are only 2 numbers innums
that are less than or equal to2
, which is not greater than2
. This tells us that the duplicate must be larger than2
. - In the next iteration, the middle value between
3
and4
is3
. We calculatef(3)
and, as stated earlier, it returnsTrue
. This tells us to search in the lower half, but since only3
and4
is left, we have narrowed down3
as the potential duplicate.
- Now we initiate a binary search on the range of numbers from
-
Finding the Duplicate Number:
- Once we have the bounds narrowed down to a single number where
f(x)
transitions fromFalse
toTrue
, or vice versa, we know we have found the duplicate number. In our example, whenf(3)
returnsTrue
andf(2)
returnedFalse
, we know that3
is the value where the transition happens, thus3
is the duplicate number.
- Once we have the bounds narrowed down to a single number where
In conclusion, even though the number array is not sorted, we used the properties of binary search on the range 1
to n
to efficiently find the duplicate number. Since f(x)
only involves counting elements, and we only needed a range to apply bisect_left
, we maintained constant space usage as per the problem's constraints.
Solution Implementation
1from typing import List
2from bisect import bisect_left
3
4class Solution:
5 def findDuplicate(self, nums: List[int]) -> int:
6 # Define a helper function that will check if the count of numbers less than
7 # or equal to x is greater than x itself.
8 def is_duplicate_above_x(x: int) -> bool:
9 # Count the numbers less than or equal to x
10 count = sum(num <= x for num in nums)
11 # If the count is greater than x, we might have a duplicate above x
12 return count > x
13
14 # Use binary search (implemented as bisect_left) to find the duplicate.
15 # The search range is from 1 to len(nums) - 1 as len(nums) could be the maximum number possible
16 # since there is exactly one duplicate.
17 duplicate_number = bisect_left(range(1, len(nums)), True, key=is_duplicate_above_x)
18
19 return duplicate_number
20
1class Solution {
2 public int findDuplicate(int[] nums) {
3 // Initializing the low and high pointers for binary search.
4 int low = 0;
5 int high = nums.length - 1;
6
7 // Binary search to find the duplicate number.
8 while (low < high) {
9 // Calculating the middle index.
10 int middle = (low + high) / 2; // same as (low + high) >> 1 but clearer to understand
11 int count = 0; // Counter for the number of elements less than or equal to middle.
12
13 // Iterate over the array and count elements less than or equal to middle.
14 for (int value : nums) {
15 if (value <= middle) {
16 count++;
17 }
18 }
19
20 // Determine if the duplicate is in the lower half or upper half.
21 // If the count is greater than middle, the duplicate is in the lower half.
22 if (count > middle) {
23 high = middle; // Narrow the search to the lower half.
24 } else {
25 low = middle + 1; // Narrow the search to the upper half.
26 }
27 }
28
29 // When low == high, we have found the duplicate number.
30 return low;
31 }
32}
33
1#include <vector>
2
3class Solution {
4public:
5 int findDuplicate(vector<int>& nums) {
6 // Initialize the search range
7 int left = 0;
8 int right = nums.size() - 1;
9
10 // Use binary search to find the duplicate
11 while (left < right) {
12 // Find the midpoint of the current search range
13 int mid = left + (right - left) / 2;
14
15 // Count how many numbers are less than or equal to 'mid'
16 int count = 0;
17 for (int num : nums) {
18 if (num <= mid) {
19 count++;
20 }
21 }
22
23 // If the count is more than 'mid', then the duplicate is in the left half
24 if (count > mid) {
25 right = mid; // Search in the left half
26 } else {
27 left = mid + 1; // Search in the right half
28 }
29 }
30
31 // 'left' is the duplicate number
32 return left;
33 }
34};
35
1function findDuplicate(nums: number[]): number {
2 // Define the search range start and end, initially set to 1 and the number of elements - 1
3 let left = 1;
4 let right = nums.length - 1;
5
6 while (left < right) {
7 // Calculate the midpoint of the current search range
8 const mid = Math.floor((left + right) / 2);
9 let count = 0;
10
11 // Count how many numbers in the array are less than or equal to the midpoint
12 for (const value of nums) {
13 if (value <= mid) {
14 count++;
15 }
16 }
17
18 // If the count is greater than the midpoint, this indicates that the duplicate
19 // is within the range [left, mid], so we focus the search there.
20 // Otherwise, the duplicate is in the range [mid + 1, right].
21 if (count > mid) {
22 right = mid; // Narrow the search to the left half
23 } else {
24 left = mid + 1; // Narrow the search to the right half
25 }
26 }
27
28 // Once left meets right, we've found the duplicate number
29 return left;
30}
31
Time and Space Complexity
The time complexity of the given code snippet is O(n * log n)
. This is because the bisect_left
function performs binary search, which has a time complexity of O(log n)
, and it calls the f
function on each step of the binary search. The f
function has a time complexity of O(n)
since it iterates over all n
elements in the nums
list to calculate the sum of all elements less than or equal to x
. Since the binary search is performed in the range of len(nums)
, which is n
, the f
function is called O(log n)
times, resulting in an overall time complexity of O(n * log n)
.
The space complexity of the code is O(1)
. The code uses a constant amount of additional space: the f
function and the binary search do not use any extra space that grows with the input size. Therefore, regardless of the size of the input list nums
, the additional space required by the algorithm remains constant.
Learn more about how to find time and space complexity quickly using problem constraints.
Consider the classic dynamic programming of longest increasing subsequence:
Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.
For example, the length of LIS for [50, 3, 10, 7, 40, 80]
is 4
and LIS is
[3, 7, 40, 80]
.
What is the recurrence relation?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!