1794. Count Pairs of Equal Substrings With Minimum Difference

MediumGreedyHash TableString

Problem Description

In this problem, we are given two strings, firstString and secondString, and our task is to count the number of unique quadruples of indices (i,j,a,b) that meet certain conditions. Each quadruple represents a matching substring between firstString and secondString with the indices fulfilling the following:

0 <= i <= j < firstString.length, meaning that i and j are valid indices within the firstString, and i can be equal to j to allow for single-character substrings.
0 <= a <= b < secondString.length, indicating that a and b are valid indices within the secondString, with the same provision for a to equal b.
The substring from i to j (inclusive) in firstString is exactly the same as the substring from a to b in secondString.
j - a should be the smallest possible difference for all such valid quadruples. This implies that we're interested in matching substrings that are closest to the start of firstString.

The goal is to return the count of such quadruples.

Intuition

To find a solution, we need to identify all possible matching substrings between firstString and secondString, but with an added twist of minimizing the j - a difference. This suggests that a brute force method of checking every possible substring between the two strings would be inefficient, particularly with large strings. Instead, we should find a way to both confirm when substrings match and also ensure we're doing so in a way that minimizes j - a.

A key observation is that for any matching substring, the last character of the substring in secondString must be the closest it can possibly be to the start of secondString compared to its index in firstString. So, for every character in firstString, we need to find the last occurrence of that character in secondString and track the minimum j - a value.

The intuition behind the solution is to keep track of the closest last occurrence of each character from secondString as we iterate through firstString. We maintain a dictionary, last, that maps each character to its last occurrence in secondString. As we go through each character c in firstString, we calculate the difference between the current index in firstString and the index of the last occurrence of c in secondString. If the current difference is less than the minimum difference found so far (mi), we update mi and reset the count of valid quadruples to 1, as this is the new minimum difference. If we encounter the same minimum difference again, we increment our count.

By tracking the minimum difference as we iterate through firstString and updating when necessary, we are able to tally the number of quadruples efficiently and ensure that the condition j - a is minimized.

Learn more about Greedy patterns.

Solution Approach

The solution uses a dictionary to store the mapping of characters to their last occurrence in secondString. This is crucial because the conditions specify that we want to match substrings and minimize j - a, which inherently requires us to know the last (closest to the end) index of a character.

The steps of the implementation are as follows:

We first initialize the dictionary last to map each character c in secondString to the index i of its last occurrence, as given by {c: i for i, c in enumerate(secondString)}. This operation happens once and is an O(n) operation where n is the length of secondString.
We set ans to 0, which will keep the count of the number of valid quadruples, and mi to inf (infinity), which is a placeholder for the minimum value of j - a we have seen so far.
We then iterate through each character c and its index i in firstString, checking if c is present in last. If it is not, then there's no matching substring ending with that character, and we move on.
When we do find a matching character, we calculate the difference t = i - last[c], which represents j - a.
We then compare t with mi. If t is smaller, it means we have found a closer match in terms of the indices, and we update mi to t and reset ans to 1 because we've found a new minimum. If t is equal to mi, it means we've found another quadruple with the same minimum difference, so we increment ans by 1.
Finally, we return the value of ans, which is the total count of such quadruples after iterating through the entire firstString.

This approach is efficient as it only requires O(n + m) time complexity, where n is the length of firstString and m is the length of secondString, because we go through each string only once. It avoids the much less efficient O(nmmin(n,m)) time complexity that would be required if we were to naively look for matching substrings with a nested loop and substring comparisons.

A key algorithmic principle at play here is the use of a "greedy" strategy that aims to always choose the closest last occurrence of a character when considering matching substrings, ensuring the smallest possible j - a as required by the problem.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's illustrate the solution approach with a small example:

Suppose we have firstString = "abac" and secondString = "cabba". We want to find the number of unique quadruples where the substrings match and have the smallest j - a difference.

We start by creating the last dictionary mapping each character in secondString to its last occurrence: {'c': 0, 'a': 4, 'b': 3}.
We set ans to 0 and mi (minimum difference) to infinity.
Iterate through each character in firstString:
- For i = 0 with c = 'a', last['a'] is 4, so t = 0 - 4 = -4. Since t is less than mi, we set mi to -4 and ans to 1.
- For i = 1 with c = 'b', there is no 'b' in last, so we move on.
- For i = 2 with c = 'a', last['a'] is still 4, so t = 2 - 4 = -2. t is greater than mi, so we ignore this.
- For i = 3 with c = 'c', last['c'] is 0, so t = 3 - 0 = 3. t is greater than mi, so again, we ignore this.
After iteration, we find that ans is 1, which indicates there is only one quadruple satisfying the conditions: (0, 0, 4, 4), representing the substring 'a' from both strings.

Thus, the function would return 1 as the count of such quadruples.

By following this approach, we efficiently calculated the result without having to compare each possible substring pair, saving computation time and adhering to the problem's constraints.

Solution Implementation

1class Solution:
2    def countQuadruples(self, first_string: str, second_string: str) -> int:
3        # Create a dictionary to map each character in the second string to the index of its last occurrence.
4        last_occurrence_index = {char: index for index, char in enumerate(second_string)}
5      
6        # Initialize answer to 0, and minimum difference as infinity (a large number).
7        answer = 0
8        minimum_difference = float('inf')  # 'inf' represents infinity in Python.
9      
10        # Loop over each character and its index in the first string.
11        for index, char in enumerate(first_string):
12            # Check if the current character exists in the second string.
13            if char in last_occurrence_index:
14                # Calculate the difference between the index of the character in
15                # the first string and the last occurrence index in the second string.
16                difference = index - last_occurrence_index[char]
17              
18                # If the current difference is less than the minimum difference recorded:
19                if minimum_difference > difference:
20                    # Update minimum difference and reset answer to 1.
21                    minimum_difference = difference
22                    answer = 1
23                # If the current difference matches the minimum difference:
24                elif minimum_difference == difference:
25                    # Increment the answer by 1.
26                    answer += 1
27      
28        # Return the final count of quadruples.
29        return answer
30

1class Solution {
2    public int countQuadruples(String firstString, String secondString) {
3        // Initialize an array to store the last occurrence index of each character
4        int[] lastOccurrence = new int[26];
5        // Traverse through the second string to fill the last occurrence array
6        for (int i = 0; i < secondString.length(); ++i) {
7            lastOccurrence[secondString.charAt(i) - 'a'] = i + 1;
8        }
9      
10        int count = 0; // Counter for the number of quadruples
11        int minDifference = Integer.MAX_VALUE; // Variable to track the minimum difference
12      
13        // Traverse through the first string to find valid quadruples
14        for (int i = 0; i < firstString.length(); ++i) {
15            // Get the last occurrence of the current character from the first string in the second string
16            int j = lastOccurrence[firstString.charAt(i) - 'a'];
17            // Ensure that the character also occurs in the second string
18            if (j > 0) {
19                int difference = i - j; // Calculate the difference
20                // If the difference is less than the current minimum, update the minimum and reset the count
21                if (minDifference > difference) {
22                    minDifference = difference;
23                    count = 1;
24                } else if (minDifference == difference) {
25                    // If the difference is the same as the current minimum, increment the count
26                    ++count;
27                }
28            }
29        }
30        // Return the count of quadruples
31        return count;
32    }
33}
34

1#include <string>
2#include <vector>
3#include <climits>
4
5class Solution {
6public:
7    // Count the number of quadruples where the last occurrence of a character in the first string
8    // comes before the last occurrence of the same character in the second string
9    int countQuadruples(string firstString, string secondString) {
10        // Initialize an array to store the last occurrence indices for each character in the second string
11        // Note that the array is initialized with zeros (0) which signifies that the character hasn't been found yet.
12        int lastOccurrence[26] = {};
13
14        // Find the last occurrence of each character in the second string
15        for (int i = 0; i < secondString.size(); ++i) {
16            // Using character 'a' as base index 0 for 'a' to 'z' as 0 to 25
17            lastOccurrence[secondString[i] - 'a'] = i + 1; // Store index + 1 to differentiate between found and not found.
18        }
19
20        int count = 0; // Will hold the final count of quadruples
21        int minDifference = INT_MAX; // Start with the maximum value as the difference to find the minimum
22
23        // Iterate through the first string to find the quadruples
24        for (int i = 0; i < firstString.size(); ++i) {
25            int currentIndex = lastOccurrence[firstString[i] - 'a']; // Get the last occurrence index from the second string
26
27            // If the character is present in the second string
28            if (currentIndex) {
29                int difference = i - (currentIndex - 1); // Calculate the difference i - j
30
31                // If a new minimum difference is found, update 'minDifference' and reset count to 1
32                if (minDifference > difference) {
33                    minDifference = difference;
34                    count = 1;
35                } 
36                // If the same minimum difference is found again, increment count
37                else if (minDifference == difference) {
38                    ++count;
39                }
40            }
41        }
42
43        // Return the total count of quadruples found
44        return count;
45    }
46};
47

1// The function takes two strings and counts the number of quadruples where the last occurrence of a
2// character in the first string comes before the last occurrence of the same character in the second string.
3function countQuadruples(firstString: string, secondString: string): number {
4    // An array to keep track of the last occurrence indices for each letter in the second string.
5    // A value of 0 indicates the character has not been found yet.
6    let lastOccurrence: number[] = new Array(26).fill(0);
7
8    // Find the last occurrence index of each character in the second string.
9    for (let i = 0; i < secondString.length; i++) {
10        // Convert the character to an index from 0 to 25 (for 'a' to 'z').
11        let charIndex = secondString.charCodeAt(i) - 'a'.charCodeAt(0);
12        // Store the last occurrence index of the current character + 1 (to differentiate between found and not found).
13        lastOccurrence[charIndex] = i + 1;
14    }
15
16    let count = 0; // Used to keep track of the number of quadruples found.
17    let minDifference = Number.MAX_SAFE_INTEGER; // Initialize with the largest safe integer value as a starting point.
18
19    // Iterate through each character in the first string to find quadruples.
20    for (let i = 0; i < firstString.length; i++) {
21        // Get the index of the last occurrence of the current character from the second string.
22        let charIndex = firstString.charCodeAt(i) - 'a'.charCodeAt(0);
23        let currentIndex = lastOccurrence[charIndex];
24
25        // Check if the current character is present in the second string.
26        if (currentIndex) {
27            // Calculate the difference between indices (i - j).
28            let difference = i - (currentIndex - 1);
29
30            // Update 'minDifference' and reset 'count' if a new minimum difference is found.
31            if (minDifference > difference) {
32                minDifference = difference;
33                count = 1;
34            }
35            // If the same minimum difference is found, increment the 'count'.
36            else if (minDifference === difference) {
37                count++;
38            }
39        }
40    }
41
42    // Return the final count of quadruples found.
43    return count;
44}
45

Time and Space Complexity

The provided Python code defined within the Solution class is designed to find the number of times a minimal difference between indices of matching characters from firstString and secondString occurs. The time complexity and space complexity of this code are as follows:

Time Complexity

The time complexity of this code can be split into two major components:

Building the last dictionary, where the last occurrence of each character from secondString is stored with the character as the key and its index as the value. In the worst case, enumerate(secondString) will iterate through all characters of secondString, which takes O(n), where n refers to the length of secondString.
Iterating over firstString to calculate the differences and counting the minimal differences. This is another loop that goes through all the characters in firstString, which, in the worst case, results in O(m), where m refers to the length of firstString.

Since these operations are sequential, the overall time complexity is the sum of the two individual complexities: O(n) + O(m). Since these two strings are independent, simplifying the expression does not combine the terms, and the final time complexity is O(m + n).

Space Complexity

The space complexity is determined by the additional space required by the algorithm which is not part of the input or the output. In this case, it's:

The space used by the last dictionary which, in the worst case, contains an entry for every unique character in secondString, and since there is a fixed limit to the character set (in the case of ASCII, a maximum of 128 characters), the space taken by last could be considered O(1).
The two variables ans and mi occupy constant space O(1).

Therefore, the space complexity is the larger of the space used by the last dictionary and the space for the variables ans and mi, which results in O(1) space complexity.

Overall, the provided code has a time complexity of O(m + n) and a space complexity of O(1).

Learn more about how to find time and space complexity quickly using problem constraints.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Question 1 out of 10

Which algorithm should you use to find a node that is close to the root of the tree?

Depth First Search

Breadth First Search

I don’t know