1348. Tweet Counts Per Frequency

MediumDesignHash TableBinary SearchOrdered SetSorting
Leetcode Link

Problem Description

The problem requires us to design a class TweetCounts that can track the number of tweets posted on a social media platform and allow querying the number of tweets that occur in specified periods of time. These periods are partitioned into smaller time chunks that are either every minute, every hour, or every day.

The main operations that we need to support are:

  • recordTweet(tweetName, time): This method records a tweet at a given time for a particular tweet name. The assumed time unit is seconds.

  • getTweetCountsPerFrequency(freq, tweetName, startTime, endTime): This method retrieves the number of tweets for a given tweetName within a specified time period [startTime, endTime]. The result will be a list where each element corresponds to the number of tweets in a time chunk of the size specified by freq. The freq parameter can be "minute", "hour", or "day", which corresponds to time chunks of 60 seconds, 3600 seconds, and 86400 seconds, respectively.

The problem also specifies that if the last chunk of time does not fit perfectly into the frequency's chunk size, it should end with the end time of the period, and it may be shorter.

The goal is to implement the functionality of this class such that the queries can be answered efficiently.

Intuition

For the recordTweet method, we require an efficient means to store and access timestamps of tweets given their names. A defaultdict of SortedList from the sortedcontainers module is utilized for this purpose because it allows us to maintain a sorted list of tweet timestamps for each tweet name, which is optimal for the range queries we need to perform later.

The getTweetCountsPerFrequency method is where the core of our task lies. Since we have the timestamps sorted, we can quickly find the number of tweets in a specific time range using binary search. The SortedList class has a bisect_left method which can find the index of the first timestamp greater than or equal to a given start time of the chunk quickly, and similarly for the end time of the chunk. By using these indices, we easily compute the number of tweets within this time chunk.

For each time chunk in the period [startTime, endTime], determined by the frequency freq, we find the number of tweets in that chunk by doing a binary search for the start and end of the chunk. The difference between the indices gives us the count of tweets within that chunk. This step is continued until we cover the entire period from startTime to endTime.

As chunks are usually fixed except for the last one, we must handle the case where the last chunk may not align perfectly with the time frequency. This is accomplished by taking the minimum of the start time plus the frequency and endTime + 1 for the binary search.

This approach leverages the fact that binary search on a sorted list takes O(log n) time, making the overall process efficient for the range queries we need to perform.

Learn more about Binary Search and Sorting patterns.

Solution Approach

The implementation uses two key data structures: a dictionary (defaultdict from collections) paired with a sorted list (SortedList from sortedcontainers). Here's how each component plays a role in the solution:

Data Structures Used

  • defaultdict(SortedList): The defaultdict is initialized with SortedList, which means that every key in this dictionary will have a SortedList as its value by default. This is a sophisticated choice because it allows us to automatically handle the insertion of timestamps for each tweetName without having to check if the key already exists. A SortedList maintains the elements in sorted order, enabling binary search operations out of the box.

  • self.d: This is a simple dictionary that translates the frequency from its string representation ("minute", "hour", "day") into the number of seconds (60, 3600, 86400), easing the calculations for our chunks.

Algorithms and Patterns

recordTweet Method

  • Insertion in Sorted Order: The recordTweet method simply inserts the timestamp into the appropriate SortedList for the given tweetName. The SortedList takes care of maintaining the order upon insertion. The insertion time is logarithmic relative to the number of elements already in the list because the underlying structure of SortedList allows for efficient insertion.

getTweetCountsPerFrequency Method

  • Binary Search: To count tweets in each time chunk, binary search is utilized. We use self.data[tweetName].bisect_left(t) to find the left boundary (inclusive) of the current time chunk and self.data[tweetName].bisect_left(min(t + f, endTime + 1)) to find the right boundary (exclusive) of the chunk. The bisect_left function of SortedList is a binary search that returns the index where the given element (time in this case) would be inserted in the list. Since the list is in sorted order, the indices between these two boundaries contain all timestamps within our time chunk.

  • Iteration over Time Chunks: The method uses a while loop to traverse the entire time range [startTime, endTime] in increments of the frequency's chunk size f. During each iteration, it calculates the counts using binary search as described and appends it to the answer list ans.

  • Handling End of Time Range: For the binary search of the right boundary, we must ensure that we don't go beyond endTime. Hence min(t + f, endTime + 1) is used to get the exclusive end of the time chunk. The +1 ensures that if a tweet occurred precisely at endTime, it will be included in the last chunk.

The final output is the list of tweet counts per frequency chunk which is obtained by executing the above algorithm for each frequency chunk in our specified time range.

This approach is efficient provided that the amount of tweets for a given name does not grow too large, as it relies on binary search which has a logarithmic time complexity. Since the requirements of the problem guarantee that there will be at most 10^4 calls in total to both recordTweet and getTweetCountsPerFrequency, the solution is well suited for the problem's constraints.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's illustrate the solution approach by walking through a small example with a hypothetical TweetCounts object.

  1. Initialization

    We first create an instance of the TweetCounts class:

    tweet_counts = TweetCounts()

    At this point, our data structure is empty, waiting for tweets to be recorded.

  2. Recording Tweets

    Let's record some tweets:

    tweet_counts.recordTweet("tweet1", 15)    # A tweet named "tweet1" at time=15 seconds
    tweet_counts.recordTweet("tweet1", 60)    # Another "tweet1" at time=60 seconds
    tweet_counts.recordTweet("tweet1", 300)   # Another "tweet1" at time=300 seconds (5 minutes)

    The defaultdict(SortedList) data structure in our TweetCounts object now looks like this:

    {
      "tweet1": SortedList([15, 60, 300])
    }

    Each time a tweet is recorded, it's added to the SortedList corresponding to its name. The list always remains sorted.

  3. Querying Tweet Counts

    Now, let's say we want to get the count of "tweet1" every minute from time=0 to time=10 minutes (600 seconds):

    counts = tweet_counts.getTweetCountsPerFrequency("minute", "tweet1", 0, 600)

    To process this, the program will:

    • Convert "minute" into its seconds representation using self.d. Here, 1 minute equals 60 seconds.
    • It starts at startTime=0 and will run while startTime <= endTime.

    Processing the first chunk (0-60 seconds):

    • Perform a binary search to find the number of tweets between time=0 and time=60.
    • The bisect_left method finds the index for time=0 and time=60 in the SortedList, yielding 0 and 2 because timestamps 15 and 60 fall within this range. Thus, there are 2 - 0 = 2 tweets in this time chunk.

    Processing the second chunk (60-120 seconds):

    • Since there's no new tweet until time=300, the binary searches for these middle chunks will reveal 0 tweets each.

    Processing subsequent chunks until (300-360 seconds):

    • When processing the chunk with time=300, the binary search will find indices for 300 and 360, yielding indices 2 and 3 and telling us there's 1 tweet within this time chunk.

    Continuing this process for each minute up to 10 minutes will give us our final counts list.

    The final result is:

    counts = [2, 0, 0, 0, 0, 1, 0, 0, 0, 0] # Counts of tweets for every minute chunk up to 10 minutes
  4. Handling the End Time Range

    If the endTime were at 250 seconds and we still query per minute, we would only need the counts up until the time chunk that includes 250 seconds:

    counts = tweet_counts.getTweetCountsPerFrequency("minute", "tweet1", 0, 250)

    This would give us a shorter list, stopping at the chunks that include 250 seconds:

    counts = [2, 0, 0, 0, 1] # Counts of tweets for every minute chunk until 4 minutes and 10 seconds

    The count for the last chunk (240-250 seconds) is again found by binary searching the indices for 240 and 250 in the SortedList.

By following the prescribed steps, the TweetCounts class effectively tracks and counts tweets over various time frequencies. The implementation assures that queries regarding tweet counts over different time chunks are answered efficiently.

Solution Implementation

1from sortedcontainers import SortedList
2from collections import defaultdict
3from typing import List
4
5class TweetCounts:
6    def __init__(self):
7        # A dictionary to map the frequency names to their respective seconds
8        self.frequency_to_seconds = {"minute": 60, "hour": 3600, "day": 86400}
9        # Defaultdict to store the sorted list of tweet timestamps for each tweet name
10        self.tweets_data = defaultdict(SortedList)
11
12    def record_tweet(self, tweet_name: str, time: int) -> None:
13        """
14        Record a tweet at a given time.
15
16        :param tweet_name: Name of the tweet
17        :param time: The timestamp at which the tweet was posted
18        """
19        # Add the time to the sorted list for the specified tweet name
20        self.tweets_data[tweet_name].add(time)
21
22    def get_tweet_counts_per_frequency(
23        self, freq: str, tweet_name: str, start_time: int, end_time: int
24    ) -> List[int]:
25        """
26        Retrieve the count of tweets in each frequency interval.
27
28        :param freq: Frequency interval ('minute', 'hour', 'day')
29        :param tweet_name: The name of the tweet for which to retrieve counts
30        :param start_time: Start timestamp for the range
31        :param end_time: End timestamp for the range
32        :return: A list containing the number of tweets in each interval
33        """
34        # Convert frequency to seconds using the pre-defined dictionary
35        frequency_in_seconds = self.frequency_to_seconds[freq]
36        # Retrieve the list of tweet timestamps for the given tweet name
37        tweets = self.tweets_data[tweet_name]
38        current_time = start_time
39        counts = []
40
41        # Loop over each interval
42        while current_time <= end_time:
43            # Find the starting index for this interval
44            left_index = tweets.bisect_left(current_time)
45            # Find the ending index for this interval, ensuring it does not exceed `end_time`
46            right_index = tweets.bisect_left(min(current_time + frequency_in_seconds, end_time + 1))
47            # Append the count of tweets in the current interval to the results list
48            counts.append(right_index - left_index)
49            # Move to the next interval
50            current_time += frequency_in_seconds
51
52        return counts
53
54# Demonstrate how the TweetCounts class could be instantiated and used
55# obj = TweetCounts()
56# obj.record_tweet(tweet_name, time)
57# tweet_counts = obj.get_tweet_counts_per_frequency(freq, tweet_name, start_time, end_time)
58
1import java.util.ArrayList;
2import java.util.HashMap;
3import java.util.List;
4import java.util.Map;
5import java.util.TreeMap;
6
7/**
8 * The TweetCounts class tracks the number of tweets that occur and supports
9 * querying the number of tweets in a specified time range and frequency.
10 */
11class TweetCounts {
12    // Map storing tweet names as keys and another TreeMap as values,
13    // which maps a timestamp to the number of tweets at that time.
14    private Map<String, TreeMap<Integer, Integer>> tweetData = new HashMap<>();
15
16    /**
17     * Constructor for TweetCounts.
18     */
19    public TweetCounts() {
20        // No initialization needed since tweetData is already initialized.
21    }
22
23    /**
24     * Record a tweet at a given time.
25     *
26     * @param tweetName the name of the tweet
27     * @param time      the timestamp the tweet occurred
28     */
29    public void recordTweet(String tweetName, int time) {
30        tweetData.putIfAbsent(tweetName, new TreeMap<>());
31        TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
32        timeMap.put(time, timeMap.getOrDefault(time, 0) + 1);
33    }
34
35    /**
36     * Retrieves the count of tweets per frequency in a time range.
37     *
38     * @param frequency  the frequency ("minute", "hour", or "day")
39     * @param tweetName  the name of the tweet
40     * @param startTime  the start of the time range (inclusive)
41     * @param endTime    the end of the time range (inclusive)
42     * @return a list of tweet counts per frequency
43     */
44    public List<Integer> getTweetCountsPerFrequency(
45            String frequency, String tweetName, int startTime, int endTime) {
46        // Convert frequency to seconds
47        int intervalDurationInSeconds = 60; // Default to minute
48        if ("hour".equals(frequency)) {
49            intervalDurationInSeconds = 3600; // 60 minutes * 60 seconds
50        } else if ("day".equals(frequency)) {
51            intervalDurationInSeconds = 86400; // 24 hours * 60 minutes * 60 seconds
52        }
53
54        TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
55        List<Integer> counts = new ArrayList<>(); // Holds the final counts
56      
57        // Iterate through time ranges and count tweets
58        for (int i = startTime; i <= endTime; i += intervalDurationInSeconds) {
59            int sum = 0;
60            // Calculate end time for this interval, ensuring we don't exceed endTime
61            int intervalEnd = Math.min(i + intervalDurationInSeconds, endTime + 1);
62            // Sum tweet counts within the current interval
63            for (int count : timeMap.subMap(i, intervalEnd).values()) {
64                sum += count;
65            }
66            counts.add(sum);
67        }
68        return counts;
69    }
70}
71
72/**
73 * The following code showcases how the TweetCounts class may be used.
74 * It is not part of the TweetCounts class itself.
75 */
76/*
77TweetCounts tweetCounter = new TweetCounts();
78tweetCounter.recordTweet("tweet1", 10);
79List<Integer> counts = tweetCounter.getTweetCountsPerFrequency("minute", "tweet1", 0, 59);
80*/
81
1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <set>
5using namespace std;
6
7// Define the TweetCounts class to store and fetch tweet counts per frequency
8class TweetCounts {
9public:
10    // Constructor
11    TweetCounts() {
12        // No initialization needed for the unordered_map here
13    }
14
15    // Record the time at which the tweet with given name occurs
16    void recordTweet(const string& tweetName, int time) {
17        tweets[tweetName].insert(time);
18    }
19
20    // Fetch the tweet count per specified frequency for a tweet with given name in a time range
21    vector<int> getTweetCountsPerFrequency(const string& freq, const string& tweetName, int startTime, int endTime) {
22        // Frequency in seconds: Defaults to a minute
23        int freqInSeconds = 60;
24        if (freq == "hour")
25            freqInSeconds = 3600; // An hour in seconds
26        else if (freq == "day")
27            freqInSeconds = 86400; // A day in seconds
28
29        // Calculate the number of intervals in the specified time range
30        int intervalCount = (endTime - startTime) / freqInSeconds + 1;
31
32        // Prepare the answer vector with the interval count, pre-filled with zeroes
33        vector<int> ans(intervalCount, 0);
34
35        // Fetch all tweet times for the specified tweetName and within the given time range
36        auto itLow = tweets[tweetName].lower_bound(startTime); // First not less than startTime
37        auto itUp = tweets[tweetName].upper_bound(endTime); // First greater than endTime
38
39        // Count the tweets in each interval
40        for (auto it = itLow; it != itUp; ++it) {
41            // Determine the interval and increment the corresponding counter
42            int intervalIndex = (*it - startTime) / freqInSeconds;
43            ans[intervalIndex]++;
44        }
45
46        return ans;
47    }
48
49private:
50    // Data structure to hold tweet times for each tweet name
51    unordered_map<string, multiset<int>> tweets;
52};
53
54/**
55 * Your TweetCounts object will be instantiated and called as such:
56 * TweetCounts* obj = new TweetCounts();
57 * obj->recordTweet(tweetName, time);
58 * vector<int> param_2 = obj->getTweetCountsPerFrequency(freq, tweetName, startTime, endTime);
59 */
60
1type TweetName = string;
2type Time = number;
3type Frequency = "minute" | "hour" | "day";
4
5// Store tweet times for each tweet name
6let tweets = new Map<TweetName, Set<Time>>();
7
8// Record the time at which the tweet with given name occurs
9function recordTweet(tweetName: TweetName, time: Time): void {
10    if (!tweets.has(tweetName)) {
11        tweets.set(tweetName, new Set<Time>());
12    }
13    tweets.get(tweetName)!.add(time);
14}
15
16// Fetch the tweet count per specified frequency for a tweet with given name in a time range
17function getTweetCountsPerFrequency(freq: Frequency, tweetName: TweetName, startTime: Time, endTime: Time): number[] {
18    // Convert frequency to seconds
19    const freqInSeconds = convertFrequencyToSeconds(freq);
20
21    // Calculate the number of intervals
22    const intervalCount = Math.floor((endTime - startTime) / freqInSeconds) + 1;
23
24    // Prepare the answer array with the interval count, pre-filled with zeroes
25    const ans: number[] = new Array(intervalCount).fill(0);
26
27    // Handle the case where there are no tweets recorded with the given name
28    if (!tweets.has(tweetName)) {
29        return ans;
30    }
31
32    // Fetch all tweet times for the specified tweetName and within the given time range
33    const allTweetTimes = Array.from(tweets.get(tweetName)!);
34    const relevantTweetTimes = allTweetTimes.filter(time => time >= startTime && time <= endTime);
35
36    // Count the tweets in each interval
37    relevantTweetTimes.forEach(time => {
38        const intervalIndex = Math.floor((time - startTime) / freqInSeconds);
39        ans[intervalIndex]++;
40    });
41
42    return ans;
43}
44
45// Helper function to convert frequency string to seconds
46function convertFrequencyToSeconds(freq: Frequency): number {
47    switch (freq) {
48        case "minute": return 60;
49        case "hour": return 3600;
50        case "day": return 86400;
51        default: return 60;
52    }
53}
54
55// Example of usage:
56// recordTweet("tweet1", 10);
57// recordTweet("tweet1", 20);
58// getTweetCountsPerFrequency("minute", "tweet1", 0, 59); // Should return [2]
59

Time and Space Complexity

Time Complexity:

__init__ Method:

  • Time Complexity is O(1) for initializing the dictionary and other variables.

recordTweet Method:

  • Inserting an element into a SortedList costs O(log n) time, where n is the number of elements currently in the list for the respective tweetName. This is due to the binary search used to find the correct position and the insertion operation which on average takes O(log n) time in a balanced binary search tree.

getTweetCountsPerFrequency Method:

  • The time spent in this method is dominated by the while loop and the bisect operations.
  • The while loop runs once for each interval of size f = self.d[freq] in the range from startTime to endTime. There are (endTime - startTime) / f + 1 such intervals.
  • Within the loop, two bisect operations (actually bisect_left) are performed. Each bisect_left takes O(log m) time, where m is the number of recorded times for the tweetName.
  • Hence, the total time complexity for the getTweetCountsPerFrequency method is O((endTime - startTime)/f * log m).

Space Complexity:

__init__ Method:

  • Space Complexity is O(1) for initializing the dictionary and other variables that do not depend on the input size.

recordTweet Method:

  • The space complexity is dominated by the space used to store the tweet times in the SortedList. In the worst case, if all n recordTweet operations are for the same tweetName, the space complexity would be O(n).

getTweetCountsPerFrequency Method:

  • The space complexity is O(k), where k is the number of intervals. This is because we need to store a count for each interval.

Combining the space complexities from the different parts of the class, the overall space complexity of the TweetCounts class is O(n + k), where n is the total number of tweet times recorded, and k is the number of intervals calculated in a single call to getTweetCountsPerFrequency.

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

How does merge sort divide the problem into subproblems?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!


Load More