A clear explanation of finding the fraction of players retained the day after their first login using self-join and window functions.
Problem Restatement
We have an Activity(player_id, device_id, event_date, games_played) table.
For each first login date, compute the fraction of players who logged in again the very next day.
Return the result rounded to 2 decimal places.
Edge Cases
- Match the aggregate to the question: use raw
COUNTorSUMfor rows/amounts, andCOUNT(DISTINCT ...)only for unique entities. - Tie cases should return every qualifying row unless the statement asks for a single row.
- For outer joins, keep filters on the joined table in the
ONclause when unmatched rows must remain visible.
Common Pitfalls
- Do not put aggregate filters in
WHERE; useHAVINGafterGROUP BY. - When the answer needs all tied winners, avoid
LIMIT 1unless the statement explicitly asks for one row.
Implementation
WITH first_logins AS (
SELECT player_id, MIN(event_date) AS first_date
FROM Activity
GROUP BY player_id
),
retained AS (
SELECT f.first_date AS install_dt,
COUNT(DISTINCT a.player_id) AS retained_count
FROM first_logins f
LEFT JOIN Activity a
ON f.player_id = a.player_id
AND a.event_date = DATE_ADD(f.first_date, INTERVAL 1 DAY)
GROUP BY f.first_date
),
total AS (
SELECT first_date AS install_dt,
COUNT(*) AS total_count
FROM first_logins
GROUP BY first_date
)
SELECT t.install_dt,
t.total_count AS installs,
ROUND(COALESCE(r.retained_count, 0) / t.total_count, 2) AS Day1_retention
FROM total t
LEFT JOIN retained r ON t.install_dt = r.install_dt;Code Explanation
first_logins finds the first login date for each player.
retained counts players who logged in on first_date + 1 day (using a LEFT JOIN to count only those who came back).
total counts new installs per date.
The final query divides retained by total installs per date.