Q41👥Cohort & Retention★★★☆☆Window/CTESafe to Run

Classic Acquisition Cohort

Dialect:bigquerypostgresqlsnowflakemysql
#cohort#acquisition#retention#classic

Kapan Pakai Query Ini

Ini adalah query dasar cohort analysis yang wajib kamu kuasai. Pakai query ini ketika kamu mau tahu: dari semua user yang signup di bulan X, berapa persen yang balik lagi di bulan ke-1, ke-2, dan seterusnya? Cocok untuk laporan retention bulanan, evaluasi health produk, atau benchmarking antar cohort.

Yang Perlu Diganti

  • your_dataset.users → nama tabel user (berisi user_id, created_at)
  • your_dataset.events → nama tabel aktivitas (berisi user_id, event_date)
  • event_date → kolom tanggal aktivitas kamu
  • created_at → kolom tanggal signup user

Query

WITH user_cohorts AS (
  -- Step 1: tentukan cohort bulan signup setiap user
  SELECT
    user_id,
    DATE_TRUNC(created_at, MONTH) AS cohort_month
  FROM your_dataset.users
),

user_activities AS (
  -- Step 2: join aktivitas dengan cohort user
  SELECT
    uc.user_id,
    uc.cohort_month,
    DATE_DIFF(DATE_TRUNC(e.event_date, MONTH), uc.cohort_month, MONTH) AS period_number
  FROM user_cohorts uc
  JOIN your_dataset.events e
    ON uc.user_id = e.user_id
  WHERE e.event_date >= uc.cohort_month
),

cohort_sizes AS (
  -- Step 3: hitung ukuran setiap cohort (total user saat signup)
  SELECT
    cohort_month,
    COUNT(DISTINCT user_id) AS cohort_size
  FROM user_cohorts
  GROUP BY cohort_month
),

cohort_activity AS (
  -- Step 4: hitung user aktif per cohort per periode
  SELECT
    cohort_month,
    period_number,
    COUNT(DISTINCT user_id) AS active_users
  FROM user_activities
  GROUP BY cohort_month, period_number
)

-- Step 5: gabungkan dan hitung retention rate
SELECT
  ca.cohort_month,
  cs.cohort_size,
  ca.period_number,
  ca.active_users,
  ROUND(ca.active_users / cs.cohort_size * 100, 1) AS retention_rate
FROM cohort_activity ca
JOIN cohort_sizes cs
  ON ca.cohort_month = cs.cohort_month
WHERE ca.period_number <= 12
ORDER BY ca.cohort_month, ca.period_number;

Contoh Output

cohort_monthcohort_sizeperiod_numberactive_usersretention_rate
2024-01-013,84003,840100.0
2024-01-013,84011,49739.0
2024-01-013,840299826.0
2024-02-014,12004,120100.0
2024-02-014,12011,68941.0

Variasi

  • Ganti unit ke WEEK di DATE_TRUNC dan DATE_DIFF untuk cohort mingguan. Berguna kalau produk kamu punya weekly engagement cycle
  • Filter WHERE period_number = 0 untuk hanya lihat ukuran cohort tanpa retention
  • Tambah WHERE cohort_month >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH) untuk batasi ke 12 bulan terakhir

Variasi Dialect

PostgreSQL:

-- DATE_TRUNC pakai string, DATE_DIFF tidak ada — pakai subtraction
DATE_TRUNC('month', created_at) AS cohort_month,
(DATE_TRUNC('month', e.event_date)::date - DATE_TRUNC('month', uc.created_at)::date) / 30 AS period_number
-- Catatan: ini approx. Lebih akurat: EXTRACT(YEAR FROM AGE(...))*12 + EXTRACT(MONTH FROM AGE(...))

Snowflake:

DATE_TRUNC('MONTH', created_at) AS cohort_month,
DATEDIFF('month', uc.cohort_month, DATE_TRUNC('MONTH', e.event_date)) AS period_number

MySQL:

DATE_FORMAT(created_at, '%Y-%m-01') AS cohort_month,
TIMESTAMPDIFF(MONTH, uc.cohort_month, DATE_FORMAT(e.event_date, '%Y-%m-01')) AS period_number
Classic Acquisition Cohort — Analyst SQL Vault | NgulikSQL