Q01🔍Eksplorasi & Profiling Data★☆☆☆☆Copy-pasteSafe to Run
Row Count & Basic Stats per Tabel
Dialect:bigquerypostgresqlsnowflakemysql
#row-count#profiling#basic-stats#exploration
Kapan Pakai Query Ini
Query pertama yang kamu jalanin setiap kali sentuh tabel baru. Kasih gambaran awal: seberapa besar tabelnya, date range-nya sampai mana, dan ada nggak ID yang duplikat. Lima detik untuk tahu apakah data ini "masuk akal" sebelum kamu lanjut deeper.
Yang Perlu Diganti
your_schema.your_table→ nama tabel kamuid→ kolom primary key atau ID unik di tabelcreated_at→ kolom timestamp atau tanggal tabel dibuat / event terjadi
Query
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT id) AS unique_ids, COUNT(*) - COUNT(DISTINCT id) AS duplicate_id_count, MIN(created_at) AS earliest_record, MAX(created_at) AS latest_record, DATE_DIFF(DATE(MAX(created_at)), DATE(MIN(created_at)), DAY) AS date_span_days FROM your_schema.your_table;
Contoh Output
| total_rows | unique_ids | duplicate_id_count | earliest_record | latest_record | date_span_days |
|---|---|---|---|---|---|
| 1,250,000 | 1,248,932 | 1,068 | 2023-01-01 00:00:00 | 2024-12-31 23:59:59 | 730 |
Variasi
- Hapus kolom
date_span_dayskalau tabel kamu tidak punya kolom tanggal - Ganti
COUNT(DISTINCT id)dengan kolom ID yang relevan. Kalau belum tahu mana ID-nya, cobaorder_id,user_id, atautransaction_id - Tambah
WHERE created_at >= CURRENT_DATE - 30untuk scope hanya 30 hari terakhir - Kalau tabel besar dan query lambat, tambah
TABLESAMPLE SYSTEM (10 PERCENT)setelah nama tabel untuk estimasi cepat
Variasi Dialect
PostgreSQL
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT id) AS unique_ids, COUNT(*) - COUNT(DISTINCT id) AS duplicate_id_count, MIN(created_at) AS earliest_record, MAX(created_at) AS latest_record, (MAX(created_at)::DATE - MIN(created_at)::DATE) AS date_span_days FROM your_schema.your_table;
Snowflake
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT id) AS unique_ids, COUNT(*) - COUNT(DISTINCT id) AS duplicate_id_count, MIN(created_at) AS earliest_record, MAX(created_at) AS latest_record, DATEDIFF('day', MIN(created_at), MAX(created_at)) AS date_span_days FROM your_schema.your_table;
MySQL
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT id) AS unique_ids, COUNT(*) - COUNT(DISTINCT id) AS duplicate_id_count, MIN(created_at) AS earliest_record, MAX(created_at) AS latest_record, DATEDIFF(MAX(created_at), MIN(created_at)) AS date_span_days FROM your_schema.your_table;