Ep 3: the life of a corporate girl.sql 💾
Just passed the DataCamp SQL Associate Exam




Exam Overview
The certification has two parts:
Theory Exam: 35 multiple-choice questions covering SQL fundamentals and analytics.
Practical Exam: A hands-on project.
Let’s walk through each concept.
1. SQL Fundamentals
You should be comfortable with:
Selecting and filtering data using SELECT, WHERE, ORDER BY, and LIMIT.
Aliasing columns and tables with AS to make queries readable.
Using comparison and logical operators (=, >, <, IN, BETWEEN, LIKE, AND, OR, NOT).
2. Handling Missing Data (NULLs)
NULL represents missing or unknown values.
It doesn’t equal zero or an empty string.
Key Concepts
COALESCE() returns the first non-NULL value.
Aggregates like AVG, SUM, and COUNT(col) ignore NULLs.
Use COUNT(*) if you want to count all rows, including NULLs.
SELECT COALESCE(audience_score, 0) AS score_filled FROM Movies;
3. Aggregation and Grouping
Learn how to summarize data using GROUP BY and HAVING.
Common Functions
AVG(), SUM(), MIN(), MAX(), COUNT()
Remember
Every column in the SELECT must either be aggregated or included in the GROUP BY.
HAVING filters after aggregation, while WHERE filters before.
SELECT genre, AVG(audience_score) AS avg_score FROM Movies GROUP BY genre HAVING AVG(audience_score) > 75
4. Subqueries and Derived Tables
Subqueries are queries inside queries.
They can help you filter, calculate, or compare dynamically.
Types
Scalar subquery: returns one value.
Correlated subquery: references the outer query (can be slower).
Derived table: a subquery used as a temporary table with an alias.
SELECT * FROM Movies m WHERE audience_score > (SELECT AVG(audience_score) FROM Movies)
Learn when to use subqueries — and when a window function or join is more efficient.
5. Window Functions
Window (analytic) functions calculate metrics across sets of rows without collapsing them.
They’re essential for ranking, averages, and percentiles in analytics work.
Common Window Functions
AVG() OVER ()
RANK() OVER (ORDER BY ...)
NTILE(4) OVER (ORDER BY ...) for quartiles
ROW_NUMBER() OVER (PARTITION BY ...)
SELECT movie_name, genre, audience_score, RANK() OVER (PARTITION BY genre ORDER BY audience_score DESC) AS rank_in_genre FROM Movies
6. Data Types, Casting, and Precision
SQL engines often infer data types — but as an analyst, you need to control them.
You Should Know
How to convert data types using CAST() or vendor-specific syntax.
How integer division truncates results (use CAST to DECIMAL for accuracy).
How to use ROUND() for cleaner outputs.
SELECT ROUND(AVG(audience_score), 1) AS avg_score_1dp FROM Movies
7. Sorting and Limiting Results
You’ll frequently sort or trim your results.
SELECT * FROM Movies ORDER BY audience_score DESC FETCH FIRST 5 ROWS ONLY
Note: Syntax differs slightly by database — for example, LIMIT 5 in PostgreSQL/MySQL or TOP(5) in SQL Server.
8. Data Cleaning and Validation
Analysts often face messy data. You should know how to:
Trim whitespace with TRIM()
Standardize text with UPPER() or LOWER()
Fix invalid values with CASE expressions
SELECT TRIM(movie_name) AS clean_name, CASE WHEN audience_score BETWEEN 0 AND 100 THEN audience_score ELSE NULL END AS valid_score FROM Movies
9. Joins and Relationships
Understand how to combine data from multiple tables.
Key Join Types
INNER JOIN → rows with matches in both tables
LEFT JOIN → all from the left, matched from the right
RIGHT JOIN and FULL JOIN → less common but useful for completeness
SELECT m.movie_name, g.name AS genre_name FROM Movies m JOIN Genres g ON m.genre = g.name
10. Analytical Patterns to Know
While not explicitly tested, being fluent with common analysis queries helps in the project portion.
Top-N analysis: using ROW_NUMBER() or RANK()
Percentiles and quartiles: using NTILE()
Z-score normalization: comparing values to global averages
Conditional aggregation: using CASE WHEN inside SUM() or AVG()
SELECT genre, AVG(CASE WHEN audience_score >= 80 THEN 1 ELSE 0 END) AS pct_high_rated FROM Movies GROUP BY genre
11. The Practical Project Mindset
The project tests your ability to think through a real scenario — not just run queries.
You should be able to:
Clean and standardize data using CASE, TRIM, UPPER
Validate numerical ranges and missing values
Use GROUP BY and JOIN to summarize data
Identify trends, outliers, and performance patterns