SQL Aggregation Magic: GROUP BY, ROLLUP, GROUPING SETS, and CUBE

Ever wondered how to extract multi-level insights from your data in a single query? Let's explore powerful SQL aggregation techniques using a simple film database example.

The Setup: Two Tables

We're working with a classic film database containing two tables:

film_actor table: Links actors to films

actor_id: The actor's unique identifier
film_id: The film's unique identifier

film table: Contains film details

film_id: The film's unique identifier
rating: Film rating (G, PG, PG-13, R, NC-17)

These tables are joined on film_id to analyze which actors appeared in which films and their ratings.

Question 1: How many movies has each actor acted in?

Simple answer: Use a basic GROUP BY on actor_id

SELECT
    fa.actor_id,
    COUNT(*) AS film_count
FROM
    film_actor fa
INNER JOIN
    film f ON fa.film_id = f.film_id
GROUP BY
    fa.actor_id;

Result:

actor_id	film_count
1	19
2	25
3	22
...	...

This gives you one row per actor with their total film count

Question 2: How many movies of each rating has each actor acted in?

Answer: Group by both actor_id AND rating

SELECT
    fa.actor_id,
    f.rating,
    COUNT(*) AS film_count
FROM
    film_actor fa
INNER JOIN
    film f ON fa.film_id = f.film_id
GROUP BY
    fa.actor_id,
    f.rating
ORDER BY
    fa.actor_id,
    f.rating;

Result:

actor_id	rating	film_count
1	G	4
1	PG	6
1	PG-13	1
1	R	3
1	NC-17	5
2	G	7
2	PG	6
...	...	...

Now you see the breakdown per actor, per rating.

Question 3: Can I also get the total movies per actor across all ratings?

Answer: Yes! Use WITH ROLLUP

SELECT
    fa.actor_id,
    f.rating,
    COUNT(*) AS film_count
FROM
    film_actor fa
INNER JOIN
    film f ON fa.film_id = f.film_id
GROUP BY
    fa.actor_id,
    f.rating WITH ROLLUP
ORDER BY
    fa.actor_id,
    f.rating;

Result:

actor_id	rating	film_count
1	G	4
1	PG	6
1	PG-13	1
1	R	3
1	NC-17	5
1	NULL	19 ← Total for actor 1
2	G	7
2	PG	6
2	PG-13	2
2	R	2
2	NC-17	8
2	NULL	25 ← Total for actor 2v
...	...	...
NULL	NULL	5462 ← Grand total

ROLLUP adds subtotal rows where rating is NULL (total per actor) and a grand total row where both columns are NULL.

Key insight: ROLLUP follows the hierarchy defined by column order. It rolls up from right to left through your GROUP BY columns.

Question 4: What if I want totals per rating across ALL actors?

The problem: ROLLUP with GROUP BY actor_id, rating only gives you totals per actor, not per rating.

The solution: Use GROUPING SETS to specify exactly which groupings you want.

SELECT
    fa.actor_id,
    f.rating,
    COUNT(*) AS film_count
FROM
    film_actor fa
INNER JOIN
    film f ON fa.film_id = f.film_id
GROUP BY
    GROUPING SETS (
        (fa.actor_id, f.rating),  -- per actor, per rating
        (fa.actor_id),             -- total per actor
        (f.rating),                -- total per rating
        ()                         -- grand total
    )
ORDER BY
    fa.actor_id,
    f.rating;

Result:

actor_id	rating	film_count
1	G	4
1	PG	6
1	NULL	19 ← Total for actor 1
2	G	7
2	NULL	25 ← Total for actor 2
...	...	...
NULL	G	178 ← All G-rated films
NULL	PG	194 ← All PG-rated films
NULL	R	195 ← All R-rated films
NULL	NULL	5462 ← Grand total

GROUPING SETS gives you complete control—you get totals per actor, totals per rating, and the grand total, all in one query.

Bonus: What about CUBE?

Available in: SQL Server, PostgreSQL, Oracle, and other databases

CUBE is like GROUPING SETS on autopilot—it automatically generates all possible combinations of your grouped columns:

SELECT
    fa.actor_id,
    f.rating,
    COUNT(*) AS film_count
FROM
    film_actor fa
INNER JOIN
    film f ON fa.film_id = f.film_id
GROUP BY
    CUBE (fa.actor_id, f.rating)
ORDER BY
    fa.actor_id,
    f.rating;

This is equivalent to:

GROUP BY
    GROUPING SETS (
        (fa.actor_id, f.rating),
        (fa.actor_id),
        (f.rating),
        ()
    )

With 2 columns, CUBE generates 4 grouping combinations 2^2. With 3 columns, you get 8 combinations 2^3, and so on.

When to use CUBE: Perfect for OLAP and multi-dimensional reporting where you need every possible cross-tabulation.