Contents

SQL Joins

As we try to make sense of the data stored in a database, we may need to look at multiple tables that hold different pieces of information. For example, we may have one table with customer information, another with product information, and yet another with order details. In order to see the bigger picture and understand how everything fits together, we use SQL joins to combine the data from these tables. This allows us to get a more complete view of the data and draw valuable insights from it.

In SQL, a join combines rows from two or more tables based on a related column between them. There are several types of joins in SQL, including Inner Join, Outer Join, Cross Join, Self-Join, and Subqueries.

In this post, we will use the Sakila sample database provided by MySQL to demonstrate each type of join and provide SQL samples for each one. For each join, we will show sample data of about 3 to 5 rows before the join and after the join data samples.

Inner Join

An inner join returns only the rows that have matching values in both tables being joined. Here’s an example:

1
2
3
SELECT *
FROM customer c
INNER JOIN rental r ON c.customer_id = r.customer_id

Before the join:

customer_id first_name last_name
1 MARY SMITH
2 PATRICIA JOHNSON
3 LINDA WILLIAMS
rental_id rental_date customer_id return_date
1 2005-05-24 22:54:33 1 2005-05-26 22:04:30
2 2005-05-24 23:03:39 1 2005-05-28 19:40:33
3 2005-05-25 00:00:15 2 2005-06-01 22:12:39

After the join:

customer_id first_name last_name rental_id rental_date return_date
1 MARY SMITH 1 2005-05-24 22:54:33 2005-05-26 22:04:30
1 MARY SMITH 2 2005-05-24 23:03:39 2005-05-28 19:40:33
2 PATRICIA JOHNSON 3 2005-05-25 00:00:15 2005-06-01 22:12:39

Outer Join

An outer join returns all the rows from one table and the matching rows from the other table. If there is no match, the result will have NULL values for the columns from the other table. Here’s an example:

1
2
3
SELECT *
FROM customer c
LEFT OUTER JOIN rental r ON c.customer_id = r.customer_id

Before the join:

customer_id first_name last_name
1 MARY SMITH
2 PATRICIA JOHNSON
3 LINDA WILLIAMS
rental_id rental_date customer_id return_date
1 2005-05-24 22:54:33 1 2005-05-26 22:04:30
2 2005-05-24 23:03:39 1 2005-05-28 19:40:33
3 2005-05-25 00:00:15 2 2005-06-01 22:12:39

After the join:

customer_id first_name last_name rental_id rental_date return_date
1 MARY SMITH 1 2005-05-24 22:54:33 2005-05-26 22:04:30
1 MARY SMITH 2 2005-05-24 23:03:39 2005-05-28 19:40:33
2 PATRICIA JOHNSON 3 2005-05-25 00:00:15 2005-06-01 22:12:39
3 LINDA WILLIAMS NULL NULL NULL

In this example, we use a LEFT OUTER JOIN to return all the rows from the customer table and the matching rows from the rental table based on the customer_id column. The result includes all the customers, even if they did not rent any movies. If a customer did not rent any movies, the rental_id, rental_date, and return_date columns will have NULL values.

We can also use a RIGHT OUTER JOIN to return all the rows from the rental table and the matching rows from the customer table. A FULL OUTER JOIN will return all the rows from both tables, with NULL values for the columns that do not have a matching row in the other table.

Cross Join

A cross join returns the Cartesian product of the two tables, which means it returns all possible combinations of rows between the two tables. Here’s an example:

1
2
3
SELECT *
FROM actor
CROSS JOIN film

Before the join:

actor_id first_name last_name
1 PENELOPE GUINESS
2 NICK WAHLBERG
3 ED CHASE
film_id title release_year length rating
1 ACADEMY DINOSAUR 2006 86 PG
2 ACE GOLDFINGER 2006 48 G
3 ADAPTATION HOLES 2006 50 NC-17

After the join:

actor_id first_name last_name film_id title release_year length rating
1 PENELOPE GUINESS 1 ACADEMY DINOSAUR 2006 86 PG
1 PENELOPE GUINESS 2 ACE GOLDFINGER 2006 48 G
1 PENELOPE GUINESS 3 ADAPTATION HOLES 2006 50 NC-17
2 NICK WAHLBERG 1 ACADEMY DINOSAUR 2006 86 PG
2 NICK WAHLBERG 2 ACE GOLDFINGER 2006 48 G
2 NICK WAHLBERG 3 ADAPTATION HOLES 2006 50 NC-17
3 ED CHASE 1 ACADEMY DINOSAUR 2006 86 PG
3 ED CHASE 2 ACE GOLDFINGER 2006 48 G
3 ED CHASE 3 ADAPTATION HOLES 2006 50 NC-17

In this example, we use a CROSS JOIN to return all possible combinations of rows between the actor and film tables. The result includes all actors and all films, resulting in 3 x 3 = 9 rows.

Cross joins can be useful in certain scenarios, but they can also generate a large number of rows and cause performance issues. Therefore, it’s important to use them carefully and selectively.

Self Join

A self join is when a table is joined with itself. It’s useful when we want to compare two rows within the same table. Here’s an example:

1
2
3
SELECT a1.actor_id, a1.first_name, a1.last_name, a2.first_name, a2.last_name
FROM actor a1
JOIN actor a2 ON a1.actor_id <> a2.actor_id

Before the join:

actor_id first_name last_name
1 PENELOPE GUINESS
2 NICK WAHLBERG
3 ED CHASE

After the join:

actor_id first_name last_name first_name last_name
1 PENELOPE GUINESS NICK WAHLBERG
1 PENELOPE GUINESS ED CHASE
2 NICK WAHLBERG PENELOPE GUINESS
2 NICK WAHLBERG ED CHASE
3 ED CHASE PENELOPE GUINESS
3 ED CHASE NICK WAHLBERG

In this example, we use a self join on the actor table to return all pairs of actors that are not the same person. The result includes 3 x 2 = 6 rows.

Self joins can be useful for analysing hierarchical data or comparing rows within the same table. However, like cross joins, they can also generate a large number of rows and cause performance issues, so they should be used with caution.

In next post, I’ll discuss in details about sub queries in SQL, different types of subqueries and when to use sub queries