SQL Joins
As we try to make sense of the data stored in a database, we may need to look at multiple tables that hold different pieces of information. For example, we may have one table with customer information, another with product information, and yet another with order details. In order to see the bigger picture and understand how everything fits together, we use SQL joins to combine the data from these tables. This allows us to get a more complete view of the data and draw valuable insights from it.
In SQL, a join combines rows from two or more tables based on a related column between them. There are several types of joins in SQL, including Inner Join, Outer Join, Cross Join, Self-Join, and Subqueries.
In this post, we will use the Sakila sample database provided by MySQL to demonstrate each type of join and provide SQL samples for each one. For each join, we will show sample data of about 3 to 5 rows before the join and after the join data samples.
Inner Join
An inner join returns only the rows that have matching values in both tables being joined. Here’s an example:
|
|
Before the join:
customer_id | first_name | last_name |
---|---|---|
1 | MARY | SMITH |
2 | PATRICIA | JOHNSON |
3 | LINDA | WILLIAMS |
rental_id | rental_date | customer_id | return_date |
---|---|---|---|
1 | 2005-05-24 22:54:33 | 1 | 2005-05-26 22:04:30 |
2 | 2005-05-24 23:03:39 | 1 | 2005-05-28 19:40:33 |
3 | 2005-05-25 00:00:15 | 2 | 2005-06-01 22:12:39 |
After the join:
customer_id | first_name | last_name | rental_id | rental_date | return_date |
---|---|---|---|---|---|
1 | MARY | SMITH | 1 | 2005-05-24 22:54:33 | 2005-05-26 22:04:30 |
1 | MARY | SMITH | 2 | 2005-05-24 23:03:39 | 2005-05-28 19:40:33 |
2 | PATRICIA | JOHNSON | 3 | 2005-05-25 00:00:15 | 2005-06-01 22:12:39 |
Outer Join
An outer join returns all the rows from one table and the matching rows from the other table. If there is no match, the result will have NULL values for the columns from the other table. Here’s an example:
|
|
Before the join:
customer_id | first_name | last_name |
---|---|---|
1 | MARY | SMITH |
2 | PATRICIA | JOHNSON |
3 | LINDA | WILLIAMS |
rental_id | rental_date | customer_id | return_date |
---|---|---|---|
1 | 2005-05-24 22:54:33 | 1 | 2005-05-26 22:04:30 |
2 | 2005-05-24 23:03:39 | 1 | 2005-05-28 19:40:33 |
3 | 2005-05-25 00:00:15 | 2 | 2005-06-01 22:12:39 |
After the join:
customer_id | first_name | last_name | rental_id | rental_date | return_date |
---|---|---|---|---|---|
1 | MARY | SMITH | 1 | 2005-05-24 22:54:33 | 2005-05-26 22:04:30 |
1 | MARY | SMITH | 2 | 2005-05-24 23:03:39 | 2005-05-28 19:40:33 |
2 | PATRICIA | JOHNSON | 3 | 2005-05-25 00:00:15 | 2005-06-01 22:12:39 |
3 | LINDA | WILLIAMS | NULL | NULL | NULL |
In this example, we use a LEFT OUTER JOIN to return all the rows from the customer table and the matching rows from the rental table based on the customer_id
column. The result includes all the customers, even if they did not rent any movies. If a customer did not rent any movies, the rental_id
, rental_date
, and return_date
columns will have NULL values.
We can also use a RIGHT OUTER JOIN to return all the rows from the rental table and the matching rows from the customer table. A FULL OUTER JOIN will return all the rows from both tables, with NULL
values for the columns that do not have a matching row in the other table.
Cross Join
A cross join returns the Cartesian product of the two tables, which means it returns all possible combinations of rows between the two tables. Here’s an example:
|
|
Before the join:
actor_id | first_name | last_name |
---|---|---|
1 | PENELOPE | GUINESS |
2 | NICK | WAHLBERG |
3 | ED | CHASE |
film_id | title | release_year | length | rating |
---|---|---|---|---|
1 | ACADEMY DINOSAUR | 2006 | 86 | PG |
2 | ACE GOLDFINGER | 2006 | 48 | G |
3 | ADAPTATION HOLES | 2006 | 50 | NC-17 |
After the join:
actor_id | first_name | last_name | film_id | title | release_year | length | rating |
---|---|---|---|---|---|---|---|
1 | PENELOPE | GUINESS | 1 | ACADEMY DINOSAUR | 2006 | 86 | PG |
1 | PENELOPE | GUINESS | 2 | ACE GOLDFINGER | 2006 | 48 | G |
1 | PENELOPE | GUINESS | 3 | ADAPTATION HOLES | 2006 | 50 | NC-17 |
2 | NICK | WAHLBERG | 1 | ACADEMY DINOSAUR | 2006 | 86 | PG |
2 | NICK | WAHLBERG | 2 | ACE GOLDFINGER | 2006 | 48 | G |
2 | NICK | WAHLBERG | 3 | ADAPTATION HOLES | 2006 | 50 | NC-17 |
3 | ED | CHASE | 1 | ACADEMY DINOSAUR | 2006 | 86 | PG |
3 | ED | CHASE | 2 | ACE GOLDFINGER | 2006 | 48 | G |
3 | ED | CHASE | 3 | ADAPTATION HOLES | 2006 | 50 | NC-17 |
In this example, we use a CROSS JOIN to return all possible combinations of rows between the actor and film tables. The result includes all actors and all films, resulting in 3 x 3 = 9 rows.
Cross joins can be useful in certain scenarios, but they can also generate a large number of rows and cause performance issues. Therefore, it’s important to use them carefully and selectively.
Self Join
A self join is when a table is joined with itself. It’s useful when we want to compare two rows within the same table. Here’s an example:
|
|
Before the join:
actor_id | first_name | last_name |
---|---|---|
1 | PENELOPE | GUINESS |
2 | NICK | WAHLBERG |
3 | ED | CHASE |
After the join:
actor_id | first_name | last_name | first_name | last_name |
---|---|---|---|---|
1 | PENELOPE | GUINESS | NICK | WAHLBERG |
1 | PENELOPE | GUINESS | ED | CHASE |
2 | NICK | WAHLBERG | PENELOPE | GUINESS |
2 | NICK | WAHLBERG | ED | CHASE |
3 | ED | CHASE | PENELOPE | GUINESS |
3 | ED | CHASE | NICK | WAHLBERG |
In this example, we use a self join on the actor table to return all pairs of actors that are not the same person. The result includes 3 x 2 = 6 rows.
Self joins can be useful for analysing hierarchical data or comparing rows within the same table. However, like cross joins, they can also generate a large number of rows and cause performance issues, so they should be used with caution.
In next post, I’ll discuss in details about sub queries in SQL, different types of subqueries and when to use sub queries