Contents

SQL Subqueries

SQL subqueries are queries that are embedded within another query. They are used to retrieve data that will be used in the main query, either as a filter condition, a column value, or a table source. Subqueries can be used in multiple contexts like SELECT, INSERT, DELETE etc

In this post, we will use the Sakila sample database provided by MySQL

Common uses cases

Here are some of the most common use cases of subqueries in a SELECT context:

Filtering results:

To find all the customers who have placed orders in the month of July 2005, you could use a subquery in the WHERE clause like this:

1
2
3
4
5
6
7
SELECT *
FROM customer
WHERE customer_id IN (
  SELECT customer_id
  FROM rental
  WHERE rental_date BETWEEN '2005-07-01 00:00:00' AND '2005-08-01 00:00:00'
);

This query retrieves all the customer records where the customer ID is included in the result of the subquery, which finds all the rental records from the month of July 2005.

Calculating aggregate values:

To find the average order value for customers who have placed at least five orders, you could use a subquery to calculate the number of orders per customer, and then filter the results based on that count, like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT customer_id, AVG(amount) AS avg_order_value
FROM payment
WHERE customer_id IN (
  SELECT customer_id
  FROM (
    SELECT customer_id, COUNT(*) AS order_count
    FROM rental
    GROUP BY customer_id
  ) AS subquery
  WHERE order_count >= 5
)
GROUP BY customer_id;

This query first uses a subquery to count the number of orders per customer, and then filters the results to include only customers with at least five orders. It then uses the resulting list of customer IDs as a filter condition in the outer query, which calculates the average payment amount for each customer.

Joining tables:

To find all the orders placed by customers who live in a specific region, you could use a subquery as a source for a join operation, like this:

1
2
3
4
5
6
7
8
SELECT order_id, customer_id, rental_date
FROM rental
JOIN (
  SELECT customer_id
  FROM address
  WHERE postal_code LIKE '7%'
) AS subquery
ON rental.customer_id = subquery.customer_id;

This query first uses a subquery to find all the customer IDs associated with addresses that have a postal code starting with ‘7’. It then uses that list of customer IDs as a source for a join operation with the rental table, to retrieve all the rental records for those customers.

Simplifying complex queries:

To find the rental records for the top 10 customers by total rental amount, you could use a subquery to calculate the total rental amount per customer, and then sort the results based on that amount, like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT *
FROM rental
WHERE customer_id IN (
  SELECT customer_id
  FROM (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM payment
    GROUP BY customer_id
    ORDER BY total_amount DESC
    LIMIT 10
  ) AS subquery
);

This query first uses a subquery to calculate the total payment amount per customer, sorts the results based on that amount, and selects only the top 10 customers. It then uses the resulting list of customer IDs as a filter condition in the outer query, which retrieves all the rental records for those customers.

Types of Subqueries

Subqueries in SQL can be broadly classified into three types based on their result set: scalar subqueries, column subqueries, and table subqueries.

Scalar Subquery

A scalar subquery returns a single value as its result. It can be used anywhere a single value is expected, such as in the SELECT clause or WHERE clause.

Example: Suppose we want to find the name of the customer who made the highest payment. This can be done using a scalar subquery as follows:

1
2
3
4
5
6
7
8
SELECT first_name, last_name
FROM customer
WHERE customer_id = (
  SELECT customer_id
  FROM payment
  ORDER BY amount DESC
  LIMIT 1
);

Here, the subquery returns the customer ID with the highest payment amount, which is then used to retrieve the corresponding customer name.

Column Subquery

A column subquery returns a single column of multiple rows as its result. It can be used in the SELECT or WHERE clause to retrieve a list of values for comparison.

The first example in common uses cases(filtering results) is an example of a column subquery where the subquery gives a column of customer IDs

Table Subquery

A table subquery returns a table or set of rows as its result. It can be used in the FROM clause of a query like a regular table.

Example: The example in the common use cases(simplifying complex queries) is an example of table subquery where the inner most subquery gives 10 rows of customer_id and total_amount column values

Correlated subquery - Subqueries that depend on the outer query

In most of the examples we have seen till now, all the subqueries get executed as a standalone query and return a scalar/column/table as a result. But what if my inner subquery refers to data of outer subquery? This is what we have seen in joins example.

Here’s is one more example

1
2
3
4
5
6
7
SELECT customer_id, rental_date
FROM rental AS outer_query
WHERE rental_date = (
  SELECT MAX(rental_date)
  FROM rental AS inner_query
  WHERE inner_query.customer_id = outer_query.customer_id
);

In this example, the subquery is correlated because it refers to the customer_id column from the outer query. The subquery finds the maximum rental date for each customer, and the outer query returns the customer_id and rental_date for each rental with the maximum rental date.

When to use joins or subqueries or other methods?

The decision to use subqueries versus joins or other methods in SQL depends on the specific requirements of the query and the underlying data model. Here are some general guidelines to consider:

  • Use subqueries when the result set of the subquery is required to filter or modify the result set of the main query. For example, when calculating aggregate values or filtering rows based on a condition that depends on data from another table
  • Use joins when combining data from multiple tables is required, particularly when the join involves large tables or tables with complex relationships. Joins can be more efficient than subqueries in many cases
  • Use other methods, such as common table expressions (CTEs) or window functions, when more complex logic is required, such as recursive queries or ranking functions. These methods can provide more flexibility and efficiency in certain situations

Coming up in my next post, I will take a closer look at window functions - when to utilize them and essential considerations to bear in mind while doing so