SQL Subqueries
SQL subqueries are queries that are embedded within another query. They are used to retrieve data that will be used in the main query, either as a filter condition, a column value, or a table source. Subqueries can be used in multiple contexts like SELECT, INSERT, DELETE etc
In this post, we will use the Sakila sample database provided by MySQL
Common uses cases
Here are some of the most common use cases of subqueries in a SELECT context:
Filtering results:
To find all the customers who have placed orders in the month of July 2005, you could use a subquery in the WHERE clause like this:
|
|
This query retrieves all the customer records where the customer ID is included in the result of the subquery, which finds all the rental records from the month of July 2005.
Calculating aggregate values:
To find the average order value for customers who have placed at least five orders, you could use a subquery to calculate the number of orders per customer, and then filter the results based on that count, like this:
|
|
This query first uses a subquery to count the number of orders per customer, and then filters the results to include only customers with at least five orders. It then uses the resulting list of customer IDs as a filter condition in the outer query, which calculates the average payment amount for each customer.
Joining tables:
To find all the orders placed by customers who live in a specific region, you could use a subquery as a source for a join operation, like this:
|
|
This query first uses a subquery to find all the customer IDs associated with addresses that have a postal code starting with ‘7’. It then uses that list of customer IDs as a source for a join operation with the rental table, to retrieve all the rental records for those customers.
Simplifying complex queries:
To find the rental records for the top 10 customers by total rental amount, you could use a subquery to calculate the total rental amount per customer, and then sort the results based on that amount, like this:
|
|
This query first uses a subquery to calculate the total payment amount per customer, sorts the results based on that amount, and selects only the top 10 customers. It then uses the resulting list of customer IDs as a filter condition in the outer query, which retrieves all the rental records for those customers.
Types of Subqueries
Subqueries in SQL can be broadly classified into three types based on their result set: scalar subqueries, column subqueries, and table subqueries.
Scalar Subquery
A scalar subquery returns a single value as its result. It can be used anywhere a single value is expected, such as in the SELECT clause or WHERE clause.
Example: Suppose we want to find the name of the customer who made the highest payment. This can be done using a scalar subquery as follows:
|
|
Here, the subquery returns the customer ID with the highest payment amount, which is then used to retrieve the corresponding customer name.
Column Subquery
A column subquery returns a single column of multiple rows as its result. It can be used in the SELECT or WHERE clause to retrieve a list of values for comparison.
The first example in common uses cases(filtering results) is an example of a column subquery where the subquery gives a column of customer IDs
Table Subquery
A table subquery returns a table or set of rows as its result. It can be used in the FROM clause of a query like a regular table.
Example: The example in the common use cases(simplifying complex queries) is an example of table subquery where the inner most subquery gives 10 rows of customer_id
and total_amount
column values
Correlated subquery - Subqueries that depend on the outer query
In most of the examples we have seen till now, all the subqueries get executed as a standalone query and return a scalar/column/table as a result. But what if my inner subquery refers to data of outer subquery? This is what we have seen in joins example.
Here’s is one more example
|
|
In this example, the subquery is correlated because it refers to the customer_id
column from the outer query. The subquery finds the maximum rental date for each customer, and the outer query returns the customer_id
and rental_date
for each rental with the maximum rental date.
When to use joins or subqueries or other methods?
The decision to use subqueries versus joins or other methods in SQL depends on the specific requirements of the query and the underlying data model. Here are some general guidelines to consider:
- Use subqueries when the result set of the subquery is required to filter or modify the result set of the main query. For example, when calculating aggregate values or filtering rows based on a condition that depends on data from another table
- Use joins when combining data from multiple tables is required, particularly when the join involves large tables or tables with complex relationships. Joins can be more efficient than subqueries in many cases
- Use other methods, such as common table expressions (CTEs) or window functions, when more complex logic is required, such as recursive queries or ranking functions. These methods can provide more flexibility and efficiency in certain situations
Coming up in my next post, I will take a closer look at window functions - when to utilize them and essential considerations to bear in mind while doing so