Lesson 7: Subqueries in SQL
Learn how to construct nested queries.
Estimated Read Time: 2 - 3 Hours
Learning Goals
In this lesson, you will learn to:
- Construct subqueries in SELECT, FROM, and WHERE clauses to create calculated columns, filter datasets, and build temporary result sets
- Differentiate between correlated and non-correlated subqueries and apply each appropriately
- Apply the IN, EXISTS, and NOT IN operators to efficiently filter data using subquery results
- Build derived tables to isolate complex logic and create reusable result sets
1. Introduction: The Power of Nested Logic
Over the past six lessons, you have built a formidable toolkit of SQL capabilities. You can extract and filter data, aggregate metrics, and join multiple tables to create comprehensive analytical views. These skills enable you to answer the vast majority of straightforward business questions that analysts encounter daily.
However, there exists a class of business questions that cannot be answered through simple, linear SQL queries—questions that require iterative logic, dynamic filtering based on calculated results, or comparisons against aggregated subsets of data. Consider these scenarios that a data analyst at Northwind Traders might face:
- “Which products have unit prices higher than the average price in their category?”
- “Who are our top 5 customers by revenue, and what percentage of total company revenue do they represent?”
- “Which employees have processed more orders than the average employee in their region?”
- “Show me all orders that contain products from our most profitable category.”
Each of these questions contains an implicit two-step logic:
- Calculate something (an average, a ranking, a filtered subset)
- Use that calculated result to filter or enhance the main query
This is where subqueries become essential. A subquery—also called a nested query or inner query—is a SELECT statement embedded within another SQL statement (the outer query or main query). The subquery executes first, produces a temporary result set, and then feeds that result into the outer query.
Subqueries enable you to:
- Create calculated columns based on aggregated data from other tables
- Filter records dynamically using computed values rather than static criteria
- Build complex comparison logic that would be impossible with WHERE clauses alone
- Decompose intricate business logic into readable, step-by-step operations
By the end of this lesson, you will be able to construct queries that solve business problems requiring multiple levels of logic, positioning you to tackle analyst responsibilities that demand advanced SQL proficiency.
2. Subquery Fundamentals: Structure and Execution
Before exploring different types of subqueries and their applications, we must establish a clear mental model of how subqueries work.
2.1. Anatomy of a Subquery
A subquery follows this fundamental structure:
SELECT column1, column2
FROM table1
WHERE column3 OPERATOR (
SELECT column4
FROM table2
WHERE condition
);
Key components:
- Parentheses: Subqueries must always be enclosed in parentheses. This distinguishes them from the main query and signals to the database engine that the enclosed SELECT statement should be executed independently.
- Independence: In most cases, a subquery can be executed as a standalone query. You can test and verify subquery logic by running it separately before embedding it in the outer query.
- Execution Order: The database engine evaluates the subquery first (from innermost to outermost if there are nested levels), then uses its results in the outer query.
- Return Value Types: Subqueries can return:
- A single value (scalar subquery): Used with comparison operators (=, >, <, etc.)
- A single column of values (column subquery): Used with IN, EXISTS
- A complete result set (table subquery): Used in FROM clauses to create derived tables
2.2. Correlated vs. Non-Correlated Subqueries
Subqueries fall into two fundamental categories based on their relationship with the outer query:
Non-Correlated Subqueries (Independent):
A non-correlated subquery can execute completely independently of the outer query. It has no reference to columns from the outer query’s tables.
-- Example: Non-correlated subquery
SELECT product_name, unit_price
FROM products
WHERE unit_price > (
SELECT AVG(unit_price)
FROM products
);
The inner query SELECT AVG(unit_price) FROM products executes once, calculates the average, and returns a single value. The outer query then uses that value for comparison.
Execution characteristics:
- Executes once before the outer query runs
- More efficient for large datasets
- Preferred when possible
Correlated Subqueries (Dependent):
A correlated subquery references columns from the outer query, creating a dependency relationship. It must re-execute for each row processed by the outer query.
-- Example: Correlated subquery
SELECT p.product_name, p.unit_price, c.category_name
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.unit_price > (
SELECT AVG(p2.unit_price)
FROM products p2
WHERE p2.category_id = p.category_id -- References outer query
);
The inner query references p.category_id from the outer query’s products table. This means the subquery must execute repeatedly—once for each product—calculating the average price for that specific product’s category each time.
Execution characteristics:
- Executes repeatedly (once per row in outer query)
- More resource-intensive
- Necessary for row-by-row comparative logic
Performance Note: Correlated subqueries can be slow on large datasets because they execute repeatedly. For tables with millions of rows, consider joins or window functions (covered in later Lessons) as alternatives.
3. Subqueries in the WHERE Clause: Dynamic Filtering
The WHERE clause is the most common location for subqueries, enabling dynamic filtering based on calculated criteria. This approach is invaluable when filter conditions depend on aggregated values or complex logic.
3.1. Scalar Subqueries with Comparison Operators
A scalar subquery returns exactly one value (one row, one column). This value can be used with standard comparison operators: =, >, <, >=, <=, !=.
Business Case: Products Priced Above Average
The product management team wants to identify products priced above the company-wide average to understand our premium product portfolio.
-- Find products priced above the overall average
-- Business purpose: Identify premium products for strategic positioning
SELECT
product_id,
product_name,
unit_price,
units_in_stock,
discontinued
FROM products
WHERE unit_price > (
SELECT AVG(unit_price)
FROM products
)
ORDER BY unit_price DESC;
Query Analysis:
- Inner Query Execution: SELECT AVG(unit_price) FROM products calculates the average unit price across all products (returns 28.87).
- Outer Query Execution: The database then executes WHERE unit_price > 28.87, filtering products to show only those exceeding this average.
- Business Value: This immediately identifies the 43 products that are positioned as premium offerings, helping pricing strategy and inventory management decisions.
Important Constraint: If the subquery returns multiple rows when you use =, >, < operators, the query will fail. The database expects exactly one value for comparison.
3.2. Using IN with Column Subqueries
When your subquery returns multiple values (a column of results), you must use the IN operator instead of = .
Business Case: Products from Top Revenue Categories
Finance wants to focus marketing efforts on products from the top 3 revenue-generating categories.
-- Identify products in top 3 revenue-generating categories
-- Business purpose: Target marketing budget on highest-performing categories
SELECT
p.product_name,
c.category_name,
p.unit_price,
p.units_on_order
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.category_id IN (
SELECT cat.category_id
FROM categories cat
INNER JOIN products prod ON cat.category_id = prod.category_id
INNER JOIN order_details od ON prod.product_id = od.product_id
GROUP BY cat.category_id
ORDER BY SUM(od.unit_price * od.quantity * (1 - od.discount)) DESC
LIMIT 3
)
ORDER BY c.category_name, p.product_name;
Query Breakdown:
Subquery Logic (innermost):
- Joins categories → products → order_details
- Calculates total revenue per category
- Sorts categories by revenue descending
- Returns the category_id values for the top 3 categories (e.g., categories 1, 4, and 8)
Outer Query Logic:
- Selects product information
- Filters to only products WHERE category_id IN (1, 4, 8)
- The IN operator allows matching against multiple values
Business Application: Marketing can now focus their campaigns on products within the top-performing categories, optimizing advertising ROI.
3.3. Using NOT IN for Exclusion Logic
The inverse operator NOT IN identifies records that do not match any value in the subquery results.
Business Case: Customers Who Have Never Ordered Certain Products
The marketing team wants to run a targeted campaign for customers who have never purchased products from the Condiments category, offering them a first-time discount.
-- Find customers who have never ordered condiments
-- Business purpose: Target non-customers of specific category for acquisition campaign
SELECT
customer_id,
company_name,
contact_name,
city,
country
FROM customers
WHERE customer_id NOT IN (
SELECT DISTINCT o.customer_id
FROM orders o
INNER JOIN order_details od ON o.order_id = od.order_id
INNER JOIN products p ON od.product_id = p.product_id
INNER JOIN categories c ON p.category_id = c.category_id
WHERE c.category_name = 'Condiments'
AND o.customer_id IS NOT NULL -- Important: handle NULLs
)
ORDER BY country, company_name;
Subquery Purpose: Returns a list of all customer IDs who have purchased condiments at least once.
Outer Query Purpose: Shows all customers NOT in that list—customers who are untapped for this product category.
Critical NULL Handling: NOT IN with NULL values can produce unexpected results. If the subquery returns any NULL values, the entire NOT IN condition evaluates to NULL (not TRUE or FALSE), and no rows will be returned. Always filter out NULLs in NOT IN subqueries using IS NOT NULL, or use NOT EXISTS instead (covered in section 3.5).
3.4. Correlated Subqueries in WHERE Clause
Correlated subqueries enable row-by-row comparison logic that would be impossible with non-correlated subqueries.
Business Case: Products Priced Above Their Category Average
Product managers want to identify products that are premium-positioned within their specific category (not company-wide), to evaluate category-level pricing strategies.
-- Find products priced above their own category's average
-- Business purpose: Identify category-specific premium products
SELECT
p.product_id,
p.product_name,
c.category_name,
p.unit_price,
(
SELECT ROUND(AVG(p2.unit_price)::numeric, 2)
FROM products p2
WHERE p2.category_id = p.category_id
) AS category_avg_price
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.unit_price > (
SELECT AVG(p2.unit_price)
FROM products p2
WHERE p2.category_id = p.category_id -- Correlated reference
)
ORDER BY c.category_name, p.unit_price DESC;
How it works: For each product, the subquery calculates that product’s category average price, then the WHERE clause compares the product price against it. This process repeats for every product row.
Performance Note: On large datasets, correlated subqueries can be slow since they execute repeatedly. We’ll learn more efficient alternatives in future lessons.
Business Insight: The results show products like “Côte de Blaye” (wine) priced at $263.50 against a Beverages category average of $37.98, indicating extreme premium positioning that may warrant review.
3.5. EXISTS and NOT EXISTS: Checking for Presence
The EXISTS operator tests whether a subquery returns any rows at all. It doesn’t care about the actual values—only whether results exist.
Business Case: Identify Customers with Recent Activity
The sales team wants to focus account management efforts on customers who have placed orders in the last 90 days, indicating active engagement.
-- Find customers with orders in the last 90 days
-- Business purpose: Prioritize relationship management for active accounts
SELECT
c.customer_id,
c.company_name,
c.contact_name,
c.city,
c.country
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
)
ORDER BY c.country, c.company_name;
How EXISTS Works:
- For each customer, the subquery checks: “Are there any orders for this customer in the last 90 days?”
- If yes (even one order exists), the customer is included in results
- If no, the customer is excluded
- Performance advantage: EXISTS stops searching as soon as it finds the first matching row—it doesn’t need to retrieve all matches
EXISTS vs. IN:
- EXISTS is generally faster for large datasets because it stops at the first match
- IN must retrieve all matching values and then perform comparisons
- EXISTS handles NULLs more gracefully than IN
NOT EXISTS for Gap Analysis:
Conversely, NOT EXISTS identifies records where no related records exist.
Business Case: Categories with No Discontinued Products
Quality assurance wants to identify product categories where we’ve never had to discontinue any products, indicating reliable supplier relationships.
-- Find categories with no discontinued products
-- Business purpose: Identify high-reliability product categories
SELECT
c.category_id,
c.category_name,
c.description
FROM categories c
WHERE NOT EXISTS (
SELECT 1
FROM products p
WHERE p.category_id = c.category_id
AND p.discontinued = 1
)
ORDER BY c.category_name;
This returns categories where the entire product lineup is currently active—no discontinuations ever occurred.
4. Subqueries in the SELECT Clause: Calculated Columns
Subqueries in the SELECT clause create calculated columns based on related data. Each subquery must return a single scalar value and executes once per row in the result set.
Business Case: Product Comparison to Category Average
Product managers want a report showing each product alongside its category’s average price for quick visual comparison.
-- Show products with their category average price for comparison
-- Business purpose: Enable quick pricing strategy evaluation
SELECT
p.product_id,
p.product_name,
c.category_name,
p.unit_price AS product_price,
(
SELECT ROUND(AVG(p2.unit_price)::numeric, 2)
FROM products p2
WHERE p2.category_id = p.category_id
) AS category_avg_price,
ROUND((p.unit_price / (
SELECT AVG(p2.unit_price)
FROM products p2
WHERE p2.category_id = p.category_id
) * 100)::numeric, 1) AS percent_of_category_avg
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.discontinued = 0
ORDER BY c.category_name, p.unit_price DESC;
Query Breakdown:
- First subquery (in SELECT): Calculates the average price for each product’s category
- Second subquery (in calculation): Used to compute the percentage comparison
- Execution: Both subqueries run once for each product row
Performance Warning: Notice that we’ve calculated the category average twice in this query. Each calculation requires a separate subquery execution. For better performance on larger datasets, use CTEs (Common Table Expressions, covered in Lesson 8) to calculate once and reference multiple times.
Business Insight: The results immediately reveal products priced at 150-250% of their category average, highlighting aggressive premium positioning that may require justification or repricing.
5. Subqueries in the FROM Clause: Derived Tables
When you place a subquery in the FROM clause, you create a derived table (also called an inline view)—a temporary result set that the outer query treats as if it were an actual table.
Requirements for derived tables:
- Must be enclosed in parentheses
- Must have an alias (required in PostgreSQL and most databases)
- Can include any valid SELECT statement with its own JOINs, filters, and aggregations
Business Case: Top Customers Revenue Analysis
The executive team wants to analyze the characteristics of the top 10 customers by revenue—where they’re located, which employees manage them, and their average order value.
-- Analyze top revenue customers in detail
-- Business purpose: Understand high-value customer profiles
SELECT
top_customers.customer_id,
top_customers.company_name,
top_customers.total_revenue,
top_customers.order_count,
ROUND((top_customers.total_revenue / top_customers.order_count)::numeric, 2) AS avg_order_value,
c.city,
c.country,
c.contact_name,
COUNT(DISTINCT e.employee_id) AS employees_serving
FROM (
-- Subquery: Calculate revenue and order metrics for all customers
SELECT
c.customer_id,
c.company_name,
COUNT(DISTINCT o.order_id) AS order_count,
SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_revenue
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_details od ON o.order_id = od.order_id
GROUP BY c.customer_id, c.company_name
ORDER BY total_revenue DESC
LIMIT 10
) AS top_customers -- Derived table alias is required
INNER JOIN customers c ON top_customers.customer_id = c.customer_id
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN employees e ON o.employee_id = e.employee_id
GROUP BY
top_customers.customer_id,
top_customers.company_name,
top_customers.total_revenue,
top_customers.order_count,
c.city,
c.country,
c.contact_name
ORDER BY top_customers.total_revenue DESC;
Execution Flow:
- Inner Subquery executes first:
- Calculates total revenue and order count for each customer
- Ranks customers by revenue descending
- Returns only the top 10
- This creates a temporary table called
top_customers
- Outer Query then executes:
- Treats
top_customersas a regular table - Joins it back to the actual customer, order, and employee tables
- Enriches the top 10 with additional contextual information
- Treats
Why Use This Approach?
Without the subquery, calculating “top 10” after joining to employees would give incorrect results—the join would multiply rows, and LIMIT 10 would return 10 order-employee combinations, not 10 customers.
The subquery isolates the ranking logic, ensuring we genuinely get the top 10 customers before enriching with additional details.
Note: Derived tables can also filter on aggregated results. While HAVING within a single query often works too, derived tables provide more flexibility for complex transformations.
6. Practical Guidelines and Best Practices
6.1. When to Choose Subqueries
Use subqueries when:
- You need a calculated value for filtering
- “Products priced above the company average”
- “Customers who ordered more than the median order count”
- Logic is clearer with step-by-step decomposition
- “Find top 5 customers, then show their order details”
- Subqueries make the intent explicit
- You’re checking for existence/absence of relationships
- “Customers who have never ordered”
- EXISTS is purpose-built for this
- You need to filter on aggregated results
- “Categories where average product price > $50”
Avoid subqueries when:
- A simple JOIN accomplishes the same goal more efficiently
- The subquery is correlated and processes millions of rows
- You’re calculating the same thing multiple times (use CTEs instead – Lesson 8)
6.2. Formatting Best Practices
1. Use clear indentation to show hierarchy:
-- GOOD: Clear structure
SELECT product_name, unit_price
FROM products
WHERE category_id IN (
SELECT category_id
FROM categories
WHERE category_name LIKE '%Food%'
);
-- BAD: Difficult to read
SELECT product_name, unit_price FROM products WHERE category_id IN (SELECT category_id FROM categories WHERE category_name LIKE '%Food%');
2. Add comments explaining subquery purpose:
SELECT product_name, unit_price
FROM products
WHERE unit_price > (
-- Calculate company-wide average as pricing benchmark
SELECT AVG(unit_price)
FROM products
);
6.3. Debugging Subqueries
1. Test subqueries independently first:
Before embedding a subquery, run it standalone to verify it returns expected results:
-- Test subquery first
SELECT AVG(unit_price) FROM products;
-- Returns: 28.87
-- Then use in main query
SELECT product_name, unit_price
FROM products
WHERE unit_price > 28.87;
2. Verify return value type matches operator:
- Single value needed for =, >, < operators
- Multiple values needed for IN, EXISTS
- Check if subquery returns what you expect
Summary
This lesson has equipped you with the ability to construct multi-layered analytical queries using subqueries. You now understand:
- Subquery Fundamentals: How nested queries execute within outer queries, the distinction between correlated and non-correlated subqueries, and different return value types.
- WHERE Clause Subqueries: Using scalar subqueries with comparison operators, IN for multiple-value matching, NOT IN for exclusions (with NULL handling), correlated subqueries for row-by-row comparisons, and EXISTS for checking record presence.
- SELECT Clause Subqueries: Creating calculated columns based on aggregate values and related table data, with awareness of performance implications.
- FROM Clause Subqueries: Building derived tables that act as temporary result sets, enabling complex filtering and ranking patterns.
- Best Practices: When to use subqueries vs. alternatives, formatting for readability, and basic debugging approaches.
Subqueries are essential tools for intermediate SQL analysis. While powerful, they are not always the optimal solution. In the next lesson, you will learn about Common Table Expressions (CTEs), which often provide a more readable and maintainable alternative to complex subqueries, particularly for multi-step logic.
You are now capable of answering business questions that require iterative logic, dynamic filtering, and comparative analysis against aggregated benchmarks—analytical capabilities that distinguish intermediate SQL practitioners from beginners.
Suggested References
Exercise: SQL Subqueries
Using the NYC Yellow Taxi Trip Dataset
Exercise Set Overview
These exercises use the NYC Taxi and Limousine Commission’s Yellow Taxi trip data. You’ll be analyzing real-world transportation data to answer business questions that taxi fleet managers, city planners, and operations teams face daily.
Database Schema Reference:
- fact_yellow_trips_raw_2025 – Main trip records table
- dim_taxi_zones – Location information (pickup/dropoff zones)
- dim_rate_codes – Rate code descriptions
- dim_payment_types – Payment method descriptions
- dim_vendors – Vendor/provider information
- dim_date – Date dimension for time-based analysis
Important:
For each of the tasks below:
- write the appropriate query to extract information from the database
- Execute the query and capture results
- generate insights and answer the business question
Exercise 1: Scalar Subquery – Trips Above Average Fare
Business Question: “The operations team wants to analyze premium trips. Identify all trips with total fares exceeding the system-wide average fare. Show vendor name, pickup zone, dropoff zone, trip distance, and total fare.”
Understanding the characteristics of high-fare trips helps optimize pricing strategies and identify premium service opportunities.
- Use a scalar subquery to calculate the average total fare
- Filter trips exceeding this average
- Join to vendor and zone dimension tables
- Order by total fare descending
- Limit to 100 rows for readability
- vendor_name
- pickup_zone
- dropoff_zone
- trip_distance
- total_amount
- fare_above_average (calculate the difference)
Scalar subquery in WHERE clause, multi-table joins, calculated columns
Exercise 2: IN Operator – Trips from High-Volume Zones
Business Question: “Identify trips that originated from the top 5 busiest pickup zones (by trip count). Show the first 500 trips with pickup zone, dropoff zone, pickup time, fare, and tip.”
High-volume zones represent critical demand centers. Analyzing trips from these zones helps with fleet allocation and surge pricing strategies.
- Subquery to identify top 5 zones by pickup count
- Use IN operator to filter main query
- Join to dim_taxi_zones for zone names
- Filter to January 2025
- Order by pickup datetime
- pickup_datetime
- pickup_zone
- pickup_borough
- dropoff_zone
- dropoff_borough
- fare_amount
- tip_amount
- total_amount
Column subquery with IN, aggregation in subquery, multi-dimensional joins
Exercise 3: NOT IN – Zones with No Airport Pickups
Business Question: “The logistics team wants to identify taxi zones that have never been used for airport pickups (airport_fee > 0). List these zones with their borough to understand coverage gaps.”
Bonus Challenge: Why might this query be slow if the dataset is large? Suggest an alternative using NOT EXISTS.
Zones with no airport business might indicate either geographical barriers or missed marketing opportunities for airport service.
- Subquery to find all zones with airport pickups (airport_fee > 0)
- Use NOT IN to exclude these zones
- Show zone name and borough
- Order by borough, then zone name
- Zone
- Borough
- service_zone
NOT IN with subquery, exclusion logic, data gap analysis
Exercise 4: Correlated Subquery – Trips Above Zone Average
Business Question: “For each trip, determine if its fare exceeds the average fare for trips originating from the same pickup zone. Show trips where the fare is at least 50% higher than their zone’s average.”
Identifying trips with unusually high fares relative to their pickup zone can reveal premium service patterns, surge pricing effectiveness, or potential pricing anomalies.
- Correlated subquery calculating average fare per pickup zone
- Filter for trips exceeding zone average by 50%
- Show both the trip fare and zone average for comparison
- Limit to 200 rows
- Order by the fare difference descending
- pickup_zone
- dropoff_zone
- tpep_pickup_datetime
- total_amount
- zone_avg_fare (from subquery)
- percent_above_avg (calculated: (total_amount / zone_avg – 1) * 100)
Correlated subquery, per-group comparisons, calculated metrics
Exercise 5: EXISTS – Vendors with Recent Activity
Business Question: “Identify vendors that have processed trips in the last 7 days of January 2025. Show vendor name and their total trip count for the entire month.”
Understanding which vendors are actively operating helps operations teams coordinate with active partners and identify inactive vendors for follow-up.
- EXISTS subquery to check for trips in last 7 days
- Main query aggregates all January trips
- Join to vendor dimension
- Show vendor name and trip count
- vendor_name
- total_trips_january
- has_recent_activity (calculated: ‘Yes’ since these are filtered by EXISTS)
EXISTS operator, correlated subquery, aggregation in main query
Exercise 6: Subquery in SELECT – Trip Count per Zone
Business Question: “Create a report showing each taxi zone with the total number of trips that originated from that zone in January 2025. Include zones with zero trips (showing 0, not NULL).”
(Optional) Challenge: Write two versions: one with a subquery in SELECT, one using LEFT JOIN with GROUP BY. Compare which is more readable.
This analysis identifies demand patterns and coverage gaps across the city’s taxi zones.
- Subquery in SELECT clause to count trips per zone
- Use COALESCE to handle zones with no trips
- Show zone name, borough, and trip count
- Order by trip count descending
- Zone
- Borough
- trip_count (using subquery with COALESCE for zeros)
Scalar subquery in SELECT, NULL handling with COALESCE
Exercise 7: Subquery in FROM – High-Fare Trip Analysis
Business Question: “Analyze the top 50 highest-fare trips. Create a derived table of these trips, then enhance it with vendor information, payment type descriptions, and pickup/dropoff zone details.”
Premium trips represent significant revenue opportunities. Understanding their characteristics (vendors, payment methods, locations) informs business development strategies.
- Subquery in FROM creating a derived table of top 50 trips by total_amount
- Alias the derived table appropriately
- Join to all relevant dimension tables
- Calculate trip duration in minutes
- Order by total_amount descending
- vendor_name
- pickup_zone
- dropoff_zone
- tpep_pickup_datetime
- tpep_dropoff_datetime
- trip_duration_minutes (calculate: EXTRACT(EPOCH FROM (dropoff – pickup))/60)
- trip_distance
- payment_type_desc
- total_amount
Derived table in FROM, multiple joins, datetime calculations, table aliasing
Example Submission Format:
Exercise: Scalar Subquery – Trips Above Average Fare
Query:
SELECT
v.vendor_name,
pz.Zone AS pickup_zone,
...
[Your complete query here]
Results:
[Screenshot]
Business Insight:
The analysis reveals that 42% of trips exceed the average fare of $19.47, with premium trips concentrated in Manhattan and airport routes. The average fare for above-average trips is $34.23, indicating a significant premium segment that generates disproportionate revenue. Vendors should focus premium vehicle deployment on these high-value zones during peak hours.
Submission & Resubmission Guidelines
1. When submitting your exercise, use this naming format:
YourName_Submission_Lesson#.xxx
If you need to revise and resubmit, add a version suffix:
YourName_Submission_Lesson#_v2.xxxYourName_Submission_Lesson#_v3.xxx
2. Do not overwrite or change the original evaluation entries in the rubric.
Instead, enter your updated responses or corrections in a new “v2” (or later) column in the rubric for mentor review.
Evaluation Rubric
| Criteria | Meets Expectations | Needs Improvement | Incomplete |
|---|---|---|---|
| Query Correctness |
All 7 exercises execute without errors and return logically correct results. Appropriate subquery types used (scalar, column, correlated, derived table). Challenge exercises attempted show good effort. Key Success Indicators: ✅ Exercise 1 uses scalar subquery with single value return ✅ Exercise 2 uses IN operator correctly ✅ Exercise 3 handles NULLs in NOT IN (or uses NOT EXISTS) ✅ Exercise 4 uses correlated subquery correctly ✅ Exercise 5 uses EXISTS (not IN or joins) ✅ Exercise 7 properly aliases derived table |
5-6 exercises work correctly; 1-2 have minor logical errors. Subquery types mostly appropriate. | Fewer than 5 exercises execute successfully. Major logical errors or fundamental misunderstanding of subquery mechanics. |
| Subquery Technique Application | Correctly applies scalar subqueries, IN/NOT IN, EXISTS, correlated subqueries, and derived tables where specified. Shows understanding of when each technique is appropriate. | Understands most techniques but confuses 2-3 (e.g., using IN when scalar is needed, or incorrect correlation). | Cannot distinguish between subquery types or apply them correctly. |
| NULL Handling | Properly handles NULLs in NOT IN queries (uses IS NOT NULL or EXISTS). Uses COALESCE appropriately for zero values. | Basic understanding but may miss NULL handling in 1-2 queries. | No consideration of NULL handling. Queries fail due to NULL issues. |
| Business Insights | Provides meaningful 2-3 sentence insights for each exercise that interpret results in business context with specific metrics. | Insights present but shallow or generic. Missing for 2-3 exercises. | Most insights missing or show no understanding of business application. |
Student Submissions
Check out recently submitted work by other students to get an idea of what’s required for this Exercise:
Got Feedback?
Contact
Talk to us
Have questions or feedback about Lumen? We’d love to hear from you.