Lesson 4: Filtering and Logical Operations
Learn basic SQL techniques to answer complex business questions
Estimated Read Time: 2 - 3 Hours
Learning Goals
In this lesson, you will learn to:
- Transform data using CASE statements and type conversions
- Handle missing data and duplicates in your analyses
- Combine multiple SQL techniques to answer complex business questions
In the previous lesson, you learned the basic SQL structure. Now, we’ll move towards filtering logic that will allow us to clean data and perform more complicated and meaningful operations.
1. Data Cleaning Operations
Real-world data is rarely perfect. Before analysis, you must identify and handle duplicates, inconsistencies, and missing values. SQL provides powerful tools for data cleaning.
1.1. DISTINCT: Eliminating Duplicates
The DISTINCT keyword removes duplicate rows from your result set, showing only unique values.
Basic Syntax:
SELECT DISTINCT column1, column2
FROM table_name;
Business Context: The marketing manager asks, “Which countries do we sell to? I need a unique list for our international strategy presentation.”
Without DISTINCT:
SELECT country
FROM customers;
This returns 91 rows (because we have 91 customers), with many countries appearing multiple times.
With DISTINCT:
SELECT DISTINCT country
FROM customers
ORDER BY country;
This returns only 21 rows—one for each unique country.
Multiple Column DISTINCT:
Business Context: “Show me which city-country combinations have customers.”
SELECT DISTINCT city, country
FROM customers
ORDER BY country, city;
How DISTINCT Works with Multiple Columns:
DISTINCT considers the combination of all selected columns. The result includes a row for each unique combination. For example:
- London, UK
- London, Canada
These are both included because while the city is the same, the country differs.
Performance Warning
DISTINCT requires SQL to compare every row with every other row, which can be slow on large datasets. Use it judiciously.
1.2. Logical Operators: AND, OR, NOT
Business questions rarely involve simple conditions. Usually, you need to combine multiple criteria. Logical operators let you build complex filtering conditions.
The AND Operator:
Use AND when all conditions must be true.
Business Context: “Find products that are expensive AND low in stock – these might need pricing adjustments.”
SELECT product_name, unit_price, units_in_stock
FROM products
WHERE unit_price > 50 AND units_in_stock < 20
ORDER BY units_in_stock;
The OR Operator:
Use OR when any condition can be true.
Business Context: “Show me customers from either Germany or France for our European expansion analysis.”
SELECT company_name, country, city
FROM customers
WHERE country = 'Germany' OR country = 'France'
ORDER BY country, company_name;
The NOT Operator:
Use NOT to exclude results matching a condition.
Business Context: “List all customers except those in the USA – I need to focus on international markets.”
SELECT company_name, country
FROM customers
WHERE NOT country = 'USA'
ORDER BY country;
Combining Multiple Logical Operators:
Business Context: “Find products that are either very expensive (over $50) or critically low in stock (under 10 units), but exclude discontinued products.”
SELECT product_name, unit_price, units_in_stock, discontinued
FROM products
WHERE (unit_price > 50 OR units_in_stock < 10)
AND discontinued = 0
ORDER BY units_in_stock;
Understanding Parentheses:
Parentheses control the order of evaluation, just like in mathematics:
- Without parentheses: A OR B AND C means A OR (B AND C)
- With parentheses: (A OR B) AND C means something different
Always use parentheses to make your intent explicit, even when not strictly necessary.
1.3. Pattern Matching with LIKE
Sometimes you need to search for partial text matches rather than exact values. The LIKE operator enables pattern matching using wildcards.
Wildcard Characters:
| Wildcard | Meaning | Example |
|---|---|---|
| % | Zero or more characters | ‘%son’ matches “Johnson”, “son”, “Anderson” |
| _ | Exactly one character | ‘_at’ matches “cat”, “bat”, “hat”, but not “that” |
Business Context Examples:
1. “Find all customers whose company name starts with ‘A’.”
SELECT company_name, country
FROM customers
WHERE company_name LIKE 'A%'
ORDER BY company_name;
2. “Find products with ‘Chef’ anywhere in the name.”
SELECT product_name, unit_price
FROM products
WHERE product_name LIKE '%Chef%';
3. “Find customers whose contact name ends with ‘son’.”
SELECT company_name, contact_name
FROM customers
WHERE contact_name LIKE '%son'
ORDER BY contact_name;
4. “Find products where the second character in the name is ‘h’.”
SELECT product_name
FROM products
WHERE product_name LIKE '_h%';
Case Sensitivity in Pattern Matching:
In PostgreSQL:
- LIKE is case-sensitive: ‘Chef’ != ‘chef’
- ILIKE is case-insensitive: ‘Chef’ = ‘chef’ = ‘CHEF’
Example:
-- Case-sensitive (might miss some results)
SELECT product_name
FROM products
WHERE product_name LIKE '%Chef%';
-- Case-insensitive (more inclusive)
SELECT product_name
FROM products
WHERE product_name ILIKE '%chef%';
1.4. Range Filtering with BETWEEN
When filtering numeric or date values within a range, BETWEEN provides cleaner syntax than using >= and <=.
Basic Syntax:
SELECT columns
FROM table_name
WHERE column_name BETWEEN lower_value AND upper_value;
Important: BETWEEN is inclusive—it includes both the lower and upper boundary values.
Business Context: “Show me moderately-priced products—between $10 and $30.”
Using BETWEEN:
SELECT product_name, unit_price
FROM products
WHERE unit_price BETWEEN 10 AND 30
ORDER BY unit_price;
Equivalent Query Without BETWEEN:
SELECT product_name, unit_price
FROM products
WHERE unit_price >= 10 AND unit_price <= 30
ORDER BY unit_price;
Both queries produce identical results, but BETWEEN is more readable.
BETWEEN with Dates:
Business Context: “Find all orders placed in January 1997.”
SELECT order_id, customer_id, order_date
FROM orders
WHERE order_date BETWEEN '1997-01-01' AND '1997-01-31'
ORDER BY order_date;
Critical Date Formatting
Always use the ISO format for dates in SQL: ‘YYYY-MM-DD’. This ensures consistency across different database systems and locales.
1.5. The IN Operator: Testing Multiple Values
When you need to filter for multiple specific values, the IN operator is much cleaner than chaining multiple OR conditions.
Basic Syntax:
SELECT columns
FROM table_name
WHERE column_name IN (value1, value2, value3);
Business Context: “Show me customers from our key European markets: Germany, France, Spain, and Italy.”
Using IN:
SELECT company_name, country, city
FROM customers
WHERE country IN ('Germany', 'France', 'Spain', 'Italy')
ORDER BY country, company_name;
Equivalent Query Without IN:
SELECT company_name, country, city
FROM customers
WHERE country = 'Germany'
OR country = 'France'
OR country = 'Spain'
OR country = 'Italy'
ORDER BY country, company_name;
The IN version is significantly more readable, especially with many values.
Using NOT IN:
Business Context: “Show me all customers EXCEPT those in our saturated markets.”
SELECT company_name, country
FROM customers
WHERE country NOT IN ('USA', 'Germany', 'UK')
ORDER BY country;
Pro Tip
IN is particularly powerful when combined with subqueries (which you’ll learn later). For example, you could use IN to filter for customers who have placed orders above a certain value, without knowing their IDs in advance.
1.6. Conditional Logic with CASE WHEN
The CASE statement is SQL’s equivalent of if-then-else logic found in programming languages. It allows you to transform values based on conditions, creating new categories or calculated fields in your result set.
Basic Syntax:
SELECT column1,
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN condition3 THEN result3
ELSE default_result
END AS new_column_name
FROM table_name;
Business Context: “Categorize products by price tier for our marketing materials: Budget, Standard, Premium, and Luxury.”
Query:
SELECT
product_name,
unit_price,
CASE
WHEN unit_price < 10 THEN 'Budget'
WHEN unit_price >= 10 AND unit_price < 30 THEN 'Standard'
WHEN unit_price >= 30 AND unit_price < 50 THEN 'Premium'
WHEN unit_price >= 50 THEN 'Luxury'
ELSE 'Uncategorized'
END AS price_category
FROM products
ORDER BY unit_price;
How CASE Works:
- SQL evaluates each WHEN condition in order from top to bottom
- When it finds the first TRUE condition, it returns that result
- It stops checking remaining conditions (short-circuit evaluation)
- If no conditions are TRUE, it returns the ELSE result
- If no ELSE is provided and no conditions match, it returns NULL
Multiple Column Conditions:
Business Context: “Flag products that need urgent attention—either discontinued OR low stock.”
SELECT
product_name,
units_in_stock,
discontinued,
CASE
WHEN discontinued = 1 THEN 'Discontinued - Archive'
WHEN units_in_stock = 0 THEN 'Out of Stock - Reorder Immediately'
WHEN units_in_stock < 10 THEN 'Low Stock - Monitor Closely'
WHEN units_in_stock < 30 THEN 'Adequate Stock'
ELSE 'Well Stocked'
END AS inventory_status
FROM products
ORDER BY
CASE
WHEN discontinued = 1 THEN 1
WHEN units_in_stock = 0 THEN 2
WHEN units_in_stock < 10 THEN 3
ELSE 4
END;
Notice: We can even use CASE in the ORDER BY clause to create custom sorting logic!
1.7. Handling NULL Values with COALESCE
In real-world databases, missing data is inevitable. NULL represents unknown or missing values, and it requires special handling because it behaves differently from other values.
Understanding NULL:
- NULL is NOT the same as zero, empty string, or false
- NULL means “unknown” or “not applicable”
- Comparisons with NULL using = or != don’t work as expected
- Use IS NULL or IS NOT NULL instead
The COALESCE Function:
COALESCE returns the first non-NULL value from a list of columns or values.
Syntax:
COALESCE(value1, value2, value3, ...)
Business Context: “Some customers don’t have a region specified. For reports, show ‘Not Specified’ instead of leaving it blank.”
Query:
SELECT
company_name,
COALESCE(region, 'Not Specified') AS region,
country
FROM customers
ORDER BY country, company_name;
Multiple Column Fallback:
Business Context: “Show customer contact info, preferring phone, but falling back to fax if phone is missing, and showing ‘No Contact’ if both are missing.”
SELECT
company_name,
COALESCE(phone, fax, 'No Contact Information') AS primary_contact
FROM customers;
Checking for NULL Values:
Business Context: “Find all customers where the region field is empty—these records need completion.”
SELECT company_name, country, region
FROM customers
WHERE region IS NULL
ORDER BY country;
Excluding NULL Values:
SELECT company_name, country, region
FROM customers
WHERE region IS NOT NULL
ORDER BY region;
Common NULL Mistake
-- WRONG - This will return no results
WHERE region = NULL
-- CORRECT
WHERE region IS NULL
1.8. Data Type Conversion with CAST
Databases store data in specific types (integer, text, date, etc.). Sometimes you need to convert data from one type to another for calculations, comparisons, or formatting. The CAST function handles these conversions.
Basic Syntax:
CAST(column_name AS target_data_type)
Common Data Types in PostgreSQL:
| Type | Description | Example |
|---|---|---|
| INTEGER | Whole numbers | 42, -17, 0 |
| NUMERIC(p,s) | Decimal numbers | 19.99, 100.50 |
| VARCHAR(n) | Text up to n characters | ‘Northwind’ |
| TEXT | Text of any length | Long descriptions |
| DATE | Date values | ‘1997-01-15’ |
| BOOLEAN | True/False | TRUE, FALSE |
Business Context 1: “Convert prices to text for a formatted report.”
SELECT
product_name,
unit_price,
'$' || CAST(unit_price AS VARCHAR) AS price_display
FROM products
LIMIT 10;
The || operator concatenates (joins) strings together.
Business Context 2: “Extract just the year from order dates for annual analysis.”
SELECT
order_id,
order_date,
CAST(EXTRACT(YEAR FROM order_date) AS VARCHAR) AS order_year
FROM orders
LIMIT 20;
Business Context 3: “Calculate the total value but ensure decimal precision.”
SELECT
product_name,
unit_price,
quantity_per_unit,
CAST(unit_price * 1.2 AS NUMERIC(10,2)) AS price_with_markup
FROM products
WHERE unit_price IS NOT NULL;
The NUMERIC(10,2) means: up to 10 total digits, with exactly 2 decimal places.
Alternative Syntax – PostgreSQL Shorthand:
-- These are equivalent:
CAST(unit_price AS VARCHAR)
unit_price::VARCHAR
-- Choose based on your preference; both work identically
2. Real-World Business Scenarios
Let’s apply everything you’ve learned so far to solve a business problem:
Scenario: Inventory Management Alert
Business Question: “Flag products that are expensive (over $50) but have low stock (under 20 units), categorized by risk level.”
Solution: Follow the video to understand how to answer the business question using SQL query.
Video 1 – Answer Inventory Management question using SQL
Summary
You’ve completed a comprehensive journey through SQL fundamentals. Let’s review what you now command:
- Core SQL Operations: You can apply complex filtering with logical operators (AND, OR, NOT), pattern matching (LIKE), and range queries (BETWEEN, IN); and transform data using CASE WHEN for conditional logic, COALESCE for NULL handling, and CAST for type conversion.
Suggested References
Exercise: SQL Fundamentals for Data Analysts
Using the NYC Yellow Taxi Trip Dataset
Exercise Set Overview
These exercises use the NYC Taxi and Limousine Commission’s Yellow Taxi trip data. You’ll be analyzing real-world transportation data to answer business questions that taxi fleet managers, city planners, and operations teams face daily.
Database Schema Reference:
- fact_yellow_trips_raw_2025 – Main trip records table
- dim_taxi_zones – Location information (pickup/dropoff zones)
- dim_rate_codes – Rate code descriptions
- dim_payment_types – Payment method descriptions
- dim_vendors – Vendor/provider information
- dim_date – Date dimension for time-based analysis
Important:
For each of the tasks below:
- write the appropriate query to extract information from the database
- generate insights and answer the business question
Part 1: Filtering Data
Exercise 1.1: Premium Airport Trips
Business Context: Airport trips have special fees. Identify high-value airport pickups.
Task: Find trips where:
- Airport fee is greater than 0 (indicating an airport pickup)
- Total amount exceeds $50
- Display: trip distance, fare amount, airport fee, total amount
- Sort by total amount descending
- Show top 20
Learning Objective: Practice WHERE with multiple AND conditions
Exercise 1.2: Date Range Analysis
Business Context: The operations team needs to analyze trips from a specific time period for a report.
Task: Find all trips that occurred between January 5, 2025, and January 10, 2025 (inclusive). Display:
- Pickup datetime
- Dropoff datetime
- Trip distance
- Total amount
- Sort by pickup datetime
Learning Objective: Practice BETWEEN with dates
Exercise 1.3: Rate Code Analysis
Business Context: Different rate codes apply to different trip types (standard, JFK, Newark, etc.). Find non-standard rate trips.
Task: Find all trips that used rate codes OTHER than 1 (standard rate). Show:
- RatecodeID
- Trip distance
- Fare amount
- Total amount
- Sort by RatecodeID, then by fare amount descending
- Limit to 50 records
Learning Objective: Practice WHERE with NOT EQUAL (!=) or <>
Exercise 1.4: Generous Tipper Identification
Business Context: Understanding tipping patterns helps evaluate customer satisfaction and payment preferences.
Task: Find trips where customers tipped $10 or more. Display:
- Fare amount
- Tip amount
- Total amount
- Payment type
- Sort by tip amount (highest first)
- Show top 25
Learning Objective: Practice WHERE with >= on a specific column
Part 2: Pattern Matching and Range Queries
Exercise 2.1: Zone Pattern Matching
Business Context: The planning department needs to identify all zones in Manhattan for a special analysis.
Task: From the dim_taxi_zones table, find all zones where the Borough contains “Manhattan” (case-insensitive). Display:
- LocationID
- Borough
- Zone
- Sort alphabetically by Zone
Learning Objective: Practice LIKE or ILIKE with wildcards
Exercise 2.2: Multiple Payment Type Analysis
Business Context: Finance wants to compare credit card and cash transactions specifically.
Task: Find trips paid by either credit card (payment_type = 1) or cash (payment_type = 2). Display:
- Payment type
- Trip distance
- Fare amount
- Tip amount
- Total amount
- Sort by payment type, then by total amount descending
- Limit to 100 records
Learning Objective: Practice IN operator
Exercise 2.3: Medium-Distance Trip Analysis
Business Context: Marketing wants to target promotions for “medium-distance” trips.
Task: Find trips between 5 and 15 miles (inclusive). Show:
- Trip distance
- Fare amount
- Total amount
- Passenger count
- Sort by trip distance
- Limit to 50
Learning Objective: Practice BETWEEN with numeric ranges
Exercise 2.4: Exclude Airport Trips
Business Context: For a baseline fare analysis, you need trips WITHOUT airport fees.
Task: Find trips where the airport fee is 0 (no airport involved) AND the trip distance is greater than 3 miles. Display:
- Trip distance
- Fare amount
- Airport fee (to verify it’s 0)
- Total amount
- Sort by trip distance descending
- Limit to 30
Learning Objective: Practice combining WHERE conditions with AND
Exercise 2.5: Complex Filtering – High-Value, Low-Distance
Business Context: Identify potentially suspicious trips: high charges for short distances.
Task: Find trips where:
- Trip distance is less than 1 mile
- Total amount is greater than $20
- Show: trip distance, fare amount, extra charges, tolls, total amount
- Sort by total amount descending
Learning Objective: Practice multiple WHERE conditions with AND
Part 3: Conditional Logic and Data Transformation
Exercise 3.1: Trip Distance Categorization
Business Context: Operations wants to classify trips by distance for fleet planning.
Task: Create a query that categorizes trips as:
- ‘Short Trip’ if distance < 2 miles
- ‘Medium Trip’ if distance between 2 and 10 miles
- ‘Long Trip’ if distance > 10 miles
Display: trip distance, fare amount, total amount, and the distance category. Sort by distance category, then by trip distance. Limit to 100 records.
Learning Objective: Practice CASE WHEN for categorization
Exercise 3.2: Tip Percentage Calculation and Rating
Business Context: Evaluate tipping generosity to understand customer satisfaction patterns.
Task: Calculate the tip percentage (tip_amount / fare_amount * 100) and categorize tips as:
- ‘Excellent Tipper’ if tip percentage >= 20%
- ‘Good Tipper’ if tip percentage >= 15% and < 20%
- ‘Average Tipper’ if tip percentage >= 10% and < 15%
- ‘Low Tipper’ if tip percentage < 10%
Display: fare amount, tip amount, calculated tip percentage (rounded to 2 decimals), and tip category. Only include credit card payments (payment_type = 1) where fare amount > 0. Sort by tip percentage descending. Limit to 50.
Learning Objective: Practice CASE WHEN with calculations and multiple conditions
Exercise 3.3: Surcharge Impact Analysis
Business Context: NYC implemented congestion surcharges. Analyze their impact on trip costs.
Task: Create a categorization showing surcharge impact:
- ‘High Surcharge Impact’ if congestion_surcharge > 2
- ‘Moderate Surcharge Impact’ if congestion_surcharge between 0.5 and 2
- ‘Low or No Surcharge’ if congestion_surcharge < 0.5 or is 0
Display: fare amount, congestion surcharge, total amount, and impact category. Sort by congestion surcharge descending. Limit to 75 records.
Learning Objective: Practice CASE WHEN with numeric ranges
Exercise 3.4: Payment Method Labeling
Business Context: Create a human-readable report replacing payment type codes with descriptions.
Task: Using the payment_type codes, create descriptive labels:
- 1 = ‘Credit Card’
- 2 = ‘Cash’
- 3 = ‘No Charge’
- 4 = ‘Dispute’
- All others = ‘Other/Unknown’
Display: payment type code, payment method description (using CASE), trip distance, total amount. Sort by payment method description. Limit to 100 records.
Learning Objective: Practice CASE WHEN for code translation
Exercise 3.5: Trip Value Classification
Business Context: Finance wants to segment trips by revenue value for reporting.
Task: Classify trips by total amount:
- ‘Premium’ if total_amount >= 100
- ‘High Value’ if total_amount >= 50 and < 100
- ‘Standard’ if total_amount >= 20 and < 50
- ‘Economy’ if total_amount < 20
Display: total amount, trip distance, passenger count, value classification. Sort by value classification (use a nested CASE in ORDER BY to sort Premium first, then High Value, Standard, Economy). Limit to 100.
Learning Objective: Practice CASE WHEN with ORDER BY logic
Exercise 3.6: Handling Missing Passenger Data
Business Context: Some trips have missing passenger count data. Create a cleaned report.
Task: Display trips with:
- Original passenger_count value
- A cleaned version using COALESCE that replaces NULL with 1 (assuming single passenger)
- Trip distance
- Total amount
Only show trips where passenger_count is NULL or passenger_count > 0. Sort by the cleaned passenger count. Limit to 50.
Learning Objective: Practice COALESCE for NULL handling
Exercise Submission Guidelines
For each exercise:
- Write the SQL query
- Execute it in pgAdmin
- Take a screenshot of both the query and the first 10-20 rows of results
- For aggregate exercises, include the complete result set (usually just a few summary rows)
- Write 1-2 sentences explaining what business insight the results reveal
Example Format:
Exercise 2.1: High-Value Trip Analysis
Query:
[Your SQL code here]
Results:
[Screenshot]
Business Insight:
The top 25 trips range from $100 to $350, with the highest-value trips covering distances of 15-25 miles, suggesting airport or long-distance suburban routes are the most lucrative for drivers.
Submission & Resubmission Guidelines
1. When submitting your exercise, use this naming format:
YourName_Submission_Lesson#.xxx
If you need to revise and resubmit, add a version suffix:
YourName_Submission_Lesson#_v2.xxxYourName_Submission_Lesson#_v3.xxx
2. Do not overwrite or change the original evaluation entries in the rubric.
Instead, enter your updated responses or corrections in a new “v2” (or later) column in the rubric for mentor review.
Evaluation Rubric
| Criteria | Meets Expectations | Needs Improvement | Incomplete/Off-Track |
|---|---|---|---|
| Query Execution | All 11 queries execute without errors and return correct results | 9-10 queries work correctly; 1-2 have minor errors | Fewer than 9 queries execute successfully |
| Advanced Filtering | Correctly uses AND/OR/NOT, LIKE, BETWEEN, IN operators with proper logic and syntax | Demonstrates understanding but makes logical errors in 2-3 queries (wrong operators, incorrect pattern matching, boundary issues) | Frequent misuse of logical operators, pattern matching, or range queries |
| CASE & Transformations | CASE statements properly categorize data; COALESCE and CAST used correctly | CASE logic mostly correct but 1-2 exercises have condition errors or missing ELSE; minor transformation issues | CASE statements fail or produce wrong categories; doesn’t understand NULL handling or type conversion |
| Business Insights | Insights demonstrate understanding of conditional logic and data patterns | Insights present but superficial or missing for 2-3 exercises | Most insights missing or show no understanding of categorization results |
Contact
Talk to us
Have questions or feedback about Lumen? We’d love to hear from you.