Lesson 3: Basic SQL Syntax
Learn basic SQL techniques to answer complex business questions
Estimated Read Time: 2 - 3 Hours
Learning Goals
In this lesson, you will learn to:
- Write SELECT queries to retrieve specific columns and rows from database tables
- Apply filtering conditions using WHERE clauses with multiple operators
- Sort and limit result sets to focus on relevant data
- Combine multiple SQL techniques to answer complex business questions
In the previous lesson, you explored the structure of relational databases, understood the Northwind schema through its ERD, and learned three methods for importing data into PostgreSQL. You’re now equipped with a working database containing real-world business data – orders, customers, products, suppliers, and employees – all interconnected through carefully designed relationships.
Now comes the exciting part: extracting insights from this data. This is where SQL transforms from a database management tool into your primary instrument for data analysis. In this lesson, you’ll master the fundamental SQL operations that form the foundation of every analytical query you’ll write in your career.
Think of SQL as a conversation with your database. You ask questions, and the database responds with answers. The art lies in asking the right questions in the right way. By the end of this lesson, you’ll be able to query data with precision, filter it according to business requirements, aggregate it to reveal patterns, and transform it to meet analytical needs.
Important Note: It’s possible that some queries in this lesson will result in different results when you try them out. Please take the results with a grain of salt.
1. Basic SQL Syntax
1.1. The SELECT Statement: Asking Your First Questions
Every SQL query begins with a simple question: “What data do I want to see?” The SELECT statement is your tool for answering this question.
Basic Syntax:
SELECT column1, column2, column3
FROM table_name;
Business Context: Imagine you’re a sales analyst at Northwind Traders, and your manager asks, “Can you give me a list of all our products with their prices?”
Query:
SELECT product_name, unit_price
FROM products;
This query returns exactly what was asked for—nothing more, nothing less. Notice how we specify only the columns we need. This is more efficient than retrieving everything.
Selecting All Columns:
When you need all information from a table, use the asterisk (*) wildcard:
SELECT *
FROM products;
Important:
While SELECT * is convenient during exploration, avoid it in production queries. It’s less efficient and can cause problems if the table structure changes. Always specify the columns you need.
1.2. The FROM Clause: Specifying Your Data Source
Every SELECT statement must include a FROM clause that tells SQL where to find the data. In relational databases, data lives in tables, and you must specify which table you’re querying.
Business Context: The marketing team wants to analyze customer demographics. They need to know where customers are located.
Query:
SELECT company_name, city, country
FROM customers;
Understanding Table References:
Tables in your database are like filing cabinets—each contains organized information about a specific entity. The Northwind database has tables for:
- customers: Information about companies that buy from Northwind
- products: Details about items Northwind sells
- orders: Records of customer purchases
- employees: Staff information
- suppliers: Companies that supply products to Northwind
1.3. The WHERE Clause: Filtering Your Results
Retrieving all rows from a table is rarely useful in business analysis. You typically want to focus on specific subsets of data that meet certain criteria. The WHERE clause is your filtering mechanism.
Basic Syntax:
SELECT column1, column2
FROM table_name
WHERE condition;
Business Context: The inventory manager asks, “Which products are we selling for more than $50? I need to review our premium offerings.”
Query:
SELECT product_name, unit_price, units_in_stock
FROM products
WHERE unit_price > 50;
Common Comparison Operators:
| Operator | Meaning | Example |
|---|---|---|
| = | Equal to | WHERE country = ‘USA’ |
| != or <> | Not equal to | WHERE country != ‘USA’ |
| > | Greater than | WHERE unit_price > 50 |
| < | Less than | WHERE units_in_stock < 10 |
| >= | Greater than or equal | WHERE unit_price >= 50 |
| <= | Less than or equal | WHERE units_in_stock <= 10 |
Text Comparison Example:
Business Context: “Show me all customers in Germany.”
SELECT company_name, contact_name, city
FROM customers
WHERE country = 'Germany';
Critical Note:
When comparing text values:
- Always enclose them in single quotes: ‘Germany’
- SQL is case-sensitive in some databases for text comparisons
- In PostgreSQL, ‘Germany’ and ‘germany’ are different values
1.4. The ORDER BY Clause: Sorting Your Results
Data returned from a database has no guaranteed order unless you explicitly specify one. The ORDER BY clause sorts your results to make them more meaningful.
Basic Syntax:
SELECT column1, column2
FROM table_name
ORDER BY column1 ASC; -- ASC for ascending (default)
-- OR
ORDER BY column1 DESC; -- DESC for descending
Business Context: The sales director asks, “Show me our product list sorted by price, from most to least expensive.”
Query:
SELECT product_name, unit_price
FROM products
ORDER BY unit_price DESC;
Sorting by Multiple Columns:
Business Context: “I need a customer list organized by country, and within each country, sorted by company name alphabetically.”
SELECT company_name, contact_name, country, city
FROM customers
ORDER BY country ASC, company_name ASC;
How This Works:
- SQL first sorts all customers by country (A to Z)
- Within each country group, it sorts by company name (A to Z)
Practical Tip
When sorting by multiple columns, the order matters. The first column specified is the primary sort, the second is the secondary sort (used when the first column has duplicate values), and so on.
1.5. Column Aliases with AS: Making Results Readable
When you write queries, column names in the output should be clear and professional, especially when sharing results with non-technical stakeholders. The AS keyword lets you rename columns in your result set.
Basic Syntax:
SELECT column_name AS alias_name
FROM table_name;
Business Context: You’re preparing a report for the executive team. They don’t understand database column names like “unit_price”—they expect “Product Price”.
Query:
SELECT
product_name AS "Product Name",
unit_price AS "Price (USD)",
units_in_stock AS "Stock Level"
FROM products
WHERE unit_price > 50
ORDER BY unit_price DESC;
Important Notes About Aliases:
- The AS keyword is optional, but using it makes your code more readable
- If your alias contains spaces or special characters, enclose it in double quotes
- Aliases only affect the output – they don’t change the actual column names in the database
- You cannot use aliases in WHERE clauses (they don’t exist yet at that stage of query execution)
1.6. The LIMIT Clause: Controlling Result Set Size
When exploring data or when you only need to see a sample, LIMIT restricts the number of rows returned. This is especially important with large datasets.
Basic Syntax:
SELECT column1, column2
FROM table_name
LIMIT number_of_rows;
Business Context: “Show me the top 10 most expensive products in our catalog.”
Query:
SELECT product_name, unit_price
FROM products
ORDER BY unit_price DESC
LIMIT 10;
Why LIMIT Matters:
- Performance: Returning 1 million rows when you only need 10 wastes system resources
- Quick Exploration: When learning about a new table,
LIMIT 100lets you see sample data quickly - Top-N Analysis: Combined with ORDER BY, it answers “top 10” or “bottom 5” questions
Important Order of Execution
SQL processes clauses in a specific order (not the order you write them):
- FROM – Identifies the table
- WHERE – Filters rows
- ORDER BY – Sorts the results
- LIMIT – Restricts the number of rows returned
This means LIMIT is always applied last, after sorting.
Why This Matters:
- You can’t use SELECT aliases in WHERE (they don’t exist yet)
- You CAN use SELECT aliases in ORDER BY (they exist by then)
Example:
-- This works
SELECT unit_price * 1.2 AS price_with_tax
FROM products
ORDER BY price_with_tax; -- Alias exists here
-- This doesn't work
SELECT unit_price * 1.2 AS price_with_tax
FROM products
WHERE price_with_tax > 50; -- X Alias doesn't exist yet
-- Correct version
SELECT unit_price * 1.2 AS price_with_tax
FROM products
WHERE unit_price * 1.2 > 50; -- ✓ Use the calculation
2. Writing Readable SQL
2.1. Use Consistent Formatting:
-- Good
SELECT
product_name,
unit_price,
units_in_stock
FROM products
WHERE category_id = 1
ORDER BY product_name;
-- Difficult to read
SELECT product_name,unit_price,units_in_stock FROM products WHERE category_id=1 ORDER BY product_name;
2.2. Use Meaningful Aliases:
-- Good
SELECT COUNT(*) AS total_products
FROM products;
-- Poor
SELECT COUNT(*) AS cnt
FROM products;
2.3. Comment Complex Logic:
-- Calculate inventory value for active products only
-- Excludes discontinued items from the calculation
SELECT
SUM(units_in_stock * unit_price) AS total_inventory_value
FROM products
WHERE discontinued = 0;
2.4. Break Complex Queries into Steps:
If a query becomes too complex, use CTEs (Common Table Expressions) to break it into logical steps. You’ll learn these in advanced lessons.
3. Performance Considerations
3.1. Use LIMIT During Development:
When testing queries on large tables, always use LIMIT:
-- During development
SELECT *
FROM orders
LIMIT 100; -- Just to see the structure
3.2. Be Specific in SELECT:
-- Good
SELECT product_name, unit_price
FROM products;
-- Wasteful
SELECT *
FROM products; -- Returns all columns you might not need
Summary
You’ve completed a comprehensive journey through Core SQL Operations:
- You can retrieve data using SELECT, FROM, WHERE, ORDER BY, and LIMIT
Suggested References
Exercise: SQL Fundamentals for Data Analysts
Using the NYC Yellow Taxi Trip Dataset
Exercise Set Overview
These exercises use the NYC Taxi and Limousine Commission’s Yellow Taxi trip data. You’ll be analyzing real-world transportation data to answer business questions that taxi fleet managers, city planners, and operations teams face daily.
Database Schema Reference:
- fact_yellow_trips_raw_2025 – Main trip records table
- dim_taxi_zones – Location information (pickup/dropoff zones)
- dim_rate_codes – Rate code descriptions
- dim_payment_types – Payment method descriptions
- dim_vendors – Vendor/provider information
- dim_date – Date dimension for time-based analysis
Important:
For each of the tasks below:
- write the appropriate query to extract information from the database
- generate insights and answer the business question
Part 1: Basic Data Retrieval
Exercise 1.1: Initial Data Exploration
Business Context: You’re new to the taxi operations team and need to familiarize yourself with the trip data structure.
Task: Write a query to display the first 20 trip records, showing:
- Pickup datetime
- Dropoff datetime
- Passenger count
- Trip distance
- Total amount
Learning Objective: Practice SELECT with specific columns and LIMIT
Exercise 1.2: Understanding Trip Patterns
Business Context: The operations manager wants to understand typical trip characteristics.
Task: Retrieve 50 trips showing:
- Trip distance
- Fare amount
- Tip amount
- Total amount
- Sort by trip distance from longest to shortest
Learning Objective: Practice ORDER BY with DESC
Exercise 1.3: Payment Method Analysis
Business Context: The finance team needs to understand which payment types are used.
Task: Write a query to find all unique payment types in the dataset. Display only distinct payment_type values, sorted numerically.
Learning Objective: Practice DISTINCT and basic sorting
Exercise 1.4: Vendor Identification
Business Context: You need to identify which vendors (taxi technology providers) operate in the system.
Task: List all unique VendorID values from the trips table, along with a count of how many appear in your sample. Use LIMIT 100 for your sample.
Learning Objective: Practice DISTINCT with basic exploration
Part 2: Filtering Data
Exercise 2.1: High-Value Trip Analysis
Business Context: Management wants to focus on premium trips to understand high-value customer behavior.
Task: Find all trips where the total amount exceeded $100. Display:
- Pickup datetime
- Trip distance
- Fare amount
- Tip amount
- Total amount
- Sort by total amount (highest first)
- Limit to 25 records
Learning Objective: Practice WHERE with numeric comparison and ORDER BY
Exercise 2.2: Short Trip Identification
Business Context: The pricing team is reviewing short-distance trips to ensure fare structures are appropriate.
Task: Identify trips that are:
- Less than 2 miles in distance
- Show trip distance, fare amount, and total amount
- Sort by distance (shortest first)
Learning Objective: Practice WHERE with less-than comparison
Exercise 2.3: Cash Payment Analysis
Business Context: The finance department needs to analyze cash transactions (payment_type = 2) for reconciliation purposes.
Task: Find all cash-paid trips showing:
- Trip distance
- Fare amount
- Total amount
- Payment type (to verify)
- Limit to first 30 results
- Sort by fare amount descending
Learning Objective: Practice WHERE with equality comparison
Exercise 2.4: Multi-Passenger Trip Analysis
Business Context: Operations wants to understand group travel patterns for vehicle allocation planning.
Task: Find trips with 3 or more passengers, showing:
- Passenger count
- Trip distance
- Total amount
- Sort by passenger count (highest first), then by total amount (highest first)
Learning Objective: Practice WHERE with >= and multiple ORDER BY columns
Exercise Submission Guidelines
For each exercise:
- Write the SQL query
- Execute it in pgAdmin
- Copy the query and take a screenshot of the first 10-20 rows of results
- For aggregate exercises, include the complete result set (usually just a few summary rows)
- Write 1-2 sentences explaining what business insight the results reveal
Example Format:
Exercise 2.1: High-Value Trip Analysis
Query:
[Your SQL code here]
Results:
[Screenshot]
Business Insight:
The top 25 trips range from $100 to $350, with the highest-value trips covering distances of 15-25 miles, suggesting airport or long-distance suburban routes are the most lucrative for drivers.
Submission & Resubmission Guidelines
1. When submitting your exercise, use this naming format:
YourName_Submission_Lesson#.xxx
If you need to revise and resubmit, add a version suffix:
YourName_Submission_Lesson#_v2.xxxYourName_Submission_Lesson#_v3.xxx
2. Do not overwrite or change the original evaluation entries in the rubric.
Instead, enter your updated responses or corrections in a new “v2” (or later) column in the rubric for mentor review.
Evaluation Rubric
| Criteria | Meets Expectations | Needs Improvement | Incomplete/Off-Track |
|---|---|---|---|
| Query Execution | All 8 queries execute without errors and return correct results | 6-7 queries work correctly; 1-2 have minor syntax errors | Fewer than 6 queries execute successfully |
| SQL Fundamentals | Correctly uses SELECT, FROM, WHERE, ORDER BY, LIMIT with appropriate operators and column selection | Demonstrates understanding but makes mistakes in 2-3 queries (wrong operators, missing clauses, incorrect sorting) | Frequent errors across multiple queries showing gaps in basic SQL concepts |
| Business Insights | Provides relevant 1-2 sentence insights for each exercise showing understanding of results | Insights present but generic or missing for 1-2 exercises | Most insights missing, incorrect, or show no understanding of data |
Contact
Talk to us
Have questions or feedback about Lumen? We’d love to hear from you.