Session 4 - Exploratory Data Analysis
Introduction to Data Analytics for Beginners
The true power of data analytics lies not in the formulas themselves, but in the patterns you uncover. Exploratory Data Analysis (EDA) helps you move beyond numbers and start identifying meaningful insights that can shape real business decisions.
In this session, we’ll build on our earlier descriptive analysis and learn how to explore data for trends, outliers, customer issues, and product performance. You’ll learn how to ask better questions and answer them using Google Sheets.
We will continue to use the Amazon Sales Dataset as our example, which you downloaded and imported into Google Sheets in the first session. If you have not done so yet, download it from the link above and import it before continuing.
Step 1: Investigate Low Ratings
Let’s begin by focusing on products with low customer ratings. These might indicate issues with product quality, delivery, or user expectations.
Open your Amazon dataset and locate the “Rating” column.
Sort the sheet by ratings in ascending order to bring the lowest-rated entries to the top:
Select your dataset.
Go to Data > Sort range > Advanced range sort options.
Ensure “Data has header row” is checked, then sort by “Rating” A → Z.
This helps you quickly isolate ratings in the 2.0–2.9 range. These are the entries we want to explore further to understand what went wrong.
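If you later want to repeat this step outside the spreadsheet, the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the `product` and `rating` keys, and the helper names are hypothetical and not taken from the dataset, and the malformed cell is included just to show why non-numeric values should be skipped.

```python
# Hypothetical sample rows; in practice these would come from your exported sheet.
rows = [
    {"product": "Cable A", "rating": "4.2"},
    {"product": "Heater B", "rating": "2.3"},
    {"product": "Charger C", "rating": "2.9"},
    {"product": "Mouse D", "rating": "3.0"},
    {"product": "Lamp E", "rating": "n/a"},  # made-up malformed cell, skipped below
]


def parse_rating(value):
    """Return the rating as a float, or None if the cell is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def low_rated(rows, low=2.0, high=2.9):
    """Return rows whose rating lies in [low, high], sorted ascending by rating."""
    parsed = [(parse_rating(r["rating"]), r) for r in rows]
    kept = [(rating, r) for rating, r in parsed
            if rating is not None and low <= rating <= high]
    kept.sort(key=lambda pair: pair[0])
    return [r for _, r in kept]


low_rows = low_rated(rows)
for r in low_rows:
    print(r["product"], r["rating"])
```

Sorting first and then cutting off at a range mirrors what you did by hand in the sheet: the lowest ratings come first, and everything outside 2.0–2.9 is left out.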
Step 2: Read Customer Reviews (Manually)
Once you’ve sorted the lowest ratings to the top, it’s time to manually review customer feedback.
Why manual review matters:
Automated tools are helpful, but early in the analysis process, reading the raw comments yourself gives you direct understanding of the user experience—nuances that automation might miss.
Look at the following columns:
Review Title
Review Content
As you read:
Are the comments clear and aligned with the low rating?
Do they seem mixed, inconsistent, or possibly combined from multiple users?
Observation:
You might notice something strange—some review entries seem to include both negative and positive comments in a single cell. For example:
“Bad quality. Amazing product.”
“Very bad. Great heater. Would recommend.”
This inconsistency suggests a data quality issue: some reviews appear to be concatenated from multiple users. Unfortunately, there’s no clear delimiter (like line breaks or user IDs) to separate them.
Best Practice: Data Integrity Checks
When you notice possible merging of multiple reviews in a single cell:
Do not rely on this field for quantitative sentiment analysis.
Avoid feeding this type of text into AI models or dashboards without cleanup.
If your organization has access to raw review logs or APIs, consider pulling a cleaner version from there.
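One rough way to find cells worth a second look is to flag reviews that contain both clearly negative and clearly positive words. The sketch below assumes two tiny keyword lists that are purely illustrative; they are not a real sentiment lexicon, and a flagged cell is only a candidate for manual checking, since a single genuine review can also mix praise and criticism.

```python
import re

# Illustrative keyword lists (assumptions, not a real sentiment lexicon).
NEGATIVE = {"bad", "poor", "defective", "worst"}
POSITIVE = {"amazing", "great", "recommend", "excellent"}


def looks_mixed(text):
    """True if the text contains at least one negative and one positive keyword."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & NEGATIVE) and bool(words & POSITIVE)


reviews = [
    "Bad quality. Amazing product.",
    "Very bad. Great heater. Would recommend.",
    "Stopped working after a week.",
]

# Cells to read manually before trusting them in any analysis.
flagged = [r for r in reviews if looks_mixed(r)]
for r in flagged:
    print("Check manually:", r)
```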
For now, we’ll manually extract these low-rated comments and examine them with a word cloud.
Step 3: Generate a Word Cloud (Using External Tool)
Word clouds can help you spot recurring themes—especially helpful when analyzing qualitative data like reviews.
Select all Review Content rows where ratings are between 2.0 and 2.9.
Copy these and visit a free word cloud generator (e.g., wordclouds.com).
Paste your reviews into the text box.
⚠️ Caution: Never paste sensitive or personally identifiable data into third-party tools. In this exercise, we assume these are anonymized, generic reviews without personal identifiers.
Configure your word cloud:
Set a reasonable word limit (e.g., top 50 words).
Remove common or misleading terms like “product” or “good” if they skew the results.
Analyze the result:
Which negative words stand out?
Are there repeated complaints (e.g., “battery,” “charging,” “defective,” “refund”)?
This gives you a direction: if many customers mention “battery” or “money,” you might infer dissatisfaction with battery life or perceived value.
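A word cloud is essentially a word-frequency count drawn as a picture, so you can check the same result without pasting anything into a third-party site. The following is a minimal sketch: the stopword list, the sample reviews, and the `top_words` helper are all hypothetical and only stand in for your own copied review text.

```python
import re
from collections import Counter

# Words to ignore: common filler plus terms like "product" and "good"
# that would otherwise dominate. This list is an assumption; extend it as needed.
STOPWORDS = {"the", "and", "this", "but", "for", "was", "not",
             "after", "product", "good"}


def top_words(texts, limit=50, stopwords=STOPWORDS):
    """Count words across all texts and return the `limit` most frequent."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            # Skip stopwords and very short words such as "a" or "is".
            if word not in stopwords and len(word) > 2:
                counts[word] += 1
    return counts.most_common(limit)


# Made-up low-rated reviews for illustration.
low_reviews = [
    "Battery died after two days. Want a refund.",
    "The battery is defective and charging is slow.",
    "Good product but battery drains fast.",
]

top = top_words(low_reviews, limit=5)
print(top[0])  # ('battery', 3)
```

The `limit` parameter plays the role of the word limit in the word cloud tool, and the stopword set is where you would remove misleading terms.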
Step 4: Expand Review Scope
To confirm patterns, try including more data:
Add reviews with a rating of 3.0–3.9 to the same word cloud tool.
Compare word frequencies.
This lets you see whether certain concerns are persistent even as ratings improve. If “battery” is still common in 3-star reviews, it’s likely a widespread issue.
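When comparing two rating bands, raw counts can mislead if one band has far more reviews than the other, so it can help to compare the share of reviews that mention a term instead. Here is a small sketch of that comparison; the two review lists and the `mention_rate` helper are hypothetical examples, not data from the sheet.

```python
import re


def mention_rate(texts, term):
    """Fraction of reviews that mention `term` at least once (whole word, any case)."""
    if not texts:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits = sum(1 for t in texts if pattern.search(t))
    return hits / len(texts)


# Made-up reviews for the two rating bands.
two_star = ["Battery died fast.", "Battery is defective.",
            "Cable broke.", "Refund please."]
three_star = ["Battery could be better.", "Works fine.",
              "Okay for the price.", "Average battery life."]

for term in ("battery", "refund"):
    print(term, mention_rate(two_star, term), mention_rate(three_star, term))
```

In this made-up sample, "battery" appears in half of the reviews in both bands, while "refund" appears only in the lower band. That is the kind of pattern that would suggest a persistent concern rather than one limited to the worst ratings.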
Step 5: Switch Focus – Explore Product Categories
So far, we focused on customer satisfaction. Now, let’s explore the data from a product and business strategy angle.
Question:
Which product categories have the most items listed?
This can help you evaluate:
Which areas your business focuses on
Where your catalog is overloaded or underdeveloped
We’ll use pivot tables for this.
Step 6: Create a Pivot Table to Count Products by Category
Select the entire dataset.
Go to Insert > Pivot Table, and create it in a new sheet.
Rename the new sheet (e.g., Category_Count).
Set up your pivot:
Rows: Add “Category”
Values: Add “Category” again, but use COUNTA to count entries.
This shows how many products are listed under each category.
To interpret it better:
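Copy the pivot table and paste it into a new sheet using Edit > Paste special > Values only, so the copy is plain data that can be sorted.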
Use Data > Sort range to sort by count (descending).
You’ll now see which categories dominate (e.g., “USB Cables” might have 200+ items), and which are underrepresented.
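For reference, the pivot table above does the same job as a simple grouped count. A minimal Python sketch of that count is shown here; the rows and the `category` key are invented for illustration, and empty category cells are skipped to match how COUNTA ignores blanks.

```python
from collections import Counter

# Hypothetical rows; real data would come from the exported sheet.
rows = [
    {"product": "Cable A", "category": "USB Cables"},
    {"product": "Cable B", "category": "USB Cables"},
    {"product": "Cable C", "category": "USB Cables"},
    {"product": "Heater D", "category": "Heaters"},
    {"product": "Mouse E", "category": "Mice"},
    {"product": "Mouse F", "category": "Mice"},
    {"product": "Unknown G", "category": ""},  # blank cell, not counted
]

# Count non-empty category values, like COUNTA in the pivot table.
counts = Counter(r["category"] for r in rows if r["category"])

# most_common() returns (category, count) pairs sorted by count, descending.
ranked = counts.most_common()
for category, n in ranked:
    print(category, n)
```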
Step 7: Drill Deeper into High-Volume Categories
Choose one of the largest categories from your count and go back to the main dataset. Apply a filter on the Category column to isolate this group.
Now ask:
Which products in this category have the highest discounts?
Are there items with large discounts but poor ratings?
Sort the filtered rows by discount (descending) and by Rating (ascending) to identify anomalies.
This analysis supports decisions like:
Reducing inventory for poor-performing items
Offering better deals for high-rated products
Reallocating focus across categories
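The "large discount but poor rating" check can also be written as a small filter. The sketch below is an illustration only: the rows, the `discount_pct` key, and the thresholds (50% discount, rating of 3.0 or lower) are assumptions chosen for the example, not values from the course or the dataset.

```python
# Hypothetical rows with discount as a percentage and rating as a number.
rows = [
    {"product": "Cable A", "category": "USB Cables", "discount_pct": 70, "rating": 2.8},
    {"product": "Cable B", "category": "USB Cables", "discount_pct": 65, "rating": 4.4},
    {"product": "Cable C", "category": "USB Cables", "discount_pct": 10, "rating": 3.1},
    {"product": "Mouse D", "category": "Mice", "discount_pct": 80, "rating": 2.5},
]


def discount_anomalies(rows, category, min_discount=50, max_rating=3.0):
    """Products in `category` with a large discount but a poor rating.

    The thresholds are illustrative assumptions; adjust them to your data.
    """
    in_category = [r for r in rows if r["category"] == category]
    flagged = [r for r in in_category
               if r["discount_pct"] >= min_discount and r["rating"] <= max_rating]
    # Largest discounts first, as when sorting the sheet in descending order.
    return sorted(flagged, key=lambda r: r["discount_pct"], reverse=True)


anomalies = discount_anomalies(rows, "USB Cables")
for r in anomalies:
    print(r["product"], r["discount_pct"], r["rating"])
```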
Step 8: Compare Ratings Across Categories
Let’s now evaluate average customer satisfaction across categories.
Insert a new pivot table on the full dataset.
In the pivot table:
Rows: Category
Values: Rating (set “Summarize by” to AVERAGE instead of SUM)
Copy the pivot table to a new sheet (e.g., Category_Avg_Rating), then sort it ascending by average rating.
Now you can identify:
Which categories are underperforming in customer satisfaction
Which ones customers appreciate more
You may find that some categories consistently score lower, even if they have many products listed. This insight is valuable for improving product lines, customer support, or quality control.
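The average-rating pivot can be reproduced with a grouped mean. The following sketch keeps the product count next to each average, so a low score in a large category stands out. The rows and key names are hypothetical and used only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows with numeric ratings.
rows = [
    {"category": "USB Cables", "rating": 4.0},
    {"category": "USB Cables", "rating": 3.0},
    {"category": "Heaters", "rating": 2.5},
    {"category": "Heaters", "rating": 3.5},
    {"category": "Mice", "rating": 4.5},
]

# Group ratings by category.
ratings_by_category = defaultdict(list)
for r in rows:
    ratings_by_category[r["category"]].append(r["rating"])

# (category, average rating, number of products), sorted ascending by average.
averages = sorted(
    ((cat, round(mean(vals), 2), len(vals))
     for cat, vals in ratings_by_category.items()),
    key=lambda item: item[1],
)

for category, avg, n in averages:
    print(category, avg, n)
```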
Final Thoughts
Exploratory Data Analysis is about getting familiar with your data—not just calculating, but interpreting. Today, we:
Identified low-rated products and analyzed customer reviews.
Used word clouds to surface common complaints.
Used pivot tables to uncover trends by category.
Compared product count and customer ratings across categories.
These insights are what businesses act on. EDA gives you the foundation for smarter, data-informed decisions.
What’s Next?
In the next session, we’ll move into data visualization, where we’ll turn these findings into visuals that communicate clearly and powerfully.