Session 4 - Exploratory Data Analysis

Introduction to Data Analytics for Beginners

The true power of data analytics lies not in the formulas themselves, but in the patterns you uncover. Exploratory Data Analysis (EDA) helps you move beyond numbers and start identifying meaningful insights that can shape real business decisions.

In this session, we’ll build on our earlier descriptive analysis and explore the data for trends, outliers, customer issues, and product performance. Along the way, you’ll practice asking better questions and answering them in Google Sheets.

We’ll continue using the Amazon Sales Dataset, which you downloaded and imported into Google Sheets in the first session. If you haven’t done so yet, follow the link above to download it.

Exploratory Data Analysis

Step 1: Investigate Low Ratings

Let’s begin by focusing on products with low customer ratings. These might indicate issues with product quality, delivery, or user expectations.

  1. Open your Amazon dataset and locate the “Rating” column.

  2. Sort the sheet by ratings in ascending order to bring the lowest-rated entries to the top:

    • Select your dataset.

    • Go to Data > Sort range > Advanced range sorting options.

    • Ensure “Data has header row” is checked, then sort by “Rating” A → Z.

This helps you quickly isolate ratings in the 2.0–2.9 range. These are the entries we want to explore further to understand what went wrong.
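For readers who want to reproduce this step outside Google Sheets, the sort-then-isolate logic can be sketched in a few lines of Python. This is a minimal sketch on a handful of invented rows: the field names “product_name” and “rating” are assumptions and may not match your sheet’s headers, and in practice you would load an exported CSV rather than typing rows by hand.

```python
# Hypothetical sample rows standing in for the Amazon dataset; the field
# names "product_name" and "rating" are assumptions, not confirmed headers.
rows = [
    {"product_name": "Cable A", "rating": 4.2},
    {"product_name": "Heater B", "rating": 2.3},
    {"product_name": "Charger C", "rating": 2.9},
    {"product_name": "Mouse D", "rating": 3.5},
]

# Equivalent of Data > Sort range, "Rating" A -> Z (ascending).
rows_sorted = sorted(rows, key=lambda r: r["rating"])

# Keep only the low-rated band we want to investigate (2.0 up to, but not including, 3.0).
low_rated = [r for r in rows_sorted if 2.0 <= r["rating"] < 3.0]

for r in low_rated:
    print(r["product_name"], r["rating"])
```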

 

Step 2: Read Customer Reviews (Manually)

Once the lowest-rated entries are at the top of the sheet, it’s time to review the customer feedback manually.

Why manual review matters:

Automated tools are helpful, but early in the analysis, reading the raw comments yourself gives you a direct understanding of the user experience, including nuances that automation might miss.

Look at the following columns:

  • Review Title

  • Review Content

As you read:

  • Are the comments clear and aligned with the low rating?

  • Do they seem mixed, inconsistent, or possibly combined from multiple users?

Observation:
You might notice something strange—some review entries seem to include both negative and positive comments in a single cell. For example:

  • “Bad quality. Amazing product.”

  • “Very bad. Great heater. Would recommend.”

This inconsistency suggests a data quality issue: some reviews appear to be concatenated from multiple users. Unfortunately, there’s no clear delimiter (like line breaks or user IDs) to separate them.
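Manual reading remains the main check here, but if you export the review column you could use a crude heuristic to shortlist cells worth a closer look: flag any cell that contains both a clearly negative and a clearly positive word. The sketch below is an illustration only; the two small word lists are assumptions, not a validated sentiment lexicon, and a flagged cell is not proof that reviews were merged.

```python
import re

# Small, hand-picked word lists; illustrative assumptions, not a real lexicon.
NEGATIVE = {"bad", "poor", "defective", "worst", "broken"}
POSITIVE = {"amazing", "great", "good", "excellent", "recommend"}

def looks_mixed(review: str) -> bool:
    """Return True if the cell contains at least one negative and one positive word.

    Crude heuristic for shortlisting cells to inspect by hand; it ignores
    negation ("not good") and cannot tell who wrote each sentence.
    """
    words = set(re.findall(r"[a-z']+", review.lower()))
    return bool(words & NEGATIVE) and bool(words & POSITIVE)

# The first two examples come from the observation above; the third is invented.
reviews = [
    "Bad quality. Amazing product.",
    "Very bad. Great heater. Would recommend.",
    "Stopped working after a week.",
]

for text in reviews:
    print(looks_mixed(text), "|", text)
```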

 

Best Practice: Data Integrity Checks

When you notice possible merging of multiple reviews in a single cell:

  • Do not rely on this field for quantitative sentiment analysis.

  • Avoid feeding this type of text into AI models or dashboards without cleanup.

  • If your organization has access to raw review logs or APIs, consider pulling a cleaner version from there.

For now, we’ll manually extract these low-rated comments and examine them with a word cloud.

Step 3: Generate a Word Cloud (Using External Tool)

Word clouds help you spot recurring themes, which makes them especially useful for qualitative data such as reviews.

  1. Select the Review Content cells for all rows rated between 2.0 and 2.9.

  2. Copy these and visit a free word cloud generator (e.g., wordclouds.com).

  3. Paste your reviews into the text box.

⚠️ Caution: Never paste sensitive or personally identifiable data into third-party tools. In this exercise, we assume these are anonymized, generic reviews without personal identifiers.

  4. Configure your word cloud:

    • Set a reasonable word limit (e.g., top 50 words).

    • Remove common or misleading terms like “product” or “good” if they skew the results.

  5. Analyze the result:

    • Which negative words stand out?

    • Are there repeated complaints (e.g., “battery,” “charging,” “defective,” “refund”)?

This gives you a direction: if many customers mention “battery” or “money,” you might infer dissatisfaction with battery life or perceived value.
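A word cloud is essentially a picture of word frequencies. If you prefer to see the underlying counts, here is a minimal Python sketch that tokenizes the review text, drops filler words, and returns the most frequent terms. The stopword list and the review snippets are invented for illustration; real word cloud tools may tokenize and filter differently.

```python
import re
from collections import Counter

# Filler words to ignore (an assumption); add terms such as "product" or "good"
# here if they dominate and hide more useful words.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "for", "very", "product", "good"}

def top_words(reviews, n=50):
    """Return the n most frequent non-stopword words across all reviews."""
    counts = Counter()
    for text in reviews:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)

# Invented low-rated review snippets, for illustration only.
low_rated_reviews = [
    "Battery drains fast and charging is slow.",
    "Defective battery. Asked for a refund.",
    "Not worth the money, charging stopped.",
]

print(top_words(low_rated_reviews, n=5))
```

Setting n=50 mirrors the “top 50 words” limit suggested for the word cloud.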

 

Step 4: Expand Review Scope

To confirm patterns, try including more data:

  • Add reviews with a rating of 3.0–3.9 to the same word cloud tool.

  • Compare word frequencies.

This lets you see whether certain concerns are persistent even as ratings improve. If “battery” is still common in 3-star reviews, it’s likely a widespread issue.
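The same comparison can be sketched in Python by counting words separately for each rating band and checking which terms of interest appear in both. The review snippets and the list of terms below are invented for illustration; with real data you would build each band from the corresponding rows of your sheet.

```python
import re
from collections import Counter

def word_counts(reviews):
    """Count lowercase words across a list of review texts."""
    counts = Counter()
    for text in reviews:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Invented snippets for two rating bands, for illustration only.
band_2 = ["Battery died quickly.", "Poor battery, bad charging."]
band_3 = ["Okay overall but battery is weak.", "Decent build quality."]

counts_2 = word_counts(band_2)
counts_3 = word_counts(band_3)

# Terms that show up in BOTH bands may point to a persistent issue.
terms = ["battery", "charging", "quality"]
for term in terms:
    print(f"{term:10} 2.x: {counts_2[term]}  3.x: {counts_3[term]}")

persistent = [t for t in terms if counts_2[t] > 0 and counts_3[t] > 0]
print("Appears in both bands:", persistent)
```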

 

Step 5: Switch Focus – Explore Product Categories

So far, we have focused on customer satisfaction. Now let’s explore the data from a product and business-strategy angle.

Question:
Which product categories have the most items listed?

This can help you evaluate:

  • Which areas your business focuses on

  • Where your catalog is overloaded or underdeveloped

We’ll use pivot tables for this.

Step 6: Create a Pivot Table to Count Products by Category

  1. Select the entire dataset.

  2. Go to Insert > Pivot Table, and create it in a new sheet.

  3. Rename the new sheet (e.g., Category_Count).

  4. Set up your pivot:

    • Rows: Add “Category”

    • Values: Add “Category” again and set Summarize by to COUNTA, which counts the non-empty entries.

This shows how many products are listed under each category.

  5. To make it easier to interpret:

    • Copy the pivot table to a new sheet.

    • Use Data > Sort range to sort by count (descending).

You’ll now see which categories dominate (e.g., “USB Cables” might have 200+ items), and which are underrepresented.
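The pivot table above (Rows = Category, Values = COUNTA of Category, sorted descending) can be mirrored in Python with a counter. This is a minimal sketch on invented rows; the “category” field name and the category labels other than “USB Cables” are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows; "category" is an assumed field name and the labels are invented.
rows = [
    {"product_name": "Cable A", "category": "USB Cables"},
    {"product_name": "Cable B", "category": "USB Cables"},
    {"product_name": "Heater C", "category": "Room Heaters"},
    {"product_name": "Cable D", "category": "USB Cables"},
    {"product_name": "Mouse E", "category": "Mice"},
]

# Like COUNTA, skip empty category values and count the rest per category.
category_counts = Counter(r["category"] for r in rows if r["category"])

# most_common() returns categories sorted by count, highest first.
for category, count in category_counts.most_common():
    print(f"{category:15} {count}")
```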

 

Step 7: Drill Deeper into High-Volume Categories

Choose one of the highest-volume categories and go back to the main dataset. Apply a filter on the Category column (Data > Create a filter) to isolate this group.

Now ask:

  • Which products in this category have the highest discounts?

  • Are there items with large discounts but poor ratings?

Sort the filtered rows by discount, then by rating, to spot anomalies such as heavily discounted items that still receive poor ratings.

This analysis supports decisions like:

  • Reducing inventory for poor-performing items

  • Offering better deals for high-rated products

  • Reallocating focus across categories
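The filter-and-sort check from this step can be sketched in Python as follows. Everything here is illustrative: the rows are invented, “discount_pct” and “rating” are assumed field names, and the cut-offs for a “large” discount (50%) and a “poor” rating (below 3.5) are arbitrary choices you would tune for your own data.

```python
# Hypothetical rows; field names and values are invented for illustration.
rows = [
    {"product_name": "Cable A", "category": "USB Cables", "discount_pct": 70, "rating": 3.1},
    {"product_name": "Cable B", "category": "USB Cables", "discount_pct": 20, "rating": 4.4},
    {"product_name": "Cable C", "category": "USB Cables", "discount_pct": 65, "rating": 4.3},
    {"product_name": "Heater D", "category": "Room Heaters", "discount_pct": 80, "rating": 2.8},
]

DISCOUNT_CUTOFF = 50  # "large" discount, in percent (assumption)
RATING_CUTOFF = 3.5   # ratings below this count as "poor" (assumption)

# Equivalent of filtering the Category column to a single value.
in_category = [r for r in rows if r["category"] == "USB Cables"]

# Sort by discount (highest first), breaking ties by rating (lowest first).
in_category.sort(key=lambda r: (-r["discount_pct"], r["rating"]))

# Anomalies: big discount, yet customers still rate the item poorly.
anomalies = [
    r for r in in_category
    if r["discount_pct"] >= DISCOUNT_CUTOFF and r["rating"] < RATING_CUTOFF
]

for r in anomalies:
    print(r["product_name"], f"{r['discount_pct']}% off", r["rating"])
```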

 

Step 8: Compare Ratings Across Categories

Let’s now evaluate average customer satisfaction across categories.

  1. Insert a new pivot table on the full dataset.

  2. In the pivot table:

    • Rows: Category

    • Values: Rating (set Summarize by to AVERAGE instead of SUM)

  3. Copy the pivot table to a new sheet (e.g., Category_Avg_Rating), then sort it ascending by average rating.

Now you can identify:

  • Which categories are underperforming in customer satisfaction

  • Which ones customers appreciate more

You may find that some categories consistently score lower, even if they have many products listed. This insight is valuable for improving product lines, customer support, or quality control.
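To see average satisfaction and catalog size side by side, the two pivot tables can be combined in one Python sketch: group ratings by category, compute the average and the product count, and sort ascending by average. The rows are invented and the field names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical rows; field names and values are invented for illustration.
rows = [
    {"category": "USB Cables", "rating": 4.1},
    {"category": "USB Cables", "rating": 3.9},
    {"category": "USB Cables", "rating": 4.3},
    {"category": "Room Heaters", "rating": 3.2},
    {"category": "Room Heaters", "rating": 3.6},
    {"category": "Mice", "rating": 4.5},
]

# Group ratings by category (pivot Rows = Category).
ratings_by_category = defaultdict(list)
for r in rows:
    ratings_by_category[r["category"]].append(r["rating"])

# Values = Rating summarized by AVERAGE, plus the product count for context.
summary = [
    (category, sum(vals) / len(vals), len(vals))
    for category, vals in ratings_by_category.items()
]

# Sort ascending by average rating so the weakest categories come first.
summary.sort(key=lambda item: item[1])

for category, avg, n in summary:
    print(f"{category:15} avg={avg:.2f}  products={n}")
```

Keeping the count next to the average helps you avoid over-reading a low average that rests on only one or two products.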

 

Final Thoughts

Exploratory Data Analysis is about getting familiar with your data—not just calculating, but interpreting. Today, we:

  • Identified low-rated products and analyzed customer reviews.

  • Used word clouds to surface common complaints.

  • Used pivot tables to uncover trends by category.

  • Compared product count and customer ratings across categories.

These insights are what businesses act on. EDA gives you the foundation for smarter, data-informed decisions.

What’s Next?

In the next session, we’ll move into data visualization, where we’ll turn these findings into visuals that communicate clearly and powerfully.