Session 5 - Data Visualization
Introduction to Data Analytics for Beginners
Welcome to one of the most important parts of your analytics journey: Data Visualization. Think of it like this – numbers contain the story, but visuals help reveal and communicate that story. Without visualization, your insights can remain hidden in rows and columns. But with the right visuals, patterns come to life, decisions get easier, and your message becomes clear. There are two key reasons why data visualization should be part of every analyst’s toolkit:
To Discover Patterns:
Visuals help you instantly see trends, spikes, dips, and outliers that would be difficult to catch in tables.
To Communicate Across Teams:
Your stakeholders—from marketing to finance—may not be data experts. Charts let you speak a common visual language that influences and inspires action.
We will continue using the Amazon Sales Dataset as our example, which you downloaded and imported into Google Sheets in the first session. If not, follow the link above to download.
First Things First – Choosing the Right Chart
Just because a chart looks fancy doesn’t mean it’s effective. Use visuals to clarify, not complicate.
Chart Type | Best For | Avoid When… |
---|---|---|
Line Chart | Time series trends | You’re not working with chronological data |
Column/Bar Chart | Comparing values across categories | Too many categories crowd the chart |
Histogram | Viewing distribution (e.g., rating ranges) | Categories aren’t numeric or continuous |
Scatter Plot | Checking correlation between 2 numeric fields | Data is categorical or unrelated |
Pie Chart | Showing part-to-whole ratios (rarely useful) | Values are too close together |
Geo Chart | Mapping values by location | No geo data is available |
Step-by-Step: Visualizing Product Ratings
1. Start With a Business Question
Before jumping into any chart, always ground yourself in the business context. In our case, we’re trying to answer this question:
Which product categories receive the highest and lowest customer ratings?
This isn’t just a visual exercise. You’re looking to guide decisions—maybe about inventory prioritization, quality improvement, or marketing focus.
2. Examine Your Raw Data
Your dataset contains a Category column, but here’s the catch. Each value looks like this:
Home & Kitchen | Appliances | Electric Grinders
That’s not a clean category – it’s a hierarchy: main category → subcategory → specific item.
If you try to visualize using this raw column, you’ll end up with hundreds of ultra-specific combinations. The result?
Charts become overcrowded
Patterns become hard to read
Stakeholders get confused
3. Clean It Up: Extract the Main Category
To bring clarity, we simplify. The insight we want is at the main category level.
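We’ll use the SPLIT() function in Google Sheets to separate the values at the pipe symbol (|) and keep only the first part, i.e., the top-level category such as Home & Kitchen, Electronics, or Toys & Games. Excel has no SPLIT() function; newer Excel versions offer TEXTBEFORE and TEXTSPLIT for the same job.
Formula:
=SPLIT(A2, "|")
SPLIT() spreads the pieces across adjacent columns, so the main category lands in the first of them. To return only that first piece, with any stray spaces around the pipe removed, wrap the formula:
=TRIM(INDEX(SPLIT(A2, "|"), 1, 1))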
Why this matters:
It gives us consistent, top-level categories
We reduce complexity from 200+ specific groups → 8–10 meaningful chunks
Visualizations become cleaner, clearer, more actionable
Best Practice: When visualizing, zoom out before you zoom in. Start broad to reveal patterns. If needed, drill down later.
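If you ever need the same cleanup outside a spreadsheet, the idea carries over directly. The following is a minimal Python sketch, assuming the category text is available as a plain string; the function name main_category is hypothetical and chosen only for illustration.

```python
def main_category(raw: str, sep: str = "|") -> str:
    """Return the top-level category from a pipe-delimited hierarchy."""
    # Split once at the first separator and keep the left-hand part;
    # strip() removes the spaces that surround the pipe.
    return raw.split(sep, 1)[0].strip()


sample = "Home & Kitchen | Appliances | Electric Grinders"
print(main_category(sample))
```

A value with no pipe at all is returned unchanged, so rows that already hold a top-level category are left alone.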
4. Create a Pivot Table to Summarize the Data
Now that we have a column of main categories, we want to answer:
What’s the average rating for each main category?
To do that, use a Pivot Table:
Go to Insert → Pivot Table
Use your cleaned sheet as input
Choose to place the pivot table in a new sheet
Then:
Rows → Main Category
Values → Average Rating (make sure it’s set to AVERAGE, not SUM)
You’ll now have a table that summarizes average customer rating for each top-level category.
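To see what the pivot table is doing behind the scenes, here is a rough Python sketch of the same group-and-average step. The rows are hypothetical stand-ins for the cleaned sheet, not values from the dataset, and average_by_category is an illustrative name.

```python
from collections import defaultdict

# Hypothetical (main category, rating) rows standing in for the cleaned sheet.
rows = [
    ("Electronics", 4.0),
    ("Electronics", 4.5),
    ("Home & Kitchen", 3.5),
    ("Home & Kitchen", 4.5),
    ("Toys & Games", 5.0),
]


def average_by_category(rows):
    """Group ratings by category and return {category: mean rating}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, rating in rows:
        totals[category] += rating
        counts[category] += 1
    # Dividing the total by the count is the AVERAGE setting; returning
    # totals alone would be the SUM setting.
    return {category: totals[category] / counts[category] for category in totals}


averages = average_by_category(rows)
for category, avg in averages.items():
    print(f"{category}: {avg:.2f}")
```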
5. Build Your First Chart
Select your pivot table results and go to:
Insert → Chart
Google Sheets will suggest a chart type. If it recommends a column chart, go ahead – but evaluate if a bar chart might offer better readability (especially if category names are long).
We’ll walk through these options in the next section.
Let’s Visualize
Line Chart
A line chart connects individual data points using a line. It’s best used when your data has a natural order – especially when you’re tracking changes over time (days, months, years, etc.).
Example:
Tracking monthly sales or customer support tickets over a quarter.
Tip:
Avoid using line charts for categorical comparisons. If your data isn’t time-based, a column chart is likely a better choice.
Column Chart vs. Bar Chart
Column Chart: Vertical bars. Can become cramped with long category names.
Bar Chart: Horizontal layout often reads better, especially when labels are lengthy.
Rule of Thumb: If you’re tilting labels or truncating text, consider switching from column to bar.
Sorting for Clarity: Sort by average rating (descending). This improves storytelling—your audience sees the “best” first.
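The sorting advice can be sketched in a few lines of Python. The averages below are hypothetical, and the text bars are only a rough stand-in for a real horizontal bar chart.

```python
# Hypothetical average ratings per main category (not taken from the dataset).
averages = {"Electronics": 4.1, "Home & Kitchen": 4.3, "Toys & Games": 3.9}


def sorted_desc(averages):
    """Return (category, average) pairs ordered from highest to lowest."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranked = sorted_desc(averages)
width = max(len(name) for name, _ in ranked)
for name, avg in ranked:
    # One '#' per 0.25 rating points gives a rough horizontal bar.
    bar = "#" * round(avg / 0.25)
    print(f"{name:<{width}}  {bar} {avg:.1f}")
```

Because the labels sit to the left of each bar, long category names stay readable, which is the same reason a bar chart often beats a column chart.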
Pie Chart
Pie charts are often used to show how a whole is divided into parts – for example, what percentage of total sales came from each region. But they’re easily misused.
Limitations:
When there are too many slices or the differences between them are small, a pie chart becomes difficult to read. It’s also hard to compare angles accurately.
Alternatives:
Use a bar chart when you want precision and easy comparison. Use a pie chart only when you want to emphasize that the data forms parts of a whole.
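A pie chart only makes sense when the values add up to a meaningful whole. As a small illustration, this Python sketch converts raw totals into percentage shares; the regional sales figures are made up for the example.

```python
# Hypothetical sales totals per region (illustrative numbers only).
sales = {"North": 500, "South": 300, "East": 150, "West": 50}


def shares(values):
    """Convert raw totals into percentage shares of the whole."""
    total = sum(values.values())
    return {key: 100 * value / total for key, value in values.items()}


pct = shares(sales)
for region, share in pct.items():
    print(f"{region}: {share:.1f}%")
```

If several shares come out within a point or two of each other, that is a hint the slices will be hard to tell apart and a bar chart may serve better.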
Histogram
Shows distribution of a variable (e.g., ratings).
Good for spotting skewness: right-skewed (long tail on the right) = mostly low ratings; left-skewed (long tail on the left) = mostly high ratings.
Scatter Plot
Use to assess relationships. Try plotting Discount % vs. Rating.
In our case: The scatter plot reveals no clear trend, so discount and rating do not appear to be correlated.
Interpretation Tip: If dots are randomly scattered, there’s little or no linear correlation. If they line up along an upward or downward slope, the variables are likely related.
Geo Chart
Paste in a “Country” column to simulate geographic data.
Use a Geo Chart to see how average ratings vary by country.
Color gradient shows sentiment: red (low) → green (high).
Optional: Use Geo Markers for added context—color + size = sentiment + frequency.
When Not to Visualize
Avoid visuals that:
Repeat the same insight shown earlier
Make minor variations look dramatic
Confuse more than they clarify
Golden Rule: Every visual should answer a question, support a decision, or reveal a pattern.
Understanding Distribution of Ratings
A histogram helps us understand how data is distributed across different ranges or “buckets.” In this case, we’re looking at the distribution of average product ratings across our dataset.
Instead of comparing categories, we now shift our focus to the shape of the data. Are most ratings clustered around 4 and 5? Are there a lot of poor reviews? Or is the distribution balanced?
Why It Matters?
Understanding the shape of your data is critical for drawing reliable conclusions:
A roughly normal (bell-curve) distribution suggests ratings cluster consistently around a typical value.
A right-skewed distribution (long tail on right) may indicate widespread dissatisfaction.
A left-skewed distribution (long tail on left) often reflects generally high satisfaction with occasional poor ratings.
This is the kind of nuance that a table won’t reveal — but a histogram can, instantly.
Step-by-Step: Creating the Histogram of Ratings
Go to your cleaned ratings data — specifically, the column with average ratings per product or per category.
Select the column containing those ratings.
Insert → Chart
Google Sheets may automatically suggest a histogram. If not, switch the chart type manually to Histogram in the Chart Editor.
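Under the hood, a histogram simply counts how many values fall into each bucket. Here is a minimal Python sketch of that counting step, assuming a bucket width of 0.5 and a hypothetical list of ratings; Google Sheets picks its own bucket size unless you set one.

```python
from collections import Counter

WIDTH = 0.5  # assumed bucket width, chosen for illustration

# Hypothetical product ratings (illustrative values only).
ratings = [4.1, 4.3, 3.9, 4.4, 2.5, 4.0, 4.2, 4.6, 3.4, 4.2]


def bucket_counts(values, width=WIDTH):
    """Count how many values fall in each bucket [start, start + width)."""
    counts = Counter()
    for value in values:
        start = (value // width) * width  # lower edge of the bucket
        counts[start] += 1
    return dict(sorted(counts.items()))


hist = bucket_counts(ratings)
for start, count in hist.items():
    print(f"{start:.1f}-{start + WIDTH:.1f}: {'#' * count}")
```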
How to Read This Chart?
Once your histogram appears, take a moment to observe its shape.
Let’s say your histogram shows:
A peak between 4.0–4.4 average rating
Very few products with ratings below 3.5 or above 4.8
This shape is close to a normal distribution, a balanced “bell curve.” Most products are rated well, with fewer being exceptional or disappointing.
Skewness Explained
Right-skewed: If the chart has a long tail on the right, with many low ratings (2s and 3s), it may signal deeper problems — poor quality, delivery issues, or unmet expectations.
Left-skewed: A long tail on the left with many 5-star ratings and few 2s/3s may reflect over-performance — or possibly rating inflation.
A flat histogram with no clear peak might indicate too much noise – maybe inconsistent quality or a dataset that needs more segmentation (e.g., by region or product type).
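If you want a number to back up what the histogram shows, skewness can be computed directly. The sketch below uses the population skewness formula (the mean of the cubed z-scores) on two hypothetical samples; a negative result means a long tail on the left, a positive result a long tail on the right.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Hypothetical samples: mostly high ratings with a few low ones, and the mirror image.
mostly_high = [5, 5, 5, 4, 5, 4, 5, 2, 5, 1]
mostly_low = [1, 1, 1, 2, 1, 2, 1, 4, 1, 5]

print(skewness(mostly_high))  # negative: left-skewed
print(skewness(mostly_low))   # positive: right-skewed
```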
Business Insight
Let’s say your histogram shows a cluster around 4.2 with a few products down at 2.5. This tells you:
Most customers are satisfied — that’s great!
But a few products may be dragging down brand perception — worth investigating.
You could now decide to:
Deep-dive into the lowest-rated products
Read customer feedback to understand pain points
Recommend targeted fixes or strategic changes
Exploring the Relationship Between Discount and Ratings
Sometimes the questions that matter most in business aren’t about totals or averages — they’re about relationships. One such question is:
“Do higher discounts lead to better ratings?”
This is where a scatter plot becomes useful. A scatter plot lets you visualize how two continuous variables move in relation to each other. In our case:
X-axis: Discount percentage
Y-axis: Product rating
Step-by-Step: Creating the Scatter Plot
Select the two columns:
One with discount_percent
One with rating
These are the two variables whose relationship we want to explore.
Insert chart → Scatter plot:
Google Sheets may automatically suggest a scatter chart. If not, you can select it manually from the chart type list.
What You’ll See
The result is a cloud of data points scattered across the chart. If there’s a clear upward or downward trend (a visible slope), it may indicate correlation.
In our case:
The points are widely scattered, with no clear upward or downward pattern.
Ratings vary across both low and high discount ranges.
Some products with 0% discount still received high ratings.
Others with 70%+ discount also showed mixed reviews.
What This Means (Real-World Insight)
There’s no strong correlation between discount and rating. In fact, based on the scatter plot, the relationship is close to non-existent.
This suggests that:
Customers aren’t simply rewarding discounts with better ratings.
Product quality, user experience, or expectations may matter more than price cuts.
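A scatter plot gives a visual impression; a correlation coefficient puts a number on it. The following Python sketch computes Pearson’s r by hand. The discount and rating values are hypothetical and only mimic the kind of scatter described above, so the result says nothing about the real dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical discount_percent and rating values (illustrative only).
discount_percent = [0, 10, 20, 30, 40, 50, 60, 70]
rating = [4.2, 3.9, 4.3, 4.0, 4.1, 4.3, 3.9, 4.1]

r = pearson_r(discount_percent, rating)
print(f"r = {r:.2f}")  # a value near 0 means little or no linear relationship
```

Values of r near +1 or -1 indicate a strong upward or downward slope. In Google Sheets, the CORREL function returns the same coefficient for two columns.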
Analyst Takeaway
This kind of finding is subtle but powerful. Being able to say:
“We see no clear relationship between discounting and rating performance”
…helps shape more strategic business decisions.
Instead of blindly running promotions, the team might:
Investigate product quality issues.
Focus on improving descriptions or delivery experience.
Use discounts more selectively — not as a blanket fix.
Visualizing Ratings by Country Using Geo Charts
Why Use Geographic Visualizations?
Up to this point, we have been working primarily with product-related insights — looking at categories, subcategories, ratings, and discounts. However, in a real-world business context, one of the most valuable dimensions for analysis is geographic location. Understanding how customer satisfaction varies by country can help businesses adapt their strategy for specific markets.
Let’s suppose we now want to answer the following question:
“Are customers from certain countries more satisfied with our products than others?”
To explore this, we’ll use a Geo Chart — a map-based visualization that allows us to compare values (like ratings or revenue) across geographic regions.
Step-by-Step: Creating a Geo Chart in Google Sheets
1. Ensure you have geographic data
Before we begin, we need a column in our dataset that contains country names. In this example, we have manually added a Country
column to our dataset, matching each customer rating to a country. In a real-world dataset, this may come from the shipping destination, user profile, or billing address.
Note: The data here is hypothetical, and intended for learning purposes only.
You can download this country data from the Lumen GitHub repository.
2. Select your data
Select the two columns:
Column A: Country
Column B: Rating
This combination tells Google Sheets that we want to map average ratings per country.
3. Insert a Geo Chart
With the two columns selected:
Click on Insert → Chart
In the Chart Editor, change the chart type to Geo Chart
If your data is valid (country names are spelled correctly and consistently), Google Sheets will automatically generate a world map highlighting each country based on its corresponding value (average rating).
How to Interpret the Geo Chart
The map is shaded using a color gradient:
Countries with higher average ratings will appear in dark green.
Countries with average or mid-level ratings appear in gray or light green.
Countries with low average ratings will appear in red or orange tones.
This allows us to instantly spot regional differences. In our sample chart, we observe:
High average ratings in countries like:
🇲🇽 Mexico (4.8)
🇸🇦 Saudi Arabia (5.0)
🇵🇰 Pakistan (4.6)
Low average ratings in:
🇨🇦 Canada (3.2)
🇨🇴 Colombia (3.0)
🇩🇪 Germany (2.0)
What Does This Tell Us?
This visualization is more than a visual flourish. It can guide real business decisions. Here’s how:
Identify problem areas
A country with a low average rating (e.g., Germany with 2.0) might indicate issues with product quality, delivery logistics, or customer expectations in that region.
Tailor your marketing or customer support strategy
If certain countries consistently give high ratings, that could indicate a good product-market fit. These could be your brand advocates. On the other hand, low-rating countries might require deeper investigation: perhaps product descriptions need localization, or packaging needs to match cultural expectations.
Explore customer behavior trends by geography
Certain countries might have a tendency to give lower or higher ratings regardless of the product — this could be due to cultural differences in review habits.
Switching to a Marker-Based Geo Chart
Google Sheets also offers an alternate view of the same data, known as the Geo Chart with Markers.
To switch:
Click on the geo chart
Open the Chart Editor
Change the chart type to Geo chart with markers
What changes?
Each country is now represented by a circular marker on the map.
The color of the marker represents the average rating (similar to before).
The size of the marker represents the number of data points or observations used to compute that rating.
For example, Germany may have a small red circle, indicating few data points but a low average rating.
Mexico, on the other hand, may have a large green circle, indicating many reviews and high satisfaction.
Analyst Tip: Use Maps with Caution
While geo charts are visually powerful, they should always be supported by context. A country with a single review (positive or negative) can distort your perception if not interpreted carefully.
Best practices include:
Always show or mention sample sizes, especially if you’re presenting insights to decision-makers.
Combine geo charts with pivot tables or summary stats to check if you’re over-relying on outliers.
Watch out for country names that may not be recognized by Google Sheets (e.g., spelling variations or abbreviations like “USA” vs “United States”).
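These checks can be sketched in Python as well. The example below normalizes country-name variants, then reports the average rating alongside the number of reviews behind it. The alias map, the review rows, and the min_reviews threshold of 3 are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical alias map; extend it with whatever variants appear in your data.
ALIASES = {"USA": "United States", "US": "United States", "UK": "United Kingdom"}

# Hypothetical (country, rating) rows, one per review.
reviews = [
    ("Mexico", 5.0), ("Mexico", 4.5), ("Mexico", 5.0),
    ("USA", 4.0), ("United States", 3.0),
    ("Germany", 2.0),
]


def summarize_by_country(reviews, min_reviews=3):
    """Return {country: (average, count, is_reliable)} after normalizing names."""
    ratings = defaultdict(list)
    for country, rating in reviews:
        cleaned = country.strip()
        name = ALIASES.get(cleaned, cleaned)
        ratings[name].append(rating)
    return {
        name: (sum(vals) / len(vals), len(vals), len(vals) >= min_reviews)
        for name, vals in ratings.items()
    }


summary = summarize_by_country(reviews)
for name, (avg, n, reliable) in summary.items():
    flag = "" if reliable else "  (small sample, interpret with care)"
    print(f"{name}: avg {avg:.2f} from {n} review(s){flag}")
```

Here Germany’s low average rests on a single review, which is exactly the kind of case worth flagging before it reaches a decision-maker.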
Additional Real-World Retail Scenarios
Scenario | Suggested Chart |
---|---|
Comparing sales across product lines | Column or Bar Chart |
Evaluating customer satisfaction by country | Geo Chart |
Monitoring daily traffic on a website | Line Chart |
Understanding distribution of transaction sizes | Histogram |
Correlating price and customer rating | Scatter Plot |
What’s Next?
In the next session, we’ll go one step further: How to turn these visuals into a compelling data story.