Lesson 9 - Statistical Hypothesis Testing
Conduct an inferential analysis via hypothesis testing
Estimated Read Time: 1.5–2 Hrs
Learning Goals
By the end of this lesson, you will be able to:
Technical & Analytical
- Set up null and alternative hypotheses and run t-tests in Excel (holiday vs. non-holiday sales, promo vs. no promo, school holiday vs. regular days).
- Interpret p-values and decide whether observed sales differences are statistically significant.
Business Impact
- Use hypothesis testing to validate or challenge assumptions about holidays, promotions, and school breaks.
- Translate statistical results into staffing, inventory, and promotion decisions.
In the previous lessons, you focused on descriptive statistics, returning to the idea of data spread, together with more-advanced statistical equations such as those for variance and standard deviation.
In this Lesson, you’ll learn about another branch of statistics—inferential statistics—in which you’ll use subsets of data to draw conclusions about an entire population. Building on your research hypothesis, you’ll formulate and test a different kind of hypothesis: statistical hypotheses, which determine statistical significance, an aspect of analysis most project stakeholders use to inform their decisions and insights. Along the way, you’ll pick up some statistical jargon and use Excel to find and interpret p-values.
It’s time to jump back into the world of statistics!
1. Inferential Statistics
Inferential statistics refers to statistical methods that use a sample of data to draw conclusions, or infer information, about an entire population. A population is the complete group you’d ideally study — for Rossmann, that could mean all sales from all stores across all dates. In practice, analyzing an entire population is often impossible, so we use a sample — a smaller, manageable slice of that data (e.g., sales from 3 stores over 6 months) — to estimate patterns and test hypotheses about the larger group.
In this Lesson, you’re going to look in-depth specifically at sample data — as well as the kinds of conclusions you can draw from it (the key concept behind inferential statistics!). Let’s get a better idea of what this entails by looking at some everyday applications of inferential statistics.
1.1. Inferential Statistics Applications
Surveys are an intuitive example of inferential statistics. By its very definition, a survey obtains information about a subset of a population (with the exception being censuses, as they do survey an entire population). Organizations use responses from this subset to draw conclusions about the whole population. Inferential statistics is what will allow you to use a sample — e.g. a subset of children in the U.S. — to draw conclusions about all children in the U.S. Similarly, disease surveillance systems use a sample of hospitals, clinics, and medical providers to follow disease trends across an entire population.
Most manufacturing companies produce products that undergo quality testing. Testing every single product is expensive and time-consuming, which is why companies usually use inferential statistics to test a sample of the products and use those results to infer the quality of the rest of their products. The U.S., for example, requires meat-packing plants to check one in every 300 animals slaughtered for traces of the E. coli bacteria. From this sample of carcasses, they use statistics to determine whether any of the unsampled animals are likely to carry E. coli.
This sampling can be seen across all industries—even in the entertainment industry. Amazon, for instance, often tests new shows by releasing initial episodes, called “pilots,” to their Amazon Video platform and monitoring the response. The number and sentiment of this subset of viewers determines whether those shows eventually make it to Prime Video and, subsequently, a broader audience.
Inferential statistics also lets you compare samples. This happens often in medical research to determine whether a new drug or treatment protocol is more effective than the current standard of care — for instance, do knee replacement patients undergoing a new physical therapy protocol regain a greater range of motion than those undergoing traditional therapy? The answer could be found through inferential statistics!
At the end of the day, time and money are what make inferential statistics so important. After all, most organizations simply don’t have the capacity to gather and test data for an entire population. Your work as an analyst will almost certainly require using a sample of data to infer information about an entire population.
To start, let’s take a look at one of the key ideas behind inferential statistics: hypothesis testing.
2. Hypothesis Testing
Inferential statistics is centered around the core idea of hypothesis testing. Hypothesis testing refers to statistically testing two competing claims about a population in order to determine which one the sample data supports. In other words, it assesses whether a proposed inference makes sense. This inference, or hypothesis, is usually your research hypothesis rebranded using statistical terminology.
Let’s take a quick look at the steps included in a hypothesis test before dissecting the details in more depth:
- Form statistical hypotheses (create the null hypothesis and the alternative hypothesis).
- Identify the test statistic.
- Compute the p-value.
- Compare the p-value to the significance level (α).
Some of this terminology probably sounds a bit jargony, which can be confusing when you’re first learning about it. Don’t worry — by the time you’re through this Lesson, you should understand exactly what each of the above steps refers to. Let’s take a look!
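Although this lesson runs its tests in Excel, the same four steps can be sketched in a few lines of Python using SciPy. This is an illustration only, not part of the lesson's Excel workflow, and the sales figures below are invented:

```python
from scipy import stats

# Step 1: H0 — promo and non-promo days have the same mean sales;
#         HA — promo days have higher mean sales (one-tailed test).
promo = [112, 108, 115, 120, 109, 118]      # hypothetical daily sales (x €100)
no_promo = [100, 97, 103, 99, 101, 98]

# Steps 2-3: compute the test statistic and p-value in one call
# (Welch's two-sample t-test, not assuming equal variances).
t_stat, p_value = stats.ttest_ind(promo, no_promo,
                                  equal_var=False,
                                  alternative="greater")

# Step 4: compare the p-value to the significance level.
alpha = 0.05
reject_null = p_value <= alpha
print(t_stat, p_value, reject_null)
```

With these invented numbers the promo mean is clearly higher relative to the spread, so the test rejects the null hypothesis; you'll do the same steps in Excel later in this lesson.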
Step 1: Forming Statistical Hypotheses
Statistical hypotheses have already been mentioned a number of times in this Lesson, but there is another similar-sounding term called research hypothesis. These two types of hypotheses have entirely different purposes. Research hypotheses form the foundation for the project development and data selection, whereas statistical hypotheses guide the actual statistical tests you apply to said data.
The research hypothesis helped identify the intervention to test (e.g. impact of customer traffic or holidays or promotions on sales), as well as the data to use. Now, however, you need to actually test this proposed mechanism, which is where two different kinds of statistical hypotheses come into play: the null hypothesis and the alternative hypothesis.
- The null hypothesis is the characteristic that the analyst believes to be false about a population given the sample data. Think of it as the statement you’re trying to nullify.
- The alternative hypothesis is the characteristic that the analyst believes to be true about a population given the sample data. Think of it as the statement you’re trying to prove.
The null and alternative hypotheses are mutually exclusive—they can’t be the same thing. In testing for E. coli, for example, if you believed the number of E. coli cultures to be zero (alternative hypothesis), the null hypothesis would have to be anything but zero.
In hypothesis testing, you can’t prove the null hypothesis true. Instead, you prove it false. In statistical terms, you’re testing whether or not you can reject the null hypothesis. Let’s walk through this with the Rossmann example.
Suppose you want to evaluate the impact of promotions at Rossmann stores. While promotions are intended to boost sales, you want to verify whether they actually increase average daily sales beyond normal fluctuations. Therefore, you formulate the following null hypothesis (i.e., the statement you believe to be false):
Null Hypothesis (H₀)
Promotions have no effect on mean daily sales.
In statistical terms, this null hypothesis would be written as follows:

H₀: μ₁ ≤ μ₂

Where:
- μ₁ = average daily sales on promotion days
- μ₂ = average daily sales on non-promotion days
Next, you formulate the alternative hypothesis (i.e., the statement you believe to be true). This would be written as follows:
Alternative Hypothesis (HA)
Promotions increase mean daily sales.
In statistical terms, this alternative hypothesis would be written as follows:

HA: μ₁ > μ₂
Before moving on, there’s one final thing you should note about forming hypotheses. The null hypothesis indicates either a one-tailed test or a two-tailed test. A one-tailed test refers to a test where you only check whether the sample mean is higher than the population mean, or only whether it’s lower — you’re interested in just one direction. In the Rossmann example, you only care whether promo sales are higher than non-promo sales.
A two-tailed test, on the other hand, refers to a test where the sample mean could be either higher or lower than the population mean. In these tests, you’re interested in both directions — above and below the population mean.
In manufacturing individual components of a machine, each component must be of a certain size. If the component is too big or too small, it won’t fit with the rest of the machine’s components. Many quality-control measures test a sample of these components to ensure they’re of the acceptable size. Suppose, for instance, that a circular component needs to have a diameter of 20 millimeters. The quality testing would look for components that are too big or too small—in other words, any diameter that isn’t 20 millimeters. This would be the alternative hypothesis (assuming that the component’s diameter isn’t 20 millimeters), while the null hypothesis would be the opposite—that the diameter does equal 20 millimeters:
Null Hypothesis: H₀: μ = 20 mm
Alternative Hypothesis: HA: μ ≠ 20 mm
Two-tailed tests usually have a null hypothesis with an equality sign (=), whereas one-tailed tests rely on greater than or equal to (≥) or less than or equal to (≤) signs. The null hypothesis should always include an equality check (=, ≤, ≥), while the alternative hypothesis uses only the inequality (≠, <, >).
For example, in the Rossmann promotion scenario:
- Null Hypothesis (H₀): μ₁ ≤ μ₂ → assumes promotions have no effect on sales
- Alternative Hypothesis (HA): μ₁ > μ₂ → tests whether promotions increase sales
Notice how the null includes equality, while the alternative does not. This is a clear way to distinguish which statement is the baseline (H₀) and which is the effect you are testing (HA).
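If you have Python available, the component-diameter scenario above can be sketched as a two-tailed one-sample t-test. SciPy and the measured diameters here are assumptions for illustration only:

```python
from scipy import stats

# Hypothetical measured diameters (mm) from a quality-control sample.
diameters = [19.8, 20.1, 20.0, 19.9, 20.2, 20.0, 19.9, 20.1]

# H0: mean diameter = 20 mm; HA: mean diameter != 20 mm.
# ttest_1samp is two-tailed by default, matching this scenario.
t_stat, p_value = stats.ttest_1samp(diameters, popmean=20.0)

alpha = 0.05
print(p_value, p_value <= alpha)  # a large p-value -> fail to reject H0
```

Because these invented measurements center right on 20 mm, the test fails to reject the null hypothesis: the components pass the quality check.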
2.1.1. Independent and Dependent Variables
When formulating your research hypothesis, you need to explore the relationship between a proposed solution and an objective. The sales growth hypothesis, for example, examines the relationship between the proposed solution (promotions) and the goal (sales increase). Another way to frame this idea (and one you’ll more commonly encounter with statistical hypotheses) is by way of independent and dependent variables.
An independent variable is the variable you change or control during an experiment. In the above example, the promotion variable is changed (1 / 0) to see whether it has any impact on sales. The dependent variable, then, is the variable being measured or tested. In this case, the dependent variable is sales.
2.1.2. Null Hypothesis Practice
Let’s pause here to reflect on the intuitiveness of statistical notation — we’ll give you a hint: it’s not. In fact, you may find statistical notation to be one of the most unintuitive and most difficult things about this course! Statistical notation isn’t something you can pick up just by looking at a few examples: it takes practice, practice, and more practice. For that reason, let’s take a few minutes to practice creating statistical hypotheses! Read the following research hypotheses without looking at their corresponding statistical hypotheses and try to come up with what the statistical hypotheses would be. Once you’ve taken a guess, check the answers in blue:
If a patient takes drug x, their blood pressure will decrease.
- Null Hypothesis: Taking drug x will not decrease blood pressure.
- Alternative Hypothesis: Taking drug x will decrease blood pressure.
*This is a one-tailed test because you’re only looking for a decrease in blood pressure.
More younger people (≤ 30 years) book flights on their phones than older people (> 30 years).
- Null Hypothesis: People 30 years and younger book the same or fewer flights on their phones as people older than 30 years.
- Alternative Hypothesis: People 30 years and younger book more flights on their phones than people older than 30 years.
*This is a one-tailed test because you’re assuming directionality—that younger people book more flights (not just that they book a different amount) than older people.
A news article claims that the mean birth length for babies born in Europe is 50 cm. You want to see if this holds true in your country, so you take a random sample of baby lengths and find a sample average of 60 cm.
- Null Hypothesis: The average birth length in your country is 50 cm.
- Alternative Hypothesis: The average birth length in your country isn’t 50 cm.
*This is a two-tailed test because you’re looking for whether the average length is longer or shorter than 50 cm.
Step 2: Identifying the Test Statistic
In order to test your null hypothesis, you need what’s called a test statistic. The test statistic is the random variable that helps assess the similarity between your statistical hypotheses and the sample data. Different types of data and questions employ different types of test statistics, but you’ll only be covering the most common ones below.
Let’s start with z-scores. Recall that the normal distribution refers to the distribution of data when it’s spread symmetrically around the middle, or mean, value. The z-score, meanwhile, is a measure of a data point’s distance from the mean, measured in standard deviations. For a sample mean, it’s calculated as:

z = (x̄ − μ) / (σ / √n)

Where:
- x̄ = sample mean
- μ = population mean
- σ = population standard deviation
- n = sample size
Suppose you want to test whether promotions increase daily sales (hypothetical values below).
- Average daily sales on non-promo days (population mean, μ) = €10,000
- Sample of 10 promo days with average daily sales (x̄) = €11,200
- Population standard deviation (σ) = €1,500

The z-score for the sample mean is:

z = (11,200 − 10,000) / (1,500 / √10) ≈ 2.53
Interpretation
- A z-score of 2.53 means the sample mean of €11,200 is 2.53 standard errors above the population mean (€10,000).
- A z-score of 0 would mean the sample mean equals the population mean.
- The larger the z-score, the more unlikely it is that the difference between the sample and population mean happened by chance, assuming the null hypothesis is true.
This z-score becomes your test statistic, which you can then use to calculate the p-value and determine whether to reject the null hypothesis that promotions have no effect on sales.
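If you’d like to sanity-check the arithmetic outside Excel, here’s the same z-score computed in plain Python (the values are the hypothetical ones from above):

```python
import math

# Hypothetical values from the promo example above.
x_bar = 11_200    # sample mean: avg daily sales on promo days (EUR)
mu = 10_000       # population mean: avg daily sales on non-promo days (EUR)
sigma = 1_500     # population standard deviation (EUR)
n = 10            # sample size (number of promo days)

# z = (sample mean - population mean) / standard error
z = (x_bar - mu) / (sigma / math.sqrt(n))
print(round(z, 2))  # ≈ 2.53
```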
More commonly than the z-score, you’ll use what’s called the “t-score,” a similar statistic with more flexibility. The t-score (and t-test) is very similar to the z-score (and z-test), and you’ll be exploring it later in this lesson.
2.2.1. Why Use a Test Statistic?
Now, you might be asking yourself: if test statistics compare means, then why can’t you simply compare the sample mean to a value or other sample? Why do you need to go through all these extra steps? Well, consider the following distributions in Figure 1:
The dark purple lines show the distribution of the mean for one group, and the light purple lines show the distribution of the mean for a second group. The distributions vary in height and width based on the data’s variance and standard deviation. Even though the means in all three scenarios are the same distance apart, the significance of that difference varies. In the bottom, low-variability scenario, there’s almost no overlap between the dark and light distributions. Contrast this with the high-variability middle diagram, where the two distributions overlap greatly and the same difference in means isn’t as significant.
A straight comparison of the means, or just the numbers, wouldn’t capture this variance. This is why you need the test statistic, which captures the difference in means relative to the variability, or data spread:

test statistic = (difference between group means) / (variability of the groups)

The top part of the formula looks at the difference in means, while the bottom part accounts for the data variability.
Step 3: Computing the p-value
A z-score far from 0 is unlikely to happen by chance. Quantifying this “unlikeliness” is a way to say how certain you are that the result wasn’t random, giving a statistical certainty, or significance level, to your confidence in the results. This is where the p-value comes into play.
The p-value is a measure of the strength of the evidence against the null hypothesis. Recall that the null hypothesis is the statement you believe to be false. You want to test how strong a case you can make against it. Smaller p-values indicate more reliable, or significant, results because the p-value measures the probability of obtaining a result at least as extreme as yours purely by chance, assuming the null hypothesis is true. In other words, a low p-value means your sample result would be very unlikely if the null hypothesis were actually true. Scratching your head yet?
Because you’re working with sample data, your data won’t always reflect the entire population. Think of it this way: in flipping a coin, you have a 50-50 chance of getting heads. If you were to flip a coin twice in a row, you might expect to get heads once and tails once (because both outcomes are equally likely); however, this isn’t actually the case. Every time you flip a coin, you still have a 50-50 chance of getting heads—it doesn’t increase if you get tails. The more times you flip a coin, the more likely the split between heads and tails outcomes will be 1:1. In smaller samples, however, you may have more heads than tails or more tails than heads. Say that you flip two heads in a row. It’s the p-value that will tell you how likely it is that those two heads are due to random chance or due to something else, such as the coin being weighted.
Let’s return to the Rossmann promotion example. Using the z-score we calculated earlier (z ≈ 2.53), you now compute the p-value (you’ll learn how to do this in Excel shortly!). Suppose the p-value comes out to 0.0057. What does this mean? It means that if the null hypothesis were true (promotions have no effect and average sales really are €10,000), there would be only a 0.57% chance of observing a sample mean as high as €11,200 due to random chance alone.
This is a little complicated! So let’s break it down even more. Remember that the null hypothesis is what you believe to be false—you actually believe promotions increase daily sales above €10,000. And from your sample, you found that the mean sales during promotions came out to €11,200. How do you know that this €11,200 isn’t just due to chance and is, in fact, due to promotions actually boosting sales beyond €10,000? This is where the p-value comes in. If promotions truly had no effect, a sample mean as high as €11,200 would occur by chance only 0.57 percent of the time. That’s strong evidence that the sample results reflect a real effect of promotions (i.e., sales higher than the baseline average of €10,000).
In this way, a lower p-value indicates a more meaningful result because it’s less likely to be caused by random chance, or “noise.”
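For reference, the one-tailed p-value for a z-test is the upper-tail probability of the standard normal distribution. This SciPy sketch, an illustration rather than part of the Excel workflow, reproduces the 0.0057 figure:

```python
from scipy import stats

# Probability of observing a z-score of 2.53 or higher if H0 were true:
# the survival function (1 - CDF) of the standard normal distribution.
z = 2.53
p_value = stats.norm.sf(z)
print(round(p_value, 4))  # ≈ 0.0057
```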
Step 4: Comparing the p-value to the Significance Level
The p-value helps determine the significance by relating it to the significance level (i.e., alpha, or α). The significance level is the degree of certainty you want, usually determined before any hypothesis testing begins. Because you’re using sample data, which will never exactly match population data, 100 percent certainty simply isn’t possible. This is why a more common standard is used instead: a 5 percent level of significance, or α = 0.05. This is also referred to as a 95 percent confidence level. Likewise, if you have a significance level of 0.1, or α = 0.1, you have a confidence level of 90 percent, and so on.
If the p-value is greater than the significance level, this means you’ve failed to reject your null hypothesis (the results weren’t significant enough). If the p-value is less than or equal to the significance level, however, then you’ve successfully rejected your null hypothesis (the results were significant). Either way, you need to compare your p-value to the significance level. You’ll cover this process in more detail shortly. For now, just focus on how the p-value and significance level relate to the null hypothesis as you review the example below:
Your alternative hypothesis was set up for a one-tailed test, and you decided in advance to use a significance level (α) of 0.05.
HA: μ > 10,000
For the null hypothesis, you want to determine how likely it would be to observe a sample mean this high if the true population mean were €10,000 or less. If that probability is small enough, you reject the null hypothesis.
H0: μ ≤ 10,000
When comparing the p-value to the significance level, α, you find that the p-value is less than α (0.0057 < 0.05). Therefore, you reject the null hypothesis. You can say, with at least 95 percent confidence, that promotions do, in fact, increase average daily sales beyond €10,000.
Great! You’ve now run through the entire process of forming null and alternative hypotheses, identifying the test statistic (in this case, the z-score), computing the p-value, and comparing the p-value with the significance level. By following these steps, you can know whether to reject—or not reject—your null hypothesis.
To follow up, let’s take a look at test statistics in more detail.
3. T-tests and T-scores
Until now, you’ve been using the z-score (and, subsequently, the z-test) to test your hypothesis, but there are a few limitations to z-scores you should know. First, to calculate a z-score, the data ideally needs to be normally distributed, or at the very least, the sample size should be greater than 30 (from the Central Limit Theorem). With larger samples, the sampling distribution of the mean tends to approximate a normal distribution, even if the underlying sales data isn’t perfectly normal.
Second, the z-test requires knowing the population variance or standard deviation. In most real-world cases, we don’t know the population standard deviation, which makes the z-test method impractical.
If you already had full population data, you wouldn’t need inferential statistics at all — you could just calculate the exact mean and variance. But since we usually only have a sample of sales data, we need a method that accounts for this uncertainty. That’s where the t-test comes in. The t-test uses the sample standard deviation instead of the population standard deviation and adjusts for smaller sample sizes, making it ideal for analyzing real sales data from a subset of stores or months.
3.1. Student’s T-distribution
With z-tests, you needed to use the normal distribution, but with t-tests, you’ll use Student’s t-distribution. While similar to the normal distribution, the t-distribution has wider tails, because the population variance is unknown (see Figure 2).
The larger the sample size, the more the t-distribution will look like the normal distribution.
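To see those wider tails numerically, you can compare the one-tailed tail probability of the same score under the t-distribution (for a few sample sizes) and under the normal distribution. This SciPy snippet is illustrative only:

```python
from scipy import stats

# Tail probability of a score of 2.0 under the t-distribution,
# for increasing degrees of freedom (roughly, sample size - 1).
for df in (5, 30, 1000):
    print(df, round(stats.t.sf(2.0, df), 4))

# The same tail probability under the standard normal distribution.
print("norm", round(stats.norm.sf(2.0), 4))
```

Notice that with few observations the t-distribution assigns a noticeably larger probability to the tail (the same score is "less surprising"), and that as the degrees of freedom grow, the t-distribution's tail probability converges to the normal distribution's.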
What’s up with the name Student’s t-distribution?
Curious where the “student” comes from in Student’s t-distribution? The t-distribution was initially discovered by the statistician William Gosset back in 1908, but Gosset’s employer, Guinness, didn’t allow him to publish his results under his own name, which prompted him to publish everything under the name “Student” instead. To this day, the name has stuck and become the official name for the t-distribution.
3.2. Types of T-tests
While there are several versions of the t-test, they all share one common feature: they use the t-score (or t-statistic) to compare means. Let’s go through the three most common types using Rossmann store sales examples.
1. Two-sample t-test
The two-sample t-test compares the means of two independent groups. It comes in two types: equal and unequal variance. Assuming unequal variance is more conservative (less likely to falsely detect significance) and is generally a safe choice.
Rossmann example: Suppose we want to compare average daily sales during promotion days vs. non-promotion days across multiple stores.
- Null Hypothesis (H₀): Average sales on promotion days = average sales on non-promotion days.
- Alternative Hypothesis (Hₐ): Average sales on promotion days > average sales on non-promotion days.
2. One-sample t-test
The one-sample t-test compares the mean of a single group against a known or hypothesized population mean.
Rossmann example: Suppose management believes the average daily sales per store should be €10,000. We take a sample of stores to test this assumption.
- Null Hypothesis (H₀): Average daily sales = €10,000.
- Alternative Hypothesis (Hₐ): Average daily sales > €10,000.
3. Paired t-test
The paired t-test compares the means from the same group at two different times. This is relevant for repeated measurements or “before-and-after” comparisons.
Rossmann example: Suppose we measure average daily sales for the same stores before and after a major holiday promotion to see if the promotion increased sales.
- Null Hypothesis (H₀): Average daily sales after promotion = average daily sales before promotion.
- Alternative Hypothesis (Hₐ): Average daily sales after promotion > average daily sales before promotion.
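The three test types above map directly onto three SciPy functions. The sketch below is a hedged illustration with invented sales figures; the lesson itself uses Excel's equivalents:

```python
from scipy import stats

# Hypothetical daily sales figures (EUR) for each of the three setups.
promo =    [7700, 7900, 7600, 7850, 7800]
no_promo = [4100, 4250, 4150, 4200, 4180]
before =   [4100, 4250, 4150, 4200, 4180]   # same stores, before the promotion
after =    [7700, 7900, 7600, 7850, 7800]   # same stores, after the promotion

# 1. Two-sample t-test (Welch's, unequal variance), one-tailed:
t2, p2 = stats.ttest_ind(promo, no_promo, equal_var=False, alternative="greater")

# 2. One-sample t-test against a hypothesized mean of EUR 10,000, one-tailed:
t1, p1 = stats.ttest_1samp(promo, popmean=10_000, alternative="greater")

# 3. Paired t-test on the same stores before vs. after, one-tailed:
tp, pp = stats.ttest_rel(after, before, alternative="greater")

print(p2, p1, pp)
```

With these invented numbers, the two-sample and paired tests come out highly significant, while the one-sample test fails to reject its null (the promo-day mean of about €7,770 is well below the hypothesized €10,000), which is a useful reminder that each test answers a different question.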
3.3. T-score
The overall structure of hypothesis testing stays the same: you still need a null hypothesis, an alternative hypothesis, and a significance level. What changes is the test statistic. In z-tests, you used the z-score; in t-tests, you’ll use the t-score.
The formula for the t-score looks almost identical to the z-score, but with one important difference: instead of using the population standard deviation, it uses the sample standard deviation to calculate the standard error. The standard error accounts for both the variability in the sample and the number of observations (sample size).
t = (x̄ − μ₀) / (s / √n)

Where:
- x̄ = sample mean
- μ₀ = population mean under H₀
- s = sample standard deviation
- n = sample size
At first glance, this looks just like the z-score. The key difference is that the z-score assumes you know the true population standard deviation—which is rarely the case in practice. The t-score, by contrast, substitutes the sample’s own standard deviation, which makes it more flexible but also more uncertain. That extra uncertainty is why the t-distribution is a bit wider than the normal distribution, especially for small sample sizes.
A larger t-score means the observed mean is farther away from the null hypothesis mean in terms of standard errors—making it less likely the result is due to chance. A smaller t-score means the sample mean is closer to what you’d expect under H₀, suggesting any difference is likely just random noise.
Like the z-score, you can calculate the t-score directly in Excel, but in practice you’ll usually run the full t-test in one step instead of computing the statistic separately.
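As a quick illustration (with an invented sample of promo-day sales), you can verify that computing the t-score by hand matches what a one-sample t-test reports:

```python
import math
from statistics import mean, stdev
from scipy import stats

# Hypothetical sample of promo-day sales (EUR), tested against H0: mu = 10,000.
sales = [10_900, 11_400, 10_800, 11_600, 11_200, 11_300]

x_bar = mean(sales)
s = stdev(sales)          # sample standard deviation (divides by n - 1)
n = len(sales)

# Manual t-score: distance from the H0 mean in standard errors.
t_manual = (x_bar - 10_000) / (s / math.sqrt(n))

# The same statistic from SciPy's one-sample t-test.
t_scipy, p = stats.ttest_1samp(sales, popmean=10_000)
print(round(t_manual, 3), round(t_scipy, 3))  # the two t-scores match
```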
4. Using Your Results
In the Rossmann case, your research hypothesis was that promotions would increase average daily sales compared to days without promotions. The statistical test confirmed this:
- Mean sales without promotions: ≈ €4,177
- Mean sales with promotions: ≈ €7,767
- t-statistic: -178 (huge in magnitude)
- p-value: effectively 0 (well below any standard α, such as 0.05 or even 0.01)
This tells you with extremely high confidence that promotions significantly increase sales. In other words, the uplift you see isn’t due to random variation in the data—it’s a genuine effect.
For business stakeholders, this result validates the assumption that promotions drive additional revenue. The size of the uplift is also meaningful: sales nearly double on promotion days compared to non-promotion days. This is exactly the kind of evidence decision-makers look for when allocating budgets, designing campaigns, or evaluating marketing ROI.
At this point, the next steps aren’t about questioning whether promotions work (the hypothesis test has answered that convincingly), but rather:
- How much do promotions pay off in net terms? (considering costs, margins, and cannibalization of future sales)
- Are some types of promotions more effective than others?
- Do all stores see this effect, or is it concentrated in certain regions or store formats?
At a significance level of α = 0.05 (95% confidence), you found a highly significant difference in sales between promotion and non-promotion days. Specifically, promotions increase average daily sales by over €3,500 per store.
4.1. What if the result had been insignificant?
If the t-test had returned a p-value greater than your significance level (e.g., p > 0.05), this would indicate that the observed difference in sales could plausibly be due to random variation. In that case:
- The hypothesis that promotions increase sales would not be supported by the data.
- You’d need to investigate further: maybe certain stores or time periods show no effect, or the promotion type wasn’t effective.
- You could also consider whether additional data or other variables (seasonality, competitor actions, product availability) might reveal a hidden pattern.
Even in this case, the process is valuable—your stakeholders understand that the tested hypothesis either holds or doesn’t, and further analysis can be guided accordingly.
4.2. Communicating Findings to Stakeholders
Once you have your results, it’s crucial to present them clearly and effectively to business stakeholders. Focus on translating statistical outcomes into actionable business insights. Highlight key numbers—such as mean sales with and without promotions, the t-statistic, and the p-value—while explaining what they signify in plain language. Use visuals like simple charts or tables to illustrate the impact, emphasizing the magnitude and direction of the effect. Always contextualize findings: explain not only whether the hypothesis was supported, but also what it means for decision-making, planning, and potential next steps. Clear communication ensures that stakeholders understand the significance of your analysis and can confidently use it to guide strategy.
Consider below a sample report sent to the Rossmann leadership with the findings and recommendations.
5. Bringing it Together
The above theory is demonstrated in a video walkthrough below. You’ll see:
- how to create the null and alternative hypotheses and prepare data for the Rossmann sales-promo example above,
- how to use Excel to generate test statistics, and
- how to interpret these results.
Video 1: Hypothesis Testing using Excel
Summary
In this lesson you learned to:
- Understand inferential statistics and when to use samples to draw conclusions about a population.
- Formulate null and alternative hypotheses for one-tailed and two-tailed tests.
- Calculate and interpret t-scores in Excel.
- Compute p-values and compare them against significance levels to make decisions.
- Translate statistical test results into insights about sales performance.
- Evaluate the effect of holidays, promotions, or other events on store sales.
- Provide evidence-based recommendations to stakeholders.
Suggested Readings & References
Exercise
Estimated Time to Complete: 1-2 hours
Scenario: Holiday Impact on Walmart Sales
Walmart wants to understand whether sales are higher during holiday weeks compared to regular weeks. This will help them plan staffing, inventory, and promotions more effectively.
Objective: Use historical weekly sales data to test whether holidays have a significant effect on sales.
Task 1 – Formulate Your Hypotheses
- Identify the independent variable (Holiday vs. Non-Holiday) and the dependent variable (Weekly Sales).
- Formulate null and alternative hypotheses for whether weekly sales are significantly different during holiday weeks.
- Decide whether a one-tailed or two-tailed test is appropriate and justify your choice in one or two sentences.
Task 2 – Conduct the Statistical Test in Excel
- Use the walmart-sales file and the t-test of your choice to compare sales between holiday and non-holiday weeks.
- Report the key statistics: mean sales for each group, t-statistic, p-value, and sample sizes.
- Ensure your analysis is reproducible and organized so someone else could follow your logic.
Task 3 – Interpret and Present Your Results
- Based on your statistical results, decide whether to reject or fail to reject the null hypothesis.
- Summarize the business implications of your findings: staffing, inventory, or marketing decisions that could be influenced by holiday sales patterns.
- Present your conclusions in a short paragraph suitable for a stakeholder report — no raw numbers, just insights and recommendations.
Submission Guidelines
Submit your solution as an Excel workbook and a Word/PPT/PDF:
Workbook:
- appropriate worksheets with analysis
Document:
- executive reports and reflections
Filename Format:
- YourName_Lesson9_Walmart_HypothesisTesting.xlsx
- YourName_Lesson9_Walmart_Reflections.xxx
When you’re ready, submit your completed exercise to the designated folder in OneDrive. Drop your mentor a note about submission.
Important: Please scan your files for viruses before uploading.
Submission & Resubmission Guidelines
- Initial Submission Format: YourName_Lesson9_…
- Resubmission Format:
- YourName_Lesson9_…_v2
- YourName_Lesson9_…_v3
- Rubric Updates:
- Do not overwrite original evaluation entries
- Add updated responses in new “v2” or “v3” columns
- This allows mentors to track your improvement process
Evaluation Rubric
| Task | Exceeds Expectation | Meets Expectation | Needs Improvement | Incomplete / Off-Track |
|---|---|---|---|---|
| 1. Formulate Hypotheses | Null and alternative hypotheses are correctly formulated, with nuanced justification for one-/two-tailed choice, explicitly link business context (holiday impact) to variables, and consider possible confounding factors. | Hypotheses are correct, clearly stated, aligned with scenario, and demonstrate professional-level understanding of statistical testing principles. | Hypotheses are partially correct or vague; some variable relationships or test direction unclear. | Hypotheses are missing, incorrect, or irrelevant to the scenario. |
| 2. Perform Excel Analysis | Correct t-test(s) are applied; selects most appropriate type; handles sample data with care; identifies and comments on any anomalies or outliers; clearly documents steps for reproducibility. | T-test(s) are correctly applied, sample data is properly used, and outputs (t-statistic, p-value) are correctly extracted without errors. | T-test(s) are misapplied, outputs misinterpreted, or sample selection flawed. | Analysis missing, Excel not used appropriately, or outputs irrelevant. |
| 3. Interpret and Present Results | Provides in-depth interpretation connecting statistical outcomes to actionable business decisions; identifies limitations; communicates professionally for stakeholders with clear visual or textual summary; suggests thoughtful next steps. | Interpretation is accurate, complete, and clearly communicates the impact of holidays on sales; recommendations are reasonable. | Interpretation is incomplete, partly inaccurate, or business implications unclear. | Interpretation missing, incorrect, or fails to connect results to business context. |