Lesson 8 - Integrating Data Using Excel Lookups

Learn how to integrate multiple datasets in Excel and transform them into a single, analysis-ready view using lookup functions.

Estimated Read Time: 45 - 60 Min

Learning Goals

By the end of this lesson, you will be able to:

Technical & Analytical
  • Use VLOOKUP and XLOOKUP to integrate data across multiple sheets or files.
  • Understand HLOOKUP and INDEX-MATCH as additional lookup options.
Business Impact
  • Understand why merging datasets is often necessary to answer business questions.

In business analytics, it is rare for a single dataset to contain all the information needed to answer a question comprehensively. The data is often scattered in multiple files and you need to find a way to merge them so that you can generate insights. For example, the Rossmann sales data (rossmann-sales.xlsx) contains daily sales for each store, along with information such as customer traffic, store open status, promotions, and holidays. However, detailed information about the stores themselves – such as store type, assortment, or proximity to competitors – is stored separately in another file: rossmann-store.xlsx.

To answer questions like, “Do stores with extended assortments generate higher sales?” or “Does competition distance impact sales?”, we must combine these two datasets. Excel’s lookup functions allow us to do exactly this: merge data based on a common key, in this case, the store ID.

 

1. Deeper look into Rossmann Sales Dataset

Let’s look at a Rossmann sales dataset where we have two different files – sales and stores – with different kinds of information related to Rossmann sales and stores data. Let’s examine these files and see how we can make the most of them by combining the information.

Sales Data File Contains:

  • Store ID (1 to 100 for the 100 stores)
  • Day of the week (1=Monday, 2=Tuesday… 7=Sunday)
  • Dates for which sales and customer traffic is recorded
  • Store open status (1=open, 0=closed)
  • Promotion status (1=yes, 0=no)
  • State holiday information (0=no holiday, A=public holiday, B=Easter, C=Christmas)
  • School holiday impact (1=yes, 0=no)

 

Stores Data File Contains:

  • Store ID (corresponding to detailed store information)
  • Store type (A, B, C, D – representing different location types)
  • Assortment types (A=basic, B=extra, C=extended)
  • Competition distance (how far away is nearest competition in meters)
  • Competition opening date (year and month since competition opened)
  • Promo2 information (participation in ongoing repeated promotions, date since joined and when do promotions run every year)

 

2. VLOOKUP: Vertical Lookup Function

Now let’s answer our first business question: Do stores with extended assortment generate higher average sales compared to those with basic assortment?

The parameters important for this analysis are:

  • Store (because we want store information)
  • Assortment (the factor we’re testing)
  • Sales (the outcome we’re measuring)

 

Store ID is our common parameter – it exists in both files and will be our key for combining data.

Follow the video walkthrough (recommended) for using VLOOKUP and answer the business question. Alternatively, you can read below.

Video 1: Do Store Types Impact Sales? VLOOKUP to answer business questions

2.1. Data Preparation and Cleaning

Before we can answer business questions, we need to clean our data. Let’s make our assortment labels more meaningful by replacing codes with descriptive names.

Cleaning Process:

  1. Select the assortment column
  2. Use Find & Replace (Ctrl+H):
    • Find “a” → Replace with “Basic” (ensure “Match entire cell contents” is checked)
    • Find “b” → Replace with “Extra” 
    • Find “c” → Replace with “Extended”

 

This cleanup process is crucial because meaningful labels make analysis more intuitive and reduce errors in interpretation.

 

2.2. VLOOKUP Formula Structure:

=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])

 

2.3. Step-by-Step Implementation:

Start by first copying the store data from rossmann-store.xlsx to a new sheet in rossmann-sales.xlsx. Let’s call this sheet ‘store-data’. Then follow the steps below.

  1. Create the formula: =VLOOKUP(A2;’store-data’!A:C; 3; FALSE)
    • A2: The Store ID we want to look up
    • ‘store-data’!A:C: The table range containing Store ID and assortment data
    • 3: Return value from the 3rd column (assortment type)
    • FALSE: Exact match required
  1. Verify the result: Check that Store ID 57 returns “Extended” by cross-referencing the stores file
  2. Copy the formula down: Select the cell and drag to apply the formula to all rows

 

Cross-File VLOOKUP: Sometimes, it’s not reasonably possible to copy the data in the same file before doing lookup and you may want to access the data from its original file. When data is in separate files, modify the formula:

=VLOOKUP(A2, ‘[rossmann-store.xlsx]Sheet1’!A:C, 3, FALSE)

Where, our store data is in Sheet1 in rossmann-store.xlsx file. Notice that only second parameter in the lookup formula has changed in this case to point to the right file and the right sheet.

 

2.4. Analyzing Results with Pivot Tables

Once data is integrated, use pivot tables to answer the business question: “What is the impact of assortments on sales?”

  1. Insert Pivot Table → New Worksheet
  2. Rows: Assortment types
  3. Values: Sales (use Average, not Sum)

 

Results Interpretation:

  • Extended stores: 6153 average sales
  • Basic stores: 5025 average sales
  • Conclusion: Extended stores generate higher average sales than basic stores

 

3. XLOOKUP: The Modern Lookup Solution

One of the biggest limitation of VLOOKUP is the rigid requirement  of how the columns are arranged. The column where the data needs to be searched (e.g. store) needs to be the leftmost column in the table. Next you need to count and specify the index of column from where you’ll pick up the value (3 in the example above). XLOOKUP addresses these (and manny other) limitations of VLOOKUP and provides a more intuitive approach to data integration.

 

Let’s use XLOOKUP to answer the Business Question: What is the impact of competition distance on sales?

 

You can follow the video walkthrough (recommended) for using XLOOKUP and answer the business question. Alternatively, you can read below.

Video 2: How does Competition Impact Sales of Retail Stores? XLOOKUP to answer business questions

3.1. Data Preparation for Competition Analysis:
  1. Identify missing values: Use filters to find blank competition distances
  2. Handle missing data: Replace blanks with reasonable values (e.g., average distance of 5400m)
  3. Document assumptions: Note why you chose specific replacement values

 

3.2. XLOOKUP Formula Structure:

=XLOOKUP(lookup_value, lookup_array, return_array, [if_not_found])

 
3.3. Step-by-Step Implementation:

We have our store’s data in the sheet ‘store-data’. Follow the steps below for integrating the competition information in sales-data sheet.

  1. Create the formula: =XLOOKUP(A2; ‘store-data’!A:A; ‘store-data’!D:D; “No data”)
    • A2: The Store ID we want to look up
    • ‘store-data’!A:A: The table range containing Store ID 
    • ‘store-data’!D:D: Return value from column D (CompetitionDistance)
    • “No data”: What to return in case a store ID match is not found
  1. Verify the result: Check that Store ID  returns the right distance cross-referencing the stores file
  2. Copy the formula down: Select the cell and drag to apply the formula to all rows

 

Key Advantages:

  • No column counting required
  • Can return values from any column regardless of position
  • Built-in error handling with custom messages
  • More readable and maintainable formulas

 

3.4. Advanced Analysis: Competition Impact on Sales

After integrating competition data using XLOOKUP, create a pivot table to analyze patterns:

  1. Rows: Competition Distance
  2. Values: Sales (Average)
  3. Group distances into meaningful buckets:
    • 0-5 km: Close competition
    • 5-10 km: Moderate distance
    • 10-15 km: Far competition
    • 15-20 km: Very far
    • 20+ km: No significant competition

 

Analysis Findings: The data shows no clear pattern between competition distance and sales, indicating that proximity to competitors may not be the primary factor affecting store performance.

 

4. HLOOKUP: Horizontal Lookup Function

HLOOKUP searches horizontally across the first row of a table and returns a value from a specified row below.

When to Use HLOOKUP:

  • Data organized with categories across the top row
  • Time series data with dates/periods as column headers
  • Cross-tabulated reports where data flows horizontally

 

Syntax: =HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup])

 

Example Scenario (Not Rossmann-related): Imagine we need Monthly sales summary where we have months as columns

=HLOOKUP(“March”, MonthlySales!A1:M5, 3, FALSE)

This looks for “March” in the first row and returns the value from the 3rd row of that column.

 

Key Differences from VLOOKUP:

  • Searches horizontally instead of vertically
  • Uses row_index_num instead of col_index_num
  • Table organization must have lookup values in the top row

 

5. INDEX MATCH

INDEX MATCH provides a flexible and powerful lookup solution by combining two functions: INDEX and MATCH.

Why INDEX MATCH Works:

  • Direction Independence: Can look left or right of the lookup column
  • Performance: More efficient for large datasets
  • Flexibility: Easy to modify when table structure changes
  • Multi-Criteria Capability: Can handle complex lookup scenarios

 

Basic INDEX MATCH Syntax:

=INDEX(return_range, MATCH(lookup_value, lookup_range, 0))

 

Example:

It works by first matching the right element, followed by extracting value at the returned index:

=INDEX(‘store-data’!C:C;MATCH(A2;’store-data’!A:A;0))

corresponding to above xlookup example:

=XLOOKUP(A2; ‘store-data’!A:A; ‘store-data’!D:D; “No data”) 

 

Two-Way Lookup with INDEX MATCH:

For scenarios requiring both row and column matching:

=INDEX(data_range, MATCH(row_criteria, row_range, 0), MATCH(col_criteria, col_range, 0))

 

Multiple Criteria INDEX MATCH:

=INDEX(return_range, MATCH(1, (criteria1_range=criteria1)*(criteria2_range=criteria2), 0))

(Enter as array formula with Ctrl+Shift+Enter in older Excel versions)

 

6. Best Practices and Common Pitfalls

6.1. Formula Best Practices:
  1. Always use FALSE for exact matches unless you specifically need approximate matching
  2. Use absolute references ($) for table arrays when copying formulas
  3. Implement error handling with IFERROR: =IFERROR(VLOOKUP(…), “Not Found”)
  4. Test formulas with known values before applying to entire datasets

 

6.2. Performance Optimization:
  • INDEX MATCH generally performs better than VLOOKUP on large datasets
  • Avoid volatile functions within lookup formulas
  • Consider using Excel Tables for dynamic range management
  • Sort data when using approximate match lookups

 

6.3. Data Quality Checks:
  • Verify that lookup keys are unique in the source table
  • Check for leading/trailing spaces that prevent matches
  • Ensure consistent data types between lookup and source columns
  • Validate a sample of results manually

 

7. Choosing the Right Lookup Function

7.1. Use VLOOKUP when:
  • Simple vertical lookup needed
  • Return column is to the right of lookup column
  • Working with legacy Excel versions

 

7.2. Use HLOOKUP when:
  • Data is organized horizontally
  • Lookup values are in the top row
  • Working with cross-tabulated data

 

7.3. Use INDEX MATCH when:
  • Need flexibility in lookup direction
  • Working with large datasets
  • Need to lookup left of the search column
  • Require multi-criteria lookups

 

7.4. Use XLOOKUP when:
  • Maximum flexibility required
  • Want built-in error handling
  • Working with Excel 365 or Excel 2021

 

8. Deeper Comparison of XLOOKUP vs INDEX-MATCH

As you noticed in example above, XLOOKUP can easily replace most of the tasks of INDEX-MATCH – sometimes in a more effective manner. Here is a detailed comparison on when to use which one:

 

8.1. When you can (and should) use XLOOKUP instead of INDEX+MATCH
  • You have Excel 365 or Excel 2021+ → older versions don’t have XLOOKUP.
  • You need cleaner formulas=XLOOKUP(lookup_value, lookup_array, return_array) is easier to read than nesting INDEX and MATCH.
  • You want left-lookups → unlike VLOOKUP, XLOOKUP can look left and right, so no need for INDEX+MATCH in this case.
  • You want flexible match modes → XLOOKUP has built-in options for:
    • Exact match (default)
    • Next smaller/larger item
    • Wildcards
  • You want built-in error handling → =XLOOKUP(…,”Not Found”) replaces the need for IFERROR around INDEX+MATCH.
  • You want multiple return columns → XLOOKUP can spill results (if your return array is multi-column), INDEX+MATCH can’t.

 

8.2. When INDEX+MATCH may still be better
  • Compatibility: If you (or colleagues) use Excel 2016 or older → INDEX+MATCH is safer.
  • Performance on very large datasets: In rare cases with hundreds of thousands of rows, INDEX+MATCH can be slightly faster, especially if optimized with exact-match on sorted data.
  • Array flexibility: INDEX+MATCH can be combined in advanced ways (like 2D lookups with row+column MATCH). XLOOKUP also supports this, but INDEX+MATCH may feel more natural in some matrix-style lookups.
  • Control over return reference: INDEX can return a reference rather than just a value, which allows more advanced formula chaining (XLOOKUP always returns values).

 

8.3. Rule of Thumb
  • Use XLOOKUP by default if you’re on a modern Excel.
  • Use INDEX+MATCH if:
    • You need backward compatibility
    • You’re doing unusual matrix tricks
    • You’re optimizing for legacy spreadsheets

 

In this lesson, we explored how Excel’s lookup functions enable business analysts to integrate scattered data and generate meaningful insights. We demonstrated how VLOOKUP can merge store assortment information with sales data to determine that extended assortment stores outperform basic assortment stores. We then used XLOOKUP to integrate competition data, discovering that competition distance shows no clear correlation with sales performance.

 

The key takeaway is that lookup functions are not just technical tools – they are essential business analysis capabilities that allow you to answer questions that require data from multiple sources. Whether you’re analyzing the impact of store characteristics on performance, customer behavior across different segments, or product success across various markets, mastering these lookup functions will enable you to uncover insights that drive strategic business decisions.

 

Remember that successful data integration requires both technical proficiency with the functions and analytical thinking about data quality, business context, and meaningful interpretation of results.

Summary

In this lesson you learned:

  • Business data often exists in multiple sources and must be merged to answer questions accurately.
  • VLOOKUP and XLOOKUP are essential tools for combining data based on a common key.
  • PivotTables allow you to summarize and extract insights from the integrated data.
  • Proper data cleaning, handling of missing values, and grouping are crucial for accurate analysis.
  • Understanding HLOOKUP and INDEX + MATCH provides additional flexibility for complex scenarios.

Exercise

Estimated Time to Complete: 1-2 hours

 

Dataset: 
  • Use walmart-sales.xlsx as the main fact table.

  • Use walmart-features.xlsx for contextual data (temperature, fuel price, holiday flags).

  • Use walmart-stores.xlsx for metadata (store type & size).


The details of data fields for each of the files are available at the  Lumen’s github repository.
 
Task 1: Bringing in Store Information

The sales dataset (walmart-sales) does not contain details about the store itself. Use a lookup function to add the Type and Size columns from walmart-stores into the sales dataset.

Hint: You’ll need to match on the Store column.

Reflection: Why might knowing whether a store is Type A or B (and its size) be important when comparing sales performance?

 

Task 2 – Enriching Sales with Features

Extend the sales dataset by adding Fuel_Price from walmart-features.

  • Use the combination of Store and Date as the match keys. (Hint: create a new column with combined value and use that as key)

  • Try with XLOOKUP (multi-criteria) or with INDEX+MATCH if you want to challenge yourself.

Reflection: Imagine you are preparing data for a forecasting model. Why would linking contextual variables like weather and fuel price improve the quality of your analysis?

 

Task 3 – Calculating Sales per Square Foot

Using your enriched dataset (with Store Size added from Task 1), calculate Weekly Sales per Square Foot.

  • Formula: Weekly_Sales ÷ Store Size.

  • Use lookups to ensure each sales row has the correct store size.

Reflection: Why might “sales per square foot” be a better comparison metric across stores than total sales?

 
Task 4 – Connecting All Tables

Now, bring it all together:

  • You started with  walmart-sales.

  • You’ve added Store Type and Size from walmart-stores.

  • You’ve also added Fuel_Price, and Unemployment from walmart-features.

Now, answer:

  • Which store type had the highest average weekly sales per square foot?

  • Did higher unemployment weeks correlate with lower sales? (Provide a short interpretation.)

  • How might fluctuations in weekly fuel prices influence Walmart store sales?

Reflection: This is a miniature version of what analysts do every day — stitching multiple sources into one coherent dataset. How does mastering lookup functions prepare you for larger-scale analytics projects?

Submission Guidelines

Submit your solution as an Excel workbook and a Word/PPT/PDF:

Workbook:
  • appropriate worksheets with analysis
Document:
  • executive reports and reflections

 

Filename Format:

  • YourName_Lesson8_Walmart_Lookups.xlsx
  • YourName_Lesson8_Walmart_Reflections.xxx

 

When you’re ready, submit your completed exercise to the designated folder in OneDrive. Drop your mentor a note about submission.

 

Important: Please scan your files for viruses before uploading.

 
Submission & Resubmission Guidelines
  1. Initial Submission Format: YourName_Lesson8_…
  2. Resubmission Format:
    • YourName_Lesson8_…_v2
    • YourName_Lesson8_…_v3
  3. Rubric Updates:
    • Do not overwrite original evaluation entries
    • Add updated responses in new “v2” or “v3” columns
    • This allows mentors to track your improvement process

 

Evaluation Rubric

Criterion

Exceeds Expectation

Meets Expectation

Needs Improvement

Incomplete / Off-Track

1. Bringing in Store Information

Clarity & Accuracy: Store details joined using the most efficient lookup (e.g., XLOOKUP or structured references).
Depth of Analysis: Explicitly connects store type/size to sales interpretation.
Visual Presentation & Formatting: Output is clean, professional, and analysis-ready.
Reflection & Storytelling: Strong narrative on why integrating metadata transforms raw sales into insight.
  • Store information correctly joined using an appropriate lookup method (VLOOKUP, INDEX-MATCH, or XLOOKUP).
  • Output is clear, well-labeled, and error-free.
  • Reflection notes that store details add meaningful context to sales. 
  • Lookup applied incorrectly, or results have errors/duplication.
  • Limited clarity in column labeling. Reflection superficial or generic.
  • Store details not integrated, or solution unusable.
  • No reflection.

2. Enriching Sales with Features

Clarity & Accuracy: Features integrated with well-chosen joins, ensuring data consistency.
Depth of Analysis: Insightful commentary on how new features (e.g., holiday, region) shape business questions.
Visual Presentation & Formatting: Data model tidy, with clear naming conventions.
Reflection & Storytelling: Draws a compelling link between enriched features and strategic retail decision-making.
  • Features correctly merged into sales table with appropriate joins.
  • Data is clean, labeled, and analysis-ready.
  • Reflection acknowledges why added attributes  improve analysis. 
  • Merge incomplete or inconsistently applied; unclear column naming. Reflection vague. 
  • Features missing or irrelevant; no reflection.

3. Calculating Sales per Square Foot

Clarity & Accuracy: Formula set up elegantly (e.g., dynamic references, reusable structure).
Depth of Analysis: Connects metric to store productivity and benchmarks insights across formats.
Visual Presentation & Formatting: Results formatted for immediate interpretation (e.g., 2 decimals, currency alignment).
Reflection & Storytelling: Explains retail strategy implications (e.g., high sales per sq. ft. indicating operational efficiency).
  • Formula applied correctly and consistently across dataset.
  • Results labeled clearly and formatted for readability.
  • Reflection recognizes that sales per sq. ft. enables fairer store comparisons. 
  • Errors in formula or inconsistent application; formatting unclear.
  • Reflection shallow or generic. 
  • Metric not calculated or incorrect; no reflection.

4. Connecting All Tables

Clarity & Accuracy: Tables integrated into a well-structured model (relationships or multiple lookups).
Depth of Analysis: Highlights analytical advantages of a unified data model (cross-feature insights, flexible pivots).
Visual Presentation & Formatting: Model or linked tables presented neatly with documentation.
Reflection & Storytelling: Persuasive narrative on the power of connected data for deeper business insight.

  • Tables successfully linked to allow combined analysis.
  • Relationships/joins clearly applied and documented.
  • Reflection explains why combining datasets enables richer analysis. 
  • Partial or inconsistent connections.
  • Unclear relationship between datasets.
  • Reflection weak or disconnected. 
  • Connections missing/incorrect; no reflection.

Got Feedback?

Drop us a line here.

Contact

Talk to us

Have questions or feedback about Lumen? We’d love to hear from you.