Top 40 Data Analytics Interview Questions (SQL, Excel, Power BI): The Ones They Actually Ask
I once interviewed a candidate for a data analyst role. His resume was solid. His certifications were impressive. I asked him to write a SQL query. A simple JOIN with a GROUP BY. He froze. Not the nervous pause kind of freeze. The kind where you can see someone realizing they have never actually written a query outside of a guided tutorial. After about thirty seconds, he said, "I usually just use the export button in the analytics tool."
I did not hire him. Not because he was not smart. Because the job required pulling data from databases that do not have export buttons. The tools are conveniences. The fundamentals are the job.
That experience, and many others like it, shaped how I think about data analytics interview questions. The real ones. Not the theoretical ones from some textbook. The questions that test whether you have actually done the work or just watched videos about it. SQL. Excel. Power BI. The three tools that run the business world.
I have put together forty of them. Split across the tools and some scenario questions at the end. Do not memorize the answers. Use them to check if you actually understand the concepts. If you can explain something clearly without jargon, you know it. If you cannot, the definition in your head is just decoration.
The SQL Questions You Cannot Escape
SQL is the most important tool in analytics. Period. If you cannot write a query, you cannot get data. If you cannot get data, you cannot analyze it. These questions test whether you have written real queries or just watched someone else write them.
What is SQL and why do data analysts use it?
Structured Query Language. It is how you communicate with databases. Analysts use it to pull data, clean data, join tables, and prepare data for analysis. Without SQL, you are dependent on other people to give you data. That dependency makes you slower and less valuable.
What is the difference between SELECT and SELECT DISTINCT?
SELECT returns all rows. SELECT DISTINCT returns only unique rows. Duplicates are removed. It is useful for finding unique values in a column, like all the different product categories in a sales table. But be careful. If you use DISTINCT on multiple columns, it looks for unique combinations, not unique values in each column.
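A quick way to see the multi-column behavior for yourself. This is a minimal sketch using Python's built-in sqlite3 module; the sales table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical data: an in-memory table invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Toys", "East"), ("Toys", "West"), ("Toys", "East"), ("Books", "East")],
)

# DISTINCT on one column: unique values of that column.
categories = conn.execute(
    "SELECT DISTINCT category FROM sales ORDER BY category"
).fetchall()

# DISTINCT on two columns: unique (category, region) combinations.
pairs = conn.execute(
    "SELECT DISTINCT category, region FROM sales ORDER BY category, region"
).fetchall()

print(categories)  # two categories
print(pairs)       # three combinations, because Toys appears in two regions
```

Toys shows up once in the first result and twice in the second. That is the trap.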
Explain the difference between WHERE and HAVING.
WHERE filters individual rows before any grouping happens. HAVING filters groups after GROUP BY has done its work. You use WHERE for things like "only show sales from last month." You use HAVING for things like "only show product categories that sold more than a hundred units." You cannot use aggregate functions like SUM or COUNT in WHERE. That is what HAVING is for.
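Here is a small sketch of both filters in one query, run through Python's sqlite3 module. The table, column names, and numbers are assumptions made up for the example.

```python
import sqlite3

# Hypothetical sales data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, sale_month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Toys", "2024-05", 80),
        ("Toys", "2024-05", 40),
        ("Books", "2024-05", 30),
        ("Books", "2024-04", 500),  # dropped by WHERE before grouping
        ("Games", "2024-05", 150),
    ],
)

rows = conn.execute(
    """
    SELECT category, SUM(units) AS total_units
    FROM sales
    WHERE sale_month = '2024-05'   -- row-level filter, runs before GROUP BY
    GROUP BY category
    HAVING SUM(units) > 100        -- group-level filter, runs after GROUP BY
    ORDER BY category
    """
).fetchall()

print(rows)
```

Books sold 500 units in April, but WHERE removes that row first, so Books never reaches the HAVING threshold.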
Write a query to find the second highest order value from a sales table.
There are a few ways. ORDER BY with LIMIT and OFFSET works in MySQL, PostgreSQL, and SQLite. SELECT DISTINCT order_value FROM sales ORDER BY order_value DESC LIMIT 1 OFFSET 1. A subquery works too: take the MAX of every value that is below the overall MAX. Or use a window function like DENSE_RANK. The interviewer wants to see that you can think beyond basic SELECT-FROM-WHERE.
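Two of those approaches, sketched with Python's sqlite3 module. The sales rows are invented, and the DENSE_RANK version assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical orders; note the tie at the top value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_value INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100), (2, 250), (3, 250), (4, 180)],
)

# Approach 1: sort the distinct values and skip the first one.
second_by_offset = conn.execute(
    "SELECT DISTINCT order_value FROM sales "
    "ORDER BY order_value DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# Approach 2: DENSE_RANK gives tied values the same rank with no gaps.
second_by_rank = conn.execute(
    """
    SELECT DISTINCT order_value
    FROM (
        SELECT order_value,
               DENSE_RANK() OVER (ORDER BY order_value DESC) AS rnk
        FROM sales
    ) AS ranked
    WHERE rnk = 2
    """
).fetchone()[0]

print(second_by_offset, second_by_rank)
```

The duplicate 250 is why DISTINCT and DENSE_RANK matter. Without them, the "second" row would just be the other 250.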
What is a JOIN and what types have you used?
A JOIN combines rows from two or more tables based on a related column. INNER JOIN returns only matching rows. LEFT JOIN returns all rows from the left table and matching rows from the right. Where a left row has no match, the right-side columns show as NULL. RIGHT JOIN is the reverse. FULL OUTER JOIN returns all rows from both tables, matched where possible. I use LEFT JOIN the most. Most real-world queries involve some kind of JOIN.
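The INNER versus LEFT difference is easiest to see side by side. A minimal sketch with Python's sqlite3 module; the customers and orders tables are hypothetical.

```python
import sqlite3

# Hypothetical tables: Cy is a customer with no orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben"), (3, "Cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 50), (11, 1, 70), (12, 2, 20)])

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id "
    "ORDER BY c.name, o.amount"
).fetchall()

# LEFT JOIN: every customer; order columns are NULL (None) when unmatched.
left_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id "
    "ORDER BY c.name, o.amount"
).fetchall()

print(inner_rows)
print(left_rows)
```

Cy disappears from the INNER JOIN and comes back with a NULL amount in the LEFT JOIN. That missing row is the classic source of "why is my customer count wrong" bugs.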
What is a subquery? When would you use one?
A query nested inside another query. Used in WHERE, FROM, or SELECT clauses. You might use a subquery to filter based on a calculation, like finding customers whose average order value is above the overall average. They are powerful. They can also be slow if written badly. Sometimes a JOIN or a CTE is a better choice.
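The above-average example can be sketched like this, using Python's sqlite3 module and an invented orders table.

```python
import sqlite3

# Hypothetical orders; the overall average amount is 144.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100), (1, 300), (2, 50), (2, 70), (3, 200)],
)

rows = conn.execute(
    """
    SELECT customer_id, AVG(amount) AS avg_amount
    FROM orders
    GROUP BY customer_id
    HAVING AVG(amount) > (SELECT AVG(amount) FROM orders)  -- the subquery
    ORDER BY customer_id
    """
).fetchall()

above_average_customers = [customer_id for customer_id, _ in rows]
print(above_average_customers)
```

The inner SELECT runs once and produces a single number. The outer query compares each customer's average against it.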
What are aggregate functions? Name a few.
Functions that perform a calculation on a set of rows and return a single value. COUNT, SUM, AVG, MIN, MAX. They are almost always used with GROUP BY. The GROUP BY defines the groups. The aggregate function calculates the summary for each group.
What is GROUP BY and why is it important?
It groups rows that have the same values in specified columns. It is how you summarize data. Sales by region. Average order value by customer segment. Count of users by signup month. Most analytical queries use GROUP BY. Without it, you are just looking at raw rows.
What is the difference between COUNT(*) and COUNT(column)?
COUNT(*) counts all rows, including rows with NULL values. COUNT(column) counts only rows where that column is not NULL. The difference matters when your data has missing values. If you are counting customers and some records have NULL customer IDs, COUNT(*) and COUNT(customer_id) will give you different numbers.
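A tiny demonstration of COUNT(*) against COUNT(column), assuming a made-up customers table with two missing IDs, run through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table: two of the four rows have a NULL customer_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (None, "Cy"), (None, "Di")],
)

all_rows, non_null_ids = conn.execute(
    "SELECT COUNT(*), COUNT(customer_id) FROM customers"
).fetchone()

print(all_rows, non_null_ids)
```

Same table, two different counts. Know which one your report is showing.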
How do you handle NULL values in SQL?
You cannot use equals to check for NULL. NULL is not a value. It is the absence of a value, so a comparison like column = NULL never evaluates to true. Use IS NULL or IS NOT NULL. COALESCE replaces NULLs with a specified default value. NULLIF returns NULL if two expressions are equal. These functions are essential for data cleaning.
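All four ideas fit in one short sketch. It uses Python's sqlite3 module and a hypothetical customers table where one phone number is NULL and another is an empty string.

```python
import sqlite3

# Hypothetical data: Ben's phone is NULL, Cy's is an empty string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "555-0100"), ("Ben", None), ("Cy", "")],
)

# "= NULL" never evaluates to true, so this matches nothing.
equals_null = conn.execute(
    "SELECT name FROM customers WHERE phone = NULL"
).fetchall()

# IS NULL is the correct test.
is_null = conn.execute(
    "SELECT name FROM customers WHERE phone IS NULL"
).fetchall()

# NULLIF turns empty strings into NULL, then COALESCE fills in a default.
cleaned = conn.execute(
    "SELECT name, COALESCE(NULLIF(phone, ''), 'unknown') "
    "FROM customers ORDER BY name"
).fetchall()

print(equals_null, is_null, cleaned)
```

The NULLIF and COALESCE combination is a common cleaning pattern when a source system mixes real NULLs with blank strings.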
What is a Common Table Expression (CTE)?
A temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It makes complex queries more readable. Instead of nesting five subqueries, you define each step as a CTE and then combine them. It is like breaking a big problem into smaller pieces.
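A minimal sketch of the step-by-step style, using Python's sqlite3 module. The sales table and the question (which months beat the average month) are made up for the example.

```python
import sqlite3

# Hypothetical sales rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01", 100), ("2024-01", 50), ("2024-02", 300), ("2024-03", 90)],
)

rows = conn.execute(
    """
    WITH monthly AS (              -- step 1: total per month
        SELECT month, SUM(amount) AS total
        FROM sales
        GROUP BY month
    ),
    benchmark AS (                 -- step 2: average of the monthly totals
        SELECT AVG(total) AS avg_total
        FROM monthly
    )
    SELECT m.month, m.total        -- step 3: months above the average
    FROM monthly AS m
    CROSS JOIN benchmark AS b
    WHERE m.total > b.avg_total
    ORDER BY m.month
    """
).fetchall()

print(rows)
```

Each CTE reads like one sentence of the plan. The same logic as nested subqueries would be much harder to review.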
Explain window functions and give an example.
Functions that perform calculations across a set of rows related to the current row. Unlike GROUP BY, they do not collapse rows. You still see every row, plus the calculation. ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM with OVER. Example. Calculating a running total of sales by month. Or ranking customers by order value within each region.
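Both examples, sketched with Python's sqlite3 module. The tables are hypothetical, and this assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical monthly totals for the running-total example.
conn.execute("CREATE TABLE monthly_sales (month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)",
                 [("2024-01", 100), ("2024-02", 150), ("2024-03", 50)])

# Every row is kept; the running total is added alongside it.
running = conn.execute(
    "SELECT month, amount, SUM(amount) OVER (ORDER BY month) AS running_total "
    "FROM monthly_sales ORDER BY month"
).fetchall()

# Hypothetical customer order values for the ranking example.
conn.execute("CREATE TABLE customer_orders (region TEXT, customer TEXT, order_value INTEGER)")
conn.executemany("INSERT INTO customer_orders VALUES (?, ?, ?)",
                 [("East", "Ana", 500), ("East", "Ben", 300),
                  ("West", "Cy", 400), ("West", "Di", 400)])

# PARTITION BY restarts the ranking inside each region; ties share a rank.
ranked = conn.execute(
    "SELECT region, customer, "
    "RANK() OVER (PARTITION BY region ORDER BY order_value DESC) AS rnk "
    "FROM customer_orders ORDER BY region, customer"
).fetchall()

print(running)
print(ranked)
```

Notice that no rows disappear. That is the whole difference from GROUP BY.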
What is an index and why is it important?
A database structure that speeds up data retrieval. Like an index in a book. Without it, the database scans every row to find what you need. With it, the database knows exactly where to look. Indexes make queries faster. They also slow down inserts and updates because the index must be maintained. It is a tradeoff.
How would you optimize a slow-running query?
First, look at the execution plan. It shows where the time is going. Check if the query is scanning entire tables when it could use an index. Avoid SELECT *. Only pull the columns you need. Filter early with WHERE to reduce the data volume. Avoid functions on indexed columns in the WHERE clause because that prevents index usage.
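The last point is worth seeing in an actual plan. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The orders table and index name are made up, and the exact plan wording varies by SQLite version and by database engine.

```python
import sqlite3

# Hypothetical table with an index on the date column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")


def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


# Wrapping the indexed column in a function forces a scan of every row.
slow_plan = plan(
    "SELECT order_id, amount FROM orders "
    "WHERE strftime('%Y', order_date) = '2024'"
)

# A plain range on the column lets the planner use the index.
fast_plan = plan(
    "SELECT order_id, amount FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"
)

print(slow_plan)
print(fast_plan)
```

The first plan reports a scan. The second reports a search using idx_orders_date. Both queries answer the same question, and only one can use the index.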
Write a query to find duplicate records in a table.
GROUP BY the columns that should be unique. Use HAVING COUNT(*) greater than one. This returns the groups that appear more than once. This is a practical task that comes up constantly when cleaning data. Every analyst should be able to write this without thinking.
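Written out, the pattern looks like this. A minimal sketch with Python's sqlite3 module and a hypothetical customers table where one record was loaded three times.

```python
import sqlite3

# Hypothetical data: the Ana record appears three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("a@x.com", "Ana"), ("b@x.com", "Ben"), ("a@x.com", "Ana"), ("a@x.com", "Ana")],
)

duplicates = conn.execute(
    """
    SELECT email, name, COUNT(*) AS copies
    FROM customers
    GROUP BY email, name      -- the columns that should be unique together
    HAVING COUNT(*) > 1       -- keep only groups that appear more than once
    """
).fetchall()

print(duplicates)
```

Adding the count to the output tells you how bad the duplication is, not just that it exists.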
The Excel Questions That Still Matter
Excel is not going anywhere. People keep predicting its death. It keeps surviving. Many stakeholders live in Excel. Reports are shared as Excel files. Quick ad hoc analyses happen in Excel. Knowing it well is a career advantage.
What is VLOOKUP and what are its limitations?
It searches for a value in the first column of a range and returns a value from a specified column in the same row. Limitations. It only looks right, never left. It is slow on large datasets. It breaks if you insert or delete columns because it uses column index numbers. XLOOKUP fixes most of these problems. If you mention XLOOKUP, the interviewer knows you are current.
What is the difference between VLOOKUP and INDEX-MATCH?
INDEX-MATCH is more flexible. It can look left or right. It is not dependent on column position, so inserting a column does not break it. It can be faster on large datasets because it only reads the lookup and return columns. It is slightly harder to learn, which is why VLOOKUP remains popular despite being worse. Knowing both is a good signal.
What is a PivotTable and how do you use it?
A tool that summarizes large datasets interactively. Drag fields into rows, columns, values, and filters. It creates summaries without formulas. Sales by region. Count of orders by month. Average order value by customer segment. It is fast, flexible, and the first thing I use when exploring a new dataset in Excel.
How do you remove duplicates in Excel?
Data tab, Remove Duplicates button. Select the columns to check for duplicates. But before that, always make a copy of your data. Removing duplicates is destructive. You cannot undo it easily. Also understand what Excel considers a duplicate. It checks only the columns you selected, not necessarily the entire row.
What is conditional formatting and how have you used it?
It changes cell formatting based on rules. Highlight cells above a threshold. Color scales for values. Data bars for in-cell visualization. I use it to highlight outliers, flag values that need attention, and make tables scannable. It is simple but effective.
What are some common Excel functions you use regularly?
SUMIF, COUNTIF, AVERAGEIF for conditional calculations. Text functions like LEFT, RIGHT, MID, CONCATENATE for cleaning data. Date functions like EOMONTH, DATEDIF for date calculations. IFERROR to handle errors gracefully. Name the functions and briefly explain when you use each.
How do you create a chart in Excel and what makes a chart effective?
Select data, insert chart. An effective chart has a clear title, labeled axes, a readable scale, and no unnecessary decoration. Remove gridlines if they do not add value. Use color intentionally, not just for decoration. The chart should make the insight obvious within seconds.
What is the difference between a relative and an absolute cell reference?
Relative references change when you copy a formula. Absolute references, with dollar signs, stay fixed. A1 is relative. $A$1 is absolute. $A1 and A$1 are mixed. Not knowing this leads to broken formulas and wrong results. It is basic but crucial.
How do you handle a large dataset that slows down Excel?
Remove unnecessary formatting. Use tables instead of raw ranges. Avoid volatile functions like INDIRECT and OFFSET. Consider using Power Query to load and transform data without bringing it all into the sheet. If it is millions of rows, Excel is the wrong tool. Use SQL or Python.
What is Power Query and why is it useful?
A data connection and transformation tool built into Excel and Power BI. It lets you import data from various sources, clean it, reshape it, and load it. The steps are recorded and repeatable. Refresh the query and the data updates without redoing the work. It is a bridge between basic Excel and more advanced analytics.
The Power BI Questions That Test Real Skills
Power BI is the most common visualization tool in the Microsoft ecosystem. Knowing it means you can build dashboards that stakeholders actually use.
What is Power BI and how does it differ from Excel?
Power BI is a business intelligence tool for building interactive dashboards and reports. Excel is a spreadsheet tool that can also do analysis and charts. Power BI handles larger data volumes. It connects to more data sources natively. It is designed for sharing and collaboration. Excel is designed for individual analysis. They complement each other.
What are the main components of Power BI?
Power Query for data import and transformation. Power Pivot for data modeling and DAX calculations. Power View and Power Map for visualization, although those are older Excel-era names and report building now happens in the Report view of Power BI Desktop. The Power BI Service for sharing and collaboration online. Understanding the components helps you know which tool to use for which task.
What is DAX and give an example of a DAX formula you have used.
Data Analysis Expressions. A formula language for Power BI, Power Pivot, and Analysis Services. It is used to create calculated columns, measures, and tables. Example, assuming a Sales table with a Sales Amount column. Total Sales = SUM(Sales[Sales Amount]). Or a more complex one. Year-over-Year Growth, which compares the current total against the same measure wrapped in CALCULATE with DATEADD shifting the dates back one year. DAX is the most challenging part of Power BI to learn well.
What is the difference between a calculated column and a measure?
A calculated column is computed row by row during data refresh and stored in the model. It does not change with filter context. A measure is computed at query time and responds to filters and slicers. Measures are more flexible. Use calculated columns for static categorizations. Use measures for aggregations that need to respond to user interactions.
What is a star schema and why is it important?
A data model design with a central fact table surrounded by dimension tables. Fact table contains transactions or events. Dimension tables contain descriptive attributes like customer details, product info, date attributes. It makes models faster, simpler, and easier to understand. It is the standard approach for analytical data models.
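The shape is easier to remember once you have queried one. Below is a rough sketch of a tiny star schema in SQLite, run through Python's sqlite3 module. The table names, keys, and rows are all hypothetical. Power BI builds the same structure through relationships in the model rather than explicit JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: descriptive attributes, one row per product or date.
conn.execute("CREATE TABLE dim_product (product_key INTEGER, name TEXT, category TEXT)")
conn.execute("CREATE TABLE dim_date (date_key INTEGER, year INTEGER, month INTEGER)")
# Fact table: one row per transaction, holding keys and numbers only.
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, amount INTEGER)")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Gizmo", "Hardware"), (3, "Manual", "Books")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240115, 2024, 1), (20250210, 2025, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(20240115, 1, 100), (20240115, 3, 20),
                  (20250210, 2, 50), (20250210, 1, 30)])

# Typical analytical query: slice the facts by attributes from the dimensions.
sales_by_year_category = conn.execute(
    """
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
    """
).fetchall()

print(sales_by_year_category)
```

Every question follows the same pattern: start at the fact table, join out to whichever dimensions you want to slice by, aggregate.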
How do you handle relationships between tables in Power BI?
Relationships are created in the Model view. One-to-many is the most common. The dimension table is on the one side. The fact table is on the many side. Filter direction matters. Usually filters flow from dimension to fact. Understanding relationships is fundamental to building accurate reports.
What visualizations do you use most often and why?
Bar and column charts for comparisons. Line charts for trends over time. Card visuals for key metrics. Tables and matrices for detailed data. Slicers for user interactivity. The choice depends on what question the visualization is answering. A good dashboard has a mix, each visual serving a clear purpose.
How do you improve the performance of a slow Power BI report?
Reduce the data volume. Only import columns you need. Filter rows at the source. Optimize DAX. Avoid complex calculated columns when measures would work. Use a star schema. Check the performance analyzer to identify slow visuals and queries. Performance tuning is an advanced skill.
What is the difference between Power BI Desktop and Power BI Service?
Desktop is the free authoring tool installed on a local machine. Service is the cloud-based platform for sharing, collaborating, and consuming reports. You build in Desktop. You publish to Service. Service also allows scheduled data refreshes and dashboard creation.
How do you ensure data accuracy in a Power BI report?
Validate against source data. Check totals against known benchmarks. Test filters and slicers to make sure they behave correctly. Use version control for datasets. Document assumptions. Have someone else review the report before it goes to stakeholders. There is no substitute for thorough checking.
The Scenario and Behavioral Questions
These test whether you can apply your skills to real situations. The tools matter, but judgment matters more.
You are given a messy dataset and asked to find insights. Walk me through your approach.
First, understand the business context. What question are we trying to answer? Then explore the data. Look at the columns, data types, missing values. Clean as needed. Then do exploratory analysis. Trends, distributions, relationships between variables. Then form hypotheses and test them. Finally, summarize findings in a clear way tied to the original business question.
How do you explain a technical analysis to a non-technical stakeholder?
Start with the conclusion. Not the methodology. "We found that customers who contact support within the first week are three times more likely to cancel." Then explain the evidence in simple terms. Use analogies. Use visuals. Avoid jargon. Check for understanding. Be patient with questions.
Tell me about a time your analysis led to a business decision.
Have a specific example ready. What was the problem? What data did you analyze? What did you find? What decision was made based on your work? What was the result? Even if the example is from a project rather than a job, the structure matters. Problem, analysis, recommendation, outcome.
How do you prioritize when you have multiple data requests?
Consider business impact and urgency. Talk to the requesters to understand the context, not just the ask. Communicate timelines. Do not overpromise. If everything is urgent, escalate and ask for prioritization from a manager. Transparency prevents disappointment later.
What do you do if the data you need does not exist or is unreliable?
First, verify that it truly does not exist. Check all sources. Talk to people who might know. If it genuinely does not exist, see if there is a proxy. An imperfect metric that correlates with what you need. Communicate the limitation clearly. If the data exists but is unreliable, document the issues and proceed with appropriate caveats. Do not present bad data as good.
A Quick Preparation Checklist
One. Write SQL queries by hand. On paper or in a plain text editor. No autocomplete. This forces you to actually know the syntax.
Two. Open Excel. Build a PivotTable. Write a VLOOKUP. Create a chart. Do not just watch tutorials. Do the thing.
Three. Download Power BI Desktop. It is free. Build a dashboard with some public dataset. Publish it. Share the link. Having a live dashboard in your portfolio is impressive.
Four. Practice explaining your analysis out loud. Record yourself. Hear how you sound. Are you clear? Are you using jargon? Would a non-technical person understand?
Five. Have stories ready for the scenario questions. Real examples from projects. The problem, your approach, the outcome. Specifics matter more than generalities.
The Honest Closing
Forty questions is a lot. You will not be asked all of them. But if you understand the concepts behind them, you can handle whatever comes up. The interviewer wants to see that you have actually done the work. That you have written SQL queries that broke and you fixed them. That you have built dashboards that someone actually used. That you can take a messy question and turn it into a clear answer.
If you are still building these skills, structured practice helps. SkillsYard's Data Analytics program covers SQL, Excel, Power BI, and the business context that ties them together. Live projects. Mentors who have worked as analysts. Placement support that connects you with opportunities. A free demo class is available if you want to see the teaching style before committing.