A Practical EDA Checklist
A practical EDA checklist helps you inspect a dataset before modeling: understand its structure, summarize key statistics, explore distributions, detect outliers, study relationships, review categories, and connect findings to modeling decisions.
Use this checklist whenever you receive a new dataset.
- Dataset overview
  - How many rows and columns?
  - What does each row represent?
  - What does each column mean?
  - Are the data types correct?
  - Are there duplicate rows?
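As a minimal sketch of the overview questions above, the snippet below counts rows and columns, tallies the value types seen in each column, and finds exact duplicate rows. The `rows` sample and its column names are hypothetical, and only the Python standard library is used; on a real dataset a dataframe library would usually do the same job.

```python
from collections import Counter

# Hypothetical sample: each row is one customer order.
rows = [
    {"order_id": 1, "amount": 25.0, "region": "north"},
    {"order_id": 2, "amount": 40.5, "region": "south"},
    {"order_id": 2, "amount": 40.5, "region": "south"},   # exact duplicate
    {"order_id": 3, "amount": "n/a", "region": "north"},  # wrong type
]

n_rows = len(rows)
columns = list(rows[0].keys())

# Tally the Python types seen in each column; more than one type is a red flag.
types_by_column = {
    col: Counter(type(row[col]).__name__ for row in rows) for col in columns
}

# A row counts as a duplicate if an identical row appeared earlier.
seen = set()
duplicates = []
for row in rows:
    key = tuple(row[col] for col in columns)
    if key in seen:
        duplicates.append(row)
    seen.add(key)

print(f"{n_rows} rows x {len(columns)} columns")
for col, counts in types_by_column.items():
    print(col, dict(counts))
print("duplicate rows:", len(duplicates))
```

Here the `amount` column mixes floats with a string, which is the kind of type problem worth fixing before computing any statistics.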
- Descriptive statistics
  - What are the mean and median?
  - Are the mean and median close together or far apart?
  - What are the minimum and maximum values?
  - What is the variance or standard deviation?
  - Are there impossible values?
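The statistics questions above can be sketched with Python's built-in `statistics` module. The `ages` list and the 0 to 120 validity range are assumptions made for illustration, not rules from any particular dataset.

```python
import statistics

# Hypothetical ages; -1 is a placeholder that can never be a real age.
ages = [23, 25, 31, 35, 38, 41, 44, 52, 95, -1]

mean = statistics.mean(ages)
median = statistics.median(ages)
stdev = statistics.stdev(ages)  # sample standard deviation
lowest, highest = min(ages), max(ages)

# Assumed validity range for this illustration only.
impossible = [a for a in ages if a < 0 or a > 120]

print(f"mean={mean:.1f} median={median:.1f} stdev={stdev:.1f}")
print(f"min={lowest} max={highest}")
print("impossible values:", impossible)
```

A gap between the mean and the median usually points to skew or to a few extreme values, so it is worth checking which before moving on.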
- Distributions
  - Is the data symmetric or skewed?
  - Are there multiple peaks?
  - Are there long tails?
  - Should variables be transformed?
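One way to put a number on the skew and transformation questions above is to compute skewness before and after a log transform. This is a sketch under stated assumptions: the `incomes` values are hypothetical, `skewness` is a helper written here for illustration (the population form, the mean cubed z-score), and a log transform only makes sense for strictly positive values.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

# Hypothetical incomes in thousands, with one very large value.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 250]
logged = [math.log(v) for v in incomes]

raw_skew = skewness(incomes)
log_skew = skewness(logged)

print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

Values near zero suggest a roughly symmetric distribution, and a clearly positive value suggests a long right tail. A histogram remains the better tool for spotting multiple peaks, which a single skewness number cannot show.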
- Outliers
  - Which values are extreme?
  - Are they errors or valid rare cases?
  - Should they be kept, removed, capped, transformed, or segmented?
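A common way to answer "which values are extreme?" is the 1.5 × IQR rule, sketched below together with capping as one of the handling options listed above. The `times` data is hypothetical, and the 1.5 multiplier is a convention rather than a requirement. Whether a flagged value is an error or a valid rare case still needs domain judgment.

```python
import statistics

# Hypothetical delivery times in minutes; 95 sits far from the rest.
times = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

# Quartiles using the standard library's default ("exclusive") method.
q1, _, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences.
outliers = [t for t in times if t < lower or t > upper]

# Capping: pull extreme values back to the nearest fence.
capped = [min(max(t, lower), upper) for t in times]

print(f"fences: [{lower}, {upper}]")
print("outliers:", outliers)
print("max after capping:", max(capped))
```

Capping keeps the row while limiting its influence; removing or segmenting would be alternatives if the extreme value reflects a different process.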
- Relationships
  - Which variables correlate with the target?
  - Which variables correlate with each other?
  - Are relationships linear or curved?
  - Do scatter plots reveal clusters or exceptions?
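The sketch below shows why the "linear or curved?" question matters when reading correlations. Pearson correlation measures linear association only, so a perfectly curved relationship can score zero. The `pearson` helper and the sample data are written here for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]  # straight-line relationship
curved = [v ** 2 for v in x]     # perfectly determined by x, but not linear

r_linear = pearson(x, linear)
r_curved = pearson(x, curved)

print(f"linear: r = {r_linear:.2f}")
print(f"curved: r = {r_curved:.2f}")
```

The curved series depends entirely on `x`, yet its correlation is zero, which is why a scatter plot should accompany any correlation table.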
- Categorical variables
  - Which categories are most common?
  - Are there rare categories?
  - Do categories have different target distributions?
  - Are category labels consistent?
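The category questions above can be sketched by normalizing labels first, then counting and comparing target rates. The `records` sample, the `normalize` rule (trim and lowercase), and the `RARE_THRESHOLD` cutoff are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (region, converted) pairs with inconsistent spelling.
records = [
    ("North", 1), ("north ", 0), ("NORTH", 1),
    ("South", 0), ("south", 0), ("East", 1),
]

def normalize(label):
    """Trim whitespace and lowercase so spelling variants merge."""
    return label.strip().lower()

# Frequency of each category after normalization.
counts = Counter(normalize(label) for label, _ in records)

RARE_THRESHOLD = 2  # assumed cutoff; choose one that suits the dataset size
rare = [label for label, n in counts.items() if n < RARE_THRESHOLD]

# Share of positive targets within each category.
positives = defaultdict(int)
for label, target in records:
    positives[normalize(label)] += target
target_rate = {label: positives[label] / counts[label] for label in counts}

print("counts:", counts.most_common())
print("rare categories:", rare)
print("target rate by category:", target_rate)
```

Without the normalization step, "North", "north " and "NORTH" would be counted as three separate categories, each looking rare.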
- Modeling implications
  - Which features seem promising?
  - Which features may need cleaning?
  - Which variables may leak target information?
  - Which assumptions should be tested later?
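For the leakage question above, one rough heuristic is to flag any feature whose values map to exactly one target value each. This is only a sketch: the `rows` sample, the column names, and the `determines_target` helper are hypothetical, and high-cardinality columns such as IDs will trivially pass the check, so every flagged feature needs manual review.

```python
from collections import defaultdict

# Hypothetical rows: the goal is to predict `churned`.
# `cancel_reason` is only filled in after a customer has already churned.
rows = [
    {"tenure": 3, "cancel_reason": "price", "churned": 1},
    {"tenure": 24, "cancel_reason": None, "churned": 0},
    {"tenure": 5, "cancel_reason": "service", "churned": 1},
    {"tenure": 3, "cancel_reason": None, "churned": 0},
    {"tenure": 24, "cancel_reason": "price", "churned": 1},
    {"tenure": 12, "cancel_reason": None, "churned": 0},
]

def determines_target(rows, feature, target):
    """True if every observed feature value maps to exactly one target value."""
    targets_seen = defaultdict(set)
    for row in rows:
        targets_seen[row[feature]].add(row[target])
    return all(len(values) == 1 for values in targets_seen.values())

suspects = [
    feature
    for feature in ("tenure", "cancel_reason")
    if determines_target(rows, feature, "churned")
]

print("possible leakage:", suspects)
```

A flagged feature is not proof of leakage. The deciding question is whether the value would be known at prediction time.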