
An Interactive Guide to Hypothesis Testing in Python

Updated: Apr 18


What is Hypothesis Testing?

Hypothesis testing is an essential part of inferential statistics, where we use observed data in a sample to draw conclusions about unobserved data - often the population.

Applications of hypothesis testing:

  • clinical research: widely used in psychology, biology and healthcare research to examine the effectiveness of treatments in clinical trials

  • A/B testing: can be applied in a business context to improve conversions by testing different versions of campaign incentives, website designs ...

  • feature selection in machine learning: filter-based feature selection methods use different statistical tests to determine the feature importance

  • college or university: well, if you major in statistics or data science, it is likely to appear in your exams

4 Steps in Hypothesis Testing

Step 1. Define the null and alternative hypotheses

The null hypothesis (H0) is stated differently depending on the statistical test, but it generalizes to the claim that no difference, no relationship or no dependency exists between two or more variables.

The alternative hypothesis (H1) contradicts the null hypothesis and claims that a relationship exists. It is the hypothesis that we would like to find support for. However, a more conservative approach is favored in statistics: we assume the null hypothesis is true and look for evidence to reject it.

Step 2. Choose the appropriate test

Common types of statistical tests include t-tests, z-tests, ANOVA tests and chi-square tests.

How to choose the statistical test:

T-test: compare two groups/categories of numeric variables with small sample size

Z-test: compare two groups/categories of numeric variables with large sample size

ANOVA test: compare the difference between two or more groups/categories of numeric variables

Chi-Squared test: examine the relationship between two categorical variables

Correlation test: examine the relationship between two numeric variables
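As a rough illustration of the list above, the sketch below runs each kind of test on synthetic data. The data, group sizes and variable names are made up for illustration, and the z-test is computed by hand from the normal distribution rather than through a library function. The SciPy calls used are `ttest_ind`, `f_oneway`, `chi2_contingency` and `pearsonr`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # synthetic data, for illustration only

# T-test: two small groups of a numeric variable
a = rng.normal(50, 10, size=30)
b = rng.normal(55, 10, size=30)
t_stat, t_p = stats.ttest_ind(a, b)

# Z-test: two large groups, computed by hand with the normal distribution
big_a = rng.normal(50.0, 10, size=5000)
big_b = rng.normal(50.5, 10, size=5000)
se = np.sqrt(big_a.var(ddof=1) / len(big_a) + big_b.var(ddof=1) / len(big_b))
z_stat = (big_a.mean() - big_b.mean()) / se
z_p = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value

# ANOVA: three groups of a numeric variable
c = rng.normal(60, 10, size=30)
f_stat, f_p = stats.f_oneway(a, b, c)

# Chi-squared: two categorical variables, summarized as a table of counts
table = np.array([[30, 10],
                  [20, 40]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

# Correlation: two numeric variables
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)
r, r_p = stats.pearsonr(x, y)

for name, p in [("t-test", t_p), ("z-test", z_p), ("ANOVA", f_p),
                ("chi-squared", chi_p), ("correlation", r_p)]:
    print(f"{name}: p = {p:.4g}")
```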

Step 3. Calculate the p-value

How the p-value is calculated depends primarily on the statistical test selected. First, based on the mean and standard deviation of the observed sample data, we derive the test statistic (e.g. t-statistic, F-statistic). Then we calculate the probability of obtaining a test statistic at least as extreme as the observed one under the distribution implied by the null hypothesis; that probability is the p-value. We will use some examples to demonstrate this in more detail.
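To make the two stages concrete, here is a minimal sketch for a two sample t-test, assuming equal variances in both groups. The numbers in `group_a` and `group_b` are made up. The t-statistic is computed by hand from the means and standard deviations, converted to a two-sided p-value with the t distribution, and then compared with `scipy.stats.ttest_ind` as a sanity check.

```python
import numpy as np
from scipy import stats

# Made-up measurements for two groups (illustration only)
group_a = np.array([12.1, 14.3, 11.8, 15.2, 13.5, 12.9])
group_b = np.array([15.0, 16.2, 14.8, 17.1, 15.9, 16.4])

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = group_a.mean(), group_b.mean()
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)  # sample variances

# Stage 1: test statistic from the means and (pooled) standard deviation
dof = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
t_stat = (mean_a - mean_b) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Stage 2: probability of a statistic at least this extreme under H0
p_value = 2 * stats.t.sf(abs(t_stat), dof)

# Cross-check against SciPy (equal variances are its default assumption)
t_ref, p_ref = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
print(t_ref, p_ref)
```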

Step 4. Determine the statistical significance

The p-value is then compared against the significance level (also denoted alpha) to determine whether there is sufficient evidence to reject the null hypothesis. The significance level is a predetermined probability threshold - commonly 0.05. If the p-value is larger than the threshold, the observed result is reasonably likely to occur when the null hypothesis is true, so we fail to reject the null hypothesis. If it is lower than the significance level, the result would be very unlikely under the null distribution, so we reject the null hypothesis.
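The decision rule can be written as a small helper. The function name `decide` is hypothetical, and the strict `<` comparison follows the wording above; some texts use "less than or equal to" instead.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.03))              # reject H0
print(decide(0.20))              # fail to reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
```

Note that "fail to reject" is not the same as "accept": a large p-value only means the sample does not provide enough evidence against the null hypothesis.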

Hypothesis Testing with Examples

The Kaggle dataset “Customer Personality Analysis” is used in this case study to demonstrate different types of statistical tests. The t-test, ANOVA and chi-square test are sensitive to sample size: with a large sample, even a very small difference tends to produce a very small p-value. Therefore, I took a random sample (size of 100) from the original data:

sampled_df = df.sample(n=100, random_state=100)
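To see why sample size matters here, the following simulation (synthetic data, not the Kaggle dataset) applies the same tiny difference in means to a small and a very large sample. The helper `p_value_for` and the shift of 0.05 standard deviations are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)


def p_value_for(n: int, shift: float = 0.05) -> float:
    """Two sample t-test p-value for two normal groups whose means differ by `shift`."""
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(shift, 1.0, size=n)
    return stats.ttest_ind(a, b).pvalue


p_small = p_value_for(100)
p_large = p_value_for(200_000)
print(f"n = 100:     p = {p_small:.4g}")
print(f"n = 200,000: p = {p_large:.4g}")
```

The underlying difference is identical in both runs; only the amount of data changes, which is why the p-value should be read alongside the size of the effect.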


A t-test is used when we want to test the relationship between a numeric variable and a categorical variable. There are three main types of t-test:

  1. one sample t-test: test the mean of one group against a constant value

  2. two sample t-test: test the difference of means between two groups

  3. paired sample t-test: test the difference of means between two measurements of the same subject
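The three variants map onto three SciPy functions: `ttest_1samp`, `ttest_ind` and `ttest_rel`. The sketch below uses synthetic measurements (the names `before`, `after` and `other` are hypothetical) to show how each one is called.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # synthetic data, for illustration only

before = rng.normal(70, 5, size=25)          # first measurement of 25 subjects
after = before + rng.normal(2, 3, size=25)   # second measurement, same subjects
other = rng.normal(73, 5, size=25)           # an independent group

# 1. One sample: is the mean of `before` different from the constant 70?
one_sample = stats.ttest_1samp(before, popmean=70)

# 2. Two sample: do `before` and `other` have different means?
two_sample = stats.ttest_ind(before, other)

# 3. Paired: did the same subjects change between `before` and `after`?
paired = stats.ttest_rel(before, after)

for name, res in [("one sample", one_sample),
                  ("two sample", two_sample),
                  ("paired", paired)]:
    print(f"{name}: t = {res.statistic:.3f}, p = {res.pvalue:.4g}")
```

A paired t-test is equivalent to a one sample t-test on the per-subject differences against zero, which is why it needs the two arrays to be aligned by subject.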

For example, if I would like to test whether “Recency” (the number of days since customer’s last purchase - numeric value) contributes to the prediction of “Response” (whether the customer accepted the offer in the last campaign - categorical value), I can use a two sample t-test.

The first sample would be the “Recency” of customers who accepted the offer:

recency_P = sampled_df[sampled_df['Response']==1]['Recency']

The second sample would be the “Recency” of customers who rejected the offer:

recency_N = sampled_df[sampled_df['Response']==0]['Recency']

To compare the “Recency” of these two groups intuitively, we can use a histogram (or seaborn's distplot, which is now deprecated in favor of histplot and displot) to show the distributions.
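As a numeric counterpart to the plot, the sketch below builds a toy DataFrame with made-up “Recency” and “Response” values (not the Kaggle data), splits it with the same boolean masks as above, and counts both groups on shared bins. Shared bins are what make the two histograms directly comparable; the bin edges chosen here are an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Toy stand-in for sampled_df, with made-up values
toy_df = pd.DataFrame({
    "Recency": rng.integers(0, 100, size=200),   # days since last purchase
    "Response": rng.binomial(1, 0.3, size=200),  # 1 = accepted, 0 = rejected
})

recency_P = toy_df[toy_df["Response"] == 1]["Recency"]
recency_N = toy_df[toy_df["Response"] == 0]["Recency"]

# Use the same bin edges for both groups so the counts line up
bins = np.linspace(0, 100, 11)  # 10 bins of width 10
counts_P, _ = np.histogram(recency_P, bins=bins)
counts_N, _ = np.histogram(recency_N, bins=bins)

summary = pd.DataFrame({
    "bin_start": bins[:-1],
    "accepted": counts_P,
    "rejected": counts_N,
})
print(summary)
```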