Today I am going to talk about hypothesis testing, which data scientists frequently use to:

  • Test a particular idea
  • Construct an experiment to answer a particular question

✏️ Table of Contents

  • Definition
  • Significance and p-values
  • Type of Errors
  • Exercise
  • Z-test
  • Strengths & Weaknesses of Z-test
  • Student's t-test
  • Exercise
  • Conclusion

Note: this article is also available on Medium.

📣 Definition

The goal of hypothesis testing is to find out whether the data let us rule out the null hypothesis. A hypothesis test has two possible outcomes:

  • Reject the null hypothesis (the data would be unlikely if H0 were true, so something happened)
  • Fail to reject the null hypothesis


Step 1: We have some idea about a situation:

  • The drug cures the common cold.
  • The evidence proves that you are guilty.
  • The samples come from different populations

Step 2: Formulate a null hypothesis H0:

  • The drug has no effect.
  • You are innocent
  • The samples come from the same population

Step 3: Formulate an alternative hypothesis Ha that contradicts H0:

  • The drug has an effect
  • You are guilty
  • The samples come from different populations

For the drug example, the null hypothesis is H0: the observed effect is due to random chance. However, ruling out the null does not by itself confirm that the effect was caused by the ‘treatment’!

📚 Significance and p-values

  • The p-value is the probability, assuming H0 is true, of observing a test statistic at least as extreme as the one measured.
  • A small p-value indicates strong evidence against the null hypothesis: you measured a value that would be improbable under H0, so the null hypothesis can be rejected.
  • A large p-value indicates weak evidence against the null hypothesis, so you fail to reject it.
  • Marginal p-values (usually in the range 0.01 to 0.1) are generally inconclusive. This usually means you need to collect more data.
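The rough reading above can be written as a small helper. This is only a sketch. The function name is hypothetical, and the 0.01 and 0.1 cutoffs mirror the "marginal" range quoted in the text. They are conventions, not a universal rule.

```python
def interpret_p_value(p, lower=0.01, upper=0.1):
    """Classify a p-value with the rule of thumb above (cutoffs are conventions)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p < lower:
        return "small: strong evidence against H0, reject it"
    if p > upper:
        return "large: weak evidence against H0, fail to reject it"
    return "marginal: inconclusive, consider collecting more data"


# Print how each example p-value is classified
for p in (0.003, 0.04, 0.3):
    print(p, "->", interpret_p_value(p))
```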


  • Experiment: Coin flipping
  • Null Hypothesis H0: The coin is fair: P(heads) = P(tails) = ½
  • Test-statistic = number of heads
  • Result of 5 flips: HHHHH
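    P(HHHHH | H0) = (1/2)^5 = 1/32 ≈ 0.03 = p-value (one-sided: no outcome is more extreme than five heads)
  • Biased coins are rare, so we should demand strong evidence before rejecting H0. In other words, we should use a strict (small) significance threshold such as α = 0.01. At that level, p ≈ 0.03 is not enough to call the coin biased.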
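The coin-flip p-value above can be reproduced with `scipy.stats.binomtest`, which is available in SciPy 1.7 and later. This is a minimal sketch with two assumptions made for illustration. First, the test is one-sided (`alternative="greater"`, meaning the coin favours heads). Second, α = 0.01 serves as the "strict" threshold.

```python
from scipy.stats import binomtest

# H0: the coin is fair, so P(heads) = 0.5
# Observed: 5 heads in 5 flips (HHHHH); the test statistic is the number of heads
result = binomtest(k=5, n=5, p=0.5, alternative="greater")

# One-sided p-value: P(X >= 5 | H0) = (1/2)**5 = 1/32
print(result.pvalue)

# Assumed strict significance level, because biased coins are rare
alpha = 0.01
decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
print(decision)
```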


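The power of a test (the probability of rejecting H0 when it is actually false) depends on:

  • Effect size: large effects are easier to detect!
  • Sample size: statistical tests get more powerful with more data.
  • Significance level α: a larger α increases the chance of rejecting the null hypothesis, and therefore the power, at the cost of more false positives.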
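Here is a minimal sketch of how these three factors enter the power of a one-sided, one-sample z-test, assuming a known population standard deviation. The helper `z_test_power` and the numbers passed to it are hypothetical. They were chosen only to change one factor at a time.

```python
import math

from scipy.stats import norm


def z_test_power(effect, sigma, n, alpha):
    """Power of a one-sided, one-sample z-test of H0: mu = mu0 vs Ha: mu > mu0.

    effect: true difference mu - mu0; sigma: known population std; n: sample size.
    """
    z_crit = norm.isf(alpha)               # reject H0 when z exceeds this value
    shift = effect * math.sqrt(n) / sigma  # mean of the z-statistic when Ha is true
    return norm.sf(z_crit - shift)         # P(z > z_crit) under Ha


base = z_test_power(effect=1.0, sigma=7.0, n=25, alpha=0.05)
bigger_effect = z_test_power(effect=3.0, sigma=7.0, n=25, alpha=0.05)
more_data = z_test_power(effect=1.0, sigma=7.0, n=100, alpha=0.05)
looser_alpha = z_test_power(effect=1.0, sigma=7.0, n=25, alpha=0.10)
print(base, bigger_effect, more_data, looser_alpha)
```

With no real effect (`effect=0`), the "power" collapses to α itself. In that case every rejection is a false positive.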

❌ Type of Errors

There are two main ways to be wrong with significance testing.

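  • Type 1 error = false positive: rejecting a true null hypothesis. Its probability is the significance level α.
  • Type 2 error = false negative: failing to reject a false null hypothesis. Its probability is β, and the power of the test is 1 − β.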
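Both error types can be made concrete with a small simulation. The sketch assumes normally distributed heights with a hypothetical mean of 175 cm, a std of 7 cm, and 20 measurements per group. It uses `scipy.stats.ttest_ind` at α = 0.05. When H0 is true, the fraction of rejections estimates the Type 1 error rate, which should land near α. When the means really differ, the fraction of non-rejections estimates the Type 2 error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
alpha = 0.05
n_experiments, n_per_group = 10_000, 20

# H0 true: both groups come from the same population (hypothetical mean 175, std 7)
a = rng.normal(175, 7, size=(n_experiments, n_per_group))
b = rng.normal(175, 7, size=(n_experiments, n_per_group))
p_null = stats.ttest_ind(a, b, axis=1).pvalue   # one t-test per row
type1_rate = np.mean(p_null < alpha)            # rejected a true H0: false positive

# H0 false: the second group's true mean is 5 cm higher
c = rng.normal(180, 7, size=(n_experiments, n_per_group))
p_alt = stats.ttest_ind(a, c, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)            # missed a real difference: false negative

print(type1_rate, type2_rate)
```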

Example 1

  • H0: the defendant is innocent.
  • Ha: the defendant is guilty.
  • A false positive: imprison an innocent person.
  • A false negative: let a guilty person go free.

Example 2

  • H0: there is no wolf in the valley.
  • Ha: there is a wolf in the valley.
  • A false positive: we thought there was a wolf when there was not.
  • A false negative: we thought there was no wolf when there was.


🎳 Exercise

In a coin flipping experiment we perform 7 flips and get HHHTHHT. Is the coin biased?

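Biased coins are rare, so we should demand strong evidence. The sequence contains 5 heads. The probability of getting at least 5 heads in 7 flips of a fair coin is (21 + 7 + 1)/128 = 29/128 ≈ 0.23. This probability is high, so we fail to reject the null hypothesis: there is no evidence that the coin is biased. Note that this is not the same as proving the coin is fair.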
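The exercise can be checked with `scipy.stats.binomtest` (SciPy 1.7+). This sketch computes two p-values. The one-sided p-value tests whether the coin favours heads. The two-sided p-value tests for bias in either direction. Neither comes close to a strict threshold such as 0.01.

```python
from scipy.stats import binomtest

flips = "HHHTHHT"
heads = flips.count("H")  # 5 heads out of 7 flips

# One-sided: P(X >= 5 | fair coin) = (21 + 7 + 1) / 128 = 29/128
one_sided = binomtest(heads, n=len(flips), p=0.5, alternative="greater").pvalue
# Two-sided: outcomes at least as unlikely as 5 heads, in either direction
two_sided = binomtest(heads, n=len(flips), p=0.5).pvalue

print(one_sided, two_sided)  # roughly 0.23 and 0.45
```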

📈 Z-test

How to standardize data:

  • Subtract the mean
  • Divide by the standard deviation
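As a minimal sketch, the two steps map directly onto NumPy array arithmetic. The helper name `standardize` is hypothetical. Here it takes the population mean and standard deviation as arguments (175 cm and 7 cm, as in the example that follows). The heights passed in are illustrative.

```python
import numpy as np


def standardize(values, mean, std):
    """Return standard scores: subtract the mean, then divide by the standard deviation."""
    return (np.asarray(values, dtype=float) - mean) / std


# Illustrative heights in cm; population mean 175 cm and std 7 cm
z = standardize([170, 175, 189], mean=175, std=7)
print(z)  # -5/7, 0 and 2
```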

The standard score of a man who is 170 cm tall, from a general population with a mean of 175 cm and a standard deviation of 7 cm, is z = (170 − 175) / 7 ≈ −0.71.

If we have n measurements with sample mean x̄, we divide by the standard error σ/√n instead: z = (x̄ − μ) / (σ/√n).

Now let's find the standard score of a basketball team (5 players) with a mean height of 200 cm:

  • H0: Basketball players are a random sample of the general population.
  • Ha: Basketball players are taller than average, i.e. mean height > 1.75 m
  • ⍺ = 0.01

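z = (200 − 175) / (7 / √5) ≈ 7.99. This is a very large deviation, far beyond the one-sided critical value of about 2.33 at α = 0.01. We therefore reject H0: basketball players are a significantly different population.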
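The team's standard score and a one-sided p-value can be computed with `scipy.stats.norm`. This sketch uses only the summary numbers given above (μ = 175 cm, σ = 7 cm, n = 5, team mean 200 cm). `norm.sf` gives the upper-tail probability, which matches the one-sided Ha.

```python
import math

from scipy.stats import norm

mu, sigma = 175.0, 7.0   # population mean and std in cm
n, team_mean = 5, 200.0  # 5 players with a mean height of 200 cm
alpha = 0.01

# Standard score of a mean of n measurements: divide by the standard error sigma/sqrt(n)
z = (team_mean - mu) / (sigma / math.sqrt(n))
# One-sided p-value for Ha: mean height > 175 cm
p_value = norm.sf(z)
reject = p_value < alpha

print(round(z, 2), p_value, reject)
```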


To find the p-value associated with the z-score for a population mean of 1.75 m, given a list x containing the heights of the basketball players, we can use Python:

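import statsmodels.stats.weightstats as sms
from scipy.stats import norm

# Heights (m) of the basketball players; the list is truncated in the original,
# so complete it before running
x = [2.06, 2.08, 1.88, 1.91, ...]

# ztest returns the z-statistic and a two-sided p-value for H0: mean = 1.75
zstat, pvalue = sms.ztest(x, value=1.75)

# The same two-sided p-value computed from the normal survival function
print(2 * norm.sf(abs(zstat)))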

🥊 Strengths & Weaknesses of Z-test

Strengths of z-test:

  • Intuitive

Weaknesses of z-test:

  • Need the true population mean
  • Need the true population standard deviation

Usually, we only have access to the sample mean and standard deviation. In this case, a Student's t-test is appropriate.
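To illustrate the difference, here is a sketch of a one-sample t-test on a hypothetical sample of heights. This is the single-sample relative of the test described next. It differs from the z-score in two ways. The sample standard deviation replaces σ, and the p-value comes from a t distribution with n − 1 degrees of freedom. The manual result is cross-checked against `scipy.stats.ttest_1samp`.

```python
import math
import statistics

from scipy import stats

# Hypothetical sample of heights (m); the population std is unknown
sample = [1.80, 1.72, 1.85, 1.78, 1.90, 1.76]
mu0 = 1.75  # population mean under H0

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)  # sample std, with n - 1 in the denominator

# t-statistic: same shape as the z-score, with the sample std in place of sigma
t_manual = (mean - mu0) / (s / math.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value

result = stats.ttest_1samp(sample, popmean=mu0)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```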

🧩 Student's t-test

The two-sample t-test, often called “Student’s t-test”, compares the means of two populations. It tests the null hypothesis that the means of the populations from which the two samples were taken are equal. The version described here applies when:

  • There is an equal number of measurements from each population
  • The two populations have equal variances

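A high-level overview of the maths involved: take two samples of equal size n, with sample means x̄1, x̄2 and sample variances s1², s2². The pooled standard deviation is s_p = √((s1² + s2²) / 2). The test statistic t = (x̄1 − x̄2) / (s_p · √(2/n)) follows a t distribution with 2n − 2 degrees of freedom under H0.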
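The equal-sample-size computation can be sketched by hand and cross-checked against `scipy.stats.ttest_ind` with `equal_var=True`. The two samples are hypothetical. For equal n, the pooled variance reduces to the average of the two sample variances.

```python
import math
import statistics

from scipy import stats

# Hypothetical samples of equal size n (heights in m)
a = [2.01, 1.98, 2.06, 1.93, 2.03]
b = [1.95, 1.91, 2.00, 1.88, 1.97]
n = len(a)

# Pooled std for equal sample sizes: square root of the mean of the two variances
s_p = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
t_manual = (statistics.mean(a) - statistics.mean(b)) / (s_p * math.sqrt(2 / n))
df = 2 * n - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value

result = stats.ttest_ind(a, b, equal_var=True)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```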

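There are two-tailed and one-tailed versions of the test. A two-tailed test asks whether the means differ in either direction. A one-tailed test asks whether one specific mean is larger (or smaller) than the other.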

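Usually, we use two-tailed tests. A one-tailed test is appropriate only when the alternative hypothesis specifies a direction before seeing the data (e.g. that basketball players are taller than the general population). A variable that takes only positive values, such as heights or weights, is not by itself a reason to use one.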
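Here is a sketch of the difference in SciPy, assuming SciPy 1.6 or later (where `ttest_ind` accepts an `alternative` argument). The samples are hypothetical. Because the players' sample mean is higher, the one-tailed p-value for `alternative="greater"` is half the two-tailed one.

```python
from scipy import stats

# Hypothetical heights (m): basketball players vs. a general-population sample
players = [2.01, 1.98, 2.06, 1.93, 2.03]
general = [1.76, 1.70, 1.82, 1.74, 1.79]

two_tailed = stats.ttest_ind(players, general)                          # Ha: the means differ
one_tailed = stats.ttest_ind(players, general, alternative="greater")  # Ha: players are taller

print(two_tailed.pvalue, one_tailed.pvalue)
```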

🎮 Exercise

Use scipy's t-test to compare the heights of the Golden State Warriors (x) with those of the Cleveland Cavaliers (y). Are they significantly different populations?

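from scipy import stats

# Heights (m); both lists are truncated in the original, so complete them before running
x = [2.06, 2.08, 1.88, 1.91, ...]  # Golden State Warriors
y = [1.91, 1.96, 2.06, 1.91, ...]  # Cleveland Cavaliers

# Student's t-test: assumes both populations have equal variances
print(stats.ttest_ind(x, y, equal_var=True))
# Welch's t-test: drops the equal-variance assumption
print(stats.ttest_ind(x, y, equal_var=False))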

🤖 Conclusion

This brings us to the end of this article. I hope you now have a basic understanding of how a hypothesis test is used.

Thanks for reading! If you liked this article, please consider subscribing to my blog. That way I know my work is valuable to you, and I can notify you about future articles.

💪💪💪💪 As always keep studying, keep creating 🔥🔥🔥🔥