Today I am going to talk about hypothesis testing, which data scientists frequently use to:

  • Test a particular idea
  • Construct an experiment to answer a particular question

Table of Contents

  • Definition
  • Significance and p-values
  • Type of Errors
  • Exercise
  • Z-test
  • Strengths & Weaknesses of Z-test
  • Student's t-test
  • Exercise
  • Conclusion


Definition

The goal of hypothesis testing is to rule out the null hypothesis. A hypothesis test has two possible outcomes:

  • Reject the null hypothesis (so something happened)
  • Fail to reject the null hypothesis


Step 1: We have some idea about a situation:

  • The drug cures the common cold.
  • The evidence proves that you are guilty.
  • The samples come from different populations

Step 2: Formulate a null hypothesis H0:

  • The drug has no effect.
  • You are innocent
  • The samples come from the same population

Step 3: Formulate an alternative hypothesis Ha (Ha != H0):

  • The drug has an effect
  • You are guilty
  • The samples come from different populations

For the drug example, the null hypothesis H0 is that any observed effect is due to random chance. However, ruling out the null does not by itself confirm that the effect was caused by the ‘treatment’!

I urge you to have a look at this book for more examples of hypothesis testing:

The Elements of Statistical Learning (Springer Series in Statistics)

For people who prefer video courses, have a look at this online course:

Programming for Data Science

Significance and p-values

  • A small p-value indicates strong evidence against the null hypothesis: you measured a value that would be improbable if the null hypothesis were true, so the null hypothesis can be rejected.
  • A large p-value indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
  • Marginal p-values (usually in the range 0.01 to 0.1) are generally inconclusive. This usually means you need to collect more data.


  • Experiment: Coin flipping
  • Null Hypothesis H0: The coin is fair: P(heads) = P(tails) = ½
  • Test-statistic = number of heads
  • Result of 5 flips: HHHHH
    P(HHHHH | H0) = (1/2)^5 = 1/32 ≈ 0.03 = p-value
  • Biased coins are rare, so we should demand strong evidence before calling a coin biased, i.e. use a strict significance threshold (a small ⍺).


The power of a statistical test depends on:

  • Effect size: Large effects are easier to detect!
  • Sample size: Statistical tests get more powerful with more data
  • Significance level: A larger ⍺ (a less strict threshold) increases the chance of rejecting the null hypothesis.
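To make these three factors concrete, here is a minimal sketch that computes the exact power of a one-sided coin-flip test. The helper names `binom_tail` and `power`, and the biased-coin probabilities used as the "true" effect, are illustrative assumptions and not part of any library.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) when X counts heads in n flips with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power(n, p_true, alpha=0.05):
    """Power of the one-sided test that rejects H0: P(heads) = 0.5
    when the number of heads is at least k."""
    # Smallest k whose tail probability under H0 does not exceed alpha
    k = next(k for k in range(n + 2) if binom_tail(n, k, 0.5) <= alpha)
    # Power = probability of rejecting H0 when the coin really is biased
    return binom_tail(n, k, p_true)

print(power(10, 0.7), power(50, 0.7))              # more data -> more power
print(power(50, 0.6), power(50, 0.9))              # larger effect -> more power
print(power(50, 0.7, 0.01), power(50, 0.7, 0.05))  # larger alpha -> more power
```

In each printed pair the second value should be the larger one, matching the three bullet points above.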

Type of Errors

There are two main ways to be wrong with significance testing.

  • Type 1 error = false positive. Reject a true null hypothesis.
  • Type 2 error = false negative. Fail to reject a false null hypothesis

Example 1

  • H0: the defendant is innocent.
  • Ha: the defendant is guilty.
  • A false positive: imprison an innocent person.
  • A false negative: let a guilty person go free.

Example 2

  • H0: there is no wolf in the valley.
  • Ha: there is a wolf in the valley.
  • A false positive: we thought there was a wolf when there was not.
  • A false negative: we thought there was no wolf when there was.
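Both kinds of error can also be seen in a small simulation. This is only a sketch: the decision rule (reject H0 when at least 15 of 20 flips are heads), the biased coin with P(heads) = 0.7, and the helper name `rejects_h0` are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def rejects_h0(p_heads, n=20, threshold=15):
    """Flip a coin n times; reject H0 (fair coin) if heads >= threshold."""
    heads = sum(random.random() < p_heads for _ in range(n))
    return heads >= threshold

trials = 10_000
# Type 1 error: the coin is fair (H0 true) but the test rejects H0
type1_rate = sum(rejects_h0(0.5) for _ in range(trials)) / trials
# Type 2 error: the coin is biased (H0 false) but the test fails to reject H0
type2_rate = sum(not rejects_h0(0.7) for _ in range(trials)) / trials

print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Type 2 error rate: {type2_rate:.3f}")
```

With this rule the false positive rate should come out at roughly 2%, while the false negative rate is much larger (roughly 58%): a strict threshold protects against type 1 errors at the cost of more type 2 errors.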


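Exercise

In a coin flipping experiment we perform 7 flips and get HHHTHHT. Is the coin biased?

Biased coins are rare. Under H0, the probability of getting at least 5 heads in 7 flips is quite high (29/128 ≈ 0.23), so we fail to reject the null hypothesis: there is no evidence that the coin is biased.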
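The coin-flip p-values can be checked with a few lines of Python. This is a minimal sketch; the function name `p_value_heads` is hypothetical, and it computes the one-sided p-value (the chance of seeing at least that many heads from a fair coin).

```python
from math import comb

def p_value_heads(n_flips, n_heads):
    """One-sided p-value: P(at least n_heads heads in n_flips | fair coin)."""
    return sum(comb(n_flips, k) for k in range(n_heads, n_flips + 1)) / 2**n_flips

print(p_value_heads(5, 5))  # HHHHH   -> 1/32   = 0.03125
print(p_value_heads(7, 5))  # HHHTHHT -> 29/128 = 0.2265625
```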


Z-test

How to standardize data:

  • Subtract the mean
  • Divide by the standard deviation

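The standard score for a male individual who is 170cm, drawn from a general population with a mean of 175cm and a standard deviation of 7cm, is:

z = (x − μ) / σ = (170 − 175) / 7 ≈ −0.71

If we have n measurements, we standardize the sample mean using the standard error σ / √n:

z = (x̄ − μ) / (σ / √n)

Let's now find the standard score of a basketball team (5 players) with a mean height of 200cm: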

  • H0: Basketball players are a random sample of the general population.
  • Ha: Basketball players are tall, i.e. average height > 175cm
  • ⍺ = 0.01

z = (200 − 175) / (7 / √5) ≈ 7.99

This is a very large deviation! The one-sided p-value is far below ⍺, so we reject H0: basketball players are a significantly different population.
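The standard scores above can be reproduced with the Python standard library. This is a sketch under the numbers stated in the text (mean 175cm, standard deviation 7cm); the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 175, 7          # population mean and standard deviation (cm)

# Single measurement: z = (x - mu) / sigma
z_individual = (170 - mu) / sigma

# Mean of n measurements: z = (x_bar - mu) / (sigma / sqrt(n))
n, x_bar = 5, 200
z_team = (x_bar - mu) / (sigma / sqrt(n))

# One-sided p-value P(Z >= z_team); by symmetry this equals cdf(-z_team)
p_team = NormalDist().cdf(-z_team)

print(round(z_individual, 2), round(z_team, 2))  # -0.71 7.99
print(p_team < 0.01)                             # True: reject H0 at alpha = 0.01
```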


Given a list x containing the heights of the basketball players, find the p-value associated with the Z-score in Python, where the population mean is 1.75m.

import statsmodels.stats.weightstats as sms 
from scipy.stats import norm 

x = [2.06,2.08,1.88,1.91,

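# ztest returns the z-statistic and the two-sided p-value
z, p = sms.ztest(x, value=1.75)
print(z, p)
# the same p-value from the z-statistic, which was 16.06960894924432 here
print(2 * norm.sf(abs(z)))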

Strengths & Weaknesses of Z-test

Strengths of z-test:

  • Intuitive

Weaknesses of z-test:

  • Need the true population mean
  • Need the true population standard deviation

Usually, we only have access to the sample mean and standard deviation. In this case, a Student's t-test is appropriate.

Student's t-test

The two-sample t-test, often called “Student’s t-test”, compares the means of two populations. The two-sample test:

  • Applies when we have an equal number of measurements from the two populations
  • Tests the null hypothesis that the means of the populations from which the two samples were taken are equal.

A high-level overview of the maths involved:
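As a sketch, assuming two samples x and y of equal size n and equal population variances, the test statistic can be written as below. The notation is an assumption of mine: the bars denote sample means, and s_x^2 and s_y^2 are the sample variances.

```latex
t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{2/n}},
\qquad
s_p = \sqrt{\frac{s_x^2 + s_y^2}{2}},
\qquad
\text{degrees of freedom} = 2n - 2
```

The larger |t| is, the less plausible it is that the two samples share the same population mean.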

There are two-tailed and one-tailed versions of the test:

Usually we use two-tailed tests, but when we only care about a difference in one direction (e.g. whether one group's heights or weights are larger) we prefer a one-tailed test.


Exercise

Use scipy's t-test to compare the heights of the Golden State Warriors x with the Cleveland Cavaliers y, i.e. are they significantly different populations?

from scipy import stats

x = [2.06,2.08,1.88,1.91,

y = [1.91,1.96,2.06,1.91,

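# Student's t-test: assumes both populations have equal variance
print(stats.ttest_ind(x, y, equal_var=True))
# Welch's t-test: does not assume equal variance
print(stats.ttest_ind(x, y, equal_var=False))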
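To see roughly what the two calls compute, here is a hand-rolled sketch of both test statistics using only the standard library. The two lists are hypothetical heights made up for illustration (they are not the team data above), and the variable names are my own.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical heights in metres, made up for illustration only
a = [2.01, 1.98, 2.05, 1.93, 2.08]
b = [1.91, 1.96, 2.03, 1.88, 1.99]
na, nb = len(a), len(b)
va, vb = variance(a), variance(b)   # sample variances (n - 1 in the denominator)

# Student's t (equal_var=True): pooled variance, na + nb - 2 degrees of freedom
pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
t_student = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
df_student = na + nb - 2

# Welch's t (equal_var=False): separate variances, Welch-Satterthwaite df
t_welch = (mean(a) - mean(b)) / sqrt(va / na + vb / nb)
df_welch = (va / na + vb / nb) ** 2 / (
    (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
)

print(t_student, df_student)
print(t_welch, df_welch)
```

With equal sample sizes the two t-statistics coincide and only the degrees of freedom differ. Turning either statistic into a p-value still needs the t distribution, which is what `stats.ttest_ind` takes care of.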

Another really good introductory Python book on machine learning is:

Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

For people who prefer video courses, have a look at this online course:

Machine Learning Engineer


Conclusion

This brings us to the end of this article. I hope you now have a basic understanding of how a hypothesis test is used.

If you liked this article, please consider subscribing to my blog. That way I know that my work is valuable to you, and you will be notified of future articles.
Thanks for reading and I am looking forward to hearing your questions :)
Stay tuned and Happy Machine Learning.