Dive into Distributions


In this post, let’s dive into distributions.

Motivation

When studying machine learning, one inevitably encounters many different distributions. I learned about the normal, binomial, Bernoulli, and χ² distributions in high school AP Statistics, but I recently came across some unfamiliar ones. After getting tired of looking up each distribution individually every time I ran into it, I decided to create my own reference sheet based on what I studied.

This blog post was originally intended to cover seven different distributions. Given the sheer length that would require, though, I chose to first deal with the three I saw most frequently but never grasped thoroughly: the Bernoulli, Poisson, and gamma distributions. For the latter two, I took notes on deriving the mean and variance using the moment generating function. What follows is my own explanation of what I learned from one of MIT OpenCourseWare's courses.

Bernoulli Trials and the Bernoulli Distribution

A Bernoulli trial is a trial whose only outcomes are success or failure. When Bernoulli trials are repeated, the probability of success remains the same across trials. If we let $X$ be the random variable for a Bernoulli trial, we usually denote success by $X = 1$ and failure by $X = 0$. Defining the probabilities of failure and success as $P(X=0) = 1 - p$ and $P(X=1) = p$, we can combine those two expressions into a single form:

$$p(x) = p^x (1-p)^{1-x}, \quad x = 0, 1$$

In this case, it is easy to see that $p(0) = 1-p$ and $p(1) = p$. Therefore, the expected value is $E(X) = 0 \cdot (1-p) + 1 \cdot p = p$. Not surprisingly, $X$, the random variable for a Bernoulli trial, has a Bernoulli distribution.

For the variance of $X$, we use the variance formula:

$$\mathrm{Var}(X) = \sigma^2 = \sum_x (x - \mu)^2 P(X = x)$$

Since $\mu = p$,

$$\sigma^2 = \mathrm{Var}(X) = P(X=1)(1-p)^2 + P(X=0)(0-p)^2 = p(1-p)^2 + (1-p)p^2 = p(1-p)$$

$$\sigma = \sqrt{p(1-p)}$$
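
As a quick numerical sanity check (my own addition, not part of the original notes), we can simulate a large number of Bernoulli trials with numpy and compare the sample mean and variance against $p$ and $p(1-p)$; the choice $p = 0.3$ is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # arbitrary success probability for this check

# A Bernoulli trial is a binomial trial with n = 1.
samples = rng.binomial(n=1, p=p, size=1_000_000)

print(samples.mean())  # should be close to p = 0.3
print(samples.var())   # should be close to p * (1 - p) = 0.21
```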

Poisson Distribution

Before we get into what a Poisson distribution is, let’s first imagine a simple scenario. Suppose there is a cafe called “Cafe A,” where customers visit from 7 PM to 9 PM every night. One finds that the cafe and its visitors follow these rules:

  1. No customer’s visit affects the probability of any other customer visiting.
  2. Customers visit at a steady rate; that is, if thirty customers visit in a full hour, one can reasonably assume that fifteen would visit in half an hour.
  3. No two customers visit within a very small time interval, such as a few seconds, of one another.

Now suppose that the random variable $X$ describes the number of visitors per night, and that $P(X=n)$ denotes the probability of $n$ people visiting between 7 and 9 PM. Let $\lambda$ denote the average number of visitors per night as the population parameter, and suppose that the probability mass function of $X$ is as follows: $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$.

In such a case, $X$ is called a Poisson random variable. Such a random variable has three key characteristics:

  1. Independence
  2. Consistency
  3. Non-clusteredness

The first condition, independence, means that events in one time interval or region are independent of, and do not affect the probabilities of, events in other intervals and regions. The second condition, consistency, means that the rate at which events occur stays constant. Finally, the third condition, non-clusteredness, means that events must not cluster together within a short timeframe.

The coffee shop example above is a rather oversimplified real-life setting where a Poisson distribution might appear. In reality, a Poisson distribution is essentially a binomial distribution under special circumstances: the number of trials approaches infinity while the probability of success approaches zero. In other words, if a binomial distribution describes the probability of observing $k$ successes among $n$ trials with $p$ as the probability of success,

$$p(k) = \binom{n}{k} p^k (1-p)^{n-k},$$

then, if $n \to \infty$ and $p \to 0$ (with $\lambda = np$ held fixed),

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}.$$

The proof is shown below.

We start with the basic binomial distribution formula:

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

In the case of Poisson distributions, $\lambda = np$, and substituting $\lambda/n$ in place of $p$ yields:

$$p(x) = \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

Then, after rewriting the form in three steps,

$$p(x) = \frac{n!}{(n-x)!\,x!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

$$p(x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!} \left(\frac{\lambda^x}{n^x}\right) \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

$$p(x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{n^x} \left(\frac{\lambda^x}{x!}\right) \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

As described above, take the limit of p(x) as n approaches infinity.

$$\lim_{n\to\infty} p(x) = \lim_{n\to\infty} \frac{n(n-1)(n-2)\cdots(n-x+1)}{n^x} \left(\frac{\lambda^x}{x!}\right) \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-x}$$

and since the first and last factors converge to one,

$$\lim_{n\to\infty} p(x) = \lim_{n\to\infty} \left(\frac{\lambda^x}{x!}\right) \left(1 - \frac{\lambda}{n}\right)^{n}$$

$$\lim_{n\to\infty} p(x) = \lim_{n\to\infty} \left(\frac{\lambda^x}{x!}\right) \left\{\left(1 - \frac{\lambda}{n}\right)^{-\frac{n}{\lambda}}\right\}^{-\lambda} = \frac{\lambda^x e^{-\lambda}}{x!}$$
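
To see this convergence numerically, here is a small check (my own addition, not from the course notes) that compares the binomial PMF with $p = \lambda/n$ against the Poisson PMF at a single point $k = 3$, for growing $n$:

```python
import math

lam = 4  # the Poisson rate; k = 3 is an arbitrary evaluation point

def binom_pmf(k, n, p):
    # nCk * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # lambda^k * e^(-lambda) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

for n in (10, 100, 10_000):
    print(n, binom_pmf(3, n, lam / n), poisson_pmf(3, lam))
# The binomial values approach the Poisson value as n grows.
```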

Using the moment generating function, we can also find the mean and the standard deviation of a Poisson distribution.

$$M(t) = E[e^{tX}] = \sum_{x=0}^{\infty} e^{tx} p(x) = \sum_{x=0}^{\infty} e^{tx} \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(e^t \lambda)^x}{x!}$$

Given the Maclaurin series of $e^x$,

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

$$e^{-\lambda} \sum_{x=0}^{\infty} \frac{(e^t \lambda)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$

$$M(t) = e^{\lambda(e^t - 1)}$$

The $k$th moment of a random variable $X$ is given by:

$$\mu_k = E(X^k)$$

And the mean and standard deviation expressed in terms of moments are:

$$\mu = E(X), \qquad \sigma = \sqrt{E(X^2) - E^2(X)}, \qquad M(t) = E[e^{tX}]$$

Expanding with the Maclaurin series of $e^x$,

$$M(t) = E[e^{tX}] = E\left[\sum_{k=0}^{\infty} \frac{(tX)^k}{k!}\right]$$

$$= E\left[1 + tX + \frac{(tX)^2}{2!} + \cdots + \frac{(tX)^k}{k!} + \cdots\right]$$

$$= 1 + tE(X) + \frac{t^2 E(X^2)}{2!} + \cdots + \frac{t^k E(X^k)}{k!} + \cdots$$

$$= 1 + \mu_1 t + \frac{\mu_2 t^2}{2!} + \cdots + \frac{\mu_k t^k}{k!} + \cdots$$

Therefore, the kth derivative of M(t) at t=0 is the kth moment.

$$M^{(k)}(0) = E[X^k]$$

Applying this conclusion to the Poisson MGF,

$$\mu = E(X), \qquad \sigma^2 = E(X^2) - E^2(X)$$

$$\mu = M'(t)\Big|_{t=0} = \left[e^{\lambda(e^t - 1)}\right]'\Big|_{t=0} = \left[e^{\lambda(e^t - 1)} \cdot \lambda e^t\right]_{t=0}$$

$$\mu = \lambda$$

$$\sigma^2 = M''(0) - (M'(0))^2 = \left[e^{\lambda(e^t - 1)}\right]''\Big|_{t=0} - \lambda^2$$

With some basic algebra (the second derivative at $t = 0$ evaluates to $\lambda^2 + \lambda$),

$$\sigma^2 = \lambda, \qquad \sigma = \sqrt{\lambda}$$
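
As a quick symbolic check (my own addition, assuming sympy is available), we can differentiate the MGF and recover both moments:

```python
import sympy as sp

t, lam = sp.symbols("t lam", positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))  # the Poisson MGF derived above

m1 = sp.diff(M, t).subs(t, 0)      # E[X]
m2 = sp.diff(M, t, 2).subs(t, 0)   # E[X^2]

print(sp.simplify(m1))             # lam
print(sp.simplify(m2 - m1**2))     # variance: lam
```

The script below then plots the Poisson PMF for a few values of $\lambda$: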
```python
import math

import matplotlib.pyplot as plt
import numpy as np

# Support of the PMF: k = 0, 1, ..., 40 (float to avoid integer overflow in lam**k)
k = np.arange(41.0)
factorials = np.array([math.factorial(int(i)) for i in k], dtype=float)

def poisson_pmf(lam):
    # P(X = k) = lambda^k * e^(-lambda) / k!
    return lam**k * np.exp(-lam) / factorials

for lam in (4, 10, 20):
    plt.plot(k, poisson_pmf(lam), label=f"lambda = {lam}")

plt.legend()
plt.title("Poisson Distributions")
plt.show()
```
Poisson Distributions with Different Lambdas

The Γ Distribution

The gamma function is defined as follows:

Definition

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$$

With integration by parts, the second property below can be proven in fewer than three lines. By substituting one for $x$ in the definition, the third property is also easily shown. Given the second and third properties, the first then follows by induction.

Properties

$$\Gamma(n) = (n-1)!$$

$$\Gamma(n+1) = n\Gamma(n)$$

$$\Gamma(1) = 1$$
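
For completeness, here is the short integration-by-parts argument for the second property (my own fill-in; the boundary term vanishes because $t^n e^{-t} \to 0$ at both limits):

$$\Gamma(n+1) = \int_0^{\infty} t^{n} e^{-t}\, dt = \left[-t^{n} e^{-t}\right]_0^{\infty} + n \int_0^{\infty} t^{n-1} e^{-t}\, dt = n\Gamma(n)$$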

Gamma distribution PDF

Starting from the definition, replace $x$ with $\alpha$ and $t$ with $x$:

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx$$

$$1 = \frac{1}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx$$

Substituting $x = \beta y$:

$$\int_0^{\infty} x^{\alpha-1} e^{-x}\, dx = \int_0^{\infty} (\beta y)^{\alpha-1} e^{-\beta y}\, \beta\, dy$$

$$1 = \int_0^{\infty} \frac{1}{\Gamma(\alpha)} x^{\alpha-1} e^{-x}\, dx = \int_0^{\infty} \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{\alpha-1} e^{-\beta y}\, dy$$

Renaming $y$ back to $x$, we obtain the gamma distribution’s PDF:

$$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad x \ge 0$$

$$f(x \mid \alpha, \beta) = 0, \quad x < 0$$
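
Mirroring the Poisson plot above, here is a short script (my own addition) that draws this PDF for a few illustrative $(\alpha, \beta)$ pairs, using math.gamma for $\Gamma(\alpha)$:

```python
import math

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

def gamma_pdf(x, alpha, beta):
    # f(x | alpha, beta) = beta^alpha / Gamma(alpha) * x^(alpha - 1) * e^(-beta x)
    return beta**alpha / math.gamma(alpha) * x**(alpha - 1) * np.exp(-beta * x)

for alpha, beta in [(1, 1), (2, 1), (5, 2)]:
    plt.plot(x, gamma_pdf(x, alpha, beta), label=f"alpha = {alpha}, beta = {beta}")

plt.legend()
plt.title("Gamma Distributions")
plt.show()
```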

Mean, Variance of Gamma Distribution

$$\mu = \frac{\alpha}{\beta}, \qquad \sigma^2 = \frac{\alpha}{\beta^2}$$

Proof:

We start with the moment generating function:

$$M(t) = \int_0^{\infty} \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{(t-\beta)x}\, dx$$

Substituting $\beta y = (\beta - t)x$ (assuming $t < \beta$):

$$M(t) = \int_0^{\infty} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{\beta y}{\beta - t}\right)^{\alpha-1} e^{-\beta y} \left(\frac{\beta}{\beta - t}\right) dy$$

$$= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} \left(\frac{\beta^{\alpha} y^{\alpha-1}}{(\beta - t)^{\alpha}}\right) e^{-\beta y}\, dy$$

$$= \frac{\beta^{\alpha}}{(\beta - t)^{\alpha}} \cdot \frac{1}{\Gamma(\alpha)} \int_0^{\infty} \beta^{\alpha} y^{\alpha-1} e^{-\beta y}\, dy = \frac{\beta^{\alpha}}{(\beta - t)^{\alpha}} \cdot 1$$

Since

$$f(x \mid \alpha, \beta) := \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad x \ge 0,$$

$$\int_0^{\infty} f(x \mid \alpha, \beta)\, dx = \int_0^{\infty} \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\, dx = 1$$

$$M(t) = \frac{\beta^{\alpha}}{(\beta - t)^{\alpha}}$$

The rest of the steps then become trivial.

$$M'(t) = \alpha \beta^{\alpha} (\beta - t)^{-\alpha-1} = \frac{\alpha}{\beta} \quad (\text{with } t = 0)$$

$$M''(t) = \alpha(\alpha+1) \beta^{\alpha} (\beta - t)^{-\alpha-2} = \frac{\alpha(\alpha+1)}{\beta^2} \quad (\text{with } t = 0)$$

$$\mu = \frac{\alpha}{\beta}, \qquad \sigma^2 = M''(0) - [M'(0)]^2 = \frac{\alpha}{\beta^2}$$
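
Finally, as a numerical sanity check (my own addition), we can sample from the gamma distribution with numpy; note that numpy’s generator is parameterized by a scale equal to $1/\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 5.0, 2.0  # arbitrary shape and rate for this check

# numpy's gamma takes shape alpha and scale theta = 1 / beta.
samples = rng.gamma(shape=alpha, scale=1 / beta, size=1_000_000)

print(samples.mean())  # should be close to alpha / beta   = 2.5
print(samples.var())   # should be close to alpha / beta^2 = 1.25
```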

Conclusion

Distributions, and the statistical inferences based on them, are key to parametric statistics. In this post, I went over the Bernoulli, Poisson, and gamma distributions, although my original intention was to cover several more. I will likely write a separate post on the binomial, multinomial, χ², and β distributions. Although this blog post took longer to create than I had anticipated, I learned much from the experience. Thanks for reading.