Matlab

The Language of Technical Computing

— statistics application —

Haifeng Xu

(hfxu@yzu.edu.cn)

Basic Descriptive Statistics

What Is the Statistics Toolbox?

The Statistics Toolbox is a collection of tools built on the MATLAB® numeric computing environment. The toolbox supports a wide range of common statistical tasks, from random number generation, to curve fitting, to design of experiments and statistical process control.

Primary Topic Areas

Probability distributions
Descriptive statistics
Cluster analysis
Linear models
Nonlinear models
Hypothesis tests
Multivariate statistics
Statistical plots
Statistical process control
Design of experiments

Project 1

The calculation of the probability density and the stochastic simulation

The probability density function (pdf) has a different meaning depending on whether the distribution is discrete or continuous.

For discrete distributions, the pdf is the probability of observing a particular outcome. Suppose the random variable $X$ takes the values $x_k\ (k=1,2,\ldots)$ with probabilities $P\{X=x_k\}=p_k\ (k=1,2,\ldots)$. Then we have

\[ p_k\geq 0,\quad k=1,2,\ldots \] \[ \sum_{k=1}^{+\infty}p_k=1 \]

Unlike discrete distributions, the pdf of a continuous distribution at a value is not the probability of observing that value. For continuous distributions the probability of observing any particular value is zero. To get probabilities you must integrate the pdf over an interval of interest.

A pdf $f(x)$ with distribution function $F(x)$ has the following theoretical properties:

$f(x)\geq 0$
$\displaystyle\int_{-\infty}^{+\infty}f(x)dx=1$
$\displaystyle P\{x_1\leq X\leq x_2\}=F(x_2)-F(x_1)=\int_{x_1}^{x_2}f(x)dx$
If $f(x)$ is continuous at point $x$, then $F'(x)=f(x)$.
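
These properties can be checked numerically. The following is a minimal sketch, assuming the Statistics Toolbox functions normpdf and normcdf are available; the choice of the standard normal distribution and of the endpoints -1 and 2 is arbitrary and only for illustration.

```matlab
% Numerical check of the pdf properties, using N(0,1) as an example.
f = @(x) normpdf(x,0,1);                    % pdf f(x); the distribution is an assumed example

total = integral(f,-Inf,Inf);               % integral over the whole line, close to 1

x1 = -1; x2 = 2;                            % arbitrary interval endpoints
pInt = integral(f,x1,x2);                   % integrate the pdf over [x1,x2]
pCdf = normcdf(x2,0,1) - normcdf(x1,0,1);   % F(x2) - F(x1)

fprintf('total = %.6f\n', total);
fprintf('integral = %.6f, F(x2)-F(x1) = %.6f\n', pInt, pCdf);
```

The two probabilities agree up to the tolerance of the numerical integration.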

Binomial Distribution

Background of the Binomial Distribution

The binomial distribution models the total number of successes in repeated trials from an infinite population under the following conditions:

Only two outcomes are possible on each of n trials.
The probability of success for each trial is constant.
All trials are independent of each other.

James Bernoulli derived the binomial distribution in 1713 (Ars Conjectandi). Earlier, Blaise Pascal had considered the special case where p = 1/2.

Definition of the Binomial Distribution.

The binomial pdf is \[ f(x)=\begin{cases} C_n^x p^x(1-p)^{n-x}& x=0,1,\ldots,n\\ 0& \text{otherwise} \end{cases} \] where \[ C_n^x=\frac{n!}{x!(n-x)!} \]
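
As a small check of this definition, the formula can be evaluated directly and compared with the toolbox function. This is only a sketch; the values n = 10, p = 0.5 and x = 3 are chosen for illustration.

```matlab
% Evaluate the binomial pdf from its definition and compare with binopdf.
n = 10; p = 0.5; x = 3;                          % illustrative values

direct  = nchoosek(n,x) * p^x * (1-p)^(n-x);     % C_n^x p^x (1-p)^(n-x)
toolbox = binopdf(x,n,p);                        % Statistics Toolbox function

fprintf('direct = %.7f, binopdf = %.7f\n', direct, toolbox);
```

Here nchoosek(n,x) computes $C_n^x$, so direct equals 120/1024.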

Binomial Distribution

Using binopdf to compute pdf of binomial distribution

binopdf(k,n,p)  % pdf('bino',k,n,p)

where $k$ is the number of successes, $n$ is the total number of trials, and $p$ is the probability of success on each trial.

Example

Example of Binomial Distribution

In a large batch of electronic tubes, 10% are damaged. We select 20 tubes at random to form a circuit. Find the probability that the circuit works properly (i.e., all 20 selected tubes are good).

>> binopdf(20,20,0.9)
ans =
    0.1216

Another example from doc binopdf

A Quality Assurance inspector tests 200 circuit boards a day. If 2% of the boards have defects, what is the probability that the inspector will find no defective boards on any given day?

>> binopdf(0,200,0.02)
ans =
    0.0176

What is the most likely number of defective boards the inspector will find?

>> defects=0:200;
>> y=binopdf(defects,200,.02)
y =
  Columns 1 through 9
    0.0176    0.0718    0.1458    0.1963    0.1973    0.1579    0.1047    0.0592    0.0292
  Columns 10 through 18
    0.0127    0.0049    0.0017    0.0006    0.0002    0.0000    0.0000    0.0000    0.0000
  ... (columns 19 through 201 are all zero to four decimal places)
>> [x,i]=max(y)
x =
    0.1973
i =
     5
>> defects(i)
ans =
     4
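
The result of this search can be cross-checked against a known property of the binomial distribution: when $(n+1)p$ is not an integer, the most likely value is $\lfloor (n+1)p \rfloor$. The sketch below repeats the search and compares it with that formula; the variable names are illustrative.

```matlab
% Most likely number of defective boards: search versus closed-form mode.
n = 200; p = 0.02;

defects = 0:n;
y = binopdf(defects,n,p);          % semicolon suppresses the long output
[~,i] = max(y);                    % index of the largest probability
modeSearch  = defects(i);          % value found by searching the pdf

modeFormula = floor((n+1)*p);      % valid because (n+1)*p = 4.02 is not an integer

fprintf('search: %d, formula: %d\n', modeSearch, modeFormula);
```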

Example and Plot of the Binomial Distribution

The following commands generate a plot of the binomial pdf for n = 10 and p = 1/2.

>> x=0:10;
>> y=binopdf(x,10,0.5)
y =
  Columns 1 through 9
    0.0010    0.0098    0.0439    0.1172    0.2051    0.2461    0.2051    0.1172    0.0439
  Columns 10 through 11
    0.0098    0.0010
>> plot(x,y,'+')

Poisson Distribution

Background of the Poisson Distribution

The Poisson distribution is appropriate for applications that involve counting the number of times a random event occurs in a given amount of time, distance, area, etc.

Definition of the Poisson Distribution

The Poisson pdf is \[ f(x)=\begin{cases} \frac{\lambda^x e^{-\lambda}}{x!}& x=0,1,2,\ldots\\ 0 & \text{otherwise} \end{cases} \]

poisspdf

Poisson probability density function

Y = poisspdf(k,$\lambda$) % pdf('poiss',k,$\lambda$)

where $k$ is the number of times a random event occurs within some interval, and $\lambda$ is the mean number of occurrences in that interval.

k and $\lambda$ can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in $\lambda$ must all be positive.

Example 2

An airline booking office receives an average of 36 calls per hour. Find the probability of receiving exactly two calls within 5 minutes.

>> poisspdf(2,3) % the office receives 36/(60/5)=3 calls every 5 minutes on average
ans =
    0.2240
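
The same number can be obtained from the definition of the Poisson pdf. The sketch below first rescales the hourly rate to the 5-minute window and then evaluates the formula; the variable names are illustrative.

```matlab
% Poisson probability of exactly k calls in a 5-minute window.
ratePerHour = 36;                         % average calls per hour
minutes     = 5;                          % length of the window
lambda = ratePerHour * minutes / 60;      % mean number of calls in the window (3)

k = 2;
direct  = lambda^k * exp(-lambda) / factorial(k);   % lambda^k e^(-lambda) / k!
toolbox = poisspdf(k,lambda);

fprintf('lambda = %g, direct = %.4f, poisspdf = %.4f\n', lambda, direct, toolbox);
```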

Exponential Distribution

Background of the Exponential Distribution

Like the chi-square distribution, the exponential distribution is a special case of the gamma distribution (obtained by setting the shape parameter a = 1).

Definition of the Exponential Distribution

If the pdf of a random variable $X$ is \[ f(x)=\begin{cases} \lambda e^{-\lambda x}, & x > 0\\ 0, & \text{otherwise}, \end{cases} \] where $\lambda > 0$, then $X$ is exponentially distributed.

Equivalently, writing $\lambda=\frac{1}{\mu}$, \[ f(x)=\begin{cases} \frac{1}{\mu}e^{-\frac{x}{\mu}}, & x > 0\\ 0, & \text{otherwise}, \end{cases} \] where $\mu > 0$ is the mean of the distribution.

The distribution function is \[ F(x)=\begin{cases} 1-e^{-\frac{x}{\mu}},& x > 0 \\ 0, & \text{otherwise}. \end{cases} \]

exppdf

Exponential probability density function

Y = exppdf(X,$\mu$) returns the pdf of the exponential distribution with mean parameter $\mu$, evaluated at the values in $X$. $X$ and $\mu$ can be vectors, matrices, or multidimensional arrays that have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in $\mu$ must be positive.

That is \[ \text{exppdf}(x,\mu)=\begin{cases} \frac{1}{\mu}e^{-\frac{x}{\mu}}, & x > 0\\ 0, & \text{otherwise}, \end{cases} \]

Example

The lifetime $X$ of a certain kind of electronic component is exponentially distributed with parameter $\mu=1000$. Suppose three such components are put into service at time 0. Find the probability that at least one of them has failed by time 1000.

Answer

Since the distribution function is \[ F(x)=\begin{cases} 1-e^{-\frac{x}{\mu}},& x > 0 \\ 0, & \text{otherwise}. \end{cases} \]

\[ P\{X > 1000\}=1-P\{X\leq 1000\}=1-F(1000)=e^{-1} \]

The lifetimes of these electronic components are independent. So if $Y$ denotes the number of components that have failed by time 1000, then $Y\sim b(3, 1-e^{-1})$.

Thus, \[ P\{Y\geq 1\}=1-P\{Y=0\}=1-C_3^0(1-e^{-1})^0 (e^{-1})^3=1-e^{-3}. \]
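
The answer can be reproduced with toolbox functions. This is a sketch that follows the derivation above step by step, using expcdf for the failure probability of one component and binopdf for the number of failed components.

```matlab
% At least one of three components fails by time t, with exponential lifetimes.
mu = 1000; t = 1000;

q = expcdf(t,mu);                   % P{X <= t} = 1 - e^(-1) for one component
pAtLeastOne = 1 - binopdf(0,3,q);   % 1 - P{Y = 0}, with Y ~ b(3,q)
closedForm  = 1 - exp(-3);          % result of the derivation

fprintf('toolbox = %.6f, closed form = %.6f\n', pAtLeastOne, closedForm);
```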

Another example

The lifetime $X$ of a certain kind of electronic component is exponentially distributed with parameter $\mu=50$ (its mean lifetime is 50 hours). Find the probability density of failure at exactly 25 hours, and the probability that the component breaks down during the first 25 hours. Because the distribution is continuous, the first quantity is a density, not a probability.

>> exppdf(25,50) % probability density of failure at t = 25 hours
ans =
    0.0121
>> expcdf(25,50) % probability of break down during the first 25 hours
ans =
    0.3935
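
To make the difference between the two numbers concrete, the sketch below integrates the density over the first 25 hours and also uses the density to approximate the probability of failure in a short window around 25 hours. The window width dt is an arbitrary value chosen for illustration.

```matlab
% Density versus probability for an exponential lifetime with mean 50 hours.
mu = 50; t = 25;

density  = exppdf(t,mu);                        % density at t = 25, not a probability
prob     = integral(@(x) exppdf(x,mu), 0, t);   % integrate the pdf over [0,25]
cdfValue = expcdf(t,mu);                        % same probability from the cdf

dt = 0.1;                                       % short window (assumed width)
approx = density * dt;                          % density times width
exact  = expcdf(t+dt/2,mu) - expcdf(t-dt/2,mu); % exact probability of the window

fprintf('integral = %.4f, expcdf = %.4f\n', prob, cdfValue);
fprintf('window: approx = %.6f, exact = %.6f\n', approx, exact);
```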

Normal Distribution

Background of the Normal Distribution

The normal distribution is a two-parameter family of curves. The first parameter, $\mu$, is the mean. The second, $\sigma$, is the standard deviation. The standard normal distribution (whose distribution function is written $\Phi(x)$) sets $\mu$ to 0 and $\sigma$ to 1.

The usual justification for using the normal distribution for modeling is the Central Limit Theorem, which states (roughly) that the sum of independent samples from any distribution with finite mean and variance converges to the normal distribution as the sample size goes to infinity.
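
The theorem can be illustrated with a small simulation. The sketch below adds up 12 uniform random numbers many times and standardizes the sums; the number of terms, the sample count, and the seed are arbitrary choices, and the call to rng assumes a MATLAB release that provides it.

```matlab
% Central Limit Theorem: standardized sums of uniform random numbers.
rng(0);                                   % fix the seed for reproducibility
nTerms = 12; nSamples = 1e5;              % illustrative sizes

U = rand(nTerms,nSamples);                % each column holds one set of 12 uniforms
S = sum(U,1);                             % sum of each column
Z = (S - nTerms/2) / sqrt(nTerms/12);     % a uniform on (0,1) has mean 1/2, variance 1/12

m = mean(Z);                              % close to 0
s = std(Z);                               % close to 1
fracWithin1 = mean(abs(Z) <= 1);          % close to 0.68 for a standard normal

fprintf('mean = %.3f, std = %.3f, P(|Z|<=1) = %.3f\n', m, s, fracWithin1);
```

A histogram of Z (for example with hist(Z,50)) shows the familiar bell shape.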

Definition of the Normal Distribution

\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

Find the probability density at the point $0.6578$ of a random variable $X$ that follows the standard normal distribution $N(0,1)$.

>> normpdf(0.6578,0,1)
ans =
    0.3213
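
For comparison, the same value follows directly from the definition of the normal pdf. A minimal sketch:

```matlab
% Evaluate the normal pdf from its definition and compare with normpdf.
x = 0.6578; mu = 0; sigma = 1;

direct  = exp(-(x-mu)^2 / (2*sigma^2)) / (sqrt(2*pi)*sigma);
toolbox = normpdf(x,mu,sigma);

fprintf('direct = %.4f, normpdf = %.4f\n', direct, toolbox);
```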

Uniformly distributed (pseudo)random number

Uniformly distributed pseudorandom numbers

To generate an m-by-n array of uniformly distributed random numbers in the interval $(0,1)$, use the command rand(m,n); see help rand or doc rand.

We can also use unifrnd(a,b,m,n) to get an m-by-n array of continuous uniform random numbers on the interval from a to b.

>> R=rand(3,4)
R =
    0.6324    0.5469    0.1576    0.4854
    0.0975    0.9575    0.9706    0.8003
    0.2785    0.9649    0.9572    0.1419
>> R=unifrnd(5,15,3,4)
R =
    9.2176   14.5949   13.4913   12.5774
   14.1574   11.5574   14.3399   12.4313
   12.9221    5.3571   11.7874    8.9223

>> help unifrnd

 unifrnd Random arrays from continuous uniform distribution.
    R = unifrnd(A,B) returns an array of random numbers chosen from the
    continuous uniform distribution on the interval from A to B.  The size
    of R is the common size of A and B if both are arrays.  If either
    parameter is a scalar, the size of R is the size of the other
    parameter.
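
Uniform numbers on the interval from a to b can also be produced from rand alone by shifting and scaling. The sketch below is an illustration of that idea rather than a description of how unifrnd is implemented; the seed is arbitrary.

```matlab
% Uniform random numbers on (a,b) obtained by rescaling rand.
rng(1);                          % arbitrary seed, for reproducibility
a = 5; b = 15; m = 3; n = 4;

R = a + (b-a) * rand(m,n);       % maps (0,1) onto (a,b)

disp(size(R));                   % 3 4
disp(all(R(:) > a & R(:) < b));  % every entry lies strictly between a and b
```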

Binomial random number

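Try the command binornd; for example, binornd(N,P,m,n) returns an m-by-n array of binomial random numbers with N trials and success probability P.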

Functions for generating random numbers

Try the following commands.

Function	Usage
unifrnd	unifrnd(A,B,m,n)
unidrnd	unidrnd(N,m,n)
exprnd	exprnd($\mu$,m,n)
normrnd	normrnd($\mu$,$\sigma$,m,n)
chi2rnd	chi2rnd(N,m,n)
trnd	trnd(N,m,n)
frnd	frnd($N_1$,$N_2$,m,n)
gamrnd	gamrnd(A,B,m,n)
betarnd	betarnd(A,B,m,n)
lognrnd	lognrnd($\mu$,$\sigma$,m,n)
nbinrnd	nbinrnd(R,P,m,n)
ncfrnd	ncfrnd($N_1$,$N_2$,$\delta$,m,n)
nctrnd	nctrnd(N,$\delta$,m,n)
raylrnd	raylrnd(B,m,n)
weibrnd	weibrnd(A,B,m,n)
binornd	binornd(N,P,m,n)
geornd	geornd(P,m,n)
hygernd	hygernd(M,K,N,m,n)
poissrnd	poissrnd($\lambda$,m,n)
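
A quick way to get a feel for these generators is to draw a large sample and compare its mean with the theoretical value. The sketch below does this for three of the functions; the parameter values, sample size, and seed are arbitrary. Note that exprnd takes the mean $\mu$ of the distribution as its parameter. The last lines estimate by simulation the probability from the tube example (all 20 selected tubes are good).

```matlab
% Stochastic simulation: sample means versus theoretical means.
rng(2);                              % arbitrary seed, for reproducibility
N = 1e5;                             % sample size

xExp  = exprnd(4,N,1);               % exponential with mean 4
xPois = poissrnd(3,N,1);             % Poisson with mean 3
xBino = binornd(20,0.9,N,1);         % binomial with mean 20*0.9 = 18

fprintf('exprnd:   sample mean %.3f (theory 4)\n',  mean(xExp));
fprintf('poissrnd: sample mean %.3f (theory 3)\n',  mean(xPois));
fprintf('binornd:  sample mean %.3f (theory 18)\n', mean(xBino));

% Fraction of simulated circuits in which all 20 tubes are good.
fracAllGood = mean(xBino == 20);
fprintf('simulated %.4f, binopdf(20,20,0.9) = %.4f\n', fracAllGood, binopdf(20,20,0.9));
```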

random

random generates random arrays from a specified distribution.

R = random(NAME,A) returns an array of random numbers chosen from the one-parameter probability distribution specified by NAME with parameter values A.

R = random(NAME,A,B) or R = random(NAME,A,B,C) returns an array of random numbers chosen from a two- or three-parameter probability distribution with parameter values A, B (and C). Trailing size arguments, as in random(NAME,A,B,m,n), request an m-by-n array.

>> random('norm',2,0.3,3,4)
ans =
    2.0882    1.6559    1.1167    1.7735
    1.7638    1.6793    2.4315    2.4111
    2.2665    1.7572    2.0976    1.4865
>> help random % try it

Project 2

Numerical characteristics of random variables

Sort

For vectors, sort(X) sorts the elements of X in ascending order.

For matrices, sort(X) sorts each column of X in ascending order.

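[Y,I] = sort(X) returns the sorted result Y together with an index matrix I. For a vector, Y = X(I); for a matrix, each column of I lists the row indices that sort the corresponding column of X.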

>> A=[1 2 3; 4 5 2; 3 7 0]
A =
     1     2     3
     4     5     2
     3     7     0
>> sort(A)
ans =
     1     2     0
     3     5     2
     4     7     3
>> [Y,I]=sort(A)
Y =
     1     2     0
     3     5     2
     4     7     3
I =
     1     1     3
     3     2     2
     2     3     1
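
The index matrix I can be used to rebuild Y from A one column at a time. The following sketch uses the same matrix as above; the loop is written out explicitly for clarity.

```matlab
% Rebuild the sorted matrix from the index matrix returned by sort.
A = [1 2 3; 4 5 2; 3 7 0];
[Y,I] = sort(A);

rebuilt = zeros(size(A));
for j = 1:size(A,2)
    rebuilt(:,j) = A(I(:,j),j);    % reorder column j using its own indices
end

disp(isequal(rebuilt,Y));          % 1
```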

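sortrows

sortrows(A) sorts the rows of A in ascending order of the first column, using later columns to break ties. sortrows(A,COL) sorts by the columns listed in the vector COL instead.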

>> A=[1 2 3; 4 5 2; 3 7 0]
A =
     1     2     3
     4     5     2
     3     7     0
>> sortrows(A)
ans =
     1     2     3
     3     7     0
     4     5     2
>> sortrows(A,1)
ans =
     1     2     3
     3     7     0
     4     5     2
>> sortrows(A,3)
ans =
     3     7     0
     4     5     2
     1     2     3
>> sortrows(A,[3 2])
ans =
     3     7     0
     4     5     2
     1     2     3
>> sortrows(A,[2 3])
ans =
     1     2     3
     4     5     2
     3     7     0
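
Two further uses of sortrows are sketched below with the same matrix: requesting the row permutation as a second output, and sorting in descending order by giving a negative column number.

```matlab
% sortrows: row permutation output and descending order.
A = [1 2 3; 4 5 2; 3 7 0];

[B,idx] = sortrows(A,3);      % sort by column 3; idx is the row permutation
disp(idx');                   % 3 2 1
disp(isequal(B,A(idx,:)));    % 1, because B is A with its rows reordered

D = sortrows(A,-1);           % negative column number: descending by column 1
disp(D);
```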

Mean

The mean of the data

>> data = [1 2 3 4 50];
>> mean(data)
ans =
    12
>> x=[174.5 165 180.6 174.5 179 163 175.3 190 174 177.9]
x =
  Columns 1 through 9
  174.5000  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000
  Column 10
  177.9000
>> mean(x)
ans =
  175.3800

Median

The median of the data

>> data=[1 2 3 4 5]
data =
     1     2     3     4     5
>> median(data)
ans =
     3
>> data=[1 2 3 4 5 6]
data =
     1     2     3     4     5     6
>> median(data)
ans =
    3.5000
>> data=[3 2 3 4 9 19]
data =
     3     2     3     4     9    19
>> median(data)
ans =
    3.5000
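
The data set [1 2 3 4 50] from the Mean slide shows how differently the two measures react to a single large value. The sketch below also computes the median by hand from the sorted data; the variable names are illustrative.

```matlab
% Mean versus median on data with one large value.
data = [1 2 3 4 50];

m   = mean(data);             % 12, pulled up by the value 50
med = median(data);           % 3, unaffected by how large the last value is

% Median by hand: middle element, or average of the two middle elements.
s = sort(data);
n = numel(s);
if mod(n,2) == 1
    manual = s((n+1)/2);
else
    manual = (s(n/2) + s(n/2+1)) / 2;
end

fprintf('mean = %g, median = %g, manual median = %g\n', m, med, manual);
```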

Sample variance

\[ \text{var}(X)=s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \] where $\bar{x}$ is the sample mean.

>> X=[165 180.6 174.5 179 163 175.3 190 174 177.9 160]
X =
  Columns 1 through 9
  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
  Column 10
  160.0000
>> var(X)
ans =
   82.1846
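
The value returned by var can be reproduced directly from the formula. A minimal sketch with the same data:

```matlab
% Sample variance computed from its definition.
X = [165 180.6 174.5 179 163 175.3 190 174 177.9 160];

n    = numel(X);
xbar = mean(X);                           % sample mean
s2   = sum((X - xbar).^2) / (n - 1);      % divide by n-1, as in the formula

fprintf('by formula = %.4f, var(X) = %.4f\n', s2, var(X));
```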

The standard deviation

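The square root of the variance is called the standard deviation. With a second argument of 1, var(X,1) and std(X,1) divide by $n$ instead of $n-1$. In the session below, isequal(sig^2,DX) returns 0 even though both values display as 73.9661: they differ in the last digits because of floating-point rounding, as the format long output at the end shows.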

>> X
X =
  Columns 1 through 9
  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
  Column 10
  160.0000
>> X=[165 180.6 174.5 179 163 175.3 190 174 177.9 160]
X =
  Columns 1 through 9
  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
  Column 10
  160.0000
>> DX=var(X,1)
DX =
   73.9661
>> sig=std(X,1)
sig =
    8.6004
>> isequal(sig^2,DX)
ans =
     0
>> sig^2
ans =
   73.9661
>> DX1=var(X)
DX1 =
   82.1846
>> sig1=std(X)
sig1 =
    9.0656
>> sig1^2
ans =
   82.1846
>> isequal(sig1^2,DX1)
ans =
     1
>> isequal(sig^2,DX)
ans =
     0
>> format long
>> sig^2
ans =
  73.966100000000026
>> DX
DX =
  73.966100000000012
>> sig1^2
ans =
  82.184555555555562
>> DX1
DX1 =
  82.184555555555562
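
Because of this rounding, floating-point results are usually compared with a tolerance rather than with isequal. The sketch below shows one way to do so; the tolerance 1e-10 is an arbitrary choice for illustration.

```matlab
% Compare floating-point results with a tolerance instead of isequal.
X = [165 180.6 174.5 179 163 175.3 190 174 177.9 160];

DX  = var(X,1);                       % variance normalized by n
sig = std(X,1);                       % standard deviation normalized by n

tol = 1e-10;                          % assumed tolerance, chosen for illustration
closeEnough = abs(sig^2 - DX) < tol;  % true when the values agree up to rounding

fprintf('difference = %g, close enough = %d\n', abs(sig^2 - DX), closeEnough);
```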

Geometric mean

\[ M=(\prod_{i=1}^{n}x_i)^{\frac{1}{n}} \]

>> A=[1 2 3 4]
A =
     1     2     3     4
>> M=geomean(A)
M =
    2.2134
>> B=[1 2 3 4;2 3 4 9; 2 9 0 5]
B =
     1     2     3     4
     2     3     4     9
     2     9     0     5
>> M2=geomean(B)
M2 =
    1.5874    3.7798         0    5.6462
>> %Let's test it
>> T=[1 2 2]
T =
     1     2     2
>> geomean(T)
ans =
    1.5874
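
The geometric mean can be computed directly from the formula, or as the exponential of the mean of the logarithms, which avoids forming a very large product for long positive data. A minimal sketch:

```matlab
% Geometric mean computed two ways and compared with geomean.
A = [1 2 3 4];

byProduct = prod(A)^(1/numel(A));   % n-th root of the product
byLogs    = exp(mean(log(A)));      % equivalent form for positive data

fprintf('product = %.4f, logs = %.4f, geomean = %.4f\n', byProduct, byLogs, geomean(A));
```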

Harmonic mean

\[ M=\frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}} \]

For positive data, the arithmetic mean is greater than or equal to the harmonic mean.

>> A=[1 2 4 6;
3 4 5 7;
8 9 6 0;
4 6 8 1]
A =
     1     2     4     6
     3     4     5     7
     8     9     6     0
     4     6     8     1
>> M1=harmmean(A)
M1 =
    2.3415    3.8919    5.3933         0
>> Average=mean(A)
Average =
    4.0000    5.2500    5.7500    3.5000
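
The first column of A can be used to check the formula and the inequality by hand. The sketch below also includes the geometric mean, which lies between the other two for positive data.

```matlab
% Harmonic, geometric, and arithmetic means of the first column of A.
x = [1 3 8 4];

hm = numel(x) / sum(1./x);       % harmonic mean from the formula
gm = exp(mean(log(x)));          % geometric mean
am = mean(x);                    % arithmetic mean

ordered = (hm <= gm) && (gm <= am);
fprintf('HM = %.4f, GM = %.4f, AM = %.4f, ordered = %d\n', hm, gm, am, ordered);
```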

range

range(X) returns the difference between the maximum and the minimum of a sample. For a matrix, range(X) returns a row vector containing the range of each column.

>> A=[1 2 3; 2 8 9; 3 6 2]
A =
     1     2     3
     2     8     9
     3     6     2
>> Y=range(A)
Y =
     2     6     7
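
The same result follows from max and min, which also work column by column. The sketch below additionally computes the range over all entries of the matrix.

```matlab
% Range as maximum minus minimum.
A = [1 2 3; 2 8 9; 3 6 2];

byHand  = max(A) - min(A);          % column-wise range: 2 6 7
overall = max(A(:)) - min(A(:));    % range over all entries: 9 - 1 = 8

disp(byHand);
disp(overall);
```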

Project 2

The calculation of the probability density and the stochastic simulation

Expectation and variance of uniform distribution

[M,V] = unifstat(A,B) returns the mean M and the variance V of the continuous uniform distribution with lower endpoint (minimum) A and upper endpoint (maximum) B.

Vector or matrix inputs for A and B must have the same size, which is also the size of M and V.

A scalar input for A or B is expanded to a constant matrix with the same dimensions as the other input.

The mean of the continuous uniform distribution with parameters a and b is $(a + b)/2$, and the variance is $(b-a)^2/12$.

>> a=1:6
a =
     1     2     3     4     5     6
>> b=2.*a
b =
     2     4     6     8    10    12
>> [M,V]=unifstat(a,b)
M =
    1.5000    3.0000    4.5000    6.0000    7.5000    9.0000
V =
    0.0833    0.3333    0.7500    1.3333    2.0833    3.0000
>> 1/12
ans =
    0.0833
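
The output of unifstat can be checked against the two formulas. A minimal sketch using the same endpoints:

```matlab
% Mean and variance of the uniform distribution from the formulas.
a = 1:6;
b = 2.*a;

M = (a + b) / 2;          % mean of each distribution
V = (b - a).^2 / 12;      % variance of each distribution

disp(M);
disp(V);
```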

Normal mean and variance

[M,V] = normstat($\mu$,$\sigma$) returns the mean M and the variance V of the normal distribution with mean $\mu$ and standard deviation $\sigma$.

$\mu$ and $\sigma$ can be vectors, matrices, or multidimensional arrays that all have the same size, which is also the size of M and V. A scalar input for $\mu$ or $\sigma$ is expanded to a constant array with the same dimensions as the other input.

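The mean of the normal distribution with parameters $\mu$ and $\sigma$ is $\mu$, and the variance is $\sigma^2$. In the example below both inputs are the matrix A, so M equals A and V equals the elementwise square of A.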

>> n=1:5;
>> A=n'*n
A =
     1     2     3     4     5
     2     4     6     8    10
     3     6     9    12    15
     4     8    12    16    20
     5    10    15    20    25
>> [M,V]=normstat(A,A)
M =
     1     2     3     4     5
     2     4     6     8    10
     3     6     9    12    15
     4     8    12    16    20
     5    10    15    20    25
V =
     1     4     9    16    25
     4    16    36    64   100
     9    36    81   144   225
    16    64   144   256   400
    25   100   225   400   625
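
The result can be confirmed against the formulas, and one entry can be checked by simulation. The sketch below uses $\mu=\sigma=3$, which is the entry A(1,3); the sample size and seed are arbitrary.

```matlab
% normstat versus the formulas, plus a simulation check for one entry.
n = 1:5;
A = n'*n;
[M,V] = normstat(A,A);

disp(isequal(M,A));            % the mean equals mu
disp(isequal(V,A.^2));         % the variance equals sigma squared

rng(3);                        % arbitrary seed, for reproducibility
x = normrnd(3,3,1e5,1);        % sample with mu = 3 and sigma = 3
fprintf('sample mean = %.3f (theory 3), sample variance = %.3f (theory 9)\n', mean(x), var(x));
```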
