# MATLAB The Language of Technical Computing

Statistics application: Basic Descriptive Statistics

#### Haifeng Xu
#### (hfxu@yzu.edu.cn)

# What Is the Statistics Toolbox?

The Statistics Toolbox is a collection of tools built on the MATLAB® numeric computing environment. The toolbox supports a wide range of common statistical tasks, from random number generation, to curve fitting, to design of experiments and statistical process control.

# Primary Topic Areas

• Probability distributions
• Descriptive statistics
• Cluster analysis
• Linear models
• Nonlinear models
• Hypothesis tests
• Multivariate statistics
• Statistical plots
• Statistical process control
• Design of experiments

# Project 1

## The calculation of the probability density and the stochastic simulation

The probability density function (pdf) has a different meaning depending on whether the distribution is discrete or continuous.

For discrete distributions, the pdf is the probability of observing a particular outcome.
Suppose the random variable $X$ takes values $x_k\ (k=1,2,\ldots)$, and the probability of each value is $P\{X=x_k\}=p_k\ (k=1,2,\ldots)$. Then we have

$p_k\geq 0,\quad k=1,2,\ldots$

$\sum_{k=1}^{+\infty}p_k=1$

Unlike discrete distributions, the pdf of a continuous distribution at a value is not the probability of observing that value. For continuous distributions the probability of observing any particular value is zero. To get probabilities you must integrate the pdf over an interval of interest.

A pdf $f(x)$ with distribution function $F(x)$ has the following theoretical properties:

• $f(x)\geq 0$
• $\displaystyle\int_{-\infty}^{+\infty}f(x)\,dx=1$
• $\displaystyle P\{x_1\leq X\leq x_2\}=F(x_2)-F(x_1)=\int_{x_1}^{x_2}f(x)\,dx$
• If $f(x)$ is continuous at the point $x$, then $F'(x)=f(x)$.

# Binomial Distribution

## Background of the Binomial Distribution

The binomial distribution models the total number of successes in repeated trials from an infinite population under the following conditions:

• Only two outcomes are possible on each of n trials.
• The probability of success for each trial is constant.
• All trials are independent of each other.

James Bernoulli derived the binomial distribution in 1713 (Ars Conjectandi). Earlier, Blaise Pascal had considered the special case where p = 1/2.

## Definition of the Binomial Distribution

The binomial pdf is

$f(x)=\begin{cases} C_n^x p^x(1-p)^{n-x}& x=0,1,\ldots,n\\ 0& \text{otherwise} \end{cases}$

where

$C_n^x=\frac{n!}{x!(n-x)!}$

## Using binopdf to compute the pdf of the binomial distribution

binopdf(k,n,p) % equivalent to pdf('bino',k,n,p)

where $k$ is the number of successes, $n$ is the total number of trials, and $p$ is the probability of success on each trial.

## Example of Binomial Distribution

A large batch of electronic tubes contains 10% damaged tubes. We select 20 tubes at random to form a circuit. Find the probability that the circuit works properly (i.e., all 20 selected tubes are good).
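As a hand check of this example, the event "all 20 tubes are good" corresponds to $x=20$ successes in $n=20$ trials with success probability $p=0.9$, so the binomial pdf defined above reduces to a single term:

$f(20)=C_{20}^{20}\,0.9^{20}\,(1-0.9)^{0}=0.9^{20}\approx 0.1216$

This treats each selected tube as an independent trial with the same success probability, which is the assumption behind using the binomial model for a large batch.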
>> binopdf(20,20,0.9)
ans =
    0.1216

## Another example from doc binopdf

A Quality Assurance inspector tests 200 circuit boards a day. If 2% of the boards have defects, what is the probability that the inspector will find no defective boards on any given day?

>> binopdf(0,200,0.02)
ans =
    0.0176

What is the most likely number of defective boards the inspector will find?

>> defects=0:200;
>> y=binopdf(defects,200,.02)
y =
  Columns 1 through 9
    0.0176    0.0718    0.1458    0.1963    0.1973    0.1579    0.1047    0.0592    0.0292
  Columns 10 through 18
    0.0127    0.0049    0.0017    0.0006    0.0002    0.0000    0.0000    0.0000    0.0000
  Columns 19 through 27
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 28 through 36
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 37 through 45
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 46 through 54
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 55 through 63
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 64 through 72
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 73 through 81
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 82 through 90
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 91 through 99
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 100 through 108
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 109 through 117
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 118 through 126
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 127 through 135
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 136 through 144
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 145 through 153
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 154 through 162
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 163 through 171
    0.0000
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 172 through 180
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 181 through 189
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
  Columns 190 through 198
    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000         0         0
  Columns 199 through 201
         0         0         0
>> [x,i]=max(y)
x =
    0.1973
i =
     5
>> defects(i)
ans =
     4

# Example and Plot of the Binomial Distribution

The following commands generate a plot of the binomial pdf for n = 10 and p = 1/2.

>> x=0:10;
>> y=binopdf(x,10,0.5)
y =
  Columns 1 through 9
    0.0010    0.0098    0.0439    0.1172    0.2051    0.2461    0.2051    0.1172    0.0439
  Columns 10 through 11
    0.0098    0.0010
>> plot(x,y,'+')

# Poisson Distribution

## Background of the Poisson Distribution

The Poisson distribution is appropriate for applications that involve counting the number of times a random event occurs in a given amount of time, distance, area, etc.

## Definition of the Poisson Distribution

The Poisson pdf is

$f(x)=\begin{cases} \frac{\lambda^x e^{-\lambda}}{x!}& x=0,1,2,\ldots\\ 0 & \text{otherwise} \end{cases}$

# poisspdf

## Poisson probability density function

Y = poisspdf(k,$\lambda$) % equivalent to pdf('poiss',k,$\lambda$)

where $k$ is the number of times a random event occurs within some interval, and $\lambda$ is the mean number of occurrences in that interval. $k$ and $\lambda$ can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in $\lambda$ must all be positive.

# Example 2

An airline booking office receives 36 calls per hour on average. Find the probability of receiving exactly two calls within 5 minutes.

>> poisspdf(2,3) % the office receives 36/(60/5) = 3 calls every 5 minutes on average
ans =
    0.2240

# Exponential Distribution

## Background of the Exponential Distribution

Like the chi-square distribution, the exponential distribution is a special case of the gamma distribution (obtained by setting a = 1).

## Definition of the Exponential Distribution

If the pdf of a random variable $X$ is

$f(x)=\begin{cases} \lambda e^{-\lambda x}, & x > 0\\ 0, & \text{otherwise}, \end{cases}$

where $\lambda > 0$, then $X$ is exponentially distributed. Equivalently, setting $\lambda=\frac{1}{\mu}$ gives

$f(x)=\begin{cases} \frac{1}{\mu}e^{-\frac{x}{\mu}}, & x > 0\\ 0, & \text{otherwise}, \end{cases}$

where $\mu > 0$ is the mean. The distribution function is

$F(x)=\begin{cases} 1-e^{-\frac{x}{\mu}},& x > 0 \\ 0, & \text{otherwise}. \end{cases}$

# exppdf

## Exponential probability density function

Y = exppdf(X,$\mu$) returns the pdf of the exponential distribution with mean parameter $\mu$, evaluated at the values in $X$. $X$ and $\mu$ can be vectors, matrices, or multidimensional arrays that have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in $\mu$ must be positive. That is,

$\text{exppdf}(x,\mu)=\begin{cases} \frac{1}{\mu}e^{-\frac{x}{\mu}}, & x > 0\\ 0, & \text{otherwise}, \end{cases}$

# Example

The lifetime $X$ of a certain kind of electronic component is exponentially distributed with parameter $\mu=1000$. Consider three such components. Find the probability that at least one of them has been damaged by time 1000.

## Answer

Since the distribution function is

$F(x)=\begin{cases} 1-e^{-\frac{x}{\mu}},& x > 0 \\ 0, & \text{otherwise}, \end{cases}$

the probability that a single component survives beyond time 1000 is

$P\{X > 1000\}=1-P\{X\leq 1000\}=1-F(1000)=e^{-1}$

The lifetimes of these electronic components are independent. So if $Y$ denotes the number of components that are damaged by time 1000, then $Y\sim b(3, 1-e^{-1})$. Thus,

$P\{Y\geq 1\}=1-P\{Y=0\}=1-C_3^0(1-e^{-1})^0 (e^{-1})^3=1-e^{-3}.$

## Another example

The lifetime $X$ of a certain kind of electronic component is exponentially distributed.
The parameter is $\mu=50$ (its mean lifetime is 50 hours). Find the probability density of failure at exactly 25 hours, and the probability that the component breaks down during the first 25 hours.

>> exppdf(25,50) % probability density of breakdown at 25 hours (a density, not a probability)
ans =
    0.0121
>> expcdf(25,50) % probability of breakdown during the first 25 hours
ans =
    0.3935

# Normal Distribution

## Background of the Normal Distribution

The normal distribution is a two-parameter family of curves. The first parameter, $\mu$, is the mean. The second, $\sigma$, is the standard deviation. The standard normal distribution (written $\Phi(x)$) sets $\mu$ to 0 and $\sigma$ to 1.

The usual justification for using the normal distribution for modeling is the Central Limit Theorem, which states (roughly) that the suitably normalized sum of independent samples from any distribution with finite mean and variance converges to the normal distribution as the sample size goes to infinity.

## Definition of the Normal Distribution

$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Find the probability density at the point $0.6578$ of a random variable $X$ that follows the standard normal distribution N(0,1).

>> normpdf(0.6578,0,1)
ans =
    0.3213

# Uniformly distributed (pseudo)random numbers

To generate uniformly distributed random numbers in the interval $(0,1)$, we can use the command rand(m,n); try help rand or doc rand.

We can also use unifrnd(a,b,m,n) to get continuous uniform random numbers on the interval from a to b.

>> R=rand(3,4)
R =
    0.6324    0.5469    0.1576    0.4854
    0.0975    0.9575    0.9706    0.8003
    0.2785    0.9649    0.9572    0.1419
>> R=unifrnd(5,15,3,4)
R =
    9.2176   14.5949   13.4913   12.5774
   14.1574   11.5574   14.3399   12.4313
   12.9221    5.3571   11.7874    8.9223

>> help unifrnd
 unifrnd Random arrays from continuous uniform distribution.
    R = unifrnd(A,B) returns an array of random numbers chosen from the continuous uniform distribution on the interval from A to B.
    The size of R is the common size of A and B if both are arrays.
    If either parameter is a scalar, the size of R is the size of the other parameter.

# Binomial random number

Try the command binornd.

# Functions for generating random numbers

Try the following commands.

| Function | Usage |
|---|---|
| unifrnd | unifrnd(A,B,m,n) |
| unidrnd | unidrnd(N,m,n) |
| exprnd | exprnd($\mu$,m,n) |
| normrnd | normrnd($\mu$,$\sigma$,m,n) |
| chi2rnd | chi2rnd(N,m,n) |
| trnd | trnd(N,m,n) |
| frnd | frnd($N_1$,$N_2$,m,n) |
| gamrnd | gamrnd(A,B,m,n) |
| betarnd | betarnd(A,B,m,n) |
| lognrnd | lognrnd($\mu$,$\sigma$,m,n) |
| nbinrnd | nbinrnd(R,P,m,n) |
| ncfrnd | ncfrnd($N_1$,$N_2$,$\delta$,m,n) |
| nctrnd | nctrnd(N,$\delta$,m,n) |
| raylrnd | raylrnd(B,m,n) |
| weibrnd | weibrnd(A,B,m,n) |
| binornd | binornd(N,P,m,n) |
| geornd | geornd(P,m,n) |
| hygernd | hygernd(M,K,N,m,n) |
| poissrnd | poissrnd($\lambda$,m,n) |

# random

random generates random arrays from a specified distribution.

R = random(NAME,A) returns an array of random numbers chosen from the one-parameter probability distribution specified by NAME with parameter values A.

R = random(NAME,A,B) or R = random(NAME,A,B,C) returns an array of random numbers chosen from a two- or three-parameter probability distribution with parameter values A, B (and C).

>> random('norm',2,0.3,3,4)
ans =
    2.0882    1.6559    1.1167    1.7735
    1.7638    1.6793    2.4315    2.4111
    2.2665    1.7572    2.0976    1.4865
>> help random % try it

# Project 2

## Numerical characteristics of random variables

# Sort

For vectors, sort(X) sorts the elements of X in ascending order. For matrices, sort(X) sorts each column of X in ascending order. [Y,I] = sort(X) returns the sorted result Y together with an index matrix I that records, for each sorted element, its original position in X.
>> A=[1 2 3; 4 5 2; 3 7 0]
A =
     1     2     3
     4     5     2
     3     7     0
>> sort(A)
ans =
     1     2     0
     3     5     2
     4     7     3
>> [Y,I]=sort(A)
Y =
     1     2     0
     3     5     2
     4     7     3
I =
     1     1     3
     3     2     2
     2     3     1

# sortrows

>> A=[1 2 3; 4 5 2; 3 7 0]
A =
     1     2     3
     4     5     2
     3     7     0
>> sortrows(A)
ans =
     1     2     3
     3     7     0
     4     5     2
>> sortrows(A,1)
ans =
     1     2     3
     3     7     0
     4     5     2
>> sortrows(A,3)
ans =
     3     7     0
     4     5     2
     1     2     3
>> sortrows(A,[3 2])
ans =
     3     7     0
     4     5     2
     1     2     3
>> sortrows(A,[2 3])
ans =
     1     2     3
     4     5     2
     3     7     0

# Mean

## The mean of the data

>> data = [1 2 3 4 50];
>> mean(data)
ans =
    12
>> x=[174.5 165 180.6 174.5 179 163 175.3 190 174 177.9]
x =
  Columns 1 through 9
  174.5000  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000
  Column 10
  177.9000
>> mean(x)
ans =
  175.3800

# Median

## The median of the data

>> data=[1 2 3 4 5]
data =
     1     2     3     4     5
>> median(data)
ans =
     3
>> data=[1 2 3 4 5 6]
data =
     1     2     3     4     5     6
>> median(data)
ans =
    3.5000
>> data=[3 2 3 4 9 19]
data =
     3     2     3     4     9    19
>> median(data)
ans =
    3.5000

# Sample variance

$\text{var}(X)=s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{X})^2.$

>> X=[165 180.6 174.5 179 163 175.3 190 174 177.9 160]
X =
  Columns 1 through 9
  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
  Column 10
  160.0000
>> var(X)
ans =
   82.1846

# The standard deviation

The square root of the variance is called the standard deviation.
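In symbols, using the same notation as the sample variance above, the standard deviation can be written as

$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{X})^2}.$

MATLAB's var(X,1) and std(X,1) normalize by $n$ instead of $n-1$. Writing $s_n$ for that version (a symbol introduced here only for illustration), the two are related by

$s_n^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{X})^2=\frac{n-1}{n}\,s^2.$

For the 10-element sample X above, with var(X) = 82.1846, this gives $s_n^2\approx\frac{9}{10}\times 82.1846\approx 73.9661$.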
>> X=[165 180.6 174.5 179 163 175.3 190 174 177.9 160]
X =
  Columns 1 through 9
  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
  Column 10
  160.0000
>> DX=var(X,1) % the second argument 1 normalizes by n instead of n-1
DX =
   73.9661
>> sig=std(X,1)
sig =
    8.6004
>> isequal(sig^2,DX)
ans =
     0
>> sig^2
ans =
   73.9661
>> DX1=var(X)
DX1 =
   82.1846
>> sig1=std(X)
sig1 =
    9.0656
>> sig1^2
ans =
   82.1846
>> isequal(sig1^2,DX1)
ans =
     1
>> isequal(sig^2,DX)
ans =
     0
>> % sig^2 and DX differ only in the last digits (floating-point rounding), as format long shows
>> format long
>> sig^2
ans =
  73.966100000000026
>> DX
DX =
  73.966100000000012
>> sig1^2
ans =
  82.184555555555562
>> DX1
DX1 =
  82.184555555555562

# Geometric mean

$M=(\prod_{i=1}^{n}x_i)^{\frac{1}{n}}$

>> A=[1 2 3 4]
A =
     1     2     3     4
>> M=geomean(A)
M =
    2.2134
>> B=[1 2 3 4;2 3 4 9; 2 9 0 5]
B =
     1     2     3     4
     2     3     4     9
     2     9     0     5
>> M2=geomean(B)
M2 =
    1.5874    3.7798         0    5.6462
>> % Let's test it on the first column of B
>> T=[1 2 2]
T =
     1     2     2
>> geomean(T)
ans =
    1.5874

# Harmonic mean

$M=\frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}}$

For positive data, the arithmetic mean is greater than or equal to the harmonic mean.

>> A=[1 2 4 6; 3 4 5 7; 8 9 6 0; 4 6 8 1]
A =
     1     2     4     6
     3     4     5     7
     8     9     6     0
     4     6     8     1
>> M1=harmmean(A)
M1 =
    2.3415    3.8919    5.3933         0
>> Average=mean(A)
Average =
    4.0000    5.2500    5.7500    3.5000

# range

range(X) returns the difference between the maximum and the minimum of a sample. For a matrix, it works on each column.

>> A=[1 2 3; 2 8 9; 3 6 2]
A =
     1     2     3
     2     8     9
     3     6     2
>> Y=range(A)
Y =
     2     6     7

# Project 2

## The calculation of the probability density and the stochastic simulation

# Expectation and variance of uniform distribution

[M,V] = unifstat(A,B) returns the mean and variance of the continuous uniform distribution with lower endpoint (minimum) A and upper endpoint (maximum) B. Vector or matrix inputs for A and B must have the same size, which is also the size of M and V.
A scalar input for A or B is expanded to a constant matrix with the same dimensions as the other input.

The mean of the continuous uniform distribution with parameters a and b is $(a + b)/2$, and the variance is $(b-a)^2/12$.

>> a=1:6
a =
     1     2     3     4     5     6
>> b=2.*a
b =
     2     4     6     8    10    12
>> [M,V]=unifstat(a,b)
M =
    1.5000    3.0000    4.5000    6.0000    7.5000    9.0000
V =
    0.0833    0.3333    0.7500    1.3333    2.0833    3.0000
>> 1/12
ans =
    0.0833

# Normal mean and variance

[M,V] = normstat($\mu$,$\sigma$) returns the mean and variance of the normal distribution with mean $\mu$ and standard deviation $\sigma$. $\mu$ and $\sigma$ can be vectors, matrices, or multidimensional arrays that all have the same size, which is also the size of M and V. A scalar input for $\mu$ or $\sigma$ is expanded to a constant array with the same dimensions as the other input.

The mean of the normal distribution with parameters $\mu$ and $\sigma$ is $\mu$, and the variance is $\sigma^2$.

>> n=1:5;
>> A=n'*n
A =
     1     2     3     4     5
     2     4     6     8    10
     3     6     9    12    15
     4     8    12    16    20
     5    10    15    20    25
>> [M,V]=normstat(A,A)
M =
     1     2     3     4     5
     2     4     6     8    10
     3     6     9    12    15
     4     8    12    16    20
     5    10    15    20    25
V =
     1     4     9    16    25
     4    16    36    64   100
     9    36    81   144   225
    16    64   144   256   400
    25   100   225   400   625
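The theoretical values returned by normstat can be compared against simulated data, which ties this slide back to the random number generators introduced earlier. The following is a minimal sketch, assuming the Statistics Toolbox is installed; the parameter values mu = 3 and sigma = 2 and the sample size are chosen here purely for illustration and do not come from the slides.

```matlab
% Sketch only: assumes the Statistics Toolbox functions normstat and normrnd are available.
mu = 3;        % illustrative mean (not from the slides)
sigma = 2;     % illustrative standard deviation (not from the slides)

% Theoretical moments: M equals mu and V equals sigma^2.
[M, V] = normstat(mu, sigma);

% Simulate draws and compute the sample moments for comparison.
r = normrnd(mu, sigma, 1e5, 1);   % 100000-by-1 column of N(mu, sigma^2) draws
fprintf('theory: mean = %g, variance = %g\n', M, V);
fprintf('sample: mean = %.3f, variance = %.3f\n', mean(r), var(r));
```

The sample values change from run to run, but with this many draws they should land close to the theoretical ones. The same pattern should carry over to unifstat with unifrnd.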


# End

Thanks very much!

You are welcome to visit my site: www.atzjg.net