# Basic statistics pdf

These lecture notes have been used at the Basics of Statistics course held at the University of Tampere. Population and sample are two basic concepts of statistics. There are two main branches of statistics: descriptive and inferential. Descriptive statistics is used to say something about a set of information that has been collected.


## Sampling

When full census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples. Statistics itself also provides tools for prediction and forecasting through statistical models. The idea of making inferences based on sampled data began around the mid-1600s in connection with estimating populations and developing precursors of life insurance.

Representative sampling ensures that inferences and conclusions can safely extend from the sample to the population as a whole. A major problem lies in determining the extent to which the chosen sample is actually representative. Statistics offers methods to estimate and correct for any bias within the sample and data collection procedures. There are also methods of experimental design that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population. Sampling theory is part of the mathematical discipline of probability theory.
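A minimal sketch of simple random sampling, one common way to aim for a representative sample, using only Python's standard `random` and `statistics` modules. The population is synthetic, and the sizes (10,000 units, a sample of 200) and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random
import statistics

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: 10,000 synthetic measurements (not real data).
rng = random.Random(0)
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]

# Estimate the population mean from a sample of 200 units.
sample = simple_random_sample(population, 200, seed=1)
print(f"population mean: {statistics.mean(population):.2f}")
print(f"sample mean:     {statistics.mean(sample):.2f}")
```

With random selection, the sample mean should land close to the population mean; a biased selection rule (for example, taking only the first 200 units of a sorted list) generally would not.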

## Experiments

The basic steps of a statistical experiment are:

1. Planning the research, including finding the number of replicates of the study, using the following information: preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Statisticians recommend that experiments compare at least one new treatment with a standard treatment or control, to allow an unbiased estimate of the difference in treatment effects.
2. Design of experiments, using blocking to reduce the influence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At this stage, the experimenters and statisticians write the experimental protocol that will guide the performance of the experiment and which specifies the primary analysis of the experimental data.
3. Performing the experiment following the experimental protocol and analyzing the data following the experimental protocol.
4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.
5. Documenting and presenting the results of the study.
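The design step (blocking on a possible confounder, then randomizing treatments within each block) can be sketched as follows. `randomized_block_assignment` is a hypothetical helper written for this illustration, not a standard library API, and the subject IDs, ages, and age-group blocking rule are made up.

```python
import random
from collections import defaultdict

def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments within each block.

    subjects   -- list of subject identifiers
    block_of   -- function mapping a subject to its block label
    treatments -- list of treatment labels, e.g. ["control", "new"]
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)

    assignment = {}
    for members in blocks.values():
        # Cycle through the treatments so each block is as balanced as
        # possible, then shuffle so the assignment within the block is random.
        labels = [treatments[i % len(treatments)] for i in range(len(members))]
        rng.shuffle(labels)
        for subject, label in zip(members, labels):
            assignment[subject] = label
    return assignment

# Hypothetical example: block subjects by age group, a possible confounder.
ages = {"s1": 25, "s2": 31, "s3": 62, "s4": 58, "s5": 29, "s6": 70}
plan = randomized_block_assignment(
    list(ages),
    lambda s: "young" if ages[s] < 50 else "old",
    ["control", "new"],
    seed=42,
)
print(plan)
```

Because both age groups receive both treatments in near-equal numbers, age cannot fully explain a difference between the control and new-treatment groups.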

## Statistics Formulas

Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in determining whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected productivity.

It turned out that productivity indeed improved under the experimental conditions. However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a control group and blindness. The Hawthorne effect refers to the finding that an outcome (in this case, worker productivity) changed due to observation itself.

Those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.

In an observational study, by contrast, the researchers do not manipulate the conditions. An example is a study that explores the association between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a cohort study, and then look for the number of cases of lung cancer in each group.
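Counting cases in each group is usually followed by comparing the groups on a common scale. A minimal sketch, assuming the comparison is made with per-group proportions and their ratio (the relative risk, one common summary that the text itself does not name); all counts below are invented for illustration and are not real data.

```python
def incidence_proportion(cases, group_size):
    """Fraction of a cohort group that developed the outcome."""
    return cases / group_size

# Hypothetical cohort counts, invented for illustration only.
smokers = {"cases": 30, "size": 1000}
non_smokers = {"cases": 5, "size": 1000}

p_smokers = incidence_proportion(smokers["cases"], smokers["size"])
p_non_smokers = incidence_proportion(non_smokers["cases"], non_smokers["size"])

# Ratio of the two proportions (relative risk).
relative_risk = p_smokers / p_non_smokers
print(f"smokers: {p_smokers:.3f}, non-smokers: {p_non_smokers:.3f}, "
      f"relative risk: {relative_risk:.1f}")
```

Because the groups were observed rather than randomized, such a ratio describes an association; confounders may still differ between smokers and non-smokers.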

## Types of data

Various attempts have been made to produce a taxonomy of levels of measurement. The psychophysicist Stanley Smith Stevens defined nominal, ordinal, interval, and ratio scales.

- Nominal measurements do not have a meaningful rank order among values, and permit any one-to-one (injective) transformation.
- Ordinal measurements have imprecise differences between consecutive values but a meaningful order to those values, and permit any order-preserving transformation.
- Interval measurements have meaningful distances between measurements, but the zero value is arbitrary (as with longitude, and with temperature measured in Celsius or Fahrenheit), and permit any linear transformation.
- Ratio measurements have both a meaningful zero value and defined distances between measurements, and permit any rescaling transformation.

Because variables conforming only to nominal or ordinal measurements cannot reasonably be measured numerically, they are sometimes grouped together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative variables, which can be either discrete or continuous due to their numerical nature.
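The difference between interval and ratio scales can be checked numerically. This sketch uses the standard conversion F = 1.8 C + 32 as the linear transformation and metres to feet as a rescaling; the particular temperatures and lengths are arbitrary illustrative values.

```python
def celsius_to_fahrenheit(celsius):
    """Interval-scale conversion: a linear (affine) map, F = 1.8 * C + 32."""
    return 1.8 * celsius + 32.0

# Three temperatures in degrees Celsius (arbitrary illustrative values).
a, b, c = 10.0, 20.0, 40.0
fa, fb, fc = (celsius_to_fahrenheit(x) for x in (a, b, c))

# Ratios of raw values are not preserved, because the zero point is arbitrary:
# 20 C / 10 C gives 2.0, but the same temperatures in F give 68 / 50.
raw_ratio_c = b / a
raw_ratio_f = fb / fa

# Ratios of differences are preserved by any linear transformation.
diff_ratio_c = (c - b) / (b - a)
diff_ratio_f = (fc - fb) / (fb - fa)

# On a ratio scale (here, length), rescaling metres to feet keeps ratios.
metres = (2.0, 4.0)
feet = tuple(m / 0.3048 for m in metres)
length_ratio_m = metres[1] / metres[0]
length_ratio_ft = feet[1] / feet[0]

print(raw_ratio_c, raw_ratio_f)
print(diff_ratio_c, diff_ratio_f)
print(length_ratio_m, length_ratio_ft)
```

So "20 °C is twice as hot as 10 °C" is not a meaningful statement, while "4 m is twice as long as 2 m" holds in any unit of length.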

Such distinctions can often be loosely correlated with data types in computer science, in that dichotomous categorical variables may be represented with the Boolean data type, polytomous categorical variables with arbitrarily assigned integers in the integral data type, and continuous variables with the real data type involving floating-point computation.

The probability density function (pdf) represents the relative frequency of failure times as a function of time.

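The cumulative distribution function (cdf) is a function, $F(x)$, of a random variable $X$, and is defined for a number $x$ by:

$$F(x) = P(X \le x) = \int_{0}^{x} f(s)\,ds$$

That is, for a number $x$, $F(x)$ is the probability that the observed value of $X$ will be at most $x$.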
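The definition above is stated for a known distribution. When only observed failure times are available, a common substitute (the empirical cdf, not described in the text) estimates $F(x)$ as the fraction of observations at most $x$. A minimal sketch using the standard `bisect` module; the failure times are made-up numbers.

```python
from bisect import bisect_right

def empirical_cdf(observations):
    """Return a function x -> fraction of observations <= x.

    This is a sample estimate of F(x) = P(X <= x).
    """
    data = sorted(observations)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(data, x) / n

    return F

# Hypothetical times-to-failure in hours (made-up numbers for illustration).
F = empirical_cdf([120.0, 250.0, 310.0, 480.0, 600.0])
print(F(300.0))  # fraction of units that failed by 300 hours
```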

The cdf represents the cumulative values of the pdf. That is, the value of a point on the curve of the cdf represents the area under the curve to the left of that point on the pdf. In reliability, the cdf is used to measure the probability that the item in question will fail before the associated time value, $t$, and is also called unreliability.
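To see the "area to the left" interpretation concretely, the sketch below integrates a pdf numerically with the trapezoidal rule and compares the result with a closed-form cdf. The exponential distribution is not discussed in this section; it is assumed here only because its cdf, $1 - e^{-\lambda t}$, has a simple closed form. The failure rate and mission time are hypothetical.

```python
import math

def exponential_pdf(t, rate):
    """pdf of an exponential life distribution: f(t) = rate * exp(-rate * t)."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0

def cdf_by_area(pdf, t, steps=10_000):
    """Approximate F(t) as the area under pdf from 0 to t (trapezoidal rule)."""
    h = t / steps
    area = 0.5 * (pdf(0.0) + pdf(t))
    for i in range(1, steps):
        area += pdf(i * h)
    return area * h

rate = 0.002  # hypothetical failure rate, failures per hour
t = 500.0     # hypothetical mission time, hours

unreliability = cdf_by_area(lambda s: exponential_pdf(s, rate), t)
exact = 1.0 - math.exp(-rate * t)  # closed-form exponential cdf
print(f"numerical F(t) = {unreliability:.6f}, closed form = {exact:.6f}")
```

Both numbers are the probability that a unit fails before 500 hours, that is, the unreliability at that time.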

Note that depending on the density function, denoted by $f(x)$, the limits will vary based on the region over which the distribution is defined.

For example, for the life distributions considered in this reference, with the exception of the normal distribution, this range would be $[0, +\infty)$.

## Mathematical Relationship: pdf and cdf

The mathematical relationship between the pdf and cdf is given by:

$$F(x) = \int_{0}^{x} f(s)\,ds$$

where $s$ is a dummy integration variable.

Conversely:

$$f(x) = \frac{dF(x)}{dx}$$

The cdf is the area under the probability density function up to a value of $x$. The total area under the pdf is always equal to 1, or mathematically:

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1$$

The well-known normal or Gaussian distribution is an example of a probability density function.

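The pdf for this distribution is given by:

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}\right]$$

where $\mu$ is the mean and $\sigma$ is the standard deviation. The normal distribution has two parameters, $\mu$ and $\sigma$.

Another is the lognormal distribution, whose pdf is given by:

$$f(t) = \frac{1}{t\,\sigma'\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln t-\mu'}{\sigma'}\right)^{2}\right]$$

where $\mu'$ is the mean of the natural logarithms of the times-to-failure and $\sigma'$ is the standard deviation of the natural logarithms of the times-to-failure.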
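Both pdfs can be written directly from their formulas, and the rule that the total area under a pdf equals 1 gives a quick sanity check. A minimal sketch using only the `math` module; the parameter values are hypothetical, and the finite integration ranges are chosen wide enough that the tails left out are negligible.

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal (Gaussian) pdf with mean mu and standard deviation sigma."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def lognormal_pdf(t, mu_log, sigma_log):
    """Lognormal pdf; mu_log and sigma_log are the mean and standard
    deviation of ln(t). Zero for t <= 0."""
    if t <= 0:
        return 0.0
    # Equivalent to the formula above: the normal pdf of ln(t), divided by t.
    return normal_pdf(math.log(t), mu_log, sigma_log) / t

def total_area(pdf, lo, hi, steps=100_000):
    """Midpoint-rule estimate of the integral of pdf over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(pdf(lo + (i + 0.5) * h) for i in range(steps))

# Hypothetical parameters chosen only for illustration.
normal_area = total_area(lambda t: normal_pdf(t, 100.0, 15.0), 0.0, 200.0)
lognormal_area = total_area(lambda t: lognormal_pdf(t, 4.0, 0.5), 0.0, 1000.0)
print(normal_area, lognormal_area)  # both should be very close to 1
```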

Again, this is a two-parameter distribution.

## Reliability Function

The reliability function can be derived using the previous definition of the cumulative distribution function, $F(x) = \int_{0}^{x} f(s)\,ds$.

From our definition of the cdf, the probability of an event occurring by time $t$ is given by:

$$F(t) = \int_{0}^{t} f(s)\,ds$$

Or, one could equate this event to the probability of a unit failing by time $t$. Since this function defines the probability of failure by a certain time, we could consider this the unreliability function.

Subtracting this probability from 1 will give us the reliability function, one of the most important functions in life data analysis. The reliability function gives the probability of success of a unit undertaking a mission of a given time duration.

The following figure illustrates this. To show this mathematically, we first define the unreliability function, $Q(t)$, which is the probability of failure, or the probability that our time-to-failure is in the region between 0 and $t$. This is the same as the cdf. So, from the definition of the cdf:

$$Q(t) = F(t) = \int_{0}^{t} f(s)\,ds$$

Reliability and unreliability are the only two events being considered, and they are mutually exclusive; hence, the sum of their probabilities is equal to unity. Then:

$$Q(t) + R(t) = 1$$

$$R(t) = 1 - Q(t) = 1 - \int_{0}^{t} f(s)\,ds = \int_{t}^{\infty} f(s)\,ds$$

Conversely:

$$f(t) = -\frac{dR(t)}{dt}$$

## Conditional Reliability Function

Conditional reliability is the probability of successfully completing another mission following the successful completion of a previous mission.
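A minimal sketch of unreliability, reliability, and conditional reliability for a lognormal life distribution, assuming hypothetical parameters. Reliability is computed as 1 minus the unreliability (the cdf), as described above. The text defines conditional reliability only in words; the ratio used here, $R(T + t)/R(T)$, is the standard consequence of the definition of conditional probability (survive to $T + t$, given survival to $T$). The lognormal cdf is written with `math.erf`, using $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$.

```python
import math

def lognormal_cdf(t, mu_log, sigma_log):
    """Unreliability Q(t) = F(t) for a lognormal life distribution,
    via the standard normal cdf written with math.erf."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu_log) / sigma_log
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def reliability(t, mu_log, sigma_log):
    """R(t) = 1 - Q(t): survival and failure are the only two outcomes."""
    return 1.0 - lognormal_cdf(t, mu_log, sigma_log)

def conditional_reliability(t, T, mu_log, sigma_log):
    """Probability of surviving a further mission of length t,
    given survival up to age T: R(T + t) / R(T)."""
    return reliability(T + t, mu_log, sigma_log) / reliability(T, mu_log, sigma_log)

mu_log, sigma_log = 4.0, 0.5  # hypothetical parameters (ln of hours)
print(reliability(50.0, mu_log, sigma_log))
print(conditional_reliability(10.0, 50.0, mu_log, sigma_log))
```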