When you measure something and collect data, what you get is called a sample from the entire population. For example, suppose you run a questionnaire survey about the marketing of a product. You ask many questions, in particular whether people would like to buy this new product. If you could distribute your questionnaire to all citizens of the city, and all of them responded to your questions, you would have the population. In reality, it is not practical to distribute a questionnaire to millions of people. What you can do is take a random sample, say a hundred or a thousand people, from the population and assume that your sample is a valid representation of the entire population. From your sample, you can estimate the true value in the population.
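The survey idea above can be sketched as a small simulation. Everything here is an illustrative assumption and not data from the tutorial: the population size, the true share of buyers (30%), the sample size, and the function name `simulate_survey`. The point is only that the share computed from a random sample tends to land close to the share in the whole population.

```python
import random

def simulate_survey(population_size=100_000, true_share=0.30,
                    sample_size=1_000, seed=0):
    """Compare the share of buyers in a population and in a random sample."""
    rng = random.Random(seed)
    # Hypothetical population: True means the person would buy the product.
    population = [rng.random() < true_share for _ in range(population_size)]
    # Simple random sample, drawn without replacement.
    sample = rng.sample(population, sample_size)
    population_share = sum(population) / population_size
    sample_share = sum(sample) / sample_size
    return population_share, sample_share

population_share, sample_share = simulate_survey()
print(f"share in population: {population_share:.3f}")
print(f"share in sample:     {sample_share:.3f}")
```

In a real survey the population share is unknown. The simulation computes it only so that the estimate from the sample can be compared against it.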
This estimation procedure is called inferential statistics. From your sample, you want to know in particular its distribution. Because your sample represents the population, the distribution of the sample also characterizes the population. From the distribution we can get:
- Statistics to estimate the properties of the sample (and therefore of the population)
- Confidence intervals
- Hypothesis testing
The statistical properties can be any quantities such as the sum, central tendency (mean, median, mode), variation (interquartile range, variance, range, standard deviation), or some ratio (coefficient of variation, t statistic), etc. A confidence interval is a range in which the value of a statistical property most probably lies. For example, you may say that you are confident that the mean of your data lies between 37.5 and 38.5, with a small possible degree of error due to random chance. From the distribution, you can also test a hypothesis, such as whether the mean of the sample is equal to or less than a certain value, or whether two samples are taken from the same population.
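As a rough illustration of the statistics and the confidence interval described above, here is a minimal sketch using only the Python standard library. The ten measurements and the helper names `describe` and `mean_confidence_interval` are made up for the example. The interval relies on a Normal approximation, which is itself an assumption and is crude for a sample this small.

```python
import math
import statistics

def describe(data):
    """Return a few common sample statistics as a dictionary."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation (n - 1)
    return {
        "sum": sum(data),
        "mean": mean,
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": statistics.variance(data),
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
    }

def mean_confidence_interval(data, confidence=0.95):
    """Confidence interval for the mean, assuming a Normal approximation."""
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(len(data))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical measurements, invented for this example.
data = [37.8, 38.1, 38.0, 37.9, 38.3, 38.2, 37.7, 38.0, 38.1, 37.9]
summary = describe(data)
low, high = mean_confidence_interval(data)
print(summary)
print(f"95% interval for the mean: {low:.2f} to {high:.2f}")
```

With so few observations, a t-based interval would normally be wider than this one. The sketch only shows where the Normal assumption enters the calculation.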
Problems arise when you do not know the distribution of the population. If your sample is so small that you cannot even fit it to a theoretical distribution, most people simply assume that the population follows a Normal distribution. However, this assumption may not be correct. Can we estimate the distribution from a sample when the population distribution is unknown?
The answer is yes. Several nonparametric methods exist, including the permutation test, rank (Wilcoxon) tests, and the bootstrap. The bootstrap method has an additional benefit: you can go one step beyond estimating the distribution of the sample and obtain the distribution of your estimator.
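To make the last point concrete, the following is a minimal sketch of the bootstrap idea, not necessarily the exact procedure used in the reference. It resamples the observed data with replacement many times, recomputes the estimator (here the median) on each resample, and treats the collected values as an approximation of the estimator's distribution. The data, the number of resamples, and the function names `bootstrap_distribution` and `percentile_interval` are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_distribution(data, estimator, n_resamples=2000, seed=0):
    """Recompute `estimator` on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    return [estimator(rng.choices(data, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, confidence=0.95):
    """Simple percentile interval read off the sorted bootstrap values."""
    ordered = sorted(values)
    alpha = (1 - confidence) / 2
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1 - alpha) * len(ordered)) - 1]
    return lower, upper

# Hypothetical small sample; no distribution is assumed for the population.
data = [37.8, 38.1, 38.0, 37.9, 38.3, 38.2, 37.7, 38.0, 38.1, 37.9]
medians = bootstrap_distribution(data, statistics.median)
lower, upper = percentile_interval(medians)
print(f"bootstrap standard error of the median: {statistics.stdev(medians):.3f}")
print(f"95% percentile interval for the median: {lower:.2f} to {upper:.2f}")
```

No Normal assumption is used anywhere in this sketch. The spread of `medians` comes entirely from resampling the observed data, and any other estimator can be passed in place of `statistics.median`.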
The preferred reference for this tutorial is:
Teknomo, Kardi. Bootstrap Sampling Tutorial. http://people.revoledu.com/kard/tutorial/bootstrap/