10 is a good sample size

The aim of a sample is to approximate the distribution of properties in the population. In principle, a statement about a sample is valid with one hundred percent certainty only for the sample itself; its approximate validity for the population is derived statistically. Since a sampling procedure aims to examine the smallest possible subset, the statistical requirements are very high. To ensure that the sample is representative of the population, the necessary sample size must be calculated statistically and the appropriate type of sample must be determined through organizational tests. Subjective "empirical values" do not help here. Therefore, the desired accuracy and certainty of the sample must first be agreed with the client. These assumptions and calculations essentially determine the size of the sample, i.e. whether 5, 10 or 20 percent, or possibly even larger proportions, of the population are to be examined. In addition, the standard deviation (see excursus) must be estimated; the assumed maximum and minimum processing times are used for this.
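The standard deviation can be estimated from the assumed minimum and maximum processing times by dividing the range by a distribution-dependent divisor. The following Python sketch takes the divisors 2, 4.2 and 5.15 from footnote [2]. For the uniform case it assumes √12 ≈ 3.46, which is the exact ratio between range and standard deviation of a uniform distribution; that divisor is not stated in this section. The function name `estimate_sd` and the example times of 10 and 40 minutes are hypothetical.

```python
import math

# Divisors that convert the range (max - min) into a standard deviation.
# 2, 4.2 and 5.15 are taken from footnote [2]; sqrt(12) for the uniform
# case is an assumption (exact value for a uniform distribution).
DIVISORS = {
    "uniform": math.sqrt(12),  # approx. 3.46
    "two-point": 2.0,
    "triangular": 4.2,
    "normal": 5.15,
}


def estimate_sd(min_time: float, max_time: float, distribution: str = "uniform") -> float:
    """Rule-of-thumb estimate of the standard deviation from min/max processing times."""
    if max_time < min_time:
        raise ValueError("max_time must not be smaller than min_time")
    return (max_time - min_time) / DIVISORS[distribution]


# Hypothetical example: processing times assumed to range from 10 to 40 minutes.
for dist in DIVISORS:
    print(f"{dist:>10}: {estimate_sd(10, 40, dist):.2f} min")
```

The choice of distribution matters: the same range yields a considerably smaller estimate under a normal-distribution assumption than under a two-point assumption.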

Determination of the desired accuracy:

The accuracy of a sample (also: confidence interval or sampling error) determines the maximum amount by which the value determined from a sample, e.g. the mean processing time, may deviate from the actual value of the population. An accuracy of 5 percent means that the value determined from the sample may lie up to 5 percent below or above the actual value of the population.

For example, if the actual but unknown mean processing time for a process step is 50 minutes, a sample accuracy of 5 percent means that the mean processing time determined from the sample may lie between 47.5 and 52.5 minutes.
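The arithmetic of this example can be written as a small Python helper; the name `accuracy_range` is hypothetical and the accuracy is treated as a relative deviation from the true mean, as in the text.

```python
def accuracy_range(true_mean: float, accuracy: float):
    """Return (low, high): the range a sample mean may fall into for a relative accuracy."""
    deviation = true_mean * accuracy  # e.g. 5 % of 50 minutes = 2.5 minutes
    return true_mean - deviation, true_mean + deviation


# The example from the text: 50 minutes and an accuracy of 5 percent.
low, high = accuracy_range(50, 0.05)
print(f"{low:.1f} to {high:.1f} minutes")  # prints "47.5 to 52.5 minutes"
```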

The higher the desired accuracy, the larger the sample size must be.
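Why a higher accuracy requires a larger sample can be seen from the basic sample-size term for a mean, n0 = (z · σ / e)², where e is the permitted absolute deviation. This is a common textbook form assumed here for illustration, not a formula quoted from this section. The sketch shows that halving the permitted deviation roughly quadruples the sample size; σ = 10 minutes is a hypothetical value.

```python
import math


def base_sample_size(sigma: float, max_deviation: float, z: float) -> float:
    """Sample size for estimating a mean, without finite population correction (assumed textbook formula)."""
    return (z * sigma / max_deviation) ** 2


sigma = 10.0  # hypothetical standard deviation in minutes
z = 1.96      # z-value for a 95 % confidence level
for deviation in (2.5, 1.25, 0.625):  # permitted deviation in minutes
    n0 = base_sample_size(sigma, deviation, z)
    print(f"deviation ±{deviation} min -> n0 = {math.ceil(n0)}")
```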

Determination of the desired certainty (confidence level):

The certainty of a sample indicates in how many cases the method used delivers reliable results. A certainty, or confidence level, of 95% means that out of 100 samples drawn in the same way, 95 deliver a value within the accuracy range around the actual value of the population; in other words, the value determined from the sample reflects the actual value of the population (within the accuracy range above) with a probability of 95 percent. The probability of error of the sampling procedure is therefore 5 percent.

For the accuracy example above, this means that the mean processing time determined from the sample lies between 47.5 and 52.5 minutes with a probability of 95 percent.
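The frequency interpretation of the confidence level can be illustrated with a small simulation. The Python sketch below builds a synthetic population of 700 processing times (hypothetical data, not survey results), computes a sample size with an assumed textbook formula, n = n0 / (1 + n0 / N) with n0 = (z · σ / e)², for a 95% confidence level and 5% accuracy, and then draws many samples without replacement. Roughly 95 percent of the sample means should fall within the accuracy range; the exact share varies with the random seed.

```python
import math
import random
import statistics


def required_sample_size(sd: float, max_deviation: float, z: float, population: int) -> int:
    """Assumed textbook formula with finite population correction, rounded up."""
    n0 = (z * sd / max_deviation) ** 2
    return math.ceil(n0 / (1 + n0 / population))


def simulate_coverage(trials: int = 2000, size: int = 700, rel_accuracy: float = 0.05,
                      z: float = 1.96, seed: int = 1) -> float:
    """Share of repeated samples whose mean lies within the accuracy range."""
    rng = random.Random(seed)
    # Hypothetical population of processing times in minutes (synthetic, not real data).
    population = [max(1.0, rng.gauss(25, 10)) for _ in range(size)]
    true_mean = statistics.fmean(population)
    max_deviation = rel_accuracy * true_mean
    n = required_sample_size(statistics.pstdev(population), max_deviation, z, size)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # draw without replacement
        if abs(statistics.fmean(sample) - true_mean) <= max_deviation:
            hits += 1
    return hits / trials


print(f"share of samples within the accuracy range: {simulate_coverage():.1%}")
```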

The higher the desired certainty, the larger the sample size must be. Statistically, the confidence level is expressed by the so-called z-value[1]. The standard normal distribution yields the following z-values, which are required to calculate the sample size:

Confidence level   50%     75%    95%    97.5%   99%
z-value            0.674   1.15   1.96   2.24    2.58

Example: z-values for the confidence levels
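The z-values in the table can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later) and computes the central (two-sided) value described in footnote [1], i.e. the z for which the probability between -z and +z equals the confidence level. For comparison it also prints the one-sided (left-hand) value for 95%, which is noticeably smaller. The function name `z_value` is hypothetical.

```python
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Central (two-sided) z-value: P(-z <= Z <= z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.50, 0.75, 0.95, 0.975, 0.99):
    print(f"{level:.1%}: z = {z_value(level):.3f}")

# For comparison: the one-sided (left-hand) value for 95 % would be much smaller.
print(f"one-sided 95.0%: z = {NormalDist().inv_cdf(0.95):.3f}")
```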

Summary of the procedure for calculating the sample size

Example: As part of a process-oriented PBE, the mean processing time is to be determined for a process or sub-process that is carried out a total of 700 times a year in an investigation area.

Since no surveys have been carried out on this so far, the mean processing time is estimated at around 25 minutes. The standard deviation of the processing time is estimated at 10 minutes. To keep the effort of the survey within limits, a partial survey (sample) is to be carried out. The value determined from the sample should deviate by no more than 5% from the actual value of the population.

Example of the procedure for calculating the sample size
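The example can be worked through in Python under two stated assumptions: the widely used textbook formula n0 = (z · σ / e)² with the finite population correction n = n0 / (1 + n0 / N), and a confidence level of 95% (z = 1.96), which the example above does not specify. The procedure summarized in this section may use a slightly different form, so results can differ by a case or two. The function name `required_sample_size` is hypothetical.

```python
import math


def required_sample_size(population: int, mean: float, sd: float,
                         rel_accuracy: float, z: float) -> int:
    """Sample size for estimating a mean with finite population correction (assumed textbook formula)."""
    max_deviation = rel_accuracy * mean  # absolute accuracy in minutes (5 % of 25 = 1.25)
    n0 = (z * sd / max_deviation) ** 2   # size for an "infinite" population
    n = n0 / (1 + n0 / population)       # finite population correction
    return math.ceil(n)                  # always round up


# Values from the example; the 95 % confidence level (z = 1.96) is an assumption.
n = required_sample_size(population=700, mean=25, sd=10, rel_accuracy=0.05, z=1.96)
print(n, f"({n / 700:.0%} of all cases)")  # prints "182 (26% of all cases)"
```

Without the finite population correction the result would be 246 cases. Because the population of 700 cases is small, the correction reduces the required sample noticeably; it still amounts to about a quarter of all cases, well above the 5 to 20 percent often assumed.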

Determining a mean is the usual case when the random sampling procedure is used in a PBE. The random sampling method can also be used to determine a proportion, for example the share that a certain workflow has in the total number of workflows in the investigation area. A corresponding example can be found in Chapter 6.1.6 - Multi snapshot.
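For a proportion, a commonly used textbook formula replaces σ² by p · (1 − p), where p is the expected share and e is the permitted absolute deviation in percentage points. The Python sketch below assumes that formula with the same finite population correction as for the mean; it is not the procedure from Chapter 6.1.6. The function name and the inputs (700 cases, ±5 percentage points, 95% confidence) are hypothetical. If the share is unknown, p = 0.5 is the cautious choice because it maximizes p · (1 − p).

```python
import math


def sample_size_for_proportion(population: int, expected_share: float,
                               max_deviation: float, z: float) -> int:
    """Sample size for estimating a share, with finite population correction.

    Assumed textbook formula n0 = z^2 * p * (1 - p) / e^2, where e is an
    absolute deviation in percentage points (e.g. 0.05 = +/- 5 points).
    """
    p = expected_share
    n0 = z ** 2 * p * (1 - p) / max_deviation ** 2
    return math.ceil(n0 / (1 + n0 / population))


# Hypothetical: share unknown, so the most cautious assumption p = 0.5 is used.
print(sample_size_for_proportion(population=700, expected_share=0.5,
                                 max_deviation=0.05, z=1.96))
```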

Footnotes

[1] The z-value is a mathematically determined constant for the respective confidence level. Specifically, it refers to the central probability $D(z_1)$ (or central statement probability) of the standard normal distribution, not merely the left-sided probability $\Phi(z_1)$; the two are related by $D(z_1) = 2\Phi(z_1) - 1$.
[2] This rule of thumb assumes a uniform distribution of the individual values between the minimum and the maximum value. Under other assumptions, the denominator used to estimate the standard deviation would be 2 for a two-point distribution, 4.2 for a triangular distribution and 5.15 for a normal distribution. See Lange (2004): p. 29.
[3] For the minimum sample size, see the excursus.
