What is a class interval in statistics

Classified variable


For practical reasons, it is often useful to classify variables that take a large number of values, i.e. to group several values into one value range (classified variable). This can be done either during data collection or during the later statistical analysis.

The choice of classes and their limits depends largely on substantive criteria. Formally, a classification must meet three requirements: unambiguity, exclusivity and completeness. A classification is unambiguous if every empirically occurring value can be assigned to a class. It is exclusive if each value falls into only one class and not into several. Finally, it is complete when both of the previous conditions apply, i.e. there is no value that cannot be assigned to any class. In particular, the class boundaries must be defined so that it is clear which class receives a value that falls exactly on a class boundary.
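
The formal requirements can be checked mechanically. The following is a minimal sketch, not a prescribed procedure: the class limits, the function names and the convention that each class is read as lower limit included, upper limit excluded are all assumptions made for illustration.

```python
# Hypothetical classes, each given as (lower, upper) and read as the
# half-open interval lower <= x < upper. The limits are illustrative only.
CLASSES = [(0, 1000), (1000, 2000), (2000, 3000)]


def matching_classes(value, classes):
    """Return the indices of all classes whose interval contains value."""
    return [i for i, (lower, upper) in enumerate(classes) if lower <= value < upper]


def check_classification(values, classes):
    """Split the values into those without a class and those with several."""
    unassigned = [v for v in values if len(matching_classes(v, classes)) == 0]
    ambiguous = [v for v in values if len(matching_classes(v, classes)) > 1]
    return unassigned, ambiguous


values = [0, 999.99, 1000, 2000, 2999]
unassigned, ambiguous = check_classification(values, CLASSES)
print(unassigned, ambiguous)  # [] []

# A value exactly on a boundary belongs to the class that starts there.
print(matching_classes(2000, CLASSES))  # [2]

# With overlapping limits the same value would fall into two classes.
overlapping = [(0, 1000), (900, 2000)]
print(check_classification([950], overlapping))  # ([], [950])
```

An empty first list means that every value found a class, and an empty second list means that no value fell into more than one class.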

From a statistical point of view, it is advantageous if the class widths are equal and open classes are avoided. However, these requirements cannot always be met, for the following reasons:

  • The class width should be chosen large enough that individual classes are not too sparsely populated. To ensure this, it may be necessary in some cases to make individual classes wider than the chosen class width. Conversely, if many values fall into one class, it may be necessary to narrow certain classes (see the income class 1,800-2,000 DM in the example).
  • Sometimes it is also unknown which maximum or minimum value the examined characteristic can take. In this case, the top class must remain open at the top, or the bottom class open at the bottom (cf. the income class 7,500 DM and more in the example).
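
Both situations, unequal class widths and an open top class, can be handled in the same way when counting class frequencies. The sketch below is only an illustration: apart from the class 1,800-2,000 DM and the open class "7,500 DM and more", which are mentioned above, the class limits and the income values are invented, and the classes are assumed to include their lower limit and exclude their upper limit.

```python
import math
from bisect import bisect_right

# Hypothetical lower limits in DM; only the 1,800-2,000 class and the open
# class "7,500 and more" are taken from the text, the rest is illustrative.
LOWER_LIMITS = [0, 1000, 1800, 2000, 3000, 5000, 7500]
UPPER_LIMITS = LOWER_LIMITS[1:] + [math.inf]  # the top class stays open


def class_index(value):
    """Index of the class with lower limit <= value < upper limit."""
    if value < LOWER_LIMITS[0]:
        raise ValueError(f"{value} lies below the lowest class")
    return bisect_right(LOWER_LIMITS, value) - 1


def class_counts(values):
    """Count how many values fall into each class."""
    counts = [0] * len(LOWER_LIMITS)
    for v in values:
        counts[class_index(v)] += 1
    return counts


incomes = [950, 1850, 1900, 1999, 2000, 4200, 7500, 12000]  # made-up data
for lower, upper, n in zip(LOWER_LIMITS, UPPER_LIMITS, class_counts(incomes)):
    label = f"{lower} and more" if math.isinf(upper) else f"{lower} to under {upper}"
    print(f"{label}: {n}")
```

Representing the missing upper limit by infinity keeps the open class in the same data structure as the closed ones, so no special case is needed when assigning values.
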
Classification always entails a loss of information, so the original values should be used where possible, unless there are substantive reasons for distinguishing between different classes of variable values.

How the classified variable can be treated in the subsequent statistical analysis also depends on these substantive decisions. In the second example (phases of life), the individual classes are viewed as distinct categories that, at most, have a rank order. In the first example (average age), one should really go back to the original values. If these are not available, the class midpoint is used as an estimate for the original values combined in the class.
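
The midpoint rule can be illustrated with a short calculation. This is a sketch under assumptions: the age classes, the frequencies and the function names are invented, and each class is taken to run from its lower limit up to, but not including, its upper limit.

```python
# Hypothetical age classes (lower <= age < upper) with made-up frequencies.
age_classes = [(20, 30), (30, 40), (40, 50)]
frequencies = [4, 10, 6]


def class_midpoint(lower, upper):
    """Middle of the class, used in place of the unknown original values."""
    return (lower + upper) / 2


def grouped_mean(classes, counts):
    """Estimate the mean from class midpoints weighted by class frequency."""
    total = sum(counts)
    weighted = sum(class_midpoint(lo, hi) * n for (lo, hi), n in zip(classes, counts))
    return weighted / total


print(grouped_mean(age_classes, frequencies))  # 36.0
```

The result is only an estimate: it matches the mean of the original values exactly only if the values in each class average out to the class midpoint. An open class has no midpoint at all, so this estimate cannot be formed for it without a further assumption.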

Notation: Let $a_1, \dots, a_k$ be the upper limits of the classes $1, \dots, k$, and let $a_0$ be the lower limit of the first class. Square and round brackets then indicate whether the respective class boundary still belongs to the class interval (square bracket) or not (round bracket). $[a_1, a_2)$ says, for example, that the second class includes all values from $a_1$ inclusive to below $a_2$: $a_1 \le x < a_2$. In other words, the lower limit belongs to this class, but the upper limit does not. The square brackets before $a_0$ and after $a_k$, finally, indicate that no open classes are used at the lower and upper end of the classification. One is therefore certain that the range of values of the variable to be classified extends from $a_0$ to $a_k$ and that no smaller or larger values can occur.

HJA 2001-10-01