Frequency Distributions and Measures of Central Tendency

2.1. Frequency Distributions.

When observations, discrete or continuous, are available on a single characteristic of a large number of individuals, it often becomes necessary to condense the data as far as possible without losing any information of interest. Let us consider the marks in Statistics obtained by 250 candidates selected at random from among those appearing in a certain examination.
TABLE 1: MARKS IN STATISTICS OF 250 CANDIDATES

32	47	41	51	41	30	39	18	48	53
54	32	31	46	15	37	32	56	42	48
38	26	50	40	38	42	35	22	62	51
44	21	45	31	37	41	44	18	37	47
68	41	30	52	52	60	42	38	38	34
41	53	48	21	28	49	42	36	41	29
30	33	31	35	29	37	38	40	32	49
43	32	24	38	38	22	41	50	17	46
46	50	26	15	23	42	25	52	38	46
41	38	40	37	40	48	45	30	28	31
40	33	42	36	51	42	56	44	35	38
31	51	45	41	50	53	50	32	45	48
40	43	40	34	34	44	38	58	49	28
40	45	19	24	34	47	37	33	37	36
36	32	61	30	44	43	50	31	38	45
46	40	32	34	44	54	35	39	31	48
48	50	43	55	43	39	41	48	53	34
32	31	42	34	34	32	33	24	43	39
40	50	27	47	34	44	34	33	47	42
17	42	57	35	38	17	33	46	36	23
48	50	31	58	33	44	26	29	31	37
47	55	57	37	41	54	42	45	47	43
37	52	47	46	44	50	44	38	42	19
52	45	23	41	47	33	42	24	48	39
48	44	60	38	38

This representation of the data does not furnish any useful information and is
rather confusing to the mind. A better way may be to express the figures in
ascending or descending order of magnitude, commonly termed an array. But this
does not reduce the bulk of the data. A much better representation is given on the
next page.
A bar (|), called a tally mark, is put against the number when it occurs. After
four occurrences, the fifth occurrence is represented by putting a cross tally (/)
across the first four tallies. This technique facilitates the counting of the tally
marks at the end.
The representation of the data as above is known as a frequency distribution.
Marks are called the variable (x), and the 'number of students' against the marks
is known as the frequency (f) of the variable. The word 'frequency' is derived
from 'how frequently' a value of the variable occurs. For example, in the above
case the frequency of 31 is 10, as there are ten students getting 31 marks. This
representation, though better than an array, does not condense the data much, and
it is quite cumbersome to go through this huge mass of data.
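As a rough illustration of the tallying procedure, the following Python sketch builds a frequency distribution for a small sample, namely the first two rows of Table 1. The helper name `tally` and the text rendering of a completed group of five as `||||/` are assumptions made for this example only.

```python
from collections import Counter

# First two rows of Table 1, used here only as a small sample.
marks = [32, 47, 41, 51, 41, 30, 39, 18, 48, 53,
         54, 32, 31, 46, 15, 37, 32, 56, 42, 48]

def tally(count):
    """Render a count as tally marks.

    Each complete group of five is written as '||||/' (four bars and a
    cross stroke); any remainder is written as plain bars.
    """
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

# frequency[x] is the frequency f of the value x of the variable.
frequency = Counter(marks)

for x in sorted(frequency):
    print(f"{x:>3}  {tally(frequency[x]):<10} {frequency[x]}")
```

Running the same code on all the marks of Table 1 would give the full frequency distribution; only the list `marks` needs to be extended.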

If the identity of the individuals about whom a particular piece of information is
taken is not relevant, nor the order in which the observations arise, then the first
real step of condensation is to divide the observed range of the variable into a
suitable number of class-intervals and to record the number of observations in each
class. For example, in the above case, the data may be expressed as shown in Table 3.
Such a table showing the distribution

all the values from 20 to 24, both inclusive, and the classification is termed as
inclusive type classification.
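A minimal sketch of inclusive type classification is given below. It assumes classes of width 5 starting at 15 (i.e., 15-19, 20-24, 25-29, and so on), which is consistent with the class 20-24 mentioned above but is otherwise an assumption, and it again uses only the first two rows of Table 1 as sample data. The names `START`, `WIDTH` and `inclusive_class` are hypothetical.

```python
from collections import Counter

# First two rows of Table 1, used here only as a small sample.
marks = [32, 47, 41, 51, 41, 30, 39, 18, 48, 53,
         54, 32, 31, 46, 15, 37, 32, 56, 42, 48]

START = 15  # assumed lower limit of the first class
WIDTH = 5   # assumed number of integer values per class

def inclusive_class(x):
    """Return (lower, upper) of the class containing x, both limits inclusive."""
    lower = START + ((x - START) // WIDTH) * WIDTH
    return (lower, lower + WIDTH - 1)

# Record the number of observations falling in each class.
class_freq = Counter(inclusive_class(x) for x in marks)

for (lower, upper), f in sorted(class_freq.items()):
    print(f"{lower}-{upper}: {f}")
```

Classes with no observations do not appear in the output, since the counter only records classes that occur.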
In spite of the great importance of classification in statistical analysis, no hard
and fast rules can be laid down for it. The following points may be kept in mind for
classification:
(i) The classes should be clearly defined and should not lead to any ambiguity.
(ii) The classes should be exhaustive, i.e., each of the given values should be
included in one of the classes.
(iii) The classes should
(iv) The classes should be of equal width. The principle, however, cannot be
rigidly followed. If the classes are of varying width, the different class frequencies
will not be comparable. Comparable figures can be obtained by dividing the
frequencies by the corresponding widths of the class intervals. The ratios
thus obtained are called 'frequency densities'.
(v) Indeterminate classes, e.g., the open-end classes less than 'a' or greater
than 'b', should be avoided as far as possible since they create difficulty in analysis
and interpretation.
(vi) The number of classes should neither be too large nor too small. It should
preferably lie between 5 and 15. However, the number of classes may be more
than 15 depending upon the total frequency and the details required, but it is
desirable that it is not less than 5, since in that case the classification may not reveal
the essential characteristics of the population. The following formula due to
Sturges may be used to determine an approximate number k of classes:
k = 1 + 3.322 log10 N,
where N is the total frequency.
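The two computations mentioned in points (iv) and (vi) can be sketched in Python as follows. The function names are hypothetical, and the class frequencies and widths passed to `frequency_density` are made-up figures for illustration, not values from Table 1.

```python
import math

def sturges_classes(n):
    """Approximate number of classes, k = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)

def frequency_density(frequency, width):
    """Class frequency divided by the width of the class interval."""
    return frequency / width

# N = 250 candidates, as in the example of this section.
k = sturges_classes(250)
print(f"k is approximately {k:.3f}, i.e. about {round(k)} classes")

# Hypothetical classes of unequal width: the same frequency spread over
# a wider class gives a smaller frequency density.
print(frequency_density(30, 5))   # class of width 5
print(frequency_density(30, 10))  # class of width 10
```

For N = 250 the formula suggests about 9 classes, which lies within the preferred range of 5 to 15.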
The Magnitude of the Class Interval
Having fixed the number of classes, divide the range (the difference between
the greatest and the smallest observation) by it; the nearest integer to this value
gives the magnitude of the class interval. Broad class intervals (i.e., a smaller
number of classes) will yield only rough estimates, while for a high degree of
accuracy small class intervals (i.e., a large number of classes) are desirable.
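The rule can be sketched as below. The smallest and largest marks (15 and 68) are read off the values of Table 1 as printed above, and k = 9 is taken as an assumed number of classes; the function name `class_width` is hypothetical. Halves are rounded upwards here, which is one possible reading of "nearest integer".

```python
import math

def class_width(smallest, largest, k):
    """Magnitude of the class interval: range / k, to the nearest integer."""
    data_range = largest - smallest
    # floor(x + 0.5) rounds to the nearest integer, with halves going up.
    return math.floor(data_range / k + 0.5)

# Range of the marks in Table 1 is 68 - 15 = 53; with 9 classes,
# 53 / 9 is about 5.89, so the class interval is taken as 6.
width = class_width(15, 68, 9)
print(width)

# Fewer classes give broader intervals, more classes give narrower ones.
print(class_width(15, 68, 5), class_width(15, 68, 15))
```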
Class Limits
The class limits should be chosen in such a way that the mid-value of the class
interval and the actual average of the observations in that class interval are as near
to each other as possible. If this is not the case, then the classification gives a
distorted picture of the characteristics of the data. If possible, class limits should
be located at points which are multiples of 0, 2, 5, 10, ..., etc., so that the
midpoints of the classes are the common figures, viz., 0, 2, 5, 10, ..., etc., the
figures capable of easy and simple analysis.
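One way to check a choice of class limits is to compare the mid-value of a class with the actual average of the observations falling in it. The sketch below does this for an assumed class 30-34, using the five marks from the first two rows of Table 1 that fall in that class; the function name `mid_value` is hypothetical.

```python
from statistics import mean

def mid_value(lower, upper):
    """Mid-value of a class with the given lower and upper limits."""
    return (lower + upper) / 2

# Marks from the first two rows of Table 1 that lie in the class 30-34.
observations = [32, 30, 32, 31, 32]

actual_average = mean(observations)
gap = abs(mid_value(30, 34) - actual_average)

print(f"mid-value = {mid_value(30, 34)}, "
      f"actual average = {actual_average}, gap = {gap:.1f}")
```

A small gap in every class suggests that the mid-values represent the classes well; a large gap in some class indicates that the limits might be better placed elsewhere.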

2.1.1. Continuous Frequency Distribution.

If we deal with a continuous
variable, it is not possible to arrange the data in class intervals of the above type.
Let us consider the distribution of age in years. If the class intervals are 15-19,
20-24, then persons with ages between 19 and 20 years are not taken into
consideration. In such a case we form the class intervals as shown below.
Age in years
Below 5
5 or more but less than 10
10 or more but less than 15
15 or more but less than 20
20 or more but less than 25
and so on.
Here all persons, with any fraction of age, are included in one group or the
other. For practical purposes we rewrite the above classes as
0-5
5-10
10-15
15-20
20-25
This form of frequency distribution is known as a continuous frequency distribution.
It should be clearly understood that in the above classes, the upper limit of
each class is excluded from that class. Such classes, in which the upper
limits are excluded from the respective classes and are included in the immediate
next class, are known as 'exclusive classes', and the classification is termed as
'exclusive type classification'.
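Exclusive type classification can be sketched as follows, with each class treated as the half-open interval from the lower limit up to, but not including, the upper limit. The ages used are hypothetical values chosen to fall on and near the class limits, and the names `WIDTH` and `exclusive_class` are assumptions for this example.

```python
import math
from collections import Counter

WIDTH = 5  # width of the classes 0-5, 5-10, 10-15, ...

def exclusive_class(age):
    """Return (lower, upper) of the class containing age.

    The lower limit is included and the upper limit is excluded, so an
    age equal to an upper limit falls in the immediate next class.
    """
    lower = math.floor(age / WIDTH) * WIDTH
    return (lower, lower + WIDTH)

# Hypothetical ages in years, including fractional ages.
ages = [4.9, 5.0, 19.5, 20.0, 24.99]

class_freq = Counter(exclusive_class(a) for a in ages)

for (lower, upper), f in sorted(class_freq.items()):
    print(f"{lower}-{upper}: {f}")
```

Note that the age 19.5, which would be left out by the classes 15-19 and 20-24, is placed in the class 15-20, while the age 20.0 goes to the class 20-25.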

2.2. Graphic Representation of a Frequency Distribution.

It is often useful to represent a frequency distribution by means of a diagram which makes the unwieldy data intelligible and conveys to the eye the general run of the observations. Diagrammatic representation also facilitates the comparison of two or more frequency distributions. We consider below some important types of graphic representation.
