📌 Snapshot
- Raw, unorganised data are arranged into meaningful classes so that further statistical analysis becomes possible.
- Classification may be chronological, spatial, qualitative or quantitative; variables are continuous or discrete; data are organised in a frequency distribution table.
- Technical terms — class limits, class interval, class mark, class frequency, range, tally marks, inclusive vs exclusive method — must be defined precisely.
- A frequency array is for discrete data; a frequency distribution is for continuous data; a bivariate frequency distribution records two variables together.
- CUET regularly tests definitions, formulas (especially class mid-point and range), and the inclusive/exclusive distinction, making this a high-yield chapter.
- This data structure underlies all later chapters (measures of central tendency, dispersion, correlation, index numbers).
📖 Detailed Notes
2.1 Core concepts
- Purpose of classification: raw data are highly disorganised and cumbersome; classification is the arranging or organising of things into groups or classes based on some criteria, bringing order and making statistical analysis possible (NCERT §1, pp. 22–24).
- Raw data: unclassified observations on a variable (e.g., marks of 100 students in Table 3.1; monthly food expenditure of 50 households in Table 3.2) that do not yield to statistical methods until summarised (NCERT §2, pp. 23–24).
- Chronological classification: data classified with reference to time (years, quarters, months, weeks) in ascending or descending order; e.g., population of India from 1951–2011, which constitutes a Time Series (NCERT §3, p. 25, Example 1).
- Spatial classification: data classified with reference to geographical locations such as countries, states, cities or districts; e.g., yield of wheat in different countries in 2013 (NCERT §3, p. 26, Example 2).
- Qualitative classification: based on characteristics (attributes) that cannot be measured — e.g., nationality, literacy, religion, gender, marital status — classified by presence or absence of the attribute (NCERT §3, p. 26, Example 3).
- Quantitative classification: based on measurable characteristics such as height, weight, age, income, marks — grouped into classes (NCERT §3, pp. 26–27, Example 4).
- Continuous variable: can take any numerical value — integral, fractional or irrational (e.g., height, weight, time, distance); values can be broken into infinite gradations (NCERT §4, p. 27).
- Discrete variable: takes only certain values; changes by finite "jumps" with no intermediate values (e.g., number of students in a class) (NCERT §4, pp. 27–28). A discrete variable can still take fractional values like 1/8 or 1/16 — what matters is the jump, not whether the value is a whole number.
- Frequency distribution: a comprehensive way to classify raw data of a quantitative variable, showing how different values are distributed across classes with their class frequencies (NCERT §5, p. 28).
- Class limits: the two ends of a class; the lowest value is the Lower Class Limit and the highest the Upper Class Limit (NCERT §5, p. 28).
- Class interval / class width: difference between the upper class limit and the lower class limit (e.g., 10 for class 60–70) (NCERT §5, p. 28).
- Class mid-point / class mark: middle value of a class = (Upper Class Limit + Lower Class Limit) ÷ 2; used to represent the class once data are grouped (NCERT §5, p. 29). All subsequent statistical calculations (mean, variance) are based on class marks.
- Number of classes: usually between 6 and 15; for equal class intervals, number of classes = range ÷ class interval (NCERT §5, pp. 29–30).
- Range: difference between the largest and the smallest values of the variable (NCERT §5, p. 30).
- Inclusive method: values equal to both the lower and upper limits of a class are included in that class (e.g., 0–10, 11–20, 21–30…) (NCERT §5, pp. 30–31).
- Exclusive method: an item equal to either the upper or the lower class limit is excluded from that class (e.g., 0–10, 10–20, 20–30…) (NCERT §5, p. 31).
- Adjustment in class intervals: to restore continuity in an inclusive distribution, take half the gap between the upper limit of one class and the lower limit of the next (0.5 when the gap is 1), subtract it from each lower limit and add it to each upper limit (e.g., 800–899 becomes 799.5–899.5) (NCERT §5, p. 32, Tables 3.4 & 3.5).
- Tally marks: class frequency is counted by putting a tally (/) against the class for each observation; tallies are grouped in fives for ease of counting (NCERT §5, pp. 32–33, Table 3.6).
- Loss of information: once data are grouped, individual observations are replaced by the class mark in calculations, leading to loss of information; but this is more than offset by the gain in comprehension (NCERT §5, pp. 33–34).
- Unequal class intervals: used when a large number of values are concentrated in a small portion of the range, or when the range itself is very wide (e.g., income) (NCERT §5, pp. 29 & 34, Table 3.7).
- Frequency array: classification for a discrete variable, where frequencies correspond to each integral value of the variable (e.g., size of households in Table 3.8) (NCERT §5, p. 35).
- Bivariate frequency distribution: frequency distribution of two variables together (e.g., sales and advertisement expenditure of 20 firms in Table 3.9); used later in correlation (NCERT §6, p. 36).
- Why organise data at all: raw collected data are like a stack of unsorted laundry — useless until folded and put in drawers. Classification is the folding-and-sorting step that prepares data for tabulation, diagrammatic presentation and statistical computation (NCERT §1, p. 22).
- Variable vs attribute: a variable is a measurable characteristic (income, weight); an attribute is a qualitative characteristic (literacy status, religion). Attributes are classified by counts; variables can be classified by counts AND by magnitude (NCERT §3, p. 26). CUET often tests this distinction with examples like "gender" (attribute) vs "height" (variable).
- Class limit terminology drill: in the class "40–50", 40 is the lower class limit (LCL), 50 is the upper class limit (UCL), 50 − 40 = 10 is the class interval (h), (40+50)/2 = 45 is the class mark (m) (NCERT §5, p. 28). All four terms can appear in a single MCQ.
- Equal vs unequal class intervals: NCERT recommends equal widths when the data are evenly spread, but allows unequal widths when (i) values are concentrated in a narrow range of the variable, or (ii) the overall range is so wide that equal classes would produce many empty cells (NCERT §5, pp. 29 & 34). Income distributions typically use unequal classes (e.g., ₹0–10k, ₹10–25k, ₹25–50k, ₹50k+).
- Continuity correction logic: in the inclusive series 0–9, 10–19, 20–29, the value 9.5 has no home because it falls in the gap between two classes. Subtracting 0.5 from each lower limit and adding 0.5 to each upper limit produces −0.5–9.5, 9.5–19.5, 19.5–29.5…, closing every gap so that every possible value sits in exactly one class (NCERT §5, p. 32).
- Open-end classes: NCERT briefly mentions classes such as "below 100" or "1000 and above" — used at the extremes of an income or expenditure distribution. For computational purposes, the missing limit is usually assumed by symmetry with the adjacent class (NCERT §5, p. 34).
- Cumulative frequency is introduced implicitly: by adding successive class frequencies, one can answer "how many observations are below value X?" — a stepping stone to the median and percentile calculations in kest105 (NCERT §5, p. 33, implicit).
- Bivariate setup detail: in a bivariate frequency table, one variable is laid out along the rows (say, sales) and the other along the columns (say, advertisement expenditure); each cell records the joint frequency. Row and column totals are called marginal frequencies, and the grand total equals the total number of observations (NCERT §6, p. 36).
- Time-series example: NCERT references India's population growth over 1951–2011 to illustrate chronological classification — the year is the criterion and the population the variable. CUET sometimes asks students to identify whether a given table is time-series or cross-section; the rule is "time-series = many time points, one unit; cross-section = one time point, many units" (NCERT §3, p. 25).
- Spatial example: yields of wheat across countries in 2013 (Example 2, p. 26) — the criterion is location, the variable is yield, the time is held constant. A spatial table is a one-time-period geographic cross-section.
- Loss of information caveat: once class 60–70 has frequency 8, NCERT warns we no longer know whether those 8 students' marks were spread across 60, 61, …, 69 or all clustered at 69; that detail is lost for good. Statisticians accept this loss because the gain in compactness (a short table of class frequencies instead of a list of 100 marks) is far larger (NCERT §5, p. 33).
- Tally-marks five rule: tallies are bundled into groups of five — four vertical strokes plus a diagonal — because the human eye can count five-bundles much faster than individual strokes; this is purely an ergonomic convention (NCERT §5, p. 33).
- Discrete fraction example: shoe sizes in India jump in halves — 7, 7.5, 8, 8.5 — so "shoe size" is a discrete variable that can take half-integer values. The defining feature of discreteness is the gap between admissible values, NOT integer-ness (NCERT §4, p. 28).
- Number of classes, Sturges' guideline (background): while NCERT only says "6 to 15 classes", a common rough rule is k = 1 + 3.322 log₁₀ N, which gives k ≈ 7.6 (so 7 or 8 classes) for N = 100, consistent with NCERT's range (NCERT §5, p. 29, contextual).
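The two ways of choosing the number of classes mentioned above (range ÷ class interval, and Sturges' guideline) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and rounding up to the next whole class is an assumption made here, not something NCERT prescribes.

```python
import math

def classes_from_range(values, width):
    """Number of equal-width classes: range / class interval, rounded up (assumed)."""
    data_range = max(values) - min(values)   # Range = largest - smallest
    return math.ceil(data_range / width)

def sturges_classes(n):
    """Sturges' guideline k = 1 + 3.322 * log10(N), rounded up (assumed)."""
    return math.ceil(1 + 3.322 * math.log10(n))

# Hypothetical marks whose smallest value is 8 and largest is 55
marks = [8, 12, 25, 33, 47, 55]
print(classes_from_range(marks, 10))  # range 47, width 10 -> 4.7 -> 5
print(sturges_classes(100))           # 1 + 3.322 * 2 = 7.644 -> 8
```

Both results fall inside or close to NCERT's "6 to 15" band; in practice the final choice also depends on picking convenient round class limits.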
2.2 Definitions to memorise
| Term | Definition | Page |
|---|---|---|
| Classification | Arranging or organising things into groups or classes based on some criteria | 23 |
| Raw data | Unclassified data; observations not arranged in any order | 23–24 |
| Chronological classification | Data classified with reference to time | 25 |
| Time series | Sequence of observations on a variable taken over successive time periods | 25 |
| Spatial classification | Data classified with reference to geographical locations | 25–26 |
| Qualitative classification | Classification on the basis of presence or absence of an attribute | 26 |
| Quantitative classification | Classification of data on measurable characteristics into classes | 26 |
| Continuous variable | Variable that can take any numerical value, including fractional and irrational | 27 |
| Discrete variable | Variable that takes only certain values and changes by finite "jumps" | 27–28 |
| Frequency distribution | Classification showing values of a quantitative variable distributed across classes with frequencies | 28 |
| Class | A group of values into which observations are grouped | 28 |
| Class frequency | Number of values in a particular class | 28 |
| Class limits | Two ends (lower and upper) of a class | 28 |
| Class interval / class width | Upper class limit − Lower class limit | 28 |
| Class mid-point / class mark | (Upper class limit + Lower class limit) ÷ 2 | 29 |
| Range | Largest value − Smallest value of the variable | 30 |
| Inclusive method | Both lower and upper class limits are included in that class | 30 |
| Exclusive method | An item equal to the upper or lower class limit is excluded from that class | 31 |
| Adjustment factor | ±0.5 added/subtracted to restore continuity when converting inclusive to exclusive | 32 |
| Tally mark | Stroke (/) used to count frequency, grouped in fives for ease of counting | 33 |
| Frequency array | Classification of a discrete variable, with frequency for each integral value | 35 |
| Bivariate frequency distribution | Frequency distribution of two variables together | 36 |
| Loss of information | Trade-off in classification — individual values are replaced by the class mark | 33–34 |
| Equal class intervals | Class width constant across all classes | 29 |
| Unequal class intervals | Class widths that differ across classes, used when data are concentrated or range wide | 34 |
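
The definitions of class limits, class interval and class mark in the table above translate directly into code. The sketch below uses a hypothetical `ClassInterval` helper (not from NCERT) and assumes the common convention that the exclusive method excludes the upper limit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassInterval:
    lower: float   # lower class limit (LCL)
    upper: float   # upper class limit (UCL)

    @property
    def width(self):
        """Class interval h = UCL - LCL."""
        return self.upper - self.lower

    @property
    def mark(self):
        """Class mark m = (UCL + LCL) / 2."""
        return (self.upper + self.lower) / 2

    def contains_exclusive(self, x):
        """Exclusive method, upper limit excluded (assumed): LCL <= x < UCL."""
        return self.lower <= x < self.upper

    def contains_inclusive(self, x):
        """Inclusive method: both limits belong to the class."""
        return self.lower <= x <= self.upper

c = ClassInterval(40, 50)
print(c.width, c.mark)                              # 10 45.0
print(ClassInterval(10, 20).contains_exclusive(20)) # False: 20 goes to 20-30
print(ClassInterval(20, 30).contains_exclusive(20)) # True
print(ClassInterval(11, 20).contains_inclusive(20)) # True
```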
2.3 Diagrams / processes to remember
- Fig. 3.1 (p. 29): frequency curve — diagrammatic presentation of frequency distribution of marks of 100 students; class marks on X-axis, frequency on Y-axis.
- Fig. 3.2 (p. 35): frequency curve of distribution in Table 3.7 (unequal class intervals).
- Table 3.1 (p. 24): raw marks of 100 students in mathematics — base data used throughout this topic.
- Table 3.2 (p. 24): raw monthly household food expenditure of 50 households.
- Table 3.3 (p. 29): lower class limit, upper class limit, and class mark for ten classes 0–10, …, 90–100.
- Table 3.4 (p. 31) and Table 3.5 (p. 32): frequency distribution of incomes of 550 employees — before and after adjustment of class intervals (inclusive → exclusive form with ±0.5).
- Table 3.6 (p. 33): tally marking process for marks of 100 students; tallies grouped as //// with the fifth placed across.
- Table 3.7 (p. 34): frequency distribution with unequal class intervals.
- Table 3.8 (p. 35): frequency array of household size (discrete variable, values 1 to 8).
- Table 3.9 (p. 36): bivariate frequency distribution of sales and advertisement expenditure of 20 firms.
- Tree diagram (Example 3, p. 26): population branching first into Male/Female and then into Married/Unmarried — illustration of qualitative classification.
- Classification flow: raw data → choose criterion (time/space/attribute/measurement) → if measurable, choose continuous vs discrete → choose number and width of classes → use inclusive or exclusive method → tally → frequency distribution.
- Worked example (building a frequency table from raw marks): suppose 20 students score 12, 25, 33, 47, 51, 9, 18, 27, 36, 44, 55, 8, 22, 30, 41, 49, 17, 28, 39, 46. (i) Range = 55 − 8 = 47. (ii) With class width 10, range ÷ width = 47/10 = 4.7, so at least 5 classes are needed; starting the first class at the round number 0 gives 6 classes: 0–10, 10–20, 20–30, 30–40, 40–50, 50–60 (exclusive method, upper limit excluded). (iii) Tallying: 0–10 → 2 (8, 9); 10–20 → 3 (12, 17, 18); 20–30 → 4 (22, 25, 27, 28); 30–40 → 4 (30, 33, 36, 39); 40–50 → 5 (41, 44, 46, 47, 49); 50–60 → 2 (51, 55); total = 2 + 3 + 4 + 4 + 5 + 2 = 20 ✓. (iv) Class marks are 5, 15, 25, 35, 45, 55; these become the X-values for any subsequent mean calculation in kest105.
- Worked example — inclusive-to-exclusive conversion: take classes 0–9, 10–19, 20–29, 30–39 (inclusive). Apply ±0.5 correction → −0.5–9.5, 9.5–19.5, 19.5–29.5, 29.5–39.5 (exclusive, continuous). The class widths are now 10 each, no gaps, and a value of 9.5 lies exactly on the boundary of class 1 and class 2 — by exclusive-method convention it is placed in the next class (NCERT §5, p. 32 worked-out).
- Worked example — class mark for unequal classes: for classes 0–10, 10–25, 25–50, 50–100, the class marks are 5, 17.5, 37.5, 75. Note that for unequal widths, class marks are still mid-points of each class, but they are no longer equally spaced — a fact used when plotting histograms with variable-width rectangles in kest104.
- Frequency curve sketch (Fig. 3.1, p. 29): class marks 5, 15, …, 95 on the X-axis; frequencies (e.g., 1, 8, 6, 7, 21, 23, 19, 6, 5, 4) on the Y-axis; points joined by a smooth curve passing through them. The curve typically rises to a peak (mode region) and tails off on either side — anticipating the unimodal distributions of kest105.
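The worked example on the 20 raw marks can be checked mechanically. The following is a minimal sketch, assuming class width 10, a first class starting at 0, and the exclusive method with the upper limit excluded; the variable names are illustrative only.

```python
marks = [12, 25, 33, 47, 51, 9, 18, 27, 36, 44,
         55, 8, 22, 30, 41, 49, 17, 28, 39, 46]
width = 10

# Classes 0-10, 10-20, ..., 50-60 stored as (lower, upper) pairs
freq = {(lo, lo + width): 0 for lo in range(0, 60, width)}

for m in marks:
    lo = (m // width) * width        # lower limit of the class containing m
    freq[(lo, lo + width)] += 1      # one "tally" for that class

for (lo, hi), f in freq.items():
    class_mark = (lo + hi) / 2
    print(f"{lo}-{hi}: mark = {class_mark}, frequency = {f}")

print("Total =", sum(freq.values()))  # Total = 20
```

Because integer division sends a mark of exactly 10, 20, … to the class that starts at that value, the code follows the exclusive convention used in the example.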
2.5 Key formulas / structural ratios
| Formula | Meaning | NCERT page |
|---|---|---|
| Class mark m = (UCL + LCL) ÷ 2 | Mid-point representing the class | 29 |
| Class interval h = UCL − LCL | Width of one class | 28 |
| Range R = Xmax − Xmin | Spread of the entire data set | 30 |
| Number of classes k ≈ R ÷ h | Approximate number of equal-width classes | 29–30 |
| Adjustment for continuity = ±0.5 | Conversion of inclusive to exclusive form | 32 |
| Σ fᵢ = N | Sum of class frequencies equals total observations | 33 |
| Relative frequency = fᵢ ÷ N | Share of total observations falling in a class | 35 |
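
Two of the formulas in the table, the continuity adjustment and the share fᵢ ÷ N, lend themselves to a quick numerical check. The sketch below is illustrative: `adjust_to_exclusive` is a hypothetical helper that assumes the same gap between every pair of consecutive classes, and the frequencies are those of a six-class example (2, 3, 4, 4, 5, 2).

```python
def adjust_to_exclusive(classes):
    """Widen inclusive classes by half the gap between consecutive classes."""
    gap = classes[1][0] - classes[0][1]   # e.g. 10 - 9 = 1
    half = gap / 2                        # adjustment factor, 0.5 here
    return [(lo - half, hi + half) for lo, hi in classes]

inclusive = [(0, 9), (10, 19), (20, 29), (30, 39)]
adjusted = adjust_to_exclusive(inclusive)
print(adjusted)
# [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]

freqs = [2, 3, 4, 4, 5, 2]
n = sum(freqs)                            # sum of class frequencies = N
relative = [f / n for f in freqs]         # f_i / N for each class
print(n, relative)
# 20 [0.1, 0.15, 0.2, 0.2, 0.25, 0.1]
```

After adjustment every class has width 10 and the upper limit of one class coincides with the lower limit of the next.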
2.4 Common confusions / NTA trap points
- Range vs class interval: range is across the whole data set (largest − smallest); class interval is for a single class (upper − lower limit of that class). Distractors swap them.
- Inclusive vs exclusive method: in inclusive, both limits belong to the same class; in exclusive, one limit (usually the upper) is excluded. NTA tests this with statements like "10 is in 10–20": this is true in the exclusive series 0–10, 10–20, 20–30, where the upper limit is excluded and 10 therefore moves to 10–20; in an inclusive series such as 0–9, 10–19, 20–29, the value 10 belongs to 10–19 because both limits are included.
- Class mark formula: it is the average of the two limits, not the difference, product or ratio.
- Discrete variable can still take fractions: e.g., 1/8, 1/16, 1/32 — what matters is that it "jumps" with no values in between.
- Statistical calculations after classification are based on class marks, NOT on the actual observed values.
- Frequency array is for discrete data only; frequency distribution with class intervals is for continuous data.
- Univariate vs bivariate: a bivariate distribution involves two variables, not multiple categories of one variable.
- Number of classes is typically between 6 and 15, not "as many as observations".
- Adjustment factor = 0.5, not 1 or 1.5.
- Class frequencies sum to N, not to 100.
- Time-series ≠ cross-section: time-series is over time (chronological); cross-section is across units at a point in time (spatial).
- Qualitative ≠ quantitative: qualitative classifies by attribute; quantitative classifies by measurement.
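To make the frequency array vs bivariate distinction concrete, here is a small Python sketch. All data in it are hypothetical (they are not NCERT's Table 3.8 or Table 3.9), and the class labels are arbitrary strings chosen for illustration.

```python
from collections import Counter

# Frequency array: one frequency per value of a discrete variable
household_sizes = [3, 4, 2, 4, 5, 3, 4, 1, 2, 4]   # hypothetical data
frequency_array = dict(sorted(Counter(household_sizes).items()))
print(frequency_array)   # {1: 1, 2: 2, 3: 2, 4: 4, 5: 1}

# Bivariate frequency distribution: each firm is a (sales class, ad class) pair
firms = [("0-50", "0-5"), ("0-50", "0-5"), ("0-50", "5-10"),
         ("50-100", "5-10"), ("50-100", "5-10"), ("50-100", "0-5")]
joint = Counter(firms)                       # joint frequency of each cell
row_totals = Counter(s for s, _ in firms)    # marginal frequencies of sales
col_totals = Counter(a for _, a in firms)    # marginal frequencies of ad spend

print(joint[("0-50", "0-5")])                # 2
print(sum(joint.values()))                   # 6 = total number of firms
```

The frequency array involves one variable; the bivariate table needs a pair of values per observation, and its row and column totals each add up to the grand total.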
🎯 Practice MCQs
Q1. The class midpoint is equal to:
Answer: C
Q2. The frequency distribution of two variables together is known as:
Answer: B
Q3. Once raw data are grouped into classes, further statistical calculations are based on:
Answer: D
Q4. Consider: Statement I: A continuous variable can take integral, fractional and irrational values such as √2 or √3. Statement II: A discrete variable can never take any fractional value because it changes only in whole-number jumps.
Answer: A
Q5. Match the classifications with the basis on which data are grouped:
| Classification | Basis |
|---|---|
| 1. Chronological | i. Geographical location |
| 2. Spatial | ii. Presence or absence of an attribute |
| 3. Qualitative | iii. Time |
| 4. Quantitative | iv. Measurable characteristics |
Answer: A
Q6. To restore the continuity of data in an inclusive distribution (e.g., 800–899, 900–999), what adjustment converts it into exclusive form?
Answer: B
Q7. A researcher records the number of cell phones used per family in a city. The recorded values are 0, 1, 2, 3, 4, 5, 6, 7 only. The most appropriate classification is:
Answer: C
Q8. The range of a data set is defined as:
Answer: B
Q9. In the exclusive method, the value 20 is placed in which class — 10–20 or 20–30?
Answer: B
Q10. Which of the following best describes "raw data"?
Answer: B
Q11. The number of classes in a frequency distribution is normally:
Answer: B
Q12. Tally marks are usually grouped in:
Answer: C