Sex and Age

The Pupil Identifiers module contains information about the identity of a pupil, including gender and date of birth. This module is collected for all pupils registered at the school on the census day, and additionally for those pupils not on roll on census day, for whom information is collected on exclusions, absence and learning aims in previous terms.

Data collection

The standard data release contains gender and month of birth; the full date of birth may be available as a sensitive release. Date of birth measured to the day has been used to analyse the impact of month of birth on academic attainment; see Dearden et al. (2007).

Validity of measure

We think of gender as non-varying for most people; date of birth is non-varying. However, different sweeps of the data may yield different values for a few individuals. For example, we might hold several measures of each: one from each of the Key Stage 4, Key Stage 2 and Key Stage 1 sweeps, plus up to nine years of Pupil Census reports from schools.

If there are variations, there are different possible strategies:
  • Place greatest weight on the Census
  • Place greatest weight on the most recent reports
  • Place greatest weight on the modal report of their characteristic

ADVICE: the best strategy depends to a degree on the nature of the discrepancies, but the Census should generally be relied on most, and within the Census the most recent entry.
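The strategies above can be compared on a toy dataset before choosing one. The following is a minimal sketch only: the variables fem_spr10, fem_spr09 and fem_spr08 are hypothetical numeric indicators (1 = female, 0 = male) invented for illustration, not names from the data release.

```stata
* Toy data: three hypothetical female indicators (1 = female, 0 = male),
* ordered from the most recent sweep to the oldest
clear
input id fem_spr10 fem_spr09 fem_spr08
1 1 1 1
2 0 1 1
3 . 0 0
4 1 0 .
end

* number of sweeps with a report, and number of those reporting female
egen nreports = rownonmiss(fem_spr10 fem_spr09 fem_spr08)
egen nfemale  = rowtotal(fem_spr10 fem_spr09 fem_spr08)

* flag pupils whose sweeps disagree with each other
gen byte discrepant = nfemale > 0 & nfemale < nreports

* strategy: most recent non-missing report
gen byte fem_recent = fem_spr10
replace fem_recent = fem_spr09 if fem_recent == .
replace fem_recent = fem_spr08 if fem_recent == .

* strategy: modal report (ties are left missing)
gen byte fem_modal = .
replace fem_modal = 1 if nfemale > nreports/2 & nreports > 0
replace fem_modal = 0 if nfemale < nreports/2 & nreports > 0

list, clean
```

Tabulating discrepant shows how many pupils are affected, and comparing fem_recent with fem_modal shows whether the choice of strategy matters in practice.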

Cleaning the variables


Stata code for cleaning up pupil sex

This snippet of code reconciles multiple entries on gender for the same pupil, some stored as strings and some in numeric form. The dates of any further census sweeps can be added to the first line of code.

* sweeps are listed from most recent to oldest, so the most recent
* non-missing census report takes precedence
local censusdates "spr10 spr09 spr08 spr07 spr06 05 04 03 02"
gen ppfemale=.
foreach i of local censusdates    {
        * capture skips any variable that is absent in this extract
        capture replace ppfemale=1 if ppfemale==. & gender_`i'=="F"
        capture replace ppfemale=0 if ppfemale==. & gender_`i'=="M"
        capture replace ppfemale=1 if ppfemale==. & gend_`i'=="F"
        capture replace ppfemale=0 if ppfemale==. & gend_`i'=="M"
        capture drop gender_`i'
        capture drop gend_`i'
        }
* the Key Stage sweeps only fill in pupils still missing after the census
local keystage "ks4 ks2 ks1"
foreach i of local keystage      {
        capture replace ppfemale=1 if ppfemale==. & `i'_gender=="F"
        capture replace ppfemale=0 if ppfemale==. & `i'_gender=="M"
        capture replace ppfemale=1 if ppfemale==. & `i'_gend=="F"
        capture replace ppfemale=0 if ppfemale==. & `i'_gend=="M"
        * numeric form: 1 = female, 0 = male
        capture replace ppfemale=1 if ppfemale==. & `i'_female==1
        capture replace ppfemale=0 if ppfemale==. & `i'_female==0
        capture drop `i'_gender
        capture drop `i'_gend
        capture drop `i'_female
        }
label var ppfemale "Pupil is female"

Stata code for cleaning up pupil age from 2002 to Spring 2010

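One useful check is to compare the year of birth implied by age at the start of the academic year with the recorded year of birth: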

gen check =2008- ageatstartofacademicyear_spr09
tab check yearofbirth_spr09

For almost all observations the two years will either be the same or differ by one, depending on the month of birth.


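           |             YearOfBirth_SPR09
     check |    1990    1991    1992    1993    1994 |   Total
-----------+-----------------------------------------+--------
      1990 |       1       0       0       0       0 |       1
      1991 |       1       3       0       0       0 |       4
      1992 |       0      58     260       0       0 |     318
      1993 |       0       0   9,534  19,326       0 |  28,860
      1994 |       0       0       0      15       7 |      22
-----------+-----------------------------------------+--------
     Total |       2      61   9,794  19,341       7 |  29,205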

In this sample, 340 or 1.2% do not match. Generally the advice would be to give precedence to the information from the Census.

This snippet of code sorts out the age variables across years:

capture program drop age
program define age
* rename variables to ensure consistency across years:
capture rename ks4_yearofbirth yearofbirth_ks4
capture rename ks4_monthofbirth monthofbirth_ks4
capture rename ks2_yearofbirth yearofbirth_ks2
capture rename ks2_monthofbirth monthofbirth_ks2
capture rename ks1_yearofbirth yearofbirth_ks1
capture rename ks1_monthofbirth monthofbirth_ks1
* generate the final variables:
gen ppyearmonth=.
label var ppyearmonth "Pupil YearMonth of birth (YYYYMM)"
gen ppmonth=.
label var ppmonth "Pupil relative age (August=1)"
local censusdates "spr10 spr09 spr08 spr07 spr06 05 04 03 02 ks4 ks2 ks1"
* j numbers the temporary variables; it starts at the number of sweeps
* in censusdates (12) so that it never goes negative, which would give
* invalid variable names and silently skip the Key Stage sweeps
local j=12
foreach i of local censusdates {
local j=`j'-1
* to create the variable ppyearmonth we need to multiply the
* year by 100 to create two spare digits to the right of the year
* and then add the month
capture gen tempyear`j'=yearofbirth_`i'*100+monthofbirth_`i'
capture replace ppyearmonth=tempyear`j' if ppyearmonth==.
* relative age within the school year: 21 minus the calendar month,
* less 12 where the result exceeds 12, maps September (oldest) to 12
* and August (youngest) to 1
capture gen tempmonth`j'=21-monthofbirth_`i'
capture replace tempmonth`j'=tempmonth`j'-12 if tempmonth`j'>12
* temp00 is the number of pupils sharing each year-month of birth
capture bysort tempyear`j': gen temp00`j'=_N if tempyear`j'<.
capture summ tempyear`j'
* r(N) is a result returned by the summarize command which contains
* the number of non-missing observations
capture gen temp11`j'=r(N)
* a year-month shared by fewer than 2% of pupils is treated as outside
* the standard age range, so relative age is set to missing
capture replace tempmonth`j'=. if (temp00`j'/temp11`j')<0.02
capture replace ppmonth=tempmonth`j' if ppmonth==.
capture drop yearofbirth_`i'
capture drop monthofbirth_`i'
}
* code pupils with a known birth month but no standard relative age as 13
* so that tab creates a dummy for them; this assumes all 12 months and
* the non-standard category occur in the data
replace ppmonth=13 if ppmonth==. & ppyearmonth<.
tab ppmonth, gen(ppage)
replace ppmonth=. if ppmonth==13
rename ppage13 ppagenonstand
label var ppagenonstand "Pupil NOT standard age in year"
end
* run the program by typing: age
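The two pieces of arithmetic in the cleaning code (the YYYYMM encoding and the relative-age mapping) can be checked on a toy dataset. This is an illustrative sketch; the variable names month, yearmonth, birthyear, birthmonth, relage and relage2 are made up for the example, and the birth year 1993 is arbitrary.

```stata
* Toy data: one hypothetical pupil per calendar month, all born in 1993
clear
set obs 12
gen month = _n
gen yearmonth = 1993*100 + month

* recover the components from the YYYYMM value
gen birthyear  = floor(yearmonth/100)
gen birthmonth = mod(yearmonth, 100)

* relative age as in the cleaning code: September = 12, August = 1
gen relage = 21 - month
replace relage = relage - 12 if relage > 12

* an equivalent single expression
gen relage2 = mod(20 - month, 12) + 1
assert relage == relage2

list month yearmonth relage, clean
```

The listing shows January to July mapped to 8 down to 2, August to 1, and September to December mapped to 12 down to 9, so higher values mean a pupil is older within the school year.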







SPSS code:
To come


Description of values across cohorts

Here is an example of the distribution of month of birth, from a small sample of observations from the Spring 2009 Census:

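MonthOfBirth_SPR09 |   Frequency    Percent       Cum.
-------------------+----------------------------------
                 1 |       2,427       8.31       8.31
                 2 |       2,126       7.28      15.59
                 3 |       2,470       8.46      24.05
                 4 |       2,367       8.10      32.15
                 5 |       2,498       8.55      40.71
                 6 |       2,510       8.59      49.30
                 7 |       2,605       8.92      58.22
                 8 |       2,594       8.88      67.10
                 9 |       2,463       8.43      75.54
                10 |       2,437       8.34      83.88
                11 |       2,316       7.93      91.81
                12 |       2,392       8.19     100.00
-------------------+----------------------------------
             Total |      29,205     100.00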


References


Dearden, L., Crawford, C. and Meghir, C. (2007) When you are born matters: the impact of date of birth on child cognitive outcomes in England. IFS Commentary.