English as an Additional Language

The standard release of the NPD provides the English as an Additional Language (EAL) variable in various forms. The main variables are taken from the Spring Census. In the most recent years (2009 and 2010) there are two aggregated variables, “minor language group” and “major language group”, which classify pupils according to whether their first language is known or believed to be English or not. In earlier years the variable was defined slightly differently (e.g. “first language”, “mother tongue”). The census also collects the language the child actually speaks at home, but this is treated as a sensitive variable and must be specifically requested, as described elsewhere. The attainment data for Key Stages 2 and 4 also include a language indicator variable.

Data collection

This data is collected as part of the Pupil Characteristics module of each termly Census. At nursery and special schools, language data is only collected for pupils aged 5 or over; at primary schools, it is only collected for pupils who were aged 5 or over on the previous 31 August. Schools usually request this information from parents when the pupil joins the school.
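The eligibility rule above can be sketched as a small function, for example to check which pupils in an extract should have a language record in a given census. This is a minimal Python illustration, not DfE code: the school-type labels and function names are hypothetical, age at nursery and special schools is assumed to be measured on the census date, and other phases (e.g. secondary) are assumed to collect language for every pupil.

```python
from datetime import date


def age_on(dob: date, on: date) -> int:
    """Age in completed years on a given date."""
    years = on.year - dob.year
    # One year fewer if the birthday has not yet been reached in that year
    if (on.month, on.day) < (dob.month, dob.day):
        years -= 1
    return years


def previous_31_august(census_date: date) -> date:
    """Most recent 31 August on or before the census date."""
    if (census_date.month, census_date.day) >= (8, 31):
        return date(census_date.year, 8, 31)
    return date(census_date.year - 1, 8, 31)


def language_collected(school_type: str, dob: date, census_date: date) -> bool:
    """Hypothetical helper: is language data expected for this pupil?"""
    if school_type in ("nursery", "special"):
        # Assumption: age is measured on the census date itself
        return age_on(dob, census_date) >= 5
    if school_type == "primary":
        return age_on(dob, previous_31_august(census_date)) >= 5
    # Assumption: other phases collect language for all pupils
    return True
```

A pupil born on 1 September is therefore excluded at a primary school for the whole following academic year, even after turning 5, because eligibility is fixed by age on the previous 31 August.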

Validity of measure

Since EAL is collected termly, it can vary over time, although researchers tend to treat it as a fixed characteristic. For example, among year 7 pupils in 2009 who have full census data from 2005 onwards, 87.21% have a constant response of “English” and 6.79% a constant response of “other”. In the same data, 0.03% (845 pupils) have responded both “English” and “other” at some time. These percentages do not include, for example, a pupil with missing data in one year, or a pupil recorded as “believed to be English” in some years and “English” in others.
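The stability categories above can be made explicit with a short sketch. It assumes each pupil's history is a list of census codes, one per collection, using ENG and OTH for the definite responses (as in the cleaning code below), ENB/OTB for “believed to be” responses and None for missing values; the function names and the toy data in any usage are hypothetical.

```python
from collections import Counter


def stability_pattern(responses):
    """Classify one pupil's sequence of census language responses."""
    if responses and all(r == "ENG" for r in responses):
        return "constant English"
    if responses and all(r == "OTH" for r in responses):
        return "constant other"
    if "ENG" in responses and "OTH" in responses:
        return "both English and other"
    # Everything else, e.g. a missing year or ENB mixed with ENG
    return "other pattern"


def pattern_shares(histories):
    """Percentage of pupils in each pattern; histories maps pupil id -> list of codes."""
    if not histories:
        return {}
    counts = Counter(stability_pattern(h) for h in histories.values())
    return {pattern: 100 * n / len(histories) for pattern, n in counts.items()}
```

Because “constant” requires every collection to carry the same definite code, pupils with any missing or “believed to be” response fall into the residual category, which is why the published shares do not sum to 100%.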

Missingness appears low. In the complete 2009 dataset for year 7 pupils, only 19 pupils have no response of “English” or “other” at any point, and all but one of these have some assessment of first language (i.e. “believed to be…”). On a wider measure, 154 pupils have at least one response of “refused”, “information not obtained”, “invalid code”, “missing value” or “classification pending” in some year.
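Both missingness measures can be computed in one pass over pupil histories. This sketch uses the same assumed representation as above (lists of codes per pupil, None for a missing value); the labels in PROBLEM_CODES are hypothetical placeholders for the census codes meaning refused, not obtained, invalid or pending, and should be replaced with the codes actually present in the extract.

```python
# Hypothetical labels for the problem responses; substitute the real codes
PROBLEM_CODES = {"refused", "not obtained", "invalid", "pending"}
DEFINITE = {"ENG", "OTH"}
BELIEVED = {"ENB", "OTB"}


def missingness_summary(histories):
    """Count pupils under the narrow and wide missingness measures."""
    no_definite = [p for p, h in histories.items() if not DEFINITE & set(h)]
    # Of those, pupils who still have a "believed to be" assessment
    believed_only = [p for p in no_definite if BELIEVED & set(histories[p])]
    any_problem = [
        p for p, h in histories.items()
        if any(r is None or r in PROBLEM_CODES for r in h)
    ]
    return {
        "no_definite": len(no_definite),
        "no_definite_but_believed": len(believed_only),
        "any_problem": len(any_problem),
    }
```

The narrow measure asks whether a usable definite value exists anywhere in the history; the wide measure flags any problem response, so a pupil can count towards the second without affecting the first.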

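It is also questionable how well this variable proxies for fluency in English: it records whether a pupil's first language is English, not how proficient the pupil is, so a pupil with another first language may nonetheless be fully fluent in English.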

Cleaning the variable

As noted above, the variable can take different values for the same pupil at different collection dates. One way to define a fixed value is to take the most recent definite response (“English” or “other”), skipping any collection where the response is missing or uncertain; only if no definite response exists in any year is the most recent “believed to be” response used. The code below follows this logic and also, as suggested by DfE user guides, prioritises census responses over values from attainment data.

Stata code to create a consistent EAL indicator:
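* ppeal: 1 = first language not English, 0 = English, . = unknown
gen byte ppeal = .
label var ppeal "Mother tongue not English"
* The KS flags are compared numerically below, so convert any string versions
foreach v in ks4_flang ks2_flang {
    capture destring `v', replace
    }

* Pass 1: definite census responses, most recent collection first
* (capture lets the loop continue if a variable is absent from this extract)
foreach i in languagegroupminor_spr10 languagegroupminor_spr09 languagegroup_spr08 ///
languagegroup_spr07 firstlanguage_spr06 ///
firstlanguage_05 firstlanguage_04 firstlanguage_03 flang_03 mothertongue_02 mton_02 {
    capture replace ppeal=1 if ppeal==. & `i'=="OTH"
    capture replace ppeal=0 if ppeal==. & `i'=="ENG"
    }
* Pass 2: "believed to be" responses, used only where no definite response exists
foreach i in languagegroupminor_spr10 languagegroupminor_spr09 languagegroup_spr08 ///
languagegroup_spr07 firstlanguage_spr06 ///
firstlanguage_05 firstlanguage_04 firstlanguage_03 flang_03 mothertongue_02 mton_02 {
    capture replace ppeal=1 if ppeal==. & `i'=="OTB"
    capture replace ppeal=0 if ppeal==. & `i'=="ENB"
    capture drop `i'
    }
* Pass 3: fall back on attainment data (KS4, then KS2) where the census gives no answer
foreach i in ks4_flang ks2_flang {
    capture replace ppeal=1 if ppeal==. & `i'==1
    capture replace ppeal=0 if ppeal==. & `i'==0
    capture drop `i'
    }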
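For readers working outside Stata, the same three-pass priority rule can be sketched in Python. This is a hypothetical translation, not part of any NPD tooling: it assumes each pupil is a dict keyed by the variable names used in the Stata code, with census codes stored as strings and the KS flags as 1/0 (possibly as strings). It makes explicit that a definite response from any year outranks a “believed to be” response from a more recent year.

```python
# Census variables in priority order, most recent first (names from the Stata code)
CENSUS_VARS = [
    "languagegroupminor_spr10", "languagegroupminor_spr09", "languagegroup_spr08",
    "languagegroup_spr07", "firstlanguage_spr06", "firstlanguage_05",
    "firstlanguage_04", "firstlanguage_03", "flang_03", "mothertongue_02", "mton_02",
]
ATTAINMENT_VARS = ["ks4_flang", "ks2_flang"]


def derive_ppeal(record):
    """Return 1 (not English), 0 (English) or None for one pupil record."""
    # Pass 1: definite census responses, most recent first
    for var in CENSUS_VARS:
        value = record.get(var)
        if value == "OTH":
            return 1
        if value == "ENG":
            return 0
    # Pass 2: "believed to be" responses
    for var in CENSUS_VARS:
        value = record.get(var)
        if value == "OTB":
            return 1
        if value == "ENB":
            return 0
    # Pass 3: attainment-data flags
    for var in ATTAINMENT_VARS:
        value = record.get(var)
        if value in (1, "1"):
            return 1
        if value in (0, "0"):
            return 0
    return None
```

Returning at the first match reproduces the Stata condition `ppeal==.`, which stops later variables from overwriting a value that has already been set.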



Description of values across cohorts
By age groups
Over time
Stability within pupil