Geographical location

The standard release of NPD provides the pupil's home local authority and Lower Layer Super Output Area (lower SOA). A lower SOA is an approximate geo-location for the actual home and has a mean population of 1,500. Lower SOAs are built from groups of 2001 Census Output Areas (OAs) and nest within local authorities. SOAs give an improved basis for comparison across the country because the units are more similar in population size than, for example, electoral wards. They are also intended to be stable, enabling improved comparison and monitoring of policy over time. The 32,482 lower SOAs in England were generated automatically and released to the public in February 2004.

These standard-release variables are derived by DfE from the pupil's home postcode, which is collected by schools. The northing and easting co-ordinates of the centroid of the pupil's home postcode are available through a special request for sensitive data.

These variables have been used in four main ways by researchers:
  • to explore distances travelled to school by pupils (e.g. Singleton et al., 2010; Watson and Church, 2009)
  • to identify house moves by families (e.g. Allen et al., 2010; Marquis and Jivraj, 2009; Jivraj and Marquis, 2009)
  • to measure the characteristics of pupils living close to a particular school (e.g. Allen, 2007; Allen and West, 2010; Burgess et al., 2004)
  • to attach socio-demographic information to the pupil (Watson and Church, 2009)

Data collection

Schools hold the pupil's full home address, including the postcode, in their management information systems (such as SIMS). Most schools try to keep this information up to date by giving parents a print-out at least once a year so they can check that their family details are correct. However, some parents will not return the form.

The postcode is passed to DfE termly as part of the School Census and is stored by DfE as a string variable of 6, 7 or 8 characters. The northing and easting coordinates of the postcode, and the pupil's home lower SOA and home local authority, are merged onto the postcode variable using the ONS Postcode Directory (see next section).

DfE will release the pupil's home postcode to researchers who (1) make a case for it and (2) can show that the data will be held securely. DfE will not release the postcode solely for attaching geo-data, since it can carry out that task itself. Nor will it release the postcode to enable home-school distance calculations: these distances are included in the most recent census sweeps, and DfE will calculate them for you for earlier sweeps.

Office for National Statistics Postcode Products

The NSPL and ONSPD directories (variants of which were previously known as the NSPD and AFPD) enable researchers to link postcodes to a range of higher geographies and, from there, to additional geocoded variables. Although the two directories support the same process, they differ in how each postcode is mapped into a higher-level geography, such as an SOA or ward.

These directories can also be used to validate postcodes for accuracy as they are sourced from the Royal Mail Postcode Address File (PAF) and represent the best available source of current and terminated postcodes.

The ONS recommends that researchers use the National Statistics Postcode Lookup (NSPL) rather than the ONS Postcode Directory (ONSPD). In the NSPL, each postcode is first allocated to an Output Area using a point-in-polygon process. Each Output Area is then allocated to every type and level of higher geography on the basis of its population: it is assigned to whichever higher geography contains the greater proportion of its population (or, if it is split between more than two higher geographies, the greatest proportion). The ONSPD instead uses a point-in-polygon method to map each postcode directly into higher geographies, and the ONS does not recommend this method for research use.
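
The two-step "best fit" allocation can be illustrated with a short Python sketch. This is not ONS code: the function names, OA and ward codes, and population counts are all hypothetical. It shows the key consequence of the method, namely that every postcode in an Output Area inherits that OA's single best-fit geography, even if the postcode itself lies in a different ward.

```python
from typing import Dict


def best_fit(population_by_area: Dict[str, int]) -> str:
    """Return the higher-geography code holding the most of an OA's population.

    population_by_area maps each higher-geography code that the OA overlaps
    to the number of the OA's residents living in that part (hypothetical data).
    """
    if not population_by_area:
        raise ValueError("an OA must overlap at least one higher geography")
    # max() returns the first key with the largest population, so ties go to
    # whichever code was inserted first.
    return max(population_by_area, key=population_by_area.get)


def allocate_postcodes(postcode_to_oa: Dict[str, str],
                       oa_split: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Step 1 (postcode -> OA) is assumed done by point-in-polygon.
    Step 2: each OA gets its best-fit geography, inherited by all its postcodes."""
    oa_to_area = {oa: best_fit(split) for oa, split in oa_split.items()}
    return {pc: oa_to_area[oa] for pc, oa in postcode_to_oa.items()}


if __name__ == "__main__":
    # Hypothetical OA whose 100 residents are split across three wards.
    oa_split = {"OA_1": {"WARD_A": 25, "WARD_B": 60, "WARD_C": 15}}
    postcodes = {"AB1 2CD": "OA_1", "AB1 2CE": "OA_1"}
    print(allocate_postcodes(postcodes, oa_split))
    # {'AB1 2CD': 'WARD_B', 'AB1 2CE': 'WARD_B'}
```

Under the ONSPD's point-in-polygon approach, by contrast, the two postcodes above could be assigned to different wards if their centroids fell on opposite sides of a ward boundary.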

Full documentation is available on the ONS website (link last accessed 5 April 2011).

The postcode products are available to download free of charge for academic users who have completed the Athens registration process. Registration is only available to those carrying out academic work in UK Higher and Further Education, and some Research Council staff. Postcode products can be accessed via UKBORDERS on the University of Edinburgh website.

Validity of measure

A pupil's full postcode is a fairly accurate geo-location for their home, since a postcode typically covers only around 15 households, although it can cover up to 100 households in very dense urban areas (see the ONS website for more details). A postcode should locate a home to within 100m (Harland and Stillwell, 2007).

The main threat to validity is that the postcode may be mis-coded or out of date. The description of the postcode variable below suggests that at least 0.3% of postcodes are mis-coded, since 1,559 of the 555,280 pupils in the example cohort have a postcode that cannot be matched to a lower SOA.

Allen et al. (2010) argue that the very large number of postcode changes between the last postcode recorded at primary school in year 6 and the first postcode recorded at secondary school shows that primary schools have failed to keep up-to-date records of pupils' current addresses.

A change in a pupil's postcode does not necessarily mean that the family has moved house, since Royal Mail regularly reassigns postcodes to groups of housing. To screen out such reclassifications, Allen et al. (2010) exclude apparent house moves where at least 8 other pupils have an identical change of postcode, and also exclude moves of less than 100 metres.
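
The two exclusion rules can be sketched in Python. This illustrates the rules as described rather than reproducing Allen et al.'s code: the record layout (PostcodeChange) and function name are hypothetical, an "identical change" is read as the same old and new postcode pair, and coordinates are assumed to be postcode eastings and northings in metres.

```python
from collections import Counter
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class PostcodeChange:
    """A pupil whose recorded postcode differs between two censuses (hypothetical layout)."""
    pupil_id: str
    old_postcode: str
    new_postcode: str
    old_xy: Tuple[float, float]  # (easting, northing) in metres
    new_xy: Tuple[float, float]


def likely_house_moves(changes: List[PostcodeChange],
                       min_others: int = 8,
                       min_distance_m: float = 100.0) -> List[PostcodeChange]:
    """Drop postcode changes that look like Royal Mail reclassifications."""
    # How many pupils share each identical (old postcode, new postcode) change?
    pair_counts = Counter((c.old_postcode, c.new_postcode) for c in changes)
    kept = []
    for c in changes:
        others = pair_counts[(c.old_postcode, c.new_postcode)] - 1
        if others >= min_others:
            continue  # at least 8 other pupils made exactly the same change
        distance = hypot(c.new_xy[0] - c.old_xy[0], c.new_xy[1] - c.old_xy[1])
        if distance < min_distance_m:
            continue  # a "move" of under 100 m
        kept.append(c)
    return kept


if __name__ == "__main__":
    # Nine pupils share one change, one pupil "moves" 50 m, one moves 2 km.
    changes = [PostcodeChange(f"p{i}", "AA1 1AA", "AA1 1AB", (0.0, 0.0), (500.0, 0.0))
               for i in range(9)]
    changes.append(PostcodeChange("q1", "BB1 1BB", "BB1 1BC", (0.0, 0.0), (30.0, 40.0)))
    changes.append(PostcodeChange("r1", "CC1 1CC", "DD2 2DD", (0.0, 0.0), (0.0, 2000.0)))
    print([c.pupil_id for c in likely_house_moves(changes)])  # ['r1']
```

The input is assumed to contain only pupils whose postcode actually changed; pupils with the same postcode in both years would be filtered out beforehand.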

Cleaning the variables

Pupil postcode

Matching the home postcode in the School Census to the NSPL can determine whether the postcode recorded each year in the School Census is valid and not missing.

Stata code to change string length:
The postcode in NPD is a string of variable length (6, 7 or 8 characters). Merging it with databases such as the NSPL requires a fixed-length string of 7 or 8 characters. The code below converts each postcode to an 8-character string by padding the first half of the postcode to four characters.

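// Loop over every School Census sweep; capture skips sweeps whose
// postcode variable is absent from the dataset.
foreach i in 02 03 04 05 aut06 aut07 aut08 aut09 aut10 ///
    spr06 spr07 spr08 spr09 spr10 sum06 sum07 sum08 sum09 sum10 {
    // Text before the first space (first half) and after the last space (second half)
    capture egen temp1 = ends(postcode_`i'), punct(" ") head trim
    capture egen temp2 = ends(postcode_`i'), punct(" ") last trim
    capture gen temp3 = length(temp1)
    // Pad the first half to 4 characters: one space if it has 3, two if it has 2
    capture replace temp1 = temp1 + " " if temp3 == 3 | temp3 == 2
    capture replace temp1 = temp1 + " " if temp3 == 2
    // 4 characters + space + 3 characters = 8 characters
    capture gen postcodefixed_`i' = temp1 + " " + temp2 if temp2 != ""
    capture label var postcodefixed_`i' "Pupil postcode in `i'"
    capture drop temp*
}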
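
For readers working outside Stata, the same fixed-width formatting can be sketched in Python. Instead of splitting on the space, this sketch relies on the second half of a UK postcode always being exactly three characters, so it also handles postcodes entered without a space. The function names are hypothetical, and the 7-character variant is one common layout; check the format of the lookup file you are merging against.

```python
from typing import Optional


def to_eight_char(postcode: str) -> Optional[str]:
    """Format as 8 characters: first half padded to 4, a space, then the last 3.

    Returns None for values too short or too long to be a full UK postcode.
    """
    compact = "".join(postcode.split()).upper()  # remove all whitespace
    if not 5 <= len(compact) <= 7:
        return None  # a full postcode has 5 to 7 characters excluding spaces
    first_half, second_half = compact[:-3], compact[-3:]
    return first_half.ljust(4) + " " + second_half


def to_seven_char(postcode: str) -> Optional[str]:
    """7-character variant: first half padded to 4, no separating space."""
    eight = to_eight_char(postcode)
    return None if eight is None else eight[:4] + eight[5:]


if __name__ == "__main__":
    print(repr(to_eight_char("E5 0AA")))   # 'E5   0AA'
    print(repr(to_eight_char("bn2 1ab")))  # 'BN2  1AB'
    print(repr(to_seven_char("E5 0AA")))   # 'E5  0AA'
```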

SPSS code to remove spaces from the postcode:
Alternatively, the spaces can be removed from the postcode variable in both the School Census data and the NSPL before matching. Here "post" is the variable name in the School Census and "postcode" is the variable name in the NSPL, each with a year suffix (for example, _02). The NSPL also includes higher geographic references that can be matched to the School Census, including the Output Area (OA).

COMPUTE post_02 = REPLACE (post_02," ","").
COMPUTE postcode_02 = REPLACE (postcode_02," ","").
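
A Python sketch of the same space-free matching, with a dictionary standing in for the NSPL. The OA codes and function names are hypothetical. Unlike the SPSS code above, it also upper-cases both sides, so that postcodes typed in lower case still match.

```python
from typing import Dict, Optional


def match_key(postcode: str) -> str:
    """Space-free, upper-case matching key (spaces removed as in the SPSS REPLACE)."""
    return postcode.replace(" ", "").upper()


def match_to_lookup(census: Dict[str, str],
                    lookup: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Attach a lookup value (e.g. an OA code) to each pupil, or None if unmatched.

    census maps pupil ID -> recorded postcode; lookup maps postcode -> OA code.
    """
    keyed = {match_key(pc): oa for pc, oa in lookup.items()}
    return {pupil: keyed.get(match_key(pc)) for pupil, pc in census.items()}


if __name__ == "__main__":
    lookup = {"E5  0AA": "OA_1", "BN2 1AB": "OA_2"}  # hypothetical lookup rows
    census = {"p1": "E5 0AA", "p2": "bn21ab", "p3": "ZZ9 9ZZ"}
    print(match_to_lookup(census, lookup))  # {'p1': 'OA_1', 'p2': 'OA_2', 'p3': None}
```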

It is also possible to separate the postcode into its "outward" code (the first half) and its "inward" code (the second half). The outward code consists of 1 or 2 letters followed by 1 or 2 numbers (e.g. E5, BN2, TN22); a few central London districts add a final letter (e.g. EC1A). The inward code consists of 1 number followed by 2 letters. After separation, the outward and inward codes can be concatenated to give a postcode without a space.

Postcode cleaning:
Common errors in entering postcodes include typing a letter O instead of a zero, or a letter I instead of a 1, at the end of the outward code. These can be corrected once the outward code has been isolated.
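
The separation and correction steps can be sketched in Python. Because every inward code is exactly three characters (a digit followed by two letters), the split does not depend on a space being present. Under Royal Mail's format rules, O and I never appear as the last character of a valid outward code, which is why the swap is applied only there. The function names and the TYPO_FIXES table are illustrative assumptions, not part of any NPD or ONS tool.

```python
from typing import Optional, Tuple

# Letters typed in place of the digits 0 and 1 (hypothetical correction table).
# Neither O nor I can end a valid outward code, so the swap is safe there.
TYPO_FIXES = {"O": "0", "I": "1"}


def split_postcode(postcode: str) -> Optional[Tuple[str, str]]:
    """Split into (outward, inward); the inward code is always the last 3 characters."""
    compact = "".join(postcode.split()).upper()
    if not 5 <= len(compact) <= 7:
        return None
    return compact[:-3], compact[-3:]


def fix_outward(outward: str) -> str:
    """Replace a final letter O with zero, or a final letter I with one."""
    if outward and outward[-1] in TYPO_FIXES:
        return outward[:-1] + TYPO_FIXES[outward[-1]]
    return outward


def clean_postcode(postcode: str) -> Optional[str]:
    """Return the corrected postcode without a space, or None if malformed."""
    parts = split_postcode(postcode)
    if parts is None:
        return None
    outward, inward = parts
    return fix_outward(outward) + inward


if __name__ == "__main__":
    print(clean_postcode("e1o 7aa"))  # E107AA
    print(clean_postcode("BNI 1AB"))  # BN11AB
```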

To ascertain whether the postcode is valid:
After the School Census has been matched to the NSPL, a variable can be derived that records whether the home postcode in the School Census is valid, missing, or invalid (recorded but not found in the NSPL).

* 0 = valid, 1 = missing, 2 = invalid (recorded but not found in the NSPL).
DO IF post_02 = ''.
COMPUTE postcodestatus_02 = 1.
ELSE IF post_02 ~= '' AND postcode_02 = ''.
COMPUTE postcodestatus_02 = 2.
ELSE IF postcode_02 ~= ''.
COMPUTE postcodestatus_02 = 0.
END IF.
VALUE LABELS postcodestatus_02 0 "valid" 1 "missing" 2 "invalid".
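
The same three-way classification, together with the kind of percentage summary reported by Marquis and Jivraj (2009), can be sketched in Python. The function names are hypothetical; "matched" stands for the NSPL postcode attached after merging, which is empty or None when there was no match.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

VALID, MISSING, INVALID = 0, 1, 2
LABELS = {VALID: "valid", MISSING: "missing", INVALID: "invalid"}


def postcode_status(recorded: Optional[str], matched: Optional[str]) -> int:
    """0 = valid, 1 = missing (nothing recorded), 2 = invalid (recorded but unmatched)."""
    if not recorded or not recorded.strip():
        return MISSING
    if not matched or not matched.strip():
        return INVALID
    return VALID


def status_summary(statuses: List[int]) -> Tuple[Dict[str, float], int]:
    """Percentage of records in each status, plus the total number of records."""
    counts = Counter(statuses)
    total = len(statuses)
    if total == 0:
        return {label: 0.0 for label in LABELS.values()}, 0
    percentages = {LABELS[code]: 100.0 * counts[code] / total for code in LABELS}
    return percentages, total


if __name__ == "__main__":
    pairs = [("E5 0AA", "E5 0AA"), ("", None), ("ZZ9 9ZZ", None), ("BN2 1AB", "BN2 1AB")]
    statuses = [postcode_status(r, m) for r, m in pairs]
    print(statuses)                  # [0, 1, 2, 0]
    print(status_summary(statuses))  # ({'valid': 50.0, 'missing': 25.0, 'invalid': 25.0}, 4)
```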

Using this code, Marquis and Jivraj (2009) find that:

Valid postcode (%) | Missing postcode (%) | Invalid postcode (%) | Total records

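Higher-level geographic codes (local authority district and ward) can be derived from the OA code matched from the NSPL. In 2001 Census OA codes, the first four characters give the district and the first six give the ward: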
STRING distcode_02 (a4).
COMPUTE distcode_02 = substr (oa_02,1, 4).
STRING wardcode_02 (a6).
COMPUTE wardcode_02 = substr (oa_02,1,6).

Once the School Census has been merged across years, the geographic codes can be combined with the postcode status variable to determine whether a pupil has changed their home postcode, and over what geographical scale:
DO IF (postcodestatus_02 > 0 AND postcodestatus_03 > 0).
COMPUTE movetype0203 = 5.
ELSE IF (postcodestatus_02 > 0).
COMPUTE movetype0203 = 6.
ELSE IF (postcodestatus_03 > 0).
COMPUTE movetype0203 = 7.
ELSE IF (postcode_03 = postcode_02).
COMPUTE movetype0203 = 1.
ELSE IF (wardcode_03 = wardcode_02).
COMPUTE movetype0203 = 2.
ELSE IF (distcode_03 = distcode_02).
COMPUTE movetype0203 = 3.
ELSE.
COMPUTE movetype0203 = 4.
END IF.
VALUE LABELS movetype0203 1 'no move' 2 'move within ward' 3 'move within district'
4 'move between districts' 5 'missing or invalid postcode both years'
6 'missing or invalid postcode 1st year' 7 'missing or invalid postcode 2nd year'.
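
The classification can be sketched in Python as a plain function that returns labels rather than numeric codes. It assumes postcode statuses coded as in the validity step (0 = valid) and 2001-style OA codes, as in the SPSS substr code above; the function name and the example codes are hypothetical. Missing or invalid postcodes are checked first, so a pupil is only classified as a mover when both years' postcodes are valid.

```python
def move_type(status_1: int, status_2: int,
              postcode_1: str, postcode_2: str,
              oa_1: str, oa_2: str) -> str:
    """Classify the change in a pupil's home between two censuses.

    status_* are postcode statuses (0 = valid); oa_* are 2001 OA codes, whose
    first 6 characters give the ward and first 4 the local authority district.
    """
    if status_1 > 0 and status_2 > 0:
        return "missing or invalid postcode both years"
    if status_1 > 0:
        return "missing or invalid postcode 1st year"
    if status_2 > 0:
        return "missing or invalid postcode 2nd year"
    if postcode_1 == postcode_2:
        return "no move"
    if oa_1[:6] == oa_2[:6]:
        return "move within ward"
    if oa_1[:4] == oa_2[:4]:
        return "move within district"
    return "move between districts"


if __name__ == "__main__":
    # Same district (00AA) but different wards (00AAFA vs 00AAFB).
    print(move_type(0, 0, "E5 0AA", "E5 0AB", "00AAFA0001", "00AAFB0003"))
    # move within district
```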

Using the eastings and northings matched from the NSPD, the straight-line distance (in metres) moved by each pupil between years can be measured:
COMPUTE movedistance_0203=SQRT(((easting_02-easting_03) * (easting_02-easting_03)) +
((northing_02-northing_03) * (northing_02-northing_03))).
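
A Python equivalent of this calculation, assuming British National Grid eastings and northings in metres, so the result is a straight-line distance in metres rather than a road or travel distance. The function name and the coordinates in the usage line are hypothetical.

```python
from math import hypot
from typing import Optional, Tuple

Coordinate = Tuple[float, float]  # (easting, northing) in metres


def move_distance_m(first: Optional[Coordinate],
                    second: Optional[Coordinate]) -> Optional[float]:
    """Straight-line distance in metres between two postcode centroids.

    Returns None when either year's coordinates are missing.
    """
    if first is None or second is None:
        return None
    (e1, n1), (e2, n2) = first, second
    # hypot computes sqrt(dx**2 + dy**2), the same formula as the SPSS COMPUTE.
    return hypot(e1 - e2, n1 - n2)


if __name__ == "__main__":
    print(move_distance_m((530000.0, 180000.0), (533000.0, 184000.0)))  # 5000.0
```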

Description of values across cohorts

For an example cohort of 555,280 pupils in year 7 in 2008:
  • just four pupils have an entirely missing postcode field
  • 1,559 have a postcode that DfE failed to match into a lower SOA, suggesting that the postcode itself may be mis-coded
  • the pupils live in 363,384 unique postcodes
  • there is an average of 17 pupils in this cohort per lower SOA.
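
The cohort figures above imply the following ratios, treating all 555,280 pupils as resident in England's 32,482 lower SOAs:

```latex
\begin{aligned}
\text{pupils per lower SOA} &= \frac{555{,}280}{32{,}482} \approx 17.1\\
\text{share not matched to a lower SOA} &= \frac{1{,}559}{555{,}280} \approx 0.28\%\\
\text{pupils per unique postcode} &= \frac{555{,}280}{363{,}384} \approx 1.53
\end{aligned}
```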

If the same example cohort of 555,280 pupils in year 7 in 2008 is matched back to all available years of NPD, the table below shows, for each earlier year, the proportion of pupils with no matched UPN, with the same postcode, or with a different postcode:

Year group (census year) | No matched UPN | Same postcode | Different postcode
Year 6 (2007) | | |
Year 5 (2006) | | |
Year 4 (2005) | | |
Year 3 (2004) | | |
Year 2 (2003) | | |
Year 1 (2002) | | |

Data Sources

The complete set of past and present postcodes in England, with their northings and eastings, is available at

References

Allen, R. (2007) Allocating pupils to their nearest school: the consequences for ability and social stratification, Urban Studies, 44(4), 751-770.

Allen, R., Burgess, S. and Key, T. (2010) Choosing secondary schools by moving house: school quality and the formation of neighbourhoods, CMPO working paper No. 10/238.

Allen, R. and West, A. (2010) Why do faith secondary schools have advantaged intakes? The relative importance of neighbourhood characteristics, social background and religious identification amongst parents, British Educational Research Journal, forthcoming.

Burgess, S., McConnell, B., Propper, C. and Wilson, D. (2004) Sorting and choice in English secondary schools, CMPO working paper No. 04/111.

Harland, K. and Stillwell, J. (2007) Commuting to School in Leeds: How useful is the PLASC?, School of Geography, University of Leeds, Working Paper 07/2.

Harland, K. and Stillwell, J. (2007) Using PLASC data to identify patterns of commuting to school, residential migration & movement between schools in Leeds, School of Geography, University of Leeds, Working Paper 07/3.

Jivraj, S. and Marquis, N. (2009) A comparison of internal migration data derived from the Pupil Level Annual School Census with the National Health Service Central Register and 2001 Census data, Centre for Census and Survey Research, The University of Manchester, Working Paper 2009-04.

Marquis, N. and Jivraj, S. (2009) Preparation of Pupil Level Annual School Census data for the analysis of internal migration, Centre for Census and Survey Research, The University of Manchester, Working Paper 2009-03.

Singleton, A.D., Longley, P.A., Allen, R. and O’Brien, O. (2010) Estimating secondary school catchment areas and the spatial equity of access, Computers, Environment and Urban Systems, forthcoming.

Watson, J. and Church, A. (2009) The social effects of travel to learn patterns: a case study of 16-19 year olds in London, Local Economy, 24(5), 389-414.