Why predicting attainment fails – Confidence Intervals
How accurately can we set targets for, and predict, pupils' exam attainment?
At this time of year the pressure is on teachers and school leaders to know exactly how their young people will perform in the GCSE exams they are currently sitting.
There is an expectation from leaders, governors and of course Ofsted to accurately track, monitor and predict pupil progress and hence exam performance.
Our school, like every other in the country, sets targets for individual pupils at the start of the year and then asks teachers to measure each pupil's progress against this target grade and to predict their final exam grade. These predictions are collated, and a range of performance measures for classes, subjects and the whole school are calculated.
But how accurate are these figures, and how confident can we be in these predictions and targets? Our school has 155 young people in Year 11, and some subjects have as few as 15 learners. One or two students having a bad day in the exam, or realising too late that they picked a subject they really don't enjoy, can have a large impact on the results: in a class of 15, a single student's grade moves the headline percentage by almost 7 points. How much should we take this into account when setting targets and holding teachers and middle leaders to account?
This train of thought has led me to look at confidence intervals. I've used the binomial proportion confidence interval to calculate a 95% confidence interval on the proportion of students predicted to achieve a C+ in each subject, which takes the size of the class into account.
I’ve ended up with results such as: English C+ = 79% +/- 6%. So we’re 95% sure they will end up with results between 73% and 85%. Computing, with a smaller number of students, comes in at 70% +/- 16%.
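For anyone who wants to check the arithmetic, here's a minimal sketch in Python of the normal-approximation (Wald) interval I used. The English figure covers the whole cohort of 155; the Computing class size of 31 is my own back-calculation from the ±16% margin, not a number quoted above.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """95% confidence interval for a binomial proportion
    (normal approximation / Wald interval).

    p_hat: predicted proportion achieving a C+ (e.g. 0.79)
    n:     number of students entered for the subject
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# English is the whole Year 11 cohort of 155; the Computing class
# size of 31 is an assumption back-calculated from the +/- 16% margin.
for subject, p_hat, n in [("English", 0.79, 155), ("Computing", 0.70, 31)]:
    lo, hi = wald_interval(p_hat, n)
    print(f"{subject}: {p_hat:.0%} +/- {(hi - lo) / 2:.0%} "
          f"(95% CI {lo:.0%} to {hi:.0%})")
```

Worth noting that the Wald interval misbehaves for small classes and for proportions near 0% or 100%; the Wilson score interval (available via statsmodels' proportion_confint with method='wilson') is usually recommended instead.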
Is this an appropriate statistical measure to use?
The confidence intervals are quite large due to the small number of students involved. I'm not sure if this is appropriate, though. This measure is normally used when sampling a small part of a larger population. Is that what we are doing here, in that our Computing class is a small sample of the national group sitting that examination? Or are we actually measuring the whole population, where the population is simply all the students studying the subject at our school?
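One way to formalise that question is the finite population correction: if a class of n students is treated as a sample drawn from a finite population of N, the standard error shrinks by a factor of sqrt((N − n)/(N − 1)). A short sketch, using an assumed national Computing entry of 70,000 and my assumed class size of 31:

```python
import math

def fpc(n, N):
    """Finite population correction factor applied to the standard error
    when sampling n students without replacement from a population of N."""
    return math.sqrt((N - n) / (N - 1))

print(fpc(31, 70_000))  # ~1.0: our class as a sample of the national entry
print(fpc(31, 31))      # 0.0: our class IS the population -- no sampling error
```

The formula makes the dilemma concrete: against a national entry the correction is negligible and the binomial interval stands, but if our own students are the whole population the interval collapses to zero, and the uncertainty we care about is no longer sampling error at all.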
If this is not the right measure of confidence to use in this case, then what is?
Lots of questions that I'm hoping some of the data / statistics community can help answer. The stakes are high in schools now, and the targets that departments and schools are set carry high levels of accountability. It's only fair that targets and predictions come with a suitable statistical window attached. How should that be calculated?