Teaching | Conceptual foundations: sharing the findings from the implementation of Gateway questions

This is the final post in a series on Gateway questions, a method of formative assessment that seeks to anticipate knowledge gaps through a sequence of multiple choice questions focused on a particular curricular concept. The first post gave an overview of the method (which can be found here), and the second considered the implications of such an approach, as well as the problematic nature of assessing knowledge in a subject discipline such as English (here). This post explores the findings of the study and the scope for further investigation.

Could Gateway questions act to assess the learning experience of a conceptual curriculum?

Whilst Wiliam (2015) asserted several key principles for forming multiple choice questions, this study sought to establish principles for a different type of assessment question, entitled ‘Gateway questions’, and to ascertain whether they would be more fitting for the nature and construct of particular curricular subjects. When looking for formative assessment to use at classroom level in English, we might seek to measure plausibility of response as something of a spectrum, as opposed to a single definitive correct response. The study also sought to highlight the limitations of assessment data in schools, where assessment can often fail to align with the nature of the subject discipline. The concepts assessed regularly feature within statutory examination materials, and so this approach could be argued to move away from more superficial, disjointed assessment and towards a methodology that fits in with the day to day operation of school (Allen, 2018).

I found more and more that the process of design was fundamental, because of the nature of English as a subject and the need to remain authentic to it throughout. Bringing expert teachers together to debate exactly what would be deemed the key moments of a text, and where our own experiences of student knowledge gaps or misconceptions differed, was vital in agreeing a common language. If I were to scale up such a study for use at departmental level, I would emphasise drawing upon teacher expertise at the pre-design stage to establish such key moments as an important initial step.

Data findings included:

  • Feedback summaries from the expert teacher discussion
  • Data and pattern identification of one sample quiz of the five quizzes completed
  • Data from the student perception survey

I compiled a process map informed by the key findings of Bartlett (1932) and Sweller (1988), drawing upon the importance of an established schema and on cognitive load as a key consideration when committing new information to memory. I then used this as a blueprint to break down the preparation into a series of steps for my implementation, with thanks to David Goodwin:

I compiled what I believed to be the common knowledge gaps that students exhibit when studying Priestley’s An Inspector Calls but, to ensure that the assessment would be reliable, consulted a wider network of English teachers to establish a shared list.

This prompted me to determine what a knowledge gap in fact was, and to recognise that students were exposing knowledge gaps as opposed to misconceptions. Because of the complexity of the particular concepts that I wanted to explore (gender, social class), the difficulty did not always present as a distortion in prior learning, but as an unknown which perhaps acts as a fundamental component required for an improved conceptual understanding.

I then mapped the conceptual moments chronologically across the play, marking each point in the text that students commonly misunderstand as a result of untaught knowledge of the particular concept:

Partial conceptual map; chronological plot at the bottom, with key moments plotted out that expose the thematic ideas of the text.

These moments were defined as key character behaviours or interactions which a teacher would deem relevant to that particular concept. Approaching the study through the lens of a project premortem (Klein, 2007), I focused on the gender conceptual thread to ensure a narrow focus and a small scale study for data collation. Whilst this limited the volume of data collated and analysed, arguably it also increased the chance of establishing a set of principles for the design of the questions, and made the analysis of patterns in the data a more iterative process.

The next step was to present the quiz designs to a group of expert teachers, as a way of ascertaining how effectively I had identified anticipated knowledge gaps for students, and of establishing how plausible the responses were.

This discussion highlighted the nuance of social class when teaching literature, and how the teacher’s perception of class as a construct would influence the teaching of such a unit. It also highlighted the impact of plausible distractors: even an experienced teacher could respond incorrectly as a consequence of the vocabulary choices made. This discussion led me to establish a series of principles for Gateway questions. Building upon the work of Brame (2013), I outlined that Gateway questions should seek to:

  • Expose a knowledge gap in connection to a particular concept explored within the text;
  • Use sentence stems which, wherever possible, utilise quotations from the text to objectively frame plausible interpretation;
  • Pose four responses that sit along a spectrum of plausibility: one definitive (most plausible) response, with the others arguably plausible depending upon the student’s understanding of the concept as a whole;
  • Provide, where constructed as part of a sequence across a text or unit, ‘Gateways’ of knowledge, so that a student who answers an earlier question correctly is more likely to answer a later question correctly;
  • Provide the teacher with a reteach moment, leading to more informed, responsive teaching.
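For anyone building these quizzes in a digital tool, the principles above could be captured in a small data structure. The following is only a sketch in Python: the class names, field names and placeholder text are all assumptions made for illustration, not part of the study or of any quiz platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One of the four responses offered to the student."""
    text: str
    most_plausible: bool = False  # True only for the definitive response


@dataclass(frozen=True)
class GatewayQuestion:
    """Hypothetical container reflecting the design principles listed above."""
    concept: str      # the concept whose knowledge gap is being exposed
    stem: str         # sentence stem, ideally built around a quotation
    responses: tuple  # four Response objects along a spectrum of plausibility
    reteach: str      # explanation shown or taught after a wrong answer

    def __post_init__(self):
        # Principle: four responses, exactly one of them definitive.
        if len(self.responses) != 4:
            raise ValueError("a Gateway question poses four responses")
        if sum(r.most_plausible for r in self.responses) != 1:
            raise ValueError("exactly one response should be most plausible")

    def needs_reteach(self, chosen_index):
        """Return True when the chosen response is not the definitive one."""
        return not self.responses[chosen_index].most_plausible


# Placeholder content only: no real quiz item from the study is reproduced.
question = GatewayQuestion(
    concept="gender",
    stem="<sentence stem built around a quotation from the play>",
    responses=(
        Response("<most plausible interpretation>", most_plausible=True),
        Response("<arguably plausible interpretation 1>"),
        Response("<arguably plausible interpretation 2>"),
        Response("<arguably plausible interpretation 3>"),
    ),
    reteach="<short reteach explanation for this key moment>",
)

print(question.needs_reteach(0))  # the definitive response was chosen
print(question.needs_reteach(2))  # an arguably plausible response was chosen
```

Storing the reteach note alongside each question keeps the feedback tied to the key moment it addresses, which is the link the final principle depends upon.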

Data has been collated from one of the five quizzes that explored the concept of gender within the play, which was a twelve question quiz. I segmented the quiz into three components of four questions each (questions 1-4, 5-8 and 9-12) to better measure the ‘Gateway’ nature of the quiz. A student was scored ‘high’ if they answered 3 or more of questions 1-4 correctly.

Name         1-4   5-8   9-12   High 1-4 response
Student a    1     3     3      LOW
Student b    1     3     3      LOW
Student c    1     3     1      LOW
Student d    1     4     2      LOW
Student e    2     3     3      LOW
Student f    3     4     3      HIGH

Formative assessment: first sitting
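The banding rule described above (3 or more of questions 1-4 correct counts as ‘high’) can be checked against the first sitting with a few lines of Python. This is a minimal sketch: the scores are those reported in the first sitting table, while the variable and function names are my own assumptions.

```python
# First-sitting component scores as reported above:
# (questions 1-4, questions 5-8, questions 9-12), each out of 4.
first_sitting = {
    "a": (1, 3, 3),
    "b": (1, 3, 3),
    "c": (1, 3, 1),
    "d": (1, 4, 2),
    "e": (2, 3, 3),
    "f": (3, 4, 3),
}

HIGH_THRESHOLD = 3  # 'high' means 3 or more of questions 1-4 correct


def early_band(scores):
    """Label a student HIGH or LOW from their questions 1-4 score."""
    return "HIGH" if scores[0] >= HIGH_THRESHOLD else "LOW"


bands = {student: early_band(scores) for student, scores in first_sitting.items()}
totals = {student: sum(scores) for student, scores in first_sitting.items()}

for student in first_sitting:
    print(f"Student {student}: early band {bands[student]}, total {totals[student]}/12")
```

Only Student f reaches the ‘high’ band in this sitting, which matches the final column of the table.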

All students completed the quiz in at least two sittings, at least two weeks apart, providing an aggregated score against which to compare improvement. 100% of participants either maintained or improved their score in the second sitting. Where students only maintained their score, these were the lowest scoring students of the dataset. The improvement could have been a result of having completed the test before and then utilising the reteach, or of students accessing revision materials or returning to their notes. Students were not informed that they would sit the assessment again at a later date, to help mitigate against manipulation of scores.

Name         1-4   5-8   9-12   Improved overall score   High 1-4 response
Student a    4     4     2      HIGHER                   HIGH
Student b    3     2     4      HIGHER                   HIGH
Student e    4     4     3      HIGHER                   HIGH
Student f    4     4     4      HIGHER                   HIGH

In addition, 100% of students demonstrated that if they scored positively on the earlier Gateway questions (‘earlier’ defined as the first component of three, questions 1-4), there was an increased probability that they would respond correctly to later questions. This inference was drawn from test 1, where students did not accumulate a score: their earlier (questions 1-4) score was low, and their later scores did not increase. For instance, 100% of students scored 75% or less on the earlier component in the first sitting, with 83% scoring 50% or less. In comparison, in test 2, all students scored higher on the earlier component (questions 1-4), and all maintained or increased their accumulated score: students that scored higher in the earlier section scored a higher total overall. Whilst this is a small sample size, there are several inferences that can possibly be drawn from the data.
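The percentages quoted for the earlier component can be reproduced from the two tables. The sketch below uses Python with the questions 1-4 scores as reported for each sitting; the helper name and layout are assumptions of mine, and only the four students listed in the second sitting table are compared across sittings.

```python
# Questions 1-4 scores (out of 4) as reported for each sitting.
first_early = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 3}
second_early = {"a": 4, "b": 3, "e": 4, "f": 4}  # students listed for sitting two


def share_at_or_below(scores, fraction, out_of=4):
    """Percentage of students whose score is at or below the given fraction."""
    hits = [s for s in scores.values() if s / out_of <= fraction]
    return round(100 * len(hits) / len(scores))


pct_75_or_less = share_at_or_below(first_early, 0.75)
pct_50_or_less = share_at_or_below(first_early, 0.50)
print(pct_75_or_less)  # → 100
print(pct_50_or_less)  # → 83

# Did every student listed for the second sitting raise their 1-4 score?
all_improved_early = all(second_early[s] > first_early[s] for s in second_early)
print(all_improved_early)  # → True
```

With six students, one student accounts for roughly 17 percentage points, which is worth bearing in mind when reading figures such as 83% or 100%.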

We might infer from this dataset that using Gateway questions can aid student understanding of the key moments that they need to grasp in order to access later parts of a text or unit. The increased scores from one sitting to the next may indicate that the reteach episode provided students with the information they needed to improve their score when retaking the assessment. In addition, a higher overall score seemed to be associated with the student’s earlier score: the better they scored in the earlier section of the test, the higher they scored overall.

In the perception evaluation, students were asked a series of questions using an NPS-style rating, in a bid to reduce passive responses. Students were provided with a 1-10 rating scale, with positive classified as 7 or higher, and detractor classified as 4 or lower. 100% of students rated 4 or less in response to the statement, ‘the quizzes were challenging,’ yet only one student scored 100% in the quiz itself. Only 20% responded positively to the statement, ‘The An Inspector Calls quizzes helped to improve my understanding of gender as a theme,’ yet 80% responded positively to the statement ‘quizzes like this would aid my understanding of other texts or units studied in English.’ 60% gave a positive score to the statement, ‘The description given when I got a wrong answer helped me when completing the quiz again at a later date.’
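The rating bands described above can be expressed as a short classification rule. This Python sketch uses the cut-offs stated in this post (7 or higher positive, 4 or lower detractor), which differ from the conventional NPS bands. The ‘neutral’ label for ratings of 5 and 6 is my own assumption, and the example ratings are hypothetical rather than the survey data.

```python
def classify(rating):
    """Band a 1-10 rating using the cut-offs described in this post."""
    if not 1 <= rating <= 10:
        raise ValueError("ratings run from 1 to 10")
    if rating >= 7:
        return "positive"
    if rating <= 4:
        return "detractor"
    return "neutral"  # assumed label for ratings of 5 or 6


def percent_positive(ratings):
    """Percentage of ratings that fall in the positive band."""
    positives = sum(classify(r) == "positive" for r in ratings)
    return round(100 * positives / len(ratings))


# Hypothetical ratings for one survey statement, for illustration only.
example_ratings = [9, 7, 5, 3]
print([classify(r) for r in example_ratings])
print(percent_positive(example_ratings))  # → 50
```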

We can draw several considerations from the perception data. However, it is worth noting that the student completion rate was low, and that the survey was completed only by those who actively engaged with the quiz itself. This might mean that lower scoring students would have provided differing feedback, which would in turn alter these findings. Using the data available, it appears that whilst mapping conceptually might aid a teacher, it might be too cognitively demanding for students to view as beneficial, as threshold concepts are incredibly complex (Meyer and Land, 2005). Additionally, I would assume that the reteach narrative was a helpful principle within the design of Gateway questions, aiding students when looking back on their scores so that they deepen not only their understanding of the text but also their assessment literacy surrounding it, through the development of evaluative expertise (Carless, 2015).

Gateway questions might be effective as a holistic strategy within the classroom, to help the teacher to make predictions of where students will encounter knowledge gaps, but also to help in providing students with valuable formative feedback through a re-teach episode, therefore improving student outcomes when an assessment is revisited in the future.

This exploration highlighted how little literature is available to English teachers in regards to common knowledge gaps. This is perhaps due to the nature of the subject: it relies upon historical contextualisation, knowledge of social constructs over time and, in this instance, a grasp of the nuances of gender inequality itself, in addition to the process of literary interpretation being, on the whole, rather subjective. However, Gateway questions possibly act to crystallise a selection of key moments within a text, helping teachers to draw student attention to particular concepts, and to determine what they spend more or less time on in the classroom, teaching in a more iterative way than perhaps before. Perhaps building foundations from such moments enables a more distilled demonstration of the somewhat perplexing nature of analysis and interpretation for students.

Ultimately, this affirms the importance of expert subject specialists to make such choices in what should be taught; a secure subject-level curriculum where such concepts will be revisited and recognised as familiar by students; and scope to design assessment that is perhaps more fitting to the slightly more subjective aspects of the subject itself. 

Key takeaways:

  • Gateway questions can aid students in addressing conceptual knowledge gaps across a text or unit;
  • A spectrum of plausibility may be a more effective way of assessing student knowledge; 
  • Formatively assessing students with an explicit reference to overarching concepts may yield benefits in the longer term study of our curriculum.


Didau, D. & Rose, N. (2016) What Every Teacher Needs to Know about Psychology. John Catt. 

Atkinson, R. C., & Shiffrin, R. M. (1968). Chapter: Human memory: A proposed system and its control processes. In Spence, K. W., & Spence, J. T. The psychology of learning and motivation (Volume 2). New York: Academic Press. pp. 89–195.

Soderstrom, N. C., & Bjork, R. A. (2015). Learning versus performance: An integrative review. Perspectives on Psychological Science, 10(2), 176-199.

Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11(24), 1–8.

Van der Vinne, V., Zerbini, G., Siersema, A., Pieper, A., Merrow, M., Hut, R. A., … & Kantermann, T. (2015). Timing of examinations affects school performance differently in early and late chronotypes. Journal of biological rhythms, 30(1), 53-60.

Allen, R. (2018) What if we cannot measure pupil progress? Accessed at https://rebeccaallen.co.uk/2018/05/23/what-if-we-cannot-measure-pupil-progress/ on 28th December 2020.

Yeh, S. S. (2007). The Cost-Effectiveness of Five Policies for Improving Student Achievement. American Journal of Evaluation, 28(4), 416–436. https://doi.org/10.1177/1098214007307928

Blake, H., (2020) Three Hurdles Students Must Clear to Maximize Assessment’s Benefits, available at https://theeffortfuleducator.com/2020/05/20/hurdles/ 

Robertson, A. (6th October 2020) #045 – Tim Oates: Myth and misunderstanding of assessment in England. In The Centre for Education and Youth podcast. https://cfey.org/podcast/045-tim-oates-myth-and-misunderstanding-of-assessment-in-england/

Christodoulou, D. (2017) Making Good Progress? The Future of Assessment for Learning. Oxford University Press. 

Wiliam, D. (2011). Formative assessment: Definitions and relationships. Studies in Educational Evaluation, 37(1), 3-14.

Harlen, W. (2006). On the relationship between assessment for formative and summative purposes. Assessment and learning, 2, 95-110.

Hendrick, C., & Macpherson, R. (2017). What Does this Look Like in the Classroom?: Bridging the Gap Between Research and Practice. John Catt Educational Limited.

Wiliam, D. (2015). Designing great hinge questions. Educational Leadership, 73(1), 40-44. 

Christodoulou, D. (2016) Why  speaking at the Festival of Education, Wellington College, accessed at https://www.youtube.com/watch?v=qLpAalDaqQY  

Sweller, J. (2011). Cognitive load theory. In Psychology of Learning and Motivation (Volume 55). pp. 37–76.

McDermott, L. C. (1991). Millikan Lecture 1990: What we teach and what is learned—Closing the gap. American journal of physics, 59(4), 301-315.

Fletcher-Wood, H., (2019) Ensuring students respond to feedback: Responsive Teaching 2019 update (blog). Retrieved 29th February 2021 from https://improvingteaching.co.uk/2019/06/30/ensuring-students-respond-to-feedback-responsive-teaching-2019-update/ 

Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional science, 18(2), 119-144.

Smith, C. D., Worsfold, K., Davies, L., Fisher, R., & McPhail, R. (2013). Assessment literacy and student learning: the case for explicitly developing students ‘assessment literacy’. Assessment & Evaluation in Higher Education, 38(1), 44-60.

Bjork, E. L. and Bjork, R. A. (2011) ‘Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning’, in M. A. Gernsbacher, R. W. Pew, L. M. Hough and J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (2nd edition). (pp. 59-68). New York: Worth. 

Bartlett, F. C. (1932). Remembering. Cambridge: Cambridge University Press.

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.

Klein, G. (2007). Performing a project premortem. Harvard Business Review, 85(9), 18-19. 

Brame, C. (2013) Writing good multiple choice test questions. Retrieved from https://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/.

Carless, D. (2015). Excellence in university assessment: Learning from award-winning practice. Routledge.
