Peer discussions and response technology: short interventions, considerable gains
- Pages: 19-30
- DOI: https://doi.org/10.18261/issn.1891-943x-2017-01-02-03
- Published on Idunn: 2017-06-21
- Published: 2017-06-21
- Creative Commons (CC BY-NC 4.0)
Student response systems are commonly used in combination with peer discussions during lectures. Research has shown that the number of correct answers increases when the same question is re-asked after discussion. This may occur because unconfident students copy the answer from their peers. To preclude this, the authors added a second, similar question to answer individually, disguised as a new case. The authors found a Cohen’s d effect size of 0.66 (N: 147) for eight valid interventions, which is 65 percent above the average effect of interventions aimed at increasing student performance.
Keywords: Clickers, Lecture, Peer discussions, Student response system, Higher education
Several studies have found lecturing as a teaching method to be ineffective in promoting student learning compared to student-active ways of teaching (Deslauriers, Schelew, & Wieman, 2011; Hake, 1998; Hrepic, Zollman, & Rebello, 2007; Knight & Wood, 2005; Yoder & Hochevar, 2005). However, digital tools such as Student Response Systems (SRS) can be used to increase student activity in traditional lecture settings (Blasco-Arcas, Buil, Hernandez-Ortega, & Sese, 2013; C. Campbell & Monk, 2015; Graham, Tripp, Seawright, & Joeckel, 2007; Heaslip, Donovan, & Cullen, 2014). SRS enables students to answer multiple choice questions by using a wireless remote control, called “a clicker”. The students’ answers can then be displayed in a histogram on a big screen for the lecturer and students to see.
When using this technology for formative purposes, students are commonly asked to discuss with their peers and answer subject-related questions during lectures. The lecturer typically follows up on the student answers and provides them with his own explanations. Previous studies have found that such approaches can be useful for creating opportunities for formative assessment and self-monitoring (Egelandsdal & Krumsvik, 2015; Krumsvik & Ludvigsen, 2012; Ludvigsen, Krumsvik, & Furnes, 2015), increasing student attention (Cain, Black, & Rohr, 2009; Rush et al., 2010; Sun, 2014), promoting student engagement (Blasco-Arcas et al., 2013; C. Campbell & Monk, 2015; Graham et al., 2007; Heaslip et al., 2014) and enhancing student retention and performance (Boscardin & Penuel, 2012; J. Campbell & Mayer, 2009; Kay & LeSage, 2009; Keough, 2012; Lantz, 2010; Mayer et al., 2009; Nelson, Hartling, Campbell, & Oswald, 2012; Shaffer & Collura, 2009). However, some parts of such interventions might be more useful for these purposes than others.
As a backdrop for this study, we surveyed a group of first year psychology students on their perception of the feedback resulting from different parts of clicker interventions (N: 173). These students had participated in five two-hour lectures on Qualitative Methods (each lecture comprising 4–6 clicker interventions) during the spring semester of 2014. At the last lecture they were asked, using the response system, if any of the following parts of the intervention had contributed to improving their content understanding: a) the use of clicker questions and visualization of answers, b) the peer discussions or c) the lecturer’s follow up of the clicker questions. The students could choose between different combinations of these three alternatives or answer that no parts of the interventions had contributed to improving their content understanding. The student answers showed that 93 % of the students found some part of the clicker interventions useful for improving their content understanding. Seventy-three percent found the use of clicker questions and visualization of answers useful for improving their content understanding, while 84 % experienced the lecturer’s follow-up of the clicker questions as useful for this purpose. The least favored part was peer discussions which 61 % of the students experienced as useful for improving their content understanding. We found this curious since previous studies have indicated that the use of peer discussions in lectures promotes student learning (Crouch & Mazur, 2001; Mazur, 1997; Porter, Bailey Lee, Simon, & Zingaro, 2011; Rao & DiCarlo, 2000; E. L. Smith, Rice, Woolforde, & Lopez-Zang, 2012; M. K. Smith et al., 2009).
Hence, in light of these findings we investigated the effect of peer discussions on student learning by conducting a quasi-experiment as an integrated part of a three-hour lecture (3x45 minutes) on Qualitative Methods in the spring of 2015. This was done to answer the research question:
Do peer discussions in combination with clicker questions at a university lecture enhance students’ content understanding?
Studies on the testing effect (Roediger & Karpicke, 2006) have shown that the mere use of questions in instruction can enhance students’ retention, and the findings of J. Campbell and Mayer (2009) confirm this in two lab experiments combining lecturing with clicker questions. The same study also found increased student performance on a knowledge transfer test compared to a control group. Another study by Mayer et al. (2009) found that the use of clicker questions in lectures increased students’ exam performance by 1/3 of a grade compared to lectures using questions without clickers and lectures without questioning.
Although these studies show that the use of questioning in itself can have a positive effect on student learning, the inclusion of peer discussions might be even more beneficial. Several studies have shown that peer discussions during lectures increase student performance on conceptual questions when the same questions are put to a revote after discussion (Crouch & Mazur, 2001; Mazur, 1997; Rao & DiCarlo, 2000; E. L. Smith et al., 2012). However, such an improvement might occur simply because less knowledgeable students copy the answers from their peers without any learning taking place.
This issue was addressed by M. K. Smith et al. (2009) in a study carried out in an introductory genetics course. Using a quasi-experimental design, data were collected throughout the semester as an integrated part of 50-minute classes. In total, 16 interventions were conducted. In each intervention, the students were asked to respond individually to a conceptual question before discussing and re-answering the same question. The students were then asked a second isomorphic question and answered individually. According to M. K. Smith et al. (2009, p. 123), “Isomorphic questions have different ‘cover stories’, but require application of the same principles or concepts for solution”. In other words, the second question requires the same conceptual understanding to elicit a correct response, but it is disguised as a new case, thereby preventing the students from simply copying the answer from their peers. Their findings showed that the peer discussions on average increased the percentage of correct answers by 21 % on the second isomorphic question. Influenced by the research design of M. K. Smith et al. (2009), we used a similar design to study whether peer discussions enhance students’ content understanding in a qualitative methods lecture.
Research design and methods
During a three-hour lecture, we employed ten interventions in which the students were asked isomorphic question pairs covering the main themes of the subject. In each intervention, we first posed a question that the students answered individually using the “clickers” (Q1). Then the students discussed their answers for two minutes with their peers before re-answering the same question (Q1ad). Afterwards we posed a second question that required approximately the same understanding to answer correctly, but was disguised as a new case (Q2). The students answered this question individually in order to measure the difference in student answers pre- (Q1) and post- (Q1ad and Q2) peer discussion. The interventions were conducted at the end of each lesson: three at the end of the first lesson, four at the end of the second lesson and three at the end of the last lesson.
The questions addressed different practical cases (presented by text), requiring students to apply their content understanding to interpret and answer. The topics of the questions pertained to research design, research questions, research methods, validity, and reliability within the subject of qualitative methods. Three different kinds of questions were used in the question pairs, as illustrated in Table 1 (correct answers are highlighted). The first kind consisted of the same questions used on two different cases in Q1 and Q2 with the same answer alternatives (used in pairs 1, 4, 5, 7, 9 and 10). The second kind consisted of the same question in Q1 and Q2, but with different answer alternatives (used in pairs 2 and 3). The third kind consisted of the same question in Q1 and Q2 with different cases and answer alternatives (used in pairs 6 and 8). The questions were constructed for the quasi-experiment and had not been previously used. However, the question types were similar to the ones used in the previous qualitative methods lectures in 2014 and 2015.
We also surveyed the students about their attitude toward peer discussion at the beginning of the lecture, and about how they perceived the peer discussions. Answers were given on a five-point Likert scale.
Focus group interviews were conducted two days after the quasi-experiment in order to validate the findings. In these interviews, the students were provided with preliminary results and asked to comment on these, focusing particularly on anomalies.
The population studied comprised psychology students at the University of Bergen attending a three-hour lecture (including two 15-minute breaks) in Qualitative Methods as a part of a one-year introductory program. This was the last of five lectures given on the subject. The total number of students participating in the interventions declined slightly during the three-hour lecture: 203 students participated in the first intervention and 171 students participated in the last one. A total of 145 students participated in all ten interventions. Twenty-four of the participating students were interviewed in focus groups two days later (8 students per group).
Data from the quasi-experiment were analyzed in SPSS. The analysis used a paired sample T-test comparing the pre- (Q1) and post-question (Q2) for each question pair. We also created a sum index variable for all Q1- and Q2-questions and compared the means. In addition, we created a sum index variable for all variables except question pairs 1 and 8, which were excluded because they were not considered valid. Average percentages of correct answers per class were calculated from the sum index mean values for Q1, Q1ad and Q2 for the 8 valid question pairs. A Cohen’s d analysis was conducted on each question pair and the sum indexes to calculate effect sizes (see note 2).
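The two statistics named above can be sketched from summary values alone. The snippet below is a minimal illustration, not the SPSS procedure the authors ran. It assumes that the paired t statistic is the mean difference divided by its standard error, and that Cohen's d is the difference in means divided by the root mean square of the two standard deviations. The latter is an assumption about the calculator's formula, although it does reproduce the effect sizes reported in Table 2. The function names are hypothetical.

```python
import math


def paired_t(mean_diff: float, sd_diff: float, n: int) -> float:
    """t statistic for a paired-sample t-test, computed from summary values."""
    return mean_diff / (sd_diff / math.sqrt(n))


def cohens_d(mean_pre: float, sd_pre: float, mean_post: float, sd_post: float) -> float:
    """Cohen's d using the root mean square of the two SDs (assumed formula)."""
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_post - mean_pre) / pooled_sd


# Rounded summary values for the sum index of the 8 valid question pairs (Table 2).
d_valid = cohens_d(5.39, 1.45, 6.31, 1.33)
t_valid = paired_t(0.92, 1.56, 147)
print(f"d = {d_valid:.2f}, t = {t_valid:.2f}")
```

With the rounded table values, d comes out at about 0.66, matching the reported effect size. The t value lands slightly above the reported 7.12, which is within what rounding of the inputs can explain.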
The survey data were analyzed using descriptive statistics, and Pearson Correlation was used to consider the relationship between the survey variables and an index variable for the difference in student performance between the Q2- and Q1-questions for the 8 valid question pairs.
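For readers unfamiliar with the statistic, the Pearson correlation between a survey variable and the improvement index (Q2–Q1) can be sketched as follows. This is an illustration only: the function name is hypothetical, and the six data points are invented for the example and are not taken from the study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: Likert ratings (1-5) and Q2 - Q1 improvement scores
# for six made-up students. These are NOT values from the study.
ratings = [2, 3, 3, 4, 5, 5]
improvement = [0, 1, -1, 2, 1, 3]
r = pearson_r(ratings, improvement)
print(f"r = {r:.2f}")
```

The sketch divides by zero if either variable is constant. A real analysis would also report a p-value, which this example leaves out.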
Prior to data collection, the ten isomorphic question pairs were read by two fellow researchers and a research assistant (psychology student) and adjusted based on their feedback. In order to eliminate our own preferences, we randomly assigned the order of Q1 and Q2 in each question pair.
After the data were collected, question pairs 1 and 8 were deemed invalid because Q1 and Q2 were not as isomorphic as intended.
In pair 1, the case we used in Q1 resembled a case used in a previous lecture. Several of the students in the focus group claimed that they remembered the “correct” answer from this lecture, and therefore chose this alternative when answering Q1. However, this answer was incorrect in our case (and 46 % of the students chose it). We believe that this interference resulted in an extraordinarily large effect size (Cohen’s d: 1.41) because the case in Q2 did not cause the same confusion (86 % of the students answered correctly on Q2 as opposed to 29 % on Q1).
Question pair 8 was deemed invalid because alternative 1 in Q2 was considered too viable to be a clearly incorrect answer. This was illustrated both by a clear split in the students’ answers between alternatives 1 and 4 (the latter being the correct one) and by several of the focus group students who stated that they selected alternative 1 immediately without reading the other alternatives because it seemed so likely that it was the correct one.
The other question pairs showed no anomalies and were not considered problematic by the students in the focus groups. Thus, 8 question pairs were included in a valid index variable. However, for transparency, our result section presents both valid and non-valid questions. The non-valid question pairs and index are labeled in Table 2.
| Pair | Q1 M (SD) | Q2 M (SD) | Mean difference: Q2 – Q1 (SD) | T | Cohen’s d | N |
|---|---|---|---|---|---|---|
| 1 (not valid) | 0.29 (0.45) | 0.86 (0.35) | 0.57 (0.59) | 13.86*** | 1.41 | 203 |
| 2 | 0.58 (0.50) | 0.67 (0.47) | 0.10 (0.64) | 2.10* | 0.19 | 200 |
| 3 | 0.72 (0.45) | 0.88 (0.32) | 0.16 (0.48) | 4.66*** | 0.41 | 202 |
| 4 | 0.55 (0.50) | 0.75 (0.43) | 0.21 (0.56) | 5.17*** | 0.43 | 199 |
| 5 | 0.71 (0.45) | 0.97 (0.18) | 0.26 (0.48) | 7.50*** | 0.76 | 200 |
| 6 | 0.79 (0.41) | 0.87 (0.34) | 0.07 (0.47) | 2.26* | 0.21 | 203 |
| 7 | 0.65 (0.48) | 0.90 (0.30) | 0.25 (0.54) | 6.15*** | 0.62 | 175 |
| 8 (not valid) | 0.80 (0.40) | 0.67 (0.47) | –0.13 (0.58) | –3.01** | –0.30 | 176 |
| 9 | 0.57 (0.50) | 0.48 (0.50) | –0.09 (0.65) | –1.87 | –0.18 | 174 |
| 10 | 0.71 (0.46) | 0.70 (0.45) | –0.01 (0.61) | –0.26 | –0.02 | 171 |
| Sum index: Pairs 1 to 10 (not valid) | 6.49 (1.69) | 7.83 (1.55) | 1.34 (1.80) | 9.00*** | 0.83 | 145 |
| Sum index: Pairs 1 and 8 excluded | 5.39 (1.45) | 6.31 (1.33) | 0.92 (1.56) | 7.12*** | 0.66 | 147 |

***p < .001, **p < .01, *p < .05
Considering the valid findings presented in Table 2, six out of eight interventions had a statistically significant effect on student performance. These interventions were conducted in the first (2–3) and middle (4–7) parts of the lecture. The interventions in the last part (9–10) showed no statistically significant effect. In total, the sum index for the 8 valid interventions had an effect size of 0.66 (Cohen’s d).
The mean difference for the 8 valid Q1- and Q2-questions was 12 %, while the mean difference for the Q1- and Q1ad-questions was 26 % (see Fig. 1). However, the histogram showing the student answers on the Q1 questions was displayed in a brief flash (half a second) before changing to Q1ad. Thus, the change in correct answers in Q1ad might have been affected by this. This is not likely to have influenced the students’ answers to the Q2 questions, however, since they represented different cases.
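The link between the sum index means in Table 2 and the percentages quoted here can be made explicit with a short calculation. The sketch below assumes that the percentage of correct answers is the sum index mean divided by the number of valid question pairs. The helper name is hypothetical, and the inputs are the rounded table values.

```python
def percent_correct(sum_index_mean: float, n_questions: int) -> float:
    """Average share of correct answers, in percent, from a sum index mean."""
    return sum_index_mean / n_questions * 100


N_VALID_PAIRS = 8
q1_pct = percent_correct(5.39, N_VALID_PAIRS)  # Q1 sum index mean from Table 2
q2_pct = percent_correct(6.31, N_VALID_PAIRS)  # Q2 sum index mean from Table 2
gain = q2_pct - q1_pct  # gain in percentage points
print(f"Q1: {q1_pct:.1f} %, Q2: {q2_pct:.1f} %, gain: {gain:.1f} points")
```

With these inputs the gain is roughly 11.5 percentage points, consistent with the 12 % reported once rounding is taken into account.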
As illustrated in Fig. 2, we only found a one percent difference in student improvement (Q2–Q1) between the four easiest and the four most difficult question pairs (based on the number of correct answers to Q1). The mean difference between Q1 and Q2 for the four easiest question pairs was 12 %, while the mean difference between Q1 and Q2 for the four most difficult question pairs was 13 %.
Prior to the interventions, the students were asked to what extent they had started studying on their own in the subject besides attending lectures and seminars (M: 3.21, SD: 1.13, N: 207) and to what extent they believed peer discussions in lectures helped them learn the subject matter (M: 3.64, SD: 1.13, N: 206). Using Pearson Correlation we found no statistically significant relationship between these two variables and the degree of improvement of students’ performance on the Q2-questions (Q2–Q1). After the interventions, the students were asked to what extent they experienced learning from the peer discussions in which they had participated (M: 3.89, SD: 1.11, N: 174). Seventy-four percent of the students experienced learning from the peer discussions to an above moderate extent, 13 % to a moderate extent, and 13 % to a less than moderate or no extent. The Pearson Correlation showed a positive weak correlation (r: .17, p: .05, N: 145) between this variable and the degree of improvement of students’ performance on the Q2-questions (Q2–Q1).
The students were also surveyed about the quality of the discussions. A total of 86 % answered that they spent time justifying and explaining their answers in the discussions. Eight percent answered that they only superficially discussed which alternative they should choose, but did not discuss why, and six percent answered that they did not participate in the discussions.
Discussion and conclusion
In concurrence with M. K. Smith et al. (2009), we found that peer discussions do improve student performance on isomorphic questions (Q2) even though the subject area and the type of questions posed in our study were different. However, the mean difference between the Q1- and Q2-questions in our study is lower. While M. K. Smith et al. (2009) found a mean difference of 21 % between the Q1- and Q2-questions, we found a mean difference of 12 %. This discrepancy might be due to several factors. For one, differences between the subjects, and thus the questions asked, in the two studies are likely to affect the outcome. In a subject like “qualitative methods”, the difference between correct and incorrect answers is usually more debatable than in a subject like “genetics”. Thus, new knowledge might be harder to transfer from one case to another in our study. Considering that our questions made the students use their content understanding on different practical scenarios, variations in contextual factors between the Q1- and Q2-cases might have influenced the students’ answers as well. A student from the focus groups illustrated this by arguing for a theoretically incorrect alternative as the most practical, sound choice based on his interpretation of a Q2-case.
A second factor might be the level of difficulty of the questions. In the study by M. K. Smith et al. (2009) an average of 52 % of the students answered correctly on the Q1-questions, while the average was 67 % in our study. Thus, there is less room for improvement on Q2-questions in our study. M. K. Smith et al. (2009), for example, found a mean difference between Q1 and Q2 of 32 % on their most difficult questions. On the other hand, we only found a one percent difference in student improvement between the four easiest (average of 75 % correct on Q1) and four most difficult questions (average of 58 % correct on Q1) in our study.
A third influential factor might be the length of the lecture. Studies show that students’ attention varies during lectures (Wilson & Korn, 2007) and decreases as a function of time (Risko, Anderson, Sarwal, Engelhardt, & Kingstone, 2012). In our focus groups, several students stated that they were tired at the end and had trouble concentrating and carefully reading the “clicker” questions. Although studies indicate that “clicker” interventions can help students maintain their focus (Cain et al., 2009; Rush et al., 2010; Sun, 2014), three hours is a long time span for a lecture. Hence, a decrease in the students’ attention and concentration might explain why the last three interventions showed no statistically significant improvement.
Regardless, a mean difference of 12 % between Q1 and Q2 indicates that several students did improve their content understanding through the discussions in a way that enabled them to interpret and use their new knowledge on new cases. This argument is strengthened by a Cohen’s d effect size of 0.66 for the 8 valid interventions. As noted elsewhere, “Bialo & Sivin-Kachala (1996) and Kulik & Kulik (1991) have observed that an effect size ≥ 0.3 is commonly considered educationally meaningful” in educational technology studies (Spector et al. 2014, p. 10417). Relative to Hattie’s (2009) average effect size value of 0.4 related to student achievement resulting from interventions, the peer discussions in our study resulted in an effect size that is 65 % above average. In other words, the effect of the peer discussions can be considered quite considerable, especially taking into account that each intervention lasted for only three minutes (one-minute individual answers and two minutes of discussion).
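The "65 % above average" figure follows from a simple ratio, sketched below for transparency. The helper name is hypothetical, and the baseline of 0.4 is the average effect size the text attributes to Hattie (2009).

```python
def percent_above(value: float, baseline: float) -> float:
    """How far `value` lies above `baseline`, as a percentage of the baseline."""
    return (value - baseline) / baseline * 100


observed_d = 0.66  # sum index of the 8 valid interventions
hattie_avg = 0.40  # average effect size cited from Hattie (2009)
excess = percent_above(observed_d, hattie_avg)
print(f"{excess:.0f} % above the average effect size")
```

Note that this is a relative comparison: the absolute gap between the two effect sizes is 0.26.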
So if peer discussions are so effective, why did only 61 % of the students in our 2014 survey experience peer discussions as useful for improving their content understanding? One explanation could be that the discussions were simply more effective in our 2015 study than in the 2014 lectures. For instance, 74 % of the students answered that they experienced learning from the peer discussions to an above moderate extent in the 2015 survey. Although this result is hard to compare with the student answers from 2014 since the students did not answer the same survey question (see note 3), it seems to indicate that the participants in the 2015 study did perceive the peer discussions as more useful than the students in 2014. On the other hand, the increased use of, and focus on, peer discussions in the 2015 lecture might have had a positive influence on the student opinion in this survey. Yet as we did not measure the effects of the interventions in 2014, the effect of these interventions compared to the ones in the 2015 study remains an open question. We suspect, however, that the effectiveness of the discussions was at least remotely similar since we used the same kind of clicker questions on the same topics both years with two similar student groups (first year psychology students).
Another aspect may be that there is a discrepancy between the students’ subjective experience of learning and their actual learning outcome. In our 2015 study, we only found a weak positive relationship between the students’ experience of learning from the peer discussions and actual improvement of their performance on the Q2-questions. Hence, some students might have benefited more from the discussions than they realized. On the other hand, some students might also have benefited from the peer discussions less than others. In a study analyzing 361 recorded peer discussions of clicker questions, James and Willoughby (2011) found that the students in 62 % of the conversations either discussed incorrect ideas (not captured by the lecturer’s multiple choice questions) or gave a clicker answer that was inconsistent with the ideas that they discussed. In another study, Egelandsdal and Krumsvik (2015) found that slightly above half of the student group perceived peer discussions of clicker questions as useful in supporting their self-monitoring. Their focus group interviews revealed two major challenges related to the discussion sequences: (1) not all students had peers to discuss questions with and (2) the quality of the discussions varied. These findings indicate that benefits from peer discussions are related to student commitment and engagement, and most likely, the peers with whom they are engaging.
In our 2015 survey, only six percent of the students reported not having anyone to discuss with and eight percent experienced the discussions as superficial. However, the number of students participating in the lecture declined by 16 % during the three hours. Hence, some students may have left because they did not experience the interventions as useful. Moreover, the lecture was not mandatory and some students did not attend at all. This illustrates a need to study how clicker questions (with different levels of difficulty) and peer discussions affect the learning of students with different studying habits.
In conclusion, on the eight valid post-questions (Q2) there is an average improvement of 12 %. It is unlikely that this improvement occurred by chance since the order of the questions was randomly assigned. We also found a Cohen’s d effect size of 0.66. This is 65 % above the average outcome of interventions aimed at promoting student achievement (see Hattie 2009). These findings indicate that the interventions contributed considerably to enhancing the students’ content understanding. Consequently, our findings support previous research showing that the use of clicker questions in combination with peer discussions promotes student learning (Crouch & Mazur, 2001; Mazur, 1997; Porter et al., 2011; Rao & DiCarlo, 2000; E. L. Smith et al., 2012; M. K. Smith et al., 2009). However, we are aware of the small-scale nature of our study and that our findings only show immediate learning gains. We recommend that further research consider whether, and how, such immediate learning gains are related to students’ long-term learning in a broader context. There is also a need to study diverse question types (including the use of video cases) with different levels of difficulty in various subject areas when it comes to the use of isomorphic questions, preferably making use of control groups to validate the questions and compare results. In addition, it is important to note that we did not study peer discussions in conjunction with the follow-up by the lecturer. To get a fuller picture of the benefits of clicker interventions, it is necessary to study these parts in relation to one another. Some studies have indicated that combining peer discussions with lecturer follow-up can increase student performance even further (M. K. Smith, Wood, Krauter, & Knight, 2011; Zingaro & Porter, 2014).
|1||Translated from Norwegian.|
|2||Using Dr. Lee A. Becker’s effect size calculator: http://www.uccs.edu/~lbecker/|
|3||The 2014 survey addressed students’ perceptions of the most useful part of several interventions conducted over five lectures, and the 2015 survey addressed how students experienced the peer discussions in the quasi-experiment conducted at one lecture.|