Meredith E. Kneavel (LaSalle University), Joshua D. Fetterman (Chestnut Hill College), Ian R. Sharp (Chestnut Hill College)
Psychology is unique among the sciences because much psychological subject matter cannot be directly observed. Psychologists often define “invisible” constructs, like emotion or cognition, in terms of observable, measurable, and agreed-upon criteria. These operational definitions allow psychologists to “see the invisible” and keep psychological theories testable and falsifiable. Because of this, operational definitions are foundational methodological concepts for the field of psychology and are featured prominently in various psychology courses. Unfortunately, students often struggle to grasp the nature and importance of operational definitions and sometimes find discussion of this topic dry and boring. To address this, we suggest a classroom activity that demonstrates the importance of rigorous operational definitions and can also be tied to several different psychological concepts that capture students’ attention. This activity illustrates the necessity of operational definitions to students, while also engaging them in broader psychological content that is, perhaps, more reflective of their motivation for enrolling in the course. It also offers a rare opportunity to watch cartoons during class, which students (and their teachers) may appreciate.
The purpose of this demonstration is to illustrate the importance of operational definitions for behaviors and constructs in psychological research. Exact operational definitions of psychological concepts can be difficult to formulate (see Marx, 2010), which is the point of the exercise discussed here. The overall demonstration takes approximately twenty minutes and uses a Looney Tunes clip. Any clip depicting physically aggressive behavior will suffice, though we have used the Rabbit Season, Duck Season Trilogy in the past. Instructions to students consist only of ‘count the number of aggressive acts that you observe’; no definition of “aggressive act” is provided. The video is a little less than five minutes long, and, at the end, the instructor asks students to share how many aggressive acts they recorded.
After the exercise, the instructor should gather the aggression scores and lead a discussion of how students defined aggression. It is important to record the aggression scores (especially the lowest and highest in the range) for use later in the demonstration. We recommend recording the number of aggressive acts from each student in an Excel spreadsheet, where the mean and standard deviation can be quickly calculated. If anonymity is preferred, Poll Everywhere or similar tools allow students to submit their counts via cell phone and have the results projected to the class. Poll Everywhere can be set up to create automatic bar graphs that illustrate the range of responses.
After students have shared the number of aggressive acts they observed, the instructor can facilitate a discussion of why students recorded different scores, which often leads to a discussion of how and why aggression was viewed differently. For instance, there may be a gender difference in the conceptualization of aggression. This can lead to a discussion of how researchers may operationalize aggression as physical, nonphysical, or relational (Crick & Grotpeter, 1995). Following this discussion, the class can then come to a consensus about what aggression is and how it can be operationalized. At this point, it is helpful to go back to the video and ask whether certain acts are considered aggression or not. This helps to refine the class’s operational definition and can start a conversation about inter-rater reliability.
Once the class has agreed upon a definition of aggression, the instructor can re-show the video and again instruct the class to ‘count the number of aggressive acts observed’. The instructor can then compare the ranges or standard deviations of the two sets of numbers to illustrate how the spread in scores differs between the first trial and the second trial. Typically, after the class has agreed upon an operational definition, the range of scores is much smaller, with students generally agreeing on about fifteen to twenty acts of aggression. If there are any outliers, this can lead to a very interesting discussion, as it usually means that a student had a sudden insight about aggression that wasn’t shared in the original formulation of the definition. Illustrating the change in the range of scores highlights the importance of having an agreed-upon operational definition.
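For instructors who prefer a script to a spreadsheet, the comparison of the two trials can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dispersion` and both lists of counts are hypothetical, invented here to show the calculation, and are not data from our classes.

```python
from statistics import mean, stdev


def dispersion(scores):
    """Summarize a list of aggression counts: min, max, range, mean, sample SD."""
    lowest, highest = min(scores), max(scores)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        "mean": mean(scores),
        "sd": stdev(scores),  # sample standard deviation (n - 1 denominator)
    }


# Hypothetical counts, invented for illustration only.
# Trial 1: no definition of "aggressive act" was given.
trial_1 = [4, 9, 12, 15, 18, 22, 27, 31, 35, 40]
# Trial 2: counts made after the class agreed on an operational definition.
trial_2 = [15, 16, 16, 17, 17, 18, 18, 19, 19, 20]

before = dispersion(trial_1)
after = dispersion(trial_2)

for label, summary in (("Trial 1", before), ("Trial 2", after)):
    print(
        f"{label}: range = {summary['range']}, "
        f"mean = {summary['mean']:.1f}, SD = {summary['sd']:.2f}"
    )
```

Projecting the two printed lines side by side gives the class a concrete before-and-after picture of the spread in scores.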
This technique is primarily valuable in demonstrating the concept of operational definitions but has secondary uses in reinforcing or illustrating concepts such as gender differences in perceptions of aggression, measures of dispersion (range and standard deviation), inter-rater reliability, and difficulties in assessment and observational research. Because class time is valuable, this short activity is particularly useful as it allows the flexibility to incorporate multiple concepts into one demonstration.
Gender Differences Adaptation
Most research indicates that males are more physically and verbally aggressive than females (Archer, 2004; Card, Stucky, Sawalani, & Little, 2008; Hyde, 1984). Females tend to exhibit more relational aggression (Card et al., 2008; Ostrov & Keating, 2004), especially during the teenage years (Archer, 2004). However, the overall gender difference in relational aggression is small and seems to depend on data collection methods (Archer, 2004; Card et al., 2008; Eagly & Steffen, 1986). Nonetheless, if the clip primarily depicts physical aggression (as most cartoons do), gender differences in the number of aggressive acts that students record should appear. Gender differences can be illustrated by having students count aggressive acts (as described above), by having students make Likert scale ratings of the aggressiveness of characters, or both. It is possible that gender differences will be found using one measurement technique but not the other. Furthermore, this demonstration could be modified to focus specifically on gender differences in aggression by showing two clips, one that depicts physical aggression and one that depicts relational aggression, and then discussing the gender disparity.
Misinformation Effect Adaptation
Research indicates that human memory is quite fallible (Chan, Jones, Jamieson, & Albarracin, 2017; Loftus, 2005; Loftus & Pickrell, 1995), particularly where eyewitness testimony is concerned (Wells & Olson, 2003). Indeed, faulty eyewitness testimony is partly responsible for the distressingly high number of wrongfully convicted individuals who are later exonerated through the use of DNA evidence (Wells & Olson, 2003). Typically, in research on false memories, individuals are shown a video and later given incorrect information or asked leading questions about what they saw. Often people will erroneously recall the incorrect information as having come from the video (Loftus, Miller, & Burns, 1978), or reconstruct their memories of the video to be more consistent with the leading questions (Loftus & Palmer, 1974). These flaws in memory can be discussed in the context of the operational definitions activity described above. If individuals cannot agree on what they saw in the first place, it is not possible for their assessments to be accurate (in the same way that reliability is a prerequisite for validity). After students have counted the number of aggressive acts they saw in the cartoon, half of the class could be asked to rate how aggressively the individuals “fought” during the video, while the other half could be asked to rate how aggressively the individuals “interacted” during the video (students should be unaware that the class has been asked two different questions until after the demonstration has concluded). Those who read the word “fought” should make higher ratings of aggressiveness due to the leading nature of the question, even though all members of the class will have seen the same video.
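The split-class comparison described above could be run along the following lines. This is a rough sketch rather than a prescribed analysis: the helper names `assign_wordings` and `cohens_d`, the 1-to-7 rating scale, and the ratings themselves are all assumptions made up for illustration. Cohen's d is used here simply as one common way to express how far apart two group means are.

```python
import random
from statistics import mean, stdev


def assign_wordings(student_ids, seed=None):
    """Randomly split students into a 'fought' half and an 'interacted' half."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"fought": shuffled[:half], "interacted": shuffled[half:]}


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Decide which question each (hypothetical) student receives.
groups = assign_wordings(range(1, 17), seed=2024)

# Hypothetical 1-7 aggressiveness ratings, invented for illustration only.
fought_ratings = [6, 5, 7, 6, 5, 6, 7, 5]
interacted_ratings = [4, 5, 3, 4, 5, 4, 3, 4]

print(f"'fought' mean:     {mean(fought_ratings):.2f}")
print(f"'interacted' mean: {mean(interacted_ratings):.2f}")
print(f"Cohen's d:         {cohens_d(fought_ratings, interacted_ratings):.2f}")
```

Random assignment matters here because it keeps the two wording groups comparable, so a difference in mean ratings can be attributed to the question wording rather than to who happened to sit where.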
Clinical Applications Adaptation
Inter-rater reliability is of significant importance in a variety of clinical applications. For example, poor inter-rater reliability in the administration of symptom severity outcome scales has led to negative or failed clinical trials where the treatment otherwise would have outperformed a placebo (Kobak, Feiger, & Lipsitz, 2005). In our course on Psychological Assessment, we use videos streamed from the Internet demonstrating clinician-administered, semi-structured diagnostic and severity scales (e.g., the Montgomery-Asberg Depression Rating Scale [MADRS]). In one demonstration, students are provided with the Structured Interview Guide for the MADRS [SIGMA] and asked to rate the ten items. Once the students have rated the ten items, scores are collected and an intraclass correlation coefficient (ICC) is generated. Then each item is reviewed, with a discussion of discrepancies in scoring and the use of the instructor’s scores as a gold standard. This scale is particularly useful in discussing inter-rater reliability because the ten items require the rater to consider the intensity, frequency, and duration of multiple constructs of depressive symptoms. Each of the ten items is rated from 0 to 6, and untrained undergraduate students tend to demonstrate a large range of scores within each of the items. The ten items are then discussed, and students are asked to explain how they arrived at their scores, often providing fruitful examples of why ratings differed. This is also an opportunity for students to discuss the administration of the scale and illustrate various important interviewing techniques (e.g., avoiding leading questions, clarifying ambiguous information). The cartoon video can be used in advance of the introduction of the clinical scale as a means of illustrating the importance of operational definitions. The video can also be used to reinforce concepts of inter-rater reliability by systematically reviewing acts of ‘aggression’. The class can then go back through the video together and discuss the specific acts until there is agreement between raters.
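For readers who want to see how an ICC might be computed from the collected item ratings, the sketch below may help. Several things here are assumptions rather than a description of our procedure: the text above does not specify which form of the ICC is used, so the code implements the two-way random effects, absolute agreement, single-rater form commonly labeled ICC(2,1); the ten scale items are treated as the rated targets and the students as the raters; and the function name `icc_2_1` and the ratings are hypothetical, invented for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per rated target (here, one scale
    item), with each row holding one score per rater (here, one per student).
    """
    n = len(ratings)       # number of targets (rows)
    k = len(ratings[0])    # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 0-6 ratings: ten items (rows) scored by four students (columns).
# Invented for illustration only.
class_ratings = [
    [4, 3, 5, 2],
    [3, 3, 4, 1],
    [5, 4, 6, 3],
    [2, 1, 3, 0],
    [4, 4, 5, 2],
    [1, 2, 2, 0],
    [3, 2, 4, 2],
    [5, 5, 6, 4],
    [2, 2, 3, 1],
    [4, 3, 4, 2],
]

print(f"ICC(2,1) = {icc_2_1(class_ratings):.2f}")
```

Because this form of the ICC penalizes systematic differences between raters (for example, one student who consistently scores every item higher than the rest), it fits a discussion in which students are expected to converge on the same absolute scores, not merely the same ordering of items.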
The demonstration is adaptable and useful across multiple courses and can be molded to fit the number, type, and level of students. It can be used in a research methods course or in a content-specific course, such as social psychology.
Archer, J. (2004). Sex differences in aggression in real-world settings: A meta-analytic review. Review of General Psychology, 8, 291-322. doi:10.1037/1089-2680.8.4.291
Card, N. A., Stucky, B. D., Sawalani, G. M., & Little, T. D. (2008). Direct and indirect aggression during childhood and adolescence: A meta-analytic review of gender differences, intercorrelations, and relations to maladjustment. Child Development, 79, 1185-1229.
Chan, M. S., Jones, C. R., Jamieson, K. H., & Albarracin, D. (2017). Debunking: A meta-analysis of the psychological efficacy of messages countering misinformation. Psychological Science, 28, 1531-1546. doi:10.1177/0956797617714579
Crick, N. R., & Grotpeter, J. K. (1995). Relational aggression, gender, and social-psychological adjustment. Child Development, 66(3), 710-722. doi:10.2307/1131945
Eagly, A. H., & Steffen, V. J. (1986). Gender and aggressive behavior: A meta-analytic review of the social psychological literature. Psychological Bulletin, 100, 309-330.
Hyde, J. S. (1984). How large are gender differences in aggression? A developmental meta-analysis. Developmental Psychology, 20, 722-736.
Kobak, K. A., Brown, B., Sharp, I., Levy-Mack, H., Wells, K., Okum, F., & Williams, J. B. W. (2009). Sources of unreliability in depression ratings. Journal of Clinical Psychopharmacology, 29, 82-85. doi:10.1097/JCP.0b013e318192e4d7
Kobak, K. A., Feiger, A. D., & Lipsitz, J. D. (2005). Interview quality and signal detection in clinical trials. American Journal of Psychiatry, 162(3), 628. doi:10.1176/appi.ajp.162.3.628
Loftus, E. F. (2005). Planting misinformation in the human mind: A 30-year investigation of the malleability of memory. Learning and Memory, 12, 361-366.
Loftus, E. F., Miller, D. G., & Burns, H. J. (1978). Semantic integration of verbal information into a visual memory. Journal of Experimental Psychology: Human Learning and Memory, 4, 19-31.
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13, 585-589. doi:10.1016/S0022-5371(74)80011-3
Loftus, E. F., & Pickrell, J. E. (1995). The formation of false memories. Psychiatric Annals, 25, 720-725.
Marx, M. H. (2010). Operational definition. In I. B. Weiner & W. E. Craighead (Eds.), The Corsini Encyclopedia of Psychology (p. 1129). Hoboken, NJ: Wiley.
Ostrov, J. M., & Keating, C. F. (2004). Gender differences in preschool aggression during free play and structured interactions: An observational study. Social Development, 13, 255-277.
Wells, G. L., & Olson, E. A. (2003). Eyewitness testimony. Annual Review of Psychology, 54, 277-295. doi:10.1146/annurev.psych.54.101601.145028