Annotations written by the GSAF students

Bringle, R. G., & Hatcher, J. A. (1996). Implementing service learning in higher education. The Journal of Higher Education, 221-239. 

[Annotated by Laura Pryor]

The article discusses service learning programs in a higher education context. In this article, service learning programs are defined as “a course-based service experience that produces the best outcomes when meaningful service activities are related to course material through reflection activities” (p. 222). Thus, the assessment and evaluation activities described in the article take place in the context of curricular service learning programs on university campuses.

Specifically, the authors created a guidebook for implementing a service learning program and devote much of the article to describing it. The article therefore outlines two different evaluation and assessment processes: 1) the process of creating a guidebook for a university-based service learning program, and 2) the recommended assessment and evaluation processes for a university-based service learning program. The team conducting these processes was composed of professors from Indiana University-Purdue University Indianapolis (IUPUI) who were establishing an Office of Service Learning on campus.

The authors were motivated to assess and evaluate service learning programs by the recognized need to create high-quality programs. Existing research shows that service learning has a positive impact on personal, attitudinal, moral, social, and cognitive outcomes (Boss, 1994). The authors therefore primarily evaluated the existing state of service learning programs in order to develop a model and guidelines for other universities to follow. To evaluate existing programs, the authors: 1) gathered materials from regional and national service learning conferences; 2) reviewed relevant literature; 3) collected information from service learning programs at various stages of implementation; and 4) collaborated with other service learning stakeholders via a listserv.

Using these techniques, the authors designed the Comprehensive Action Plan for Service Learning (CAPSL). CAPSL outlines the steps needed to implement a service learning program, broken down by four main constituencies: institution, faculty, community, and students. In addition to planning and implementation steps, CAPSL outlines key evaluation and monitoring activities for each constituency (see the article's Appendix). Through implementing CAPSL, the authors hope that universities will develop service learning programs that: 1) develop effective citizenship among their students; 2) address complex needs in their communities through the application of knowledge; and 3) form creative partnerships between the university and the community.


Buchan, V., Rodenhiser, R., Hull, G., Smith, M., Rogers, J., Pike, C., & Ray, J. (2004). Evaluating an assessment tool for undergraduate social work education: Analysis of the Baccalaureate Educational Assessment Package. Journal of Social Work Education, 40(2), 239-253.

[Annotated by Chia Okwu]

This article presents an assessment package, the Baccalaureate Educational Assessment Package (BEAP), which was developed to assist undergraduate social work educators with program evaluation. The BEAP became affiliated with the Association of Baccalaureate Social Work Program Directors (BPD) because it had gained acceptance among the BPD board's stakeholders. The package uses multiple measures and multiple samples, following the Colorado State University-University of North Dakota model.

The package consists of six instruments for gathering feedback from students, graduates, and employers, and it addresses questions related to program processes and curriculum. The BEAP's objectives include helping baccalaureate social work programs meet accreditation assessment standards and providing a national database for comparative purposes. In 2001, approximately 30% of baccalaureate programs nationwide were using the package.

The BEAP instruments were designed to help programs answer the following five questions. First, are social work education programs delivering what they say they are? That is, do survey participants report that students are developing the knowledge, skills, and values the program curriculum seeks to deliver? Second, are institutions delivering the program to the students they say they are serving? What are the characteristics of the student body? Third, what are student and alumni perceptions of program process and climate? Fourth, do students' values change during their education in the major, as measured by the Social Work Values Inventory? Fifth, based on feedback from graduating students, alumni, and employers, how do program graduates fare in the employment market, and do they seek additional education, licensing, and professional development? To answer these questions, the BEAP was designed to be administered continuously with each entering and exiting class.

In terms of assessment use, the BEAP is designed to: ask important questions; reflect the institutional mission; reflect programmatic goals and objectives for learning; contain a thoughtful approach to assessment planning; be linked to processes such as budgeting and planning; be linked to decision making about the curriculum; encourage involvement of individuals on and off campus; contain relevant assessment techniques; include direct evidence of learning; reflect what is known about how students learn; share information with multiple audiences; lead to reflection and action by faculty, staff, and students; and allow for continuity, flexibility, and improvement in assessment.


Cavanagh, M., et al. (2014). The effect over time of a video-based reflection system on preservice teachers’ oral presentations. Australian Journal of Teacher Education, 39(6).

[Annotated by Vivek Rao]

The value of video and reflection in training students to give oral presentations is widely recognized, but there are significant gaps in the literature on best practices for implementing such reflection-based training. In this study, Cavanagh and his team sought to examine the longitudinal effects of repeatedly viewing and reflecting on videos of student presentations. The study was based in a postgraduate teaching diploma program at Macquarie University in Australia (similar to a teaching credential program in the United States). Of the 61 students enrolled in the program, 41 completed all assignments. The study spanned one year of the program, with four presentations spaced throughout it. This design (four presentations over time, for a total of 164 videos) allowed the researchers to examine the longitudinal effects of presentation reflection. The study was intended to guide the use of video reflection at Macquarie to maximize benefits for student teachers, and potentially to inform best practices in the Australian Professional Teaching Standards curriculum.

Cavanagh et al. used a rigorous set of metrics to examine the longitudinal effect of presentation viewing and reflection. A team of five assessors viewed ten videos to calibrate a scoring rubric, and then each assessor viewed all 164 videos and scored student performance. Based on an analysis of variance (ANOVA), the team determined that student performance gains reached saturation after two video reflections. Despite this clear evidence, the authors positioned the study more as a research exercise than as an assessment, so no conclusions about course or curriculum impact could be drawn.


Feldman, A., Divoll, K. A., & Rogan-Klyve, A. (2013). Becoming researchers: The participation of undergraduate and graduate students in scientific research groups. Science Education, 97, 218-243. doi:10.1002/sce.21051

[Annotated by Beth McBride]

The setting for this study was a National Science Foundation-funded interdisciplinary collaboration among geologists, microbiologists, environmental engineers, and science educators to study the natural remediation of acid mine drainage at an abandoned pyrite mine. The project had five PIs, all of whom were professors at a large, public, research-intensive university. Four of the PIs were scientists or engineers, and each oversaw a research group that included undergraduate, master's, and doctoral students, as well as practicing middle or high school science teachers. The fifth PI was a professor of science education and is the lead author of this paper. This PI also led a research group, which included the paper's two coauthors.

This study sought to understand how graduate and undergraduate students learn to do science by participating in research groups. It is a follow-up to a previous study on the structure of research groups. The paper approaches the topic from a theoretical standpoint rather than from a practical need for assessment. From the authors' point of view, the purpose seems to be both to advance knowledge in the learning sciences, using communities of practice and apprenticeship models as a theoretical base, and to improve the training of future scientists. The authors note that there is a lack of research on the relationship between the work of scientists in research groups and the training scientists receive, but they do not offer concrete plans for improving student training or mentorship.

The focus of this research is the role of the research group in helping students become scientists. Data were collected through interviews. Ten students were interviewed, representing the full range of students (undergraduate, master's, and doctoral), with at least two students from each of the four research groups. Each interview lasted one hour and used broad, open-ended questions along with follow-up questions, and the coding of the interviews was informed by grounded theory.

The study found that mentoring was distributed among the research group, with more advanced students contributing to it, and that students’ expertise grows along the dimensions of methodological and intellectual proficiency.


Malinak, S. M., Bayline, J. L., Brletic, P. A., Harris, M. F., Iuliucci, R. J., Leonard, M. S., Matsuno, N., Pallack, L. A., Stringfield, T. W., & Sunderland, D. P. (2014). The impacts of an “organic first” chemistry curriculum at a liberal arts college. Journal of Chemical Education, 91, 994-1000.

[Annotated by Max Helix]

This project examines the impact of a significant, program-level curriculum change at a liberal arts college (Washington & Jefferson).  The core content of the first four semesters of chemistry instruction was drastically rearranged, with the primary aim of moving exposure to organic chemistry earlier in the sequence.  Approximately 150 students a year start this core group of chemistry courses, but only about 20-25 of them are chemistry or biochemistry majors; the majority are on track for biology or pre-health programs.  The curriculum change was made with these latter students in mind.  Organic chemistry is the area of chemistry most directly relevant to biology and other life science courses, and it was hypothesized that earlier exposure would better prepare these students for their biology classes, particularly those taken very early in their college careers.  The lead author is an associate professor of chemistry, and the primary aim of the paper seems to be to show that the new curriculum meets some of these goals without being significantly worse than the previous set of courses by any of the department's standards.  The study examines a number of mostly quantitative variables (e.g., course completion rates) over a 15-year period, with the curriculum change falling approximately in the middle of this timeframe.

The broad motivation for the project is stated clearly in the paper: “to assess the new chemistry curriculum at W&J in comparison to the previous curriculum, looking specifically at the impact on the chemistry program and affiliated programs like biology and prehealth, student perceptions, and student outcomes.” (p. 995).  The results of this assessment will most directly inform whether or not the new curricular standards are retained by the chemistry department at Washington & Jefferson.  Additionally, they can potentially be generalized to other schools that are thinking of making a similar change.  However, the authors caution against overgeneralization of their results: “The goal here is not to advocate specifically for an ‘organic first’ curriculum; a conclusion such as this is out of reach because there is simply no way to account for all variables that impact student outcomes. It is possible, however, to consider student perceptions and outcomes in order to address a very simple but critically relevant question: how were various constituencies of the college—the students, the chemistry department and affiliated programs, and admissions—affected by this curricular change? Our conclusion is that the net impact has been positive.” (p. 995).

The assessment focused mostly on quantitative variables, with a heavy emphasis on course completion rates, enrollment in various majors, acceptance rates to professional schools, and standardized test scores (the MCAT and the Major Field Test in Chemistry, which is an exit requirement for chemistry and biochemistry majors).  Some qualitative data were also gathered regarding student perceptions of the first-semester chemistry course.  Broadly speaking, almost none of these variables showed statistically significant differences between students exposed to the old and new curricula.  The one major difference was that completion of the introductory biology course increased from about 62% under the old curriculum to around 77% after the change.  This provides evidence that at least one goal of the curricular change has been met, although the result is confounded by the fact that the biology course has had a different demographic make-up (fewer first-year students) since the change.  Still, it appears that overall, the new chemistry sequence at least “does no harm” compared to a more traditional sequence, and it may indeed leave students better prepared for their first biology course.


Molee, L. M., Henry, M. E., Sessa, V. I., & McKinney-Prupis, E. R. (2010). Assessing learning in service-learning courses through critical reflection. Journal of Experiential Education, 33(3), 239-257.

[Annotated by Laura Pryor]

This evaluative study sought to understand the use of a specific assessment tool, the DEAL model, for assessing student learning in undergraduate service learning courses. The authors were motivated to embark on this evaluation due to the lack of systematic and rigorous tools for assessing learning in service learning courses. Reflection is a key part of service learning pedagogy (Ash & Clayton, 2004). Therefore, the incorporation of a tool that uses reflection to assess student learning may fill the void in service learning assessment.

Specifically, the authors selected the DEAL model as the tool for assessing student learning through reflection. The DEAL model incorporates three steps: 1) Describing the service learning experience; 2) Examining this experience in light of the specified learning objectives for academic enhancement, personal growth, and civic engagement; and 3) Articulating learning through reflection. The evaluation team included researchers trained as subject matter experts on the DEAL model, as well as the authors of the article. The team designed the evaluation to answer the following questions: 1) On average, what depth of learning and level of critical thinking did students achieve, as measured by the DEAL model? 2) Does use of the DEAL model enhance students' depth of learning and level of critical thinking over the course of a semester? 3) Does class level influence depth of learning and level of critical thinking, as measured by the DEAL model?

The team collected samples from undergraduate courses at two universities: one freshman course and one upper-level course. Near the end of the semester, students submitted first drafts of their reflection papers; professors used the DEAL rubric to challenge students' thinking and provide feedback. At the end of the semester, students submitted a final reflection paper. The team of subject matter experts used the rubric and consensus coding to independently score students' time-one (first draft) and time-two (final) reflections. T-tests were used to compare first-draft scores with final-draft scores and assess any changes in learning. T-tests were also used to determine differences between students at different class levels. Means, standard deviations, and percentages were calculated to determine overall depth of learning and critical thinking. The authors found that the DEAL model's assessment rubrics yielded reflections that illuminated what students were learning, how well they were learning it, and even how their learning compared to learning in other pedagogies. The results of this evaluation were therefore used to promote the use of the DEAL model in assessing service learning courses.


Norris, J. M., & Pfeiffer, P. (2003). Exploring the uses and usefulness of ACTFL oral proficiency ratings and standards in college foreign language departments. Foreign Language Annals, 36(4), 572-581.

[Annotated by Irina Kogel]

This study was undertaken by faculty in the German Department at Georgetown University (GUGD). The authors begin by establishing the growing trend of using Oral Proficiency Interviews (OPIs) based on the ACTFL Proficiency Guidelines as an end-of-program assessment and/or graduation requirement in college foreign language programs. Taking this trend into consideration, the study aims to assess how accurately OPIs reflect students' language gains and to suggest alternative forms of assessment as appropriate.

Over the course of the study (1999-2002), the German Speaking Test (GST), a simulated OPI, was administered to over a hundred students. Following ACTFL protocol, two certified raters evaluated each GST, and disagreements were resolved by recourse to a third rater. The GST scores were then cross-referenced against the number of semesters students had been enrolled in German language classes at GUGD. The results did not show a significant correlation between GST scores and students' actual curricular level. Furthermore, they showed a quick gain to Intermediate Low/Mid (the standard used by many universities as the end-of-program requirement), with progress through subsequent levels taking place slowly over several semesters of instruction.

Given the results, the authors are concerned that the ACTFL OPI does not adequately assess what makes the study of foreign languages worthwhile, and that current OPI benchmarks set too low a bar. However, the study identifies positive uses for the OPI in curricular development. For example, more emphasis was placed on different speaking genres when the GST results showed that some students were underperforming. The GST results also serve as a corpus of student-produced speech that can be further analyzed to test hypotheses about the effectiveness of the curriculum.

Ultimately, the authors conclude that while external evaluation tools such as the GST are valuable because they provide an external check on program efficacy, allow for comparability between German programs, and give students a way to convey their language competencies to other academic units or employers, they should not be the sole assessment of student language competency and should not serve as the final measure of whether university language requirements have been fulfilled. Specifically, the test does not properly address gains in intercultural competence or in other modalities (writing, reading, and listening). The authors are also concerned that heavy emphasis on GST scores places the burden on individual students, rather than making the classroom a site of collaboration between students and teachers.


Prunuske, A. J., Wilson, J., Walls, M., & Clarke, B. (2013). Experiences of mentors training underrepresented undergraduates in the research laboratory. CBE-Life Sciences Education, 12(3), 403-409.

[Annotated by Beth McBride]

This assessment was done to better understand mentors' experiences working with undergraduates. The purpose was to inform and improve mentorship for minority students in order to improve retention rates in science fields, specifically the health sciences. The assessment was motivated by the fact that many national grant programs provide funding and support for minority students pursuing higher education in STEM fields, and most of these programs include a mentored undergraduate research experience. However, while mentorship is understood to be important, there is little research on how mentors should support underrepresented mentees.

The subjects in this study were selected based on their participation in mentoring programs designed to increase the enrollment in graduate school of students from underrepresented groups in science. The program enrolls eight new students per year. The mentors had been at the institution for anywhere from 1 to 46 years and were from medicine and pharmacy professional schools and from biology and chemistry undergraduate departments. Among mentors, three were at the rank of professor, seven were associate professors, two were assistant professors, and three were graduate students. The sample included seven female and eight male mentors. Mentees were undergraduate students. The project leads were two researchers, one with experience in qualitative research and one with experience in laboratory research; these two researchers conducted the interviews for this study.

There were 15 mentors in the group, and each participated in a semi-structured interview. Most questions were open-ended, and participants were able to direct the flow of the conversation. Interviews lasted between 40 and 90 minutes and were coded by a team of three researchers using an inductive approach. The assessment focuses on a specific program at the University of Minnesota that aims to keep minority students in the STEM pipeline. Using the results of their interviews, the researchers proposed a mentor-training program with six steps for training mentors to work with minority students. These steps were developed from interview findings about mentors' struggles to conceptualize underrepresentation and to understand how diversity affects the mentoring relationship.


Schulz, R. A. (2007). The challenge of assessing cultural understanding in the context of foreign language instruction. Foreign Language Annals, 40(1), 9-26.

[Annotated by Irina Kogel]

This article explores a difficulty that arises in assessing the development of students’ intercultural competence in the foreign language classroom. Rather than sharing the outcomes of a specific assessment study, the article identifies a problem in foreign language assessment and seeks to offer possible solutions. The author proposes an intervention that is specifically targeted at US students of German as a foreign language, with suggested adaptations for use at the high school and college levels. The article’s suggestions can readily be adapted for other languages, however, and the literature review at the beginning of the article would be informative for anyone interested in questions of teaching and assessing (inter)cultural competence.

The motivation for the article stems from a lack of clarity about how to teach culture in the foreign language classroom and how to assess that teaching. The author argues that although there is a consensus that culture should be incorporated into language teaching, there is a marked absence of research into how best to accomplish this. Moreover, a 1999 study of high school foreign language teachers found a lack of consistent definitions of culture among language teachers and indicated that cultural content is typically not well integrated with language instruction, is presented only superficially or unsystematically, and generally does not constitute a large component of overall class assessment.

The author underscores that the concerns identified in the article have specific ramifications for various stakeholders in schools, universities, and the community as a whole. First, the Standards for Foreign Language Learning in the 21st Century extensively incorporate goals focused on cultural knowledge and intercultural competence; second, employers consistently highlight intercultural competence as a key quality when seeking employees for the new global economy. As a result, developing intercultural competence and assessing the success of teaching this skill is of vital importance for language programs, instructors, students, and employers engaged in international business.

The author therefore proposes five fundamental objectives for fostering intercultural exchange and awareness among students, and presents a template for an ongoing portfolio assessment, which serves as a powerful tool for both developing and assessing this tricky element of language learning. The culture portfolio is offered as a possible antidote to typical assessment materials that, counter to the goals of intercultural competence, steer students toward stereotyping or generalization. Furthermore, it can serve as both a formative and summative assessment tool, and allows for the evaluation of both product and process.


Szteinberg, G. A., & Weaver, G. C. (2013). Participants’ reflections two and three years after an introductory chemistry course-embedded research experience. Chemistry Education Research and Practice, 14, 23-35.

[Annotated by Max Helix]

For this study, a large randomized experiment was done to compare the student outcomes of two different lab curricula.  These labs are part of the second-semester general chemistry course at Purdue.  Approximately 650 students enrolled in this course in Spring 2007, and they were randomly assigned to either a traditional lab section or a section using newer labs developed by the NSF-funded Center for Authentic Science Practice in Education (CASPiE).  The lead author on this paper is a professor of chemistry and science education; she is also part of the CASPiE project.

The officially stated reason for this study is to assess the “long-term effects of the CASPiE program on students.”  However, it is fairly clear that the point of this study is to show that the CASPiE labs are superior to traditional “cookbook” labs in which students follow very detailed instructions and show little independent thinking.  Assessment of this program is probably also linked to its funding.  In addition to the treatment/control difference, the authors are also interested in showing that at least some benefits of undergraduate research positions can be achieved through a course-based experience.  While research positions in ongoing labs are usually limited to a small, highly motivated, and self-selected group, course-based research curricula have the opportunity to expand more authentic research experiences to a broader array of students.

There have been a number of publications based on this experiment, but this paper focuses on student perceptions of the two lab curricula.  The primary data used for this analysis were Likert scale responses to surveys given at the beginning and end of the course.  These data were supported throughout the publication by quotations from interviews done directly after the course and follow-up interviews conducted 2-3 years later.  Overall, the surveys showed that students in the CASPiE lab were more likely to report a belief that they understood and remembered the work done in their labs, a feeling of self-confidence or accomplishment, and a belief that they were participating in “authentic” research.  They were less likely to agree that the lab manual gave explicit instructions (an intentional design principle behind the newer labs), that the labs were well organized (which the authors rationalized by saying this was the first year for the new curriculum), or that the lab helped them in the lecture (a somewhat expected negative consequence).  These results will likely be used to justify full implementation of the CASPiE curriculum.


Thaler, N., Kazemi, E., & Huscher, C. (2009). Developing a rubric to assess student learning outcomes using a class assignment. Teaching of Psychology, 36(2), 113-116.

[Annotated by Chia Okwu]

This assessment article sets out to develop a rubric to assess several undergraduate student learning outcomes (SLOs) of the Psychology Department at California State University, Northridge (CSUN). A focus group of faculty members initially developed the rubric. The faculty randomly sampled 20 percent (N = 55) of the final written manuscripts from several sections of a research methods course and trained two graduate-level raters to use the rubric to score the students’ papers.

The assessment findings revealed statistically significant interrater reliability and convergent validity coefficients. The strong interrater reliability suggests that the meaning of each item was adequately communicated to the raters. Where an item showed low correlations, this was attributed to the relatively obscure nature of the item itself. The authors found that items that depended strongly on writing ability were most closely related to the manuscript's actual grade, while items that focused on statistics and research methodology were not.

Overall, the purpose of this study was to follow in the footsteps of Halonen et al. (2003) by empirically developing a rubric that assesses the achievement of certain learning outcomes using psychology students’ research manuscripts. The researchers hope that their findings will demonstrate the potential of designing rubrics to assess psychology SLOs in a manner that is both objective and reliable.


Yadav, A., et al. (2010). Lessons learned: Implementing the case teaching method in a mechanical engineering course. Journal of Engineering Education, 55-69.

[Annotated by Vivek Rao]

In this work, Yadav et al. seek empirical evidence that case studies enhance student outcomes (specifically, conceptual understanding) in engineering courses. The researchers studied two sections of a required controls class in the mechanical engineering department at Purdue University, encompassing a total of 73 students. The study appears to have been motivated by an interest in contributing to the literature: the lead author is a professor of educational psychology at Purdue, and the article gives no indication that the study’s findings helped shape course- or curriculum-level aspects of mechanical engineering at Purdue. The outcomes of the study were thus intended to add to the scholarly dialogue on the role of case studies in engineering education, with a specific focus on student response to the in-class use of case studies as a replacement for traditional lecture.

The team used an A/B research design with two separate sections of students, 40 in the first section and 33 in the other. For topic I, section A was taught via lecture while section B was taught via case study (the intervention); for topic II, the assignment was reversed. Students responded to a survey questionnaire that used a five-point Likert scale across several questions to gauge their conceptual understanding. ANCOVA was applied to determine the significance of three factors (class, intervention, and topic) for measured understanding. Results indicated no detectable difference between case study and lecture in their effect on student conceptual understanding; however, class had a significant effect as a factor, suggesting that instructors had a greater impact on student outcomes than content did. The article gives no indication that the study was used for curriculum or course development, perhaps because its results were largely inconclusive.