Benefits of clickstream analysis
[…] 1 corresponds to an event generated by a particular student, identified via an anonymized Student ID. There is suggestive visual evidence that the treatment students, on average, engaged with course materials earlier in the day than control students in the first 2 weeks of the course (when the treatment was active), though this does not stand up to statistical tests. Interestingly, heatmap plots of the average behavioral patterns of students indicated that procrastination was related to course grades. Identifying customer trends: Thanks to clickstream information, companies can see the path customers have taken to get to their site. They help to detect these errors and to correct or eliminate them. For example, researchers have built early detection systems for dropout or poor course performance, which can help instructors allocate their attention to the most at-risk students (Baker, Lindrum, Lindrum, and Perkowski, 2015; Bosch et al. […]). Yet analyses of these data often require advanced analytic techniques, as the data provide only a partial and noisy record of students’ actions.
This indicates that high-performing students are more likely than low-performing peers to follow the course sequence intended by the instructor (e.g., watching all videos from Module 1 before all videos from Module 2), while low-performing students are more likely to watch videos out of the intended order (e.g., watching videos from Module 4 before videos from Module 1). The authors used these daily counts to determine whether and when a student changed this relative activity level during the course. Students who received an A are shown in the left panel, with moderate levels of engagement every day throughout the course. A snapshot of the type of data that is provided by a learning management system (LMS; Canvas) for clickstream events. Clickstreams can be stored on the server that supports a website, as well as by a user’s own web browser. This includes the identification of student subpopulations with respect to their use of online resources (Gasevic, Jovanovic, Pardo, and Dawson, 2017) or students’ engagement patterns in MOOC environments (Guo and Reinecke, 2014; Kizilcec, Piech, and Schneider, 2013). This distinction was true for both the preview and review daily activity, but the relationship was stronger for review activity. For example, the two URLs canvas.uci.edu/pages/segment-5 and canvas.uci.edu/files/89283 show how difficult it can be to determine content from a URL alone. As a result, understanding students’ self-regulatory behaviors and identifying effective ways to scaffold these behaviors is imperative for improving online learning outcomes.
Also, without a deep understanding of the context, researchers might fail to notice nuances of instructional design that actually play important roles in shaping students’ learning behavior. (Figure: how data is sent to Stream Analytics, analyzed, and sent on for other actions such as storage or presentation.) The authors found that being assigned to treatment had no effect on measured procrastination, spacing, or the composite time management score; students in the treatment group exhibited very similar engagement patterns to students in the control group. The raw log files contain a record for each student click event in the form of student ID, timestamp, and URL. Third, clickstream data allow for novel analyses that aim to advance understanding of how to identify and cluster student subgroups, as well as to personalize interventions to support learning processes. These details can reveal even more when they belong to registered users: here, demographic information is available, enabling the company to create targeted ads and other, more personalized offers. Since that time, clickstream data analysis has emerged as a powerful and cost-effective tool that can benefit businesses in several ways.
In studies like the above two examples, extensive data exploration and pre-processing, as well as careful decisions about measurement, are necessary to obtain valid conclusions. Second, the clickstream measures of time management were better predictors of students’ performance in the class than were the self-reported measures (Li et al. […]). It could be that a feature on that page isn’t working as it should, or that the website is asking users for information that they aren’t ready to give at that juncture. A plausible secondary mechanism by which the treatment could have affected outcomes is by inducing the treatment students to spend more time on their classwork. For instance, if a majority of users used the same search term to reach a site that led them to the brand’s website, the company can eliminate the middleman and ensure that its own site is optimized for that search term. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. For example, individual instructors at the same school may enable or disable specific LMS features, and the general structure of the available materials may vary across courses. A form of Web analytics, clickstream analysis is the tracking and analysis of visits to websites.
The number of task counts per day, for each of the 5 weeks, averaged over the students in each grade group. While these findings provide suggestive evidence that SRL skills play an important role in the learning process, most previous studies have relied on student self-reported instruments to measure SRL skills and investigate the role of SRL in online learning. This paper is designed to help instructors, administrators, and institutional researchers understand the basic concepts of working with clickstream data and the promising ways in which such data can affect instructional design and student learning. Converting the raw logs into streams of preview/review events consisted of (i) manually identifying both the relevant URLs for file downloads and the relevant times (such as that of a lecture or exam) associated with each file and then, given this information, (ii) creating a program to automate the assignment of a “preview” or “review” label to each individual download event by each student. Lighter-shaded hashes indicate fewer clicks and darker-shaded hashes indicate more clicks. But clickstream data doesn’t end there. “Just as drivers can take different roads to arrive at the same destination, customers take different paths online and end up buying the same product,” Qubole noted.
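Step (ii) of the preview/review conversion described above can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the `REFERENCE_TIME` lookup, its single entry, and the rule that a download before the associated lecture or exam time counts as "preview" (and any later download as "review") are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical, manually curated lookup (step i): file URL -> time of the
# lecture or exam that the file is associated with.
REFERENCE_TIME = {
    "canvas.uci.edu/files/89283": datetime(2018, 1, 10, 10, 0),
}


def label_download(url, clicked_at):
    """Label one download event as 'preview' or 'review'.

    Assumed rule: a download before the associated time is a preview,
    a download at or after it is a review. URLs that are not in the
    curated lookup are not labeled and return None.
    """
    reference = REFERENCE_TIME.get(url)
    if reference is None:
        return None
    return "preview" if clicked_at < reference else "review"
```

Applying `label_download` to every download click of every student yields the per-student stream of preview and review events; URLs missing from the lookup are skipped rather than guessed.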
In the case of our first example study, detecting changes in student behavior may be more relevant for a course that is offered over a longer term. In the case of our second example study, using a weekly vector of daily counts to examine patterns of procrastination is only possible when a course has a weekly repeated structure (that is, roughly the same deadlines, activities, and assignments each week). The left panels show aggregated daily counts for non-procrastinators and the right panels for procrastinators. There is consistent evidence that student online performance is associated with self-reported SRL skills overall. Students in the increased group had a higher probability of passing the course, while the opposite was true for the decreased group; students who increased their activity (relative to all of the students in the course) at some point during the course term had a higher chance of passing the class than those who did not. In contrast, students who received a C, D, or F are shown in the rightmost panel, with high levels of engagement, indicated by dark shading, on Fridays and much lower levels on other days. […] 2016; Kazerouni, Edwards, Hall, and Shaffer, 2017; Levy and Ramim, 2013), and how close together work sessions are (e.g., Baker et al. […]). For instance, if a large number of users exit the site on a certain page, it could be very telling about the page itself. Future work could test these hypotheses in a causal framework. Third, the remaining clicks from each student were sorted by timestamp and converted to a tokenized sequence, in which each click was assigned the unique ID for the content page it pointed to. Consequently, these data are not always accessible or useful for course instructors and administrators.
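The tokenization step mentioned above (sort each student's clicks by timestamp, then replace each click with the ID of the content page it points to) might look like the following minimal sketch. The function name, the tuple layout of a click, and the `page_ids` mapping are hypothetical; how the earlier filtering steps decide which clicks "remain" is not shown.

```python
from collections import defaultdict


def tokenize_clicks(clicks, page_ids):
    """Build one token sequence per student.

    clicks:   iterable of (student_id, timestamp, url) tuples
    page_ids: dict mapping a content-page URL to its unique integer ID
    Returns {student_id: [page_id, ...]} with tokens in timestamp order.
    Clicks whose URL is not in page_ids are dropped (an assumption that
    stands in for the earlier cleaning steps).
    """
    by_student = defaultdict(list)
    for student_id, timestamp, url in clicks:
        if url in page_ids:
            by_student[student_id].append((timestamp, page_ids[url]))
    return {
        student_id: [page_id for _, page_id in sorted(events, key=lambda e: e[0])]
        for student_id, events in by_student.items()
    }
```

Sequences of this kind can then be compared against the intended module order, for example to see whether a student watched Module 4 videos before Module 1 videos.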
For many years, educational researchers have tried to crack the “black box” of learning by better illuminating the learning processes that lead to learning outcomes. DOI: https://doi.org/10.1186/s41239-020-00187-1. […] (also: log file analysis). As we discuss in the “Using clickstream data to understand SRL” section, self-reported data may not be effective measures of SRL, as many individuals suffer from self-report bias and memories are often insufficient for students to accurately recall past behavior or predict future events. This finding highlights that understanding the actual pathways that students take could help instructors to identify and redirect struggling students or to redesign courses so that they are better aligned with students’ navigational behaviors. Because of the structure of the course and the available LMS data, only clickstream data related to lecture watching were analyzed in this study. Such behavioral differences outside of researchers’ focus will be invisible in the cleaned sequences and, like any other omitted variables in traditional education research, might bias researchers’ conclusions about the “interesting” clicks. Time of course interactions, control and treatment students, first and second weeks (left panel) and third through fifth weeks (right panel) of a 5-week Physics course. The authors provided a randomly selected half of the students in a for-credit online physics class with the opportunity to schedule when they would watch the lecture videos in an otherwise asynchronous, unscheduled class. Best of all, this is only the beginning when it comes to clickstream data.
Clickstream analytics can be a powerful tool for generating valuable business insights from the data logs collected from online platforms. These studies thus allow us to illustrate the promises and potential challenges of working with clickstream data in authentic education settings. Clickstream data can provide partial measures of some potential mechanisms (such as student time on task) but do not provide any purchase on understanding others. […] 2015; Michinov, Brunot, Le Bohec, Juhel, and Delaval, 2011), they are particularly critical to success in online, hybrid, and flipped courses, as these classes require a high degree of independence and autonomy (Bawa, 2016; Jaggars, 2011; Jenkins and Rodriguez, 2013; Park et al. […]). Second, significant heterogeneity in behavior and variability over time also often complicate analyses. The discussions of these four papers reflect the findings of extant literature by highlighting the ways in which clickstream data can be particularly helpful in this context (by providing time-varying measures, by showing how students use course resources, and by illuminating which specific behaviors interventions affect), but they also allow us to demonstrate the unique challenges and considerations that come with working with clickstream data. Author affiliations: University of California, Irvine, 2060 Education, Irvine, CA 92697, USA: Rachel Baker, Di Xu, Jihyun Park, Renzhe Yu, Bianca Cung, Fernando Rodriguez, Mark Warschauer, and Padhraic Smyth; University of Tübingen, Tübingen, Germany. One major line of research on using clickstream data is to measure student SRL behaviors with the goals of better understanding and supporting SRL (Roll and Winne, 2015).
Significant research has been done on clickstream data to understand the navigation behavior of users after they visit a Web site. Before we get into the actual application of clickstream data, it’s important to understand what this information is and where it comes from. For instance, by figuring out which paths users most frequently take on a site and which […] We then provide a synthesis of four of our own recent research studies that use clickstream data to examine student behaviors and outcomes in online classes. This ersatz measure of time-on-task is an example of how clickstream data can provide some, but not sufficient, insight into student behavior online. However, collecting and analyzing these data logs and extracting valuable insights from them in a reasonable time frame is quite challenging. Using these details, the company can look to create the most efficient path possible for customers, ensuring that they quickly and effectively locate the product they’re looking for and can easily complete their purchase. In addition, internet service providers and online advertising networks also have the capability to record and store clickstream information. Through discussions of four studies, we provide examples of the complexities and particular considerations of using these data to examine student self-regulated learning. By automatically recording students’ interactions with online course materials, clickstream data provide a valuable new source of information on student learning behaviors.
It is often implemented in the context of a broader market research strategy, such as the analysis of overall Web traffic and visitor data, as well as with other types of data sets used in conjunction with […] As shown in the first subgraph (left panel), the pages generally followed the intended order from Module 1 to Module 4 for students who earned an A. This type of information provides a visual trail of user activity with detailed feedback. In traditional learning environments, SRL, including students’ time management skills, is mainly measured by student retrospective self-report, which can neither capture how SRL unfolds nor provide timely measures to examine how SRL changes with environmental factors. In addition, recent work has interrogated the extent to which clickstream measures provide valid inference about various SRL constructs in two ways: (1) by examining whether students’ perceptions about their self-regulated learning correspond to their click patterns, and (2) by examining the extent to which clickstream measures complement self-reported measures in predicting student course performance. “Basket analysis helps marketers discover what interests customers have in common, and the common paths they took to arrive at a specific purchase.” The first approach is based on aggregate non-temporal representations of the clickstream information per student, in which information is combined over time.
Unlike most empirical papers that focus on results and outcomes, the discussions in this paper focus on the process of using clickstream data to understand student learning processes. The H-statistic comes from a Kruskal-Wallis test. Even if the researcher is able to obtain click data on user activity from a website outside of the LMS, matching student information on the external website poses another challenge. While such data only provide a partial and noisy record of a student’s actions, they enable practitioners and researchers to collect information at scale about how students interact with online education resources and thus promise more objective and richer insight into the learning experience than many other methods. Long answer: Clickstreams will tell you about user behavior. From the time a student generates a click event to the time a researcher obtains a representation of that same click event, that information may have passed through a pipeline of different pieces of software. Clickstream analysis is used to understand users on numerous levels, even down to the individual user. RB oversaw the entire submission and revision process, co-wrote the section describing clickstream data, co-wrote the sections describing the design and outcomes of interventions using clickstream data, and helped with data analyses for the studies using clickstream data to describe and intervene on student time management in online classes. For instance, clickstream data only capture students’ interactions with online materials.
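For readers unfamiliar with the Kruskal-Wallis test mentioned above, the H-statistic can be computed from pooled ranks as in the sketch below. The text does not say which groups or measures were compared, so the inputs here are placeholders; the function uses average ranks for tied values and, as a simplifying assumption, omits the usual tie-correction factor.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of samples (lists of numbers).

    H = 12 / (N (N + 1)) * sum_i(R_i^2 / n_i) - 3 (N + 1),
    where R_i is the rank sum of group i in the pooled data, n_i is the
    size of group i, and N is the total number of observations.
    Ties receive average ranks; no tie-correction factor is applied.
    """
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)
    # Map each distinct value to the mean of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
```

For three fully separated groups such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` the statistic is about 7.2, while groups with identical rank distributions give a value of 0.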
While there is a growing volume of studies that use clickstream data to measure student self-regulatory behaviors, rarely do these studies provide a detailed discussion about the complexities of constructing behavioral measures, the importance of contextual factors required to interpret clickstream data in meaningful ways, and the many caveats associated with these data. For example, researchers may choose to ignore clicks that are not directly related to the behavior of interest. Clickstream: the virtual “footprint” of a user within one provider’s online offering, or […] The benefits and caveats of using clickstream data to understand student self-regulatory behaviors: opening the black box of learning processes. Left: rows sorted by the number of total clicks. Right: rows are first grouped into three different behavioral groups and then ordered by the chronological location of the changepoint per student within each group. For instance, instructors can monitor which resources students use most and test different designs that might allow them to better calibrate the course, either to emphasize important resources that are valuable but under-utilized by students or to provide more resources that students favor, affording more targeted guidance and feedback (Bodily and Verbert, 2017; Diana et al. […]). Fourth, it is important for users of clickstream data to be careful about monitoring data quality. FR managed the grant that resulted in much of this work and contributed to the literature review sections, particularly the self-regulated learning overview.
The work presented in this paper was supported by the Investigating Virtual Learning Environments grant from the National Science Foundation (grant # 1535300). “Every time he or she clicks on a link, an image, or another object on the page, that information is recorded and stored.” Clickstream analyses are of great importance in the field of internet market research. Essentially, good clickstream data clearly define a full set of events, which allows you to get a complete picture of customer behavior. Moreover, unlike self-reported measures that are usually collected at only one or a limited number of time points, these measures can be used to investigate how student SRL behaviors unfold over time and to explore how personal and environmental factors influence SRL behaviors. The authors declare that they have no competing interests. This “noise” may exist in different dimensions, such as at the student level or at the click level. Therefore, the authors used the total number of clicks per week as a proxy for time on task and found no evidence that the treatment induced students to spend more time engaging with the course platform. JP conducted some analyses using clickstream data to describe student behavior and time management and wrote sections describing these studies. Multiple approaches, such as self-report questionnaires, observation, and think-aloud protocols, have been used to measure SRL, with self-report questionnaires being the most widely used (Schellings and Van Hout-Wolters, 2011; Winne, 2010).
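The clicks-per-week proxy for time on task described above reduces to a simple count. The sketch below assumes clicks arrive as `(student_id, timestamp)` pairs and that week 0 begins on a known course start date; both the function name and these conventions are illustrative rather than taken from the study.

```python
from collections import Counter
from datetime import date, datetime


def weekly_click_counts(clicks, course_start):
    """Count clicks per (student_id, week index).

    clicks:       iterable of (student_id, datetime) pairs
    course_start: date on which week 0 begins (an assumed convention)
    Clicks made before course_start end up in negative week indices.
    """
    counts = Counter()
    for student_id, timestamp in clicks:
        week = (timestamp.date() - course_start).days // 7
        counts[(student_id, week)] += 1
    return counts
```

As the surrounding text stresses, such counts are only a rough stand-in: a single click can precede an hour of reading or a few seconds of skimming, so the totals say how often a student interacted with the platform, not how long.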
For example, students with extremely low levels of activity (0 or 1 total clicks) were […] These data can be used to define and identify behavioral patterns that are related to student learning outcomes, suggest behavioral changes to students for greater success, and provide insights regarding the mechanisms by which education interventions affect student outcomes. The particular decisions must be driven by the specific context of the course under study. These tokenized sequences were then ready for modeling. In practice, most URLs can be readily assigned to categories such as “grades,” “file downloads,” “assignments,” or “quizzes.” This type of clickstream data can also be combined with LMS-provided information about additional student activities, such as the text content of search queries, text content in forum discussions, or interactions between students. The Motivated Strategies for Learning Questionnaire (MSLQ) developed by Pintrich et al. […] The authors read and approved the final manuscript. Although there are other ways to collect these data, clickstream analysis typically uses Web server log files to monitor and measure website activity. […] between the online offerings of different providers. Thus, clickstream measures of a number of SRL planning behaviors, such as procrastination, cramming, and time-on-task, are potentially incomplete and noisy due to data availability.
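To make the URL categorization described above concrete, here is a minimal sketch that parses one raw log record (student ID, timestamp, URL) and assigns the URL to a coarse category. The comma-separated record layout, the ISO timestamp format, and the segment-to-category mapping are all assumptions for illustration; an actual LMS export may look quite different.

```python
from datetime import datetime
from typing import NamedTuple
from urllib.parse import urlparse


class Click(NamedTuple):
    student_id: str
    timestamp: datetime
    url: str


# Hypothetical mapping from the first URL path segment to a category.
CATEGORY_BY_SEGMENT = {
    "pages": "pages",
    "files": "file downloads",
    "grades": "grades",
    "assignments": "assignments",
    "quizzes": "quizzes",
}


def parse_record(line):
    """Parse one 'student_id,timestamp,url' record (assumed layout)."""
    student_id, stamp, url = line.strip().split(",", 2)
    return Click(student_id, datetime.fromisoformat(stamp), url)


def categorize(url):
    """Return a coarse category for a URL, or 'other' if none matches."""
    # urlparse only separates host from path when a scheme is present.
    path = urlparse(url if "://" in url else "https://" + url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        return "other"
    return CATEGORY_BY_SEGMENT.get(segments[0], "other")
```

A category is only a first step: `canvas.uci.edu/files/89283` can be recognized as a file download, but which lecture or exam the file belongs to still has to be looked up separately.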
You most likely have conducted some form of clickstream analysis already. Personalized content is growing; just check out some major online retailers.