Understanding student learning pathways in traditional online history courses: utilizing process mining analysis on clickstream data

Purpose – This study explores ongoing research into self-mapped learning pathways that students utilize to move through a course when given two modalities to choose from: one that is instructor-led and one that is student-directed.
Design/methodology/approach – Process mining analysis was utilized to examine and cluster clickstream data from an online college-level History course designed with dual modality choices. This paper examines some of the results from different approaches to clustering the available data.
Findings – By examining how often students interacted with others, whether they were more internal or external facing with their pathway choices, and whether or not they completed a learning pathway, this study identified five general tactics from the data: Individualistic Internal; Non-completing Internal; Completing, Interactive Internal; Completing, Interactive and Reflective; and Completing External. Further analysis of when students used each tactic led to the identification of four different strategies that learners utilized during class sessions.
Practical implications – The results of this analysis could potentially lead to the creation of customizable design models that can assist learners as they navigate modality choices in learner-centered or less-structured learning design methodologies.
Originality/value – Few courses are designed to give the learners the options to follow the instructor or create their own learning pathway. Knowing how to identify what choices a learner might take in these scenarios is even less explored. Preliminary data for this paper was originally presented as a poster session at the Learning Analytics and Knowledge conference in 2019.

The Self-Mapped Learning Pathways (SMLPs) instructional design methodology is a course design process with the goal of allowing learners to develop an individualized pathway throughout a course that has options for instructor-led and student-directed modalities. Learners can change and mix modalities at any point through the duration of the course. To date, this option has mostly been utilized in massive open online courses (Crosslin, 2018). This study seeks to understand how learners navigate these options when they are a part of a traditional 15-week college course. Process mining analysis was initially utilized to quantitatively document the clickstream artifact evidence of the pathways that learners mapped through the mixture of instructor-directed and less-structured options.

Literature review
SMLP courses were first conceptualized as dual-layer courses in 2014 (Crosslin, 2018). The first dual-layer course was the Data Analytics and Learning Massive Open Online Course (DALMOOC). The goal of designing courses with two layers was to grant learners the option to choose which design epistemology they wanted to participate in as they moved through the course. The two epistemologies that were chosen for DALMOOC were instructivism and connectivism. Instructivism was defined as an epistemological framework where course activities and content are determined by the instructor as a way to impart knowledge from the instructor to the learners (Porcaro, 2011). Connectivism was defined as an epistemological framework where course participants are learning through nebulous, shifting connections they make with other people and machines as they make sense of topics and skills of interest (Siemens, 2005). In general, instructivism is operationalized as an instructor-centered approach, while connectivism is operationalized as a learner-centered approach.
The design goal of DALMOOC was to situate both of these epistemological modalities next to each other in the course, allowing learners to choose which one they wanted to engage with. The main difference between this structure and other similar personalized learning structures (such as branching scenario lessons or adaptive learning software) is that course participants could change modalities as necessary, depending on their interests or changing needs as they learn (Crosslin, 2018). For example, learners could choose to follow the instructivist modality at first because they are unfamiliar with the topic. As learners gain familiarity, they could leave the instructivist modality and connect with other learners to examine the same topic from a different sociocultural context. Additionally, learners could check back with the instructivist modality from time to time to see what the instructor is teaching, while mixing this predetermined content with the new content they are discovering through the connectivist modality.
Initial research into DALMOOC as well as other courses that adopted the dual-layer model indicated that learners appreciated the ability to customize their learning pathways (Crosslin, 2018; Crosslin and Dellinger, 2015; Crosslin et al., 2018), even if technological limitations hindered some of those choices (Rosé et al., 2015). Other researchers also noted the need to focus on self-regulated learning (Dawson et al., 2015) and understanding how learners moved through the course modality options.
Due to these results, along with feedback from course participants, the dual-layer concept was dropped in favor of focusing on how learners decide to map their own way through the learning choices. This decision led to renaming the concept as SMLP. This change in focus was an intentional move toward a "heutagogical model that gives ownership and agency to the learner and respects their preferred approach to learning" (Bali and Caines, 2018, p. 17). Heutagogy is a self-determined learning theory that focuses on learners practicing how to learn about a topic rather than on memorizing or practicing specific reified knowledge that has been pre-determined by the instructor (Blaschke, 2012). However, as many learners are more accustomed to a single instructivist pathway in all courses (Onyesolu et al., 2013), this option is included as one of the choices in an SMLP course so that learners can utilize it as scaffolding toward their preferred approach to learning (or even choose it as their preference if they so desire). Very little research has been conducted into the patterns that form from the learners' self-mapping activities, or into why they made the choices that led to these patterns.
Learner agency is often viewed as a critical goal of formal educational efforts (Manyukhina and Wyse, 2019). Teachers do not want to create learners that are always dependent on others for their educational needs. However, learner agency involves students knowing how to make deliberate, intentional choices in their educational activities (Manyukhina and Wyse, 2019). By allowing students the option to make their own choices, the research team hopes to gain insight into their choices in order to refine course options to better promote learner agency. Understanding how learners map their own pathway options will supply course designers and instructors with valuable insight into how they can help (or hinder) learner agency efforts.
Additionally, the field of learning analytics has demonstrated interest in exploring learner choices, pathways and the impact of personalized content. Some examples of this include: providing customized content and activities by clustering and using the K-Means algorithm to predict learners' cognitive states (Troussas et al., 2020), personalizing learning pathways through data clustering (Iatrellis et al., 2020), examining temporal or sequential relationships of processes involved in self-regulated learning choices (Saint et al., 2021) and utilizing a sequence clustering algorithm to group students by similar learning pathway choices (Patel et al., 2017). These studies and others often focus on classifying, recommending and personalizing an instructor-led pathway that contains pre-selected learning resources (Ramirez-Arellano, 2019). Some studies have provided empirical evidence that the personalization of learning pathways can improve learning (De Smet et al., 2016; Yang et al., 2012). When students were allowed to make some of their own pathway decisions through tinkering, Berland et al. (2013) found that those pathways were different, but that there was not a clear pathway that worked better than others. Therefore, this paper seeks to expand tinkering to include complete pathway control while also following the work of others in the learning analytics field to classify and begin to understand these choices.

Methods
One of the challenges to designing a course that allows learners to take their own preferred approach to learning is that little is known about what they would do when given the choice to follow the instructor, map their own pathway or mix the two options. This study set out to examine trace activities in the clickstream data to see what insights this analysis would yield into the choices learners made.

Research question
Based on the need to understand how students engage with SMLP, this study investigated one primary research question: (1) What patterns, clusters or characteristics of students' pathway choices can be determined from process mining analysis of available clickstream data?

Research context
The research team implemented the SMLP methodology into fully online History of Civilization course sections offered at a four-year, public institution during three course offerings in 2017. These sections initially made use of the Blackboard learning management system and later included ProSolo (http://prosolo.ca/), a competency-based learning platform that promotes self-directed and social learning. Doing so provided students with greater flexibility to navigate through course material given the linear constraints of Blackboard, and the transition was seamless through single sign-on.
Learners originally had three pathways to choose from: one that was created by the instructor, one that focused on different geographic locations and one that was based on various historical themes such as political organization or religion/philosophy. In order to access materials, learners clicked on direct links that were integrated into their chosen pathway. In the second phase of the study, students accessed course material in ProSolo where Blackboard primarily served as the entry point and gradebook. The research team presented the content in an instructor-suggested order, but learners had the option to pick other paths through the material and were scaffolded with written instructions. Learners set goals for their own pathway through a course unit, and then reflected on the process after completion three times over the duration of the course. Near the end of the study, the research team moved all materials back into Blackboard only due to budgetary constraints.
In addition to allowing learners to create their own pathway through course content, the course design team also provided learners with the opportunity to choose their own assessments through assignment banks. Learners also had the option to propose their own type of submission and provide evidence of content mastery according to their own strengths, interests and professional goals. Assessment choices varied over the three semesters of the study, but there were also some standard assignments that all learners had to complete to show the development of specific skills required by historians.

Process mining analysis of clickstream data
Process mining consists of a set of techniques for analyzing data coming from event logs. Process mining analysis was initially chosen because recent research has found it can be helpful in identifying and detecting process patterns in self-regulated learning events (Bannert et al., 2013), patterns in learning behavior (Jovanović et al., 2017) and learning strategies (Saint et al., 2018).
Data used in this study consisted of trace data generated by students in the ProSolo platform. Learning sessions were represented by a sequence of learning events. A learning event is viewed as an occurrence centered on one learning action (Matcha et al., 2019). The pMineR tool was used to generate the process model of all learning sessions. This model is based on the First-Order Markov Model (FOMM), which calculates the probability of transitioning from each state (learning event) to another, under the assumption that the next state depends only on the current state and not on the previous ones.
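To make the first-order Markov assumption concrete, the following minimal Python sketch estimates transition probabilities by counting consecutive event pairs within each session. It is an illustration only and not the tool used in the study; the event labels (HOME, LIBRARY, COMPETENCY, ACTIVITY) are hypothetical stand-ins for the ProSolo event types.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next event | current event) from consecutive event pairs."""
    counts = defaultdict(Counter)
    for events in sessions:
        # Each adjacent pair (current, following) is one observed transition.
        for current, following in zip(events, events[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, row in counts.items():
        total = sum(row.values())
        probabilities[current] = {nxt: n / total for nxt, n in row.items()}
    return probabilities


# Hypothetical sessions; the labels are illustrative, not the ProSolo event names.
example_sessions = [
    ["HOME", "LIBRARY", "COMPETENCY"],
    ["HOME", "COMPETENCY", "ACTIVITY"],
    ["HOME", "LIBRARY", "ACTIVITY"],
]
fomm = transition_probabilities(example_sessions)
```

In this toy data, two of the three sessions move from HOME to LIBRARY, so the estimated probability of that transition is 2/3. Each row of the resulting table sums to one, which is the property a process model built on an FOMM relies on.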
Based on the generated FOMM, the learning sessions (representing sequences of learning events) were clustered by using the Expectation Maximization (EM) algorithm. These clusters are referred to as tactics, as proposed by Fincham et al. (2018). To examine each tactic (learning session cluster), three things were analyzed: the learning event frequency in each cluster, the distribution of learning events in each cluster (grouped by the order in which each one occurs in a learning session) and the process model of all learning sessions from the specific tactic.
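One plausible reading of this step is a mixture of first-order Markov chains fitted with EM, where each mixture component plays the role of a tactic. The sketch below follows that reading under stated assumptions: the random initialization, the additive smoothing constant `alpha`, the fixed iteration count and the event labels are all illustrative choices, and the study does not confirm that its clustering was configured this way.

```python
import math
import random


def _normalize(weights):
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def em_markov_mixture(sessions, n_clusters, n_iter=30, seed=0, alpha=1.0):
    """Soft-cluster non-empty event sequences with a mixture of Markov chains."""
    states = sorted({event for seq in sessions for event in seq})
    rng = random.Random(seed)
    # Start from random soft assignments (responsibilities) for each session.
    resp = []
    for _ in sessions:
        draws = [rng.random() + 1e-3 for _ in range(n_clusters)]
        resp.append([d / sum(draws) for d in draws])

    for _ in range(n_iter):
        # M-step: re-estimate mixing weights, start and transition probabilities.
        # Additive smoothing (alpha) keeps every probability strictly positive.
        mix, start, trans = [], [], []
        for k in range(n_clusters):
            mass = sum(r[k] for r in resp)
            mix.append((mass + alpha) / (len(sessions) + alpha * n_clusters))
            s_counts = {s: alpha for s in states}
            t_counts = {a: {b: alpha for b in states} for a in states}
            for r, seq in zip(resp, sessions):
                s_counts[seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    t_counts[a][b] += r[k]
            start.append(_normalize(s_counts))
            trans.append({a: _normalize(row) for a, row in t_counts.items()})

        # E-step: posterior probability of each cluster given the session,
        # computed in log space to avoid underflow on long sessions.
        resp = []
        for seq in sessions:
            log_post = []
            for k in range(n_clusters):
                lp = math.log(mix[k]) + math.log(start[k][seq[0]])
                lp += sum(math.log(trans[k][a][b]) for a, b in zip(seq, seq[1:]))
                log_post.append(lp)
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            resp.append([u / sum(unnorm) for u in unnorm])
    return resp


def hard_labels(resp):
    """Assign each session to its most probable cluster."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Hypothetical sessions: five internal-content flows and five external-content flows.
toy_sessions = (
    [["HOME", "LIBRARY", "ACTIVITY", "COMPLETE"]] * 5
    + [["HOME", "EXTERNAL", "EXTERNAL", "COMPLETE"]] * 5
)
toy_resp = em_markov_mixture(toy_sessions, n_clusters=2)
toy_labels = hard_labels(toy_resp)
```

Because the E-step depends only on the content of a session, identical sessions always receive identical responsibilities. Whether distinct flows end up in distinct clusters depends on the initialization, which is why EM is usually run from several starting points in practice.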

Preprocessing findings
While various other data analytic methods could possibly yield informative data for future studies, process mining analysis produced results that directly address the research question. This section explores some of the results from the final analysis.

Data
The data collected consists of time-stamped learning events that occurred within a specific learning session of a student. Each learning event has the following variables:
(1) actorID - anonymized identification number (ID) of a user who triggered the learning event
(2) sessionID - ID of the learning session within which the learning event occurred
(3) event - type of learning event
(4) timestamp - time at which the learning event occurred
(5) count - number of learning events within a learning session
(6) order - ordinal number of the learning event within a learning session
Additionally, each learning event type from the dataset has a unique description:
(1) COMMENT_ACTIVITY: Comment on a learning activity
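The six variables above map naturally onto a small record type. The following sketch, offered as an assumption about how such a log might be handled rather than a description of the study's code, stores one learning event per record and rebuilds each learning session as an ordered sequence of event types. The snake_case field names mirror the listed variables; the `VIEW_ACTIVITY` event type, the IDs and the timestamps are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningEvent:
    actor_id: str    # actorID: anonymized ID of the user
    session_id: str  # sessionID: ID of the learning session
    event: str       # event: type of learning event
    timestamp: str   # timestamp: time at which the event occurred
    count: int       # count: number of events in the session
    order: int       # order: ordinal number of the event in the session


def sessions_from_events(events):
    """Group events by session and return event types in their recorded order."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev.session_id].append(ev)
    return {
        sid: [e.event for e in sorted(evs, key=lambda e: e.order)]
        for sid, evs in grouped.items()
    }


# Hypothetical log rows; only COMMENT_ACTIVITY is an event type named in the text.
log = [
    LearningEvent("a1", "s1", "COMMENT_ACTIVITY", "2017-09-01T10:05:00", 2, 2),
    LearningEvent("a1", "s1", "VIEW_ACTIVITY", "2017-09-01T10:00:00", 2, 1),
    LearningEvent("a2", "s2", "VIEW_ACTIVITY", "2017-09-01T11:00:00", 1, 1),
]
session_sequences = sessions_from_events(log)
```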

Data preprocessing
Since the number of learning events per learning session varied, the outliers (i.e. the overly short sessions with only one event and overly long sessions above the 95th percentile in terms of the number of events) were excluded. After the exclusion, the number of learning events per learning session ranged from 2 to 33. There were a total of 22,427 learning events generated within 2,704 learning sessions by 99 students. Figure 1 depicts the process model for all learning events generated from the probabilities of the FOMM model.
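The exclusion rule described above can be sketched as a simple filter on session length. The study does not say how the 95th percentile was computed, so the nearest-rank definition used here is an assumption, as are the toy session lengths.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumed definition; others exist)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def filter_sessions(sessions, pct=95):
    """Drop one-event sessions and sessions longer than the pct-th percentile."""
    lengths = [len(events) for events in sessions.values()]
    upper = nearest_rank_percentile(lengths, pct)
    return {
        sid: events
        for sid, events in sessions.items()
        if 2 <= len(events) <= upper
    }


# Hypothetical data: sessions of length 1 to 19, plus one very long session.
raw_sessions = {f"s{n}": ["EVENT"] * n for n in range(1, 20)}
raw_sessions["s_long"] = ["EVENT"] * 100
kept_sessions = filter_sessions(raw_sessions)
```

With these 20 toy sessions the nearest-rank cutoff falls at 19 events, so the filter removes the single-event session and the 100-event session and keeps the remaining 18.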

Clustering learning sessions
By following the approach from Matcha et al. (2019), five different clusters were identified as tactics. Three points frame this clustering:
(1) Tactics are typically defined as the sequence of actions that a learner utilizes to complete a certain learning event.
(2) Process mining utilizes time-stamped learning action data to model connections between learning events for each student.
(3) The density of the connections between various learning actions is used to determine the number of tactic clusters.

Analysis of Tactic 1: individualistic internal (N = 789, 29.18%)
This tactic represents a traditional learning flow focused on internal content usage, average activity completion and minimal interaction. It is the most dominant tactic, consisting mostly of content access and activity completion learning events. Learning sessions mostly start from the homepage, advancing to the library or directly to the competency description page where students enroll in (start learning) a competency. After that, learners navigate to the activity details page. There is a relatively small proportion of activity completion learning events in this tactic as well as smaller numbers of interaction learning events like commenting. These aspects all seem to imply an instructivist learning pathway that focuses on activity and content with a small amount of interaction but a notable need to complete activities.

Analysis of Tactic 2: non-completing internal (N = 679, 25.11%)
Tactic 2 represents a semi-traditional learning flow focused on internal content usage with a small but notable amount of interaction. It is also one of the bigger clusters, consisting almost exclusively of content access learning events. Students navigate to the competency page either directly from the homepage or from the library page.
In these learning sessions, students mostly do not complete activities, but there are also interactive learning events present. All of these observations seem to imply an instructivist learning pathway that flows through content with less activity completion and slightly more need for interaction.

Analysis of Tactic 3: completing, interactive internal (N = 334, 12.35%)
This tactic depicts a learning flow focused on completion with noticeable interaction and internal content usage. It is one of the smaller clusters where students explore the learning content and then complete it (complete activities). Students are also interested in the assessments they have received for the completed competencies (completed in previous learning sessions, not the ongoing ones). Additionally, there is a higher presence of interactive [...]

Analysis of Tactic 4: completing, interactive and reflective (N = 649, 24%)
Tactic 4 groups learning sessions focused on internal content completion with substantial interaction and reflection where students interact with other students via comments they post on activities. Activity completion learning events are dominant here, meaning students reflect and share their reflections after they complete a learning activity. Distinctive from other clusters is the presence of credential completion learning events, which can also suggest that students reflect on their learning after they complete an entire unit (represented by a credential here). These aspects would also appear to imply an instructivist learning pathway, but one with more extensive interaction and reflection, possibly indicating extensive preexisting knowledge.

Analysis of Tactic 5: completing external (N = 253, 9.36%)
This tactic represents the smallest cluster of learning sessions in which students complete activities that are mostly related to external content (over internal content). Here, students mostly navigate to the external content and then come back to ProSolo to complete the activity. The chart for this tactic also indicated scattered social activities. All of these factors would appear to imply a connectivist, heutagogical and slightly social learning pathway.
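As a quick arithmetic check on the five tactic analyses, the reported cluster sizes can be summed and converted to shares of the 2,704 retained learning sessions. The short sketch below uses only the session counts given in the tactic headings; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Session counts per tactic as reported in the tactic headings.
tactic_sessions = {
    "tactic_1_individualistic_internal": 789,
    "tactic_2_non_completing_internal": 679,
    "tactic_3_completing_interactive_internal": 334,
    "tactic_4_completing_interactive_reflective": 649,
    "tactic_5_completing_external": 253,
}

total_sessions = sum(tactic_sessions.values())

# Share of all sessions per tactic, as a percentage rounded to two decimals.
tactic_shares = {
    name: round(100 * n / total_sessions, 2)
    for name, n in tactic_sessions.items()
}
```

The counts add up to the 2,704 sessions retained after preprocessing, and the computed shares agree with the percentages reported for each tactic.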

Detecting learning strategy based on tactics
A learning strategy is defined as the use of learning tactics by students to perform specific tasks to achieve their educational goals. According to Kovanović et al. (2015), agglomerative hierarchical clustering is one approach for detecting learning strategies in an online learning context. Therefore, students were clustered based on their use of specific learning tactics. Input into the clustering algorithm is the count of each learning tactic utilized by a student, along with the total number of tactics used. The Euclidean distance metric was used as the dissimilarity measure for the clustering algorithm. The data was first normalized since Euclidean distance is very sensitive to differences in scale between features. Students were grouped into four clusters, meaning four learning strategies were identified: Strategy 1 (N = 46), Strategy 2 (N = 11), Strategy 3 (N = 10), Strategy 4 (N = 32). In order to interpret different strategies, Figure 2 shows the distribution of tactic use within each strategy that was observed. Also, for each strategy, a separate process model was generated based on the FOMM capturing the sequences of students that belong to a specific strategy. Process models are presented in Figure 3.
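The student-level clustering described above can be sketched as normalization followed by bottom-up merging on Euclidean distance. The study does not state which normalization or linkage criterion was used, so min-max scaling and average linkage are assumptions here, and the four student rows are invented for illustration. Each row holds the counts for Tactics 1 to 5 followed by the total.

```python
import math


def min_max_normalize(rows):
    """Rescale every column to [0, 1] so no feature dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [(value - low) / span for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(rows, n_clusters):
    """Average-linkage agglomerative clustering (linkage is an assumption)."""
    clusters = [[i] for i in range(len(rows))]

    def average_distance(a, b):
        total = sum(euclidean(rows[i], rows[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (average_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for member in members:
            labels[member] = label
    return labels


# Hypothetical students: [tactic 1, tactic 2, tactic 3, tactic 4, tactic 5, total]
students = [
    [10, 5, 0, 0, 0, 15],
    [12, 4, 0, 0, 0, 16],
    [0, 0, 2, 20, 6, 28],
    [1, 0, 2, 18, 8, 29],
]
strategy_labels = agglomerative(min_max_normalize(students), n_clusters=2)
```

In this toy example the first two students, who rely on Tactics 1 and 2, fall into one cluster, and the last two, who rely on Tactics 4 and 5, fall into the other. This exhaustive pairwise search is adequate for a cohort of 99 students but would be slow on much larger samples.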
Strategy 1 was used by students with a low engagement level. These students mostly used Tactic 1 (Individualistic internal) at the beginning of the semester, with a high probability of continuing to use it throughout the term and a moderate probability of transitioning to Tactic 2 (Non-completing internal).
Strategy 2 groups students with similar behavior to students from the previous group. The difference is that students in Strategy 2 had a much higher level of engagement. Here, there was the same probability for a student to start the semester with either Tactic 1 (Individualistic internal) or Tactic 2 (Non-completing internal). If a student started with Tactic 1, there was a high probability to continue using it throughout the semester and a moderate probability to transition to Tactic 2. If a student started with Tactic 2, they either stayed in the same tactic (the highest probability) or switched to Tactic 1 or Tactic 5.
Students that belonged to Strategy 3 were highly engaged students that almost exclusively started their semester with Tactic 4 (Completing, interactive and reflective). Learners either stayed using the same tactic or switched to using Tactic 5 (Completing external), depending on the type of activities offered in a specific week of the semester.
Strategy 4 grouped students with similar behavior as in the previous strategy, but with a lower level of engagement. Students mostly started with Tactic 4 (Completing, interactive and reflective) and stayed with the same tactic throughout the semester. There was a moderate probability to switch to using Tactic 5 (Completing external).

Discussion, limitations and future study
So far, the analysis results have yielded insight into which tactics and strategies learners choose to take when given the choice to follow a predetermined internal activity pathway or an open-ended self-directed external activity pathway. Strategies 1 and 2 tended to be instructivist pathways, with Strategy 1 being individualistic and Strategy 2 adding more engagement and interaction. Strategies 3 and 4 tended to be more connectivist and interactive, with Strategy 3 being the most interactive and likely to engage with external activities when compared to Strategy 4.
Due to the student-centered nature of the design, analysis of tactics and strategies against grades and demographic factors was determined to possibly introduce too much instructor bias into the data. Future study could dig into what could be done to improve grade performance for each tactic or strategy (rather than determining which one might be "best"), but the goal for this study was to focus on what pathways learners would create as they self-determined their pathways.
For students utilizing Strategy 1, course designers should look for how they can support individualistic learners that stay on the instructor-led modality. A small chance exists that learners will not complete activities, so steps need to be taken to reach out to see what they need to complete their pathway as the course progresses. Reflection and goal setting activities can also possibly help learners complete more activities.
In Strategy 2, learner tactics are similar to those observed in Strategy 1 in that learners work individually to complete the course mostly in the instructivist modality, but they still like to interact with other learners in the process. For course designers, this combination means making opportunities for interaction that are helpful for those that need it, but not necessary for course completion so as to complete with Strategy 1. Strategy 2 also has a possibility of learners not completing activities, so it would also benefit from the same designs that help learners continue to work toward completion.
Strategy 3 seems to be the optimal methodology for learners to take full advantage of the possibilities with SMLP while still completing the course. These students were highly engaged, tended to complete activities, were highly reflective and also utilized external resources. Identifying the learners that utilized this strategy and studying what activities they completed could possibly give insight into pathway options that could help learners with other strategies. However, caution should be taken to not make it seem like this strategy is the preferred pathway for the course.
Students that utilized Strategy 4 were similar to those in Strategy 3, but with fewer interactions. This aspect mostly highlights the need to allow choice, as not all learners want or need to interact with other learners. Since students were still likely to complete activities, this strategy could possibly lend support to the idea that levels of interaction should be left up to the learner in a pathways course.
Because of the individualized nature of SMLP, all of the results in this study are integrally related to the context of the History course itself. Discussing the context apart from the design and the learner choices is nearly impossible, as is demonstrated throughout this entire paper. However, course designers for other learning contexts can still gain insight into learner choices by examining the results presented here.
The identified tactics and strategies also raise other questions that need deeper exploration, such as why did some learners skip any form of content and go straight to activity completion? Were students just clicking through the class to get done because they were not trying, or did they have a high level of pre-existing knowledge that was not being addressed in the course design? Future research into SMLP should seek to address these questions and others.

Conclusion
This study sought to understand the patterns, clusters and characteristics of students' pathway choices by performing process mining analysis on the available clickstream data from a History course at a public university. Five tactics were identified based on varying levels of instructiveness, internal or external focus, levels of completing and amount of reflection. Out of these five tactics, four learning strategies were identified that pointed to whether or not learners chose to map their own pathway, follow the instructor-led pathway or mix the two. Various ideas for supporting learners in each of these learning strategies were also discussed.
The work in this study can shape the future of the analytics and learning fields by providing a data-driven basis for guiding future individualized learning tools. The theoretical design of this course gave learners different options to choose which modality to use as they molded their individual learning pathway. By developing systems that can identify learning tactics and strategies, educators can then create dynamic interactive interfaces that can provide suggestions and guidance for learners that are still trying to figure out what options to choose while simultaneously generating an up-to-date artifact of their learning pathway. The challenge for this system would be in creating something that is open-ended enough to allow for flexibility, while still providing enough structure for those that are new to the idea.