Integration of biosensor to a window-based control system for user emotion detection to static and dynamic visual contents of webpages


Purpose
Capturing basic emotions in users’ experience of web applications and browsing is important in many ways. Quite often, online user experience is studied via tangible measures such as task completion time, surveys and comprehensive tests from which data attributes are generated. Prediction of users’ emotion and behaviour in some of these cases depends mostly on task completion time and number of clicks per given time interval. However, such approaches are generally subjective and rely heavily on distributional assumptions, making the results prone to recording errors. This paper aims to propose a novel method – a window dynamic control system – that addresses the foregoing issues.


Design/methodology/approach
Primary data were obtained from laboratory experiments during which 44 volunteers had their synchronized physiological readings – skin conductance response, skin temperature, eye movement behaviour and user activity attributes – taken by biosensors. The window-based dynamic control system (PHYCOB I) is integrated with the biosensor; it collects secondary data attributes from these synchronized physiological readings and uses them for two purposes: detection of optimal emotional responses and detection of users’ stress levels. The method’s novelty derives from its ability to integrate physiological readings and eye movement records to identify hidden correlates on a webpage.


Findings
The results from the analyses show that the control system detects basic emotions and outperforms other conventional models in terms of both accuracy and reliability, when subjected to model comparison – that is, the average recoverable natural structures for the three models with respect to accuracy and reliability are more consistent within the window-based control system environment than with the conventional methods.


Research limitations/implications
Graphical simulation and an example scenario are provided only for the control system’s design.


Originality/value
The novelty of the proposed model is its strained resistance to overfitting and its ability to automatically assess user emotion while dealing with specific web contents. The procedure can be used to predict which contents of webpages cause stress-induced emotions to users.



Introduction
Emotion recognition based on user experience of web applications and browsing is useful in many ways. Physiological readings can be captured as a form of distributional data. Such data are used by web designers and developers to enhance the navigational features of web pages. Also, rehabilitation therapists, mental-health specialists and other biomedical personnel often use computer simulations to monitor and control the behaviour of patients (Chen et al., 2000; Skadberg and Kimmel, 2004). Marketing and law enforcement agencies are perhaps two of the most common beneficiaries of such data, with the success of online marketing increasingly requiring a good understanding of customers' online behaviour. Law enforcement agents have also long used human physiological measures to determine the likelihood of falsehood in interrogations (Isiaka, 2017; Nielsen, 1994; Zhai and Barreto, 2006; Ugur, 2013; Filipovic and Andreassi, 2001; Smith et al., 1999).
Different types of biosensors can currently measure the emotional state of arousal in response to static and dynamic webpages; this paper is limited to the basic ones, namely skin conductance response (SCR), pupil dilation (PD) and skin temperature (ST). The SCR is a function f(s) of the electrical changes s of the skin that result from sweat. These sensors measure electrodermal activity, which grows higher during states such as interest, attention or nervousness and lower during states such as relaxation or boredom [equation (1)], depending on the task the user is involved in, where j can be any emotional arousal state. This expression can be further expanded in equation (2). The emotional state f(s) can be a stressed, neutral or relaxed mood, which is equivalent to 0, 1 and s if j (positive or negative affect) is substituted in the equation; s is a neutral mood that is neither 1 nor 0. User experience can be reflected and determined through users' attitude or behaviour (Ugur, 2013). A negative attitude towards a complex application shows a poor experience, whereas a positive attitude towards a complex application shows a good experience (Saint-Aime et al., 2009). The three possible emotional states discussed here that a user can experience during an interactive session are stress, relaxation and a neutral mood. Some websites have the potential to induce mixed emotions such as anger and frustration, which mostly make the user uncomfortable and dissatisfied at that moment; the emotion induced by this kind of reaction is normally termed stress, especially when users are using the application for the first time. If a user is relaxed during an interactive session with an interface, then we say the user finds the application less complex and easy to deal with (Nielsen and Molich, 1990; Nielsen, 1994; Healey and Picard, 2005; Isiaka, 2017).
Distinguishing and finding the similarities between stressed, relaxed and neutral affect can be achieved by conducting an experiment that uses physiological measuring sensors to monitor the users' reactions and collect the data. Stimuli eliciting psychological stress, in the form of webpages with static and dynamic contents, were considered; these are laboratory-based stimulus situations achieved by deactivating some contents on the webpages and noting how this affects the users.

Objectives
The main objectives of this paper are: to elicit user interaction and physiological response data from sampled users using biosensors in an ergonomic laboratory; to develop a window-based dynamic control system for detecting user emotions on webpages; to integrate the control system with the biosensor for generalised data extraction and analysis; and to make an analytical comparison with neural network and logistic regression.
Most of the time, users do not feel comfortable talking about their experiences in a usability study, such as letting people know what they really think or feel about a particular interface. This might be because they feel it is socially inappropriate or that they, rather than the interface, are the problem; this has been noted in older participants (Gross et al., 1997). Objective measures do not rely on a user's experience or assessment; rather, they record and measure time and task completion (Bergstrom and Schall, 2014) as user attributes. One novel approach we applied was adding physiological attributes. Physiological response measurements allow for further collection of objective measures of performance (Nielsen, 1994; Nielsen and Molich, 1990; Vermeeraen et al., 2010; Sauer and Sonderegger, 2009), rather than asking participants whether they found a task difficult, were surprised or had their attention divided when visual stimuli such as dynamic content suddenly appeared on screen. The SCR (Andreassi, 2000) can be used to measure their reaction. Objective skin conductance data, when combined with eye-tracking data, can give a different view of the user experience (UX), such as a user experiencing emotional arousal at the sudden appearance of dynamic content (Bergstrom and Schall, 2014). Sometimes users may subjectively rate what they feel as non-excited, not-amused, not-interested or not-stressed, but their physiological response readings may reveal that emotional responses occurred at that point in time, indicated by an increase in amplitude that signifies excitement, amusement, interest or stress (Davis, 1990; Kolakowska et al., 2013; Vasalou et al., 2004). The physiological measures used for this study are briefly discussed below, along with most of the user attributes used to develop the proposed model.

Pupil dilation and eye movement
PD does not only reveal changes in light intensity; it is also a measure of underlying cognitive processes (Figure 1) as the user interacts with visual contents. It provides indices of attention, interest or emotion which are correlated with mental workload and arousal. The variations in pupil change and the average pupil change for a given time interval are considered to be important when measuring eye movement and the behaviour of users in reaction to visual stimuli (Iqbal et al., 2004).
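As a rough illustration of the two pupil measures named above, the following sketch computes the average pupil change and its variation over one time interval. It assumes that "pupil change" means the difference between consecutive pupil-diameter samples and that variation is summarised by the population standard deviation; neither choice is stated in the source, and the function name `pupil_change_stats` is hypothetical.

```python
from statistics import mean, pstdev

def pupil_change_stats(diameters):
    """Return (average change, variation of change) for one time interval.

    `diameters` is a sequence of pupil-diameter samples. "Change" is assumed
    to be the difference between consecutive samples, and "variation" the
    population standard deviation of those differences. At least two samples
    are needed, otherwise there is no change to average.
    """
    changes = [b - a for a, b in zip(diameters, diameters[1:])]
    return mean(changes), pstdev(changes)

# Hypothetical samples (in mm) recorded while a user views one webpage.
avg_change, variation = pupil_change_stats([3.0, 3.5, 3.25, 4.0])
print(avg_change, variation)
```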
Eye movement is the behaviour of the eye during interaction; the eye gaze pattern is a measure of behaviour. The movement of a user's eyes is based on fixations (location of a user's eye gaze), saccades (rapid movements of the eye from one fixation to another) and fixation duration (length of time a user fixates on a particular area) (Figure 2) (Bergstrom and Schall, 2014). These parameters are essential when modelling user interaction and physiological response synchronisation, because they are important attributes in terms of emotion detection. The eye movement data obtained include the PD and fixations captured by the eye tracker. The derived variable is the saccade size $D$, which gives the Euclidean distance between two fixation points $(x_n, y_n)$ and $(x_m, y_m)$:
$$D = \sqrt{(x_n - x_m)^2 + (y_n - y_m)^2}$$
where $(x_n, y_n)$ are fixation points on the vertical plane of a webpage and $(x_m, y_m)$ are the fixations on the horizontal plane. For the SCR, the threshold is used to distinguish one consecutive peak from another and defines the tonic phase (baseline). Optimal response ($X$) on the SCR can be detected based on a given threshold that corresponds to a participant's response at onset and half recovery time of the SCR. This is applied to the physiological signal $P_k$ such that $X_k$ is the set of data points that results from resampling the raw SCR signal $P_k$, taking a window size or polynomial order of $2n + 1$ in $P_k$, for each time interval $T_k$. Each physiological measure undergoes this process depending on how noisy the data are.
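The two derived quantities in this paragraph can be sketched in a few lines. The first function computes the saccade size as the Euclidean distance between consecutive fixation points, as defined in the text. The second is only a stand-in for the smoothing step: the text gives a window of 2n + 1 samples but not the filter itself, so a centred moving average is assumed here. The function names and sample values are hypothetical.

```python
import math

def saccade_sizes(fixations):
    """Euclidean distance D between each pair of consecutive fixation points."""
    return [math.hypot(x_m - x_n, y_m - y_n)
            for (x_n, y_n), (x_m, y_m) in zip(fixations, fixations[1:])]

def smooth(signal, n):
    """Centred moving average over a window of 2n + 1 samples.

    Assumption: a plain moving average stands in for the filter used in the
    study. Near the edges the window is truncated to the samples available.
    """
    smoothed = []
    for k in range(len(signal)):
        window = signal[max(0, k - n): k + n + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical fixation coordinates in pixels.
print(saccade_sizes([(100, 200), (400, 600), (400, 650)]))  # [500.0, 50.0]
# Hypothetical raw SCR samples, smoothed with n = 1 (window of 3 samples).
print(smooth([0.0, 3.0, 0.0], 1))  # [1.5, 1.0, 1.5]
```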
ST changes according to blood circulation at the surface of the skin through body tissue (Kamon et al., 1974;Mindfield, 2014). The resultant ST is obtained using equation (4).
Bridging the gap between the rate of emotional response in average users and that in the most experienced users requires synchronised events in user interaction and the simulation of these physiological response processes (Brandt, 1999; Cooley et al., 1999; Widyantoro et al., 1999; Castaneda et al., 2007; Schneider-Hufschmidt et al., 1993).

Integrating the wireless biosensor to the control system
The eye tracker and Q-sensor have wireless capabilities that enable physiological readings and eye movement data to be synchronised and visualised in a system containing the control system application, which operates as a standalone. This can be done in real time, with the webpages from the eye tracker visualised on the control system's browser-inclined interface. Optimal response readings from the Q-sensor are mostly seen to correlate with visible spikes and web contents as visualised from the control system. Each user's reading can be generated using this process in real time (Figure 3). Section 2 describes the experimental set-up and data collection.

Method
The proposed model involves two modules. The first module generates the user attributes from the wireless sensors, which include the SCR sensor (Q-sensor), which also measures ST, and the eye tracker (Tobii eye tracker) (Kim et al., 2015; Bixler et al., 2015) (Figure 3), which measures eye movement behaviour. The second module makes predictions of the users' emotional response from the captured user attributes, based on the control system for modelling the physiological process of users in reaction to web stimuli. The neural network and logistic regression, both predictive models, were used for comparison. Before data collection commenced, the experiment was assigned reference number CS77, which was approved by the University of Manchester Senate Committee on the ethics of research on human beings. A total of 44 participants (12 female, 32 male), aged between 18 and 48 and above, were recruited from the workers and regular users of the web group. They were recruited through advertisement and recommendations from the University of Manchester. The study took less than 10 min.
The tasks in the study were straightforward and designed so that optimal responses could easily be detected and so that they matched what users were accustomed to; they included typing words into text-box content and clicking on icons. The participants interacted with static and dynamic web contents by completing six straightforward tasks (Table 1), each of which was designed to encourage interaction with an element. Some of these elements were represented as static information, while others were dynamic. The task "search" encouraged the users to interact with static contents, whereas the task "suggest" encouraged interaction with dynamic contents such as automated lists (ASL). The participants were seated, each facing a TOBII 1750 eye tracker. The webpage data and users' eye movements, which include fixations, saccades and PD, were recorded. The participants placed their two middle right fingers on the wireless SCR Q-sensor [Figure 4(b)], leaving the right hand free to perform tasks such as keystrokes and mouse manipulation. Analysis software embedded with the eye tracker was used to record the eye movements and fixations. The physiological readings can also be visualised with the control system integrated with the eye tracker; the system wirelessly receives data from the Q-sensor (Figure 3) in sync with the users' eye movement. The users commenced with the index page, which had links to the task-allocated webpages, for a total time of less than 10 min; interaction with each page was less than 120 s.

Table 1
Webpage                            Task
Google-Search                      "Locate Manchester University"
Google-Suggest                     "Locate Manchester University"
National-Rail-Enquiries-Search     "Look for a train-route from London to Manchester"
National-Rail-Enquiries-Suggest    "Look for a train-route from London to Manchester"
iGoogle                            "On the CNN.com box, locate news stories"; "Read the displayed text contents"
Yahoo Portal                       "Locate the entertainment, sports, news or stories"; "Read the displayed text contents"
Data were collected objectively, without interrupting the interaction, and exported to the control system.

Window-based dynamic control system
For the user attributes computed from data saved from the sensors, the entire system is represented by the expression in equation (6). This is the control system's model fit to the data, with a default prediction focus of 4 min from the original emotional stress levels of the user (class labels), where y(t) is the response variable (stress levels) that determines the coefficient of physiological reactions with the computed variables u(t), which represent the data matrix $Z_{m,p}$; each input variable has a p-value less than the default critical point (0.05).
The control system is designed in such a way that data from the sensor are imported from the index page (Figure 5). The system consists of two modules: the first module computes the user attributes, while the second module makes predictions from the user attributes. The steps for the algorithm are stated in Isiaka (2017). From the procedure, to compute the user attributes, X is set as a placeholder for the physiological data from the SCR/ST sensor and also PD/eye movement from the eye tracker, while Y is the placeholder for the eye movement data from the eye tracker sensor. The main aim is to generate a data set $Z_{m,p}$ with m instances and p attributes. The first step is to set the initial conditions; each participant i = 1, ..., 44 interacts with each webpage j = 1, ..., 6, from which the user attributes are computed and used to update the matrix $Z_{m,p}$ that serves as the secondary data (Figure 6). The correlates of optimal response are used to classify the status of participants for each user attribute. "Correlates" here represents events from the sensor and eye tracker that occurred at the time of optimal response (peaks) of the SCR. To execute the steps in the algorithm, each participant's generated data were considered based on differences in the baseline ($b_i$). For each person, the baseline is different; thus, the increase in amplitude ($a_i$) is computed based on a set threshold. As the baseline ($b_i$) for individual users is different, the latency for response time is particularly distinct. The average latency is determined by calculating each delay in a user's SCR amplitude, which involves taking the time readings at points corresponding to the minimum index of high tonic phases of the response signal. This determines the delay for each increase in amplitude of the SCR. The learning curve (Figure 7) in the model for cross-validation is used to visualise the error in the learning process of the model.
Figure 4. Wireless SCR Q-sensor
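The attribute computation described above can be illustrated with a minimal sketch. It is not the algorithm of Isiaka (2017), which is not reproduced in this text. It assumes that the baseline is the mean of the first few samples of a recording, that a response is a run of samples at least `threshold` above that baseline, and that latency is the delay from the start of a run to its peak. All names, the sampling rate and the sample values are hypothetical.

```python
from statistics import mean

def scr_features(signal, baseline, threshold, sample_rate_hz):
    """Summarise SCR responses that rise at least `threshold` above `baseline`.

    Assumes `signal` is a list and `threshold` is positive.
    """
    level = baseline + threshold
    responses = []   # (onset_index, peak_index) for each run above `level`
    onset = None
    # A trailing baseline sample acts as a sentinel that closes an open run.
    for k, value in enumerate(signal + [baseline]):
        if value >= level and onset is None:
            onset = k
        elif value < level and onset is not None:
            run = signal[onset:k]
            responses.append((onset, onset + run.index(max(run))))
            onset = None
    if not responses:
        return {"peaks": 0, "mean_amplitude": 0.0, "mean_latency_s": 0.0}
    return {
        "peaks": len(responses),
        "mean_amplitude": mean(signal[p] - baseline for _, p in responses),
        "mean_latency_s": mean((p - o) / sample_rate_hz for o, p in responses),
    }

def build_attribute_matrix(recordings, threshold, sample_rate_hz, baseline_samples=3):
    """Stack one attribute row per (participant, webpage) recording."""
    rows = []
    for (participant, page), signal in sorted(recordings.items()):
        # Assumed baseline: mean of the first samples of the recording.
        baseline = mean(signal[:baseline_samples])
        f = scr_features(signal, baseline, threshold, sample_rate_hz)
        rows.append([participant, page, f["peaks"],
                     f["mean_amplitude"], f["mean_latency_s"]])
    return rows

# Hypothetical recordings keyed by (participant i, webpage j).
recordings = {
    (1, 1): [1.0, 1.0, 1.0, 1.6, 2.0, 1.2],
    (1, 2): [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
for row in build_attribute_matrix(recordings, threshold=0.5, sample_rate_hz=2):
    print(row)
```

With 44 participants and six webpages, the same loop would yield one row per participant-page pair; the columns shown here are only a subset of the attributes the paper mentions.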

Results
The system integrates physiological readings and eye movement behaviour to produce a single interface where the stress points on the webpages can be seen. For example, a participant felt stressed while looking at ASL on AOI(1) and looking off screen from the Google-suggest page, which appeared as blue transparent dots. The spikes in the physiological readings generate an integrated interface with the users' affect state located on the webpages. The status of the user is derived from the computed secondary data. These computed parameters were obtained from physiological readings that correspond to eye movement and fixations on a webpage. The increase and decrease in amplitude of the SCR correspond to user activity. The average peaks, latency and amplitude were computed for the SCR, as were the mean PD and the mean skin temperature (MST) of users' responses to the different webpages. The emotional responses, denoted by stressed, neutral and relaxed moods, are indicated by the transparent blue, purple and red spots on the webpages; e.g. a participant experienced stress while looking at ASL on the Google page, the national rail enquiry page and picture content on the Yahoo page; a neutral mood is seen on two pages, while a relaxed mood is seen on a Google page.
To compare the control system's model with the neural network and logistic regression, each of the models was trained on a sequence of sub-samples and tested on the remaining part of the data with different test splits, using the emotional responses as labels for all iterations. Table 2 shows the best accuracy of all training sets. An optimal logistic regression model was selected from runs involving the forward, backward and stepwise models, and likewise for the neural network model on different schematic structures. The choices were based on the cross-validation error plot in Figure 7. For each split, the training/testing set was used as it indicates the best performance for all splits. The performance result shows that the control system has high accuracy on all sets of simulated data, except at simulated sets M3 and M4 (Table 2), where it shows the worst performance of all the training sets.
Figure 7. Cross-validation error curve for logistic regression, PHYCOB I and neural network from the split with best performance
For test sets with high performance models, the average number of predicted emotional stress levels with the true and false positive class for the three models is shown in Table 3 below. The predicted classes were obtained by projecting each model training set on the test set (new data), i.e. 30% of the original data.
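The evaluation step described here, holding out 30% of the data and counting true and false positives per emotional class, can be sketched as follows. This is a generic illustration rather than the authors' procedure: the split is a simple seeded shuffle, the function names are hypothetical, the 264 rows assume one instance per participant-webpage pair (44 × 6), and the labels in the usage example are made up.

```python
import random

def split_70_30(rows, seed=0):
    """Shuffle a copy of `rows` and return (training 70%, test 30%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def positive_class_summary(y_true, y_pred, positive):
    """Count true/false positives for one class and compute overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if p == positive and t == positive)
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"true_positive": true_pos, "false_positive": false_pos,
            "accuracy": accuracy}

# Assumed data size: 44 participants x 6 webpages = 264 instances.
train, test = split_70_30(range(264))
print(len(train), len(test))  # 185 79

# Hypothetical true and predicted labels for four test instances.
y_true = ["stressed", "neutral", "relaxed", "stressed"]
y_pred = ["stressed", "stressed", "relaxed", "neutral"]
print(positive_class_summary(y_true, y_pred, positive="stressed"))
```

Repeating the summary for each of the three models on the same test set would give comparable counts; averaging over several seeds would smooth out the effect of any single split.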
The optimal models for the logistic regression and neural network with the least errors are then compared to the control system's best output. With 70% of the data used for training, the system gives a significant performance of 0.90. This implies that the model learns more with a higher number of training sets than test sets compared to the other models. Even with high test sets, there is still an indication of highly significant performance, and the model seems to resist overfitting owing to regularisation by taking a smooth function of the variables. The cross-validation error of the training/test set for logistic regression gives the least error, between 0.08 and 0.13, at the 54th iteration, while the training/testing set error for the neural network model gives the least error of 0.14 at the 59th iteration. The learning curve shows that the model error decreases as the number of training sets increases. The variables that were optimal for each of the forward, backward and stepwise methods for logistic regression indicate the "mean peak" (MPeak), "MST" and "MappedFX" (eye movement) of the webpages as being the best parameters for the optimal model, with p-values less than the critical value of 0.05, while for the neural network the best features include MPeak, MST and SaccadeSize (saccade size).

Conclusion
This paper focuses on the integration of a biosensor with a window-based dynamic control system for detecting user emotional responses to web contents, which were used as stimuli. The system can also serve as a secondary indicator of user stress level based on individual interaction with each webpage. The novel approach we adopted was to determine the physiological correlates of user interaction with webpages by developing an algorithm that also serves as a tertiary indicator of stress level in users. To implement the control system, we first conducted an experiment both in real time and with a delay. In real time, it involved participants who are familiar with surfing the web. Data were exported from sensors (SCR and eye tracking) that measure the physiological response of users, including the SCR, ST and eye movement (fixations, saccades, PD), while they interacted with six webpages. The system then computes the physiological attributes. In delay, it reads in each individual's data and computes the physiological parameters mentioned from the readings, which help to group the emotional stress status of users for each webpage. It also identifies which physiological attributes have the most effect on user interaction with dynamic and static contents; these are the average peak in the SCR of users (MPeak), MST and the mapped fixation on the x-coordinate of the webpages (MappedFX). These attributes were also confirmed by other standard techniques. To test the model's reliability and significance, we compared the model with other standard techniques such as neural network and logistic regression. This paper opens the way to possible benefits in terms of predicting human behaviour with respect to visual experience and internet security, by using the tool as an alarm trigger for sending alerts on unauthorised access or abnormal activities online. A control link can also be generated to provide instant access to data visualisation at the time of physiological response generation.