Behavioral data assists decisions: exploring the mental representation of digital-self

Purpose – The behavioral decision-making of the digital-self is one of the important research topics in the network of crowd intelligence. The factors and mechanisms that affect decision-making have attracted the attention of many researchers. Among the factors that influence decision-making, the mind of the digital-self plays an important role. Exploring the mechanism by which the digital-self's mind influences decision-making helps us understand the behaviors of the crowd intelligence network and improve transaction efficiency in the network of CrowdIntell.
Design/methodology/approach – In this paper, the authors use a behavioral pattern perception layer, a multi-aspect perception layer and a memory network enhancement layer to adaptively explore the mind of a digital-self and generate its mental representation from three aspects: external behavior, multi-aspect factors of the mind and memory units. The authors use the mental representations to assist behavioral decision-making.
Findings – The evaluation on real-world open data sets shows that the proposed method can model the mind and verify the influence of the mind on behavioral decisions, and that its performance is better than universal baseline methods for modeling user interest.
Originality/value – In general, the authors use the behaviors of the digital-self to mine and explore its mind, which is used to assist the digital-self in making decisions and to promote transactions in the network of CrowdIntell. This work is one of the early attempts to use neural networks to model the mental representation of the digital-self.


Introduction
The network of CrowdIntell (Chai et al., 2017; Wang et al., 2019) refers to a complex self-organizing ecological network formed by multiple intelligent subjects in the physical space, together with their behaviors, consciousness and information, which are mapped one by one to digital-selfs in the information space and interconnected. It is a three-dimensional fusion and deep superposition of the information space, the physical space and the psychological space.
The intelligent phenomena (Li et al., 2017b) in the information space are not only large in scale, but also deeply interconnected, widely interconnected and diverse in form, and they reside in the digital-selfs. To realize the interaction, cooperation and development of the numerous digital-selfs, we first need to fully, truthfully, correctly and synchronously project the intelligent subjects in the physical space and their minds in the psychological space into the information space. That is, we need to build a mental model and an interconnection model of digital-selfs.
In recent years, with the development of machine learning and deep learning, deep representation learning has attracted more and more attention and has provided new ideas, methods (Xudong et al., 2019) and guidance for the development of the CrowdIntell network. Through the powerful representation ability of neural networks, a hidden vector in a latent space is generated for each digital-self, so that each intelligent subject in the physical world corresponds to a dense vector that can express rich semantic information. A good representation can meet the needs of tasks and transactions in the CrowdIntell network, such as a social network discovering new friends or a recommendation system recommending potential goods. So, one of the most critical questions in representation learning is: How can we learn representations that satisfy our needs?
The behavior of a digital-self is influenced by its mind (Wang et al., 2019), so it is reasonable to explore the mind from the behavior of the digital-self. The mind is a complex abstract concept, which includes the interests, preferences, intentions and other related factors of the intelligent subject, and may even be influenced by social opinions or friends. For example, consider a cake promoted by a Web celebrity: users may not like to eat cake, yet still buy the celebrity-endorsed product. The user's behavior is thus driven by underlying factors from a number of aspects, which is what we call the mind.
We use the mind that has been explored to assist decision-making and facilitate the efficiency of the CrowdIntell network. For example, when going shopping in a mall, everyone hopes to find the goods they need as soon as possible. In addition, even when there is no clear demand, people are always attracted by goods that conform to their aesthetic taste and meet their potential intention (Yoshida et al., 2020). Our goal is to explore the mind of intelligent subjects (users) in the CrowdIntell network, retrieve the items most relevant to the mind and recommend them to the users, so as to facilitate transactions. At the same time, if the network can accurately provide optimal suggestions for users' decisions, this will catalyze users' dependence on the CrowdIntell network and promote both the development of the network and the intelligence level of the crowd.
In this paper, we propose a novel method called adaptive multi-aspect mental exploration (AM²E). It uses a behavioral pattern perception layer, a multi-aspect perception layer and a memory network enhancement layer to explore the mind of the digital-self and assist the digital-self in making decisions. Specifically, we use transformer encoders to encode behaviors and to model the dependencies and potential connections between behaviors. Latent representations of the different aspects are generated by a multi-dimensional attention module, which enables the more discriminative attribute information reflecting the user's mind to have a greater impact on the corresponding latent representation. A complex and comprehensive mental representation is then generated by the adaptive fusion module. In addition, the mental representation is enhanced with the assistance of a memory network. The result is a representation of the digital-self, i.e. the mind of the digital-self.
Our contributions in this paper can be summarized as follows. We propose to explore the mind, to assist the digital-self in making decisions by mining its mental representation from its behaviors and, finally, to improve the efficiency of trading in the network of CrowdIntell. We explore the mind from behavioral patterns and the multi-aspect factors behind the behaviors, and further enhance the mental representation through memory networks, forming a mental representation framework for digital-selfs. Experiments on real-world open data sets show that using mental representations to assist decision-making brings a significant improvement compared with universal interest-modeling baseline methods.

Related work
With the development of society, human beings have entered the era of the CrowdIntell network (Li et al., 2017b). We are in an environment where everything is interconnected. To realize the interaction, cooperation and evolution of the various digital-selfs in the CrowdIntell network, the establishment of a mental model is very important. The mental model is expected to lay a foundation for the theory of modeling and simulation in the research of crowd science and engineering. More specifically, the mind is crucial in the decision-making process (Kumar and Bishnu, 2019), and exploring mental representation can improve transaction efficiency.

Intent representation
The representation of user intent in behavioral items (Jingwei et al., 2019) has long been a research hotspot. The DIN model captures the user's interest points hidden in behavior items by introducing an attention mechanism. STAMP captures the user's long-term overall and current short-term interest preferences from behavior items. There are also efforts (Li et al., 2019; Cen et al., 2020) to capture multiple interest (or intent) representations in user behavior items. Although our work goes deeper than modeling intent and aims to explore the user's mind, there is still much to learn from and refer to in these studies.

Traditional method
Matrix factorization (MF) (Xiangnan et al., 2017; Steffen et al., 2012) is the most widely used traditional method. It obtains a hidden factor vector for each user and each item and estimates the user's predicted score for an item through the inner product between the two vectors. The implicit vector of the user can be simply understood as the user's mind, and the principle behind this is to find items related to the user's mind. BPR-MF (Steffen et al., 2012) uses MF with the pairwise Bayesian personalized ranking (BPR) loss. NeuMF (Xiangnan et al., 2017) uses both a neural network architecture and MF to model linear and nonlinear user-item characteristics.

Sequential method
Different from the traditional methods (Liu et al., 2015, 2017; Peng et al., 2018), the user's interactive behavior in the CrowdIntell network is strictly time-ordered. Sequential methods use historical behavioral data arranged in chronological order to model the mental representation for assisting decisions. Earlier applications of the sequential method (He and McAuley, 2016; Rendle et al., 2010) are based on Markov chains (MCs), which were designed to model sequential dependencies between user behaviors. A classic model, FPMC (Rendle et al., 2010), combines the two methods of MF and MC. With the wide application of deep learning technology, the recurrent neural network (RNN) shines brilliantly in the field of sequence problems. Two variants of the RNN, LSTM (Hochreiter and Schmidhuber, 1997) and GRU (Kyunghyun et al., 2014), are widely used. A large number of RNN-based works (Gharibshah et al., 2020; Balázs and Alexandros, 2015; Li et al., 2017a) have been explored as decision-assistant tools. Among them, GRU4Rec (Balázs and Alexandros, 2015) has attracted attention as a pioneering work. In addition, many new models have been proposed as variants of GRU, for example, by adding personalization (Quadrana et al., 2017), context (Smirnova and Vasile, 2017) and attention mechanisms (Li et al., 2017a). Simultaneously, convolutional neural network-based sequential models have been explored. Caser (Tang and Wang, 2018) proposed to embed a sequence of recent items into latent spaces and to learn sequential patterns using both horizontal and vertical convolutional filters. To overcome the limitation that the model's receptive field is bounded by the convolution kernel size, NextItNet (Yuan et al., 2019) was proposed, inspired by temporal convolutional networks. MIAR (Zhang et al., 2021) uses lightweight attention modules within convolutions to extract fine-grained features of users' multi-interest representations.
Because of the great success of the BERT model and the transformer in the field of NLP, the attention mechanism (Vaswani et al., 2017) has recently been incorporated into the sequential field. This differs from using the attention mechanism as an additional component of the original model [such as an RNN combined with an attention mechanism (Li et al., 2017a)]. Recently, pure attention-based models such as SASRec (Kang and McAuley, 2018) and BERT4Rec (Sun et al., 2019) have been proposed. These models rely on self-attention mechanisms to model sequential patterns within sequences. Apart from this, HGN adopts an adaptive gating network to model sequential features.

Problem statement
CrowdIntell network projects the interactions between intelligent subjects (users) and intelligent objects (items or products) in the physical space onto the network environment. However, at the present stage, the development of CrowdIntell network is in its infancy. The interaction behavior in the network is extremely sparse, and the stability and robustness of the network are poor. For the healthy development of the network, we need to facilitate transactions in the network to obtain a more comprehensive representation of the digital-self and a more stable interconnection structure.
We aim to explore the mind from the behavior and the multi-aspect factors behind the behavior, and to use mental representation to assist the digital-self in making decisions. In other words, we will use the mind to conduct behavior retrieval and recommend the behaviors most relevant to the mental representation to users, which are used to assist decision-making and improve the trading efficiency of CrowdIntell network. We can define it as follows.
Definition 1: Given the CrowdIntell network, we focus on one of its specific sub-domains (i.e. exploring the mind to promote trading efficiency). In this paper, $U$, $V$ and $G$ represent the sets of users, behavioral items and item attributes, respectively. An item $a \in V$ may have multiple attributes, which we denote as $G_a \subseteq G$.
Definition 2: A specific user $u \in U$ is associated with a sequence of historical behaviors, and we sort the behavior records in chronological order as $B^u = \big(X^u_1, X^u_2, \dots, X^u_{|B^u|}\big)$, where the index $i$ of $X^u_i$ denotes the relative time step.

Definition 3: With the above notation, our goal is as follows. Given a user $u$ and $u$'s historical behavior sequence $B^u$, our purpose is to infer $u$'s mind by comprehensive consideration and to recommend a list of behavioral items that maximizes the completion of transactions.
We summarize the process by the following formula. The input data are processed by an encoder to explore the mental representation. We then obtain the prediction scores of the candidate behavioral items through inner products and sort the candidate behaviors by score. The top-K items with the highest scores are recommended to the user. The prediction score can be simply expressed as

$$\hat{r}_{u,a} = f_{enc}\big(B^u\big)^{\top} e_a,$$

where $f_{enc}$ represents the encoder and $e_a$ is the implicit vector of behavioral item $a$. The prediction score $\hat{r}_{u,a}$ measures the probability that user $u$ interacts with item $a$.
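To make the retrieval step concrete, the following minimal PyTorch sketch (ours, not the paper's code; the helper name recommend_top_k is hypothetical) scores all candidate items by the inner product with an encoded mind vector and returns the top-K items:

```python
# Minimal sketch: rank candidate items by inner product with the encoded mind.
import torch

def recommend_top_k(mind: torch.Tensor, item_emb: torch.Tensor, k: int = 10):
    """mind: (d,) encoded representation f_enc(B^u); item_emb: (|V|, d)."""
    scores = item_emb @ mind                 # r_hat[u, a] = e_a . f_enc(B^u)
    top_scores, top_items = torch.topk(scores, k)
    return top_items, top_scores

# usage with random placeholders (|V| = 1000 items, d = 64)
item_emb = torch.randn(1000, 64)
mind = torch.randn(64)
items, scores = recommend_top_k(mind, item_emb, k=10)
```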

Proposed approach
In this section, we introduce the proposed method AM²E, which incorporates a multi-aspect perceptual module to learn multi-aspect factor representations and an adaptive mind fusion module to aggregate a fine-grained mind representation. More importantly, a memory network and a transformer encoder are introduced, which significantly improve the ability to learn features. The overall architecture of AM²E is shown in Figure 1.

Input layer
User IDs and item IDs are one-hot encoded, and item attribute data is multi-hot encoded, that is, an item may correspond to more than one attribute. The input of the model is the ID-coded representation after data preprocessing. It mainly contains the following data: the user input u, u's historical behavior sequence $B^u$ and related auxiliary information, such as the padding matrix and the actual length of the sequence.
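As an illustration of this input layer, the sketch below (an assumed preprocessing step, not taken from the paper; encode_inputs and its arguments are hypothetical) builds a fixed-length, left-padded ID sequence and a multi-hot attribute matrix for one user:

```python
# Illustrative sketch of the input layer: padded ID sequence + multi-hot attributes.
import torch

def encode_inputs(behavior_ids, item_attrs, num_attrs, L=20, pad_id=0):
    """behavior_ids: list[int]; item_attrs: dict item_id -> list of attribute ids."""
    seq = behavior_ids[-L:]                       # truncate to the last L behaviors
    seq_len = len(seq)
    seq = [pad_id] * (L - seq_len) + seq          # left-pad to fixed length L
    attr_multi_hot = torch.zeros(L, num_attrs)
    for pos, item in enumerate(seq):
        for a in item_attrs.get(item, []):        # an item may carry several attributes
            attr_multi_hot[pos, a] = 1.0
    return torch.tensor(seq), attr_multi_hot, seq_len
```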

Embedding layer
The original one-hot inputs of user and item IDs have very limited representation capacity, as they are high-dimensional and extremely sparse. Through a special fully connected layer, the features obtained from the input layer are transformed into dense low-dimensional vector representations. We embed user u's ID characteristics as $u_{em} \in \mathbb{R}^{d}$. For user u at time step t, we retrieve the input embedding matrix $E^{(u,t)} \in \mathbb{R}^{L \times d}$ by looking up the previous L behavioral items $X^u_{t-L}, \dots, X^u_{t-1}$, where $d$ is the latent dimensionality. Experiments show that position information in the sequence also has a clear promoting effect on the downstream tasks. Here, the sinusoidal position encoding function PE is used to map each position in the sequence to a position embedding, and the position-encoded embedding matrix is defined as

$$\hat{E}^{(u,t)} = E^{(u,t)} + \big[PE(1); PE(2); \dots; PE(L)\big].$$

At the same time, the sequence of interacted item attributes is transformed into an embedded matrix $C^{(u,t)} \in \mathbb{R}^{L \times d}$ by aggregating the multiple attributes of each item. Common aggregation methods include max pooling and average pooling; average pooling is adopted here.
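A possible realization of this embedding layer is sketched below, assuming standard sinusoidal position encodings added to the item embeddings and average pooling over (padded) attribute embeddings; the class and argument names are ours, not the authors':

```python
# Hedged sketch of the embedding layer: ID embeddings + sinusoidal positions,
# plus average-pooled attribute embeddings.
import math
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    def __init__(self, num_items, num_attrs, d=64, L=20):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d, padding_idx=0)
        self.attr_emb = nn.Embedding(num_attrs, d, padding_idx=0)
        pe = torch.zeros(L, d)                               # sinusoidal position table
        pos = torch.arange(L).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d, 2).float() * (-math.log(10000.0) / d))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, seq_ids, attr_ids):
        """seq_ids: (B, L) item ids; attr_ids: (B, L, A) padded attribute ids."""
        E = self.item_emb(seq_ids) + self.pe                 # (B, L, d): E + PE
        C = self.attr_emb(attr_ids).mean(dim=2)              # (B, L, d): average pooling
        return E, C                                          # padding kept for simplicity
```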

Behavioral pattern perception layer
By measuring the similarity of different behavioral items in the sequence, this layer models the potential relationship between behavioral items in the sequence, and then models behavioral patterns and the collaboration relationship between items. Transformer encoders are used as encoders here. Previous studies have shown that transformer encoders can effectively capture various types of sequence dependencies (such as point-level dependencies and group-level dependencies) and long-term dependencies.
When calculating the similarity of different items, various methods can be selected, such as the inner product, a multi-layer perceptron, or addition and subtraction. Here, the inner product is chosen to measure similarity. The attention mechanism is calculated as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V,$$

where $Q$ represents the queries, $K$ the keys and $V$ the values. The factor $\sqrt{d}$ plays a regulatory role so that the inner products do not become too large, otherwise the softmax may saturate.
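For reference, a direct PyTorch implementation of this standard scaled dot-product attention (the function name is ours):

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (..., L, d) tensors."""
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d ** 0.5   # scale keeps the softmax well-behaved
    weights = F.softmax(scores, dim=-1)
    return weights @ V
```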
We use the transformer framework to model the interaction between the items in the sequence and their contextual neighbors. Given the item representations $H^{b-1}$ of layer $b-1$, the output of the transformer encoder at layer $b$ is defined as

$$H^{b} = \mathrm{FFN}\big(\mathrm{MH}(H^{b-1})\big), \qquad \mathrm{MH}(H^{b-1}) = \big[\mathrm{head}_1; \dots; \mathrm{head}_h\big] W^{h},$$
$$\mathrm{head}_i = \mathrm{Attention}\big(H^{b-1} W^{Q}_i,\; H^{b-1} W^{K}_i,\; H^{b-1} W^{V}_i\big),$$

where the projection matrices $W^{Q}_i, W^{K}_i, W^{V}_i \in \mathbb{R}^{d \times d/h}$ and $W^{h} \in \mathbb{R}^{d \times d}$, $\mathrm{FFN}(\cdot)$ represents the feed-forward network, $h$ is the number of heads and $b$ the number of layers. Here, we omit the residual connections, dropout and layer normalization in the formula for convenience.
In our experiments, we can repeat the basic structure of the transformer encoder several times to obtain long-term and complex dependencies. The first self-attention block can consider similarities and potential connections between previous items. On this basis, we can model more complex relationships by stacking multiple attention blocks.
In general, the encoding process can be summarized as

$$H^{(u,t)} = f_{T\text{-}enc}\big(\hat{E}^{(u,t)}\big),$$

where $f_{T\text{-}enc}(\cdot)$ represents the abstract transformer encoding function and $H^{(u,t)} \in \mathbb{R}^{L \times d}$ represents the original sequence encoded by the encoder. After obtaining the output $H^{(u,t)}$ at the last layer, we take $H^{u}_{t-1} \in \mathbb{R}^{d}$, the representation of the user's most recent interaction item $X^u_{t-1}$, which is denoted by h for simplicity. Besides the most recent item, h also takes into account the influence of previous items and their synergistic effects within the sequence. Experiments also show that using h as the sequence representation works better than average pooling or max pooling.
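The whole behavioral pattern perception layer can be approximated with PyTorch's built-in transformer encoder, as in the sketch below; the authors' encoder may differ in details such as masking, dropout and weight sharing, and the function name is ours:

```python
# Sketch of the behavioral pattern perception layer: stack b encoder layers with
# h heads and read out the representation of the most recent item as h.
import torch
import torch.nn as nn

def encode_behaviors(E_hat, num_layers=2, num_heads=2):
    """E_hat: (B, L, d) position-encoded behavior embeddings."""
    d = E_hat.size(-1)
    # For brevity the encoder is built here; in practice it is a persistent module.
    layer = nn.TransformerEncoderLayer(d_model=d, nhead=num_heads,
                                       dim_feedforward=4 * d, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
    H = encoder(E_hat)          # (B, L, d): H^{(u,t)}
    h = H[:, -1, :]             # last position = most recent item X^u_{t-1}
    return H, h
```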

Multi-aspect perceptual layer
The multi-aspect perceptual layer is divided into two parts. The first part is the multi-aspect perceptual module, which measures the different attributes of the behavioral items and abstracts the attributes into different aspects of the user's mental factors. The second part is the adaptive fusion module, which adaptively considers the most recent item and fuses the user's multiple factors to generate a hybrid mind representation.
4.4.1 Multi-aspect perceptual module. The multi-aspect perceptual module measures the different influences of the items in the sequence on each aspect and combines them with the user's general preferences; a multi-dimensional attention module assigns weights to obtain representations of the user's mind for the different factors.
This enables the more discriminative attribute information reflecting the user's mind to have a greater impact on the corresponding mental factors. At the same time, the user's general preferences also influence the user's behavior. For example, if a user likes goods with a beautiful appearance, he will tend to choose good-looking goods whether he buys daily necessities or electronic products. In this layer, the attribute characteristics of the user's historical interaction behaviors and the user's general preference are integrated to explore the user's mind. Formally, the process is parameterized by $W^{u} \in \mathbb{R}^{d \times d}$ and $W^{k} \in \mathbb{R}^{d \times k}$, with $\tanh(\cdot)$ as a nonlinear activation function, $\odot$ denoting the element-wise product and $I^{U}$ an all-ones matrix with the same dimensions as $C^{(u,t)}$. One of the most important hyper-parameters, $k$, controls the number of aspects. $S^{I} \in \mathbb{R}^{L \times k}$ represents the multi-dimensional attention scores and $Z^{(u,t)} \in \mathbb{R}^{k \times d}$ is the mental matrix representation of the multiple factors, with each of its $k$ rows representing a specific aspect.
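Because the display equation for this module is not reproduced here, the sketch below gives one plausible reading of the multi-dimensional attention, in which attribute embeddings are gated by the broadcast user preference, projected to k attention columns and used to aggregate k aspect vectors; treat the exact formulation as our assumption rather than the authors' equation:

```python
# Hedged sketch of the multi-aspect perceptual module (formulation assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiAspectPerception(nn.Module):
    def __init__(self, d=64, k=5):
        super().__init__()
        self.W_u = nn.Linear(d, d, bias=False)   # W^u in R^{d x d}
        self.W_k = nn.Linear(d, k, bias=False)   # W^k in R^{d x k}

    def forward(self, C, u_em):
        """C: (B, L, d) attribute embeddings; u_em: (B, d) user general preference."""
        mixed = torch.tanh(self.W_u(C) * u_em.unsqueeze(1))  # element-wise product, user broadcast
        S_I = F.softmax(self.W_k(mixed), dim=1)              # (B, L, k) multi-dimensional attention
        Z = S_I.transpose(1, 2) @ C                          # (B, k, d): one row per mental aspect
        return S_I, Z
```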
4.4.2 Adaptive fusion module. For users, multiple factors affect their behavioral decisions. However, items that are highly correlated with recent behaviors tend to generate more feedback from users. Therefore, while maintaining the diversity and novelty of recommendations, we give a higher weight in the next recommendation to the mental factors related to the user's recent interactions, without discarding the other factors. The adaptive fusion process is parameterized by $W^{m} \in \mathbb{R}^{d \times d}$, with $\mathrm{ReLU}(\cdot)$ as a nonlinear activation function. $S^{Z} \in \mathbb{R}^{k}$ represents the similarity scores between the representation of the user's most recent interaction behavior and the user's mind factor representations, and $I^{(u,t)} \in \mathbb{R}^{d}$ represents the mixed mind representation after adaptive fusion.
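Again as a hedged sketch rather than the paper's exact equations, the adaptive fusion below scores the k aspect vectors against a projection of the recent-behavior representation h and fuses them with softmax weights:

```python
# Hedged sketch of the adaptive fusion module (combination rule assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.W_m = nn.Linear(d, d, bias=False)    # W^m in R^{d x d}

    def forward(self, h, Z):
        """h: (B, d) recent-behavior representation; Z: (B, k, d) aspect vectors."""
        query = F.relu(self.W_m(h))                                     # nonlinear projection of h
        S_Z = F.softmax((Z @ query.unsqueeze(-1)).squeeze(-1), dim=-1)  # (B, k) similarity scores
        I = (S_Z.unsqueeze(-1) * Z).sum(dim=1)                          # (B, d) mixed mind I^{(u,t)}
        return S_Z, I
```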

Memory network enhancement module
The adaptive fusion module uses the attributes of the items to model the user's multi-aspect mind factors in an implicit way (that is, it does not explicitly indicate what the mind factors are). An item has multiple attributes, indicating that different users may interact with it for different purposes. While it is possible to model items through their attributes, this is not sufficient to capture more complex underlying relationships and item associations. For example, the beautiful appearance of a commodity is not well reflected in its attributes. To model the deeper characteristics of items, we use a memory network, which encodes complex underlying item characteristics in internal storage through read and write operations. By means of the associative addressing scheme of the memory network, feature-level fusion is realized by adaptively discovering the item features related to the user's mind. Concretely, this article uses a key/value memory network, formally defined as $M^{K} \in \mathbb{R}^{|V| \times d_m}$ and $M^{V} \in \mathbb{R}^{|V| \times d_m}$, respectively, where $d_m$ is the dimension of the memory units. $M^{(u,t)}_{K}, M^{(u,t)}_{V} \in \mathbb{R}^{L \times d_m}$ represent the keys/values of the memory units corresponding to the previous L behavioral items. Given the user's mixed mind as a query, this query is used to find the appropriate combination of latent item features in the memory network: $S^{M} \in \mathbb{R}^{L}$ represents the scores of the mind over the items, and $M^{(u,t)} \in \mathbb{R}^{d}$ represents the final mind representation enhanced by the memory network, obtained after a shared mapping matrix.
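The following sketch illustrates one way to realize the key/value memory read described above, with per-item key and value slots, the mixed mind as the query and a shared output mapping; the query projection W_q is our own addition to make the dimensions line up, so the whole block should be read as an assumption:

```python
# Hedged sketch of the memory network enhancement (slot layout and mappings assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryEnhancement(nn.Module):
    def __init__(self, num_items, d=64, d_m=64):
        super().__init__()
        self.M_K = nn.Embedding(num_items, d_m)    # key memory, one slot per item
        self.M_V = nn.Embedding(num_items, d_m)    # value memory, one slot per item
        self.W_q = nn.Linear(d, d_m, bias=False)   # project the query into memory space
        self.W_o = nn.Linear(d_m, d, bias=False)   # shared mapping back to dimension d

    def forward(self, I, seq_ids):
        """I: (B, d) mixed mind; seq_ids: (B, L) ids of the previous L behaviors."""
        keys, values = self.M_K(seq_ids), self.M_V(seq_ids)   # (B, L, d_m)
        q = self.W_q(I).unsqueeze(-1)                          # (B, d_m, 1)
        S_M = F.softmax((keys @ q).squeeze(-1), dim=-1)        # (B, L) addressing scores
        read = (S_M.unsqueeze(-1) * values).sum(dim=1)         # (B, d_m) read-out
        return self.W_o(read)                                  # (B, d) enhanced mind M^{(u,t)}
```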

Prediction layer
As mentioned above, the user's mind needs to take into account: the user's behavioral patterns and the collaborative relationships between behavioral items, represented by h; the user's mixed mental factor representation $I^{(u,t)}$; and the memory-enhanced mixed mind representation $M^{(u,t)}$.

The predicted score for a candidate behavioral item $a$ ($a \in V$) at time step t is computed by combining h, $I^{(u,t)}$ and $M^{(u,t)}$ with the item embedding $e_a \in \mathbb{R}^{d}$ through inner products, where $e_a$ shares its parameters with the item embedding matrix of the embedding layer.
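Since the exact combination rule for h, $I^{(u,t)}$ and $M^{(u,t)}$ is not reproduced above, the sketch below simply sums the three representations before taking inner products with the shared item embeddings; this summation is our assumption, not necessarily the authors' formula:

```python
# Hedged sketch of the prediction layer (summed combination assumed).
import torch

def predict_scores(h, I, M, item_emb):
    """h, I, M: (B, d); item_emb: (|V|, d), shared with the embedding layer."""
    mind = h + I + M               # assumed combination of the three representations
    return mind @ item_emb.t()     # (B, |V|) scores r_hat[u, a]
```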

Model training
In this paper, a truncation and padding strategy is adopted to convert each user behavior sequence (excluding the last operation) into a fixed-length sequence $B^{(u,t)}$, and the interaction item $X^u_t$ at time step t is taken as the prediction target. We convert the prediction scores into probabilities and minimize the negative log-likelihood, i.e. we use the cross-entropy loss as the objective function, where $\Theta$ denotes the set of trainable model parameters, $\lambda$ is the regularization coefficient and $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function. The network is optimized by the Adam optimizer (Diederik and Jimmy, 2014), a variant of stochastic gradient descent with adaptive moment estimation.
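A hedged sketch of one training step consistent with this description, using one sampled negative item per positive, an explicit L2 term and Adam; the batch layout and negative-sampling scheme are assumptions on our part:

```python
# Hedged training-step sketch: sampled cross-entropy + L2 regularization + Adam.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch, l2=5e-4):
    seq_ids, attr_ids, pos_item, neg_item = batch      # next item and one sampled negative
    scores = model(seq_ids, attr_ids)                  # (B, |V|) predicted scores
    pos = scores.gather(1, pos_item.unsqueeze(1)).squeeze(1)
    neg = scores.gather(1, neg_item.unsqueeze(1)).squeeze(1)
    # log sigma(pos) + log(1 - sigma(neg)), averaged over the batch
    loss = -(F.logsigmoid(pos) + F.logsigmoid(-neg)).mean()
    loss = loss + l2 * sum(p.pow(2).sum() for p in model.parameters())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```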

Experiments
We take e-commerce, online services and other typical crowd intelligence scenarios as examples to verify the effectiveness of the proposed method. On standard benchmark data sets from different domains, we compare the proposed method with the baseline methods.

Data sets
This work conducts experiments on five common data sets collected from real-world platforms, which come from different domains and have different sparsity levels. To ensure that each user/item has enough interactions, we follow the preprocessing procedure in Zhou et al. (2020) and keep only the "5-core" data sets, meaning that users and items with fewer than five interaction records are deleted. The statistics of the processed data are summarized in Table 1. The Amazon [1] data set is widely used for performance evaluation. Following the product categories on the Amazon platform, this work selects three subcategories, beauty, sports and toys, and uses the categories and brands of the items as attributes. In the LastFM [2] data set, the artist tags given by the users are used as attributes. The Yelp data set is collected by Yelp [3], the largest review site in the USA. We follow the preprocessing procedure in Zhou et al. (2020) and use the transaction records after January 1, 2019. In addition, we treat the categories of businesses as attributes.
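The 5-core filtering can be reproduced with a few lines of pandas; the iterative loop and the column names below are our assumptions about the preprocessing, not the paper's script:

```python
# Hedged sketch of "5-core" filtering: iteratively drop users/items with < 5 interactions.
import pandas as pd

def five_core(df: pd.DataFrame, min_count: int = 5) -> pd.DataFrame:
    """df has columns 'user_id' and 'item_id', one row per interaction."""
    while True:
        user_ok = df.groupby("user_id")["item_id"].transform("count") >= min_count
        item_ok = df.groupby("item_id")["user_id"].transform("count") >= min_count
        if user_ok.all() and item_ok.all():
            return df
        df = df[user_ok & item_ok]
```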

Evaluation metrics
In our work, the "leave-one-out" strategy is adopted to split the data sets. For each user, we use the last behavioral item as test data and the item before the last one as validation data; the rest are used for training. To save computation resources and time, we randomly sample 99 negative items according to item popularity and combine them with the ground-truth item to form the candidate set. We report results with three popular top-K metrics, namely, hit ratio (HR@K), normalized discounted cumulative gain (NDCG@K) and mean reciprocal rank (MRR). Here, we empirically set K to 1, 5 and 10. We omit NDCG@1 because it equals HR@1.
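Given the 1-based rank of the ground-truth item among the 100 candidates, the three metrics reduce to simple per-user formulas, for example:

```python
# Per-user ranking metrics for the 1 ground-truth + 99 negatives protocol.
import math

def rank_metrics(rank: int, ks=(1, 5, 10)):
    """rank: 1-based position of the ground-truth item among the 100 candidates."""
    metrics = {"MRR": 1.0 / rank}
    for k in ks:
        metrics[f"HR@{k}"] = 1.0 if rank <= k else 0.0
        metrics[f"NDCG@{k}"] = 1.0 / math.log2(rank + 1) if rank <= k else 0.0
    return metrics  # averaged over all users in practice
```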

Baselines
We compare our method with the following baselines:
NeuMF (Xiangnan et al., 2017): combines generalized MF with a multilayer perceptron to capture both linear and nonlinear interaction features between users and items.
BPRMF (Steffen et al., 2012): based on BPR, it uses a pairwise ranking objective to rank all items for each user.
FPMC (Rendle et al., 2010): combines MF and MCs to fuse sequential and personalization information.
GRU4Rec (Balázs and Alexandros, 2015): an RNN variant that uses GRUs to capture sequential dependencies and make recommendations.
Caser (Tang and Wang, 2018): uses vertical and horizontal convolutions to learn users' sequential patterns for sequential recommendation.
STAMP: considers the impact of the user's current actions on the next step and captures the user's long-term overall and current short-term interest preferences.
SASRec (Kang and McAuley, 2018): uses a self-attention mechanism to capture the user's sequential patterns for sequential recommendation.
BERT4Rec (Sun et al., 2019): uses a bidirectional self-attention mechanism to model the sequence of user behaviors and constructs a bidirectional representation model through Cloze-task learning.
HGN: adopts adaptive hierarchical gating units to model sequential features.

Experiment settings
For fair comparison, we collected the open-source code or the source code provided by the corresponding authors. We implemented the baselines with PyTorch [4] based on those codes while keeping the data format and evaluation metrics consistent with our work. All hyper-parameters are tuned by grid search on the validation set. For the proposed method AM²E, we set the batch size to 256, the learning rate to 0.001 and the weight of the L2 regularization to $5 \times 10^{-4}$. The model latent dimension and the memory unit dimension are both set to 64. The maximum sequence length L is set to 20, as the average length is low in most cases. For the transformer encoder, we set the number of heads h and the number of layers b to 2. Another important hyper-parameter, the number of multi-aspect mind factors, is set to 5. Our experiments are conducted with PyTorch running on GPU machines (Nvidia GeForce TITAN RTX).

Performance comparison
The performance comparison results are shown in Table 2, where the best results are highlighted in italic, the second-best results are underlined, and improvement denotes the difference between the best and second-best results; all reported improvements over the baseline methods are statistically significant at the 0.05 level. From the results, we have the following observations.
Overall, the proposed model performs better than all baselines in the experiment. Among the baseline methods, the performance of the sequential methods (e.g. SASRec and GRU4Rec) is better than non-sequential methods (BPRMF and NeuMF). This shows that historical behavior records can be used to effectively model behavioral patterns and potential associations between items. In the sequential behavior baseline method, SASRec uses self-attention mechanism and outperforms the other baseline methods, indicating the effectiveness of attentional mechanism in modeling sequential behavior. In addition, the performance of HGN is comparable to SASRec. This indicates that the gating network can simulate the relationship between related items well.
We observe that all methods score relatively low on the sports data set. When a data set contains more attributes, the distribution of items is sparser; this overall sparsity makes it more difficult for the methods to model the potential correlations.
Different from simply modeling user interest or behavioral patterns, the proposed method comprehensively considers the multi-aspect mental factors of users and incorporates behavioral patterns into them. Experiments show that it outperforms all baseline methods. The significant improvement in the comparison results demonstrates the effectiveness of mental representations in decision-making.

Ablation study
There are three important components in our model. The first represents behavioral patterns and the potential relationships between behavioral items hidden in the historical behavior sequence. The second represents the adaptive fusion of multi-aspect mental factors. The third represents the hybrid mental representation enhanced by the memory network. We verify the effectiveness of these structures through ablation experiments. To ensure validity, we repeated each experiment several times and took the average as the final result. Figures 2 and 3 show the performance of our default method and its ablation variants on all data sets. We introduce them respectively and analyze their influence.
We removed behavioral patterns, multi-aspect perception and the memory network, respectively, to test the contribution of each component to the mental representation. DEFAULT stands for the full AM²E method, and the variants are named W\O BP, W\O MA and W\O MN, respectively. We also include SASRec and AM²E for comparison. From our experiments, we can see that all variants outperform SASRec and that AM²E is always at the optimal level, indicating that each part of the proposed method contributes to modeling the mental representation.
The importance of the three components varies from one data set to another. As the experimental results show, memory network plays a more important role in Yelp data set, multi-aspect perception is more important in LastFM data set, while behavioral pattern is crucial in Amazon data sets. In general, behavioral patterns and multi-aspect perception play a more important role than mental representation enhancement. Mental representation enhancement is more like icing on the cake.

Influence of embedding dimension
We analyzed the key hyper-parameter, embedding dimension, to understand the impact of embedding dimension on the performance of the proposed model in this paper.

The results for different embedding dimensions on the beauty data set are shown in Figure 4 and compared with other representative approaches. As shown in Figure 4, across the different embedding dimensions, the AM²E model outperforms the other approaches in most cases. This further demonstrates the effectiveness of the proposed model. We can also see that SASRec performs better with lower embedding dimensions, possibly because the transformer is expressive enough to capture sequence features and the model over-fits as the dimension grows. In contrast, the AM²E model proposed in this paper shows a trend of first increasing and then decreasing as the embedding dimension changes, which indicates that increasing the embedding dimension improves performance to a certain extent, but an embedding dimension that is too high leads to over-fitting of the model.

Influence of sequence length
We also analyzed another key hyper-parameter, sequence length, to understand the effect of sequence length on the performance of the proposed model in this paper. The results of different sequence lengths on the beauty data set are presented in Figure 5 and compared with other representative methods.
As shown in Figure 5, AM²E is generally superior to the other comparison methods under different sequence lengths. It is worth noting that the performance of the HGN model decreases as the sequence length increases, and HGN performs better when the sequence is shorter, which indicates that HGN models relatively shallow features and falls short when modeling longer dependencies. In contrast, the proposed AM²E model shows a slight upward trend as the sequence length increases. Combined with the fact that the average sequence length of the beauty data set is short, this indicates the stability and robustness of the model in the face of long sequences and complex dependencies.

Influence of feature
In the proposed method, we use the attribute information of behavioral items, whereas some sequential methods do not include attribute information. Previous works and experiments show that introducing attribute information can enhance the representational ability of items and achieve better results. To demonstrate that our approach models more complex underlying relationships rather than merely relying on attribute information, we implemented several methods, GRU4REC_F, SASREC_F and HGN_F, which combine the original sequential models with attribute information. Specifically, we merge the attribute embedding with the item embedding to replace the original item embedding. Figure 6 shows the performance of the proposed method and the methods described above on all data sets. From the experimental results, we can see that not all sequential models benefit from the introduction of attribute information, e.g. the performance of HGN degrades noticeably after attribute information is introduced. In most cases, the introduction of attribute information has a significant impact on the performance of the model, which is in line with our expectations. Another clear result is that, although these sequential methods introduce attribute information, they do not outperform our method on all data sets.

Conclusion
In this paper, we propose a novel method called AM²E. We design three different structures to explore mental representations, incorporating a behavioral patterns module, a multi-aspect factor fusion module and a memory network enhancement module in a comprehensive and integrated manner. Experiments on public data sets show that each structure plays an important role in mental representation. More importantly, the experiments show that using mental representations to assist decision-making yields a significant improvement compared with traditional interest-modeling methods. The CrowdIntell network provides more precise recommendations for users' behavioral decisions, which can catalyze the dependence of digital-selfs on the CrowdIntell network and promote the improvement of transaction efficiency.