Is voice really persuasive? The influence of modality in virtual assistant interactions and two alternative explanations

Purpose – Virtual assistants are increasingly used for persuasive purposes, employing the different modalities of voice and text (or a combination of the two). In this study, the authors compare the persuasiveness of voice- and text-based virtual assistants. The authors argue for perceived human-likeness and cognitive load as underlying mechanisms that can explain why voice- and text-based assistants differ in their persuasive potential by suppressing the activation of consumers' persuasion knowledge.
Design/methodology/approach – A pre-registered online experiment (n = 450) implemented a text-based and two voice-based (with and without interaction history displayed in text) virtual assistants.
Findings – Findings show that, contrary to expectations, a text-based assistant is perceived as more human-like compared to a voice-based assistant (regardless of whether the interaction history is displayed), which in turn positively influences brand attitudes and purchase intention. The authors also find that voice as a communication modality can increase persuasion knowledge by being cognitively more demanding in comparison to text.
Practical implications – Simply using voice as a presumably human cue might not suffice to give virtual assistants a human-like appeal. For the development of virtual assistants, it might be beneficial to actively engage consumers to increase awareness of persuasion.
Originality/value – The current study adds to the emergent research stream considering virtual assistants in explicitly exploring modality differences between voice and text (and a combination of the two) and provides insights into the effects of persuasion coming from virtual assistants.

Virtual assistants and perceived human-likeness
Human-likeness is a key concept in the field of human-machine communication (Guzman and Lewis, 2020). Several studies indicate that humans ascribe human characteristics to nonhuman agents such as virtual assistants (Epley et al., 2007; for a literature review see Rapp et al., 2021). However, very little is known about the extent to which the modality through which we communicate with a virtual assistant (voice-based and/or text-based) influences our perceptions of human-likeness. To the best of our knowledge, only one study so far has examined differences in perceived human-likeness between voice and text modality of the same virtual assistant source. Cho et al. (2019) found that voice (vs text) is perceived as more human-like, subsequently leading to more positive attitudes toward a virtual assistant for utilitarian tasks. As virtual assistants become increasingly prevalent in the marketing field to influence consumers, our research aims to add to this explicit modality comparison by applying it to a persuasion context.
We conceptualize perceived human-likeness as a combination of anthropomorphism or humanness of a technology (adapted from the humanness index) and social presence (Cho et al., 2019). Anthropomorphism is the attribution of human qualities to the virtual assistant (such as friendliness or lifelikeness, Kim and Sundar, 2012), while social presence taps into the perception of communicating with a social interaction partner (Lee, 2004). Even though there are conceptual differences in the literature, the two concepts are closely related, as they both relate to the sociability or the human touch of a (virtual) entity.
We draw on the Computers are Social Actors paradigm (Reeves and Nass, 1996), stating that humans apply social rules to interactions with technology similarly to interactions with other human beings. Following this paradigm, using social cues in technology interactions enhances social responses. Given that the interaction with voice is an inherently human characteristic (Pinker and Bloom, 1990), a virtual assistant emulating human voice is expected to be perceived as more human-like in comparison to communicating via text only (see Figure 1). This assumption is supported by Schroeder and Epley (2016), who found in a series of experiments that paralinguistic cues in speech (e.g. pace, intonation) influence people's perceptions of human-likeness in comparison to text-based interaction. To extend these previous findings, we first examine whether voice itself (in comparison to text) enhances perceptions of human-likeness. We propose the following:
H1. A voice-based virtual assistant is perceived as higher in human-likeness than a text-based virtual assistant.

Virtual assistants and persuasion knowledge
A large body of literature has been devoted to understanding how people recognize, process and respond to persuasion techniques that are subtler than traditional advertising, such as sponsored content, brand placement or advergames (Boerman et al., 2012; Tutaj and van Reijmersdal, 2012). Building on the persuasion knowledge model of Friestad and Wright (1994), one central claim of this line of research is that these formats include hidden persuasive attempts that are less identifiable than traditional advertising. Studying these more subtle forms of advertising is important considering consumer empowerment, as consumers might be more prone to persuasive attempts that can potentially be misleading.
Virtual assistants can embed a persuasive attempt in their messages, e.g. by advertising brands in their communication. The persuasion knowledge model is therefore a useful anchor point to develop a research model for explaining the persuasive effects of advertising delivered by a virtual assistant. The concept of persuasion knowledge describes a range of competences that are related to the understanding of advertising in general, and the persuasive intent of advertising more specifically (Tutaj and van Reijmersdal, 2012). Persuasion knowledge can be developed through experience with the persuasion technique and through socialization (Friestad and Wright, 1994). While consumers might already have developed some persuasion knowledge with regard to other persuasive techniques (e.g. sponsored content, celebrity endorsement; Boerman et al., 2017), it can be assumed that the persuasion knowledge about virtual assistants as a new technique is less refined.
While originally conceptualized as a dispositional variable (Friestad and Wright, 1994), several scholars have shown interest in situations that can activate higher or lower levels of persuasion knowledge following a specific persuasion tactic (Boerman et al., 2012; Campbell, 1995; van Noort et al., 2012; Tutaj and van Reijmersdal, 2012). Within the broad range of conceptualizations in literature and empirical studies (for an overview, see Boerman et al., 2018; Ham et al., 2015), the current study specifically applies consumers' understanding of a persuasive or selling intent in an online advertising context to the context of virtual assistants. Hence, persuasion knowledge is treated as a situational variable (van Noort et al., 2012; Tutaj and van Reijmersdal, 2012).
Perceived human-likeness as the primary explanation for reduced persuasion knowledge
This research studies perceived human-likeness as the primary underlying mechanism explaining the effect of modality on reduced persuasion knowledge. Examining the relationship of perceived human-likeness and persuasiveness is crucial, as virtual assistants are characterized by their social or human-like cues that can potentially make them more influential. Research on paralinguistic cues builds on the idea that voice, in comparison to text, provides more opportunities for the illustration of human traits, states and feelings in the communication (Van Zant and Berger, 2019). This line of research has found voice to influence persuasiveness via positive perceptions of these human characteristics (i.e. having a confident appearance; Van Zant and Berger, 2019).
Moreover, Campbell and Kirmani (2000) identified the accessibility of ulterior motives as a key factor responsible for whether persuasion knowledge is used. In other words, if consumers infer that the underlying motive for the conversation with the virtual assistant is persuasive (rather than social) in nature, they are more likely to activate persuasion knowledge. This applies especially in a situation in which a persuasive attempt is embedded in a conversation. We argue that when the interaction via voice is perceived as more human-like, virtual assistant users apply social responses toward it (following the CASA paradigm; Reeves and Nass, 1996), including greater social attractiveness (Lee, 2010).

Consumers might thus be more likely to infer social motives for the interaction with the virtual assistant, such as relationship building or helping, rather than a persuasive motive. Hence, the perception of human-likeness might lead to lower persuasion knowledge. However, since empirical evidence is lacking to formulate a supported hypothesis, we propose the following research question:
RQ1. Does interacting with a voice-based virtual assistant lead to lower persuasion knowledge in comparison to interacting with a text-based virtual assistant, mediated by higher perceptions of the virtual assistant's human-likeness?
Cognitive load as an alternative explanation
As an alternative explanation, this study includes cognitive load. Cognitive load has been shown to play an important role when examining persuasion (e.g. Berry et al., 2005).
The concept represents the mental burden that a particular task imposes on a user's cognitive system (Paas and Van Merriënboer, 1994). Text is self-paced (and thus more controllable) and allows users to go back and forth in the interaction (for text effects in multimedia learning, see Schmidt-Weigand et al., 2010). Based on cognitive load theory (see Leahy and Sweller, 2011), it is likely that interacting via text imposes a lower cognitive load on the user compared to voice, which is more demanding. This notion has been supported by a study on virtual assistants by Berry et al. (2005), which found text to be more easily understood than voice. Further, research on modality differences (for example in the word-of-mouth context) showed that the asynchronous nature of written communication allows greater time to construct and refine what to say (Berger and Iyengar, 2013). To test this assumption, we propose the following [1]:
H2. Cognitive load is higher for interacting with a voice-based virtual assistant than for interacting with a text-based virtual assistant.
For persuasion knowledge to be activated and utilized, information has to be retrieved from memory (Campbell and Kirmani, 2000; Hossain and Saini, 2014). This makes cognitive processing the second primary antecedent (Kirmani and Campbell, 2008) for both recognizing the persuasive attempt and responding to it (Campbell, 1995; Hossain and Saini, 2014). Research showed that cognitively busy people are less likely to activate and use their persuasion knowledge in a given situation (Campbell and Kirmani, 2000). Applied to the virtual assistant interaction, this means that the higher the cognitive load imposed on a consumer by communicating with the assistant itself, the less cognitive processing of the actual (content of the) interaction occurs. If this is the case, the consumer is then also less likely to activate their persuasion knowledge. The effect of voice on persuasion knowledge might then be attributed to differences in cognitive load, complementing perceived human-likeness of the voice-based virtual assistant. To test this assumption, we propose the following research question including both perceived human-likeness and cognitive load:
RQ2. Does interacting with a voice-based virtual assistant lead to lower persuasion knowledge in comparison to interacting with a text-based virtual assistant, mediated by both higher perceptions of the virtual assistant's human-likeness and higher cognitive load?
Interaction history reducing cognitive load
Virtual assistants can implement the display of interaction history in addition to using voice, as is done, for example, in smart displays (Seifert, 2020). In other words, everything the voice-based virtual assistant delivers verbally is translated into text and shown on the screen.
Hence, cognitive load can be lifted in both the communication with the text-based and the communication with the voice-based virtual assistant. To make sure that the effects proposed can be attributed to the communication modality, we also need to examine whether interacting with voice while seeing the interaction history results in the same effects as voice alone.
This also allows us to study a wider array of virtual assistant modalities that are implemented in practice (e.g. smart displays). Van Zant and Berger (2019) proposed that the human-likeness of paralinguistic cues might increase persuasion, but that these effects might disappear in the presence of linguistic cues that facilitate the detection (e.g. displaying the interaction in text on the screen). In other words, when cognitive load is lower, consumers have more cognitive resources left to detect persuasive intents coming from the virtual assistant. Further supported by classical modality studies (e.g. Pfau, 1990), text-based communication triggers people to be more focused on content characteristics and less distracted by source characteristics. A further explanation for cognitive load being lifted when voice-based communication is accompanied by text lies in the dual coding theory (Paivio, 1986). Dual coding theory states that verbal and visual information are coded differently, and additive effects exist for both types of codes. For the current study this means that the combination of verbal (i.e. voice) and visual (i.e. text) is easier to comprehend than text alone. Hence, to test whether the mediating effect of perceived human-likeness still holds when cognitive load is lifted in the voice-based virtual assistant through the display of interaction history, we propose the following:
RQ3. Does interacting with a voice-based virtual assistant supported by a text-based interaction history lead to lower persuasion knowledge in comparison to interacting with a text-based virtual assistant, mediated by higher perceptions of the virtual assistant's human-likeness?

Persuasion knowledge and advertising effectiveness
Virtual assistants are used in marketing to disseminate persuasive messages and, ultimately, to increase advertising effectiveness. Hence, it is imperative to understand how persuasion knowledge and advertising effectiveness are related in the context of virtual assistants implementing different modalities. Persuasion knowledge may exert different effects on different types of responses. To examine the subsequent effects of persuasion knowledge on advertising effectiveness, we include three brand-related outcomes in our research model: affective (brand attitudes), cognitive (brand memory) and behavioral (purchase intention) outcomes.
We suggest that increased persuasion knowledge positively influences cognitive outcomes such as brand memory. To retrieve and utilize persuasion knowledge, people need to elaborately process the communicated message (Buijzen et al., 2010). Higher persuasion knowledge then in turn also increases the likelihood of remembering the communicated brands, since cognitive processes are activated. Research on sponsorship disclosure showed a positive relationship between understanding that a message is persuasive and recognition and recall of an advertisement (e.g. Boerman and van Reijmersdal, 2020).
However, this line of research also showed that understanding a persuasive or selling intent negatively influences more affective processes (Boerman et al., 2017; Boerman and van Reijmersdal, 2020) and behavioral intentions (Choi et al., 2018). Thus, we propose persuasion knowledge to negatively influence brand attitudes and purchase intention as indicators for advertising effectiveness. As the persuasion knowledge model (Friestad and Wright, 1994) and theories on reactance (for an overview, see Fransen et al., 2015) suggest, people use their persuasion knowledge to cope with a persuasive attempt. Hence, persuasion knowledge enhances the critical assessment of advertising that might be perceived as a threat to people's individual freedom (Brehm, 1966). Even though not all previous studies found effects of persuasion knowledge on brand-related outcomes (e.g. Boerman et al., 2012; Van Reijmersdal et al., 2012), several studies indicated that in case a persuasive attempt is detected as such, people are more likely to critically assess the attempt (Obermiller et al., 2005), negatively evaluate it (Tutaj and van Reijmersdal, 2012) and develop less positive attitudes and behavioral intentions (Boerman et al., 2017; van Reijmersdal et al., 2016). Hence, we propose that an increase in persuasion knowledge negatively influences affective (attitudes) and behavioral (purchase intention) brand-related outcomes. In sum, this leads to the following hypothesis:
H3. Persuasion knowledge is positively related to (a) brand memory and negatively related to (b) brand attitudes and (c) purchase intention.

Design and sample
We implemented an experimental between-subjects design with three conditions: (1) virtual assistant communicating via voice only (voice condition; voice as input and output modality), (2) virtual assistant communicating via voice, but displaying the interaction history (voice + IH condition; voice as input modality, voice accompanied by text as output modality), and (3) virtual assistant communicating via text (text condition; text as input and output modality). The study was conducted in Dutch and was pre-registered on the Open Science Framework (OSF) [2].
An a priori G*Power analysis for a between-subjects one-way analysis of variance (ANOVA) with three groups indicated that the required sample size was 300 for an effect size of partial eta squared = 0.06 (f = 0.25) with 98% power. This calculation was informed by the study of Cho et al. (2019) estimating a similar effect. However, research on virtual assistants is an emerging area with a very limited number of previous studies that help us estimate effect sizes. Additionally, we wanted to account for possible technical difficulties with the setup. Hence, we strove for a larger sample size of at least 450.
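The correspondence between the two effect size metrics reported for the power analysis can be checked with the standard conversion f = sqrt(eta² / (1 - eta²)). The following is a minimal sketch of that conversion only; the function name `cohens_f` is ours, and the sketch does not reproduce the G*Power sample size calculation itself.

```python
import math


def cohens_f(partial_eta_squared: float) -> float:
    """Convert partial eta squared to Cohen's f via f = sqrt(eta2 / (1 - eta2))."""
    if not 0.0 <= partial_eta_squared < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(partial_eta_squared / (1.0 - partial_eta_squared))


# Effect size assumed in the a priori power analysis.
print(round(cohens_f(0.06), 2))  # 0.25
```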
Participants were recruited through an ISO-certified research company in the Netherlands, using initial quotas for age, gender and region, and using a continuous recruitment stream in June and July 2020 to reach the desired sample size. Before filling in the survey, participants had to give informed consent and make sure they used Google Chrome as a browser. Participation was terminated in case participants failed one or both attention checks (n = 305), failed to receive audio in the voice conditions (n = 37), or did not interact with the virtual assistant, meaning that they spent less than 15 seconds on the interaction or could not recall the recommendation given by the assistant (n = 398). After excluding one participant who reported being under 18 years old and one multivariate outlier [3], the final sample was 450. Participants in the final sample were between 18 and 78 years old (M = 46.15, SD = 15.84); 229 were male (50.89%; 220 female, 1 non-binary). In terms of education, 54.22% reported a high educational level (36.0% middle, 9.78% low). A full overview of all descriptive statistics is presented in the Appendix.
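As a small consistency check on the sample description, the reported gender counts and percentages can be recomputed against the final sample size. This is a sketch only: the education counts below are implied by the reported percentages and are not figures reported in the text.

```python
FINAL_N = 450
male, female, non_binary = 229, 220, 1

# The three reported gender counts should add up to the final sample.
assert male + female + non_binary == FINAL_N


def share(count: int, total: int = FINAL_N) -> float:
    """Percentage of the final sample, rounded to two decimals."""
    return round(100 * count / total, 2)


print(share(male))  # 50.89

# Education percentages as reported; the counts are implied, not reported.
education_pct = {"high": 54.22, "middle": 36.0, "low": 9.78}
implied_counts = {
    level: round(pct / 100 * FINAL_N) for level, pct in education_pct.items()
}
print(implied_counts)  # {'high': 244, 'middle': 162, 'low': 44}
```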

Procedure
The study was approved by the university's Ethical Review Board. After giving informed consent and randomly being assigned to the conditions (n_voice = 113, n_voice+IH = 134, n_text = 203) [4], participants were instructed to interact with the virtual assistant to obtain a recommendation for a dinner recipe. We chose this task because virtual assistants are often used for cooking-related questions and it allows for the embedding of branded product-related recommendations (Rabideau, 2018). Participants were guided through the interaction by the virtual assistant, including the request to choose one out of three pasta dishes (beef, chicken or vegetarian to account for differences in taste). After choosing a dish, the virtual assistant gave the ingredients for four portions including eleven ingredients each. The recommendation contained five branded ingredients. A full transcript of the interaction is provided on the OSF [5].

Stimuli
Three versions of a virtual assistant were designed for this study using and extending the conversational agent research toolkit (Araujo, 2020). In the voice only condition, participants were exposed to a microphone icon they had to click on to start talking to the assistant. The assistant responded via voice, providing a recommendation for a recipe. The voice used was "Xander", a synthetic younger male voice available on Google Chrome (an example of the voice is provided on the OSF). The voice plus interaction history condition resembled the voice only condition in its graphical interface. In addition to the verbal input and output, the interaction was translated into text that was displayed in a chat interface.
In the text only condition, participants interacted with the assistant via a chat interface. Examples of the stimuli are presented in Figure 2.

Measurements
All items were measured with a 7-point Likert scale ranging from 1 = strongly disagree to 7 = strongly agree, unless stated otherwise. All measurements were translated from English to Dutch for the experiment.
Perceived human-likeness. Perceived human-likeness was assessed with a combination of mindful anthropomorphism, measured with four items on a 7-point semantic differential scale including "I perceived the virtual assistant as machine-like/human-like" (Bartneck et al., 2009; Ho and MacDorman, 2010), and social presence, measured with nine items including "While I was communicating with the virtual assistant, I felt as if it was an intelligent being" (Gefen and Straub, 2004; Lee and Nass, 2003; Cronbach's alpha = 0.96, M = 3.94, SD = 1.46).
Cognitive load. To assess cognitive load, we measured the amount of mental effort, referring to the cognitive capacity allocated to the communication (Paas et al., 2003). We used one item, "How much mental effort have you invested in the interaction with the virtual assistant?", on a 7-point scale from 1 = very low to 7 = very high (Paas, 1992; M = 3.30, SD = 1.47) [6].
Persuasion knowledge. Persuasion knowledge, conceptualized as the understanding of persuasive and selling intent, was measured with seven items (Tutaj and van Reijmersdal, 2012). Two items were used to measure selling intent: "The aim of this virtual assistant is to sell products" and "The aim of this virtual assistant is to stimulate the sales of products". Two items were used to measure persuasive intent: "The aim of this virtual assistant is to influence my opinion" and "The aim of this virtual assistant is to make people like certain products". Three items were used as filler items referring to an informational attempt (helping): "The aim of this virtual assistant is to help me choose a recipe", "The aim of this virtual assistant is to give information about recipes" and "The aim of this virtual assistant is to let people know more about recipes". To assess persuasion knowledge, we used the four items measuring a persuasive or selling intent, which formed a reliable scale (Cronbach's alpha = 0.83, M = 4.48, SD = 1.29).
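The scale reliabilities reported in this section rest on Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score). As an illustration of how such a coefficient is obtained, the sketch below computes it for a small invented data set of five respondents answering four items; the scores are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 7-point responses (rows = respondents, columns = items).
toy_responses = [
    [5, 6, 5, 6],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [6, 7, 6, 6],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))
```

The closer the items move together across respondents, the closer alpha comes to 1; identical items yield exactly 1.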

Advertising effectiveness. Brand memory was assessed with a recall and a recognition task. Firstly, participants were asked to write a shopping list with all items they remembered from the interaction. In the second, guided task, participants were asked to tick all items they would put on the shopping list out of a list with several options. For both the recall and the recognition task, we counted the total number of correctly identified brands/branded ingredients (scale from 0 to 5; M_recall = 0.15, SD_recall = 0.53; M_recognition = 2.26, SD_recognition = 1.59). Since most participants were unable to freely recall the brands, we exclude this variable from the subsequent analysis and only use recognition as an indicator for brand memory. [...] SD = 1.53). We also controlled for preference for voice- or text-based communication (Pastore, 2014; M = 4.95, SD = 2.08) and preference for the recommended dish (M = 5.47, SD = 1.16).

Technical pretest
We conducted a technical pretest with 13 Master's students of Communication Science in a classroom setting, of which 10 completed an online questionnaire assessing common variables of the Technology Acceptance Model (Davis et al., 1989) beforehand. Based on the feedback, we adapted the wording of the interaction dialog with the virtual assistant.

Randomization check
To see whether the random assignment to the different conditions was successful, we conducted chi-square difference tests with gender and education as the dependent variable and the three conditions as the independent variable. The proportions for gender (χ²(4) = 7.46, p = 0.114) and education (χ²(10) = 8.65, p = 0.566) did not differ by condition.
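To illustrate the logic of this check, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for a condition-by-category contingency table. The cell counts are invented for illustration and are not the study's data; with three conditions and three categories the degrees of freedom are (3 - 1) × (3 - 1) = 4, matching the gender test above.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = three conditions, columns = three categories.
toy_table = [
    [30, 20, 10],
    [25, 25, 10],
    [20, 30, 10],
]
statistic, dof = chi_square_independence(toy_table)
print(dof)  # 4
```

The p-value would then be read from the chi-square distribution with `dof` degrees of freedom, which this standard-library sketch leaves out.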
Furthermore, we conducted one-way ANOVAs with each of the other control variables as the dependent variable and the three conditions as the between-subjects factor. Age (F(2, 447) = 0.29, p = 0.750) and liking of the recommended dish (F(2, 447) = 0.82, p = 0.443) did not differ across conditions. We found significant differences across conditions for familiarity with the featured brands (F(2, 447) = 6.67, p < 0.001). Familiarity with the featured brands was significantly lower in the voice only condition (M_voice = 4.02, SD_voice = 1.48, p < 0.001) than in the voice + interaction history condition (M_voice+IH = 4.70, SD_voice+IH = 1.37), but not in comparison to the text condition (M_text = 4.37, SD_text = 1.52, p = 0.103). Familiarity with virtual assistants also differed across conditions (F(2, 447) = 16.59, p < 0.001). Familiarity with virtual assistants was significantly higher in the text condition (M_text = 4.56, SD_text = 1.40) than in the voice + interaction history condition (M_voice+IH = 4.07, SD_voice+IH = 1.56, p = 0.009) and the voice only condition (M_voice = 3.57, SD_voice = 1.53, p < 0.001). The mean difference between participants in the voice only and voice + interaction history condition was also significant (p = 0.022). Preference for voice- or text-based communication differed across conditions (F(2, 447) = 28.97, p = 0.001). Participants in the text condition were significantly more in favor of text-based communication (M_text = 5.69, SD_text = 1.81) than participants in the voice + interaction history condition (M_voice+IH = 4.08, SD_voice+IH = 2.06, p < 0.001) and in the voice only condition (M_voice = 4.95, SD_voice = 2.10, p < 0.001) [7]. Since these three variables were also significantly correlated with at least one of the outcome variables, they were included as covariates in the subsequent analyses.
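The F statistics above compare between-group to within-group variance. As a minimal sketch of that computation, the function below returns F together with its degrees of freedom for a list of groups; the three toy groups are invented for illustration. With three groups and 450 participants, the same formula gives the degrees of freedom (2, 447) reported above.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three conditions.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova(toy_groups))
```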

Hypothesis testing
To test H1, we conducted a one-way ANCOVA with perceived human-likeness as the dependent variable and the three conditions as the between-subjects factor, controlling for familiarity with the featured brands, familiarity with virtual assistants and preference for voice- or text-based communication. We find a significant effect of modality on perceived human-likeness (F(2, 444) = 7.05, p < 0.001, partial eta squared = 0.03). However, a post-hoc comparison with Tukey HSD adjustments showed an unexpected pattern. Participants in the text condition (M = 4.20, SD = 1.50) perceived the virtual assistant as significantly more human-like than participants in the voice + interaction history condition (M = 3.84, SD = 1.44, p = 0.032) and the voice only condition (M = 3.59, SD = 1.32, p = 0.001). Hence, H1 cannot be supported.

To test H2, we conducted a one-way ANCOVA with cognitive load as the dependent variable and the three conditions as the between-subjects factor, controlling for familiarity with the featured brands, familiarity with virtual assistants and preference for voice- or text-based communication. We find a significant effect of modality on cognitive load (F(2, 444) = 3.43, p = 0.033, partial eta squared = 0.02). In line with our expectations, cognitive load was descriptively highest in the voice only condition (M_text = 3.11, SD_text = 1.47; M_voice+IH = 3.41, SD_voice+IH = 1.52; M_voice = 3.49, SD_voice = 1.40). Note, however, that post-hoc comparisons indicated no significant differences across experimental conditions. In sum, H2 cannot be supported.
To test H3, we conducted regressions with persuasion knowledge (persuasive and selling intent) as the independent variable and brand memory, brand attitudes and purchase intention as the respective dependent variables, controlling for familiarity with the featured brands, familiarity with virtual assistants and preference for voice- or text-based communication. A significant regression equation was found for brand memory (F(4, 445) = 16.95, p < 0.001, R² = 0.13, b = 0.23), brand attitudes (F(4, 445) = 7.14, p < 0.001, R² = 0.06, b = -0.13) and purchase intention (F(4, 445) = 10.08, p < 0.001, R² = 0.08, b = -0.19). H3 can be supported, as persuasion knowledge is positively related to brand memory and negatively related to brand attitudes and purchase intention.
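The effect sizes reported alongside the F tests in this section can be recovered from the F value and its degrees of freedom through the identity F × df_effect / (F × df_effect + df_error). For an ANCOVA effect this yields partial eta squared; for the overall F test of a regression it yields R². The sketch below applies the identity to the reported values; the function name is ours.

```python
def variance_explained(f_value: float, df_effect: int, df_error: int) -> float:
    """Proportion of variance implied by an F test: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# ANCOVA effects of modality (partial eta squared).
print(round(variance_explained(7.05, 2, 444), 2))   # 0.03 (human-likeness, H1)
print(round(variance_explained(3.43, 2, 444), 2))   # 0.02 (cognitive load, H2)

# Overall regression models for H3 (R squared).
print(round(variance_explained(16.95, 4, 445), 2))  # 0.13 (brand memory)
print(round(variance_explained(7.14, 4, 445), 2))   # 0.06 (brand attitudes)
print(round(variance_explained(10.08, 4, 445), 2))  # 0.08 (purchase intention)
```

All five recomputed values agree with the effect sizes reported in the text.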

Analysis of the full research model
To answer the research questions and test the full proposed model, we used the PROCESS macro for SPSS (model 80) with bootstrapping (10,000 samples) to create confidence intervals for the indirect effects (Hayes, 2017). Since the independent variable is multi-categorical, we ran the analysis twice, accounting for all different comparisons. The results of the full models are presented in Figure 3. As presented above, perceived human-likeness was significantly higher in the text condition than in the voice only and in the voice + interaction history condition. Furthermore, the path from modality to cognitive load was significant for comparing voice only with text only and, surprisingly, also for comparing the voice + interaction history condition with the text only condition. In other words, participants in the text only condition invested less mental effort in the interaction than participants in the other two conditions. Contrary to expectations, results further indicate that cognitive load positively influences persuasion knowledge (b = 0.12, SE = 0.04, p = 0.003), meaning that the more mental effort participants invested in the conversation, the more likely they were to identify a persuasive or selling attempt. Furthermore, we find a small significant negative indirect effect for modality (voice only vs text only, and voice only vs voice + interaction history) on persuasion knowledge mediated by cognitive load. We do not find any indirect effects of perceived human-likeness on persuasion knowledge and the three advertising effectiveness variables. In response to research questions 1-3, interacting with a voice-based virtual assistant does not directly lead to lower persuasion knowledge in comparison to interacting with a virtual assistant that uses a visual display of text (either text only, or voice accompanied by text), nor are these effects mediated by higher perceptions of the virtual assistant's human-likeness. However, cognitive load mediated the effect of modality on persuasion knowledge in an unexpected direction. The higher cognitive load imposed in the voice condition compared to the two conditions displaying text positively influences persuasion knowledge and subsequently advertising effectiveness.
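The bootstrapped indirect effects can be illustrated with a deliberately simplified sketch. It is not the PROCESS macro and does not implement model 80: it assumes a single mediator, a binary contrast, no covariates, a percentile confidence interval and simulated data, and all names and effect sizes in it are hypothetical. The indirect effect is the product of path a (condition to mediator) and path b (mediator to outcome, controlling for condition).

```python
import random


def slope(x, y):
    """OLS slope of y on x (model with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x."""
    b = slope(x, y)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return [yi - mean_y - b * (xi - mean_x) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product a*b of the paths X -> M and M -> Y (controlling for X)."""
    a = slope(x, m)
    # Partialling X out of both M and Y gives the coefficient of M in Y ~ X + M.
    b = slope(residuals(x, m), residuals(x, y))
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, seed=1):
    """Percentile 95% confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(
            indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Simulated data with assumed true paths a = 1.0 and b = 0.5 (hypothetical values).
data_rng = random.Random(0)
x = [i % 2 for i in range(400)]                      # binary contrast, e.g. voice vs text
m = [1.0 * xi + data_rng.gauss(0, 1) for xi in x]    # mediator, e.g. cognitive load
y = [0.5 * mi + 0.2 * xi + data_rng.gauss(0, 1) for xi, mi in zip(x, m)]

point_estimate = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y)
print(point_estimate, lower, upper)
```

An indirect effect is typically regarded as significant when the bootstrap interval excludes zero, which is the decision rule implied by the confidence intervals described above.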

Discussion and conclusion
Virtual assistants have become increasingly important for businesses as a new way to disseminate persuasive messages that are more subtle than traditional forms of advertising. Importantly, virtual assistants can not only be text-based, but are more often based on voice interactions that can potentially give the assistant a more human touch and make it more persuasive. Hence, the current study examines the persuasive potential of different virtual assistant modalities.
This study adds to the emergent research stream considering virtual assistants in explicitly exploring modality differences between voice and text (and a combination of the two) and provides insights into the effects of persuasion coming from virtual assistants.Drawing on previous research in the field of human-machine communication, the current study extends this work in investigating whether voice in itself (vs text, or a combination of both) influences the perceptions of human-likeness.It contributes to marketing communication theories by applying the persuasion knowledge model (Friestad and Wright, 1994).By doing so, this study includes cognitive load as well as persuasion knowledge and examines consumers' understanding of persuasive intents coming from virtual assistants and downstream effects on brand memory, brand attitudes and purchase intention.Given the increased application of virtual assistants for commercial purposes (e.g.product recommendations), our work focuses on the persuasion process and connects human-machine communication to marketing communication theories.Four main conclusions emerge from this study and have theoretical implications.

INTR 32,7
Firstly, we find the text-based virtual assistant to be perceived as more human-like (or less machine-like, considering that mean scores are only slightly above midpoint) compared to the voice-based assistant. This effect exists irrespective of whether the interaction was visually displayed by means of showing the interaction history or not. This finding contradicts our expectations formulated in H1 and the suggestions of previous research (Cho et al., 2019; Schroeder and Epley, 2016), as voice as used in this experiment decreased the virtual assistant's human-like appeal. This finding contributes to the emerging research stream on human-likeness perceptions of virtual assistants by providing contradictory results that can spark future research on possible boundary conditions.
One possible explanation for this finding can be provided by social information processing (SIP) theory. Based on SIP theory, users rely on available cues to form an impression and a relationship with their interaction partner (Walther, 1992; Walther et al., 2015). For the interaction with a virtual assistant this implies that consumers try to interpret all human-like cues given by the assistant, including speech. In the text only condition, only limited cues are available. This leaves room for consumers' own interpretation, as they have no non-verbal cues available. In the voice condition, however, the (synthetic) voice as used in the experiment might have functioned as a cue that created perceptions of machine-likeness. It might have made the non-human nature of the communication partner more obvious.
To relate this finding to the opposing results of previous work by Cho et al. (2019), it is worth noting that they found a mediating effect of perceived human-likeness on attitudes toward the virtual assistant for utilitarian, but not for hedonic tasks. We do not know whether the participants in our more specific persuasive scenario experienced the interaction about a dinner recipe recommendation as rather hedonic or utilitarian. Hence, we are not able to directly compare these results. However, taking the two studies into account can give an indication of possible task-specific moderating factors such as task involvement or preference. Future research should include these and other variables that might explain the different results.
Secondly, the current study contributes to marketing communication research by applying the persuasion knowledge model to examine the persuasive effects of voice (vs text) in virtual assistants. As virtual assistants are increasingly used for persuasive purposes, it is imperative to understand whether it is possible to make conversations more human-like by implementing a certain modality, and whether that translates into persuasiveness. Our findings regarding RQ1 are mixed. We find that modality does not directly influence persuasion knowledge, nor is this effect mediated by perceptions of human-likeness. In other words, not only is a text-based interaction perceived as more human-like than a voice-based interaction in this study, but a perception of human-likeness also does not influence whether consumers identify a persuasive or selling attempt. However, while we do not find any direct effects of modality on persuasion knowledge, we see in our additional analyses that the text-based assistant is perceived as more human-like and positively influences brand attitudes and purchase intention. More research is needed to fully understand the persuasive potential of virtual assistants.
Thirdly, we make a theoretical contribution by including cognitive load as an alternative explanation for persuasiveness. We expected that interacting via voice is more demanding for consumers and increases cognitive load (H2; Berry et al., 2005; Van Zant and Berger, 2019), which in turn also leads them to be less likely to activate their persuasion knowledge (RQ2 and RQ3; Campbell and Kirmani, 2000). Contrary to expectations, our results show that cognitive load did not suppress, but increased persuasion knowledge. This suggests that a task that is more demanding for a user might make them more alert toward the content of the interaction. Moreover, even though post hoc differences are not significant for our ANCOVA to test H2 and must be handled with caution, we do see effects of modality on cognitive load when testing the full model. Cognitive load was lower when communicating with the text-based virtual assistant compared to communicating with one of the two voice-based assistants. This is surprising, considering our third condition of a voice assistant accompanied by the display of interaction history. We find that showing an interaction history did not reduce cognitive load. On the contrary, voice accompanied by interaction history induced more cognitive load than text alone. If the text output had been the driver of cognitive load, we would have seen differences between communicating via voice only and a modality that has a visual display of text (so either text only, or voice with interaction history), but no differences between the two conditions that included text. This leads us to the conclusion that the differences in cognitive load between voice- and text-based virtual assistant interaction could be attributed to the input modality instead. In the two voice conditions, participants used voice as input modality to interact with the virtual assistant, which might have led to differences in cognitive load.
It must be noted, however, that the relative amount of cognitive load is below the scale's midpoint, hence relatively low. One explanation could be that the interaction about a dinner recommendation as chosen in this experiment is a relatively easy topic to engage in. Additional research is needed to examine whether cognitive load is influenced by the virtual assistant's (input as well as output) modality in the context of more demanding topics.
Lastly, we contribute to knowledge on the overall effectiveness of different virtual assistant modalities. We studied whether persuasion knowledge translates into persuasive outcomes for virtual assistants. Based on previous research on the effects of persuasion knowledge (e.g. Boerman and van Reijmersdal, 2020; Boerman et al., 2017; Obermiller et al., 2005; Tutaj and van Reijmersdal, 2012; van Reijmersdal et al., 2016), we proposed in H3 that persuasion knowledge is positively related to brand memory, and negatively related to brand attitudes and purchase intention. Notwithstanding the lack of effects of virtual assistant modality on persuasion knowledge, our study confirms these expectations. It thereby shows that findings from these research lines can be translated to the context of virtual assistants. Hence, we corroborate previous research (Voorveld and Araujo, 2020) showing that the persuasion knowledge model is generally a useful tool to explain persuasive attempts coming from a virtual assistant. Notably, the positive effects of persuasion knowledge on brand memory were stronger than the negative effects on brand attitudes and purchase intention.

Managerial implications
Our findings have managerial implications. With regard to emerging technologies, it is often assumed that a higher degree of "human touch" can lead to more positive affective evaluations (see, e.g. Sivaramakrishnan et al., 2007). Our study can confirm and extend this notion by showing that perceived human-likeness can positively influence persuasive outcomes such as brand attitudes and purchase intention. For businesses, it might therefore be valuable to invest in emerging technologies that provide a human appeal in consumer-brand interactions. However, simply using voice as a presumably human cue might not suffice. On the contrary, voice can be experienced as less human-like than text by consumers, which implies that businesses must carefully evaluate how human-likeness can be conveyed.
Furthermore, reflecting on the differences between voice- and text-based virtual assistant interactions, our findings can give a first indication that voice as a communication modality can increase persuasion knowledge by being cognitively more demanding. For the development and implementation of virtual assistants in business, it might therefore be beneficial to actively engage consumers. Persuasion knowledge in the context of virtual assistants makes consumers aware of the commercial nature of an interaction, but at the same time also positively influences cognitive outcomes such as brand memory. As we could show, consumers' brand memory in general is relatively low, hence actively engaging consumers might help businesses to strengthen the visibility of their brands. Our findings further indicate that a combination of voice and text might be most successful in doing so.

Societal implications
Furthermore, our findings have societal implications and inform regulators and policymakers in finding ways to empower users of virtual assistants. Based on the findings of our study, voice used as a human-like cue does not prevent consumers from coping with persuasive attempts. However, we also confirm direct effects of perceived human-likeness on more affective outcomes such as brand attitudes and purchase intention. Considering the threat that increasingly human-like interfaces influence and possibly mislead consumers, we suggest not only explicitly informing recipients about the commercial nature of the conversation (as done with sponsorship disclosure, e.g. Boerman et al., 2017) but also informing them about the non-human nature of the interaction.
Moreover, we find that persuasion knowledge as a cognitive response is influenced by the amount of cognitive load used when interacting with an assistant. The more engaged consumers are in a conversation, the more likely they might be to recognize a persuasive attempt. Hence, policymakers should try to not only inform, but involve consumers in interactions with technologies such as conversational agents.

Limitations and suggestions for future research
Two methodological limitations of this study merit mention and lead to suggestions for future research. Firstly, even though participants were randomly assigned to conditions, more participants interacted with the text-based virtual assistant than with the two voice-based assistants, suggesting that more participants dropped out in the voice only and voice + interaction history conditions than in the text condition. Participants may have experienced some sort of hindrance interacting with the voice-based virtual assistant, especially when the interaction was not visually displayed in text. However, to be able to measure persuasion knowledge and brand memory, it was necessary to only include participants that had a full interaction with the virtual assistant. Hence, future research should consider possible difficulties for participants to interact with voice only. Furthermore, we see that participants that communicated with the text-based assistant indicated that they were more familiar with virtual assistants in general. Even though we were able to control for familiarity in our analysis, we suggest that future research include familiarity with voice-based communication and familiarity with text-based communication as two separate constructs.
Secondly, we decided to use a younger male voice for this experiment. We believe that this created a realistic scenario since we chose a voice readily available on Google Chrome. However, future research should further examine the persuasive potential and expand the growing body of research on different auditory cues such as gender, emotionality or vocal pitch. Furthermore, we decided to mirror input and output modality in this study, meaning that participants used voice to interact with the two voice-based assistants (with and without interaction history displayed), and text to interact with the text-based assistant. This was also suggested by previous scholars (Cho et al., 2019) and adds naturalness to the interaction, since it resembles how virtual assistants are used in practice (e.g. Google Assistant, Amazon Alexa). However, since we suggest that the cognitive load of voice- and text-based virtual assistant interaction might be attributed to the input modality, future research is advised to add another layer of complexity and experimentally manipulate input modality in addition to output modality. This will enable an even more fine-grained analysis of the interplay between the two.