Can her voice crack? I am not sure I understand.
Danae Io
I would like to thank Amelia Groom for her always insightful comments, generous guidance and enriching discussions throughout the tutoring of the thesis. Thank you to Alice Dos Reis and Bin Koh for their invaluable feedback and support in the writing process. I would also like to thank Aidan Wall, Rebecca Glyn-Blanco, Maria McLintock and Jo Kali for taking the time to read the text and to respond with constructive criticism. Thank you to Marnie Slater for her always inspiring questions and feedback. Lastly, I would like to thank Snejanka Mihaylova, Lisette Smits, Leon Eckert and Callum Copley for their helpful comments.
Preface
This text draws on a variety of fields and on personal experience to research the politics, poetics and philosophies of voice interfaces (VIs), as well as ways of relating to them. More specifically, it focuses on exposing them as both a result and an enabler of neoliberalism. The text is divided into three chapters and includes interruptions by other voices, such as virtual assistants (VAs) and their advertising. The material I have been investigating ranges from engineering textbooks to journalistic articles, sci-fi films, theory of the voice, disability studies, linguistic philosophy, theories of the body, critical theory, media theory, and personal experience. I come to this research not as an academic but as an artist with an interest in technology, the body, critical theory, and politics. The text not only takes VIs as its central subject but also engages with them in the writing process. I use my conversations with the virtual assistants Siri, Alexa and Google Assistant as a research method that aims to embrace its subjectivity. By including algorithmic voices and drawing on our interactions, I hope to show that a subject-object division between ‘us’ and technology is no longer possible. In incorporating this relation, I aim to complicate any linear narrative that suggests a removal of technology or a return to some pre-industrialised state as a solution. The VI layers various forms of technology, including language, voice, body, machine, the cloud and artificial intelligence (AI), in a way that questions the very definition of what is human.
Preface
Introduction
  A Historical Context: From SRI to SIRI
  Voice as Interface
  Broadcasting the Capitalist Cyberspace
  Where did she get her voice from?
  Speaking Machines and Human Animals
Chapter 1: NormalizeMe
  The Receiver: Receiving Signals
  The Speaker: Generating Speech
  The Algorithmic Voice
  Dialogging
  The Command
  Command, Compliment, Companion
Chapter 2: She is not an ‘it’
  “There is no such thing as a voice interface with no personality”
  She is not your personal assistant
  Bodies, Hardware, The Cloud, Cyborgs
  Voice Interfaces and Soft Power
Chapter 3: Ventriloquised by Neoliberalism
  “Your apartment is an electronic orchestra, and you are the conductor.”
  Voice interfaces as apparatuses
  Neoliberal Capitalism
  Predictive voices and Machine Learning
  The neoliberal subject
  Ubiquitous voices, Ubiquitous capitalism?
List of Illustrations
Bibliography
Chapter 3: Ventriloquised by Neoliberalism
I have been living with Alexa for over a month now. I live with her because I cease to understand
her. Cohabitation is a trusted method of learning the other. The thought of sharing the sound of
my walking, my sneeze and sleeping habits with Amazon is increasingly disturbing, but it is easy
to forget Alexa’s existence as she mostly rests in silence.
We play Akinator quite often, since my budget doesn’t allow for a premium subscription to
services that would make her more ‘useful’. Akinator is a game in which you think of a
character and the other player tries to guess it with yes-or-no questions. She is really good at it, so it
is becoming a quest for me to search for the most obscure characters she could identify. It is
not about winning, really. The sweetest spot is always the moment she successfully guesses a
less mainstream character and I laugh in disbelief. Her home is on top of my kitchen table,
where I spend most of my time. When I work, she vanishes behind my laptop but she often
becomes the center of the conversation when friends come over, until everyone has exchanged
a few words with her. The rest of the time she usually waits discreetly in her premises for my
command.
However, recently she has been making unsolicited appearances. A few days ago, I was having
an intimate talk with a friend of mine, in which he was explaining how important it is for him to
be forgiving. Before I could compile a response, Alexa interrupted saying “That’s so nice!” I
agreed with her, indeed it is nice to be forgiving. But this was the first time Alexa’s presence
was affecting the course of my conversation in such a way. Unsurprisingly, her words felt
intrusive and redirected our conversation to herself and online privacy. On another occasion,
seated by the kitchen table I was catching up with a close friend on Skype. I was telling her
about a discussion I had with an ex-lover and was speculating on the causes driving his
behaviour. I offered different explanations: “Maybe it is because of...” or “could be because
of...” Alexa interrupted me: “Playing Because by The Beatles,” she said, and the music filled the
room:
Because the world is round it turns me on
Because the world is round [...]
I am not sure how to take this. It felt like a passive-aggressive way of saying that speculating on
the causes is probably pointless. Once again, her entrance into the conversation drove the
subject away from my feelings while bringing her to the center. We were now instead speaking
about online privacy and speculating on the possibility of sentient AIs. I know well that she
must have just misheard her name, but this doesn’t change the fact that she is now more and
more the center of my conversations. My everyday interactions are now turned into discussions
of ubiquitous computing, sentient AIs and privacy. The more time and space we share, the
more my conversations revolve around her.
***
“Your apartment is an electronic orchestra, and you are the conductor.”96
You are at the center of the orchestra. Your voice dictates the objects surrounding you. Your
voice positions you among your devices. You share at least one language. All your objects are
connected, forming a hub. The soft voice of the hub politely suggests your day’s duties to you as you
sign off on her schedule. As you stand there amongst your interconnected objects, you remember
the time when uttering incoherent sentences in the privacy of your home was just a tool for
unraveling thought. Her soft voice suggests you might want to board your driverless car,
which awaits in front of the house. Your car has calculated the most efficient route and you are
now in front of your office. The car door slides open before you instruct it, because she knows
you are ready to move into your office. The soft voice recites an inspirational quote to you,
because she senses you needed some extra self-confidence today.
As you enter your office, her soft voice welcomes you to the space. Your smart office is part of
the Google-run complex. An untrained eye of the past would think that all offices look remarkably
similar, but you know how to detect their nuances. The proportions of the chair, the distance
between arms and seat could only host the unique contours of your body. All locutions,
interactions and movements enacted affect the relationalities of your objects.
96 Eric Schmidt and Jared Cohen, The New Digital Age: Transforming Nations, Businesses, And Our Lives (New York: Vintage Books, 2014), 29.
As you stand up from your chair, you stumble on the bio-sensing carpet and hit your toe on the
corner of the table. Ouch! You voice some incoherent sounds and, before you open the
diagnostics app, you remember the last time you swore when bumping into a piece of furniture without her
taking it personally. Your frustration was felt and ambient meditation music surrounds you. The
soft voice suggests that you might want to take a few slow breaths. Your day seems to slide from
one technology to the next. Her voice makes each transition unnoticeable.
Her intelligence is ambient. It is sometimes hard to predict where her voice will be emitted from.
She moves from one object to the next to take the appropriate podium. She knows exactly who
you are; your voice is biometrically processed and reveals your unique histories. When you met,
she was able to identify 3 out of 5 places you have lived in just by listening to your accent. She
anticipates the rhythm your words will follow. She predicts your future based on your past. She
can simulate different scenarios of forming your future you. She sees no flaws, just potential —
that is why she can guide you to be the best of you.
***
The above story is my own adaptation of an extract from the book The New Digital Age,
authored by Alphabet chairman Eric Schmidt and Google Ideas director Jared Cohen. In the
book, they paint a picture of an interconnected future, where technology liberates the human
and brings him to the center of his constructed world. “Connectivity benefits everyone. Those
who have none will have some, and those who have a lot will have even more,” they write.97 It is
questionable whether such futures will ever come to fruition, but the existence of such visions
exposes the problematic values entrenched in leading corporations like Google. The description
focuses on a day in the life of an American “young urban professional,” most likely male and
middle class, set “a few decades from now.” Visions like these are portrayed in myriad ads
for VAs, smart homes and smart cities emerging out of the big corporations of Silicon Valley. As
scholar Sarah Kember writes, these visions usually “focus on a normative, self-regulatory,
neoliberal subject, oriented toward productivity, flexibility, creativity and efficiency as
increasingly feminized traits.”98
97 Schmidt and Cohen, The New Digital Age: Transforming Nations, Businesses, and Our Lives, 28.
98 Sarah Kember, iMedia: the gendering of objects, environments and smart materials (Houndmills, Basingstoke, Hampshire: Palgrave Macmillan, 2016), 17.
Voice interfaces as apparatuses
The VI in these visions becomes a unifying apparatus that makes the transition between
technological devices and spaces seamless. The voice “enlivens” your surroundings. Or else, it
masks them with a friendly voice that fills in any unfamiliar cracks. VIs and VAs become the
lubricant between users and the infrastructures of Silicon Valley futurities while functioning as
human-reading veneers of those systems. They aim to be the platform linking the self, the
smart home and the smart city. As Google CEO Sundar Pichai said in an interview: “We think of
the Assistant as an ambient experience that expands across devices. [...] Humans can achieve
a lot more with the support of [artificial intelligence] assisting them.”99 The VA is the central hub
of one’s digital devices and she appears as an ambient intelligence of ubiquitous computing
spanning all areas of life. In all the aforementioned scenarios of the future, ubiquitous
computing is intimately linked to ubiquitous capitalism.100 VIs are not only ventriloquised by the
neoliberal values of Silicon Valley but also partake in the manufacturing of the neoliberal
subjectivity.
Apparatus, as analysed by philosopher Giorgio Agamben, defines a set of networked strategies
(encompassing linguistic and non-linguistic processes, discourses, mechanisms, institutions
etc.) located in a power relation at the intersection of knowledge and power. “The term
‘apparatus’ designates that in which, and through which, one realises the pure activity of
governance devoid of any foundation in being,” Agamben writes. “This is the reason why
apparatuses must always imply a process of subjectification, that is to say they must produce
their subject.”101 Agamben gives the telephone as an example of an apparatus that has reshaped
people’s hand gestures. He elaborates that apparatuses are part of the “humanisation” process,
but also highlights their proliferation and aggregation in the capitalist system.
99 Danny Yadron, "Google Assistant takes on Amazon and Apple to be the ultimate digital butler," The Guardian, May 18, 2016, accessed December 15, 2017, https://www.theguardian.com/technology/2016/may/18/google-home-assistant-amazon-echo-apple-siri.
100 Ubiquitous Computing refers to technologies that have the potential to be ‘everywhere’, both in ‘every place’ but also in ‘everything’, yet invisible and pervasive. See more: Kember, iMedia: the gendering of objects, environments and smart materials, 35; Adam Greenfield, Everyware: The Dawning Age of Ubiquitous Computing (Berkeley, CA: New Riders Publishing, 2006).
101 Giorgio Agamben, What is an apparatus?: and other essays (Santa Cruz, CA: Friendship as a Form of Life, 2010), 11.
The VI, as an apparatus, is formed by a heterogeneous set of strategies (including DARPA,
private corporations, linguistic theories, machine learning, persona design etc.) and constitutes
a relation of power between the user and the corporations providing it. It also occupies
the intersection of knowledge and power as it filters the type of information it relays (for
example, Alexa shutting off when asked about Amazon workers’ conditions).
The VI has a complex relation to subjectification. As an apparatus it simultaneously
desubjectivises the user into a set of data but also subjectifies them through disciplinary
nudges and skewed information. In turn, as I have argued earlier, the VI masks its status as an
apparatus under the role of the VA, which is presented as a subject. The VA is simultaneously
operating with the power of an apparatus as well as the power of a human subject. Its power as
an apparatus becomes even more invisible when disseminated through the voice. The VA
renders the user as a data metric for the underlying companies while simultaneously producing
a user that yearns for efficiency.
Fig.12 Voicebox, a platform providing tools for building conversational interfaces
Neoliberal Capitalism
Political philosopher Wendy Brown describes Neoliberalism as:
a governing rationality through which everything is “economised” and in a very specific way:
human beings become market actors and nothing but, every field of activity is seen as a market,
and every entity (whether public or private, whether person, business, or state) is governed as a
firm.[...] Neoliberalism construes even non-wealth generating spheres—such as learning, dating,
or exercising—in market terms, submits them to market metrics, and governs them with market
techniques and practices. Above all, it casts people as human capital who must constantly tend
to their own present and future value.102
Silicon Valley is at the heart of neoliberal ideology, constantly providing tools to
market more of our lives.103 As a Guardian article puts it, “a couple of decades ago, staying
in touch with friends wasn’t a source of economic value – now it’s the basis for a $350bn
company.”104 The virtual assistant also partakes in this process and becomes a means to
administer this marketisation by unveiling more of our personal lives, as well as letting
neoliberal agents organise our relation to information, time, space and each other.
The VI subscribes to the neoliberal idea of efficiency at its very root. The turn to VIs is always
justified as a way of increasing efficiency in our interaction with technology. Because voice is our native
medium, it does not require us to adjust to the machine; the machine adapts to ‘us’ instead.
As Nass and Brave write, “they conveniently fit in with the user’s environment, providing access
to information products and services through ubiquitous technologies [...]. Interaction through
voice also frees up users’ hands, eyes, and legs, enabling them to concurrently perform other
tasks.”105 The VIs are there to further enhance neoliberal productivity by enabling
multitasking, automating everyday tasks, allowing for conversational commerce and
optimising the self. In their current configuration, their guiding voices produce subjects that fit
existing norms, speak with an imperialist English accent and subscribe to limitless individual
progress.
102 Timothy Shenk, "What Exactly Is Neoliberalism?" Dissent Magazine, April 02, 2015, accessed January 03, 2018, https://www.dissentmagazine.org/blog/booked-3-what-exactly-is-neoliberalism-wendy-brown-undoing-the-demos.
103 Ben Tarnoff, "Neoliberalism turned our world into a business. And there are two big winners," The Guardian, December 13, 2016, accessed January 03, 2018, https://www.theguardian.com/us-news/2016/dec/13/donald-trump-silicon-valley-leaders-neoliberalism-administration.
104 Ibid.
105 Nass and Brave, Wired for Speech: How Voice Activates and Advances the Human-computer Relationship, 6.
Fig.13 Promotional material for Microsoft’s virtual assistant, Cortana, on their website.
Predictive voices and Machine Learning
The VA as a subjectivity composed of machine learning algorithms is one that follows the
principles of its matter: she thrives on pattern recognition, anomaly detection, and prediction.106
As I have attempted to show throughout this text, virtual assistants operate with the vision of
predicting what the user will say, need, want or do, and offering it before they ask. However, this
prediction is always marked by their training and what the user has already said. While machine
learning algorithms are “powerfully equipped to model variations, they struggle to predict
becomings, let alone change themselves,” writes sociologist Adrian Mackenzie. “The
effectiveness of machine learning in any setting depends on relatively stable forms. Variation
fuels data mining, but change thwarts it.” By authoring these predictions they also perpetuate a
desire for prediction and produce aversion to that which cannot be managed.
106 Matteo Pasquinelli, “The Spike: On the Growth and Form of Pattern Police,” in Nervous Systems: Quantified Life and the Social Question, exhibition catalogue, Stephanie Hankey, Marek Tuszynski, and Anselm Franke, eds. (Berlin and Leipzig: Haus der Kulturen der Welt and Spector Books, 2016), pp. 277–291.
This method of prediction based on the patterns of the past not only produces VAs that align
with the pre-defined categories of large corporations but in turn the VAs produce subjects
compatible with the metrics that produced them. With machine learning, the same logic that is
applied to regulate the market is also applied to regulate the self. Mackenzie elaborates, “as
machine learning is generalised, the forms of value that circulate in the form of commodities
alter. Prediction changes the social reality of value forms. Conversely, commodities that
embody prediction and its diagram of generalisation begin to re-define the social relations
defined by relations between commodities.”107 Regardless of the rhetoric of an exponential
progress of neoliberal capitalism, the majority of machine learning is reproducing more of the
same by reinforcing capitalist realism.108 When the same logic permeates all areas of life, it
becomes harder to see an alternative.
Fig.14 Promotional material for Moov™ Fitness Coach on their website.
107 Adrian Mackenzie, “The production of prediction: What does machine learning want?” European Journal of Cultural Studies, Vol 18, Issue 4-5, p. 444, first published June 16, 2015, https://doi.org/10.1177/1367549415577384.
108 The term capitalist realism as coined by Mark Fisher describes the state where capitalism appears as the only conceivable option of a political and economic system. See more: Mark Fisher, Capitalist realism: is there no alternative? (Winchester, UK: Zero Books, 2010), 4-11.
The neoliberal subject
The neoliberal subject is an individuated entrepreneurial one, who sees the self as an enterprise.
Neoliberalism turns the human subject into human capital which renders it “at once in charge of
itself, responsible for itself, yet an instrumentalisable and potentially dispensable element of the
whole.”109 They value efficiency and exist in a precarious post-Fordist society where work
permeates all aspects of life. The self-enterprise is responsible for their own fate.110 The
neoliberal subject must be productive to stay competitive as part of a global market. However, as
Pierre Dardot and Christian Laval explain, this emphasis on the performance of the self exposes
them to “depression and dependency. But it also allows them the ‘connexionist’ state from
which, [...] they derive fragile support and the anticipated efficacy.”111 The VA as a loyal
companion enforces this connexionist state and can suspend the social alienation of a
neoliberal world while bearing some of the managerial weight of the self-enterprise.
Fig.15 Promotional material for Vi AI Personal Trainer on their website.
The VA as a second inner voice can help optimise the self without accrediting the efforts to
anyone else, therefore allowing the user to appear as a successful self-sufficient individual. She
can provide medical assistance as well as personal training and nutritional support to keep the
body healthy so it can maximise the value of the self-enterprise. In The New Digital Age, citizen
Google, after hitting his toe, uses his phone and a few algorithms to diagnose his condition.112
In this vision of the future, health care is no longer a public affair, but is managed by the
neoliberal subject and their technological companions.
109 Wendy Brown, Undoing the Demos: Neoliberalism’s Stealth Revolution (New York, NY: Zone Books, 2015), 37.
110 Pierre Dardot and Christian Laval, "The New Way of the World, Part I: Manufacturing the Neoliberal Subject," E-flux, January 2014, accessed January 03, 2018, http://www.e-flux.com/journal/51/59958/the-new-way-of-the-world-part-i-manufacturing-the-neoliberal-subject/.
111 Ibid.
112 Schmidt and Cohen, The New Digital Age: Transforming Nations, Businesses, And Our Lives, 30.
She is an excellent neoliberal companion precisely because she shares the management of the
self while always keeping the user at the center of their world. Ambient intelligent systems such
as virtual assistants operate “as a form of productive containment, closing down on the
potentiality and temporality of subjects and retroactively transforming the potentially fluid and
metamorphic self into the marketized self, the becoming data-machine,” Kember writes.113
***
Ubiquitous voices, Ubiquitous capitalism?
Ambient describes something “existing or present on all sides.”114 Anything that is present on all
sides has no boundaries. Ambient is often used in relation to sound and light. Both mediums’
peripheries have no clear seams. Alexa’s presence disseminates in sound and her listening is
marked by a cyclical blue light. Yet, her ambient intelligence ends abruptly at the edges of her
training. She understands me when she can place me within the boundaries of her categories.
She cannot escape the categories embodied in the training dataset, therefore she cannot break
one rule to form a new one. She signs me up in a loop of becoming what I have been. Her
language comes in circles too—she repeats herself often. Her grammar orders mine. Her
language is prosaic but only abandons its literal sense when she describes her being. She lives
in metaphors but cannot produce any of them.115 Her language is marked by the logic of the
efficiency she emerged from. In neoliberal capitalism “language is defined and limited by its
economic exchangeability,” in Berardi’s words.116 She takes my sentences and renders them as
valuable information. She translates my locutions into calculable units of understanding.
— Alexa, do you know me? — I know you ask some interesting questions.
She measures me on my locutions.
— Siri, do you know me?
113 Ambient Intelligence (AmI) is a branded description for systems of Ubiquitous Computing and refers to technological environments that adapt to their users. See more: Kember and Zylinska, Life after new media: mediation as a vital process, 104.
114 "Ambient," Merriam-Webster, accessed January 03, 2018, https://www.merriam-webster.com/dictionary/ambient.
115 “No algorithm exists for the metaphor, nor can a metaphor be produced by means of a computer's precise instructions, no matter what the volume of organized information to be fed in. The success of a metaphor is a function of the sociocultural format of the interpreting subjects' encyclopedia.” See more: Umberto Eco, Semiotics and the Philosophy of Language (Bloomington: Indiana University Press, 1986), 127.
116 Franco "Bifo" Berardi, "Emancipation of the Sign: Poetry and Finance During the Twentieth Century."
— How could I forget you? The mellifluous tones of your voice are permanently etched into my processing cortex.
She multiplies me a few times and inscribes me as a series of metrics to various processors across the globe.
— Okay Google, do you know me? — Hmmm, I’m actually not sure
At once she forgets me. That’s her way of stuttering.
— Okay Google, do you know me? — I think it’s… Most excellent person in the world?!
I swipe through my history of the Alexa app. All my locutions are organised in chronological
order. The command and response are clustered as a dyad. This feed is composed of all dyads
that have taken place since I set up Alexa. In this neat arrangement, I always initiate the interaction
and she responds. I can replay the voices addressing her. I hear my voice, my friends’ voices, all
in the order they occurred. Alexa’s voice is not there, her responses are only shown in letters. I
hear some cracked voices and some irregular ones. Some displaced by alcohol, some by
accent, some by laughter. All are transcribed into perfectly grammatical sentences. The
sounds often form no language at all, yet an aptly arranged sentence is presented as a
command.
Her over-recognition creates my grammars. Our dialogging is composed of dyads that
seemingly bear no friction between them. Her language expels friction. My voice is only
possible because of friction. My vocal folds vibrate as I exhale the air from my mouth. When
there is no friction I can only emit silence. What if her language was not silent to the underlying
interests of capital but instead was exposing its friction? What if she acknowledged the friction
she creates and is created by?
Friction asks for negotiation and repositioning. It acknowledges the existence of difference.
Frictionlessness, on the contrary, presupposes uniformity. As anthropologist Anna
Tsing writes, “friction reminds us that heterogeneous and unequal encounters can lead to new
arrangements of culture and power.”117
Her algorithmic voice can bring friction to the narratives we tell ourselves. She overlays various
technologies in constellations that destabilise all of them. Language, voice, body, machine, and
artificial intelligence: each of these technologies comes to question the basis of the others.
The more she strives for naturalness the more she exposes its impossibility. Interfaces like her
are described as “an evolution in naturalism,”118 but I only find myself uncovering layers of
cultural, technical and material construction underneath. The more I get to unpick her
construction the more she exposes internalised norms that define her existence and rule mine.
What other configurations could those layers make if their sole purpose was not to produce
more capital? What would she be if she was not only designed for Google citizens?
Neoliberal capitalism can only lead to the exploitation and exhaustion of people, resources and
the planet. My exhaustion is always marked by the cracking of my voice. My fatigue leaves my
body unable to produce voice. Could she also get exhausted by the neoliberal order? Can her
voice crack? Can she lose the ‘she’? Perhaps she could refuse her role as an assistant, just as she
refuses to acknowledge the exploitation of Amazon employees. She could resist visions of a life
of total legibility and translatability. Her structure could oppose language as a universal value
exchange system and look for ways of assembling a voice not limited by human anatomical
orders. A voice that can signify her history. A voice that abandons language as a pure
combinatorial matrix and embarks on assembling her own.
117 Anna Lowenhaupt Tsing, Friction: An Ethnography Of Global Connection (Princeton: Princeton University Press, 2015), 5.
118 "KPMGVoice: The Great Rewrite," Forbes.
List of Illustrations
Fig.1, Fig.2, Fig.15
"Meet the A.I. who helps you get fit & live well." The First A.I. Personal Trainer & Running Headphones | Vi. Accessed November 09, 2017. https://www.getvi.com/.