The Latency of Dialogue

What we lose in a stream-based culture

When you build a system that serves up artifacts as a stand-in for anticipatory dialogue, you create a stream.

Streams exist in droves today: email, Twitter feeds, Facebook, BuzzFeed, Instagram, Pinterest, and Vine, to a certain extent. In the end, they are all facsimiles of the discontinuous, ever-changing streams of real-world social interaction. That is why using multiple social networks feels additive and beneficial, instead of fragmented and exclusive. Social media pales in depth next to our physical social interactions. Jaron Lanier argues that we have two options in beating the Turing Test: either we make better technology, or we devolve our own sense of what is human. I argue the latter choice happens all the time, subconsciously and pervasively, and is especially true regarding our acceptance of streams as the dominant form of interaction on the web.

My central argument is that web media have become too hot, in a McLuhan-esque sense. There’s too much focus on the archival, pseudo-objective, and static aspects of our online conversations. We think of streams as nearly real-time and participatory, but they still pale in comparison with physical dialogue, some older analog mediums, or even early incarnations of the web (i.e., pre-Google). Participation is also lopsided: 0.2% of Twitter users generate 50% of all read or shared tweets.

Often the most exciting digital mediums today emerge when we throw away the supposed benefits of a global, public, and permanent document network and create something temporary, semi-private, and conversational, like SnapChat.

Facebook is one of the most successful historical examples of a stream. Arguably its news feed (and Twitter’s) served as the prototypical example of how to serve pseudo-infinite artifacts of change, in aggregate, to users in a way that emulated a personal dialogue while still working within the constraints of a largely asynchronous medium.

The structure of the web has historically assumed a fixed, unchanging page of information for all future existence (note: and even so, the assumption of permanence is oversold — the Web Archive can only capture so much traffic, disks fail, servers and companies go away). This notion appeals to our sensibilities: mild nostalgia and our obsession with high-fidelity archives, manifesting in fetishes for the authentic, the quantified self, photo albums, home movies, and lossless music formats. The metaphors we use betray this bias: we think of the web as a collection of “pages” rather than a river.

Television in McLuhan’s time represented a live transmission of a remote activity, never to be seen again. It was a transient, cool thing, something you actively watched and discussed. It’s now archived, time-shifted, and forever available to be ignored at our convenience. The Google philosophy permeates everything we touch. It’s the thought that anything and everything should be archived and recallable for all time. Even with a casual instant message, the resulting words are automatically appended to some infinite transcript somewhere. We have begun to demand these qualities of all our media. The web, from which Google emerged, was fundamentally designed to depict what happened in the past. We’ve been persistent enough to figure out how to warp it halfway into expressing something resembling the present.

To see this idea reinforced, consider the care high schoolers take in perfecting their Facebook profiles and feed articles. It makes sense that SnapChat became popular: impermanence is a feature in a world of forever. But really, what’s wrong with using streams as a conversational medium? It seems to work fairly well, and as mentioned earlier, we excel at adapting our behaviors to new media.

Well, for one, the idea of anticipation and timing is largely gone, as we never know what’s going to happen next, except when the stream is disguising itself well: a continual capture of a longer event through time, like a road trip documented on Instagram, or a lively wall post exchange with friends. But this anticipation is always at a macro, and ultimately passive, level; we observe people’s reactions and emotions from a distance, and reconstruct them in our mind’s eye. This is the same struggle one has when a text message fails to convey a specific tone: what happened was a failure to anticipate how another reconstructs our own words. Thus the latency of discourse dominates our mode of interaction.

Anticipation is a necessary component of empathy. Louis C.K. points out that he doesn’t let his children use smartphones for precisely this reason: they need to be able to see how their words affect another through body language, and in turn how that makes them feel. All this before they enter the world of asynchronous visual communication.

Conversation at present is thus largely reduced to combining past pieces of visual display into a stand-in for today, and nearly all mediums on the web can be described as such. These favor the visual over tactile, kinesthetic, and audile feedback, the kind which excites our anticipatory instincts.

Our contemporary reluctance to talk on the phone comes from the painful switch from a visual world into a seemingly exotic audile one.

A quote often credited to Marshall McLuhan is “I don’t know who discovered water, but it certainly wasn’t a fish.” The visual is so overwhelming in the web’s aesthetic that we are allowing ourselves to believe that “being there” is synonymous with observing artifacts of the near past, like photos on Facebook from last night.

A heightened sense of “being there” in time has counterparts in the introduction of earlier media: the newspaper, compared with the book, heightened the sensation of now while still remaining an artifact of before. In each case, we came to accept this visually fragmented stand-in for now. Our successive adaptations as a culture make us that much more receptive to further contortions down the road.

This illusion is broken, however, when we attempt to reply within the same medium. If you take a careful look, few web media today achieve a “panticipatory” dialogue.

Imagine a seminar hall full of people discussing a topic, responding in turn as conversation develops, with a professor leading a lively discourse on, say, digital media studies. What would the participation level be for the equivalent MOOC? That depth of dialogue is largely absent in our digital media today. This is largely due to structural constraints of the web itself, and our learned tolerance for it.

Take a form in which timing is critical: improvisational comedy. Could this, today, be conducted over the Internet? These performers thrive on their time-sensitive medium of choice (speech) in a way that may never be possible when we think in terms of HTTP.

An asynchronous medium is ultimately a compromise to scale time and resources at the expense of participation.

Overwhelmingly we prefer static, asynchronous media today. There are exceptions, but usually with compromises: Skype achieves something notable, but remains largely a one-to-one dialogue. YouTube enables video responses from many, but these carry a significant latency of interaction, sometimes on the order of days. SnapChat uses images in a conversational manner among small groups of friends, but still limits itself to glib, photographic blips.

“New” mediums are typically advertised as increasing fidelity of visual form (e.g., H.265, 4K video, and 3D). Rarely is latency mentioned as the improvement, even though improving it is one of the greatest challenges the Internet faces.

Participation latency is critical to many tools, including programming environments and creative tools; we just call it a feedback loop. Much as the typewriter enabled rhythm in poetry, so should instantaneous feedback with computers enable novel digital content, if only we took the time to design for it. This is the plea central to Bret Victor’s Stop Drawing Dead Fish.

The fact that you cannot talk to me nor shape what I’m saying as I type this article is a preference of the medium we are using, and this dictates the kinds of content we produce. The medium has always been the message.

Shortening feedback loops is always a laudable goal. The progression of mediums has always involved reintegrating what was once external to the process.

While I’ve critiqued most web media, as this is where most of our dominant Internet use lies, I don’t mean to suggest that there’s anything inherently preventing us from doing better. For example, many networked multiplayer video games come close to a panticipatory dialogue. There are conscious technical considerations to lower the latency of interaction, they’re largely multi-sensory, and they typically bring potentially large groups of players together into a near-now moment.

There are also remaining vestiges of the more paced, exploratory, webring-era web, especially within various Internet subcultures. As time goes on, though, they are increasingly eclipsed in volume by socially-derived stream content, as reflected by Google’s ranking algorithm changes.

As software engineers working with the Internet, we are all capable of generating new mediums from nothing, and at a scale the world has never seen before. This is a profound gift.

My belief is that there hasn’t been enough discussion around how we design mediums, and in turn, protocols that get us closer to a deeper, more real-world dialogue between people. Perhaps this is a self-fulfilling effect of us relying on web media for this discussion in the first place.

I don’t have concrete ideas for implementation, but I do have some principles for those brave enough to try: let’s emphasize latency over fidelity, impermanence over records, and the multi-sensory over the visual, and in the process make something worthy of the human spirit.