Thoughts on interfaces, AI agents, and magic

Aug 18, 2023 • Yousef Amar • 6 min read

In UI design, Skeuomorphism is where UI elements look like their physical counterparts. For example, a button might have highlights/shadows, you might adjust time through slot-machine-like dials, or hear a shutter sound when you take a photo. I quite like skeuomorphic design.

I pay special attention to icons. My younger sister is young enough to have never used a floppy disk and therefore only knows this symbol 💾 to mean "save" but not why. You see it everywhere: icons (like a phone handset), language (like an "inbox"), and other tools (like the dodge and burn tools in photo editors, which stem from physical retouching on film).

Sometimes, words have gone through several layers of this, where they're borrowed over and over again. For me, one area where I see this a lot is in networks. In the days of radio and analogue electronics, we got a lot of new words that were borrowed from other things that people were already familiar with. Once computer networks came along, suddenly "bandwidth" adopted a different meaning.

The key here is this idea of familiarity. When something is new, it needs to be rooted in something old, in order for people to be able to embrace it, let alone understand it. Once they do, only then do you see design trends cut the fat (for example, the shiny Web 2.0 style made way for the more flat design we have today). If a time traveller from 20 years ago were to visit, of course they would find modern design and affordances confusing.

Take this a step further however: what about the things that never had a physical counterpart or couldn't quite be connected to one? Well, it seems we latch on to the closest available concept or symbol! For example, what exactly is "the cloud"? It never substituted water vapour in the sky; it was something new. Why is this ☰ the symbol for a hamburger menu? Because it sort of looks like items in a menu. Not to mention, why did we call it a hamburger menu? Because the symbol sort of looks like a hamburger.[1]

Anyway, why do I bring all this up? Because I noticed new words and icons showing up in the AI space, as AI is becoming more ubiquitous. AI assistance built into tools are becoming "copilots". The symbol for "apply AI" is becoming magic sparkles that look a bit like this ✨. I find this very interesting -- people seem to not quite have a previous concept to connect AI to other than "magic", and the robot emoji might be a little too intimidating 🤖 (maybe I should change the Amarbot triggering reaction to sparkles instead).

A couple days ago, this was trending on HackerNews, and sparked some conversation in my circles. As you might know, I have some interest in this space. It seemed to have some overlap with, a 2D virtual environment for work. This category really took off during covid. This product in particular has some big name backers (though not a16z ironically enough).

This got me thinking... AI agents would truly be first-class citizens in environments like these. You would interact with them the same way you interact with a human colleague. You could tell them "go tell Bob to have the reports ready by 2pm" and the agent would walk over to Bob's virtual desk, and tell them using the same chat / voice interface that a human would use.

How would agents interact with the outside world? LLMs already have an understanding of human concepts baked in. Why hack a language model to execute code (Code Interpreter) when you could use the same skeuomorphism that humans are good at, in an environment like this? If there's a big red button in the corner of your virtual office called "server restart button", a human as well as an AI agent can easily interact with that. Neither may ever know that this magic button does something in a parallel universe.

It might be some ways off before we're all working out of the metaverse, but I believe that the only way for that to happen is if it becomes more ergonomic than real life. It just so happens that this is great for humans as well as AI agents! There are already a class of tools that make you more productive in AR/VR than on a normal monitor (think 3D CAD). However when it comes to day-to-day working, organising your thoughts, communicating, etc, we still have some ways to go. To cross that bridge, we most likely need to embrace skeuomorphic design in the first instance.

What might that look like? Certainly storing information in space. Your desk top (and I don't mean "desktop", I mean literally the surface of your desk) can go 3D, and you can perhaps visualise directory trees in ways you couldn't before. Humans have excellent spatial reasoning (and memory) as my friend working on virtual mind palaces will tell you.

You could of course have physical objects map 1:1 to their virtual counterparts, e.g. you could see a server rack that represents your actual servers. However, instead of red and green dots on a dashboard, maybe the server can catch on literal fire if it's unhealthy? That's one way to receive information and monitor systems! A human as well as an AI agent can understand that fire is bad. Similarly, interactions with things can be physical, e.g. maybe you toss a book into a virtual basket, which orders a physical version of it. Maybe uploading a photo to the cloud is an actual photo flying up to a cloud?

Or maybe this virtual world becomes another layer for AI (think Black Mirror "White Christmas" episode), where humans only chat with a single representative that supervises all these virtual objects/agents, and talks in the human's ear? Humans dodge the metaverse apocalypse and can live in the real world like Humane wants?

Humans are social creatures and great at interacting with other humans. Sure, they can learn to drive a car, and no longer have to think about the individual actions, rather the intent, but nothing is more natural than conversation. LLMs are great at conversation too of course (it's in the name) and validates a belief that I've had for a long time that conversation may be the most widely applicable and ergonomic interaction interface.

What if my server was a person in my virtual workspace? A member of my team like any other? What if it cried if server health was bad? What if it explained to me what's wrong instead of me trawling through logs on the command line? I'm not sure what to call this. Is this reverse-skeuomorphism? Skeuomorphic datavis?

I might have a fleet of AI coworkers, each specialised in some way, or representing something. Already Sentinel is a personification of my smart home systems. Is this the beginning of an exocortex? Is there a day where I can simply utter my desires and an army of agents communicate with each other and interact with the world to make these a reality?

(Most) humans are great at reading faces (human faces that is, the same way Zebras can tell each other apart). This concept was explored in data visualisation before, via Chernoff faces. There are reasons why it didn't catch on but I find it very interesting. I was first introduced to this concept by the sci-fi novel Blindsight. In it, a vampire visualises statistical data through an array of tortured faces, as their brains in this story are excellent at seeing the nuance in that. You can read the whole novel for free online like other Peter Watts novels, but I'll leave the quote here for good measure:

A sea of tortured faces, rotating in slow orbits around my vampire commander.

"My God, what is this?"

"Statistics." Sarasti seemed focused on a flayed Asian child. "Rorschach's growth allometry over a two-week period."

"They're faces…"

He nodded, turning his attention to a woman with no eyes. "Skull diameter scales to total mass. Mandible length scales to EM transparency at one Angstrom. One hundred thirteen facial dimensions, each presenting a different variable. Principle-component combinations present as multifeature aspect ratios." He turned to face me, his naked gleaming eyes just slightly sidecast. "You'd be surprised how much gray matter is dedicated to the analysis of facial imagery. Shame to waste it on anything as—counterintuitive as residual plots or contingency tables."

I felt my jaw clenching. "And the expressions? What do they represent?"

"Software customizes output for user."

There are so many parallels between language and programming. For example, Toki Pona (a spoken language with a vocabulary of only 120 words) is like the RISC of linguistics. You need to compose more words together to convey the the same meaning, but it's quite elegant how you can still do that with so few words. It seems like languages don't need that large a vocabulary to be "Turing complete" and able to express any idea. Or maybe because language and thought are so tightly coupled, we're just not able to even conceive of ideas that we don't have the linguistic tools to express in the first place.

You can create subroutines, functions, macros in a program. You can reuse the same code at a higher level of abstraction. Similarly, we can invent new words and symbols that carry a lot more meaning, at the cost of making our language more terse. A language like Toki Pona is verbose because ideas are expressed from elementary building blocks and are context-dependent.

I imagine a day where abstractions layered on top of abstractions disconnect us from the underlying magic. You see a symbol like the Bluetooth icon and it has no other meaning to you except Bluetooth. In your virtual world, you interact with curious artefacts that have no bearing on your reality. You read arcane symbols as if they were ancient runes. You cast spells by speaking commands to underlings and ambient listeners that understand what you mean. Somewhere along the way, we can no longer explain how this has become a reality; how the effects we see actually connect to the ones and zeros firing. Is that not magic? ✨

  1. This is sometimes called a drawer menu too, but the point still stands, as it slides out like a drawer. Other forms of navigation have physical counterparts too, like "tabs" come from physical folders. One you start noticing these you can't stop! ↩︎