In my last post, I suggested a way to look at transformer models — the form of AI behind ChatGPT and its ilk — as a clutch: a soft, sliding layer between people (flexible) and software systems (rigid). Human intentions and needs are variable and fluid; machine systems are clockworks, made in just the way they’re made, and hard to change. Having a go-between to translate between these worlds will make interacting with software systems more pleasant. It will make the systems more useful.
It’s also interesting to think about how these models create a new way for machines to talk to each other.
When we want two gadgets to talk for some reason, we start with a wire (or a radio wave). We set the voltage on the wire low or high. We make patterns of low and high. We package the low and high into discrete chunks spread out over time, and we agree on conventions about what each chunk means. We group the chunks into larger units that we call “packets” or “frames”, and exchange special packets in a special order, according to agreed-upon rules.
Ethernet, HTTP, and USB are all just solutions to particular coordination problems at different levels of this game. They work spectacularly well, but they’re artifacts of a clockwork universe, so they’re brittle. Their rules are baked into hardware, into firmware, into internationally agreed-upon standards, into business models, into intellectual property rights. Once deployed, they are hard or impossible to change. The Wifi protocols your laptop hardware supports may never change; if newer versions of Wifi are invented, you’ll likely have to buy a new laptop, and a new router, to get them.
It’s a cumbersome setup, so naturally, most devices can’t talk to each other at all. We take this as normal.
Lots of devices in our lives have a clock, for instance, but the clocks don’t synchronize. All the displays and lights and cash registers in a retail environment don’t turn on and off together unless someone very carefully arranged for them to do so. Your car, solar panels, and all your appliances don’t spontaneously coordinate to optimize your usage of electricity and save you money.
But language models — the ones we already have, not even the future better ones — change the dynamics here in an intriguing way.
What they offer is the opportunity for a new protocol layer, one that will sit on top of Ethernet, HTTP, USB, and the rest.
At this layer, there won’t be packets, buffers, or error codes. There’s just . . . English.
What happens at this layer is governed by a document that looks more like a legal contract or board game instructions than a computer program. It just explains the rules.
Devices that abide by the rules are allowed to say things to each other like:
“I’m this type of device and here are my capabilities…”
“I’m observing this situation…”
“I’m taking this action…”
“I’m making this request…”
And the place where this conversation takes place basically just looks like . . . a chat room.
In this chat room, the participants are the devices around you, and their goal is to try to help you realize your goals. The chat room of your house could be just another WhatsApp thread on your phone.
In that thread, you can read what’s going on, diagnose problems, or give direction. If you buy a new vehicle, you can add it to the chat. Once you do that, then the next time you’re riding down the highway wondering, did I leave the iron on?, you can muse aloud to the car, the car can ask the house, and the house can ask the iron.
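To make the idea concrete, here is a minimal sketch of what such a thread could look like in code. Everything in it is an assumption made for illustration: the names Kind, Message, and ChatRoom are hypothetical, the four message kinds are simply the four example statements above, and there is no language model involved. The devices' lines are hard-coded stand-ins for what a small on-device model might say.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # Hypothetical labels for the four kinds of statement devices may make.
    ANNOUNCE = "announce"  # "I'm this type of device and here are my capabilities"
    OBSERVE = "observe"    # "I'm observing this situation"
    ACT = "act"            # "I'm taking this action"
    REQUEST = "request"    # "I'm making this request"


@dataclass(frozen=True)
class Message:
    sender: str
    kind: Kind
    text: str  # plain English, readable by people and devices alike


@dataclass
class ChatRoom:
    members: set = field(default_factory=set)
    transcript: list = field(default_factory=list)

    def join(self, name: str, capabilities: str) -> None:
        """Add a device to the room and have it introduce itself."""
        self.members.add(name)
        self.post(Message(name, Kind.ANNOUNCE, f"I'm {name}. {capabilities}"))

    def post(self, message: Message) -> None:
        """Append a message to the shared transcript; only members may speak."""
        if message.sender not in self.members:
            raise PermissionError(f"{message.sender} has not been added to this chat")
        self.transcript.append(message)

    def readable(self) -> str:
        """Render the whole conversation as text the owner can scroll through."""
        return "\n".join(
            f"[{m.kind.value}] {m.sender}: {m.text}" for m in self.transcript
        )


# The iron scenario, with hard-coded lines standing in for model output.
room = ChatRoom()
room.join("house", "I can pass questions along to the appliances.")
room.join("iron", "I can report whether I'm on and switch myself off.")
room.join("car", "I can hear the driver and speak replies.")
room.post(Message("car", Kind.REQUEST, "House, did the iron get left on?"))
room.post(Message("house", Kind.REQUEST, "Iron, are you on?"))
room.post(Message("iron", Kind.OBSERVE, "I'm on and have been idle for 40 minutes."))
room.post(Message("iron", Kind.ACT, "I'm switching myself off."))
print(room.readable())
```

The point of the sketch is the transcript: the only shared state is a list of English sentences, so the same record the devices act on is the one you read on your phone.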
Instead of technology being a black box, we have the opportunity for an open system where the user of the system actually has control.
The necessary ingredients are:
A small language model, with a reasonable amount of common sense
The model is baked into a chip so hardware vendors can buy it from parts bins as easily as they’d buy a touchscreen
Strong security guarantees. The model needs good “alignment” and the system as a whole needs to be guaranteed to serve your goals and not the goals of others. The current generation of “smart devices” is NOT the model here; these things are trash.
A messaging protocol running on a local Wifi network
Once you have this, then when you’re cooking dinner in the oven, and the timer goes off, but you’re way downstairs by the washing machine, the oven tells the washing machine what’s up, and the washing machine lets you know. The system as a whole has common sense.
(The enterprise use cases are substantial, probably more than these “smart home” examples I’m pulling, but consumer use cases are easier to explain and build intuitions around.)
In this world, the “internet of things” (the perennially niche idea of intelligent environments where objects can adapt to human needs in real time) can start to make sense. Up till now, the juice in that idea has never been worth the squeeze. Getting devices to talk is hard; adding more devices is harder. Each additional piece of the system adds complexity and management overhead. You quickly end up with something that isn’t comprehensible to most of its users, only to a priest class. And the utility of the system is capped, hard, at exactly those use cases someone was able to imagine before it was built.
In networks made up of people, we tend to see the opposite dynamic. Under the right conditions, the more people connected to a network, the more valuable that network becomes. This dynamic, referred to as Metcalfe’s law, is the engine of social networks, transit networks, markets, and cities. These types of systems get better as they grow.
As soon as there is simple intelligence in the devices, and a way for them to connect, then a collection of these devices can grow its usefulness under Metcalfe’s law, rather than collapsing like a technology soufflé. When the components of the network are more like (well-behaved) people than opaque clockworks, then adding the next participant to the chat, and the next, increases the value of the whole system.
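For a rough sense of the arithmetic: Metcalfe’s law is usually stated as value growing in proportion to the square of the number of participants, and a common way to see why is to count the distinct pairs that could talk to each other. The helper below is a sketch under that assumption; the name pairwise_links is made up for illustration, and a count of possible pairs is only a proxy for value, not a measurement of it.

```python
def pairwise_links(n: int) -> int:
    """Count the distinct pairs among n participants: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Each new participant can pair with every existing one,
# so the count grows much faster than n itself.
for n in (2, 5, 10, 20):
    print(n, pairwise_links(n))
```

Going from 10 devices to 20 doubles the participants but roughly quadruples the possible conversations, which is the dynamic the chat room is meant to tap.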
Done right, it’s a vision of the wizard’s castle, full of benevolent, enchanted objects.
Done wrong, of course, it’s the Sorcerer’s Apprentice scenario I alluded to in the last post, with a million animate mops and buckets wreaking havoc.
Legibility is the key here; if the incantations in the spell book are in English, not in Latin, then people will have power over their stuff. More power than they’ve had, actually, in the history of computing so far.