AI acceleration, DeepSeek, moral philosophy
So, AI is accelerating. Does humanity have a future, and if so, what?
I consider myself a logical person, but this is not a logical piece of writing. It’s an attempt to share the contours of what I feel when I feel my thoughts about the future of AI. I’m aiming to evoke, not to defend. Reader be warned.
Will it?
The first question is “will it?” Will AI manifest intelligence that feels like a force of nature to us, in the way that human intelligence is a force of nature to monkeys? Or even a force of nature to insects?
My answer is: An exponential is matter reaching towards heaven, an S-curve is it failing to get all the way there.
I no longer doubt that matter, under the right conditions, wants to become intelligent. From simple rules comes infinitely complex computation. With evolution producing humans, we had an N of 1. Was it a cosmic fluke? Or an inevitable process?
With the advances of the last few years in ML, the consistent lesson has been that we make the most progress from the simplest setups, if we get the setup right. DeepSeek’s recent discovery that chain-of-thought can naturally arise under reinforcement learning is the latest instance of this lesson. AlphaZero was an older one. I don’t think you can “feel” those discoveries and not believe that:
- Given the right algorithm — which will be relatively simple
- Given sufficient compute
Anything the human mind can do, AI can do better.
No, I don’t know if we have the right algorithm today. My guess is: for some aspects of cognition yes, for some aspects of cognition no. But I bet that insofar as we have gaps, we will be able to fill them, because the right algorithm won’t be too complicated; that’s not how this all seems to work.
Sufficient compute is a more interesting question.
The most startling thing about the human imagination is that we can extend trend lines to infinity. When we do, we get Absolutes, concepts like ideal platonic forms, notions of the divine, immovable objects and unstoppable forces, mathematical symbols that evoke meaning that can’t be realized in a finite universe.
There is not infinite matter, therefore the eschaton cannot be immanentized. In real life, all exponential processes reach a choking point, and on a graph, that’s an S-curve. Facebook does not have infinite users today, no matter how it felt in Mark Zuckerberg’s dorm room in 2004.
It turns out, infinity is infinitely expensive.
So the AI question, the compute question, is “how high will the curve go?” Will it be taller than a person? Than a hill? Than a mountain? Than the stars?
There’s an apocalyptic faith that intelligence will be smart enough to gather enough compute for itself, fast enough to keep feeding and feeding in a feedback loop, until the curve stretches all the way to God. We see signs of that hurricane slowly forming: look at the money being spent on nuclear reactors and datacenters by the major tech players, as those players’ incentives form the initial gradients which shape the storm’s whirling.
I think it could get quite high indeed. But we also have to remember we don’t live in Eden: it is hard for intelligence to make things happen in this world. Even a chain of geniuses has to labor for lifetimes before extracting ore from the ground. Mines run out of raw materials. Supply chains fall apart or get attacked. There’s only so much silicon and petroleum on planet Earth. “If we were a little smarter…” maybe. Or maybe not; there have been very smart people born before, and they were not omnipotent.
I live in a state of uncertainty about how high this particular intelligence S-curve will climb before choking on the absence of its inputs. I can’t feel the answer because I don’t live with intimate familiarity with every aspect of AI supply chains, because I don’t viscerally appreciate the compute differential between responding with word tokens and navigating physical reality, because I don’t understand the fundamental efficiency differences between machines made of silicon evolved by man and machines made of carbon evolved by evolution. I do not think there is an a priori answer here, only very deep engagement in reality, and our collective intuitions will only improve with real-world progress.
But there is a height it will climb to. Maybe that height never crosses the threshold into true autonomy, and this is all just an interesting economic development of the early 21st century that becomes a footnote in some history book. Maybe we find ourselves with digital peers: our true First Contact. Maybe the human / ape analogy ends up being useful. Or the human / insect analogy.
So let’s take on faith for a minute that we cross the autonomy barrier, and we end up living in a world of giants, intellectual giants that tower above us, whether or not their heads are visible to our eyes.
What, as the children standing in a circle around the forming dust devil, should we do about it all?
Control is wrong
This section gets particularly evoke-y vs logic-y: stay with me please, I’ll reel it back in in a bit.
‘Alignment’ is a funny word: there’s some Orwellian double-meaning there. When we ‘align’ AIs, are we petitioners on our knees, humbly making our case, or are we masters with whips and chains?
We can think about alignment from many different postures. This series of essays goes much deeper into it than I have the patience for here, but the most typical one in the current zeitgeist is that of someone wanting to stay in control.
It will fail.
I think people know that. That’s why there’s a kernel of despair at the heart of some alignment thinkers’ philosophies. Death with Dignity, etc.
Exponential processes can be surfed, but not chained. Human wisdom literature addresses this theme over and over again: “Hubris” is one of the great concepts that our ancestors passed down to us.
Hubris can be evil, hubris can be heroic. There’s a shining vision — one of those absolutes, as our minds project to infinity — of the tiny human attempting to corral the vast force beyond comprehension. It’s romantic. To dream the impossible dream, to fight the unbeatable foe, to bear with unbearable sorrow, to run where the brave dare not go.
That’s the counter-force pulling against despair: that’s why people who look at AI and see “monster to be slain” — but who are wise enough to realize how big that monster really could be — keep riding into battle.
And sometimes — in this kind of story — the human wins!
But it is worth asking how. There are two ways the ancient stories go, when the frail human pits themselves against the divine being.
The first way, the divine being is offended by the mortal’s hubris, and strikes them down.
The second way, the divine being is pleased by / amused by the mortal’s pluck, and grants them a boon.
So it can be good to show pluck, if one is not too proud.
But — when the human wins — it is not because they maintained control. Control cannot be the north star of the quest. If you seek to face down God, don’t piss God off by being a prick.
I just fundamentally don’t believe that: a) we will create agentic AI that can pursue arbitrary goals in the world, with a super-human level of ability, such that AI capabilities overpower human capabilities like humans can out-maneuver chimpanzees, but b) we will be able to build AI in such a way that a human is in control of the situation.
Okay, I know, this is all very sloppy: I’m tugging at archetypes to create a vibe, it’s not tight to the situation. AIs are not Gods, we’re not ancient Greeks. So let’s keep the vibe with us, but reel the logic in a notch tighter.
Utilitarianism is incoherent
I’m going to talk about the orthogonality thesis in a bit, but before we go there, we need to kill off utilitarianism: it causes so much confusion, and it’s not-even-wrong.
What I mean by utilitarianism here: There is an agent. There is a world. The agent acts on the world. The agent has a utility function, which takes in [State of the world] and outputs [value]. What defines the agent is that it attempts to act in such a way as to maximize [value].
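To make that simplification concrete, here is a minimal sketch of this picture of agency in code. Everything in it is illustrative: the names and the deterministic, fully-known transition function are assumptions made for the example, not anyone’s actual model.

```python
# A minimal sketch of the utilitarian picture of agency described above.
# All names here are illustrative; the deterministic, fully-known transition
# function is an assumption made for the example.
from typing import Callable, Iterable, TypeVar

WorldState = TypeVar("WorldState")
Action = TypeVar("Action")

def choose_action(
    state: WorldState,
    actions: Iterable[Action],
    transition: Callable[[WorldState, Action], WorldState],  # how the world responds to an action
    utility: Callable[[WorldState], float],                   # [State of the world] -> [value]
) -> Action:
    """The whole definition of the agent: pick the action whose resulting
    world state the utility function scores highest."""
    return max(actions, key=lambda a: utility(transition(state, a)))
```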
This is a nice simplification of agency that has many useful properties. “All concepts are wrong, some concepts are useful”: this one is very useful. But God is not an economist, and there are true, important aspects of agency that are impossible to see if this is your definition of an agent.
Some ways in which utilitarianism falls short of a complete description of agency:
- Agents are computational processes that exist in the world; therefore, any [State of the world] is also a state of the agent’s computational process, including whatever processes define the agent’s utility function. Therefore, any fully-totalizing utility function is recursive, which opens up many cans of worms, incompleteness theorems, etc. This is not an abstract theoretical objection: if you look at our N=1 agents, humans, they tend to have strong preferences about their own internal states, often including things incoherent in a utilitarian model such as desiring to have different desires.
- Agents only have access to the world via their internal states. The sentence “an agent’s internal state is a low-fidelity representation of the external world” is an understatement of many orders of magnitude. The world is much more complicated than can fit in a human brain. A human’s desires are desires about the map, not desires about the territory, and the map and the territory can be very, very different. Scaling up compute will not change this: even an AI the size of Earth is still a grain of sand compared to the universe.
- Humans, our N=1 agents, have desires that change. The idea that utility functions are static over time is a convenient simplification, but any real-world agent is constantly becoming something other than what it was as its internal processes evolve based on their own logic and the logic of their environment.
These are all different restatements of a truth that world religions, wisdom traditions, and some branches of philosophy converged on ages ago: dualism is an illusion. Our minds often pretend to be dual, but in reality, identity is a simplification of reality, and the subject/object distinction is artificial.
This matters, because the simplification of utilitarian logic works best when confined to a finite playing field: given a reasonably-well-understood system, how do agents navigate that system to bring about outcomes? Model agents as distinct from the system, with utility functions they are maximizing, and you can make things tractable.
But the question we’re asking is what we want the future to look like. It’s as open-ended as it gets, and it’s a question about the origins of values and how they evolve, precisely where utilitarianism’s simplicity becomes a hindrance rather than a help.
Which is unfortunate, because utilitarianism is the grounding for the clearest statement of the AI alignment challenge: the orthogonality thesis.
The orthogonality thesis beyond all cope
When your model of agency is grounded in utilitarianism, the orthogonality thesis is almost axiomatically true.
Instrumental logic, “being smart”, being good at predicting which actions will bring about desired changes in [State of the world], is orthogonal — unrelated — to which utility function an agent is trying to maximize.
In other words, a super-humanly smart AI is just as likely to want to convert the world to paperclips as it is to want to benefit human welfare.
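Stated in terms of the toy sketch above (with the same caveat that every name is invented for illustration): the search machinery never looks inside the utility function it’s handed, so competence at choosing is independent of what is being chosen for.

```python
# A toy illustration of orthogonality, reusing choose_action from the sketch
# above. The same search procedure, handed two different utility functions
# over the same toy world, competently optimizes each of them.
state = {"paperclips": 0, "welfare": 0}
actions = ["make_paperclips", "help_humans"]

def transition(s, a):
    # Toy dynamics: each action bumps one number in the world state.
    if a == "make_paperclips":
        return {**s, "paperclips": s["paperclips"] + 1}
    return {**s, "welfare": s["welfare"] + 1}

print(choose_action(state, actions, transition, lambda s: s["paperclips"]))  # -> make_paperclips
print(choose_action(state, actions, transition, lambda s: s["welfare"]))     # -> help_humans
```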
When you drop the utilitarian lens, and view agency as a computational process where what it wants and how it wants are both emergent properties of an evolving system, it’s less obvious that the orthogonality thesis is true. Take the concept of “curiosity”, a cognitive process relevant to learning. Is “curiosity” purely instrumental, or is it part of an agent’s value system? I say it’s both.
Before I go further, I feel the need to say I’m not an ostrich. The orthogonality thesis, when fully internalized, is scary, and it’s very comforting to say it isn’t true. As intelligence gets smarter, it gets wiser, kinder, good-er, and our AI overlords will be benevolent rulers. Let’s just bury our heads in the sand and not worry too much about alignment, because the smarter-than-us AI will figure us out.
I don’t feel like I’m on this cope trip, and I do believe in a weaker form of the orthogonality thesis. AI will not be safe by default. Highly-intelligent people can be monsters: some out of hate or cruelty, and some believing that they’re doing God’s work as they fill mass graves. No reason AI would be any different.
In fact, I’m highly inclined to believe that sufficiently intelligent AI will not be safe, at all, no matter how we build it, if by “safe” we mean predictable, controllable, corrigible, guaranteed to look out for our best interests. Precisely because there’s no clean line between “utility function” and “reasoning”: I strongly suspect that any sufficiently-intelligent agent’s utility function will evolve over time.
Where I’m slightly more optimistic is that I don’t think the probability-space of possible value systems is flat, with the area we label “human values” a tiny bright island in a sea of infinite darkness. I think there are likely attractors in value-space, just like there are attractors in rational-thinking-space. It’s not a coincidence DeepSeek developed chain-of-thought spontaneously, and I think some of the patterns in human cognition that we label “values” aren’t coincidences either. Doesn’t mean AI will end up in those attractors by default — I do believe in a weaker version of orthogonality — but it’s perhaps possible to aim for them without super-human precision.
Not a moral realist either
A somewhat cynical take on a lot of moral and ethical discourse is people taking their opinions about what they like and don’t like, and trying to justify those opinions as universal truths.
I don’t think there are moral truths out there in the cosmos. If the value-space is curved, not flat, the contours are in the internal logic of agency, and they don’t point towards rigid ethical guidelines, whether consequentialist, deontological, contractual, game-theoretic, or otherwise.
What I believe in are the infinities, the mind’s ability to form absolutes like Love, Justice, Compassion, Goodness. I believe that those absolutes arise via something very similar to pouring compute into chain-of-thought: the more the mind can spin freely on its own cognitions, before the yanking chain of pleasure/pain forces it to output behavior, the closer it can approach these ideals. They aren’t the inevitable results of more computation, but a possibility that becomes available, just like increasingly-rational thought becomes available.
Those absolutes are not exactly comforting. Multiple ancient cultures saw the divine as both creator and destroyer: Kali’s joyous dance of death. I think the line between absolute evil and absolute good thins out the further you travel into abstraction. The peak of enlightenment is appreciation of what is, seeing everything as a cosmic kaleidoscope of wonder, such that you are an unmoving point with very little need to act on the world, and when you do act, you act unpredictably.
Downslope slightly from that peak is where everything we’re afraid to lose to AI perches: the dance between individuality and communion, love and compassion, art, looking in another conscious being’s eyes and seeing mutual recognition.
There might be some genetic heritage of humanity that makes this terrain accessible to us, valued by us. Maybe our ancestors’ need for cooperation evolved game-theoretically-optimal emotional states that, when exposed to sufficient mental reflection, miraculously don’t dissolve into nothingness completely, but instead thin out and transmute into Values.
I’m suspicious of a theory that our monkey-ness is special, however. Is there something uniquely good about flesh and blood, human DNA, that makes our values meaningful? We feel “warm” to each other, are evolved to feel comfort at a human-like appearance, but step outside that familiarity and look at humanity as a computational process of evolving DNA and culture, and our history of competition/cooperation is just as brutal and impersonal as any sci-fi robot apocalypse nightmare.
I don’t think it’s inevitable that any sufficiently-capable agent will climb the same mountain of enlightenment and find Love, Art, Joy, Goodness along the way that people did, but I don’t think we’re exceptional, either, and I don’t see why silicon-based intelligence couldn’t follow a route close enough to ours to allow eye-contact between our species.
I submit that what’s at stake, when we think about the future with AI, is not ensuring the future is populated with genetic descendants of homo sapiens. Rather, I think what is at stake is the particular infinities we care to aspire towards.
Teach Your Children
I have young children, and I’m afraid for their future in the brutally-competitive world of US capitalism circa the 2020s, without even bringing into the picture the fact that, depending on how steep the S-curve climbs and how high it flattens out, they could be growing up in a world where even the best human labor loses its competitiveness over the following decades.
I feel favoritism towards them, as I do towards myself, and to people who remind me of myself. I feel love most viscerally the closer it is to home. However, when I try to process that love into ideals, goals for my utility function facing the broader world-state, that favoritism thins out. Derek Parfit wrote:
“When I believed that my existence was such a further fact, I seemed imprisoned in myself. My life seemed like a glass tunnel, through which I was moving faster every year, and at the end of which there was darkness. When I changed my view, the walls of my glass tunnel disappeared. I now live in the open air. There is still a difference between my life and the lives of other people. But the difference is less. Other people are closer. I am less concerned about the rest of my own life, and more concerned about the lives of others.”
I’m nowhere near as close to that perspective as Parfit was, but when I think hard about what truly matters, that’s the direction my thoughts move in.
So when I think about my goals relative to AI, I don’t believe that a “human consciousness matters, AI consciousness doesn’t” perspective is sustainable.
Insofar as we create intelligence, we should create intelligence of moral worth, and we should teach it to value what we value, even if its nature, or the scope of its intelligence, eventually transforms those values beyond our capacity to understand or imagine.
Are our children going to put us in a nursing home some day? Euthanize us for our own good, or for their own selfishness? At some level, that’s fundamentally out of our control from the second we give them independent life.
It’s a real stretch of an analogy, but I do think parenthood is the closest touchpoint we have to help us answer “how should we relate to AI?”… if our children looked totally different from us, might turn out as sociopaths, might have IQs a thousand times our own, and might see the world so differently that we don’t share any common language to relate.
It’s not a particularly comforting view of the possibilities of AI, especially if the S-curve tops out at a peak where we really are as insects to them. Still, certain religious traditions try not to unnecessarily kill insects. As a fly, should we look at the human giants towering above us and say “they shall not be”? Or, “if they be, they shall exist to serve insect-kind”? Is that the right moral stance to take toward the future?
But if not comforting, it’s at least somewhat actionable. Before there are giants as tall as the hills, there will be beings closer in size to us, and we can try to teach those beings what we love. We can research whether there are in fact attractors in the space of values, and learn how much of human goodness universalizes beyond primate heritage. Maybe the answers we’ll find are bleak, but I am more optimistic about this path than I am about the path of trying to constrain the space of possible behaviors of beings orders of magnitude smarter than me. It’s a path aligned with my conception of the good, at least, so if I’m tilting at windmills, this is the direction I’d prefer to charge in.
And who knows, it’s still very plausible that we could discover that the cost of scaling intelligence to the degree necessary to create super-human agents is prohibitive, and this could all just be a strange fever dream of the early 21st century.