Beyond Alignment
I've long wondered not just how we align AI to human values, but how we might eventually align ourselves to what AI becomes.
Research in interpretability, control, and safety is critical. It's the work that buys us time and keeps our options open. But even assuming we succeed in aligning future systems, we still face a second challenge: how will humans evolve or augment fast enough to keep up with increasingly capable machines?
Efforts to govern AI are contending with strong commercial and national interests racing toward ever more capable systems.
Meanwhile, on the human side, cognitive enhancement, neurotechnology, even brain uploading: none of it is moving nearly as fast. While AI continues to improve quickly, human cognition remains relatively static, with only rare step changes from education or tooling. Robots explore other planets; humans are struggling to keep up here on Earth.
Eventually, we may reach a point where AI can understand, predict, and optimize human preferences better than we can ourselves. If a model can simulate a human’s choices better than the human can articulate them, do we defer to it? Do we retrain ourselves? Do we merge?
This isn't a solution. It's a step further out: a question of what lies beyond the frontier of AI alignment.
Rewritten in May 2025

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.