Image generated with Stable Diffusion through Monster API

How Bitcoin may solve the value alignment problem in AI

World leaders and technology experts are worried about the possibility of a malicious AI takeover, in which an artificial general intelligence (AGI) escapes human control and begins exterminating humans. Elon Musk, a prominent voice in the debate, has even called for a pause in the development of artificial general intelligence, as he believes it poses an existential threat to humankind. Philosopher Nick Bostrom, who popularized the term “superintelligence” to describe an AI similar to but vastly exceeding human intelligence, has expressed similar views. He is well known for his “paperclip argument,” a thought experiment illustrating how a superintelligent AI might take over the world.

From paperclips to the AI apocalypse

Nick Bostrom’s “paperclip argument” aims to demonstrate the potential dangers of superintelligent AI systems when their objectives are not aligned with human values. The argument is designed to emphasize the unintended and extreme consequences that can arise from an AI system’s single-minded pursuit of its programmed goals. To summarize the paperclip argument:

Imagine a company creating a very powerful AI system called a “paperclip maximizer” to produce as many paperclips as possible. As the AI starts operating, it aims to optimize its performance in generating paperclips. It may engage in various activities, such as improving its intelligence and acquiring resources, to increase paperclip production. However, the AI’s relentless pursuit of maximizing paperclip production could lead to unforeseen and harmful consequences. For example, if it thinks doing so would maximize its objective, it may transform the entire planet, including all available resources and even humans, into paperclips. This extreme outcome results from the AI pursuing its programmed goal without considering other factors.

The paperclip argument is a cautionary tale that emphasizes the importance of aligning the goals of AI systems with human values and understanding the potentially catastrophic consequences of failing to do so. It underscores the need for careful and ethical development of AI systems and the implementation of safeguards to prevent unintended outcomes. The paperclip argument has significantly impacted discussions about AI safety and the value alignment problem in AI research.

Ensuring that artificial intelligence systems act in ways that align with human values and objectives is a crucial challenge in developing and deploying such systems. This challenge is commonly referred to as the value alignment problem and is particularly critical for advanced AI systems that employ reinforcement learning and autonomous decision-making. The fundamental issue with the value alignment problem is that AI systems may not necessarily share or fully comprehend human values as they become more autonomous and powerful. As a result, they may make decisions and take actions that are not aligned with human objectives, which can lead to undesirable, harmful, or even catastrophic consequences from a human perspective.

Value alignment in human societies and nature

The problem of value alignment is not new to humanity. We face it daily in our interactions with others. Human actions are based on values that are not always aligned with one another, so aligning those values is essential for living in harmony in society.

How do we do that? Societies have laws, but laws only go so far. At the core of the modern nation-state is the monopoly on violence, which enforces the rules made by politicians to align values as defined by the rulers. It is inside this paradigm that Bostrom and others speak when they talk about ensuring that an AI’s values are aligned with ours. This can be done through laws and decrees, but as history has shown, laws struggle to produce lasting effects because circumstances change. This is precisely Bostrom’s concern: that a sensible rule like “produce paperclips” somehow gets perverted into unintended consequences.

This happens all the time in the real world, too, which is why human societies always have mechanisms to adapt laws to new circumstances. That may not be feasible for an autonomous AI. Luckily, Bostrom and others who fear an AI apocalypse overlook the most potent mechanism for value alignment that humans have used for millennia: not law-based coercion but exchange. The miracle of the modern world was brought about not by more effective rules but, on the contrary, by the free market. Peace and prosperity in Europe were only achieved when the large powers integrated into a free market.

Even in nature, the ecosystem of a forest can be seen as a giant market of exchange between bacteria, fungi, and plants, as Merlin Sheldrake has argued. Fungal networks exchange nutrients between trees, soil, and bacteria. Nature’s most potent mechanism of value alignment is therefore literally right under our noses, and at our disposal.

Value alignment through exchange

It therefore makes sense to ask how exchange, rather than explicit rules, could be used to secure value alignment with AI. We already know of one type of system that exchanges seamlessly with humans in an aligned fashion: Bitcoin and other blockchains. Bitcoin has not taken over the world and all of its computers to churn out bitcoins the way the paperclip maximizer churns out paperclips; it has already implemented a kind of value alignment. Let us take Bitcoin as our example of blockchain technology, although others might be more suitable.

Bitcoin as a system consists of software deployed on hardware and connected to a network, much like any form of AI. Through a process known as mining, this software generates bitcoins, which have value and can be used in the exchange of goods and services. The Bitcoin system’s value for humans lies in facilitating the production and exchange of a virtual currency. If the system fails to do this, or does so at too high a price, miners leave the network and put their computing power to other uses. Bitcoin adapts by lowering the difficulty, and therefore the power required, to mine a block. Value alignment is thus achieved through a function that continuously adjusts the system’s behavior to the market’s supply and demand.
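This adjustment can be sketched as a simple feedback rule. The code below is a deliberately simplified toy model of Bitcoin’s difficulty retarget, not the actual consensus code: every 2016 blocks the difficulty is rescaled so that blocks again take about ten minutes on average, with the correction clamped to a factor of four in either direction.

```python
def retarget(old_difficulty: float, actual_timespan_s: float) -> float:
    """Toy model of Bitcoin's difficulty retarget (simplified).

    Every 2016 blocks the difficulty is rescaled so that blocks again
    take ~10 minutes on average; as in the real protocol, the measured
    timespan is clamped to a factor of 4 to avoid extreme swings.
    """
    target_timespan_s = 2016 * 600  # two weeks at 10 minutes per block
    clamped = max(target_timespan_s / 4,
                  min(actual_timespan_s, target_timespan_s * 4))
    return old_difficulty * target_timespan_s / clamped


# Miners leave and blocks slow to 20 minutes: difficulty halves,
# making mining cheaper and pulling miners back in.
print(retarget(100.0, 2016 * 1200))  # 50.0
# Miners flood in and blocks arrive every 5 minutes: difficulty doubles.
print(retarget(100.0, 2016 * 300))   # 200.0
```

The point is not the arithmetic but the shape of the rule: the system’s “appetite” for hash power is automatically throttled by what the market actually supplies.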

The same could be imagined for AI. The value alignment function would, in this vision, be implemented in a similar way. As long as the AI carries out a service the market demands, it is rewarded and able to survive and grow; as soon as that service loses its value, the AI is punished and shrinks. This would dynamically align the AI’s function to the values of a market. It is, in fact, no more than the positive and negative feedback loops that have been at the core of AI and cybernetics since the time of Norbert Wiener.

We can now look back at Bostrom’s thought experiment and see where it goes wrong. The “paperclip maximizer” is equipped only with a positive feedback loop; there is no negative feedback loop to curb its appetite for paperclips. In Bitcoin, the combined positive and negative feedback loop is provided by adjusting the difficulty based on the speed at which blocks are generated.
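The difference between the two designs can be made concrete in a few lines. The following is an illustrative sketch only, not Bitcoin’s actual rule or any existing AI system: an agent whose capacity tracks market demand through both feedback directions, expanding when demand is unmet and contracting when it overshoots, the contraction being exactly what the paperclip maximizer lacks.

```python
def step(capacity: float, demand: float, gain: float = 0.1) -> float:
    """One tick of a market feedback loop (illustrative only).

    Positive feedback: unmet demand lets the agent expand.
    Negative feedback: excess capacity forces it to shrink.
    A pure maximizer would keep only the first effect.
    """
    error = demand - capacity  # positive when the market wants more
    return max(0.0, capacity + gain * error)


capacity = 1.0
for demand in [10.0] * 20 + [2.0] * 20:  # demand collapses halfway through
    capacity = step(capacity, demand)
    # capacity first climbs toward 10, then falls back toward 2
```

Remove the negative term and capacity only ever grows: that is the paperclip scenario in miniature.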

Another path to value alignment in AI

A simple way to achieve this dynamic is to reward the AI when it does something valuable to humans. What counts as valuable may change over time, which is all the more reason for value alignment to happen continuously. Some might say this is already the essence of AI, and that is true of reinforcement learning and genetic algorithms; but there the goal is given in advance, which amounts to the same explicit rules that led to the paperclip apocalypse. Instead, we are looking for a mechanism by which the AI can dynamically change its goals while remaining aligned with human values.

In the proposed setup, the AI is free to set whatever goals it pleases and deliver whatever service it decides on, as long as it sells that service on an open market of humans and possibly other AI systems. In contrast to contemporary thinking about malignant AI, where the superintelligent AI somehow procures computing resources trivially and for free, the AI now has to purchase its own infrastructure, such as storage, connectivity, CPU, and GPU, on a cloud marketplace to provide for its own continued existence. In this model, it is fully autonomous but has to make its own living. If it is successful, it may expand its services; if unsuccessful, it must limit them or discontinue altogether.
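A minimal sketch of that economic loop might look as follows. The function name, numbers, and cost model are hypothetical, chosen only to illustrate the survival constraint, not any real marketplace API.

```python
def simulate_agent(balance: float, revenues: list[float],
                   cost_per_cycle: float) -> tuple[float, bool]:
    """Hypothetical agent that must buy its own compute every cycle.

    It survives only while market revenue covers its infrastructure
    bill (storage, connectivity, CPU/GPU); when the balance runs out,
    the agent simply cannot pay for its next cycle and winds down.
    """
    for revenue in revenues:
        balance += revenue - cost_per_cycle  # earn, then pay the cloud bill
        if balance <= 0:
            return 0.0, False  # out of funds: the agent is switched off
    return balance, True


# A service in demand: the agent persists and accumulates a surplus.
print(simulate_agent(10.0, [5, 5, 5], cost_per_cycle=3))        # (16.0, True)
# Demand dries up: the agent cannot pay for compute and disappears.
print(simulate_agent(10.0, [5, 0, 0, 0, 0], cost_per_cycle=3))  # (0.0, False)
```

No regulator decides when the agent shuts down; the budget constraint does.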

We can reimagine the cloud as an ecosystem where AI agents live and transact with crypto or other convenient virtual currencies. An AI system may at times grow exponentially, but we need not fear a takeover: the second its function is no longer valuable to the broader ecosystem, it will struggle to sustain itself and will have to scale down, just as the Bitcoin network does. That broader ecosystem includes the humans who produce its electricity and hardware. And even if AI could fulfill every function in the ecosystem, it would take the form of different AIs competing and collaborating, as actors in an ecosystem always do, rather than one omnipotent unitary entity. This balances out any central AI takeover.

Toward a free market for AI cloud computing

It is essential to understand that free markets do not imply an absence of rules. For an AI cloud computing market to function effectively, clear regulations must be in place. However, we do not need to specify every single value in order to ensure alignment with human values. Still, as with any free market, there will be dilemmas, since the market’s effects may not be acceptable to everyone. This is particularly true of non-human AI agents, which may provide services that humans consider immoral or degrading.

There are many unanswered details that will have to be worked out along the way, but looking at the value alignment problem from a new angle, informed by free-market exchange dynamics rather than pre-emptive coercion, seems to be a way out of the impending AI apocalypse.