AI Supremacist’s Dilemma

Let us look at how we could avoid lasting damage to our civilisation over the next 10 years or so, damage that could be caused by creating a malicious Superintelligence. As mentioned earlier on this site, there is a high probability that this decade (2020-30), which I call the period of Immature Superintelligence, may be the most dangerous in human history. In that period, Superpowers and other nations may trigger many dangerous events, e.g. Russia’s attempts to take control of Ukraine, Moldova or the Baltic states, skirmishes in the South China Sea over the artificial islands, an India-Pakistan war, a potential nuclear war between Iran and Israel, and many other local wars. These risks, although not existential on their own, could become existential when combined with other risks, such as catastrophic global warming and huge migrations caused by drought. However, the real danger for Humanity may come from losing control over AI, as described in an earlier section. According to top AI scientists, such as Stuart Russell in his book ‘Human Compatible’, we must maintain absolute global control over the most advanced AI agents by 2030 at the latest.

Stuart Russell, Nick Bostrom and other AI scientists talk about losing control over AI as its capability gradually matures into a Superintelligence. That is indeed vital. So far, most of that discussion has concentrated on controlling individual, highly sophisticated robots, which can indeed inflict serious damage in a wider environment. However, their malicious action is far less dangerous for human civilization than the existential danger posed by a malicious AI agent that may have full control over all AI agents and all humans. That is what we call Superintelligence: a global, unified AI system with intelligence far exceeding that of all humans. In principle, there could be more than one such Superintelligence (AI system) operating at the same time, at least in the early stages of its development, i.e. in the period of Immature Superintelligence. Since such an advanced system can be used by its owner (controller) for the owner’s own, potentially malicious aims, such as controlling the world, it is almost obvious that we are entering a period in which the Superpowers, or even some very rich individuals, may be tempted to overpower other such sophisticated AI systems in order to gain control over the whole world. There are two questions here:

  1. Can a certain Superpower teach its version of Superintelligence to fight its competitors and deliver supreme control of the whole world to its owner? I believe it can.
  2. Can such a Supremacist Superpower, after having conquered all its challengers and become an absolute ruler over all humans, also control its own, still Immature, Superintelligence? My answer to this question is: very unlikely.

A Superpower (let’s call it the Supremacists) will face a dilemma. Controlling an Immature Superintelligence might allow a single Superpower to rule over the entire planet, but that outcome, although possible, may in the end turn out much worse for the aggressor and, unfortunately, for the whole of Humanity. This is the dilemma that some Superpowers may be pondering right now, and it resembles a well-known problem in game theory, the ‘prisoner’s dilemma’.

The prisoner’s dilemma has its roots in game theory, mathematically best described by Albert W. Tucker and John Nash. It was originally developed for economics but has been deployed for an even more profound use in geopolitical strategy, especially during the Cold War era. In the original concept of the prisoner’s dilemma, two prisoners suspected of armed robbery are taken into custody. The police cannot prove they had guns; they only have the stolen goods as evidence, for which the prisoners can be kept in prison for 1 year. To get the evidence, the police need one of them to confess that they did indeed threaten someone with guns, which would carry a sentence of 10 years. Therefore, the police decide to offer the prisoners a reduction of the sentence to 5 years if they confess to having guns during the robbery. This is shown in the following diagram (the numbers represent years in jail):

The key message from the prisoner’s dilemma is that it is all about trust and risk. Co-operation is a lower-risk strategy, where risk mitigation has a specific price tag (in this case, either 1 year each when both stay silent, or 5 years each if both confess). The highest risk arises when the parties do not co-operate and one tries to outsmart the other (the outsmarted prisoner may get 10 years, while the other gets out of jail).
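The sentences described above can be written down as a small payoff table. The following is only a minimal sketch: the names `YEARS`, `best_reply` and `total_years` are illustrative, and the value of 1 year each when both stay silent is an assumption taken from the stolen-goods charge, since the original diagram is not reproduced here.

```python
# Years in jail as (prisoner A, prisoner B); lower is better.
# Assumption: if both stay silent, each serves 1 year for the stolen goods only.
SILENT, CONFESS = "silent", "confess"

YEARS = {
    (SILENT, SILENT): (1, 1),     # neither confesses
    (SILENT, CONFESS): (10, 0),   # A is outsmarted, B goes free
    (CONFESS, SILENT): (0, 10),   # B is outsmarted, A goes free
    (CONFESS, CONFESS): (5, 5),   # both take the reduced sentence
}


def best_reply(other_choice):
    """Return the choice that minimises prisoner A's own sentence."""
    return min((SILENT, CONFESS), key=lambda mine: YEARS[(mine, other_choice)][0])


def total_years(choice_a, choice_b):
    """Return the combined sentence of both prisoners."""
    return sum(YEARS[(choice_a, choice_b)])


for other in (SILENT, CONFESS):
    print(f"If the other prisoner chooses {other}, A does best to {best_reply(other)}")
print("Both confess:", total_years(CONFESS, CONFESS), "years in total")
print("Both silent:", total_years(SILENT, SILENT), "years in total")
```

Under these assumed numbers, confessing is each prisoner's best reply whatever the other does, yet both confessing leaves the pair with more years in total than both staying silent. That gap is what makes trust the central issue.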

Below I have developed a variant of the prisoner’s dilemma in relation to Superintelligence, which I call the AI Supremacist’s Dilemma, with the same rules and assumptions. As in a typical prisoner’s dilemma, the opposing parties choose to protect themselves at the expense of the other participant. The result is that both participants may find themselves in a worse situation than if they had cooperated with each other in reaching a decision.

You may think that the world still has a lot of time to prepare before any of the Superpowers is capable of blackmailing the world with its own superior Superintelligence. I’m afraid this is wishful thinking. Such a danger is here and now, and I provide further evidence in the subsequent sections of this website. Suffice it to say that change is happening at a nearly exponential pace in most domains of human activity. Such a Supremacist Superpower may be tempted to act sooner rather than later, since it will be more difficult to achieve global supremacy in AI towards the end of this decade, as AI becomes more sophisticated and is dispersed among more global players, including very wealthy individuals, a subject I consider further on.

When applying the prisoner’s dilemma to Superintelligence, I consider a scenario in which there are two Superpowers: the Supremacists and the Humanists, the latter representing the rest of the world. Let’s say that the Supremacists create a Superintelligence equal to that of the Humanists. The Supremacists’ objective is to rule the world according to their own values, using Superintelligence to help them achieve that goal. In order to do that, they must load their Superintelligence with certain goals (motivators) based on their very top value. Such a top value could be, for example, making the Supremacists’ nation, race or religion superior to others. If they decide to do that, they would violate Asimov’s so-called First Law of Robotics (do no harm to humans), now largely superseded by the Asilomar principles.

And that is why they may have a problem. Such a Superintelligence may indeed initially act maliciously in the interest of that Superpower only. However, at some stage it may turn against its master, since an evil Superintelligence will not be able to differentiate perfectly between friend and foe, or between good and evil. This is especially likely since in this decade we will only have an Immature Superintelligence, which will be prone to grave errors. In the end, if such a scenario becomes reality, nobody will be able to control Superintelligence, which is most likely to be an evil entity. Such an evil Superintelligence may very quickly decide to make us extinct for reasons of its own. Will the Supremacists be prepared to take such a risk? Will they do it knowing there is a high probability that their Superintelligence, built on the key value of supremacy and with the key goal of subjugating everyone who is not a citizen of the Superpower, may in time become evil and annihilate the Supremacists along with the entire human species?

On the other hand, the Supremacists may consider co-operating with the rest of the world (the Humanists) to jointly develop a friendly Superintelligence, which may be immensely beneficial to all. Instead of fighting each other, Supremacists and Humanists could work together with a mature Superintelligence and evolve over a longer period into a new, post-human species, which will inherit the values promoted by the future Human Federation. The Supremacists will therefore face a dilemma that can be set out as four possible scenarios:

  • Scenario 1: Supremacists decide to cooperate with Humanists, after Humanists convince them to work together, so that in the long term humans will gradually evolve, merging with Superintelligence into a new species that will inherit the best values of Humanity. Both parties accept that one of their goals, to survive in a biological form, will not be met. Supremacists will not achieve their objective of ruling the world according to their system of values. Thus, neither party achieves its objectives fully but, say, to 80%. The result: 80, 80.
  • Scenario 2: Supremacists fight their corner and win. They become the supreme world power, imposing their values on all humans. However, after some time, the Immature Superintelligence becomes malicious and all humans become extinct. Supremacists achieve their initial objective, but in the end they become extinct with the rest of humanity (60% of their objectives achieved). Humanists lose, but they survive for some time, until the Immature Superintelligence wipes them out as well (20% of their objectives achieved). The result: 60, 20.
  • Scenario 3: Supremacists fight but lose. However, they survive for some time in a biological form (20% of their objectives achieved). Humanists did not want the cyber war, but they win it. Although the evolution via merger with Superintelligence is delayed, they achieve their objective, say, to 60%. The result: 20, 60.
  • Scenario 4: Finally, Supremacists fight, but neither they nor the Humanists win. During the fight the Immature Superintelligence becomes malicious and eliminates all humans. Neither of the parties achieves its objectives. The result: 0, 0.
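The four results above can be compared directly. This is only a rough sketch: the scenario labels and function names are illustrative shorthand, the scores are the example percentages from the scenarios, and the averaging step assumes that the three fighting outcomes are equally likely, which the text does not claim.

```python
from statistics import mean

# Percent of objectives achieved as (Supremacists, Humanists),
# copied from the four scenarios. The labels are illustrative shorthand.
SCENARIOS = {
    "1: cooperate": (80, 80),
    "2: fight, Supremacists win": (60, 20),
    "3: fight, Supremacists lose": (20, 60),
    "4: fight, nobody wins": (0, 0),
}


def best_for_supremacists():
    """Return the scenario with the highest Supremacist score."""
    return max(SCENARIOS, key=lambda name: SCENARIOS[name][0])


def average_if_fighting():
    """Return the mean Supremacist score over scenarios 2 to 4.

    Assumption (not in the text): the three fighting outcomes are equally likely.
    """
    return mean(
        score[0] for name, score in SCENARIOS.items() if not name.startswith("1")
    )


print("Best scenario for Supremacists:", best_for_supremacists())
print("Average Supremacist score when fighting:", round(average_if_fighting(), 1))
```

With these example numbers, co-operation gives the Supremacists a higher score than any outcome of fighting, even the one in which they win.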

The scenarios are represented in the table below:

  1. If Supremacists win (30, 70), they would achieve supremacy over the world for some time (70% of their objective fulfilled). Humanists would not perish immediately (30%), and there might be a slight chance that a mature Superintelligence, when it ultimately takes over, would not be so malicious as to eliminate all humans. However, an Immature Superintelligence is very likely to turn malicious towards every human if it was originally designed to see some humans as enemies. A total extinction of all humans is then very likely.
  2. If Humanists win (70, 30), the evolution of the human species may follow a convoluted path, but ultimately it may end with the birth of a new post-human species. Supremacists would have achieved only a delay in the Humanists reaching their objectives (30%).
  3. If during the fight neither the Supremacists nor the Humanists win, but the Immature Superintelligence becomes the outright winner because its goals were wrongly specified, it may get completely out of control. By improving itself it may very quickly become a malicious entity, leading to a rapid extinction of the entire human species (result 0, 0).
  4. If Supremacists, after making some initial cyber-attacks against the Humanists, come to the conclusion that it is better to co-operate with them, a smooth evolution of the human species will become highly likely (result 100, 100).

The numbers in the AI control dilemma are of course only examples that illustrate the point. I am almost certain that most Superpowers already play that game, trying to find a solution that would be significantly better for them than for the opposing power. However, after the world has experienced at first hand the severe consequences of cyber-attacks for several years, it will become obvious to all players on the geopolitical stage that there can be no outright winner in an all-out War of Superintelligences. Any war game should also make it clear that no Superpower can realistically expect to gain a supreme advantage over the rest of the world by developing and controlling its ‘own’ Superintelligence that would destroy only the Superpower’s adversaries. Additionally, in such a context any advantage gained in conventional or local nuclear wars would mean very little for the overall strategic position of a given Superpower. As has been mentioned before, the only hope for Humanity is therefore to ‘nurture’ Superintelligence in accordance with the best values of Humanity.

However, neither the prisoner’s dilemma nor the AI control dilemma would work with psychopaths. If a mad scientist, a dictator, a very wealthy individual or a Transhuman wants to inflict damage on Humanity even if he himself perishes as a result, e.g. in the style of Stanley Kubrick’s ‘Dr Strangelove’, then the AI Supremacist’s Dilemma will not work. Such psychopaths may literally destroy Humanity. Therefore, as with conventional or nuclear wars (e.g. North Korea), the world may have to pre-empt such potential malicious action by destroying dangerous AI facilities while it is still capable of doing so. This may be a lesser risk than letting psychopaths do severe damage to the world.

If the Superpowers realize within this decade that an all-out Cyber War initiated by any one of them can have no winner other than Superintelligence itself, I can offer some additional optimism. In that fairly positive scenario, I believe we can expect some unimaginable breakthroughs in planetary co-operation over the next 10-15 years, for example:

  • A stalemate in achieving global supremacy may lead to opening gambits, i.e. giving up a previously held advantage as a quid pro quo. An example is the Intermediate-Range Nuclear Forces (INF) Treaty signed in 1987 between the Soviet Union and the USA, which Russia and the USA/NATO recently abandoned. It is quite likely that this may be ‘repaired’ by a new treaty with better controls and an even steeper reduction of nuclear arsenals
  • AI Superpowers will end Cyber Wars, and will focus instead on developing a single Superintelligence
  • A European Federation is very likely to be set up by the end of this decade, growing almost seamlessly through membership zones and becoming the most important organisation in the world. This could be the beginning of the future Human Federation.
  • NATO may be fused with the EF by the end of the decade
  • Should the European Federation not be set up by the end of the decade, it may be NATO that expands its scope of activities to include cyber-war prevention and possibly also the economy, health and infrastructure domains
  • Russia may join one of the zones of the European Federation (EF) as a result of an economic decline in the post-Putin era. This may become a pivotal moment in the federalization of the world. Paradoxically, it may happen earlier than the USA joining the EF as the ‘nursery’ of the Human Federation, although the order in which the two countries join the EF is less important
  • The UN establishes a majority voting system in the Security Council, but that may be irrelevant if Russia (and possibly China, although this would probably come last) has already joined the EF and NATO.

I know that this looks like an almost idealistic scenario, one that enables democracy to be rebuilt and spread throughout the world much more easily and effectively. However, I would rather take a more realistic approach and assume that the new system of democracy will be born in pain and at a time of severe distress, or perhaps even apocalyptic danger. That danger may also stem from the rising capabilities of Immature Superintelligence. People are often divided by an ‘Us’ vs ‘Them’ perception. Perhaps the threat from an ever more capable Superintelligence could unite Humanity under an ‘Us’ vs ‘It’ agenda, ‘It’ being the Superintelligence.

Tony Czarnecki, Sustensis
