Comments by "Jim Luebke" (@jimluebke3869) on "Goal of Self-Preservation in Intelligent Systems | Michael Littman and Lex Fridman" video.
-
OK, so here's traditional AI training:
1) Define success criteria, construct training dataset
2) Run the AI on the training data repeatedly and evaluate its output against the success criteria
3) Automatically adjust AI parameters based on whether the current iteration gives you more success than the last
4) Stop training when you've hit an optimum state
This isn't an expert system with a lot of "if this / then that" rules. The rules emerge from the evaluation of success or failure; they are not explicit, but embedded (largely cryptically) in the multitude of parameters that make up the AI.
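Here's a minimal sketch of that loop, assuming a hypothetical parameterized model and a numeric success score (both made up for illustration; real training uses gradients rather than random perturbations, but the evaluate-and-keep-if-better structure is the same):

```python
import random

def success_score(params, dataset):
    # 1) The success criterion: a placeholder that scores how well
    #    the parameters fit the training data (higher is better).
    return -sum((p - x) ** 2 for p, x in zip(params, dataset))

def train(dataset, steps=1000, step_size=0.01):
    params = [random.random() for _ in dataset]      # initial AI parameters
    best = success_score(params, dataset)
    for _ in range(steps):                           # 2) iterate on the training data
        candidate = [p + random.uniform(-step_size, step_size) for p in params]
        score = success_score(candidate, dataset)    #    evaluate against success criteria
        if score > best:                             # 3) keep the adjustment only if it
            params, best = candidate, score          #    beats the previous iteration
    return params                                    # 4) stop after the training budget

if __name__ == "__main__":
    print(train([0.2, 0.5, 0.9]))
```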
You can structure your training data and success criteria so that you have "exclusion zones" of various weights. The destruction of the AI could be 100x bad. Disobedience to a human command could be 10,000x bad. Harming a human being could be 1,000,000,000x bad (or an absolute fail). But this makes the structure of your training data and success criteria FAR more complicated than it would need to be for, say, an AI whose only task is determining "hotdog / not hotdog".
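As a hedged sketch of those weighted "exclusion zones": the success criterion subtracts a penalty whose weight reflects how bad each kind of failure is. The event names and weights below are illustrative, not from any real system:

```python
PENALTIES = {
    "ai_destroyed":    100,
    "disobeyed_human": 10_000,
    "harmed_human":    1_000_000_000,   # or treat as an absolute fail
}

def evaluate(base_score, events):
    """Score one training episode: start from task performance and
    subtract the weighted penalty for each bad event that occurred."""
    if "harmed_human" in events:
        return float("-inf")            # absolute-fail variant
    return base_score - sum(PENALTIES[e] for e in events)

print(evaluate(50.0, []))                   # clean run: 50.0
print(evaluate(50.0, ["disobeyed_human"]))  # -9950.0
```

The hard part isn't the arithmetic, of course; it's deciding when the events in that dictionary have actually occurred.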
The level of awareness necessary to know what a human is, what could harm a human, what a human command is, what the AI itself is, or what could harm the AI is an extremely complex thing to parameterize.
We human beings ourselves have a hard time with ethical dilemmas, such as "Is it wrong to lie to a woman about whether she's attractive to someone she's attracted to, if it would hurt her feelings to think she wasn't?" Asimov explores this in his short story "Liar!"