Rust is generally believed to be undefined-behavior-free when you write the entire program in safe Rust. However, this turns out false, and has a profound relationship with the Halting Problem.

Side Effects

Let's start off by talking about side effects. If you are doing mathematics, a function is simply a mapping from input to output, nothing more. Whereas in a programming language, a "function" can print a file and return a value at the same time. Anything not essentially related to computing is called a side effect.

Quick quiz, what do you think this is?

fn foo(b: bool) -> () {
    if b {
        loop {}
    }
}

Partial Functions

In Mathematics, a function is total, in the sense that there is an output value for every input value.

This definition doesn't reflect the reality of computer programming. For example, the following program1 never returns an i32 value:

fn bar(y: i32) -> i32 {
    if y > 0 {
        y + 1
    } else {
        unimplemented!()
    }
}

A function corresponds to the behavior of bar is thus partial (as opposed to total), as the return values are not defined when y <= 0.

Moreover, if a program never returns (or, halts) for some input, the function is said to be partial as well.

Why do partial functions matter? Because they allow us to reason about programs with our mathematical knowledge. For example, a partial function $h$ that represents =bar= can be written as:

$$ \begin{equation} h(y) = \left{ \begin{array}{ll} y + 1 & y > 0 \ \perp & y \leq 0 \end{array} \right. \end{equation} $$

And now, we can use $h(y)$ as an ordinary function, as long as we take care of the "bottom" values ($\perp$).

Optimization

When optimizing a program, we want to reserve its semantics while making it faster. A basic technique, Dead Code Elimination (DCE), comes into play.

fn print_y(y: i32) -> i32 {
    println!("Argument: {}", y);
    y * 2
}

Let's suppose println! is a function. Can you remove that call, since it has nothing to do with the computation of y? No, because you the programmer wants a message to be printed out. Therefore, optimization must preserve side effects.

All side effects must be preserved.

Now try this:

fn baz(y: i32) -> i32 {
    let _ = bar(y);
    y * 2
}

Clearly, the return value of bar is dropped. Can you remove that call? No, because it has a side effect: when $y \leq 0$, the program will get stuck.

A natural question to ask is, which calls can be safely removed? Simple: as long as a program is a pure function, or a side-effect-free function. Such functions are also total.

The Halting Problem

If a compiler can find out which functions are pure, then everything will be OK. However, that's not going to work, because of the (in)famous undecidability of the Halting Problem.

The Halting Problem basically asks you to write a program to solve the following problem:

Given any program P, determin if P will halt in finite time.

Or, try to determine if a program will ever return a value, with exceptions or panics not considered "halting".

"How's that impossible?" Looking at the previous examples, they are pretty simple, and, as a human, you can immediately tell that which of them will not halt for what input.

But let's try this. Suppose you have written a solver H of the Halting Problem, which accepts a program and returns true when the argument is indeed halting.

fn paradox() {
    if H(paradox) {
        loop {}
    }
}

How is H going to work for paradox? If H(paradox) returns true, then paradox should halt, but the code suggests this program will not halt. Contradictory.

Therefore, we come to a saddening conclusion that no one will ever write a program to solve the halting problem2.

So what?

A terrifying conclusion follows immediately. If you accept that safe Rust can never have UBs, you can't even do DCE, a very basic optimization, properly, in Rust!

When you remove a call to a function, you basically asserts this function has no side effects, and, consequently, total. A immediate corollary from the undecidability of the halting problem is that the compiler can't know whether your function is really pure.

Compromise, you say? Yes, that's what C++ does. The C++ standard basically assumes every C++ program to have IO operations or terminate. What if it's not? The standard says nothing, and it's up to the compiler implementation to do whatever it wants.

Yes, a typical undefined behavior. That's where the compiler and the programmer can disagree on each other.

In Rust, however, it's generally believed that safe Rust is UB-free. It almost is. If you accept the consequence that you can't remove any unnecessary calls, then yes, no more UBs beyond floating point numbers.

Is this really a big problem?

Let's accept that consequence for now, and try to answer this question: is this a real problem?

A pure function always computes some value waiting to be consumed, so when you see any code like this:

fn main() {
    first(1);
    second(10);
    third(100);
}

You should immediately know that the three calls have side effects, or the programmer wouldn't write code that way!

Nice try, but keep in mind that the Zero Abstraction thing depends on a great optimizer. Generic code also needs this: if something is not needed, the computation won't happen at all. Remember BTreeSet and HashSet?

Status Quo

It seems Rust doesn't really do heavylifting, but LLVM does. LLVM is designed for C/C++, so yeah, you have their UBs as well. And this can lead safe Rust code to crash.

fn foo() -> ! { loop { } }
fn main() {
    foo(); // This call is removed unexpectedly
    // foo shouldn't return, so ud2 is inserted here to catch that
}

Safe Rust has more UBs than I previously thought. :-)

Workaround?

I don't think there's any practical way to "remove" this UB.

Instead of the compiler figuring out whether a function is pure or not, we can let programmers to annotate functions with #[side_effects] or #[pure]. What if the annotation is wrong? Still an undefined behavior.

Can the compiler automatically insert no-op side effects to prevent optimizations like that? I'm not sure, but that seems like a dead-end, too. loop is the simplest case, while whether an implementation of an algorithm can terminate or not is non-obvious.

At least, I think the ud2 instruction can be replaced by an infinite loop. But that's on LLVM's part, I presume?

Conclusion

Just accept that safe Rust has UBs. Or, more UBs than people previously thought.

Credits

This article is a summary of what was discussed in a chat room. Most insights come from Gary Guo. He also mentioned that more optimizations, like loop fusion, actually depends as well on the assumption that IO-free functions will terminate.

1

For clarity, here a function in a programming language is called a program.

2

You can, however, "bypass" this impossibility by only solving a "fraction" of the original problem.