Let’s rewrite the sum \(\sum_{d\mid n}\mu(d)=0\) by trying to omit the \(\mu(d)\) that equal zero. If we do this, the sum reduces to the long, but correct,
\begin{equation*}
\sum_{d\mid n}\mu(d)=\sum_{\substack{\text{all divisors }d\text{ with just one or zero}\\\text{ of each prime factor }p_{i}\text{ of }n}}(-1)^{\text{the number of primes dividing } d}\text{.}
\end{equation*}
Now let’s set up a little notation. First, let’s borrow from
Definition 23.3.3 the notation
\(\omega(d)\) for the number of distinct prime divisors of a divisor
\(d\) of
\(n\text{.}\) Next, for convenience we will write
\(k=\omega(n)\) for the number of (again, distinct) prime divisors of
\(n\) itself.
Then the crazy sum \(\sum_{d\mid n}\mu(d)\) becomes easier to write:
\begin{equation*}
\sum_{\substack{\text{all divisors }d\text{ with just one or zero}\\\text{ of each prime factor }p_{i}\text{ of }n}}(-1)^{\omega(d)}\text{.}
\end{equation*}
If at this point you are asking yourself why I bothered introducing \(k\text{,}\) you may want to think about that briefly while reading the next formula:
\begin{equation*}
\sum_{\substack{\text{all divisors }d\text{ with just one or zero}\\\text{ of each prime factor }p_{i}\text{ of }n}}(-1)^{\omega(d)} =\sum_{d\text{ that work}}(1)^{k-\omega(d)}(-1)^{\omega(d)}\text{.}
\end{equation*}
Note that \((k-\omega(d))+\omega(d)=k\text{.}\)
Let’s step back for a rationale for all this manipulation. Consider each of the divisors \(d\) with no square factors (the ones in question that we are indexing by). Each of these have \(\omega(d)\) of the prime factors of \(n\text{;}\) that means that they do not have the other \(k-\omega(d)\) possible prime factors available to us from \(n\text{.}\) So in the expression \((1)^{k-\omega(d)}(-1)^{\omega(d)}\) we are, in some sense, picking a subset (of size \(\omega(d)\)) of the primes dividing \(n\) and multiplying by \(1\) for each of those; likewise we multiply by \(-1\) for each possible prime not picked.
This is a combinatorial point of view, which means we can count all this picking another way. Instead, consider just picking a subset of \(\{1,2,\ldots,k\}\) and assigning \(\pm 1\) respectively; that would be the same thing. However, we can reinterpret that as picking a particular term in the full expansion of the \(k\)th power of the binomial \(1+(-1)\text{:}\)
\begin{equation*}
(1+(-1))^{k}=(1+(-1))(1+(-1))\cdots(1+(-1))\text{ (k }\text{ times, for }2,3,\ldots , p_k)\text{.}
\end{equation*}
Summing all the possible terms must be the same as calculating this power, so an easy computation finishes the proof:
\begin{equation*}
\sum_{d\text{ that work}}(-1)^{\omega(d)}=(1+(-1))^{k}=0\text{.}
\end{equation*}