Many cryptographic algorithms like RSA, Diffie-Hellman key exchange are based on arithmetic operations modulo a large number. These algorithms require to do modular exponentiation and which is implemented via successive squaring. That means, many modular multiplications we are required to compute. In 1985, mathematician Peter L. Montgomery, proposed a method of computing $z = x.y \, mod \, N$ $(N > 1)$ which is much faster and does not require any division by $N$. This method is referred as Montgomery Modular Multiplication.
Let $N > 1$ be an integer. We choose a radix $R(> N)$ co-prime to $N$, i.e. $g.c.d(N, R) = 1$. Bezout's identity says that, if $x$ and $y$ be two non-zero integers with $z$ being their $g.c.d$, then there exists integers $a$ and $b$ such that $$ax + by = z.$$Therefore, since $g.c.d(N, R) = 1$, so there exists integers $a$ and $b$ such that $$aR + bN = 1$$From the above identity, it is evident that $aR = 1 \, mod \, N$. Therefore, $a$ is the multiplicative inverse of $R$ w.r.t. modulo $N$. We denote, $a = R^{-1}$. Similarly, $bN = 1\, mod \, R,$ so $a$ is the multiplicative inverse of $N$ w.r.t. modulo $R$. We denote, $b = N^{-1}$. Therefore, we have $$R^{-1}R + N^{-1}N = 1,$$ where $R^{-1} \in \mathbb{Z}/N\mathbb{Z}$ and $N^{-1} \in \mathbb{Z}/R\mathbb{Z}$. Note that, for $N = 6$ and $R = 8$, $R^{-1} \in \mathbb{Z}/N\mathbb{Z}$ does not exist. So, co-primeness between $R$ and $N$ is a must condition.
Montgomery Reduction: It is a one-one onto mapping defined from $\mathbb{Z}/N\mathbb{Z}$ to $\mathbb{Z}/N\mathbb{Z}$. For $0 \leq i < N$, let $\overline{i}$ represent the residue class containing $iR^{-1}\, mod N$. This is a complete residue system. Let us take an example to understand it better. Let, $N = 11$, $R = 16$ and note that $9.16 - 13.11 = 1$. Hence, $R^{-1} = 9$ and $N^{-1} = -13$.$$\overline{0} = 0.9 \, mod\, 11 = 0$$$$\overline{1} = 1.9 \, mod \,11 = 9$$$$\overline{2} = 2.9 \, mod\, 11 = 7$$$$\overline{3} = 3.9 \, mod \,11 = 5$$$$\overline{4} = 4.9 \, mod\, 11 = 3$$$$\overline{5} = 5.9 \, mod \,11 = 1$$$$\overline{6} = 6.9 \, mod\, 11 = 10$$$$\overline{7} = 7.9 \, mod \,11 = 8$$$$\overline{8} = 8.9 \, mod\, 11 = 6$$$$\overline{9} = 9.9 \, mod \,11 = 4$$$$\overline{10} = 10.9 \, mod \,11 = 2$$Therefore, the mapping $f\, : \, \mathbb{Z}/N\mathbb{Z} \,->\, \mathbb{Z}/N\mathbb{Z}$ such that $$\overline{x} = f(x) = x.R^{-1} \, mod N$$ maps $\mathbb{Z}/N\mathbb{Z}$ onto itself.
This is called Montgomery Reduction and the rational behind this selection is that it facilitates quick computation of $T.R^{-1}\, mod\,N$, for $0 \leq T < R.N$ when $R (> N)$ is chosen as some power of $2$. For $N$ and $R (> N)$ such that $g.c.d(N, R) = 1,$ we know that $R^{-1}R + N^{-1}N = 1,$ where $R^{-1} \in \mathbb{Z}/N\mathbb{Z}$ and $N^{-1} \in \mathbb{Z}/R\mathbb{Z}$. This implies that $$T = T.N.N^{-1} + T.R.R^{-1}$$$$T + T.(-N^{-1}).N = T.R.R^{-1}$$$$T + m.N = T.R.R^{-1},$$ where $m = T(-N^{-1})\,mod \, R.$ This proves that, when $N$ and $R$ are co-prime to each other, then for every $T \geq 0$ there exists and integer $m$ such that $\frac{T + m.N}{R}$ has no remainder. Montgomery Reduction of $0 \leq T < R.N$, i.e., $T.R^{-1}\, mod\,N$ is therefore calculated via the following algorithm and we denote it as $Redc(T)$.
Algorithm Redc(T), where $0 <= T < R.N$
$m = T(-N^{-1}) mod R$
$t = (T + m.N)/R$
if $t >= N$
return $t - N$
else
return $t$
We have already proved that $\frac{T + m.N}{R}$ has no remainder. So, $t$ is an integer. Therefore, $$t.R = T = m.N$$$$t.R = T \, mod \, N$$$$t = T.R^{-1}\, mod\,N$$ Since, $0 \leq T < R.N,$ so $0 <leq T + m.N < R.N + R. N$ and hence $$0 \leq t < 2.N$$ This justifies that, why we return $t - N$ as the value for $T.R^{-1}\, mod\,N$ when $N \leq t < 2.N$. Since $R$ is of the form $2^{i}$ for some $i > 1$, so the values for $m$ and $t$ in algorithm $Redc$ can be calculated easily and quickly using left shift operation.
Montgomery Residue: If $x \in \mathbb{Z}/N\mathbb{Z}$, then Montgomery Residue of $x$ is defined as $x{'} = (x.R) \, mod \, N$. Note that $0 \leq x{'} < N - 1$. It can also be proved that, Montgomery Residue is also a complete residue system, i.e., the mapping $(x.R) \, mod \, N$ is an one-one mapping from $\mathbb{Z}/N\mathbb{Z}$ onto itself.
Now, consider that we want to compute $z = (x.y)\, mod \, N$ for $x, y \in \mathbb{Z}/N\mathbb{Z}$. Then, $$z{'} = (z.R) \, mod \, N = (x.y)R \, mod \, N = {(x.R).(y.R)}.R^{-1}\, mod\, N = (x{'}.y{'}).R^{-1}\, mod \, N = Redc(x{'}.y{'})$$ Now, since $z{'} = (z.R) \, mod \, N$, so $$z = z{'}.R^{-1}\, mod \, N = Redc(z{'}) = Redc(Redc(x^{'}.y^{'}))$$This shows that, $z$ can be computed by invoking $Redc$ algorithm twice successively and $Redc$ algorithm does not require any trial divison to be done since $R$ is some power of $2$. Any division by $R$ can be easily implemented by using left shift operations. But, we need to compute $x{'} = (x.R) \, mod \, N$. Note that, $x{'} = (x.R) \, mod \, N = (x.R^{2}).R^{-1}\, mod \, N = Redc(x.R^{2})$. So, we are also not required to divide by modulus $N$, even to compute $x{'}$ and $y{'}$. But, we are still required to compute $R^{2} \, mod \, N$ in order to compute $x{'} = Redc(x.R^{2})$. But, since $R$ is chosen before hand as a power of $2$, computing $R^{2} \, mod \, N$ is to be done only once and therefore does not add much to the whole computation. Let us now take an example to understand the whole procedure.
Example: Let, $N = 11$, $R = 16$. Then, $N^{-1} = -13$ and $R^{-1} = 9$. Let us take, $x = 6$, $y = 10$ and we are interested to find $z = (x.y) \, mod \, N = 5$ through this montgomery reduction process.
We first compute, $R^{2} \, mod \, N = 256 \, mod \, 11 = 3$ and then $$x{'} = Redc(x.R^{2}) = Redc(18) = \frac{18 + (18.13 \,\, mod \, 16).11}{16} = 8,$$ and $$y{'} = Redc(y.R^{2}) = Redc(30) = \frac{30 + (30.13 \,\, mod \, 16).11}{16} = 6.$$ We will first compute, $Redc(x{'}.y{'}) = Redc(48)$. $Redc(48) = \frac{48 + (48.13 \,\, mod \, 16).11}{16} = 3,$ and therefore $$z = Redc(3) = \frac{3 + (3.13 \,\, mod \, 16).11}{16} = 5.$$ And this proves the efficiency of Montgomery Reduction Algorithm.
References:
[1] Peter L. Montgomery, Modular Multiplication Without Trial Division. In Mathematics of Computation, Volume 44, No 170, pp. 519 - 521, April - 1985.