First off, if it was to be parenthesized, it would go:
a ^= (b ^= (a ^= b));
  3     2     1
This just helps visualize the order that things happen in.

Lets take

a = 1001 1011
b = 1100 0110
The first XOR modifies a:
a  = 1001 1011
b  = 1100 0110
a' = 0101 1101
The second XOR then takes a' and b, storing the result into b':
b  = 1100 0110
a' = 0101 1101
b' = 1001 1011
Note that b' now is the same value as the original a. The last XOR then operates on a' and b', storing into a":
a' = 0101 1101
b' = 1001 1011
a" = 1100 0110
The final results stored (a" and b') are the same as the original a and b, except swapped.
Now, the question of why does this work?

Take two values:

c = 1100
d = 1010

e = (c xor d)
e = 0110
(d xor e) == c
(c xor e) == d
In the above example, the 'e' equivalent was stored into a'. The second operation then extracts a from the xor'ed value and stores it into b'. Then operating on the 'e' value again, this time with the original a value (stored in b'), the b value is extracted and stored into a".
As a note, many compilers will complain if you do this math on pointers. Furthermore, this only works with same typed values: swapping a float with an integer is a bad idea.
Why would you use this?

In days of old computers had very few registers and not much memory either. Without the XOR Swap, working from the 8088 processor, to swap 2 memory locations one would have to store the register somewhere in memory, load the meory desired to be swaped into the register, store that memory somewhere, load the 2nd piece of meory into the reigster, store that at the first place, load the saved memory into the reigster, store that at the 2nd spot, and load the register with the original memory - 8 steps and may cycles.

On the other hand, doing a register, memory xor took about the same amount of time as doing a load from memory into the register. Thus the inner 6 steps can be shortened to 4 steps. Load the register with the value, xor it with the 2nd value (destination in register), xor or that with the first value (destination in memory), and xor the register with the 2nd value (destination in memory). This resulted in saving two instructions (actualy quite valueable) and improved performance by about 33% for that operation. This was a valueable operation.

Today, this is more for show and understanding how the compiler works than practical useage. Modern processors have dozens of registers and swaping two values likely has two or three registers that are not being used (if this is even necessary - the values may be in the registers themselves) at which point it is simply load, load, store store.