[Coin-discuss] PPC vs. x86

Wed Jul 5 10:33:35 EDT 2006

Hi Martin,

most probably, the reason is that the two processors perform
floating point operations slightly differently. That means that the
calculations in the LP solver can differ, and so can the LP
solutions. Since the optimal solution to the LP relaxation need not
be unique (there may exist an optimal face of dimension > 0), which
of the optimal vertices is returned can depend on the internal
calculations.
Then, with a different optimal basis for the root LP, everything
afterwards can be totally different. For example, a different basis
yields different Gomory cuts. A different primal solution vector leads
to a different branching decision.
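
To make the non-uniqueness concrete, here is a toy sketch in C (my own
illustration, not code from Clp or Cbc). It maximizes c1*x + c2*y over
the three vertices of {x >= 0, y >= 0, x + y <= 1}. With c = (1, 1)
the vertices (1,0) and (0,1) tie, and a difference of one unit in the
last place of a single coefficient decides which one comes back. The
names best_vertex and best_vertex_perturbed are hypothetical, and the
enumeration order merely stands in for a real pivoting rule.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* A vertex of the feasible region {x >= 0, y >= 0, x + y <= 1}. */
typedef struct { double x, y; } Vertex;

static const Vertex VERTICES[] = { {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0} };
#define NUM_VERTICES (sizeof VERTICES / sizeof VERTICES[0])

/* Return the index of the first vertex maximizing c1*x + c2*y.
   Ties are broken by enumeration order. */
size_t best_vertex(double c1, double c2)
{
    size_t best = 0;
    double best_value = c1 * VERTICES[0].x + c2 * VERTICES[0].y;
    for (size_t i = 1; i < NUM_VERTICES; ++i) {
        double value = c1 * VERTICES[i].x + c2 * VERTICES[i].y;
        if (value > best_value) {   /* strict: the first maximizer wins */
            best = i;
            best_value = value;
        }
    }
    assert(best < NUM_VERTICES);
    return best;
}

/* Same problem with c2 larger by one unit in the last place. */
size_t best_vertex_perturbed(void)
{
    return best_vertex(1.0, 1.0 + DBL_EPSILON);
}
```

best_vertex(1.0, 1.0) selects vertex (1,0), while
best_vertex_perturbed() selects (0,1), although the two objective
values differ only in the last bit.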

The x87 floating point registers of the Intel processors are 80 bits
wide (extended precision), and values are only rounded to 64 bits
when they are stored as doubles in memory. Therefore, one evaluation
of "r := x + y + z;" is not necessarily equal to another evaluation
of the very same "r := x + y + z;" ! It depends on whether the
compiler performed the whole calculation in registers (i.e., in 80
bit precision, rounding to 64 bits afterwards) or whether an
intermediate result was stored in memory, thereby rounding the
intermediate value to 64 bits.
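
The effect can be imitated portably. The following is a minimal
sketch under my own assumptions: long double stands in for the 80 bit
register, a volatile double stands in for a value stored in memory,
and the function names and the DEMO data are made up for the
illustration. On platforms where long double is no wider than double,
both functions return the same value.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* x = 1, y = z = half a unit in the last place of 1.0. */
static const double DEMO[3] = { 1.0, DBL_EPSILON / 2, DBL_EPSILON / 2 };

/* Sum with every partial result forced through a 64 bit double.
   The volatile qualifier makes the compiler store and reload the
   accumulator, so no wider register value survives a step. */
double sum_stored_each_step(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc = acc + v[i];
    return acc;
}

/* Sum with the accumulator kept in long double, rounded to double
   only once at the very end. */
double sum_rounded_once(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    long double acc = 0.0L;
    for (size_t i = 0; i < n; ++i)
        acc += v[i];
    return (double)acc;
}
```

For DEMO, sum_stored_each_step returns exactly 1.0, because each tiny
addend is rounded away on its own. With a wider long double,
sum_rounded_once returns 1.0 + DBL_EPSILON, because the two addends
accumulate before the single final rounding.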

This can be devastating when it comes to debugging. Assume your code
compiled with the "-O3" optimization flag leads to a segmentation
fault. Now you want to debug the situation using "-O0", since you want
to disable inlining and so on for easier debugging. Then, the miracle
occurs: the non-optimized code does not lead to the segmentation
fault. The reason is that without inlining, certain intermediate
results are stored on the stack (i.e., in memory) to be passed to the
subfunction. This rounds the 80 bit register value to 64 bits. In
the optimized mode with inlining, the subfunction body is expanded in
place, so the stack is not needed and the value can be used directly
from the 80 bit register.
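
A typical symptom is a value that no longer compares equal to itself
once it has taken a detour through memory. Here is a small sketch of
that situation, again using long double as a stand-in for the 80 bit
register and a volatile double as the stack slot. The function names
are hypothetical, and whether a real build shows the effect depends
on compiler, flags, and target.

```c
#include <assert.h>
#include <float.h>

/* 1 if long double carries more mantissa bits than double here. */
int has_wider_long_double(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Returns 1 if the quotient a / b, held in long double, still
   compares equal to itself after a round trip through a 64 bit
   double in memory, and 0 otherwise. */
int quotient_survives_spill(double a, double b)
{
    assert(b != 0.0);
    long double in_register = (long double)a / (long double)b;
    volatile double spilled = (double)in_register;  /* forced store */
    return (long double)spilled == in_register;
}
```

quotient_survives_spill(1.0, 2.0) is always 1, since 0.5 is exact in
every format. quotient_survives_spill(1.0, 3.0) is 0 whenever long
double is wider than double, which is exactly the kind of comparison
that can send optimized and non-optimized code down different
branches.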

There are compiler flags that force the generated code to conform
strictly to the IEEE floating point specification, i.e. rounding to
64 bits after every calculation. You should check whether this gives
the same results as on the PPC. I am not so sure, however, about how
floating point calculations are performed on PPCs...
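
Before experimenting with flags, one can ask the compiler what it
does by default. C99 provides the FLT_EVAL_METHOD macro in <float.h>
for this purpose. The sketch below is my own and its function names
are hypothetical. As examples of such flags, GCC offers
"-ffloat-store" and, on x86, "-mfpmath=sse -msse2"; whether they are
available and sufficient for your compiler version is something I
would verify rather than assume.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Mantissa bits used for intermediate results of double arithmetic
   under a given FLT_EVAL_METHOD value, or -1 if the method is
   indeterminable or implementation-defined. */
int double_intermediate_bits(int method)
{
    switch (method) {
    case 0:  /* each operation is evaluated in its own type */
    case 1:  /* float and double are evaluated as double */
        return DBL_MANT_DIG;
    case 2:  /* everything is evaluated as long double */
        return LDBL_MANT_DIG;
    default:
        return -1;
    }
}

/* Print what this particular build does. */
void print_eval_method(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "FLT_EVAL_METHOD = %d, intermediate mantissa bits = %d\n",
            (int)FLT_EVAL_METHOD,
            double_intermediate_bits(FLT_EVAL_METHOD));
}
```

Calling print_eval_method(stdout) in the builds on both machines
shows whether they evaluate double expressions at the same precision.
A value of 2 on the Intel build would point to the 80 bit behavior
described above.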

Best,   Tobias

Martin Mundschenk wrote:
> Hi!
> 
> I just compiled Cbc on an Intel Xeon running Debian Linux  as well
> as on an Apple PPC G5 running OSX:
> 
> Coin Cbc and Clp Solver version 1.01.00, build Jul  3 2006
> 
> The Intel runs at 3 GHz and the PPC at 2 GHz. I compared the
> performance of both machines on the problem nw04 from the miplib 
> (http://miplib.zib.de/)
> 
> The PPC comes to this result:
> 
> Result - Finished objective 16862 after 0 nodes and 861 iterations
> - took 102.11 seconds
> 
> and the Intel to this one:
> 
> Result - Finished objective 16862 after 484 nodes and 4600
> iterations - took 73.22 seconds
> 
> Why does the identical solver use different numbers of nodes and 
> iterations on the identical problem but on different architectures?
> 
> 
> Regards, Martin

-- 
Tobias Achterberg          Konrad-Zuse-Zentrum fuer
                           Informationstechnik Berlin
Tel: +49 (0)30 84185-301   Takustr. 7
Email: achterberg at zib.de   D-14195 Berlin, Germany