[Ipopt] gradient scaling and stopping criteria

Mon Nov 22 18:41:21 EST 2010

Dear Andreas,

thank you very much for this extensive reply.
I see your point, but please allow me to explain
the situation from the user's point of view.

| The reasoning behind using the unscaled values for the individual
| error components is that a user might want to specify explicit thresholds
| for the problem as it is posed. Specifically, the gradient-based scaling
| mechanism is activated by default, and in a sense not immediately
| transparent to the user, and (s)he might be confused if the tolerances
| of the individual components are not matching with the posed problem.

It is very hard to relate the unscaled values of these two tolerances
to the problem as we pose it:

| dual_inf_tol
| compl_inf_tol

We do not know how dual infeasibility and/or complementarity
relate to any variable of our forward problem.

As regards the maximum absolute constraint violation, I guess this
could be set unscaled.

| constr_viol_tol

All of these thresholds, however, would make more sense if you adopted
the relative max norm instead of the absolute max norm. Everybody could
then easily set by what percentage (%) his or her constraints may be
violated. The same holds for dual_inf_tol and compl_inf_tol.
Moreover, with the relative norm the scaling question is
simplified, since the threshold would mean the same for both scaled and
unscaled values.
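
To illustrate the scaling argument, here is a minimal Python sketch. It
assumes equality constraints g(x) = b with nonzero right-hand sides and
measures the relative violation componentwise as |g_i - b_i| / |b_i|; this
normalization is only one possible choice and is an assumption made here.
The function names, the numbers and the row scaling factors are all
hypothetical and not part of Ipopt.

```python
import numpy as np

def abs_max_violation(g, b):
    """Absolute max norm of the constraint residual g - b."""
    return float(np.max(np.abs(g - b)))

def rel_max_violation(g, b):
    """Largest componentwise relative violation |g_i - b_i| / |b_i|.

    Assumes every b_i is nonzero (illustrative choice of normalization).
    """
    if np.any(b == 0.0):
        raise ValueError("relative violation needs nonzero right-hand sides")
    return float(np.max(np.abs(g - b) / np.abs(b)))

# Constraint values at some iterate and their targets (made-up numbers).
g = np.array([100.5, 199.0, 302.0])
b = np.array([100.0, 200.0, 300.0])

# One scaling factor per constraint row (made-up numbers).
d = np.array([1e-3, 1.0, 10.0])

abs_unscaled = abs_max_violation(g, b)          # depends on the scaling
abs_scaled = abs_max_violation(d * g, d * b)
rel_unscaled = rel_max_violation(g, b)          # unaffected by the scaling
rel_scaled = rel_max_violation(d * g, d * b)

print(abs_unscaled, abs_scaled)
print(rel_unscaled, rel_scaled)
```

With this particular definition the relative measure is the same before and
after the row scaling, while the absolute one changes by a factor of ten. A
relative norm taken over the whole vector would only be invariant under a
single common scaling factor.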

Now, as regards PDE-constrained optimization, we really have no clue
how to set these thresholds. We could only speculate that, since
our discretization error is less than a constant times the maximum
of the time step and the maximum edge length

||u - u_h||_2 <= C(|grad(u)|) max (h, \delta t)

(it is not clear what the value of the constant is,
it is usually more than 1)

one would expect that this could be an estimate of these
relative thresholds. 
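
As a worked instance of this estimate, a small Python sketch follows. The
constant C is unknown in practice, so the value used here, as well as the
mesh size and time step, are made-up numbers, and the function name is
hypothetical.

```python
def discretization_tolerance(h, dt, C=1.0):
    """Heuristic threshold C * max(h, dt) taken from the error bound above.

    h  : maximum edge length of the mesh
    dt : time step
    C  : constant of the error bound (unknown in general, assumed here)
    """
    if h <= 0.0 or dt <= 0.0 or C <= 0.0:
        raise ValueError("h, dt and C must be positive")
    return C * max(h, dt)

# Made-up discretization parameters and an assumed constant C = 2.
tol_estimate = discretization_tolerance(h=0.02, dt=0.005, C=2.0)
print(tol_estimate)
```

Asking the optimizer for a relative accuracy much tighter than such an
estimate would presumably resolve the problem beyond what the
discretization itself can deliver.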

Having said all that, it is up to you to decide whether or
not it makes sense to use relative (%) thresholds instead
of absolute ones.

Best wishes,
Drosos.

----- Original Message -----
| Hi Drosos,
| 
| > I noticed that you are using the unscaled values
| > for the stopping criteria (tol excluded), although
| > I have specified automatic gradient-based
| > scaling. One would expect that when the problem is
| > scaled, then the stopping criteria would correspond
| > to the scaled values for infeasibility, complementarity etc...
| >
| > Why doesn't this happen, and how could I enforce it?
| > It would be nice if we had a flag so that the scaled
| > values were considered for these thresholds if the user
| > has specified gradient scaling or something similar.
| > If the problem is not scaled, then it makes sense to
| > consider the unscaled values. Could you please shed
| > some light here?
| 
| Your observation is correct, Ipopt looks at the unscaled values for
| the termination criteria specified by the options
| 
| dual_inf_tol
| constr_viol_tol
| compl_inf_tol
| 
| On the other hand, the overall tolerance (tol) is the default stopping
| criterion, and it considers the scaled problem (and has some
| additional scaling, e.g., based on the size of the multipliers, as
| described in the Ipopt implementation paper).
| 

| Also,
| in practice, the model is often written in a way that has real-life
| (e.g., physical) meaning, and a certain constraint violation in the
| original, unscaled problem usually means more than a scaled quantity.
| 
| Also, the purpose of the scaling is mainly to aid the performance of
| Ipopt, since it is used to translate the original (real-life)
| formulation into a form that might be easier to solve by Ipopt. We
| always assumed that the original model is what the user actually wants
| to get a solution for.
| 
| > Moreover, if I specify hessian approximation to be exact,
| > it is not clear what happens? I would expect, that if I
| > do not provide any hessian information and yet I desire
| > exact hessian approximation, then the algorithm would
| > use finite differences to calculate the columns of the
| > hessian. Could you please explain how could I have IPOPT
| > using finite differences to approximate the Hessian
| > whenever it needs it?
| 
| As Stefan already pointed out, currently Ipopt does not do any finite
| difference approximation of the Hessian. One option is to use
| automatic differentiation, such as the COIN-OR project ADOL-C, which
| also has an interface to Ipopt, see, e.g.,
| 
| https://projects.coin-or.org/ADOL-C/browser/releases/2.1.12/ADOL-C/examples/additional_examples/ipopt/LuksanVlcek1_sparse
| 
| Regards,
| 
| Andreas
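
Since Ipopt does not build a finite-difference Hessian itself, the
approximation asked about in the quoted message would have to be computed
on the user side. A minimal Python sketch, assuming an analytic gradient
is available: each Hessian column is obtained from a forward difference of
the gradient, and the result is symmetrized. The function name fd_hessian,
the step size and the test function are illustrative only and not part of
Ipopt.

```python
import numpy as np

def fd_hessian(grad, x, step=1e-6):
    """Approximate the Hessian by forward differences of the gradient.

    grad : callable returning the gradient at a point
    x    : point at which the Hessian is approximated
    step : finite-difference step (illustrative default)
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    g0 = np.asarray(grad(x), dtype=float)
    H = np.empty((n, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += step                       # perturb the j-th variable only
        H[:, j] = (np.asarray(grad(xp), dtype=float) - g0) / step
    return 0.5 * (H + H.T)                  # enforce symmetry

# Test function f(x) = x0^2 * x1 + sin(x1) with its analytic gradient.
def grad_f(x):
    return np.array([2.0 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])

H = fd_hessian(grad_f, [1.0, 2.0])
H_exact = np.array([[4.0, 2.0], [2.0, -np.sin(2.0)]])
print(np.max(np.abs(H - H_exact)))
```

This costs one extra gradient evaluation per variable and is only accurate
to roughly the order of the step size. For a Lagrangian the same differencing
would have to be applied to the gradient of the Lagrangian rather than of
the objective alone.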