[Coin-ipopt] Convergence and BFGS Hessian computation (fwd)

Andreas Waechter andreasw at watson.ibm.com
Wed Oct 11 15:33:14 EDT 2006


Hi Sylvain,

From what you write, it seems that you are using numerical computations to 
compute your constraint values and/or derivatives.  This can easily 
explain why the primal infeasibility has so much trouble going down. 
(When you scale down your constraints by 1e-3, as you did, you of course 
also make the constraint violation smaller by that factor - a maximum 
gradient of 300 is fine, 0.3 might be a bit small.)
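
To see the effect in isolation, here is a toy constraint (made up, not 
your model): multiplying it by a factor s shrinks its violation and all 
of its gradient entries by the same s, so the reported primal 
infeasibility before and after scaling is not comparable:

    #include <cstdio>
    #include <cmath>

    // Toy constraint c(x) = x1^2 + x2 - 10 (made up, not the robot model),
    // evaluated at an infeasible point.  Scaling c by s scales the
    // violation and every gradient entry by the same s.
    int main() {
      double x1 = 3.5, x2 = 1.0;
      double c  = x1 * x1 + x2 - 10.0;   // constraint value (violation)
      double g1 = 2.0 * x1, g2 = 1.0;    // gradient of c
      double s  = 1e-3;                  // scaling factor
      std::printf("unscaled: |c| = %g, grad = (%g, %g)\n",
                  std::fabs(c), g1, g2);
      std::printf("scaled:   |c| = %g, grad = (%g, %g)\n",
                  std::fabs(s * c), s * g1, s * g2);
      return 0;
    }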

AMPL would allow you to write down NLPs, if you have them in analytical 
form, i.e., all functions given as equations.  You can write them down 
pretty much as you would have them on paper, so it is easy to model.  And 
it would take care of evaluating the derivatives for you.  However, if 
you are using numerical procedures to compute constraint function values, 
this is difficult.
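
If you do keep numerically computed derivatives, it is worth 
cross-checking them against central differences yourself, in addition to 
Ipopt's derivative_test.  A minimal sketch with a made-up objective 
(replace f and grad_f with your own routines):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Toy objective and a hand-coded gradient; the loop compares each
    // gradient entry against a central difference.
    static double f(const std::vector<double>& x) {
      return x[0] * x[0] * x[1] + std::sin(x[1]);
    }

    static void grad_f(const std::vector<double>& x, std::vector<double>& g) {
      g[0] = 2.0 * x[0] * x[1];
      g[1] = x[0] * x[0] + std::cos(x[1]);
    }

    int main() {
      std::vector<double> x = {1.2, 0.7}, g(2);
      grad_f(x, g);
      const double h = 1e-6;  // balances truncation against round-off error
      for (std::size_t i = 0; i < x.size(); ++i) {
        std::vector<double> xp = x, xm = x;
        xp[i] += h; xm[i] -= h;
        const double fd = (f(xp) - f(xm)) / (2.0 * h);
        std::printf("i=%zu analytic=%.8g central-diff=%.8g rel.err=%.1e\n",
                    i, g[i], fd,
                    std::fabs(g[i] - fd) / std::max(1.0, std::fabs(fd)));
      }
      return 0;
    }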

In some cases one can avoid numerical procedures like integration by 
performing the discretization in the NLP itself (some people call this the 
"simultaneous approach").  This increases the number of variables, but the 
problems are usually very sparse and tend to converge more nicely.
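
As a sketch of the structure this produces (toy one-dimensional dynamics 
and cost, hypothetical names): the states and controls at the grid points 
become NLP variables, the integral objective becomes a trapezoidal sum, 
and the dynamics turn into "defect" equality constraints that each couple 
only two neighboring grid points - which is exactly where the sparsity 
comes from:

    #include <cstdio>
    #include <vector>

    // Direct-transcription sketch.  The decision vector
    // z = (q_0..q_N, u_0..u_N) holds states and controls at the grid
    // points; the dynamics q' = f(q,u) become N trapezoidal defect
    // equality constraints.
    static double f(double q, double u) { return -q + u; }  // toy dynamics
    static double L(double q, double u) { return u * u; }   // running cost

    static double eval(const std::vector<double>& z, int N, double h,
                       std::vector<double>& defect) {
      const double* q = &z[0];
      const double* u = &z[N + 1];
      double obj = 0.0;
      defect.assign(N, 0.0);
      for (int i = 0; i < N; ++i) {
        // trapezoidal quadrature of the integral objective
        obj += 0.5 * h * (L(q[i], u[i]) + L(q[i + 1], u[i + 1]));
        // defect_i = q_{i+1} - q_i - h/2 * (f_i + f_{i+1}) = 0
        defect[i] = q[i + 1] - q[i]
                    - 0.5 * h * (f(q[i], u[i]) + f(q[i + 1], u[i + 1]));
      }
      return obj;
    }

    int main() {
      const int N = 4;       // very coarse grid, just to show the structure
      const double h = 0.25;
      std::vector<double> z(2 * (N + 1), 0.1), defect;
      const double obj = eval(z, N, h, defect);
      std::printf("objective = %g, defect[0] = %g\n", obj, defect[0]);
      return 0;
    }

In an Ipopt TNLP, obj would go into eval_f, the defects into eval_g, and 
the resulting banded Jacobian into eval_jac_g.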

By the way, I also discovered a bug in my implementation of the L-BFGS 
approximation, which I have now fixed and put into the 3.2 stable release 
of Ipopt.  You can get it via subversion from the URL

https://projects.coin-or.org/svn/Ipopt/stable/3.2

This might make the convergence a bit better.  However, slowly converging 
dual infeasibility remains a real issue, and at this point I only have 
some simple heuristics to offer in the code...

Regards,

Andreas

On Mon, 9 Oct 2006, Andreas Waechter wrote:

> Forwarded at the request of the sender - I deleted the attachment though (too large)
>
> ---------- Forwarded message ----------
> Date: Sat, 07 Oct 2006 16:15:18 +0900
> From: Sylvain Miossec <SylvainMiossec at yahoo.fr>
> To: Carl Damon Laird <claird at andrew.cmu.edu>,
>    Andreas Waechter <andreasw at watson.ibm.com>
> Subject: Re: [Coin-ipopt] Convergence and BFGS Hessian computation
>
> Hi Carl, Hi Andreas,
>
> thank you very much for your detailed responses.
>
> I cannot use my work e-mail address, so I am just sending this mail to you.
> If you think it is interesting, could you forward it to the coin-ipopt
> mailing list?
>
> I haven't had time to test everything you proposed. I will not be able to
> continue working on it next week, so I am sending you my current results.
>
> I increased the maximum number of iterations, multiplied the objective by
> 100, set the L-BFGS history to 20, and reset the tolerances to their
> default values. I also looked at the values of the Jacobian of the
> constraints. The maximum values were about 300, so I scaled the
> optimization parameters to obtain maximum values of about 0.3.
> Those modifications made the objective go down a bit faster. Scaling the
> optimization variables decreased the primal infeasibility to about 1e-3.
> But I still need a lot of iterations (almost 2000) for the criterion to
> reach the value obtained after 300 iterations with an L-BFGS history of
> 131.
>
> I also set 'recalc_y_feas_tol' to 1, and the dual infeasibility decreased
> to almost 1e-2.
>
> I am sending you a run that gave a good result.
>
> If I increase the stopping tolerances, I think I can get the optimization
> to finish earlier, but it would not be at the minimum of the criterion,
> since the lowest infeasibility values are not well correlated with the
> lowest criterion values (I can reach the lowest infeasibilities while the
> criterion is still decreasing afterwards).
>
> So it seems to me that convergence becomes very slow as we get closer to
> the optimum. Perhaps my computations are not precise enough. I think I
> still have to look at scaling the optimization variables independently,
> and also the constraints. And I don't know what counts as very nonlinear
> constraints, but I suppose mine are quite nonlinear.
>
> Some more details on the way I do the motion optimization. I use a method
> based on the inverse model of the robot. That is, I parameterize the
> trajectory of the robot and compute the control needed. My criterion is
> the energy consumption of the robot, which is an integral. I use the
> trapezoidal method for the moment, so my gradient computation should not
> be approximated (exact up to numerical precision). The dynamic model of
> the robot involves many recursive computations, which perhaps increase
> the numerical errors.
> I don't know much about AMPL and automatic differentiation, but I think I
> cannot use them with my program as it is. I could derive analytic
> expressions, but not easily.
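>
> (To illustrate why I say the gradient should be exact: the trapezoidal
> cost is just a weighted sum of the sampled integrand, so its derivative
> with respect to each sample is a fixed weight; numerical error can only
> enter through the chain rule into the recursive model. A toy sketch with
> made-up numbers:)
>
>     #include <cstdio>
>     #include <vector>
>
>     // E = sum_i h/2 * (e_i + e_{i+1}) is a plain weighted sum of the
>     // sampled integrand e_i, so dE/de_i is a constant weight: h/2 at
>     // the endpoints, h inside.
>     int main() {
>       const double h = 0.01;
>       std::vector<double> e = {2.0, 2.5, 3.1, 2.8, 2.2};  // made-up samples
>       const std::size_t n = e.size();
>       double E = 0.0;
>       for (std::size_t i = 0; i + 1 < n; ++i)
>         E += 0.5 * h * (e[i] + e[i + 1]);
>       std::printf("E = %g\n", E);
>       for (std::size_t i = 0; i < n; ++i)
>         std::printf("dE/de[%zu] = %g\n", i,
>                     (i == 0 || i == n - 1) ? 0.5 * h : h);
>       return 0;
>     }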
>
> Best regards
>
> Sylvain Miossec
>
>
> Carl Damon Laird wrote:
>> Hi Sylvain Miossec, glad to hear IPOPT has been working for you. Sorry for 
>> the long response, but there are a few things to discuss here.
>> 
>> As for the convergence criteria, you are using:
>>> tol : 1e-3
>>> acceptable_tol : 3e-3
>>> expect_infeasible_problem : yes
>>> dual_inf_tol : 1e-6
>>> compl_inf_tol : 1e-6
>>> acceptable_dual_inf_tol : 3e-3
>>> acceptable_compl_inf_tol : 3e-3
>> 
>> You have correctly seen that IPOPT has two levels of convergence criteria, 
>> standard and acceptable. Even if the standard tolerance criteria are not 
>> satisfied, if the acceptable criteria are satisfied for a certain number of 
>> iterations, then IPOPT terminates with an "acceptable" solution.
>> 
>> Is your problem converging with an acceptable solution?
>> 
>> Also, the parameters tol and acceptable_tol are relative tolerances (they 
>> are scaled as described in the paper), while dual_inf_tol, compl_inf_tol, 
>> and constr_viol_tol (as well as their acceptable versions) are all absolute 
>> tolerances. IPOPT only converges when both the relative and the absolute 
>> criteria are satisfied. I notice that you have actually tightened the 
>> absolute tolerances (to 1e-6 from 1e-4). If you have a problem that is not 
>> ideally scaled, the absolute tolerances will be difficult to achieve. In 
>> some cases, when problems have had poor scaling, we have greatly loosened 
>> the absolute tolerances to enable convergence. Of course, changing the 
>> problem scaling would be better.
>> 
>> You might try increasing these absolute tolerances. Also, is there any 
>> reason why you have changed the absolute tolerance for dual infeasibility 
>> and complementarity, but not for constraint violation (constr_viol_tol)? I 
>> would try increasing all three of these and seeing if that helps.
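>>
>> For instance, with the C++ interface the absolute tolerances are just
>> options on the IpoptApplication (the values below are placeholders, not
>> a recommendation; pick ones that match your scaling):
>>
>>     #include "IpIpoptApplication.hpp"
>>
>>     using namespace Ipopt;
>>
>>     int main() {
>>       SmartPtr<IpoptApplication> app = new IpoptApplication();
>>       app->Options()->SetNumericValue("tol", 1e-3);             // relative
>>       app->Options()->SetNumericValue("dual_inf_tol", 1e-2);    // absolute
>>       app->Options()->SetNumericValue("compl_inf_tol", 1e-2);   // absolute
>>       app->Options()->SetNumericValue("constr_viol_tol", 1e-2); // absolute
>>       app->Initialize();
>>       // ... then app->OptimizeTNLP(your_nlp) as usual ...
>>       return 0;
>>     }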
>> 
>> One final comment. If the problem is poorly scaled or nearly singular, the 
>> multipliers may be converging to very large values. This will, in turn, 
>> cause the dual infeasibility to be large. Fixing the scaling of the problem 
>> can help greatly.
>> 
>> As far as L-BFGS versus full BFGS goes, I don't have too much to say, 
>> except that the C++ version of IPOPT does not have a reduced-space BFGS 
>> option like the Fortran version. Therefore, I think L-BFGS is your only 
>> choice with C++ IPOPT right now.
>> 
>> Hope this helps and let us know if you have further difficulties,
>> 
>> Cheers,
>> 
>> Carl.
>> 
>> 
>> On Thu, 5 Oct 2006, Sylvain Miossec wrote:
>> 
>>> Hi all,
>>> 
>>> I have been using IPOPT for a year now and have found it quite fast for
>>> the type of problems I solve (robot motion optimization). Thanks a lot
>>> for the good job.
>>> 
>>> But I have one problem: the proper convergence of the optimization. For
>>> now I stop the optimization by reaching the maximum number of iterations,
>>> and then check that the criterion does not change much during the last
>>> iterations and that the constraints are well satisfied. The motions I
>>> obtain for the robot also look optimal.
>>> 
>>> Would anyone have an idea why I don't obtain proper convergence?
>>> 
>>> I provide two files (in a zip archive) with some optimization runs for 
>>> two different BFGS history lengths.
>>> 
>>> Since the Hessian computation is very complicated to program in my case,
>>> I use the limited-memory BFGS approximation. I tried a history of 6 and a
>>> history of 131 (the size of my problem). The BFGS computation is slower
>>> with a history of 131, but the convergence is better. I have another
>>> question then: for my problem size, would it be better to use a full
>>> BFGS approximation (not the limited-memory one)?
>>> 
>>> The parameters I used for IPOPT are the following:
>>> tol : 1e-3
>>> acceptable_tol : 3e-3
>>> max_iter : 300/500
>>> expect_infeasible_problem : yes
>>> hessian_approximation : limited-memory
>>> limited_memory_max_history : 131/6
>>> dual_inf_tol : 1e-6
>>> compl_inf_tol : 1e-6
>>> acceptable_dual_inf_tol : 3e-3
>>> acceptable_compl_inf_tol : 3e-3
>>> check_derivatives_for_naninf : yes
>>> derivative_test : first-order
>>> alpha_for_y : full
>>> 
>>> During the optimization, the dual infeasibility is not really decreasing
>>> (and not reaching the stopping tolerance) while the criterion seems to
>>> converge. Is that normal? Could it be due to a scaling problem (I scaled
>>> the problem myself by trial and error)? Could it be due to the Hessian
>>> approximation? Perhaps that is another reason to use the full BFGS
>>> approximation instead of the limited-memory one. Or perhaps I should try
>>> the exact Hessian computation? (But I would like to be sure it would
>>> solve my problem, since it will take a lot of time to implement.)
>>> 
>>> Any help on this point would be greatly appreciated.
>>> 
>>> Best regards
>>> 
>>> Sylvain Miossec
>>> 
>>> 
>> 
> _______________________________________________
> Coin-ipopt mailing list
> Coin-ipopt at list.coin-or.org
> http://list.coin-or.org/mailman/listinfo/coin-ipopt
>


