[Cops] Questions on ALPS

Khoa Vo khoa.vo at informatik.uni-heidelberg.de
Mon Jan 28 16:20:52 EST 2008


Hi Yan,

I tried that a couple of days ago; unfortunately it didn't work either.
Most processes are still idle, in the sleeping state, after running for
6-7 hours.

Do you know how to print or dump the SubTreePool of the master and
hubs when the program gets into that state, i.e. after running for about
7 hours, when most of the workers have switched to "sleeping"?

Khoa

-----Original Message-----
From: Yan Xu [mailto:Yan.Xu at sas.com] 
Sent: Monday, January 28, 2008 3:37 PM
To: Khoa Vo
Cc: cops at list.coin-or.org
Subject: RE: Questions on ALPS

Khoa,

I haven't noticed idling being such a big issue. I guess "case #2" looks
like the reason. Can you set

Alps_staticBalanceScheme  0
Alps_masterInitNodeNum 10000
Alps_hubInitNodeNum 20000
Alps_unitWorkNodes 300

or play around with these parameters to see if it helps.
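
One way to "play around" systematically is to generate a small grid of
settings and run each one. The sketch below is only an illustration: the
values 10000, 20000 and 300 come from this message, while 1000, 2000 and
100 are hypothetical alternatives picked for the example, and the
"name value" layout simply mirrors how the settings are written above.

```python
from itertools import product

# 10000 / 20000 / 300 are the values suggested in the thread;
# 1000 / 2000 / 100 are hypothetical alternatives for illustration only.
GRID = {
    "Alps_masterInitNodeNum": [1000, 10000],
    "Alps_hubInitNodeNum": [2000, 20000],
    "Alps_unitWorkNodes": [100, 300],
}
FIXED = {"Alps_staticBalanceScheme": 0}


def parameter_sets(grid, fixed):
    """Yield one dict of settings per combination of grid values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        settings = dict(fixed)
        settings.update(zip(names, values))
        yield settings


def render(settings):
    """Format settings as 'name value' lines, as written in the thread."""
    return "\n".join(f"{k} {v}" for k, v in settings.items()) + "\n"


if __name__ == "__main__":
    for i, settings in enumerate(parameter_sets(GRID, FIXED)):
        print(f"# combination {i}")
        print(render(settings))
```

Each printed block can be pasted into a parameter file for one run, so
that the idle behaviour can be compared across combinations.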

If using one hub, the hub is also the master.


Best,
Yan



-----Original Message-----
From: Khoa Vo [mailto:khoa.vo at informatik.uni-heidelberg.de]
Sent: Monday, January 28, 2008 9:20 AM
To: Yan Xu
Subject: FW: Questions on ALPS

Hi Yan,

I'm not sure if you are aware of this issue, so I am sending this email
again. With your expertise, what do you think is the reason that most of
the processes are sleeping after running for about 6-7 hours? I left all
parameters at their defaults, except that Alps_staticBalanceScheme is
set to 0.

Thanks,
Khoa

-----Original Message-----
From: Khoa Vo [mailto:khoa.vo at informatik.uni-heidelberg.de]
Sent: Tuesday, January 22, 2008 6:27 PM
To: 'Yan Xu'
Cc: 'cops at list.coin-or.org'
Subject: RE: Questions on ALPS

Thanks Yan.
So what is the reason for most processes sleeping after running for 5-7
hours, while the tasks are not finished (i.e. the search space is not
yet empty)? I notice that each process's memory usage is about 0.4%,
almost the same as at the beginning, when the processes were using 99%
CPU.


There are two possibilities:
1. A worker process gets stuck in one node (the sub search tree cannot
return a yes/no answer). However, in this case the CPU percentage of that
process should be 99%, so I don't think this is the case.
2. Worker processes don't have enough nodes to work on, so they are
waiting for work from the hub and master. In my case there is only one
hub process; is it the same as the master process?
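
The two cases can be told apart by sampling each worker twice: a stuck
worker keeps accumulating CPU time, while a waiting one does not. The
sketch below assumes a Linux machine with /proc available; the helper
names (parse_stat, classify, read_stat) are made up for this example and
are not part of ALPS.

```python
def parse_stat(line):
    """Parse one line in Linux /proc/<pid>/stat format.

    Returns (pid, command, state, cpu_ticks), with cpu_ticks = utime + stime.
    """
    # The command name is wrapped in parentheses and may contain spaces,
    # so split around the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    command = line[left + 1:right]
    rest = line[right + 1:].split()
    state = rest[0]  # field 3: R running, S sleeping, D disk wait, ...
    cpu_ticks = int(rest[11]) + int(rest[12])  # fields 14 (utime), 15 (stime)
    return pid, command, state, cpu_ticks


def classify(state, ticks_before, ticks_after):
    """Rough label for a worker sampled twice, some seconds apart."""
    if state == "R" or ticks_after > ticks_before:
        return "busy"     # consistent with case 1: still computing
    return "waiting"      # consistent with case 2: no CPU consumed


def read_stat(pid):
    """Read and parse the stat line of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_stat(fh.read())
```

Calling read_stat on each worker PID, waiting a few seconds, and calling
it again gives the two tick counts that classify expects.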

I think case #2 makes more sense. As the static balance scheme is root
initialization, does it mean that the master doesn't process and
allocate enough tasks? Or is this due to the dynamic balance scheme,
and possibly caused by hubReportPeriod, masterBalancePeriod or other
parameters? The two parameters receiverThreshold and donorThreshold
don't seem to play any role here, because most worker processes are idle.

What do you think?
Khoa

-----Original Message-----
From: Yan Xu [mailto:Yan.Xu at sas.com]
Sent: Tuesday, January 22, 2008 5:38 PM
To: Khoa Vo
Cc: cops at list.coin-or.org
Subject: RE: Questions on ALPS

Khoa,

Feel free to post email on the mailing list.

I looked at the logs. It looks to me that the node processing time is
relatively short (130670901 nodes were processed in 6760.53 CPU sec.).
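
To put the figures quoted above in perspective, they work out to roughly
19,000 nodes per CPU second, or about 52 microseconds per node. The
arithmetic below also estimates how long a batch of 300 nodes would
take, under the assumption (not confirmed in this thread) that node
times are roughly uniform and that Alps_unitWorkNodes is the number of
nodes handled in one unit of work.

```python
nodes = 130_670_901      # nodes processed, from the log figures quoted above
cpu_seconds = 6760.53    # CPU time, from the same figures
unit_work_nodes = 300    # assumed batch size (suggested Alps_unitWorkNodes)

nodes_per_second = nodes / cpu_seconds
seconds_per_node = cpu_seconds / nodes
# Assumes uniform node times, which real search trees may not have.
unit_seconds = unit_work_nodes * seconds_per_node

print(f"{nodes_per_second:,.0f} nodes per CPU second")
print(f"{seconds_per_node * 1e6:.1f} microseconds per node")
print(f"{unit_seconds * 1e3:.1f} ms per batch of {unit_work_nodes} nodes")
```

With nodes this cheap, a worker can finish a small batch in a few
milliseconds, so the number of nodes handed out at a time matters.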

You can probably try setting the parameters

Alps_masterInitNodeNum 10000
Alps_hubInitNodeNum 20000
Alps_unitWorkNodes 300

or just play around with these parameters to see if it helps.

For the instance of bandwidth 18, it doesn't look too bad.

Yan







