\documentclass [11pt]{article} \usepackage{ecltree} \usepackage{latexsym} \usepackage{epsf} \usepackage{psfig} \usepackage{epsfig} \usepackage{epic} \usepackage{amsmath} \usepackage{amssymb} \begin{document} \title{ Estimate Node Selection Methods and Diving in Depth } \author{Gabrielle Assunta Gr\"{u}n \thanks{School of Computing Science, Simon Fraser University, Burnaby BC V5A 1S6}} \date{\today} \maketitle \begin{sloppypar} \section{Implementation Notes} The implementation is based on the COIN release of mid February to late February, and thus, the entire COIN directory is attached. My modifications to the code are contained by The "////ADDG" comment both before and after the inserted lines of code wherever possible. The following files alone have been modified or added: SbbEnum.hpp, StringConv.h, SbbBranchActual.cpp, SbbBranchActual.hpp, SbbBranchBase.cpp, SbbBranchBase.hpp, SbbModel.cpp, SbbModel.hpp, SbbNode.cpp, SbbNode.hpp, sample2.cpp, runtimes, and Makefile (Makefile.sample2). Virtually all of the 26 node selection methods are described in the documentation below. I have not implemented the pseudocost estimates (at present, the choice for the pseudocost estimates is fixed, i.e. given two nodes x and y, x is always selected) or the various diving strategies yet. An additional enhancement would be to accommodate both maximization and minimization, and not just minimization. The main program in sample2.cpp is designed to solve a MIP problem using each node selection strategy one after the other. The one that takes the minimum amount of time should be evaluated. \section{Introduction } The basic definitions are taken from Section 1 of the depth paper. The key ones are repeated here. $\mathbf{x_{1'}}$ is a $n_{1} \times 1$ variable vector, such that $ \forall x_{j} \in \mathbf{x_{1'}}, x_{j} \in Z $ ir in other words are integer. $z_{L}$ is the best integer feasible solution so far and the lower bound of $z^{*}$. , the optimal solution to the IP problem. The value of the objective function is $z^{i}_{U}$ (the upper bound at node $i$). The last two definitions assume maximization. For minimization, the terms ``upper'' and ``lower'' (``U'' and ``L'') are interchanged. \section{Estimate Based Methods of Node Selection to be Tested} \emph{Most variable Restrictions}:Pick the node with the least sum of allowable variable values, AVV=$\sum_{j=1 to n}{k_{j}}$ where $k_{j}= u_{j}-l_{j}+1$, if $l_{j} \neq -\infty$, and $u_{j} \neq \infty$, $k_{j}= 0.5t +u_{j}$, if $l_{j} = -\infty$, and $u_{j} \neq \infty$, $k_{j}= 0.5t -l_{j} $, if $l_{j} \neq -\infty$, and $u_{j} =\infty$ and $k_{j}=t$, if $l_{j} = -\infty$, and $u_{j} = \infty$ , such that $l_{j} \leq $ $v_{j} \leq u_{j}$. The resulting polynomials are compared in a lexicographic manner, i.e. first the number of $t$s are compared before the constants are compared. When branching, the AVV of the child is equal to the AVV of the parent minus ($k_{i}$ of the parent minus $j_{ii}$ of the child) when we branch on the variable. Note that the AVV always decreases from the parent to the child. However, in practice, the AVV will be computed fresh for every node in the same pass that the sum of integer infeasibilities and the number of integer infeasibilities is computed, since variable bounds can change in ways other than by branching such as logical fixing. The basis for this node selection strategy is that for a tighter polyhedron, it is easier to prove infeasibility, or achieve feasibility. \emph{Sum of Integer Infeasibiluties: } $ s_{j}$ for node $j$=$\sum_{i \in \mathbf{x_{1'}}}{min\{f_{i}, 1- f_{i}}\}$ and $ f_{i}$ is the fractional part of variable $i$. \emph{Number of Integer Infeasibilities:}: $ a_{j}$ for node $j$ $=\sum_{i \in \mathbf{x_{1'}}}{is_int_{i} }$ where $ is_int_{i}$=1 if $v_{i} \in \mathbf{x_{1'}}$ and 9 otherwise. Note that $s_{j}$ and $a_{j}$ can give different results. Consider a candidate list with two nodes, one with $s_{1}=0.45, a_{1}=1$, and another with $s_{1}=0.2, a_{2}=2$. According to the sum pf integer infeasibilities, node 2 should be chosen, but based on the number of integer infeasibilities, node 1 should be chosen. Also, the two metrics may be combined as $measure\_int\_infeasibility_{j}=\alpha s_{j}+\beta a_{j}$ (where $\alpha$ and $\beta$ are to be discovered by our learning experiments). In addition, we could also use $s_{j}/a_{j}$ as a sort of average metric. An example of an instance when both $s_{j}$ and $a_{j}$ favor the selection if the same node and $s_{j}/a_{j}$ favors the selection of another node is $s_{1}/a_{1}$=4/10=0.4 and $s_{2}/a_{2}$ is 1/2-0.5. the only way that it is possible that $measure\_int\_infeasibility_{j} $ gives a different value that all of $s_{j}$, $a_{j}$ and $s_{j}/a_{j}$ is when at least one of $\alpha$ or $\beta$ is negative, e.g $s_{1}/a_{1}$=5/10=0.5, $-s_{1}+a_{1}$=5 and $s_{2}/a_{2}$ is 2/8-0.25 and $-s_{2}+a_{2}$ =6. However, if $s_{2}= s_{1}+c_{1}$ and $a_{2}= a_{1}-c_{2}$ such that $ c_{1}$ and $ c_{2}$ are positive constants and $ \alpha c_{1}$ < $ \beta c_{2}$, $measure\_int\_infeasibility_{2}$ $s_{1}/a_{1}$ assuming that both $\alpha$ and $\beta$ are positive (by interchanging the sign and the direction of the inequalities, the statement still holds). We will select the node with minimum of these metrics, independently of whether we aura performing maximization or minimization. Since the goal is to find an optimal integer feasible solution, the sun and number of integer infeasibilities are valid metrics. \subsection{Objective Function Based Estimates} \subsubsection{Direct Objective Function Value Estimates} Tthe direct objective function value estimate-based methods attempt to estimate the best integer feasible solution obtainable from a node in the branch and bound tree. \emph{Best Projection Estimates; } for a node $j$ as $E^{pro}_{j}= z^{j}_{U}+\frac{z_{L}- z^{0}_{U}}{s_{o}} s_{j}$ Since this metric includes $z_{L}$ or $z_{U}$, it must be updated for each node every time they are updated. This may be time-consuming, as finding nodes with the maximum or minimum value take O(log $\Bbb{L}|$). However, the removal of any nodes that can be fathomed through the update of $z_{L}$ may be done simultaneously to free up space. The replacement of $s_{j}$ with $a_{j}$ could also potentially give a different result The formulation works for both maximization and minimization, since $ z_{L}- z^{0}_{U}$ is negative for maximization and positive for minimization (then $z_{L}$ is $z_{U}$). As for all other objective function value based estimates. For maximization, we favor the node with the highest value and for minimization, we favor the node with the lowest value. \emph{Pseudocost Estimate/Best Estimate:} Since the best projection estimate does not take into account which variables are fractional, the best estimate method \ has a role. $E^{pseudocost}_{j}= z^{j}_{U}-\sum_{i \in \mathbf{x_{1'}}} {min\{ P^{-}_{i}f_{i}, P^{+}_{i}(1- f_{i} )\}}$, where $ P^{-}_{i}$ is the pseudocost of having the $i$th variable rounded down and $P^{+}_{i}$ is the pseudocost of having the $i$th variable rounded p. $P^{-}_{i}= \frac{\mid z^{i-}_{U}-z^{i}_{U}\mid}{ f_{i}}$ and $P^{+}_{i}=\frac{\mid z^{i+}_{U}-z^{i}_{U}\mid}{ 1-f_{i}}$, where $ z^{i-}_{U}$ is the value of the objective function when the fractional variable $i$ is rounded down, and $z^{i+}_{U}$ is defined in a similar way. The above formulation is for maximization. For minimization, the ``-'' before the summation is replied by ``+''. A potential improvement of the pseudocost method that does not assume independence among variable pseudocosts to be tested and compared is $E^{diff_pseudocost}_{j}= z^{j}_{U}- [(\mid z^{j}_{down}-z^{i}_{U}\mid )+(\mid z^{j}_{up}-z^{i}_{U}\mid) +\sum_{i =1 to q} { (\mid z^{j}_{rand(i)} -z^{i}_{U}\mid )}]/(2+q)$, where $z^{j}_{down}$ is the value of the objective function when all the variables violating the integrality constraints are rounded down, $z^{j}_{up}$ is the value of the objective function when all the variables violating the integrality constraints are rounded up, $z^{j}_{rand(i)})$ is the value of the objective function when all the variables violating the integrality constraints are randomly founded up or down in the $I$th random trial, and $q$ is a small integer constant $\geq 1$, Besides taking the average, one may also take the minimum (or maximum) what this pseudocost variant does is that it considers the integrality of all the variables as a whole rather than independent of each other. Note that the first ``-'' is a ``+'' for minimization. We do not plan to consider the pseudo-shadowcost or the adjusted pseudocost estimate, at least not in the initial stage of testing. \subsubsection{Other Objective Function Based Estimates} \emph{Norm Estimate:} $norm_{j}$ = $\frac{d^{2}_{i}} {\mid E^{pseudocost}_{i}- z^{0}_{U} \mid}$ where $d_{j}$ is the depth of node $j$. One can think of the justification as follows: the denominator can be seen as the ``decline'' from the objective function value of the LP relaxation at the root to the best possible estimated integer feasible solution obtainable from node $i$. For a given depth, the node with the lowest denominator is taken. This makes sense, since then, the ``decline'' is at a minimum. For a given denominator, the node with the greatest depth is taken, since then more of the decline has presumably taken place, and we are closer to the integer feasible optimum. Due to the absolute value on the denominator, the norm estimate works for both maximization and minimization. actually has three different definitions for the norm estimate, the first being the previously defined norm estimate. $E^{norm2}_{i}$ as $\frac{d_{i}} {\mid E^{pseudocost}_{i}- z^{0}_{U} \mid}$ and $E^{norm3}_{i}$ as $\frac{d_{i}} {num_{inf}\mid E^{pseudocost}_{i}- z^{0}_{U} \mid}$ where $ num_{inf}$ is the number of integer infeasibilities. The same reasoning applies to $E^{norm2}_{i}$ and $E^{norm3}_{i}$ except the scaling is altered. An alternative in addition to the two versions already described is $alt\_norm_{j}$ = $\frac{ z^{j}_{U}-z_{L} } {\mid E^{pseudocost}_{i}- z^{0}_{U} \mid}$ for maximization and $alt\_norm_{j}$ = $\frac{ z_{U} - z^{j}_{U}} {\mid E^{pseudocost}_{i}- z^{0}_{U} \mid}$ for minimization . For a given numerator, the node with the lowest denominator is taken, like the original norm estimates. This makes sense, since then, the ``decline'' is at a minimum. For a given denominator, the node with the greatest measure of possible improvement in objective function value ($z^{j}_{U}$ can be replaced by $E^{pseudocost}_{j}$) from the current best integer feasible solution is taken. The initial idea was to use the num of times that $z_{L}$ or $z_{I}$ has changed or the number of integer feasible solutions encountered when the node was added to the candidate list, but this only favors the more recently added nodes. Actually, the ``-$z_{L}$''or ``-$z_{I}$'' in the numerator is really unnecessary, as it is constant for all nodes\footnote{ $alt\_norm2_{j}$ = $\frac{ z^{j}_{U}} {\mid E^{pseudocost}_{i}- z^{j}_{U} \mid}$ mat be better than the previously described alternative. The denominator is the cost of integrality and the numerator is the objective function value. Lou has suggested $alt\_norm3_{j}$ = $\frac{ z^{j}_{U}-z_{L} } {\mid E^{pseudocost}_{i}- z^{i}_{U} \mid}$ as another possibility for maximization minimization is similar. } The node twitch the highest estimate value is chosen whether minimization or maximization is involved. Using different objective function value based estimates is another variant of the norm estimate. These methods are to be compared with the most commonly used method, BFS, as well as the uninformed or blind techniques of random node selection, DFS and breadth first search. \section{The Question of Diving} \emph{Diving} occurs when one of the two children resulting from branching is selected rather than adding the children to the active list and selecting a mode from $\Bbb{L}$, the candidate list. This distinction was very briefly remarked on in the depth paper as the immediate (diving) and backtracking approaches. Diving has two advantages attributed to DFS, i.e. ease of construction of the LP from the parent to child and a higher probability of finding integer feasible solution. The O($n$) memory expenditure applies only wham diving unconditionally as an init from the root until a leaf node, i.e. an infeasible LP or an integer feasible solution. First, we will test the above estimate based methods with no diving at all. Then, we will consider diving occurring as a unit where we do not decide whether or not to dive at each branching operation. One approach is to use one of the estimate base based methods (each one will be tested in turn) will be utilized to determine which child to branch on from the root to a leaf node. The length of this path, $p$ is noted. As well, at every branching operation, the child not chosen is added to the candidate list. if the objective function value of this node is not optimal (either the leaf is infeasible or the objective function value of the leaf is less than the highest objectively function value in the candidate list considering maximization; minimization is similar), for $j$ taking the value of 1 to $p$, start a new search chain by returning to the root and chose the node according to the estimate used at each branching step except at the $j$th step. If there is still no optimal solution in these $p$ leaves, for $j_{1}$ taking values of one to $p$ and $ j_{2}$ taking values of one to $p$. $j_{1} \neq j_{2}$. start a new search chain by returning to the root and chose the node according to the estimate used at each branching step except at the $j1$th and $j2$th steps. This process is repeated until the optimal solution is found or the optimality gap is small enough. This is a form of local discrepancy search. A variant of this technique that breaks the memory constraint but is potentially more time efficient is described as fellows. One of the estimate based methods will be utilized to determine which child to branch on from the root to a leaf node. if the objective function value of this node is not optimal, the node with the ``best'' value according to an estimate is chosen, and from this node, diving is continued until a leaf node is reached. The procedure is repeated until an optimal is found. This is known as repeated best option diving. Note that BFS (best objective function value) and random choice can also be used in conjunction with LDS approach and the hybrid approach. In addition, there is also the possibility of repeated random dives. One could replace DFS in the two phase method (DFS until an integer feasible solution is found, then BFS) and the backtracking method (depth first as long as $ z^{i}_{U}>E_{0}$ and then perform best first or best pseudocost if $z^{i}_{U} \leq E_{0}$, where $ E_{0}$. Is tan objective function based estimate of the root ) by repeated estimate based diving under the assumption that such diving is more informed than DFS. As far as deciding whether or not to diving at each (or everY $l$) branching step(s), there are the methods described for use in SYMPHONY and COIN/BCP, namely, objective function value based diving and fractional based diving, in addition to unconditional random diving with a provability indicated by a parameter. Fractional diving overrides the other diving strategies, and permits dicing when the number of variables that are not integer but ought to be does not exceed a parameter representing a lower limit (default is 0) or the ratio of the number of variables violating the integrality constraints to the number of variables constrained to be integer does not exceed a lower limit (default is 0.02). As an extension, the sum of integer infeasibilities can also be treated in this manner. Unlike SYMPHONY, there is no support for fractional diving in the COIN/BCP code, although its existence is described in the manual. There are three objective function based diving strategies in SYMPHONY BEST\_ESTIMATE which is like our backtracking method except that there is no repeated diving if we hit a leaf without meting the condition. The COMP\_BEST\_K diving strategy involves diving when the objective function value of the node to be dived into does not exceed (dealing with minimization,COIN/BCP and SYMPHONY perform minimization) the average of the $k$(default is 1) best objective function values from the candidate list by more than a certain diving threshold (default is 0 which makes no sense). The actual formulation is the ratio of the objective function value of the node to be dived into to the average of the $k$ best objective function values from the candidate list in the same \emph{phase} minus one. The COMP\_BEST\_K\_GAP diving strategy involves diving when the ratio of the objective function of the node to be dived into minus average of the $k$ best objective function values from the candidate list to the current best integer feasible solution minus the average objective function value does not exceed the diving threshold. If an integer feasible solution has not been found yet, rather than revert to COMP\_BEST\_K as I would do, diving is permitted when the ratio of the objective function value of the node the be dived into minus the average if the $k$ best objective function values from the candidate list to this average minus one does not exceed the diving threshold, an easier certifiable condition. Another spin of my own on this would be to dive if the ratio of the current best integer feasible solution minus the average of the $k$ best objective function values from the candidate list to the current best integer feasible solution minus the objective function value of the node to be dived into (dealing with minimization) is no more than a certain diving threshold. A simple example illustrates the difference between these metrics. Suppose that the current integer feasible solution $z_{U}$ is 10. Since it is probably safe to assume that the node to be dive into cannot be fathomed, let the objective function of the node to be dived into $z_{dive}$ is 8, and the average of the $k$ best objective function values from the candidate list $z_{aver}$ is 4. now, for COMP\_BEST\_K, we have ($\frac{z_{dive}}{z_{aver}}$)-1=1 $\leq diving\_threshold$. For COMP\_BEST\_K\_GAP, we have $\frac{z_{dive}- z_{aver}}{z_{U}-z_{aver}}$= 2/3$\leq diving\_threshold$ or the equivalent actual formulation which deals with the divide-by-zero problem and is $z_{dive} \leq z_{aver}+ diving\_threshold (z_{U}-z_{aver})$. For the modified COMP\_BEST\_K\_GAP, we have $\frac{z_{U}- z_{aver}}{z_{U}-z_{dive}}$=3$\leq diving\_threshold$. As one can see, these metrics can give different results depending on how $diving threshold$ us set. Note that if $z_{dive}< z_{aver}$ and $z_{U}0$ and $z_{dive}/$$z_{best}$ < the diving threshold (which can be set to a different value if n integer feasible solution exists; by default, it is 1.2 if an integer feasible solution exists, and 2 otherwise \footnote{Before an integer feasible solution has been found, there is more need for diving, and thus the condition is not as tight.}), diving is permitted. observe that the ``minus one'' from COMP\_BEST\_K has been eliminated, but the ``$\leq$'' has been replaced by ``<''. If $z_{best}=0$ , instead of simply defaulting to first rule, diving is permitted if $z_{dive}$ is less than the diving threshold. Finally, if both $z_{best}$ and $z_{dive} $ are negative, diving is allowed when $z_{best}$/ $z_{dive}<$ the diving threshold. The COMP\_BEST\_K and COMP\_BEST\_K\_GAP have no such safety checks when the denominator is not positive. If $z_{best}< 0$ and $z_{dive}>0$, I would suggest allowing diving when $\frac{ (\mid z_{best} \mid + \mid z_{dive}\mid) + min\{\mid z_{best} \mid, \mid z_{dive}\mid\}}{ min\{\mid z_{best} \mid, \mid z_{dive}\mid\}} <$ the diving threshold. If $min\{\mid z_{best} \mid, \mid z_{dive}\mid\} $ is zero, $min\{\mid z_{best} \mid, \mid z_{dive}\mid\}$ should be replaced by $max\{\mid z_{best} \mid, \mid z_{dive}\mid\}$. The way of dealing with a numerator or denominator that is zero is appropriate when $z_{dive}$=0 and $z_{best}$<0 or $z_{dive }>$0 and $z_{best}$=0. When $z_{dive}$=0 and $z_{best}>0$, the original ratio metric works as is, and when $z_{dive }\leq$0 and $z_{best}$=0, diving should occur. An analysis of the metrics used to determine whether to dive is instructive. Consider the ratio used in COMP\_BEST\_K or COIN, i.e. $r_{1}=\frac{z_{dive}}{z_{aver}}$= $\frac{z_{aver}+e_{1}}{z_{aver}}$ (in the case of COIN, $z_{aver}$ is $z_{best}$). When $e_{1}$=0 ($z_{dive}=z_{aver} $), $r_{1}$=1. Given that $z_{aver}$ is positive, as $e_{1}$ approaches -$\infty$, so does $r_{1}$, and as $e_{1}$ approaches $-\infty$, so does $r_{1}$ . This is expected since diving is permitted when $r_{1}-1 \leq$ diving\_ threshold or $r_{1}<$ a diving parameter. However, if $z_{aver}\leq 0$, $r_{1}$ gives erroneous results, and thus the remedies stated above are necessary. For maximization, the numerator and denominators are reversed. Examining the ratio used in COMP\_BEST\_K\_GAP $r_{2}=$ $\frac{z_{dive}- z_{aver}}{z_{U}-z_{aver}}$= $\frac{(z_{aver}+e_{1})- z_{aver}}{z_{U}-z_{aver}}$=$\frac{e_{1}}{z_{U}-z_{aver}}$, when $e_{1}$=0, $r_{2}$=0. If $e_{1}>$0, $r_{2}=\frac{e_{1}}{(z_{aver}+ e_{1}+e_{2})-z_{aver}}$ = 0$<$ $\frac{e_{1}}{e_{1}+e_{2}}<1$, and as $e_{1}$ approaches $\infty$, $r_{2}$ approaches 1 (we continue to assume that $z_{U}>$ $z_{dive}$ and $z_{aver}$, and thus they are not fathomable; so, $e_{2}>$0). If $e_{1}<$0, $r_{2}=\frac{-e_{3}}{(z_{aver}+e_{2})-z_{aver}}$= $\frac{-e_{3}}{e_{2}}< 0$ ($e_{1}=-e_{3}$ and both $ e_{2}$ and $e_{3} >$0)/ Notice that negative values for $z_{dive}$ and $z_{aver}$ pose no danger for $r_{2}$, and we cannot divide by zero, since $z_{U}>$ $z_{aver}$. For maximization, $r_{2}=$ $\frac{z_{aver}- z_{dive}}{-z_{aver}- z_{L}}$. If we scrutinize the ratio used in COMP\_BEST\_K\_GAP when $z_{U}$ does not exist, $r_{2'}=$ $\frac{z_{dive}- z_{aver}}{z_{aver}}$= $\frac{e_{1}}{z_{aver}}$, when $e_{1}$=0, $r_{2'}$=0. When $z_{aver}\leq 0$, the ratio gives faulty results. Changing $r_{2'}$ to $\frac{z_{dive}- z_{aver}}{\mid z_{aver} \mid}$ will help when $z_{aver}< 0$. When $z_{aver}=0$, the denominator of the ratio should be ignored. When dealing with maximization, $r_{2'}=$ $\frac{z_{aver}- z_{dive}}{z_{dive}}$ Looking at the ratio utilized in the proposed modification to COMP\_BEST\_K\_GAP $r_{3}=$ $\frac{z_{U}- z_{aver}}{z_{U}-z_{dive}}$=$\frac{z_{U}- z_{aver}}{z_{U}-( z_{aver}+e_{1})}$, when $e_{1}$=0, $r_{3}$=1. If $e_{1}>$0, $r_{3}=\frac{(z_{aver}+e_{1}+e_{2})- z_{aver}}{(z_{aver}+e_{1}+e_{2})-(z_{aver}+e_{1})}= \frac{e_{1}+e_{2}}{e_{2}}>1$, and as $e_{1}$ approaches -$\infty$, so does $r_{3}$ (we continue to assume that $z_{U}>$ $z_{dive}$ and $z_{aver}$, and thus they are not fathomable; so, $e_{2}>$0). If $e_{1}<$0, $r_{3}=\frac{(z_{aver}+e_{2})- z_{aver}}{(z_{aver}+e_{2})-(z_{aver}-e_{3})}= 0<$ $\frac{ e_{2}}{e_{2}+ e_{3}}< 1$, ($e_{1}=-e_{3}$ and both $ e_{2}$ and $e_{3} >$0) and as $e_{1}$ approaches -$-\infty$, $r_{3}$ approaches 0 . Observe that negative values for $z_{dive}$ and $z_{aver}$ pose no danger for $r_{3}$, and we cannot divide by zero, since $z_{U}>$ $z_{dive}$. For maximization, $r_{3}=$ $\frac{ z_{aver}- z_{L}}{z_{dive}- z_{L}}$. The leering strategies will sort out which of these methods is best for a representative class of IP problems, and the adaptive metaheuristic will determine which technique to apply when finding the solution of a given problem instance. \end{sloppypar} \end{document}