Quantum tunneling has been proposed as a physical mechanism for solving binary optimization problems on a quantum computer because it provides an alternative to simulated annealing by directly connecting deep local minima of the energy landscape separated by large Hamming distances. However, classical simulations using Quantum Monte Carlo (QMC) were found to efficiently simulate tunneling transitions away from local minima if the tunneling is effectively dominated by a single path. We analyze a new computational role of coherent multi-qubit tunneling that gives rise to bands of non-ergodic extended (NEE) quantum states each formed by a superposition of a large number of deep local minima with similar energies. NEE provide a coherent pathway for population transfer (PT) between computational states with similar energies. In this regime, PT cannot be efficiently simulated by QMC. PT can serve as a new quantum subroutine for quantum search, quantum parallel tempering and reverse annealing optimization algorithms. We study PT resulting from quantum evolution under a transverse field of an n-spin system that encodes the energy function E(z) of an optimization problem over the set of bit configurations z. Transverse field is rapidly switched on in the beginning of algorithm, kept constant for sufficiently long time and switched off at the end. Given an energy function of a binary optimization problem and an initial bit-string with atypically low energy, PT protocol searches for other bitstrings at energies within a narrow window around the initial one. We provide an analytical solution for PT in a simple yet nontrivial model: M randomly chosen marked bit-strings are assigned energies E(z) within a narrow strip [-n -W/2, n + W/2], while the rest of the states are assigned energy 0. The PT starts at a marked state and ends up in a superposition of L marked states inside the narrow energy window whose width is smaller than W. The best known classical algorithm for finding another marked state is the exhaustive search. We find that the scaling of a typical PT runtime with n and L is the same as that in the multi-target Grover's quantum search algorithm, except for a factor that is equal to exp(n /(2B^2)) for finite transverse field B >>1. Unlike the Hamiltonians used in analog quantum unstructured search algorithms known so far, the model we consider is non-integrable and the transverse field delocalizes the marked states. As a result, our PT protocol is not exponentially sensitive in n to the weight of the driver Hamiltonian and may be initialized with a computational basis state. We develop the microscopic theory of PT by constructing a down-folded dense Hamiltonian acting in the space of marked states of dimension M. It belongs to the class of preferred basis Levy matrices (PBLM) with heavy-tailed distribution of the off-diagonal matrix elements. Under certain conditions, the band of the marked states splits into minibands of non-ergodic delocalized states. We obtain an explicit form of the heavy-tailed distribution of PT times by solving cavity equations for the ensemble of down-folded Hamiltonians. We study numerically the PT subroutine as a part of quantum parallel tempering algorithm for a number of examples of binary optimization problems on fully connected graphs.