We study a class of convex-concave saddle-point problems of the form min x maxy < Kx,y > +fP(x)-h*(y) where K is a linear operator, fP is the sum of a convex function f with a Lipschitz-continuous gradient and the indicator function of a bounded convex polytope P, and h* is a convex (possibly nonsmooth) function. Such problem arises, for example, as a Lagrangian relaxation of various discrete optimization problems. Our main assumptions are the existence of an efficient linear minimization oracle (lmo) for P and an efficient proximal map for h* which motivate the solution via a blend of proximal primal-dual algorithms and Frank-Wolfe algorithms. In case h* is the indicator function of a linear constraint and function f is quadratic, we show a O(1/n2) convergence rate on the dual objective, requiring O(nlogn) calls of lmo. If the problem comes from the constrained optimization problem minx{fP(x)\|Ax-b=0} then we additionally get bound O(1/n2) both on the primal gap and on the infeasibility gap. In the most general case, we show a O(1/n) convergence rate of the primal-dual gap again requiring O(nlogn) calls of lmo. To the best of our knowledge, this improves on the known convergence rates for the considered class of saddle-point problems. We show applications to labeling problems frequently appearing in machine learning and computer vision.

```
@article{kolmogorov2021one,
title={One-sided Frank-Wolfe algorithms for saddle problems},
author={Kolmogorov, Vladimir and Pock, Thomas},
journal={arXiv preprint arXiv:2101.12617},
year={2021}
}
```

SIAM Journal on Imaging Sciences (SIIMS), 9(2):605-636, 2016

We consider the problem of minimizing the continuous valued total variation subject to different unary terms on trees and propose fast direct algorithms based on dynamic programming to solve these problems. We treat both the convex and the non-convex case and derive worst case complexities that are equal or better then existing methods. We show applications to total variation based 2D image processing and computer vision problems based on a Lagrangian decomposition approach. The resulting algorithms are very efficient, offer a high degree of parallelism and come along with memory requirements which are only in the order of the number of image pixels.

```
@article{kolmogorov2016total,
title={Total variation on a tree},
author={Kolmogorov, Vladimir and Pock, Thomas and Rolinek, Michal},
journal={SIAM Journal on Imaging Sciences},
volume={9},
number={2},
pages={605--636},
year={2016},
publisher={SIAM}
}
```

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015)

Structural support vector machines (SSVMs) are amongst the best performing methods for structured computer vision tasks, such as semantic image segmentation or human pose estimation. Training SSVMs, however, is computationally costly, because it requires repeated calls to a structured prediction subroutine (called max-oracle), which has to solve an optimization problem itself, e.g. a graph cut. In this work, we introduce a new algorithm for SSVM training that is more efficient than earlier techniques when the max-oracle is computationally expensive, as it is frequently the case in computer vision tasks. The main idea is to (i) combine the recent stochastic Block-Coordinate Frank-Wolfe algorithm with efficient hyperplane caching, and (ii) use an automatic selection rule for deciding whether to call the exact max-oracle or to rely on an approximate one based on the cached hyperplanes. We show experimentally that this strategy leads to faster convergence towards the optimum with respect to the number of required oracle calls, and that this also translates into faster convergence with respect to the total runtime when the max-oracle is slow compared to the other steps of the algorithm. A C++ implementation is provided at http://www.ist.ac.at/˜vnk.