Workshop on Ensemble Methods 2005
at the Max Planck House in Tübingen, March, 4th and 5th, 2005
This workshop is jointly organised by the working groups
This event is sponsored by EU PASCAL Network of Excellence.
Organisers
- Axel Benner, DKFZ Heidelberg
- Torsten Hothorn, Universität Erlangen-Nürnberg
- Gunnar Rätsch, Friedrich Miescher Laboratory of the Max Planck Society
Local Organiser
- Gunnar Rätsch, Friedrich Miescher Laboratory of the Max Planck Society
Aims of the Workshop
Schedule
Talks will be given in English, a LCD projector as well as a laptop for PDF or Powerpoint presentations is available in the lecture room.
| Time | Chair | Speaker | Title |
|---|---|---|---|
| Friday, 14:00 | Gunnar Rätsch | Peter Bühlmann, Zürich | Boosting: a Statistical Perspective |
| Friday, 15:00 | Coffee break | ||
| Friday, 15:30 | Torsten Hothorn | Gunnar Rätsch, Tübingen | Boosting SVMs to understand the SVMs decision function |
| Friday, 16:15 | Gunnar Rätsch | Koji Tsuda, Tübingen | Matrix Exponential Updates for On-line Learning and Bregman Projection |
| Friday, 17:00 | Coffee break | ||
| Friday, 17:30 | Axel Benner | Cesare Furlanello, Trento | An Introduction to Parallel Boosting |
| Friday, 18:15 | Cesare Furlanello | Panel Discussion: Feature Selection with Ensemble Predictors | |
| Friday, 19:00 | Dinner | ||
| Saturday, 9:00 | Gunnar Rätsch | Nicolas Vayatis, Paris | Online Convex Risk Minimization |
| Saturday, 10:00 | Coffee break | ||
| Saturday, 10:30 | Axel Benner | Achim Zeileis, Wien | kernlab - An R package for kernel learning |
| Saturday, 11:15 | Axel Benner | Jussi Klemelä, Mannheim | Density Estimation with Stagewise Optimization of the Empirical Risk |
| Saturday, 12:00 | Lunch | ||
| Saturday, 13:30 | Torsten Hothorn | Gilles Blanchard, Berlin | An accelerated algorithm for MCMC Bayesian decision tree sampling |
| Saturday, 14:30 | Torsten Hothorn | Ursula Garczarek, Penzberg | Simplified quadratic discriminant classifier |
Abstracts
- Peter Bühlmann, Zürich: Boosting: a Statistical Perspective
Abstract: Boosting (Freund and Schapire, early 1990's) has been proposed in the machine learning community as an ensemble technique for binary classification. The gradient descent view, as articulated by Breiman (1998, 1999) turns out to be very fruitful to modify Boosting as a general technique which can also be used in other settings than classification, such as regression or survival analysis. In particular, boosting can be naturally forced into structures, e.g. additive or interaction models, and it can then be seen as a powerful regularization method for statistical modeling.
- Gunnar Rätsch, Tübingen:
Boosting SVMs to understand the SVMs decision function
(joint work with Sören Sonnenburg and Christin Schäfer)
Abstract: We propose novel algorithms for solving the so-called Support Vector Multiple Kernel Learning problem and show how they can be used to understand the resulting support vector decision function. While classical kernel-based algorithms (such as SVMs) are based on a single kernel, in Multiple Kernel Learning a quadratically-constraint quadratic program is solved in order to find a sparse convex combination of a set of support vector kernels. We show how this problem can be cast into a semi-infinite linear optimization problem for whose solution we propose two novel algorithms: A boosting-like iterative method with known convergence rates and a column generation technique. We show that both algorithms solve this quadratically-constrained quadratic program and find a sparse convex combination of a set of support vector kernels. In the second part we show how this technique can be used to understand the obtained decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We consider the problem of splice site identification and combine string kernels at different sequence positions and with various substring (oligomer) lengths. The proposed algorithm computes a sparse weighting over the length and the substring length, highlighting which substrings are important for discrimination. Finally, we propose a bootstrap scheme in order to reliably identify a few statistically significant positions, which can then be used for further analysis such as consensus finding.
- Koji Tsuda, Tübingen: Matrix Exponential Updates for On-line Learning and
Bregman Projection (joint work with Manfred Warmuth and Gunnar
Rätsch)
Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: on-line learning with a simple square loss, and finding a symmetric positive definite matrix subject to linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analyzes of the original EG update and AdaBoost generalize to the non-diagonal case. We apply the new versions of both, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
- Cesare Furlanello, Trento: An Introduction to Parallel Boosting
(joint work with B. Caprile and S. Merler)
Abstract: We show how to exploit an interesting property of the dynamics of weights in the AdaBoost algorithm in order to parallelize its learning process. Based on the notions of "hard" training patterns for boosting, which have properties similar to those of support vectors, we propose a characterization of the distribution of their weights, and then introduce P-AdaBoost. After a reduced number of regular AdaBoost learning cycles, weight distributions are estimated and resampled, providing the weight structure for developing in parallel, without iterations, a family of synthetic base models. The aggregated predictor is shown to have accuracy close to that of a complete AdaBoost model, and different from bagging.
- Nicolas Vayatis, Paris: Online Convex Risk Minimization (joint work
with A. Juditsky, A. Nazin and A. Tsybakov)
Abstract: I will consider the problem of constructing an aggregated estimator from a finite class of base classifiers which approximately minimizes a convex risk functional under an l_1 constraint. I will describe a stochastic gradient algorithm, known as mirror-descent algorithm, which performs gradient descent in the dual space. Mirror-descent algorithms have been developed in different contexts and they are known to be particularly efficient with high dimensional data. Moreover their implementation is adapted to the online setting. The main result will be the rate of convergence of this procedure.
- Achim Zeileis, Wien: kernlab - An R package for kernel learning
(joint work with A. Karatzoglou, A. Smola, K. Hornik)
Abstract: kernlab is an extensible package for kernel-based machine learning methods in R. It takes advantage of R's new S4 object model and provides a framework for creating and using kernel-based algorithms. The package contains dot product primitives (kernels), implementations of support vector machines and the relevance vector machine, Gaussian processes, a ranking algorithm, kernel PCA, kernel CCA, kernel feature analysis, online kernel methods and a spectral clustering algorithm. Moreover it provides a general purpose quadratic programming solver, and an incomplete Cholesky decomposition method.
- Jussi Klemelä, Mannheim:
Density Estimation with Stagewise Optimization of the
Empirical Risk
Abstract: We consider multivariate density estimation with identically distributed observations. We study properties of a density estimator which is a convex combination of simple functions in a fixed dictionary and the convex combination is chosen by minimizing the L_2 empirical risk in a stagewise manner. We derive the convergence rates of the estimator when the estimated density belongs to the L_2 closure of the convex hull of an underlying class of functions, under entropy conditions for the underlying class. The L_2 closure of a convex hull is a large non-parametric class but under suitable entropy conditions on the underlying class sufficiently small so that density estimation is feasible also in high dimensional cases. We present an implementation of the estimation algorithm where the estimator is a convex combination of myopic histograms, that is, histograms whose partition is chosen by minimizing the emprical risk in a myopic way.
- Gilles Blanchard, Berlin: An accelerated algorithm for MCMC Bayesian decision
tree sampling
Abstract: We present a novel algorithm intended to deal with a Bayesian framework for classification and regression trees. Typically, it is hard in practice to sample from the Bayesian posterior on trees (in order to approximate posterior-averaged or MAP estimators) by naive MCMC methods, because the space of trees is huge and the posterior presents a lot of local minima. The algorithm we introduce allows, for a large class of prior distributions, to develop an accelerated MCMC algorithm resulting in a dramatic increase of the number of models actually taken into account. It can be used in a variety of situations for Bayesian inference and we show its practical relevance on a demonstration example.
- Ursula Garczarek, Penzberg: Simplified quadratic discriminant
classifier
Abstract: A very simple classifier for two-class discrimination is introduced which is easy to implement, use, and understand. It is recommended as benchmark for classification based on any high dimensional data that is interpreted as "expression data", as in gene expression data, or "intensities", as in mass spectrometry data.
Participation and Accommodation
Participants can attend the workshop free of charge, the number of participants is restricted to 30 persons. To register, please drop an email to Gunnar Rätsch.
We have booked rooms directly at the MPI and in a hotel nearby the conference site (Hotel Katharina), please inform Gunnar Rätsch about your arrival and departure dates.

