Personal tools
You are here: Home EWS2005
Log in


Forgot your password?
 

Workshop on Ensemble Methods 2005

at the Max Planck House in Tübingen, March, 4th and 5th, 2005

This workshop is jointly organised by the working groups

  • Computational Statistics (DR-IBG),
  • Statistical Computing (GMDS) and the
  • Austrian Association for Statistical Computing.

    This event is sponsored by EU PASCAL Network of Excellence.

    Organisers

    Local Organiser

    Aims of the Workshop

    Based on the success of the Ensemble Workshop 2004, the workshop aims at discussing further theoretical and practical developments for prediction models such as boosting, random forests or support vector machines. The workshop tries to bring together the machine learning view with the statistical view on ensemble methods.

    Schedule

    Talks will be given in English, a LCD projector as well as a laptop for PDF or Powerpoint presentations is available in the lecture room.

    TimeChairSpeakerTitle
    Friday, 14:00Gunnar RätschPeter Bühlmann, Zürich Boosting: a Statistical Perspective
    Friday, 15:00 Coffee break
    Friday, 15:30Torsten HothornGunnar Rätsch, Tübingen Boosting SVMs to understand the SVMs decision function
    Friday, 16:15Gunnar RätschKoji Tsuda, Tübingen Matrix Exponential Updates for On-line Learning and Bregman Projection
    Friday, 17:00 Coffee break
    Friday, 17:30Axel BennerCesare Furlanello, TrentoAn Introduction to Parallel Boosting
    Friday, 18:15Cesare FurlanelloPanel Discussion: Feature Selection with Ensemble Predictors
    Friday, 19:00 Dinner
    Saturday, 9:00Gunnar Rätsch Nicolas Vayatis, Paris Online Convex Risk Minimization
    Saturday, 10:00 Coffee break
    Saturday, 10:30Axel Benner Achim Zeileis, Wien kernlab - An R package for kernel learning
    Saturday, 11:15Axel Benner Jussi Klemelä, Mannheim Density Estimation with Stagewise Optimization of the Empirical Risk
    Saturday, 12:00 Lunch
    Saturday, 13:30Torsten Hothorn Gilles Blanchard, Berlin An accelerated algorithm for MCMC Bayesian decision tree sampling
    Saturday, 14:30Torsten Hothorn Ursula Garczarek, Penzberg Simplified quadratic discriminant classifier

    Abstracts

    • Peter Bühlmann, Zürich: Boosting: a Statistical Perspective

    • Abstract: Boosting (Freund and Schapire, early 1990's) has been proposed in the machine learning community as an ensemble technique for binary classification. The gradient descent view, as articulated by Breiman (1998, 1999) turns out to be very fruitful to modify Boosting as a general technique which can also be used in other settings than classification, such as regression or survival analysis. In particular, boosting can be naturally forced into structures, e.g. additive or interaction models, and it can then be seen as a powerful regularization method for statistical modeling.

    • Gunnar Rätsch, Tübingen: Boosting SVMs to understand the SVMs decision function (joint work with Sören Sonnenburg and Christin Schäfer)

      Abstract: We propose novel algorithms for solving the so-called Support Vector Multiple Kernel Learning problem and show how they can be used to understand the resulting support vector decision function. While classical kernel-based algorithms (such as SVMs) are based on a single kernel, in Multiple Kernel Learning a quadratically-constraint quadratic program is solved in order to find a sparse convex combination of a set of support vector kernels. We show how this problem can be cast into a semi-infinite linear optimization problem for whose solution we propose two novel algorithms: A boosting-like iterative method with known convergence rates and a column generation technique. We show that both algorithms solve this quadratically-constrained quadratic program and find a sparse convex combination of a set of support vector kernels. In the second part we show how this technique can be used to understand the obtained decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We consider the problem of splice site identification and combine string kernels at different sequence positions and with various substring (oligomer) lengths. The proposed algorithm computes a sparse weighting over the length and the substring length, highlighting which substrings are important for discrimination. Finally, we propose a bootstrap scheme in order to reliably identify a few statistically significant positions, which can then be used for further analysis such as consensus finding.

      Slides and Paper

    • Koji Tsuda, Tübingen: Matrix Exponential Updates for On-line Learning and Bregman Projection (joint work with Manfred Warmuth and Gunnar Rätsch)

      Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: on-line learning with a simple square loss, and finding a symmetric positive definite matrix subject to linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analyzes of the original EG update and AdaBoost generalize to the non-diagonal case. We apply the new versions of both, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.

      Slides

    • Cesare Furlanello, Trento: An Introduction to Parallel Boosting (joint work with B. Caprile and S. Merler)

      Abstract: We show how to exploit an interesting property of the dynamics of weights in the AdaBoost algorithm in order to parallelize its learning process. Based on the notions of "hard" training patterns for boosting, which have properties similar to those of support vectors, we propose a characterization of the distribution of their weights, and then introduce P-AdaBoost. After a reduced number of regular AdaBoost learning cycles, weight distributions are estimated and resampled, providing the weight structure for developing in parallel, without iterations, a family of synthetic base models. The aggregated predictor is shown to have accuracy close to that of a complete AdaBoost model, and different from bagging.

    • Nicolas Vayatis, Paris: Online Convex Risk Minimization (joint work with A. Juditsky, A. Nazin and A. Tsybakov)

      Abstract: I will consider the problem of constructing an aggregated estimator from a finite class of base classifiers which approximately minimizes a convex risk functional under an l_1 constraint. I will describe a stochastic gradient algorithm, known as mirror-descent algorithm, which performs gradient descent in the dual space. Mirror-descent algorithms have been developed in different contexts and they are known to be particularly efficient with high dimensional data. Moreover their implementation is adapted to the online setting. The main result will be the rate of convergence of this procedure.

    • Achim Zeileis, Wien: kernlab - An R package for kernel learning (joint work with A. Karatzoglou, A. Smola, K. Hornik)

      Abstract: kernlab is an extensible package for kernel-based machine learning methods in R. It takes advantage of R's new S4 object model and provides a framework for creating and using kernel-based algorithms. The package contains dot product primitives (kernels), implementations of support vector machines and the relevance vector machine, Gaussian processes, a ranking algorithm, kernel PCA, kernel CCA, kernel feature analysis, online kernel methods and a spectral clustering algorithm. Moreover it provides a general purpose quadratic programming solver, and an incomplete Cholesky decomposition method.

      Paper Slides R-Session Transcript

    • Jussi Klemelä, Mannheim: Density Estimation with Stagewise Optimization of the Empirical Risk

      Abstract: We consider multivariate density estimation with identically distributed observations. We study properties of a density estimator which is a convex combination of simple functions in a fixed dictionary and the convex combination is chosen by minimizing the L_2 empirical risk in a stagewise manner. We derive the convergence rates of the estimator when the estimated density belongs to the L_2 closure of the convex hull of an underlying class of functions, under entropy conditions for the underlying class. The L_2 closure of a convex hull is a large non-parametric class but under suitable entropy conditions on the underlying class sufficiently small so that density estimation is feasible also in high dimensional cases. We present an implementation of the estimation algorithm where the estimator is a convex combination of myopic histograms, that is, histograms whose partition is chosen by minimizing the emprical risk in a myopic way.

      Slides

    • Gilles Blanchard, Berlin: An accelerated algorithm for MCMC Bayesian decision tree sampling

      Abstract: We present a novel algorithm intended to deal with a Bayesian framework for classification and regression trees. Typically, it is hard in practice to sample from the Bayesian posterior on trees (in order to approximate posterior-averaged or MAP estimators) by naive MCMC methods, because the space of trees is huge and the posterior presents a lot of local minima. The algorithm we introduce allows, for a large class of prior distributions, to develop an accelerated MCMC algorithm resulting in a dramatic increase of the number of models actually taken into account. It can be used in a variety of situations for Bayesian inference and we show its practical relevance on a demonstration example.

      Slides

    • Ursula Garczarek, Penzberg: Simplified quadratic discriminant classifier

      Abstract: A very simple classifier for two-class discrimination is introduced which is easy to implement, use, and understand. It is recommended as benchmark for classification based on any high dimensional data that is interpreted as "expression data", as in gene expression data, or "intensities", as in mass spectrometry data.

      Slides

    Participation and Accommodation

    Participants can attend the workshop free of charge, the number of participants is restricted to 30 persons. To register, please drop an email to Gunnar Rätsch.

    We have booked rooms directly at the MPI and in a hotel nearby the conference site (Hotel Katharina), please inform Gunnar Rätsch about your arrival and departure dates.

  • Document Actions
    « March 2010 »
    March
    MoTuWeThFrSaSu
    1234567
    891011121314
    15161718192021
    22232425262728
    293031