Shogun - A Large Scale Machine Learning Toolbox

This is the official homepage of the SHOGUN machine learning toolbox.

The toolbox focuses on large scale kernel methods, especially Support Vector Machines (SVM) [1]. It provides a generic SVM object that interfaces to several SVM implementations, among them the state-of-the-art LibSVM [2], SVMLight [3], SVMLin [4] and GPDT [5]. Each of the SVMs can be combined with a variety of kernels. The toolbox provides efficient implementations of the most common kernels, such as the Linear, Polynomial, Gaussian and Sigmoid kernels, and also comes with a number of recent string kernels, e.g. the Locality Improved [6], Fisher [7], TOP [8], Spectrum [9] and Weighted Degree (with shifts) [10] [11] [12] kernels. For the latter, the efficient LINADD [12] optimizations are implemented. SHOGUN also offers the freedom of working with custom pre-computed kernels.

One of its key features is the combined kernel, which is constructed as a weighted linear combination of a number of sub-kernels, each of which may work on a different domain. An optimal sub-kernel weighting can be learned using Multiple Kernel Learning [13] [14] [18].

Currently, SVM 2-class classification and regression problems can be dealt with. SHOGUN also implements a number of linear methods, such as Linear Discriminant Analysis (LDA), the Linear Programming Machine (LPM) and (Kernel) Perceptrons, and features algorithms to train hidden Markov models. Input feature objects can be dense, sparse or strings, of type int/short/double/char, and can be converted into different feature types. Chains of preprocessors (e.g. subtracting the mean) can be attached to each feature object, allowing for on-the-fly pre-processing.
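To make the combined kernel idea concrete, here is a minimal pure-Python sketch. It does not use SHOGUN's API: the function names, the toy data and the hand-picked weights are all assumptions made for illustration, and the Gaussian kernel uses one common parameterisation, exp(-||x - y||^2 / width). It shows two sub-kernels working on different domains (real-valued vectors and strings) being merged by a weighted sum.

```python
import math
from collections import Counter


def gaussian_kernel(x, y, width=1.0):
    """Gaussian kernel on real-valued vectors: exp(-||x - y||^2 / width)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def spectrum_kernel(s, t, k=2):
    """Spectrum kernel on strings: inner product of k-mer count vectors."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * counts_t[kmer] for kmer, c in counts_s.items())


def combined_kernel(example_a, example_b, sub_kernels, weights):
    """Weighted linear combination: sum_j beta_j * k_j(a_j, b_j).

    Each example is a tuple with one entry per sub-kernel, so every
    sub-kernel only sees the feature domain it was written for.
    """
    return sum(
        beta * kernel(a, b)
        for beta, kernel, a, b in zip(weights, sub_kernels, example_a, example_b)
    )


# Two toy examples (hypothetical data): a numeric part and a sequence part.
ex1 = ([0.0, 1.0], "ACGTAC")
ex2 = ([1.0, 1.0], "ACGGAC")

sub_kernels = [gaussian_kernel, spectrum_kernel]
weights = [0.5, 0.5]  # fixed by hand here; Multiple Kernel Learning would learn them

value = combined_kernel(ex1, ex2, sub_kernels, weights)
print(value)
```

With non-negative weights, the weighted sum of valid kernels is itself a valid kernel, which is why the combination can be handed to any of the SVM solvers unchanged.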

SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave and Python.

Screenshots

As everyone likes screenshots, we have produced one for each interface: SHOGUN with Octave, Matlab, Python and R. Click on the link for higher resolution images.

Octave Demo Matlab Demo Python Demo R Demo

Applications

We have successfully used this toolbox to tackle the following sequence analysis problems: Protein Super Family classification [8], Splice Site Prediction [10] [15] [16], Interpreting the SVM Classifier [13] [14], Splice Form Prediction [10], Alternative Splicing [11] and Promoter Prediction [17]. Some of them come with no fewer than 10 million training examples, others with 7 billion test examples.

Licensing Information

Except for SVMLight, which is (C) Torsten Joachims and follows a different licensing scheme (cf. LICENSE.SVMLight in the tar archive), SHOGUN is licensed under the GPL version 3 or any later version (cf. LICENSE).

Cite us

If you use SHOGUN in your research you are kindly asked to cite the following paper:

S. Sonnenburg, G. Raetsch, C. Schaefer and B. Schoelkopf. Large Scale Multiple Kernel Learning.
Journal of Machine Learning Research, 7:1531-1565, July 2006. K. Bennett and E. P.-Hernandez, editors.

Download Releases

SHOGUN Version 0.5.1
(updated 19.02.2008) Older Versions

This release contains minor bugfixes:

  • Allow building w/o doxygen
  • Code cleanups
  • Support newer lapack/atlas/blas
  • New methods:
    • Added several performance measures
    • SVMSGD
    • Efficient reading/writing of svmlight format
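For readers unfamiliar with the svmlight format mentioned above, the following is a small sketch of reading and writing one line of it in plain Python. It is not SHOGUN's reader or writer; the function names are hypothetical. It assumes the usual layout of the format, `<label> <index>:<value> ... # optional comment`, and does not handle the `qid:` tokens used for ranking problems.

```python
def parse_svmlight_line(line):
    """Parse one svmlight-style line into (label, {feature_index: value}).

    Returns None for blank lines and lines that contain only a comment.
    """
    # Drop a trailing comment, if any.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def format_svmlight_line(label, features):
    """Write (label, features) back as one line, indices in increasing order."""
    parts = ["%g" % label]
    for index in sorted(features):
        parts.append("%d:%g" % (index, features[index]))
    return " ".join(parts)


example = parse_svmlight_line("-1 3:0.5 10:2 # a negative example")
print(example)
print(format_svmlight_line(*example))
```

Only non-zero features are stored, which is what makes the format a natural fit for sparse feature objects.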

Documentation and Examples

We use Doxygen for both user and developer documentation, which may be read online here. Additionally, many examples can be found in the [interface]/examples directory in the source code (where interface is one of R, octave, matlab, python, python-modular). Note that the documentation for python-modular is the most complete, and that Python's help function will show the documentation when working interactively:

$ python
Python 2.4.4 (#2, Jan  3 2008, 13:36:28) 
[GCC 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shogun.Classifier import SVM
>>> help(SVM)

class SVM(CSVM)
 |  Method resolution order:
 |      SVM
 |      CSVM
 |      CKernelMachine
 |      Classifier
 |      SGObject
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, kernel, alphas, support_vectors, b)
[...]

Below we provide some of the examples that were used to carry out experiments for a number of publications. Note that all of these can also be found in the source code.

Click on the corresponding link to see classification and regression examples for Matlab(tm), R, Octave or Python:

Below one finds some Bioinformatics examples (for octave and matlab) as presented at BOSC 2006:

Multiple Kernel Learning examples (JMLR 2006 paper "Large Scale Multiple Kernel Learning"):

Mailinglist and Contact

In case of comments, problems, questions, bug reports etc., please use the mailing list (subscription required).

In case you need to directly get in touch with us, feel free to contact

Developer Information

Want to contribute? We maintain SHOGUN's source code via SVN:

  • To browse the source code of the current and previous releases use
    http://svn.tuebingen.mpg.de/shogun/releases/
  • To access the source code via svn use
    svn checkout http://svn.tuebingen.mpg.de:/shogun/releases shogun-releases
  • To get access to the most up-to-date svn-trunk contact us for read/write access. Then use
    svn checkout https://svn.tuebingen.mpg.de:/shogun/trunk shogun

Acknowledgements

The authors gratefully acknowledge the support of DFG grant MU 987/2-1 and the PASCAL Network of Excellence.

References

[1] C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[2] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[3] T. Joachims. Making large-scale SVM learning practical. In B. Schoelkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169-184, Cambridge, MA, 1999. MIT Press.
[4] V. Sindhwani and S. S. Keerthi. Large Scale Semi-supervised Linear SVMs. SIGIR, 2006.
[5] L. Zanni, T. Serafini, and G. Zanghirati. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems. JMLR, 7(Jul):1467-1492, 2006.
[6] A. Zien, G. Raetsch, S. Mika, B. Schoelkopf, T. Lengauer, and K.-R. Mueller. Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites. Bioinformatics, 16(9):799-807, September 2000.
[7] T. S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 487-493, 1999.
[8] K. Tsuda, M. Kawanabe, G. Raetsch, S. Sonnenburg, and K.-R. Mueller. A new discriminative kernel from probabilistic models. Neural Computation, 14:2397-2414, 2002.
[9] C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In R. B. Altman, A. K. Dunker, L. Hunter, K. Lauderdale, and T. E. Klein, editors, Proceedings of the Pacific Symposium on Biocomputing, pages 564-575, Kaua'i, Hawaii, 2002.
[10] G. Raetsch and S. Sonnenburg. Accurate Splice Site Prediction for Caenorhabditis Elegans, pages 277-298. MIT Press series on Computational Molecular Biology. MIT Press, 2004.
[11] G. Raetsch, S. Sonnenburg, and B. Schoelkopf. RASE: recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21:i369-i377, June 2005.
[12] S. Sonnenburg, G. Raetsch, and B. Schoelkopf. Large scale genomic sequence SVM classifiers. In Proceedings of the 22nd International Machine Learning Conference. ACM Press, 2005.
[13] S. Sonnenburg, G. Raetsch, and C. Schaefer. Learning interpretable SVMs for biological sequence classification. In RECOMB 2005, LNBI 3500, pages 389-407. Springer-Verlag Berlin Heidelberg, 2005.
[14] G. Raetsch, S. Sonnenburg, and C. Schaefer. Learning Interpretable SVMs for Biological Sequence Classification. BMC Bioinformatics, Special Issue from NIPS workshop on New Problems and Methods in Computational Biology, Whistler, Canada, 18 December 2004, 7(Suppl. 1):S9, March 2006.
[15] S. Sonnenburg. New methods for splice site recognition. Master's thesis, Humboldt University, 2002. Supervised by K.-R. Mueller, H.-D. Burkhard, and G. Raetsch.
[16] S. Sonnenburg, G. Raetsch, A. Jagota, and K.-R. Mueller. New methods for splice-site recognition. In Proceedings of the International Conference on Artificial Neural Networks, 2002. Copyright by Springer.
[17] S. Sonnenburg, A. Zien, and G. Raetsch. ARTS: Accurate Recognition of Transcription Starts in Human. 2006. (accepted)
[18] S. Sonnenburg, G. Raetsch, C. Schaefer, and B. Schoelkopf. Large Scale Multiple Kernel Learning. Journal of Machine Learning Research, 2006. K. Bennett and E. P.-Hernandez, editors. (accepted)