Shogun - A Large Scale Machine Learning Toolbox
This is the official homepage of the SHOGUN machine learning toolbox.
|
The toolbox focuses on large scale kernel methods, especially
Support Vector Machines (SVMs). It provides a generic SVM
object interfacing to several different SVM implementations, among them the
state-of-the-art LibSVM, SVMLight,
SVMLin and GPDT. Each of the SVMs can be
combined with a variety of kernels. The toolbox not only provides efficient
implementations of the most common kernels, like the Linear, Polynomial,
Gaussian and Sigmoid kernels, but also comes with a number of recent string
kernels, e.g. the Locality Improved, Fisher, TOP, Spectrum and
Weighted Degree Kernel (with shifts). For the latter, the efficient
LINADD optimizations are implemented. SHOGUN also offers the freedom of
working with custom pre-computed kernels. One of its key features is the
combined kernel, which is constructed as a weighted linear combination
of a number of sub-kernels, each of which need not work on the same
domain. An optimal sub-kernel weighting can be learned using
Multiple Kernel Learning.
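To make the combined kernel concrete, here is a minimal sketch in plain Python. It does not use the SHOGUN API: every function name is hypothetical, the Gaussian width parameterization is an assumption, and the sub-kernel weights are fixed by hand rather than learned by Multiple Kernel Learning. It only illustrates a weighted linear combination of sub-kernels that work on different domains (a real-valued vector and a string).

```python
import math


def linear_kernel(x, y):
    """Dot product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, width=1.0):
    """exp(-||x - y||^2 / width); this parameterization is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / width)


def spectrum_kernel(s, t, k=2):
    """Count co-occurrences of substrings of length k in two strings."""
    def kmer_counts(text):
        counts = {}
        for i in range(len(text) - k + 1):
            kmer = text[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        return counts

    counts_s, counts_t = kmer_counts(s), kmer_counts(t)
    return float(sum(n * counts_t.get(kmer, 0) for kmer, n in counts_s.items()))


def combined_kernel(example_a, example_b, sub_kernels, weights):
    """Weighted sum of sub-kernels, one sub-kernel per view of the examples."""
    return sum(
        w * kernel(view_a, view_b)
        for w, kernel, view_a, view_b in zip(weights, sub_kernels, example_a, example_b)
    )


# Each example has two views: a real-valued vector and a DNA string.
a = ([1.0, 2.0], "ACGT")
b = ([3.0, 4.0], "ACGA")

# Hand-picked weights (hypothetical); MKL would learn these from data.
value = combined_kernel(a, b, [linear_kernel, spectrum_kernel], [0.5, 0.5])
```

Because each sub-kernel only sees its own view, the vector and string domains never need a common representation; only the resulting kernel values are combined.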
Currently, SVM two-class classification and regression problems are
supported. SHOGUN also implements a number of linear methods such as Linear
Discriminant Analysis (LDA), Linear Programming Machine (LPM) and (Kernel)
Perceptrons, and features algorithms to train hidden Markov models.
The input feature objects can be dense, sparse or strings, of type
int/short/double/char, and can be converted into different feature types.
Chains of preprocessors (e.g. subtracting the mean) can be attached to
each feature object, allowing for on-the-fly pre-processing.
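The idea of a preprocessor chain attached to a feature object can be sketched as follows. This is plain Python, not the SHOGUN interface: the class and method names (SubtractMean, NormalizeLength, DenseFeatures, add_preprocessor, get_vector) are hypothetical and chosen only for illustration. The stored data is never modified; the chain is applied each time a vector is requested.

```python
import math


class SubtractMean:
    """Centers each feature dimension using the mean of the fitted data."""

    def fit(self, rows):
        self.mean = [sum(column) / float(len(rows)) for column in zip(*rows)]

    def apply(self, row):
        return [value - m for value, m in zip(row, self.mean)]


class NormalizeLength:
    """Scales a vector to unit Euclidean length (zero vectors are unchanged)."""

    def fit(self, rows):
        pass  # nothing to estimate

    def apply(self, row):
        norm = math.sqrt(sum(value * value for value in row))
        return list(row) if norm == 0 else [value / norm for value in row]


class DenseFeatures:
    """Holds raw vectors and applies attached preprocessors on access."""

    def __init__(self, rows):
        self.rows = rows
        self.preprocessors = []

    def add_preprocessor(self, preprocessor):
        # Fit on the output of the chain attached so far, then extend it.
        preprocessor.fit([self.get_vector(i) for i in range(len(self.rows))])
        self.preprocessors.append(preprocessor)

    def get_vector(self, index):
        # Run the raw vector through every preprocessor, in order.
        vector = list(self.rows[index])
        for preprocessor in self.preprocessors:
            vector = preprocessor.apply(vector)
        return vector


features = DenseFeatures([[1.0, 2.0], [3.0, 6.0]])
features.add_preprocessor(SubtractMean())
centered = features.get_vector(0)
features.add_preprocessor(NormalizeLength())
normalized = features.get_vector(0)
```

Applying the chain lazily trades some repeated computation for memory, which matters when the data set is too large to keep a second, preprocessed copy.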
SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave and Python.
|
We have successfully used this toolbox to tackle the following sequence
analysis problems: Protein Super Family classification[6],
Splice Site Prediction, Interpreting the SVM Classifier,
Splice Form Prediction, Alternative Splicing and Promoter
Prediction. Some of them come with no less than 10
million training examples, others with 7 billion test examples.
| Except for SVMLight,
which is (C) Torsten Joachims and follows a different licensing scheme
(cf. LICENSE.SVMLight in the tar archive), SHOGUN is licensed under the
GPL version 3 or any later version (cf. LICENSE). |
 |
|
If you use SHOGUN in your research, you are kindly asked to cite the following paper:
S. Sonnenburg, G. Raetsch, C. Schaefer and B. Schoelkopf, Large Scale Multiple Kernel Learning.
Journal of Machine Learning Research, 7:1531-1565, July 2006, K. Bennett and E. Parrado-Hernandez, Editors.
|
SHOGUN Version 0.5.1 (updated 19.02.2008)
Older Versions
|
This release contains minor bugfixes
- Allow building w/o doxygen
- Code cleanups
- Support newer lapack/atlas/blas
- New methods:
- Added several performance measures
- SVMSGD
- Efficient reading/writing of svmlight format
|
|
We use Doxygen for both user and developer documentation, which may be read online here.
Additionally, many examples can be found in the [interface]/examples
directory in the source code (where interface is one of R, octave, matlab,
python, python-modular). Note that the documentation for python-modular is the most complete, and that Python's help function shows the documentation when working interactively:
$ python
Python 2.4.4 (#2, Jan 3 2008, 13:36:28)
[GCC 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shogun.Classifier import SVM
>>> help(SVM)
class SVM(CSVM)
| Method resolution order:
| SVM
| CSVM
| CKernelMachine
| Classifier
| SGObject
| __builtin__.object
|
| Methods defined here:
|
| __init__(self, kernel, alphas, support_vectors, b)
[...]
|
Below we provide some of the examples that were used to carry out experiments for a number of publications. Note that all of these can also be found in the source code.
|
Click on the corresponding link to see classification and regression examples for Matlab(tm), R, Octave or Python:
|
Below one finds some Bioinformatics examples (for octave and matlab) as presented at BOSC 2006:
|
Multiple Kernel Learning examples (JMLR 2006 paper "Large Scale Multiple Kernel Learning"):
|
For comments, problems, questions, bug reports etc., please use the mailing list (subscription required).
In case you need to directly get in touch with us, feel free to contact
|
Want to contribute? We maintain SHOGUN's source code via SVN:
- To browse the source code of the current and previous releases use
http://svn.tuebingen.mpg.de/shogun/releases/
- To access the source code via svn use
svn checkout http://svn.tuebingen.mpg.de:/shogun/releases shogun-releases
- To get access to the most up-to-date svn trunk, contact us for read/write access. Then use
svn checkout https://svn.tuebingen.mpg.de:/shogun/trunk shogun
|
The authors gratefully acknowledge the support of DFG grant MU 987/2-1 and the PASCAL Network of Excellence.
|