STAT (StatPackage)

by Jainab Khatun, Ph.D.

The program analyzes a whole set of mass lists and their corresponding sequences to derive statistics. It matches ions from an MS/MS spectrum to those predicted from a corresponding peptide sequence, then calculates the following:
1. Frequencies of each of 18 different ion types
2. Frequencies of each of the 18 ion types in each of 10 different mass and intensity bins.
3. Cleavage frequencies for each of the 20 amino acids on both the N-terminal and C-terminal sides.
4. The occurrence of residues internal to daughter ions, and their effect on observation.
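
As a rough illustration of item 2, the Python sketch below counts matched ions per ion type and bin. It assumes 10 equal-width bins over a value already scaled to the range 0 to 1 (for example, fragment mass divided by precursor mass). How STAT actually defines its bin edges is not stated here, and the function names and the "b"/"y" labels are illustrative only, not taken from the program.

```python
from collections import Counter


def bin_index(fraction, n_bins=10):
    """Map a value in [0, 1] to one of n_bins equal-width bins (assumed scheme)."""
    # min() keeps a value of exactly 1.0 in the last bin.
    return min(int(fraction * n_bins), n_bins - 1)


def binned_counts(matches, n_bins=10):
    """Count matches per (ion_type, bin).

    `matches` is an iterable of (ion_type, fraction) pairs, where
    `fraction` is the scaled mass or intensity of a matched ion.
    """
    return Counter((ion, bin_index(frac, n_bins)) for ion, frac in matches)


if __name__ == "__main__":
    # Hypothetical matched ions, for demonstration only.
    demo = [("b", 0.05), ("b", 0.09), ("y", 0.55), ("y", 1.0)]
    for key, count in sorted(binned_counts(demo).items()):
        print(key, count)
```

Dividing each count by the number of ions predicted for that type and bin would turn these counts into frequencies.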

A manuscript further describing the program is presently under review.

File formats:
The input mass lists should be in .pkl format. The sequences should be plain text files.
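
For orientation, a .pkl peak list is commonly laid out as one block per spectrum: a header line with the precursor m/z, intensity, and charge, followed by one fragment m/z and intensity pair per line, with blank lines separating spectra. The Python sketch below reads that layout. It assumes this common layout is what STAT expects, and the function name parse_pkl is hypothetical, not part of StatPackage.

```python
def parse_pkl(text):
    """Parse PKL-style text into a list of spectra (assumed layout, see above)."""
    spectra = []
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            # A blank line ends the current spectrum block.
            current = None
            continue
        values = [float(f) for f in fields]
        if current is None:
            # First line of a block: precursor m/z, intensity, charge.
            current = {
                "precursor_mz": values[0],
                "precursor_intensity": values[1],
                "charge": int(values[2]) if len(values) > 2 else None,
                "peaks": [],
            }
            spectra.append(current)
        else:
            # Remaining lines of a block: fragment m/z and intensity.
            current["peaks"].append((values[0], values[1]))
    return spectra


if __name__ == "__main__":
    # Made-up numbers, for demonstration only.
    sample = "500.25 1200.0 2\n100.1 10.0\n200.2 20.0\n"
    print(parse_pkl(sample))
```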


Running the program


This is a command line application. No fancy GUI here. Executables are provided for Mac OS X, Linux, and Windows XP in the "bin" subdirectory. It can likely be compiled on a lot of other platforms as well (any platform on which GNUstep compiles).

Platform specific run and/or compile instructions are below.

Command Line Arguments:
STAT -s <<sequence directory>> -m <<mass list directory>>:
The <<mass list directory>> is a directory of pkl files to be analyzed.
The <<sequence directory>> is a directory of text files, each containing a peptide sequence. For each pkl file in the mass list directory, the program reads the sequence file with the same name; if no sequence file with that name exists, the program exits. Every pkl file therefore needs a corresponding sequence file, although extra sequence files may exist in the directory.
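
Because the program exits when a sequence file is missing, it can help to check the two directories before a long run. The Python sketch below lists pkl files that have no same-named sequence file. It assumes names are compared without their extensions, which this document does not specify, and the helper name missing_sequences is hypothetical.

```python
from pathlib import Path


def missing_sequences(mass_dir, sequence_dir):
    """Return the .pkl files in mass_dir that lack a same-named sequence file."""
    # Assumption: file names are compared without their extensions.
    available = {p.stem for p in Path(sequence_dir).iterdir() if p.is_file()}
    return sorted(
        p.name
        for p in Path(mass_dir).glob("*.pkl")
        if p.stem not in available
    )


if __name__ == "__main__":
    import sys

    # Usage: python check_pairs.py <mass list directory> <sequence directory>
    missing = missing_sequences(sys.argv[1], sys.argv[2])
    for name in missing:
        print("no sequence file for", name)
    sys.exit(1 if missing else 0)
```

Extra sequence files are ignored, matching the behavior described above.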

Bootstrap resampling:
STAT -s <<sequence directory>> -m <<mass list directory>> -b <<number-to-repeat>> <<number-to-sub-sample>>: The <<number-to-repeat>> is the number of times to repeat the calculation, and <<number-to-sub-sample>> is the number of spectra to choose at random from the total data set for each repetition.
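
The idea behind the -b option can be sketched in Python as follows: repeat a calculation on random sub-samples and summarize how much the result varies. This is an illustration of the general technique, not the program's code. It assumes each sub-sample is drawn without replacement (the document does not say which scheme STAT uses), and the statistic argument stands in for whatever quantity is being computed.

```python
import random
import statistics


def bootstrap(items, repeats, sample_size, statistic, seed=None):
    """Apply `statistic` to `repeats` random sub-samples of `items`.

    Returns the mean and standard deviation of the results.
    `repeats` must be at least 2 for the standard deviation to be defined.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        # Assumption: each sub-sample is drawn without replacement.
        subset = rng.sample(items, sample_size)
        results.append(statistic(subset))
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    # Made-up data: 1000 values, 20 repeats of 100 items each.
    data = list(range(1000))
    mean, spread = bootstrap(data, 20, 100, lambda xs: sum(xs) / len(xs), seed=1)
    print(mean, spread)
```

The spread across repeats gives a rough error estimate for the statistic computed from the full data set.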


Examples:
In the Examples directory, there are some pkl mass lists (Examples/Masses) and corresponding sequences (Examples/Sequences) directories. One might use the program as follows:

cd StatPackage/bin
STAT -s ../Examples/Sequences -m ../Examples/Masses

With bootstrap:
STAT -m ../Examples/Masses -s ../Examples/Sequences -b 20 100
(to repeat the calculation 20 times with 100 randomly selected spectra in each)

Resources:
There is a "Resources" subdirectory that comes with the distribution, which contains a file with the masses for all amino acids. This should reside as a peer directory to the "bin" directory. Its location is presently hardcoded into the program, relative to the location from which the binary/executable is run (../Resources/). Hence, it's usually best to invoke the executable from within "bin".
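
When STAT is launched from a script, the working directory can be set explicitly so that ../Resources/ resolves correctly. The Python sketch below does this with the standard subprocess module. It assumes the hardcoded path is resolved against the current working directory, which is how the paragraph above reads. The function names are hypothetical, and the default executable name should be changed to match the platform.

```python
import subprocess
from pathlib import Path


def stat_command(executable, sequence_dir, mass_dir, bootstrap=None):
    """Build the STAT argument list from the options described in this document."""
    args = [str(executable), "-s", str(sequence_dir), "-m", str(mass_dir)]
    if bootstrap is not None:
        repeats, sample_size = bootstrap
        args += ["-b", str(repeats), str(sample_size)]
    return args


def run_stat(package_dir, executable="STAT.linux", **options):
    """Run STAT with "bin" as the working directory."""
    bin_dir = Path(package_dir).resolve() / "bin"
    # cwd=bin_dir makes ../Resources/ point at the peer "Resources" directory.
    return subprocess.run(
        stat_command(bin_dir / executable, **options),
        cwd=bin_dir,
        check=True,
    )


if __name__ == "__main__":
    # Prints the command only; call run_stat(...) to execute it.
    print(stat_command("STAT", "../Examples/Sequences", "../Examples/Masses",
                       bootstrap=(20, 100)))
```

Relative directory arguments such as ../Examples/Sequences are then interpreted relative to "bin", as in the examples above.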


Linux


The program has been pre-compiled and tested to run under Ubuntu 5 or 6, Debian 3.0, and Fedora Core 5.
The executable name is "STAT.linux". See the example above for usage.

To re-compile under Linux, it is necessary to install the GNUstep developer package. This can be done in one of two ways:

1. (easy) Use apt-get install or the Synaptic package manager to get and install "gnustep-core-devel" for Debian, or download and install one of the rpm files at http://rpmforge.net/user/packages/gnustep-make/ for Red Hat/Fedora.

2. (some effort) Go to http://wwwmain.gnustep.org/resources/downloads.php?site=ftp%3A%2F%2Fftp.gnustep.org%2Fpub%2Fgnustep%2F#core and first make sure that your system has all the "Pre-requisites" installed. Then download and install the GNUstep Startup package (which contains the Make/Base/GUI/Backend).

Windows


The program has been pre-compiled (STAT.exe) and will run at the command prompt (and might even work if you just double click it).
Command line example:
1. Start -> Run...
2. type "cmd" then hit enter
3. navigate to the "bin" directory of the distribution using "cd"
4. invoke the program with "STAT.exe -s ../Examples/Sequences -m ../Examples/Masses"
The "STAT.exe" executable must reside in the same directory as the supplied .dll files.

To compile under Windows you will need MinGW, MSYS, and GNUstep installed. Instructions are at http://mediawiki.gnustep.org/index.php/Installation_on_Windows. There is also a pre-packaged GNUstep/Windows installer you could try, downloadable from ftp://ftp.gnustep.org/pub/gnustep/binaries/windows/base-1.11.1-gui-0.10.1.

Mac OS X


The binary is named STAT.osx. This is a universal binary for both Intel and PowerPC. It will run only on Mac OS X 10.4 and later.

Open a terminal, navigate to the bin directory, and invoke it just as above.

To compile for OS X, you will need Xcode installed. Then just double click "STAT.xcodeproj" and then when it opens in Xcode, hit "build". Or play with the code for a while then see if you broke it!

Other Platforms

The program can be compiled for any other platform on which GNUstep is supported.

Download

The current version of StatPackage can be downloaded here, or from our Downloads page.

License

StatPackage Open Source Software License
(c)2006-2007 The University of North Carolina at Chapel Hill

The University of North Carolina at Chapel Hill (the "Licensor") through its Department of Microbiology & Immunology and Dr. Morgan Giddings is making an original work of authorship for StatPackage (Calculating statistics for MS/MS fragmentations and hereinafter the "Software") available upon the terms set forth in this Open Source Software License (this "License"). This License applies to any Software that has placed the following notice immediately following the copyright notice for the Software: Licensed under the StatPackage Open Source Software License v. 1.0.

Licensor grants You, free of charge, a world-wide, royalty-free, non-exclusive, perpetual, sublicenseable license to do the following to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

-- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimers.

-- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimers in the documentation and/or other materials provided with the distribution.

-- Neither You nor any sublicensor of the Software may use the names of Licensor (or any derivative thereof) or of contributors to the Software without explicit prior written permission. Nothing in this License shall be deemed to grant any rights to trademarks, copyrights, patents, trade secrets or any other intellectual property of Licensor except as expressly stated herein.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Under no circumstances and under no legal theory, whether in tort (including negligence), contract, or otherwise, shall the Licensor be liable to any person for any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or the use of the Software including, without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses. This limitation of liability shall not apply to liability for death or personal injury resulting from Licensor's negligence to the extent applicable law prohibits such limitation. Some jurisdictions do not allow the exclusion or limitation of incidental or consequential damages, so this exclusion and limitation may not apply to You.

This License represents the complete agreement concerning the subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable.

"You" throughout this License, whether in upper or lower case, means an individual or a legal entity exercising rights under this License. For legal entities, "You" includes any entity that controls, is controlled by, or is under common control with you. For purposes of this definition, "control" includes (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

You may use the Software in all ways not otherwise restricted or conditioned by this License or by law, and Licensor promises not to interfere with or be responsible for such uses by You. This Software may be subject to U.S. law dealing with export controls. If you are in the U.S., please do not mirror this Software unless you fully understand the U.S. export regulations. Licensees in other countries may face similar restrictions. In all cases, it is licensee's responsibility to comply with any export regulations applicable in licensee's jurisdiction.