Wednesday, 16 December 2015

cloning my git repos

Setting up a new box and need to keep track of the stuff I work on.

R packages

In the home directory:
mkdir prog
cd prog

git clone https://github.com/protviz/quantable
git clone https://github.com/protviz/pepfdr
git clone https://github.com/wolski/imsbInfer
git clone https://github.com/wolski/topGoUniprot
git clone https://github.com/wolski/prozor
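Since the point of this list is to rebuild the checkout on a new box, it can also live in a small Python script. This is a minimal sketch: REPOS mirrors the clone commands above, while the ~/prog default, the skip-existing check and the function names are my own assumptions. By default it only prints the commands (dry run).

```python
import subprocess
from pathlib import Path

# GitHub "owner/name" pairs taken from the clone commands above.
REPOS = [
    "protviz/quantable",
    "protviz/pepfdr",
    "wolski/imsbInfer",
    "wolski/topGoUniprot",
    "wolski/prozor",
]


def clone_commands(repos, base_url="https://github.com"):
    """Return one ["git", "clone", url] argument list per repository."""
    return [["git", "clone", f"{base_url}/{repo}"] for repo in repos]


def clone_all(target_dir, repos=REPOS, dry_run=True):
    """Clone every repository into target_dir, skipping existing checkouts."""
    target = Path(target_dir).expanduser()
    if not dry_run:
        target.mkdir(parents=True, exist_ok=True)
    for cmd in clone_commands(repos):
        name = cmd[-1].rsplit("/", 1)[-1]
        if (target / name).exists():
            print("skipping existing checkout", name)
            continue
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=target, check=True)


if __name__ == "__main__":
    clone_all("~/prog")  # pass dry_run=False to actually clone
```

Calling clone_all("~/prog", dry_run=False) clones whatever is not yet checked out, so the script can be rerun safely after adding a repository to the list.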

The easiest way to install them in R is from RStudio with devtools; the advantage is that this also installs the dependencies:
library(devtools)

install_github("protviz/quantable")
install_github("protviz/pepfdr")
install_github("wolski/imsbInfer")
install_github("wolski/topGoUniprot")
install_github("wolski/prozor")


## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("topGO")

Wednesday, 9 December 2015

Links to useful bioinformatics resources

http://humancyc.org/

CPDB
http://www.consensuspathdb.org/
Works for Human, Yeast and Mouse

http://www.wikipathways.org/index.php/WikiPathways

http://www.webgestalt.org/


String
http://string-db.org/
Protein interactions for many organisms

David
https://david.ncifcrf.gov/
Helps to annotate and classify your proteins



Some other super fancy links I need to take a closer look at in the near future:
http://bost.ocks.org/mike/algorithms/

Tuesday, 10 November 2015

ML in python

This week Google open-sourced TensorFlow (available for Mac, but there is no prebuilt pip package for Windows).

For me, a good reason anyway to start collecting my links to Python ML libraries in one place.

Friday, 6 November 2015

Go or not to go?

Go or not to go? After 10 years of C++ software development, often spending more time writing makefiles, adjusting build configurations and 'discussing' the advantages of build systems (CMake vs. bjam) than programming, and with no platform-independent dependency management for C++ as part of the language standard on the horizon (so I spend more time setting up dependencies than I would like), it's time to look at alternatives.

What criteria to use to evaluate the suitability of a programming ecosystem for scientific computing?

Some requirements (*** very important, ** important, * nice to have):
  1. Simplicity ***
  2. Dependency management as part of the language standard ***
  3. Build system implicit, preferably by convention: NO makefiles, CMakeLists.txt, bjam etc. ***
    1. But if implicit, how does it support integrating platform-specific third-party C API libraries (e.g. a Windows-specific DLL for hardware)?
  4. Are compiler options part of the language standard? ***
  5. Is the documentation format part of the standard?*
  6. Compile time. **
  7. Runtime performance. * (as long as it is within a factor of 10 of C for 99.9% of the use cases, it's fine).
  8. Threading support
  9. Ease of integrating legacy C and C++ code (i.e. an FFI; cf. Java's Project Panama, better late than never) *
  10. extern "C"; C linkage: be able to expose a C interface, so that other languages can integrate the code through it and libraries with a C interface can be created. *
  11. Support for deploying applications (preferably and ideally a statically linked executable).
  12. Tooling support (as for Java with Eclipse or IntelliJ) **
    1. interactive debugger **
    2. lint tool
    3. test runner
    4. test coverage tool **
  13. Standard library: support for logging, testing, serialization to text formats.
  14. Language: "fits into a single developer's mind". **
  15. Platform independence (Linux and Windows suffice) **
  16. Coding style guidelines part of the language standard. **
  17. Modern syntax (like Python) **.

Ad 9. and 10.: they are must-haves, but since C and C++ have no dependency management and you can't just declare a dependency on some C++ library, even getting the DLL in as a dependency, let alone linking to it, is already a pain (see C and C++'s lack of a package manager).

Summary


Python meets these requirements to a large extent, and although it does not make the cut on points 7 and 8 (one could argue that Cython does), it's in anyway. I use it every day, but as a platform-independent bash. Compared with bash it's a huge step forward ;-).

I am going to evaluate more closely: the JVM, and Java in particular (with Groovy or Kotlin as goodies), Go and D.


JVM

Java, or rather the JVM with modern languages such as Kotlin or Groovy, with its huge code base and a popularity that for many years has exceeded that of C++ and matched that of C in programming language rankings, is by far the strongest contender.

Some new features planned for Java 9 (the module system, Jigsaw, which should reduce startup times and memory footprint) and beyond, such as Project Valhalla (especially value objects) or Project Panama, are exciting.

So why consider anything else at all?
- The JVM is not installed by default on many architectures/machines.
- Memory footprint (see value objects, which aren't there yet, and who knows if and when they will arrive).
- You can't just run it on the command line as a native binary; you need some executable wrapper.
- There are actually two Javas, Oracle Java and Android's Java 6, and their owners have been fighting about it for many years, although according to news from http://www.heise.de/developer/meldung/Android-N-Googles-Mobilsystem-wird-auf-Open-Source-Java-OpenJDK-aufsetzen-3057453.html Google seems to be moving to OpenJDK.

Ad 2.: there is no standard, although Maven is the "semi-standard". Still, the pom.xml files are sometimes 1000 lines long, IDE support is mixed, and each IDE has its own build system in addition.

Pros:
- the community and third-party libraries such as Spark, Hadoop, Weka and ImageJ, etc.


GO

General introduction:
https://talks.golang.org/2012/splash.article

ad 9: https://golang.org/cmd/cgo/
ad 12.1: https://blog.cloudflare.com/go-has-a-debugger-and-its-awesome/

ad 2: They are not there yet, although they have recognized it as essential.

Machine learning with go
https://github.com/josephmisiti/awesome-machine-learning#go-nlp
http://www.fodop.com/ar-1002
http://biosphere.cc/software-engineering/go-machine-learning-nlp-libraries

A Go web server with content embedded in the executable:
http://peterhoward42.wim42.webfactional.com/media/go-std-alone-gui.pdf
https://github.com/peterhoward42/godesktopgui



D

Like C++, it violates point 1.


Acknowledgements:
https://darrenjw.wordpress.com/2013/12/23/scala-as-a-platform-for-statistical-computing-and-data-science/


Tuesday, 27 October 2015

My first python package on pypi - mspy

Have you heard about http://www.mmass.org/?
It's a Python application (with a GUI) for processing and visualizing mass spectrometric data.
Unfortunately, its development has been discontinued (the latest release was on Jul 1, 2013)*. This is a pity, since mMass has a few cool features:
  • mzXML file access
  • peak detection and deisotoping
  • recalibration
  • ...

Since mMass's code seems, at first glance, to be of good quality, and the algorithms are separated from the GUI, it was relatively easy to create a Python package containing the algorithmic part**. This part is now available as a package here:

https://pypi.python.org/pypi/mspy

(only a 64-bit Windows build, for a start)

so
pip install mspy
will do.

The sources of the package are now here:
https://github.com/wolski/mspy

There are no unit tests (I did not find any; if I end up using mspy, I will add some).


* Martin Strohalm, PhD, the developer of mMass, is now, according to his LinkedIn profile, at Thermo Fisher in Bremen, developing, I guess, some super cool C# software for the Thermo Orbitrap mass spectrometers.

** Although the author specified in the setup.py file is Martin Strohalm, PyPI misleadingly lists me as the author!?

Monday, 26 October 2015

Resharper C++ and findMFBase

Finally got findMFBase built with VS (see my previous post about findMFBase).

The generated .vcxproj projects build the unit tests shipped with findMFBase.
I have an academic licence for ReSharper C++ and would like to put it to some use.

The problem is that ReSharper C++ seems not to be aware of the header files of findMFBase (a header-only library).

As long as I am in a .cpp file everything seems fine, but if I open a header, then the #include statements pointing to other header files within the findMFBase library are marked in red.

I used the CMake glob trick to get the header files included in the project (GLOB_RECURSE picks up headers at any directory depth):

file(GLOB_RECURSE Demo_HEADERS "include/*.h" "src/*.h")
file(GLOB Demo_EXTRA "sql/*.sql" ".travis.yml")
add_library(headers SHARED ${Demo_HEADERS} ${Demo_EXTRA} Dummy.cpp)


But, as already said, ReSharper C++ is not aware of them.

I also posted the problem on the ReSharper C++ forum. Curious what they will reply. Hopefully they are not going to recommend using CLion because of CMake (Visual Studio is so much cooler).

Friday, 23 October 2015

Building findMFBase on windows for VisualStudio

I am trying to build the project https://github.com/findMF/findMFBase on Windows (for details of the Linux build, check the .travis.yml file and take a look at the Travis build results).

Since CMake's FindBoost module (find_package(Boost)) did not find Boost on my Windows machine, and setting Boost_DIR in the CMake GUI did not help either, I needed to add to the topmost CMakeLists.txt file:

set(BOOST_ROOT "C:/boost/boost_1_57_0/")

Your Boost installation is likely somewhere else, so adjust the path.

I downloaded prebuilt Boost libraries from here:
http://sourceforge.net/projects/boost/files/boost-binaries/1.57.0/

I also renamed the lib_win64... directory (the one containing the .lib files) to lib/.

Then I ran (in PowerShell):

mkdir findMFBaseBuild
cd findMFBaseBuild
cmake ..\findMFBase -G "Visual Studio 12 2013 Win64"

(since I have VS 2013 Community Edition installed)

Then I opened the generated .sln in VS and built the ALL_BUILD project.


What I struggled with when building on Windows was the following (now fixed):


Some projects built, but the first project with a dependency on date_time failed with
fatal error LNK1104: cannot open file 'libboost_date_time-vc120-mt-gd-1_59.lib'

I searched for libboost_*.lib vs. boost_*.lib, which turned up some discussions on Stack Overflow pointing to:

SET( USE_STATIC_BOOST ON)
set(Boost_USE_STATIC_LIBS ${USE_STATIC_BOOST})


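So the solution is:
add_definitions( -DBOOST_ALL_NO_LIB )

BOOST_ALL_NO_LIB switches off Boost's auto-linking on MSVC, which otherwise emits #pragma comment(lib, ...) requests for library names such as the libboost_ one above; the libraries located by CMake's find_package(Boost) are linked instead.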

For more detail see http://www.boost.org/doc/libs/1_57_0/libs/config/doc/html/index.html



Thursday, 15 October 2015

Virtualenv on windows and building python packages

Update 19/12/2015: I now use conda to manage virtual environments on Windows.

http://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/

for interop with pycharm see:

http://stackoverflow.com/questions/28390961/using-anaconda-within-pycharm

Very helpful (note: I only got it running with Python 2.7):

http://www.tylerbutler.com/2012/05/how-to-install-python-pip-and-virtualenv-on-windows-with-powershell/

Once you have set up the virtual env and want to use it in a new PowerShell, edit the PowerShell shortcut so that:
Target: %SystemRoot%\syswow64\WindowsPowerShell\v1.0\powershell.exe -ExecutionPolicy Unrestricted
Start In: %HOMEDRIVE%%HOMEPATH%

Creating virtual environment:
Import-Module virtualenvwrapper
New-VirtualEnvironment applicake


Loading a virtual env:

Import-Module virtualenvwrapper 
(not sure whether this is actually necessary)

# environments I have already created
Set-VirtualEnvironment engineer
Set-VirtualEnvironment applicake
Set-VirtualEnvironment mspy

Virtual env on linux

source virtenvs/systemmhc/bin/activate
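The commands above use the virtualenv package (with Python 2.7). For comparison, Python 3.3+ ships the standard-library venv module, which creates an equivalent environment without installing anything. A minimal sketch; the make_env name and the temporary demo location are assumptions for illustration:

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_env(path):
    """Create a virtual environment at `path` and return its python executable."""
    # with_pip=False keeps this fast and offline; symlinks match what
    # `python -m venv` does by default on POSIX, Windows gets copies.
    venv.create(path, symlinks=(sys.platform != "win32"), with_pip=False)
    # Windows puts executables in Scripts\, POSIX systems in bin/.
    if sys.platform == "win32":
        return Path(path) / "Scripts" / "python.exe"
    return Path(path) / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = make_env(Path(tmp) / "demo")
        print(python, python.exists())
```

An environment created this way is activated the same way as above, e.g. source demo/bin/activate on Linux.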

To build native Python packages on Windows, just download:

Microsoft Visual C++ Compiler for Python 2.7


afterwards you can install numpy by executing
pip install numpy

Some packages are more difficult to build because of their dependencies. In such a case, visit:
http://www.lfd.uci.edu/~gohlke/pythonlibs/

Building python packages

Just create a folder, and in it a setup.py file and a subfolder; place your .py files and an __init__.py file in the subfolder (see my tiny Python project cakeme).
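The layout described above can be scaffolded with a few lines of Python. Everything here is illustrative: the mypkg name, the version and the setup.py contents (a plain setuptools setup() call) are assumptions, not the contents of cakeme.

```python
import tempfile
from pathlib import Path

# Minimal setuptools-based setup.py; name and version are placeholders.
SETUP_PY = '''from setuptools import setup

setup(
    name="{name}",
    version="0.1.0",
    packages=["{name}"],
)
'''


def scaffold(root, name):
    """Create <root>/<name>/setup.py and the package subfolder <root>/<name>/<name>/."""
    project = Path(root) / name
    package = project / name
    package.mkdir(parents=True)
    (project / "setup.py").write_text(SETUP_PY.format(name=name))
    (package / "__init__.py").write_text("")
    (package / "core.py").write_text("def hello():\n    return 'hello'\n")
    return project


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold(tmp, "mypkg")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

From the generated project folder, the pip and setup.py commands listed below work unchanged.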

Install a package into the user directory (no virtual env, but the --user directory):

pip install --user cakeme

install package locally in development mode:

pip install --user -e .   (run this from the package directory)

create local whl file

python setup.py bdist_wheel

Register package with pypi.python.org


python setup.py register

Pushing changes in python package to pypi:


python setup.py bdist_wheel upload




How to install a whl file:

pip install some-package.whl
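Which wheel to pick (for example from the Gohlke page above) is decided by the tags in its file name, and pip refuses wheels whose tags do not match the running interpreter and platform. PEP 427 defines the name as {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, so cp27 means CPython 2.7 and win_amd64 means 64-bit Windows. A minimal parser sketch; the numpy file name is only an illustrative example:

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a PEP 427 wheel file name into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # Dashes inside names are escaped to underscores, so "-" only separates fields.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    print(parse_wheel_filename("numpy-1.10.1-cp27-none-win_amd64.whl"))
```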

Some more tips for PowerShell users:

$env:Path += ";C:\Program Files\Sublime Text 2\"

Monday, 12 October 2015

Calling pwiz build from visual studio

I got this information from Parag, and since it might be relevant to many, I am posting it here:

So this means that you can use Visual Studio as a plain text editor when developing pwiz. I might use Notepad++ instead.

Thursday, 1 October 2015

maltcms - getting started

http://maltcms.sourceforge.net/

git clone git://git.code.sf.net/p/maltcms/code maltcms-code
cd maltcms-code
git fetch origin
git checkout release-1.4.0
mvn install


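Furthermore, download the example data from:
http://sourceforge.net/projects/maltcms/files/maltcms/example-data/

Then, in
maltcms-code\maltcms-distribution\target\maltcms-1.4.0-SNAPSHOT-bin\maltcms-1.4.0-SNAPSHOT
run one of the following. The first variant adds JDWP options so that a Java debugger can attach on port 8888 (with suspend=y the JVM waits until one does); the second runs the pipeline normally.

bin\maltcms.bat -Xrunjdwp:transport=dt_socket,server=y,address=8888,suspend=y -c cfg\pipelines\bipace.mpl -f C:\Users\wolski\prog\maltcms-sampledata\maltcms-example-data\cdf\1D\*.cdf -o testOutput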

bin\maltcms.bat -c cfg\pipelines\bipace.mpl -f C:\Users\wolski\prog\maltcms-sampledata\maltcms-example-data\cdf\1D\*.cdf -o testOutput


User documentation:
http://maltcms.sourceforge.net/maven/maltcms/1.3.2/gettingStarted.html


More information:
http://www.biomedcentral.com/1471-2105/13/214

Wednesday, 30 September 2015

Parameter file format in search gui

This is an issue regarding the parameter file format of the SearchGUI software.
They use a binary format to store the software configuration. I think that is a no-go, so I opened an issue on GitHub, which was closed immediately.

https://github.com/compomics/searchgui/issues/60

Update 8/10/2015: it's open again...

Wednesday, 9 September 2015

Building Crux


Quite straightforward, and well documented here:
http://cruxtoolkit.sourceforge.net/install-tutorial.html


You will need wget with SSL/HTTPS support. I got mine from:
https://eternallybored.org/misc/wget/


Running comet with crux?

Beware! crux comet (the comet command within crux) and standalone Comet have different interfaces.

Comet can be downloaded from here:
http://comet-ms.sourceforge.net/




Monday, 7 September 2015

R Package Quantable

New package quantable on CRAN:

Methods which streamline the descriptive analysis of quantitative matrices.


It has some routines that simplify various plots (e.g. qqplot) for multivariate data,
and also a nice volcano plot.

The most up-to-date version can be found here:
https://github.com/protViz/quantable


Thursday, 6 August 2015

Converting sql database schemas

After some tinkering I believe the best way to cope with various DB providers is to use an ORM.
I tried Python's SQLAlchemy.
At every corner I ran into performance problems with Python.



I might post an update on it in the near future.

mysql -> sqlite3
postgres -> sqlite3

CREATE  TABLE IF NOT EXISTS `pyMS`.`feature` (
  `feature_id` VARCHAR(40) NOT NULL ,
  `intensity` DOUBLE NOT NULL ,
  `overallquality` DOUBLE NOT NULL ,
  `quality` DOUBLE NOT NULL ,
  `charge` INT NOT NULL ,
  `content` VARCHAR(45) NOT NULL ,
  `msrun_msrun_id` INT NOT NULL ,
  PRIMARY KEY (`feature_id`, `msrun_msrun_id`) ,
  UNIQUE INDEX `id_UNIQUE` (`feature_id` ASC) ,
  INDEX `fk_feature_msrun1` (`msrun_msrun_id` ASC) ,
  CONSTRAINT `fk_feature_msrun1`
    FOREIGN KEY (`msrun_msrun_id` )
    REFERENCES `pyMS`.`msrun` (`msrun_id` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;


CREATE  TABLE IF NOT EXISTS `feature` (
  `feature_id` VARCHAR(40) NOT NULL ,
  `intensity` DOUBLE NOT NULL ,
  `overallquality` DOUBLE NOT NULL ,
  `quality` DOUBLE NOT NULL ,
  `charge` INT NOT NULL ,
  `content` VARCHAR(45) NOT NULL ,
  `msrun_msrun_id` INT NOT NULL,
  CONSTRAINT `fk_feature_msrun1`
    FOREIGN KEY (`msrun_msrun_id` )
    REFERENCES `msrun` (`msrun_id` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION);
CREATE UNIQUE INDEX `id_UNIQUE` ON `feature` (`feature_id` ASC);
CREATE INDEX `fk_feature_msrun1` ON `feature` (`msrun_msrun_id` ASC);
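A quick way to check that the converted schema really is valid SQLite is to load it into an in-memory database with Python's built-in sqlite3 module. The msrun table below is a hypothetical stand-in (only its msrun_id column is implied by the foreign key); the feature DDL is the SQLite version above. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

# Hypothetical parent table; only msrun_id is implied by the foreign key.
MSRUN_DDL = "CREATE TABLE IF NOT EXISTS `msrun` (`msrun_id` INT PRIMARY KEY);"

# The SQLite version of the feature schema from above.
FEATURE_DDL = """
CREATE TABLE IF NOT EXISTS `feature` (
  `feature_id` VARCHAR(40) NOT NULL,
  `intensity` DOUBLE NOT NULL,
  `overallquality` DOUBLE NOT NULL,
  `quality` DOUBLE NOT NULL,
  `charge` INT NOT NULL,
  `content` VARCHAR(45) NOT NULL,
  `msrun_msrun_id` INT NOT NULL,
  CONSTRAINT `fk_feature_msrun1`
    FOREIGN KEY (`msrun_msrun_id`)
    REFERENCES `msrun` (`msrun_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION);
CREATE UNIQUE INDEX `id_UNIQUE` ON `feature` (`feature_id` ASC);
CREATE INDEX `fk_feature_msrun1` ON `feature` (`msrun_msrun_id` ASC);
"""


def load_schema(conn):
    # Foreign keys are only enforced when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(MSRUN_DDL + FEATURE_DDL)


def index_names(conn, table):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return sorted(name for (name,) in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_schema(conn)
    print(index_names(conn, "feature"))
```

Inserting a feature row whose msrun_msrun_id has no matching msrun row then fails with sqlite3.IntegrityError, which shows the constraint survived the conversion.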



Monday, 13 July 2015

Windows tooling

Depends http://www.dependencywalker.com/
SysinternalsSuite https://technet.microsoft.com/en-us/sysinternals/bb842062.aspx

Building Pwiz is nonstandard ....

Check out the sources:

svn checkout --username=witek96 svn+ssh://witek96@svn.code.sf.net/p/proteowizard/code/trunk proteowizard-code

Download msparser from the Matrix Science website and unpack it:

tar jxf msparser_2_5_2_x86_linux64.tar.bz2

On Linux

./quickbuild.sh -j4 toolset=gcc --msparser-path=/home/witold/prog/msparser/gnu variant=debug pwiz_tools/BiblioSpec > log.txt

./quickbuild.sh -j4 toolset=gcc variant=debug pwiz_tools/BiblioSpec > log.txt


./quickbuild.sh -j4 toolset=gcc --msparser-path=/scratch/wolski/msparser/gnu


with prefix directory:

 ./quickbuild.sh -j4 --i-agree-to-the-vendor-licenses --msparser-path=/scratch/wolski/msparser variant=debug pwiz_tools/BiblioSpec --prefix=/scratch/wolski/pwizbuild

On Windows


With --msparser-path set

win 64

quickbuild.bat toolset=msvc-12.0  address-model=64  --i-agree-to-the-vendor-licenses --msparser-path=C:\Users\Wolski\Downloads\msparser2_5\vs2013 variant=debug pwiz_tools/BiblioSpec > log.txt

Please note that I specified pwiz_tools/BiblioSpec as the build target.

win 32

just do not specify the address-model


Update 12/10/2015: building all of pwiz

quickbuild.bat  --i-agree-to-the-vendor-licenses toolset=msvc-12.0  -j8 --hash --without-compassxtract --msparser-path=C:\Users\Wolski\Downloads\msparser2_5\vs2013 optimization=space secure-scl=off address-model=64 variant=release






Further pwiz build options



debug
runtime-debugging=on
--without-binary-msdata (make sure to run clean.bat before this one)

quickbuild -j4 --i-agree-to-the-vendor-licenses --abbreviate-paths pwiz_tools/Bumbershoot/myrimatch//install
quickbuild -j4 --abbreviate-paths pwiz_tools/Bumbershoot/myrimatch//install


Some more ways to build Bumbershoot :) 

quickbuild.bat --i-agree-to-the-vendor-licenses -j4 --abbreviate-paths address-model=64 pwiz_tools/Bumbershoot

How to clean the pwiz/pwiz directory?

There is a clean.bat file.

Installing pwiz headers and libraries for use in other projects:


  • I am currently experimenting with these command lines:

quickbuild.bat libraries --libdir=<path to blib>\win\lib --includedir=<path to blib>\src\extern\proteowizard\install\include

quickbuild.bat toolset=msvc-12.0  address-model=64  --i-agree-to-the-vendor-licenses libraries --prefix=C:\Users\wolski\prog\pwiz-libraries-64

 ./quickbuild.sh -j4 --i-agree-to-the-vendor-licenses --msparser-path=/scratch/wolski/msparser variant=debug pwiz_tools/BiblioSpec --prefix=/scratch/wolski/pwizbuild


Thursday, 29 January 2015

Pointers to Functions with Swig

http://web.mit.edu/svn/src/swig-1.3.25/Examples/java/funcptr/index.html
http://stackoverflow.com/questions/4313004/registering-java-function-as-a-callback-in-c-function

http://stackoverflow.com/questions/12210129/how-should-i-write-the-i-file-to-wrap-callbacks-in-java-or-c-sharp

void* as jobject:
http://stackoverflow.com/questions/26110908/swig-typemap-java-object
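The underlying idea, C code calling back through a function pointer into another language, can be seen without SWIG. As an illustration with a different tool, Python's ctypes passes a Python function as the comparator of the C library's qsort. Loading the C runtime via find_library (or msvcrt on Windows) is an assumption that should hold on common Linux, macOS and Windows setups.

```python
import ctypes
import ctypes.util
import sys

# C signature expected by qsort: int (*compar)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))


def load_libc():
    """Load the C runtime library that provides qsort."""
    if sys.platform == "win32":
        return ctypes.cdll.msvcrt
    # find_library may return None; CDLL(None) then resolves symbols from the
    # running process, which already has libc loaded.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def py_compare(a, b):
    """Python callback invoked from C; returns <0, 0 or >0 like a C comparator."""
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_with_c_callback(values):
    libc = load_libc()
    libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
    libc.qsort.restype = None
    array = (ctypes.c_int * len(values))(*values)
    callback = CMPFUNC(py_compare)  # keep a reference while C may call it
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)


if __name__ == "__main__":
    print(sort_with_c_callback([5, 1, 7, 33, 99]))
```

The same lifetime rule applies to SWIG/JNI callbacks: the object backing the function pointer must stay alive as long as the C side may call it.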

Mocking singletons....


public class LicenseDataLocator {
    private static LicenseData licenseData;

    // Lazily reads the license data once and caches it.
    public static synchronized LicenseData getLicenseData() {
        if (licenseData == null) {
            licenseData = new LicenseDataLocator().readLicenseData();
        }
        return licenseData;
    }

    // Here you can insert your mock (call this before getLicenseData() in a test).
    // synchronized like the getter, so the injected value is visible to it.
    public static synchronized void setLicenseData(LicenseData licenseData) {
        LicenseDataLocator.licenseData = licenseData;
    }