Tuesday, 10 November 2015

ML in Python

This week Google open-sourced TensorFlow (available on Apple, but with no prebuilt pip package for Windows).

For me, that is reason enough to start collecting my links to Python ML libraries in one place.

Friday, 6 November 2015

Go or not to go?

After 10 years of C++ software development, I sometimes spend more time writing makefiles, adjusting build configurations, and 'discussing' the advantages of build systems (cmake vs bjam) than programming. With no platform-independent dependency management for C++ on the horizon as part of the language standard, I also spend more time setting up dependencies than I would like. It's time to have a look at alternatives.

What criteria should be used to evaluate the suitability of a programming ecosystem for scientific computing?

Some requirements (*** - very important, ** - important, * - nice to have):
  1. Simplicity ***
  2. Dependency management as part of the language standard ***
  3. Implicit build system, preferably by convention: NO makefiles, CMakeLists.txt, bjam etc. ***
    1. But if it is implicit, how does it support integrating platform-specific third-party C API libraries (e.g. a Windows-specific DLL for hardware)?
  4. Are compiler options part of the language standard? ***
  5. Is the documentation format part of the standard?*
  6. Compile time. **
  7. Runtime performance. * (As long as it is within a factor of 10 of C for 99.9% of the use cases, it's fine.)
  8. Threading support
  9. Ease of integrating legacy C and C++ code (i.e. FFI; see Java's Project Panama, better late than never) *
  10. extern "C" / C linkage: the ability to expose a C interface, so that integration into other languages through a C interface is possible and libraries with a C interface can be created. *
  11. Support in deploying applications (preferably and ideally statically linked executable).
  12. Tooling support (as Java has with Eclipse and IntelliJ) **
    1. interactive debugger **
    2. lint tool
    3. test runner
    4. test coverage tool **
  13. Std library - support for logging, testing, serialization to text formats.
  14. Language - "fits into a single developers mind". **
  15. Platform independence (Linux and Windows suffice) **
  16. Coding style guidelines part of the language standard. **
  17. Modern syntax (Python). **

Ad 8 and 9: These are must-haves. But since C and C++ have no dependency management, you can't just declare a dependency on some C++ library. Merely obtaining the DLL as a dependency, let alone linking to it, is already a pain (see the lack of a package manager for C and C++).

Summary


Python meets these requirements to a large extent, and although it does not make the cut on points 7 and 8 (one could argue that Cython does), it's in anyway. I use it every day, but as a platform-independent bash. Compared with bash it's a huge step forward ;-).

I am going to evaluate more closely: the JVM, and Java in particular (with Groovy or Kotlin as goodies), Go, and D.


JVM

Java, or rather the JVM with modern languages such as Kotlin or Groovy, is by far the strongest contender: it has a huge code base, and in programming language rankings its popularity has exceeded that of C++ for many years and matches that of C.

Some new features are exciting: the module system (Jigsaw) planned for Java 9, which is meant to reduce startup times and memory footprint, and, beyond Java 9, Project Valhalla (especially value objects) and Project Panama.

So why consider anything else at all?
- The JVM is not installed by default on many architectures/machines.
- Memory footprint (see value objects, which aren't there yet; who knows if and when they will arrive).
- You can't just run it on the command line as a native binary. You will need some executable wrapper.
- There are actually two Javas, Oracle Java and Android's Java 6, and their stewards have been fighting about it for many years, although according to news from http://www.heise.de/developer/meldung/Android-N-Googles-Mobilsystem-wird-auf-Open-Source-Java-OpenJDK-aufsetzen-3057453.html Google seems to be moving to OpenJDK.

Ad 2: There is no standard, although Maven is the "semi-standard". Still, the Maven files are sometimes 1000 lines long, IDE support is mixed, and each IDE has its own build system in addition.

Pros:
- Community and third-party libraries such as Spark, Hadoop, Weka, ImageJ, etc.


GO

General introduction:
https://talks.golang.org/2012/splash.article

ad 5: https://golang.org/cmd/cgo/
ad 9.1: https://blog.cloudflare.com/go-has-a-debugger-and-its-awesome/

ad 1: They are not there yet, although they have recognized it as essential.

Machine learning with Go
https://github.com/josephmisiti/awesome-machine-learning#go-nlp
http://www.fodop.com/ar-1002
http://biosphere.cc/software-engineering/go-machine-learning-nlp-libraries

A Go web server with its content embedded in the executable.
http://peterhoward42.wim42.webfactional.com/media/go-std-alone-gui.pdf
https://github.com/peterhoward42/godesktopgui



D

Violates point 1, as C++ does.


Acknowledgements:
https://darrenjw.wordpress.com/2013/12/23/scala-as-a-platform-for-statistical-computing-and-data-science/