Marginally Interesting: Command Line Interactive Machine Learning on the JVM. Part 3: Missing Parts
This is Part 3 of a series. Part 1, Part 2.
In Part 2 I discussed different options for a scripting language in a command line environment for machine learning and data analysis. In this final part, I want to mention the two areas where I currently see the most need for improvement.
No good readline for Java
You need some minimal editing capabilities on the command line to be productive. The most well-known project seems to be JLine. It is used by practically all JVM scripting languages for their shells, for example JRuby, Groovy, and Scala. There also exists an interface to readline from Java.
However, in its current form, JLine is quite buggy. Most importantly, it lacks the convenient “Search Backward in History” feature which I use a lot to find lines in the history. Jason Dillon, meanwhile, has cleaned up the code base significantly.
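To make concrete what “Search Backward in History” amounts to, here is a minimal sketch in plain Java. It is not taken from JLine; the class `HistorySearch` and the method `searchBackward` are names made up for this illustration, and a real line editor would also redraw the prompt and search incrementally as you type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of JLine: backward search over a history list. */
class HistorySearch {
    /**
     * Returns the index of the most recent entry at or before startIndex
     * that contains term, or -1 if there is no such entry.
     */
    static int searchBackward(List<String> history, String term, int startIndex) {
        for (int i = Math.min(startIndex, history.size() - 1); i >= 0; i--) {
            if (history.get(i).contains(term)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        history.add("x = load('data.csv')");
        history.add("plot(x)");
        history.add("y = load('labels.csv')");

        // First search starts at the newest entry.
        int hit = searchBackward(history, "load", history.size() - 1);
        System.out.println(history.get(hit));

        // Searching again continues just before the previous hit.
        hit = searchBackward(history, "load", hit - 1);
        System.out.println(history.get(hit));
    }
}
```

Repeating the search from `hit - 1` is what pressing the search key a second time does in readline: it steps to the next older match.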
Still, JLine is actually quite a hack. It uses the stty command to control the terminal, meaning that it integrates quite poorly with changes of the terminal window size, or with signals. On Windows, it has the annoying bug that you cannot see the cursor as you move it around.
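For readers who have not seen this kind of terminal control, the following sketch shows roughly what shelling out to stty from Java looks like. It is my own illustration and not JLine's actual code; the class `SttySketch` and its methods are hypothetical names, and it assumes a Unix-like system with `sh` and `stty` on the path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of stty-based terminal control; not JLine's actual code. */
class SttySketch {
    /** Builds a shell command that runs stty against the controlling terminal. */
    static String[] sttyCommand(String args) {
        return new String[] {"sh", "-c", "stty " + args + " < /dev/tty"};
    }

    /** Runs stty with the given arguments and returns whatever it printed. */
    static String stty(String args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(sttyCommand(args))
                .redirectErrorStream(true)
                .start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        if (p.waitFor() != 0) {
            throw new IOException("stty failed: " + out.toString().trim());
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String saved;
        try {
            saved = stty("-g"); // save current settings in stty's own format
        } catch (IOException e) {
            System.out.println("No usable terminal: " + e.getMessage());
            return;
        }
        try {
            stty("-icanon min 1 -echo"); // character-at-a-time input, no echo
            System.out.println("size (rows cols): " + stty("size"));
        } finally {
            stty(saved); // always restore the terminal
        }
    }
}
```

The sketch also hints at why window size changes are awkward with this approach: the size is only known at the moment you ask stty for it, so the program has to poll by spawning another process instead of being notified.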
Some work should be put into cleaning up the code base, adding sensible terminal control, and adding more features, but as it sort of works, nobody (including me, of course) feels the urge or has the time to really do something about this.
No flexible plotting for Java
Concerning the plotting library, probably the most well-known is JFreeChart, but I’m not really satisfied with that library for a number of reasons: Although it is open source, you have to buy a book to get some decent documentation (javadocs are available, though). JFreeChart produces some nice plots, but I think they are closer to what you get in Excel than to what MATLAB provides. JFreeChart also comes with its own classes for handling the data, which means that you have to copy your data into those structures to display it. There are some more options, but none of them seems as feature-rich as JFreeChart.
One other problem is that printing is more or less broken under Linux when you’re relying on CUPS. On my Debian box, I invariably get a “No Printing Services found” error every time I try to print from any Java program. There are also some bugs which haven’t been fixed in years. The bottom line is that you cannot really rely on the built-in printing capabilities of Java to generate plots for your paper, which is really a shame.
Other options are probably to use an SVG library like Batik, or to switch to pure JavaScript graphics libraries like Raphaël or processing.js and do the plotting inside a web browser.
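To give an idea of how little is needed for the SVG route, here is a minimal sketch that writes a line plot as an SVG string by hand. It does not use Batik or any other library; `SvgPlot` and `linePlot` are hypothetical names chosen for this example, and a real solution would of course add axes, ticks, and labels.

```java
import java.util.Locale;

/** Hypothetical helper: renders a bare line plot as an SVG string, no library needed. */
class SvgPlot {
    static String linePlot(double[] xs, double[] ys, int width, int height) {
        if (xs.length != ys.length || xs.length == 0) {
            throw new IllegalArgumentException("xs and ys must be non-empty and of equal length");
        }
        double minX = xs[0], maxX = xs[0], minY = ys[0], maxY = ys[0];
        for (int i = 1; i < xs.length; i++) {
            minX = Math.min(minX, xs[i]);
            maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]);
            maxY = Math.max(maxY, ys[i]);
        }
        // Avoid division by zero when all values on an axis are identical.
        double spanX = maxX > minX ? maxX - minX : 1.0;
        double spanY = maxY > minY ? maxY - minY : 1.0;

        StringBuilder points = new StringBuilder();
        for (int i = 0; i < xs.length; i++) {
            double px = (xs[i] - minX) / spanX * width;
            double py = height - (ys[i] - minY) / spanY * height; // SVG's y axis points down
            if (i > 0) {
                points.append(' ');
            }
            points.append(String.format(Locale.ROOT, "%.1f,%.1f", px, py));
        }
        return "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"" + width
                + "\" height=\"" + height + "\">"
                + "<polyline fill=\"none\" stroke=\"black\" points=\"" + points + "\"/>"
                + "</svg>";
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3};
        double[] ys = {0, 1, 4, 9};
        System.out.println(linePlot(xs, ys, 400, 300));
    }
}
```

Redirecting the output to a file gives something you can open in a browser or convert with other tools, which sidesteps Java's printing problems at the cost of doing all the layout yourself.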
So in summary, there are two main missing features: a feature-rich, stable readline replacement, and a flexible plotting solution which also prints.
Some pointers
I haven’t talked about this at all until now, but of course there are already several machine learning toolboxes in Java or other JVM languages. These projects are still more or less ignorant of one another, so more work would be required to write some common interfaces. Here is just a short list to get you started; also have a look at mloss.org.
- Weka is quite mature and comes with a GUI to do experiments.
- JavaML is a collection of many common machine learning algorithms.
- Apache Mahout is a library for doing map-reduce-style machine learning on a Hadoop cluster.
- Finally, there are also several more specialized projects, for example RL Glue and Codecs for reinforcement learning, or factorie for graphical models.
Don’t hesitate to post more links in the comments!
Posted by Mikio L. Braun at 2010-04-19 12:55:00 +0200