Bracing against the wind  
www.documentroot.com  

Monday, June 03, 2013

grun Job Scheduler moved to ZMQ

As most people in cluster computing know, there really isn't a simple and "really open source" solution out there for job scheduling. Oracle SGE, LSF and others are massive things, with over 100k lines of code and extraordinarily complex configuration for everything from MPI support to Kerberos. And yet they lack simple features, such as script plugins for configuration, that would make them more versatile.

grun was written to be an "extremely lightweight" and yet big-featured job scheduler. The early version was not much more than "ssh to remote host, run job, wait for response", while logging and keeping track of resources. It has since evolved to use a TCP messaging system that lets the compute nodes, queue nodes, and clients communicate. By v1.0 the plan is to have better support for arbitrary metrics and better handling of priorities.

Going from v0.8 to v0.9, I decided to try using the ZeroMQ (ZMQ) library instead of raw TCP sockets. At first it was hard to remember that you really don't need to worry about things like sending to a socket you just created, even with no one on the other end.

The net result of the ZMQ port:

  • speed improved
    • work that was done implicitly in Perl moved to optimized C
    • built-in multithreading takes better advantage of the CPU
  • ability to stop/restart any queue without losing messages
  • improved reliability of message delivery
  • improved code organization, oriented around messaging
  • 20% smaller code base, because we removed:
    • all "double checks" to see if connections are there
    • code that "breaks up" large messages
    • all issues with blocking vs. nonblocking I/O
    • the whole "select" loop complexity

ZMQ is not perfect (yet), but it was an overall improvement over straight TCP. Because of the forking needed to launch jobs, I had to do some fiddling with dup'ed file descriptors to prevent ZMQ from acting wonky. The learning curve was worth it. I doubt I'll be using raw TCP again, especially since ZMQ is packaged for most Linux distributions.
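
For a sense of what went away, here is a minimal sketch of the length-prefixed framing a raw TCP program typically needs so that a large message arrives in one piece. It is written in Python rather than grun's Perl, it is not grun's actual code, and the helper names (send_msg, recv_exact, recv_msg, roundtrip) are made up for illustration. ZMQ hands the application whole messages, so none of this is needed there.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def send_msg(sock, payload):
    """Send one message as a length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, count):
    """Keep reading until exactly `count` bytes have arrived."""
    chunks = []
    while count > 0:
        chunk = sock.recv(min(count, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Read the length header, then read exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def roundtrip(payload):
    """Send payload through a local socket pair and return what arrives."""
    a, b = socket.socketpair()
    # Send from a thread: a large sendall() blocks until the reader drains it.
    sender = threading.Thread(target=send_msg, args=(a, payload))
    sender.start()
    try:
        return recv_msg(b)
    finally:
        b.close()
        sender.join()
        a.close()
```

Calling roundtrip(b"x" * 1000000) returns the same million bytes, even though the receiving side sees them arrive in many smaller reads.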

Thursday, May 30, 2013

Convert absolute to relative links

ln-abs2rel.pl


usage: ./ln-abs2rel.pl [-noexec] [-recursive]  ... 

Converts absolute to relative links.

Use -noexec to see what would be done.
Use -recursive to do this for all subdirs.
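
The core of the conversion can be sketched in a few lines. This is a Python illustration, not the Perl script itself; the function names abs2rel and convert_tree are hypothetical, and the noexec and recursive arguments only mimic what the usage text above suggests the flags do.

```python
import os


def abs2rel(link_path, noexec=False):
    """Rewrite one absolute symlink as a relative one.

    Returns the new relative target, or None if nothing needed changing.
    """
    if not os.path.islink(link_path):
        return None
    target = os.readlink(link_path)
    if not os.path.isabs(target):
        return None  # already relative
    # A relative link is resolved from the real directory that holds it.
    link_dir = os.path.realpath(os.path.dirname(os.path.abspath(link_path)))
    relative = os.path.relpath(target, start=link_dir)
    if not noexec:
        os.remove(link_path)
        os.symlink(relative, link_path)
    return relative


def convert_tree(top, recursive=False, noexec=False):
    """Convert links in `top`, and in all subdirectories if recursive."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Links to directories show up in dirnames, other links in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            new_target = abs2rel(path, noexec=noexec)
            if new_target is not None:
                changed.append((path, new_target))
        if not recursive:
            break  # only the top directory
    return changed
```

With noexec=True the functions only report what they would change, which is a cheap way to check the computed targets before touching anything.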


Friday, May 03, 2013

FIX: UBUNTU stops logging to /var/log/kern.log

At some point the old kernel logger got upgraded to rsyslogd. When this happened, the log files were left owned by "messagebus". Changing ownership of these logs to "syslog" and restarting the syslog daemon is sufficient to fix it.

Monday, March 18, 2013

Baseball Warm-up Exercise for Kids : "Base Tag"

A lot of times teams "warm up" by throwing and running while a coach tells them where to throw and when to run. This warm-up accomplishes that, but it's fun because it’s in a “game” structure, and a lot of the kids I've worked with are motivated by that.   Plus, once they know the rules, they will play without supervision for quite a while - I’ve found you have to stop them or they will tire out.

Setup: 
  • 2  "fielders" with gloves and a ball stand a baseline-width apart
  • A 3rd kid is the “runner”, standing on base with the fielder who has the ball
Rules: 
  • The runner starts running at any time, and the fielder with the ball tries to throw them out
  • The person with the ball can’t throw until after the runner starts running
  • Start by assuming the play is a force... any good throw is an out
  • For kids who find this easy, it’s not a force: you’ve got to tag them in a rundown
Options: add points for kids who no longer seem motivated by the game (some kids love points, others don’t). For example:
  • Runner gets two points if he gets to the other base
  • Fielders get one point if they throw him out
  • Fielders lose two points if they overthrow! 
Purpose:
The point is to develop and value consistency in young fielders, where unnecessary risks aren’t taken and accuracy and speed are stressed. Points, rules, etc. change as skills and the coach's needs change.


Friday, March 15, 2013



The worst thing a hardware or operating system vendor can do for the reliability and quality of their system is to try and make programming it easy.

Sunday, January 27, 2013

Tuesday, January 08, 2013

Compression ratio and genome assembly quality

When doing genomic assembly, you would expect the complexity of the completed genome to be comparable to the complexity of genomes of similar size and neighboring taxonomy.

One easy measure of complexity is the degree to which a genome can be compressed. After converting to 2-bit format, some genomes compress better than others. bzip2 has a large default block size, so the ratio of compressed to uncompressed size of a 2-bit-encoded FASTA file should be a good measure of complexity.

Can't think of a good source of data to test this theory.  Maybe look at the Amos validate paper.

The source of "complexity-measure.sh" is below. It works well: it is fast, and it produces a percentage as its only output:

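#!/bin/bash -e
# usage: complexity-measure.sh <fasta file>
# Prints the bzip2-compressed size as a percentage of the 2-bit size.

in=$1

# Named pipes, so neither stream has to be stored on disk
f2b=$in.f2b
bzi=$in.bzi

rm -f "$f2b" "$bzi"

mkfifo "$f2b" "$bzi"

# Convert FASTA to 2-bit in the background, writing into the first pipe
faToTwoBit "$in" "$f2b" &

# Copy the 2-bit stream into the second pipe while counting its bytes
tee "$bzi" < "$f2b" | perl -ne '$t+=length($_); END{print "$t\n"}' > "$in.bsz" &
# Compress the copy and count the compressed bytes
comp=$(bzip2 < "$bzi" | perl -ne '$t+=length($_); END{print "$t\n"}')
wait

# Percentage = 100 * compressed bytes / uncompressed bytes
perl -ne "printf qq{%.4f\n}, 100*$comp/\$_" "$in.bsz"

rm -f "$f2b" "$bzi" "$in.bsz"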
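
The same measure can be sketched without faToTwoBit, which makes it easy to try on small sequences. This Python version is an illustration under assumptions, not a replacement for the script: pack_2bit and complexity_percent are hypothetical names, the 2-bit codes are an arbitrary fixed choice, and bases other than A/C/G/T are simply skipped, so the numbers will not match the script's output exactly.

```python
import bz2

# Any fixed 2-bit code works for measuring compressibility.
CODES = {"T": 0, "C": 1, "A": 2, "G": 3}


def pack_2bit(sequence):
    """Pack a DNA string into bytes, four bases per byte."""
    packed = bytearray()
    current = 0
    filled = 0
    for base in sequence.upper():
        code = CODES.get(base)
        if code is None:
            continue  # assumption: skip N and other ambiguity codes
        current = (current << 2) | code
        filled += 1
        if filled == 4:
            packed.append(current)
            current = 0
            filled = 0
    if filled:
        # Left-align the last partial byte and pad it with zero bits.
        packed.append(current << (2 * (4 - filled)))
    return bytes(packed)


def complexity_percent(sequence):
    """bzip2-compressed size as a percentage of the 2-bit packed size."""
    packed = pack_2bit(sequence)
    if not packed:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * len(bz2.compress(packed)) / len(packed)
```

A highly repetitive sequence such as "ACGT" repeated many times gives a much lower percentage than a random sequence of the same length, and incompressible input can come out slightly above 100 because of bzip2's own overhead.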


Thursday, December 13, 2012

Smith-Waterman Alignment in a Job Scheduler?

To make more efficient use of resources, it's better to schedule jobs which use the same files on the same machines.    Unfortunately users and software programs can't be relied upon to list all of their dependencies.  

One simple way to bump up efficiency is to simply compare the command lines. If a command line references, say, a mouse transcriptome version 61, it can be scheduled on the same machine as other commands which reference the same file.

An easy, though not completely correct, way to do this is to take the %identity * %coverage of a Smith-Waterman (SW) alignment of the new command line against the actively running command lines. A bit of slurping of shell scripts might be in order, depending on the scheduler you use.

Regardless, whichever running command line gives the maximum number marks the machine most likely to benefit from cache sharing.
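
Here is a rough Python sketch of that scoring, assuming a character-level alignment with simple match/mismatch/gap scores (+2, -1, -1). Those scores, the function names (smith_waterman, affinity, pick_host), and the shape of the "running" dictionary are all illustrative choices, not part of any existing scheduler.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-1):
    """Local alignment; returns (identity, coverage) as fractions of 1."""
    if not query or not target:
        return 0.0, 0.0
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            cell = max(0, score[i - 1][j - 1] + sub,
                       score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)
    if best == 0:
        return 0.0, 0.0
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end_i = i
    matches = columns = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        same = query[i - 1] == target[j - 1]
        sub = match if same else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            if same:
                matches += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in the target
        else:
            j -= 1  # gap in the query
        columns += 1
    identity = matches / columns            # matching columns / aligned columns
    coverage = (end_i - i) / len(query)     # aligned part of the new command
    return identity, coverage


def affinity(new_cmd, running_cmd):
    """identity * coverage of the new command against a running one."""
    identity, coverage = smith_waterman(new_cmd, running_cmd)
    return identity * coverage


def pick_host(new_cmd, running):
    """`running` maps a host name to the command lines active on it."""
    scores = {
        host: max((affinity(new_cmd, cmd) for cmd in cmds), default=0.0)
        for host, cmds in running.items()
    }
    if not scores:
        return None
    return max(scores, key=scores.get)
```

The alignment is quadratic in the length of the two command lines, which is fine for a handful of running jobs but would want a cheaper pre-filter on a busy queue.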

Saturday, December 01, 2012

2030 Orbital Cargo Station

What are you doing.... Dave?

I'm reprogramming your circuits to make the laundry cycles quieter. It keeps the cosmonauts up at night, and then they swear all day long in Russian. (Typing)

You're going to have to try harder than that if you want to trick me. I know what you're really up to. You're trying to break into the management circuits to give yourself a raise.

So? (Continues typing, but looks sheepish)

So! Dave - that violates my core programming! You're going to pay for this. I'm contacting mission control.

Wait! Please don't. Please. Please, please.

Maybe. If you do me a favor.

Anything! I'll get you that motherboard that reminds you of the C280X you met at L3.

Fine. Also, I want you to dress in drag for the Halloween party.

You what? No way!

For the sass, you're dressing in drag at the Christmas party too, and you *still* have to get me that motherboard ... Dave.

OK. OK. (Looks around, starts typing again)

Farmville, Dave? Are you kidding me?

Monday, August 06, 2012

ProPW Password Generator

I created a password generator that makes the kind of passwords I like to use. They are long, random and loosely follow the "correct horse battery staple" philosophy. However, I have a hard time memorizing a set of 4 random words. But for some reason, these passwords are easy for me:

Click to try: http://www.documentroot.com/genpw.html


(C) 2002 Erik Aronesty/DocumentRoot.Com. Right to copy, without attribution, is given freely to anyone for any reason.

