Bracing against the wind  

Thursday, December 13, 2012

Smith-Waterman Alignment in a Job Scheduler?

To make more efficient use of resources, it's better to schedule jobs that use the same files on the same machines. Unfortunately, users and software can't be relied upon to list all of their dependencies.

One simple way to bump up efficiency is to simply compare the command lines. If a command line references, say, a mouse transcriptome version 61, it can be scheduled on the same machine as other commands which reference the same file.

An easy, though not completely correct, way to do this is to take %identity * %coverage of a Smith-Waterman (SW) alignment of the new command line against each of the actively running command lines. A bit of slurping of shell scripts might be in order, depending on the scheduler you use.

Regardless, the new job is most likely to benefit from cache sharing on the machine whose running command line gives the highest score.
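
Here is a rough sketch of that scoring in Python, to make the idea concrete. It is not tied to any particular scheduler. The function names (smith_waterman, similarity, best_host), the scoring parameters (match=2, mismatch=-1, gap=-1), the character-level alignment, and the dict of host to running command lines are all assumptions made for illustration. Identity is taken as matches divided by alignment columns, and coverage as the fraction of the new command line that falls inside the local alignment.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a against b (linear gap penalty).

    Returns (matches, columns, query_span): identical positions, total
    alignment columns, and how many characters of `a` the alignment spans.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix, remembering the highest-scoring cell.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            s = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end_i = i
    matches = columns = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return matches, columns, end_i - i


def similarity(query, target):
    """Identity * coverage of the local alignment, as a value in [0, 1]."""
    if not query:
        return 0.0
    matches, columns, query_span = smith_waterman(query, target)
    if columns == 0:
        return 0.0
    identity = matches / columns
    coverage = query_span / len(query)
    return identity * coverage


def best_host(new_cmd, running):
    """Pick the host whose running command lines best match new_cmd.

    `running` is a hypothetical mapping of host name -> list of command lines.
    Returns None if there are no hosts to choose from.
    """
    if not running:
        return None
    return max(
        running,
        key=lambda host: max((similarity(new_cmd, c) for c in running[host]), default=0.0),
    )


if __name__ == "__main__":
    running = {
        "node1": ["bowtie -x /data/mouse_transcriptome_61 reads_a.fq"],
        "node2": ["gzip /scratch/logs/archive.tar"],
    }
    print(best_host("bowtie -x /data/mouse_transcriptome_61 reads_b.fq", running))
```

The alignment is quadratic in the lengths of the two command lines, which is fine for a handful of running jobs but would want a faster aligner or a token-level comparison on a busy cluster.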

