Bracing against the wind  
www.documentroot.com  

Wednesday, February 29, 2012

gethostbyname command line

Pasteable script below. I can't believe this doesn't exist. No command line tool to resolve a host name to an address using the system resolver on Linux?


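#!/usr/bin/perl
# Resolve a host name to an IPv4 address using the system resolver.
use strict;
use warnings;
use Socket;

my $host = shift @ARGV;
die("usage: gethostbyname hostname\n") unless defined($host);

# In scalar context, gethostbyname returns the packed address (undef on failure).
my $packed_ip = gethostbyname($host);

if (defined $packed_ip) {
    my $ip_address = inet_ntoa($packed_ip);
    print "$ip_address\n";
    exit 0;
} else {
    warn "$host not found\n";
    exit 1;
}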


Thursday, February 23, 2012

Google Scholar Bookmarklet

This link is a bookmarklet that takes the current query string in Google, or the current selection if there's no q= parameter, and searches Google Scholar. Now that Google has removed Scholar from the dropdown (sad), I need this to keep working the way I've been working.


Friday, February 17, 2012

As predicted

As predicted 6 months ago, and again last week, Oxford Nanopore is shaking up the industry. PacBio, Complete Genomics, Life Technologies and others are all seeing their stocks drop 5% on the news that sequencing a genome just got 10x cheaper.


Monday, January 30, 2012

Church-Turing thesis and strings

My empirically determined corollary to Turing's computability thesis is that any problem, no matter how interesting it seems on the surface, can be reduced to a déjà-vu-inducing set of string handling, vector sorting and hash lookups.

And as new features are added, the problem of writing out and reading files slowly becomes most of what the program does, until the program resembles an actual Turing machine: ploddingly scribbling and reading things from an infinite tape which, because of flooding in Thailand, is infinitely expensive.


Monday, January 23, 2012

Embed Tab Size Info in Source Text

IDEA: a "universal tab stop signature" line you can add to any text file in which tabs may be found and need to be rendered in an editor.

The sig must be in the first or last 128 bytes of a file (so reading the sig is faster)

The syntax could be something like: open paren or bracket, % or # sign, open paren or bracket, the word "tab" then a ':', then anything except a closing paren or bracket, then a closing paren or bracket, a % or # sign, and another closing paren or bracket. As a regex: [(\[][%#][(\[]tab:[^)\]]*[)\]][%#][)\]]

Examples:

C++:
// (%(tab:4)%)

Perl:
# (%(tab:4)%)

HTML (for source editing only):
<!-- (%(tab:4)%) -->

LISP (no parens... too confusing):
; [%[tab:4]%]

SMX (no % signs or parens):
%null([#[tab:4]#])
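
A rough sketch of how an editor might read the signature, in Python. The function name find_tab_size and the choice to accept only a plain integer after "tab:" are my assumptions for illustration, not part of the idea above. The regex is the one described in the syntax paragraph, with closing brackets on the right-hand side.

```python
import re

# Two openers around % or #, the word "tab:", a value, then the closers.
# Works on bytes so the 128-byte window rule is applied before any decoding.
TAB_SIG = re.compile(rb"[(\[][%#][(\[]tab:([^)\]]*)[)\]][%#][)\]]")

WINDOW = 128  # signature must sit in the first or last 128 bytes of the file


def find_tab_size(data: bytes):
    """Return the declared tab width, or None if no usable signature is found."""
    for chunk in (data[:WINDOW], data[-WINDOW:]):
        match = TAB_SIG.search(chunk)
        if match:
            value = match.group(1).strip()
            # Assumption: only a bare integer is a valid value.
            if value.isdigit():
                return int(value)
    return None


if __name__ == "__main__":
    print(find_tab_size(b"// (%(tab:4)%)\nint main() { return 0; }\n"))
```

Only the two 128-byte windows are searched, so a signature buried in the middle of a large file is ignored on purpose.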


Friday, December 23, 2011

Uncovering significant miRNA species with RNA-seq

Note to self: An easy way to uncover significant miRNA species with Illumina RNA-seq data is to run counts on the small RNA between two cell types, then toss out anything that appears ribosomal, filter for some minimum count threshold and run a negative binomial test on them with a tool like edgeR or DESeq. The resulting adjusted p-values can illuminate new miRNA species.


Monday, December 19, 2011

Xenograft & Contamination Filter

Useful code that uses libbamtools to filter out alignments to another genome. Tested with xenograft and bacterial contamination removal...works great.


Friday, October 21, 2011

Connecticut Martin

Martin Hedge paused to wipe his glasses with the sleeve of his shirt. The crowd at the coffee shop had gotten louder.

It had never been enough. Even though idle wealthy housewives had long since taken up the pen and written libraries of wisdom and humour, and the girls hanging out at the tattoo parlors had discovered the zen of automotive engineering. Even though the unemployed had long ago sought out physics texts and wrangled secrets of a new energy density theory, it still hadn't been enough. The question remained... what were we working for?

The crowd at the coffee shop, from here, seemed angry, but as Martin walked closer he could see familiar faces, sternly triumphant.


Friday, October 14, 2011

Non "allelic" variation - thinking out loud

(Later Note: I wrote this before I knew what it's called. The term most people use is "Somatic Mosaicism". Apparently this is a pretty well researched topic... so I can go back to all the biologists that looked at me like I was crazy and tell them .... hmm.)

Here's a link to a good article on the topic:


Original rant below...

Much of genetics is concerned with "alleles" and "variations". That is, an organism is assumed to be comprised of "one kind of DNA". That DNA has 1) inherited alleles from its parents, and 2) de-novo alleles (also called somatic mutations). The 1000 Genomes Project estimates the latter at about 30 or so per person. This is probably a very conservative number considering the nature of the 1000 Genomes Project, i.e. all of these mutations are whole-organism, detectable, validated mutations. E. coli error rate estimates would put the range at 30-300, and this might be a better estimate because of how the study was done. Human blastoma cells have had error rates accurately measured at 10 times that rate. Individual organs may be less sensitive to immune response correction.

But let's assume 30 is our number. It's nice and small. And it's good to have a lower-bound.

That is only the set of variants that went into the "first cell" (fertilized egg) of an organism.

When that egg divides, half the organism has another, different, set of mutations. So 100% of the organism has 30 de-novo mutations and 50% of the organism will have *another* 30 de-novo mutations (30 new ones in that 50%, plus 30 original).

But wait, there's more. When those 2 cells divide in half again, you now get 60 new mutations, 30 from each cell. These 60 will be detectable at the "25%" level, i.e. 25% of the final organism will have them.

High-throughput sequencing can readily detect variation at the "1%" admixture level. That is, it commonly detects variation when as little as 1% of the cells carry it.

So how much variation can we expect, based on a low de-novo mutation rate, detectible at the 1% level?

100% -> 30, 50% -> 30, 25% -> 60, 12.5% -> 120, 6.25% -> 240, 3.12% -> 480, 1.56% -> 960

So we can expect about 1000 de-novo variants in a healthy individual, or 32 times the mutation rate. But what if the somatic mutation rate is higher, say, 3000 variants per replication? This may be the case in some organ development.

Thus, at the 1% level, that would be 96000 non-inherited detectable variants. I would call that my "upper bound". In real pileup data... I see around 30% "non-allelic" variation. So if, say, you've got 15000 SNPs (a reasonable number), we would expect roughly 5000 "background" SNPs..... putting the mutation rate at "156.25" (5000/32). That's smack in the middle of the E. coli based estimate.
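
The doubling arithmetic above can be sketched in a few lines of Python. This is only a restatement of the back-of-envelope model in this post, assuming every division adds the same number of new mutations per cell; the function name variants_by_level is made up for illustration.

```python
def variants_by_level(rate, min_fraction=0.01):
    """List (cell fraction, new variants first detectable at that fraction).

    rate is the assumed number of new mutations per cell per division.
    Halving stops once the fraction would fall below min_fraction.
    """
    levels = [(1.0, rate)]  # variants already present in the fertilized egg
    fraction, new_variants = 0.5, rate
    while fraction >= min_fraction:
        levels.append((fraction, new_variants))
        fraction /= 2       # each division halves the share of the organism
        new_variants *= 2   # and doubles the number of cells mutating
    return levels


if __name__ == "__main__":
    for fraction, count in variants_by_level(30):
        print(f"{fraction * 100:g}% -> {count}")
    # Count at the last detectable level for the low and high rates.
    print(variants_by_level(30)[-1][1], variants_by_level(3000)[-1][1])
    # Back-calculating a rate from ~5000 background SNPs and the 32x factor.
    print(5000 / 32)
```

The 32x factor is the count at the last level (960 = 32 x 30). Summing every level instead gives 1920 for a rate of 30, so the figures above are on the conservative side of this model.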

Lots of variant callers filter these out.... but I'm interested in them... I think they may be a lot more important than people think.


Friday, September 30, 2011

Multiplayer TD

Looks fun. Seems like it would be cool to play against other players... but they are rarely there. The developer needs to link-in to Armor or Kongregate or Facebook or some other PVP network.

There's probably a market for that... build a PVP portal so that lesser known games that are still fun can have a base of players.




(C) 2002 Erik Aronesty/DocumentRoot.Com. Right to copy, without attribution, is given freely to anyone for any reason.

