Bracing against the wind  
www.documentroot.com  

Monday, January 23, 2012

Embed Tab Size Info in Source Text

IDEA: a "universal tab stop signature" line you can add to any text file in which tabs may be found and need to be rendered in an editor.

The syntax could be something like: an open paren or bracket, a % or # sign, another open paren or bracket, the word "tab", a ':', then anything except a closing paren or bracket, then a closing paren or bracket, a % or # sign, and a final closing paren or bracket. As a regex: [(\[][%#][(\[]tab:[^)\]]*[)\]][%#][)\]]

Examples:

C++:
// (%(tab:4)%)

Perl:
# (%(tab:4)%)

HTML (for source editing only):
<!-- (%(tab:4)%) -->

LISP (no parens... too confusing):
; [%[tab:4]%]

SMX (no % signs or parens):
%null([#[tab:4]#])
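
A minimal sketch of how an editor might read such a signature, assuming the value after "tab:" is a plain positive integer. The function name find_tab_size and the fallback of 8 columns are illustrative assumptions, not part of the proposal. Note that this pattern does not check that the opening and closing delimiters are the same kind, or that the two % / # signs match.

```python
import re

# opener, % or #, opener, "tab:", value, closer, % or #, closer
TAB_SIGNATURE = re.compile(r"[(\[][%#][(\[]tab:([^)\]]*)[)\]][%#][)\]]")


def find_tab_size(text, default=8):
    """Return the tab size declared by the first valid signature, else default.

    The default of 8 is an assumption; the proposal does not define a fallback.
    """
    for line in text.splitlines():
        match = TAB_SIGNATURE.search(line)
        if match:
            value = match.group(1).strip()
            # Only accept plain positive integers; skip anything else.
            if value.isdigit() and int(value) > 0:
                return int(value)
    return default


if __name__ == "__main__":
    print(find_tab_size("// (%(tab:4)%)\nint main() { return 0; }"))
```

Because the signature sits inside an ordinary comment, the same search works on any of the example lines above without knowing the host language.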


Friday, December 23, 2011

Uncovering significant miRNA species with RNA-seq

Note to self: An easy way to uncover significant miRNA species with Illumina RNA-seq data is to run counts on the small RNA from two cell types, toss out anything that appears ribosomal, filter for some minimum count threshold, and run a negative binomial test on the rest with a package like edgeR or DESeq. The resulting adjusted p-values can illuminate new miRNA species.
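
A rough sketch of the filtering steps only, under stated assumptions: counts are already in a dict of feature id to a pair of counts (one per cell type), ribosomal features are supplied as a precomputed set of ids, and the threshold of 10 total reads is a placeholder. The function name filter_small_rna_counts is hypothetical, and the negative binomial test itself is left to edgeR or DESeq.

```python
def filter_small_rna_counts(counts, ribosomal_ids, min_total=10):
    """Drop ribosomal features and features below a minimum total count.

    counts: dict mapping feature id -> (count_in_type_a, count_in_type_b)
    ribosomal_ids: set of feature ids considered ribosomal (assumed input)
    min_total: placeholder threshold on the summed counts
    """
    kept = {}
    for feature, (count_a, count_b) in counts.items():
        if feature in ribosomal_ids:
            continue  # toss out anything flagged as ribosomal
        if count_a + count_b < min_total:
            continue  # too few reads to test meaningfully
        kept[feature] = (count_a, count_b)
    return kept


if __name__ == "__main__":
    example = {"miR-21": (120, 15), "rRNA_5S": (9000, 8800), "novel_1": (2, 1)}
    print(filter_small_rna_counts(example, {"rRNA_5S"}))
```

The surviving table is what would be handed to the differential expression package.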


Monday, December 19, 2011

Xenograft & Contamination Filter

Useful code that uses libbamtools to filter out alignments to another genome. Tested with xenograft and bacterial contamination removal...works great.


Friday, October 21, 2011

Connecticut Martin

Martin Hedge paused to wipe his glasses with the sleeve of his shirt. The crowd at the coffee shop had gotten louder.

It had never been enough. Even though idle wealthy housewives had long since taken up the pen and written libraries of wisdom and humour, and the girls hanging out at the tattoo parlors had discovered the zen of automotive engineering. Even though the unemployed had long ago sought out physics texts and wrangled secrets of a new energy density theory, it still hadn't been enough. The question remained... what were we working for?

The crowd at the coffee shop, from here, seemed angry, but as Martin walked closer he could see familiar faces, sternly triumphant.


Friday, October 14, 2011

Non "allelic" variation - thinking out loud

(Later Note: I wrote this before I knew what it's called. The term most people use is "Somatic Mosaicism". Apparently this is a pretty well researched topic... so I can go back to all the biologists that looked at me like I was crazy and tell them .... hmm.)

Here's a link to a good article on the topic:


Original rant below...

Much of genetics is concerned with "alleles" and "variations". That is, an organism is assumed to be comprised of "one kind of DNA". That DNA has 1) inherited alleles from its parents, and 2) de-novo alleles (also called somatic mutations). The 1000 Genomes Project estimates the latter at about 30 or so per person. This is probably a very conservative number considering the nature of the 1000 Genomes Project, i.e. all of these mutations are whole-organism, detectable, validated mutations. E. coli error rate estimates would put the range at 30-300, and this might be a better estimate because of how the study was done. Human blastoma cells have had error rates accurately measured at 10 times that rate. Individual organs may be less sensitive to immune response correction.

But let's assume 30 is our number. It's nice and small. And it's good to have a lower-bound.

That is only the set of variants that went into the "first cell" (fertilized egg) of an organism.

When that egg divides, half the organism has another, different, set of mutations. So 100% of the organism has 30 de-novo mutations and 50% of the organism will have *another* 30 de-novo mutations (30 new ones in that 50%, plus 30 original).

But wait, there's more. When those 2 cells divide in half again, you now get 60 new mutations, 30 from each cell. These 60 will be detectable at the "25%" level, i.e. 25% of the final organism will have them.

High-throughput sequencing can readily detect variation at the "1%" admixture level. That is, it commonly detects variation when as little as 1% of the cells have that variation.

So how much variation can we expect, based on a low de-novo mutation rate, detectable at the 1% level?

100% -> 30, 50% -> 30, 25% -> 60, 12.5% -> 120, 6.25% -> 240, 3.125% -> 480, 1.56% -> 960

So we can expect about 1000 de-novo variants at the lowest detectable level in a healthy individual, or 32 times the mutation rate. But what if the somatic mutation rate is higher, say, 3000 variants per replication? This may be the case in some organ development.

At the 1% level, that would be 96000 non-inherited detectable variants. I would call that my "upper bound". In real pileup data... I see around 30% "non-allelic" variation. So if, say, you've got 15000 SNPs (a reasonable number), we would expect roughly 5000 "background" SNPs... putting the mutation rate at "156.25" (5000/32). That's smack in the middle of the E. coli based estimate.
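
The arithmetic above can be sketched as follows. This is only an illustration of the doubling model described in this post, assuming a fixed number of new variants per division and a hard 1% detection cutoff. The function name mutation_tiers is hypothetical. Note that summing every tier gives 1920 for a rate of 30, so the "about 1000" figure corresponds to the last tier alone (960, or 32 times the rate).

```python
def mutation_tiers(rate=30, min_fraction=0.01):
    """Return a list of (cell_fraction, new_variants) tiers.

    rate: new variants per division (30 is the lower bound used above)
    min_fraction: smallest cell fraction assumed detectable by sequencing
    """
    tiers = [(1.0, rate)]  # variants already present in the fertilized egg
    fraction, new_variants = 0.5, rate
    while fraction >= min_fraction:
        tiers.append((fraction, new_variants))
        fraction /= 2       # each later division covers half as many cells
        new_variants *= 2   # but twice as many cells are dividing
    return tiers


if __name__ == "__main__":
    for fraction, variants in mutation_tiers():
        print(f"{fraction:.4%} -> {variants}")
    print("last tier, rate 3000:", mutation_tiers(3000)[-1][1])
    print("implied rate from 5000 background SNPs:", 5000 / 32)
```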

Lots of variant callers filter these out... but I'm interested in them. I think they may be a lot more important than people think.


Friday, September 30, 2011

Multiplayer TD

Looks fun. Seems like it would be cool to play against other players... but they are rarely there. The developer needs to link-in to Armor or Kongregate or Facebook or some other PVP network.

There's probably a market for that... build a PVP portal so that lesser known games that are still fun can have a base of players.



Getting perl readline to work in Ubuntu

I use perl's debugger and psh (simple perl shell). When the up arrow doesn't work, I always forget how to fix it. Hopefully by posting the solution here, it will be easier for me to find:

apt-get install libterm-readline-gnu-perl


Wednesday, September 28, 2011

Tuesday, August 23, 2011

Should we try to contact ET?

Would it be safe to meet an intelligent alien lifeform from another planet?

Let's assume an ET (extraterrestrial, alien) is more advanced than we are. Let's assume, again, because it's the only experience we have, that we are a good example of an intelligent life form.

Have other species benefited from their contact with humans? Species that are useful to us, like cows and dogs, have proliferated in population, but are controlled/used by us. Species that feed on our leavings, like roaches and pigeons, do well.

But, given that we're intelligent, and we'd like to think that aliens would regard us as so, we may want to look only at "higher" mammals, with comparable intelligence. Primates certainly don't do well when humans show up. In fact, it seems we are particularly brutal when dealing with them. Large mammals were nearly driven extinct from North America after human contact. Some very intelligent species, like right whales, with rich communication systems and highly social habits, were driven to near extinction.

OK, but maybe we're talking about "modern man". Maybe we've escaped that brutal past.

We still round up dolphins to kill them. And, to this day, very little effort has been spent in attempting to understand the language and society of the "alien species" we share our home with. Any time even a modicum of effort has been spent it's been met with "shocking" revelations about how other species have complex grammars, notions of fairness, etc. And still most people in the world refuse to believe that other species can feel pain the way humans do - largely (IMO) as a way of justifying abuse and mistreatment.

So.... I would expect to be treated at least as well as we have treated others. Accordingly, someone should *shut down* the SETI program. ASAP.


Thursday, August 04, 2011

Oracle fails to crush open source

Oracle has repeatedly attempted to acquire its way into the open source movement, each time charging for and licensing technologies it had no hand in developing. Berkeley DB, Sun Grid Engine, Open Office, MySQL, and Java (JRE/JDK) are now all Oracle licensed technologies.

What is admirable is that they make no pretense of being "friends" of open source. They stopped releasing new versions of SGE, and transitioned to a closed-source system immediately. Similar moves are happening in other products at different speeds... with a "what the market will bear" approach.

It took me 6 months to move all my BDB code to SQLite and switch my SGE stuff to Condor. I never touched MySQL because of the InnoDB creepiness and I'm glad the decision has been vindicated... Postgres was the obvious choice. Oracle's falsely named "Open Office" has thankfully been forked to become the truly open "Libre Office".

The only thing left is Java. And there's nowhere near an adequate replacement. OpenJDK is limited to a handful of operating systems. And Java itself may have some questionable licensing - Oracle may be able to seize the whole thing by the fistful.

I've despised Java from the beginning, mostly for its poor architecture and the way it encourages bad coding (not as bad as Python). Watching Oracle roll in and step all over it is, for me, merely another "I told you so" moment.


(C) 2002 Erik Aronesty/DocumentRoot.Com. Right to copy, without attribution, is given freely to anyone for any reason.

