Thursday, December 11, 2008

Google Docs Presentation

I have created a Google Docs presentation here

It's titled 'Biological basics for non-biologists'.

It's aimed at people who work with biological information a lot of the time
but who don't have any formal training in biology, and who want to
understand what they're working with a little better.



Thursday, May 01, 2008

Using names of methods and techniques as a way to make your work 'fit in'

Consider these two recently published article titles.

"GAPscreener: An Automatic Tool for Screening Human Genetic Association Literature in PubMed Using the Support Vector Machine Technique"

"Extraction of semantic biomedical relations from text using conditional random fields"


I have no problem with the work, the conclusions they draw, or the methods they use.
But I do have a problem with the fact that a title like this:

"Extraction of semantic biomedical relations from text, using a method which we chose because it was the most appropriate for the task, but it doesn't really have a name that you will know"

sounds rubbish, even after ignoring its extreme verbosity.

How can you present work that uses methods or techniques that are novel and well chosen, yet previously unpublished, without a specific name, and not well known in the field?

It also makes you wonder whether people use 'known' methods for tasks, even when they are not the most appropriate choice, just to ease the process of peer review and publication.

Personally, I'm not keen on fitting in just for the sake of simplicity.

Thursday, April 10, 2008

Searching code

Google code search is truly brilliant.

Useful options
lang:java
Limits the search to Java code only. The same works for many other languages, see here.

Recently I wanted to look at some good examples of SwingWorker implementations.
Rather than doing the normal plain Google searches for "SwingWorker", "SwingWorker example" or "SwingWorker tutorial", I thought I'd try code search. It worked really well, and the best thing is that you get to see the code straight away instead of having to wade through download pages etc. It also has a very nice package hierarchy on the top left of any class, so you can follow the usage of classes.

I searched for "extends SwingWorker lang:java".
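For anyone who hasn't met the pattern that search is looking for, here is a minimal sketch of a class that extends SwingWorker. It is not taken from any search result; the class name SumOfSquaresWorker and the helper runAndWait are made up purely for illustration, and the "slow" work is just a toy sum.

```java
import javax.swing.SwingWorker;
import java.util.concurrent.ExecutionException;

// Hypothetical example class: computes 1^2 + 2^2 + ... + n^2 off the event dispatch thread.
class SumOfSquaresWorker extends SwingWorker<Integer, Void> {
    private final int n;

    SumOfSquaresWorker(int n) {
        this.n = n;
    }

    // Runs on a background worker thread, so a GUI would stay responsive meanwhile.
    @Override
    protected Integer doInBackground() {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i * i;
        }
        return sum;
    }

    // Hypothetical convenience helper: start the worker and block until its result is ready.
    // In a real GUI you would normally override done() rather than block like this.
    static int runAndWait(int n) {
        SumOfSquaresWorker worker = new SumOfSquaresWorker(n);
        worker.execute();
        try {
            return worker.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling SumOfSquaresWorker.runAndWait(10) starts the worker and waits for its answer. The "extends SwingWorker" part of the class declaration is what the search phrase matches.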

Language as a complex phenotype

Just read this, an essay by Mark Pagel in Nature.
Other things I have read recently should be on the sidebar of this blog, or you can look at them here.

I thought it was an extremely well thought out essay, with some very original and creative ideas that brought to my mind other ideas I'd had in the past.

On the nature of intergenic DNA (normally called 'junk' DNA, although this name significantly underrepresents its importance in the genome):
  • I agree that it is important for phenotypic regulation, and that it must be important in developing the complexity seen in phenotypically complex organisms (humans, trees, beetles).
  • RNA may play an important role, and may be what most of this DNA is doing there (see work by John Mattick).
  • I also believe intergenic DNA has physical importance in regulation, i.e. intergenic regions create novel promoter/enhancer elements, modifying polymerase assembly/transcription factor recruitment. Imagine shapes within shapes, of promoters enhancing TATA boxes to regulate transcription factor bound enhancement of ncRNA structures that catalyse RNA cleavage.
  • Even though the previous point is entirely unproven (and mostly rubbish), you can't deny it all adds up to a whole lot of complexity.

I have to say though I didn't totally agree with some of his statements.

  • He suggests analogue measurements are less precise, when it seems to me they can be more precise. As long as you don't need to store analogue data (i.e. reduce its precision for storage, e.g. through rounding errors), analogue will always be more precise.

In my opinion the genome cannot be encoded in an analogue way, but can be interpreted so. If you use an inexact system to read the digital genome you get an analogue result. E.g. transcription does not produce 10 RNA copies of a gene, it just transcribes it until the transcriptional machinery is no longer available or moves away.

Having said all of that, though, I do think that in regulatory systems the numbers (counts) of molecules are important, and that concentrations often 'miss the point'. If an enzyme is at a very low concentration, there will be very few molecules of it around, and therefore the chances of it bumping into its necessary reagents/cofactors etc. are not necessarily concentration dependent.

I think language is 'the voice of our genes', just as our brains are an adaptive strategy. The brain is a truly brilliant product of evolution. It allows us to evolve our behaviour, and to some extent our bodies, within our own lifetime. This is something that 'hard-coded' behaviour/instinct and non-plastic development cannot do. The only trouble is that we can't pass it on to our offspring directly. We have to use language to tell our children about our experiences, so they can improve/modify them.