Tuesday, March 28, 2006

Here is also an interesting bit of info

This is something I wasn't really aware of.
In a scan of ~200 Journal of Virology papers matching the search "phylogenetic*" in the abstract or title, maximum likelihood was the most popular tree-inference method, closely followed by neighbor-joining. Jukes-Cantor was the most popular model of nucleotide substitution from the list screened (see below), just ahead of Tamura-Nei. Also, ClustalW, the command-line-only version of Clustal, was much more popular than the GUI/user-friendly ClustalX.

neighbor$joining 137
parsimony 94
likelihood 156
bayesian 26
upgma 9
p-distance 5
jukes-cantor 21
kimura$2-parameter 8
kimura$3-parameter 0
tamura-nei 20
f81 3
hky 13
general$time-reversible 14
dayhoff 13
jtt 6
wag 2
modeltest 17
model$of$nucleotide 17
model$of$protein 3
clustal$x 36
clustal$w 84
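
For anyone curious how counts like the ones above could be produced, here is a minimal sketch. It is not the actual script used for the scan. It assumes that "$" in a search term stands for an optional space or hyphen, and that each paper is counted at most once per term; both are guesses. The names TERMS, term_to_regex and count_papers, and the three sample sentences, are made up for illustration.

```python
import re

# Hypothetical subset of search terms, written in the style of the list above.
# Assumption: "$" marks a spot where a space, a hyphen, or nothing may appear.
TERMS = ["neighbor$joining", "likelihood", "clustal$w", "clustal$x"]

def term_to_regex(term):
    # Escape the literal pieces and allow an optional separator at each "$".
    parts = [re.escape(piece) for piece in term.split("$")]
    return re.compile(r"[\s-]?".join(parts), re.IGNORECASE)

def count_papers(papers, terms=TERMS):
    # Count each paper at most once per term, however often the term occurs.
    counts = {}
    for term in terms:
        pattern = term_to_regex(term)
        counts[term] = sum(1 for text in papers if pattern.search(text))
    return counts

# Made-up sample texts standing in for paper abstracts or methods sections.
papers = [
    "Trees were built by neighbor-joining in ClustalX.",
    "Maximum likelihood trees; alignment with Clustal W.",
    "Neighbor joining and maximum likelihood were compared.",
]
print(count_papers(papers))
```

Counting papers rather than raw occurrences keeps one paper that repeats a term many times from dominating the totals.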

An update on what is happening

I have moved very strongly into the world of text-mining.
Essentially we are looking to get an idea of how people do their phylogenetics
by extracting it from papers. I'm doing pretty well with the extraction side of things, but how to visualise the data isn't so straightforward. Should it be a network and ontology, or just a chart? And what is the best way to visualise these?
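
As a starting point for the "just a chart" option, here is a minimal sketch of a plain-text bar chart. It uses the tree-inference counts listed above, with labels tidied by hand. The function name bar_chart and the 40-character width are arbitrary choices for illustration, and this is only one of many ways to draw it.

```python
# Tree-inference method counts copied from the list above (labels tidied).
counts = {
    "neighbor-joining": 137,
    "parsimony": 94,
    "likelihood": 156,
    "bayesian": 26,
    "upgma": 9,
}

def bar_chart(counts, width=40):
    # Scale the largest count to `width` characters and sort by count, descending.
    top = max(counts.values())
    label_w = max(len(name) for name in counts)
    lines = []
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * max(1, round(n * width / top))
        lines.append(f"{name.ljust(label_w)} {bar} {n}")
    return lines

print("\n".join(bar_chart(counts)))
```

A chart like this shows relative popularity at a glance, but it cannot show which methods are used together in the same paper, which is where a network view might earn its keep.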