bioinformatics

Useful links

One of the people I met last week in Portugal was Dora Lange Canhos, director of the Centro de Referência em Informação Ambiental, CRIA (Reference Center on Environmental Information). It is a Brazilian not-for-profit NGO that aims at a more sustainable use of biodiversity through the dissemination of high-quality information.

One of the CRIA projects is SpeciesLink, a distributed database that integrates information from biological collections. I mentioned it before. Currently restricted to Brazilian collections, it plans to widen its scope and hopes to encompass all collections related to the Amazon Basin. Perhaps the Suriname collection in Naturalis, which is currently being digitized, will also be integrated into this system.
When I did the ‘Bulimulidae check’ I found 225 records listed.

The data can also be plotted on Google Maps.


CRIA also publishes two journals.
Biota Neotropica is an open access, trilingual journal on conservation and sustainable use of biodiversity. It is published online.
Checklist is a quarterly journal that publishes occurrence lists, geographic distribution maps and notes on the geographic distribution of taxa. It is also open access and published online.
So far, each journal has published only two molluscan papers (both on non-terrestrial snails). Although the relevance of these journals for Neotropical malacology is still limited, I hope that papers on land snails will also be published in the future.

Scratchpad training

Taxonomy is a science with an enormous task and limited capacity. Estimates are 4,000-6,000 taxonomists worldwide and 30,000-40,000 amateurs who are involved in a professional way, plus an unknown number of citizens who are interested in natural history. Moreover, there is a mismatch between the potential for web-based taxonomy and the technological and management resources available. This is where Scratchpads come in...

Scratchpads specifically enable community efforts and allow data to be uploaded and tagged in an intuitive way. The published data can be easily reviewed and maintained by the community, e.g. a group of collaborators.
This potential for sharing and building data made me decide to start using a Scratchpad. Hopefully, some of you will join me in this new way to participate in science.
See here for more details on Scratchpads and what they are all about.

Functionality that I personally like includes modules for bibliography, image galleries, phylogenetic trees, character matrices (under development) and distribution maps. There is also functionality for web forums and newsletters (both with email integration) and user blogs.

After an intensive day of training I have updated the Orthalicoidea site at several points. Have a look for yourself!


Needless to say, this will always be a work in progress, evolving slowly but, hopefully, steadily...

Reference:
Sarkar, I.N., 2009. Biodiversity Informatics: the emergence of a field. BMC Bioinformatics 10 (Suppl. 14): S1.



New phylogenetic data

While there is an ongoing debate about the role of bioinformatics and the species concept (see here and below), the number of sequences of Neotropical land snails in GenBank grows slowly. For example, 159 nucleotide sequences of 48 species and 105 protein sequences of 38 species are available for Orthalicoidea. There remains some work to be done!

Recently, a paper was published by a Peruvian group (Ramirez et al., 2009). It describes an analysis of three species of the Orthalicidae, Bostryx scalariformis, B. sordidus and Scutalus versicolor, based on the 16S rRNA mitochondrial marker. Currently, similar data are available only for Placostylus bivaricosus, so this paper is very useful in adding species of two more genera.

[Figure: 16S phylogeny from Ramirez et al., 2009]

Both the Neighbour-Joining tree (left) and the Maximum Parsimony tree (right), as well as the analyses using Maximum Likelihood and Bayesian Inference (not shown here), show the orthalicoid genera as a closely related group.

Most data in GenBank are on other markers, notably CO1. Barcoding “species” is becoming ‘en vogue’, but on the Taxacom list the following remark by Bob Mesibov was noted:

“I urge Taxacomers to read Roger Hyam's blog
(http://www.hyam.net/blog/archives/598) in full, but here's an
interesting chunk:

"Up to now the assumption has been that we are discovering taxa in
nature and then attempting to describe them. It is undoubtedly true that
taxa do exist in nature. However, in order to construct a usable map of
biodiversity, we need to turn this on its head. It is the act of minting
an identifier and linking it to a circumscription that creates the
taxon. We then discover which specimens in the wild fit into this taxon.
Philosophically this is how we act anyway (see Identifiers, Identity
and Me). Taxa are currently hypotheses (things we invent) that may break
down as our knowledge grows."

Much of the Taxacom discussion so far has been about species
identification, because species identification is what barcoding
promises. But Hyam says 'taxon'. Re-read the paragraph above
substituting 'genus' or 'family' for 'taxon'. Still OK? (That is, if you
thought the paragraph was OK when 'taxon' meant 'species'.) Note also
that barcodes could also theoretically be used to predefine taxa higher
than species, by relaxing the sequence requirements in ways indicated by
species sampling within the higher taxon.

Now, what strikes me as strange and wonderful is that OTOH I'm perfectly
happy with Hyam's approach when thinking about genera and families,
which are constructs with a lower-grade 'existence in nature' than
species. In fact, this is how I think genera and families get built into
classifications, traditionally. It's certainly how I go about erecting
new genera for my beloved millipedes

But OTOH, Hyam's approach just doesn't click with me when I think about
circumscribing new species. Not already recognised species, of the kind
we identify a la the Taxacom discussion, but previously unrecognised
species. Like, most of the world's species?

If I read Hyam correctly, his circumscription of new species, just like
that of old species, is by means of a barcode. Quick, simple and
unambiguous (caveats, caveats), this approach *replaces* morphospecies
with barcodes. The option of linking Hyam's Barcode Taxa to
morphospecies data (with keys, diagnoses, images, etc) is just that, an
option - to create 'secondary taxonomic products' (Hyam's phrase) or
not.

So you could produce a 'map of biodiversity' by barcoding madly on a
field trip and recognising - excuse me, defining - heaps of new species.
Think of that as step 1. Steps 2, 3, etc would be learning the answers
to questions like 'How big is it?', 'What life stage?', 'Male or
female?', 'Associated with what [plant/animal]?'. Lotta work there, but
that would certainly make the 'map of biodiversity' more usable. Take
biological control, for example. Don't know how far I could get with
'GenBank RQ561336 a possible parasite of GenBank AE699133', but it would
be a real comfort to know that these entities had been rigorously
circumscribed right from the beginning.”

Personally, I prefer a sound morphological hypothesis to start with. Barcoding may then falsify or corroborate the hypothesis, not vice versa.
Reference:
Ramirez, J., Ramirez, R., Romero, P., Chumbe, A. & Ramirez, P., 2009. Posición evolutiva de caracoles terrestres peruanos (Orthalicidae) entre los Stylommatophora (Mollusca: Gastropoda). - Revista Peruana de Biologia 16: 51-56.


LifeDesks

Yesterday I created a new LifeDesk on Neotropical snails.

The site provides tools for classification, taxon pages, bibliography and image galleries.
Potentially this is a great tool and I hope that, in the end, it may contain all the information needed to give a relevant and accurate picture of this group. I will at least start by supplying data for the Orthalicidae. With one taxon page and one bibliographic item put up, a start has been made. So far, I have been unable to upload an image that I wanted to add to the taxon page.

The first potential improvement I noticed would be links to other sites, such as MorphBank, that gather partly the same information (images, bibliography). It is a nuisance to do double work and to go through different learning curves; each site has its own way of navigating and managing. Inevitable, but tedious. Integration by linking should be the direction to move forward.

This is part of the ongoing Encyclopedia of Life project, which aims at making taxonomy available to anyone at the click of a mouse. However, to make this authoritative one has to rely on the few experts that are available. Non-experts may also contribute, although the tools for making direct contributions are not yet in place.

If you feel you can make a useful contribution to document the biodiversity of Neotropical snails, please become a member of the team. You are more than welcome!

Phylogeographer

Another piece of software that looks potentially useful. Phylogeographer is designed to test phylogeographic hypotheses by converting them into distance matrices. These can be used to calculate correlations between the various hypotheses and genetic distance matrices. In this way dispersal routes can be explored with a graphical interface.
The (condensed) information on the homepage suggests that this piece of software is relatively easy to operate, as long as you stick to some basic requirements (formats). One of the big advantages is that it runs on Java and is therefore platform-independent.
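
How Phylogeographer does this internally is not described on the homepage, so the following is only a sketch of the general idea of correlating a hypothesis matrix with a genetic distance matrix. It assumes a simple Mantel-type permutation test. The function names, the four populations, the stepping-stone hypothesis and the distance values are all hypothetical; none of them are taken from the program or from real data.

```python
import random
from itertools import combinations
from statistics import mean


def upper_triangle(matrix, order=None):
    """Off-diagonal upper-triangle entries, optionally after relabelling rows/columns."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation of two equally long lists (neither may be constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel(hypothesis, genetic, permutations=999, seed=0):
    """Return (r, p): matrix correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    reference = upper_triangle(hypothesis)
    observed = pearson(reference, upper_triangle(genetic))
    n = len(genetic)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel the populations of the genetic matrix
        r = pearson(reference, upper_triangle(genetic, order))
        if r >= observed - 1e-12:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical hypothesis: four populations along one valley (stepping stones),
# so the expected distance is the number of steps between them.
hypothesis = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
# Hypothetical pairwise genetic distances for the same four populations.
genetic = [
    [0.00, 0.02, 0.05, 0.07],
    [0.02, 0.00, 0.03, 0.05],
    [0.05, 0.03, 0.00, 0.02],
    [0.07, 0.05, 0.02, 0.00],
]

r, p = mantel(hypothesis, genetic)
print(f"matrix correlation r = {r:.3f}, permutation p = {p:.3f}")
```

With only four populations there are just 24 possible relabellings, so the p-value is coarse; the point is merely to show how a dispersal hypothesis, once written as a distance matrix, can be compared with genetic distances.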

Later this year I hope to have enough data for a further exploration. So this topic might return.

Online databases

Online databases are undoubtedly a great help, and I wish there were even more of them, but at the same time it has to be recognized that the content cannot always be trusted. It is the same old story: GIGO, garbage in, garbage out.

The most logical start would be GBIF. I have seen some quite good results when I consulted their Australian data on Bulimulidae, but when I searched for data on South American species the results were poor. So far, the databases of individual museums have been a better choice for me. Here is a list of the databases that I have consulted quite frequently during my current research.

EU. There are several databases, of which those of the Senckenberg, Berlin and Brussels museums are the most promising. Yet they all ask me to come over to study their collections and to look for what I need. To travel or to loan, that's another interesting topic...

USA. For me the most informative database is that of the Florida Museum (chapeau Fred and John!). Also very useful are the Field Museum and Philadelphia museum databases. Furthermore, I have consulted the Smithsonian and Harvard museum databases. It is always a good start, but don't forget to contact the curator to get the full details.

Brazil. The list of institutions that participate in the speciesLink project is impressive. However, there are currently only three malacological collections accessible (although I regularly encountered server errors): INPA-Mollusca (Manaus, Amazonas), UFES-Malacologia (Vitória, Espírito Santo) and ZUEC-GAS (Universidade Estadual de Campinas, UNICAMP, Campinas, São Paulo). After several attempts I ended up with only a partial list of results. UFES was the only database to respond, which may be due to the distributed set-up of the project. Still, it is a start that can only improve over time.

Finally, I would like to mention a recent paper by Neale et al. (2007) on some principles for usability. They highlight the need for end-user involvement in the development of online databases. I cannot stress the importance of this point enough!

Reference:
Neale, S. H., M.R. Pullan & M.F. Watson. (2007). Online biodiversity resources - principles for usability. Biodiversity Informatics, 4, 27-36.

Blessing or disguise?

Information technology penetrates our lives more and more deeply. Some warn of the privacy crisis that is becoming manifest. So far in biology IT has been more a blessing in disguise, opening up tremendous opportunities in different fields and making us all far more productive.
However, the cyber era is coming rapidly and is now invading taxonomy in earnest. The opening of ZooBank is the latest milestone in cybertaxonomy, requiring all new names and papers to be registered in its official registry.

The first paper describing new species registered in ZooBank recently appeared in Zootaxa. It also made use of several other modern bioinformatic tools, like MorphBank, links to online collection databases, GBIF and GenBank. References are also registered in ZooBank and some are available through the Biodiversity Heritage Library. The paper also uses descriptive data standards (making use of XML, but several standards seem to be around!) and this part of bioinformatics certainly does not make me happy. I think it will be laborious and time-consuming to get everything into prescribed formats, databases, etc. before even a page can be published.
Progress comes at a cost!

EOL: an ecosystem of websites

Hot from the press: the Encyclopedia of Life, a new portal that aims to have (in the end) a webpage for each species described. It was publicized in my newspaper on its science page.
[Image: newspaper clipping, NRC-20080228-01011009]

Let's explore... The site is not very responsive (which might be due to the overwhelming number of visits after the publication, possibly a good sign) and does not contain much information yet (you have to start with something). Looking at the Gastropoda, only a few familiar names turn up: Helix pomatia, Cepaea nemoralis and Arion subfuscus. All common European snails. The pictures are taken from an online Czech biological encyclopedia, the text and graphic show the place in an interactive tree of life, and the source gives the origin of the classification.
My verdict: surely an endeavour that needs support, but also one that has a long, long way to go... If anyone is interested, I volunteer for the Orthalicidae :-)

Graphing populations

One of the topics that I have been wrestling with for quite some time is how to deal with statistics and graphing of populations (and other data as well). Today I may have found a solution that will work for me.

Especially when dealing with different populations of a species, one might consider taking various measurements with calipers and calculating some ratios. Unlike in the 'old days', there are now lightweight digital calipers (I bought one earlier this week) that make measuring a shell a snap. And instead of jotting down the data with pencil on a sheet of paper, it is much more convenient now to fill in an Excel worksheet (or, what I prefer, Numbers). I like this part of my life to be digital.
But doing some statistics on the data and making graphs that show to what extent populations differ from each other was problematic for me for a long time. Unable to afford a de facto standard like SPSS (not to speak of the ability to handle it), I have been looking for a nifty piece of software that will run on my Mac. Today I found StatCrunch, a web-based tool that fits my needs: it is nearly free (US$ 5 per half year), can handle different inputs (file, copy-paste, URL), does both calculations and graphing, is very flexible in its outputs and stores your results for easy retrieval afterwards.

I had a very suitable problem for a test run today. Thaumastus alutaceus (Reeve) is a rather small bulimulid from central Peru, of which I had four different populations, from altitudes ranging from 2950 to 3600 m:
[Image 20]
When I took the measurements, it was immediately clear that one population was different. But I was unsure how different it was, until I tried some of the options in StatCrunch. Here is one of the results I obtained:
[Image 10]
I don't know yet if this output is acceptable for printing, but surely it is good enough for presentations. Anyhow, what was interesting is that one population (3) clearly differs from the others (in some other ratios as well).
Despite the fact that StatCrunch does not accept input from Numbers, only from Excel (and for me Microsoft sucks), it gets a 5-star ranking!
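
For readers who would rather script this kind of comparison than use a web tool, here is a minimal sketch in Python. The measurements are invented for illustration (they are not my Thaumastus data), and the height/diameter ratio is just one assumed example of a shell ratio. The function name `ratio_summary` is likewise my own choice.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical shell measurements in mm: (population, height, diameter).
measurements = [
    (1, 24.1, 11.0), (1, 25.3, 11.4), (1, 23.8, 10.9),
    (2, 24.6, 11.2), (2, 25.0, 11.5), (2, 24.2, 11.1),
    (3, 27.9, 11.1), (3, 28.4, 11.3), (3, 27.5, 10.9),
    (4, 24.4, 11.1), (4, 25.1, 11.3), (4, 23.9, 11.0),
]


def ratio_summary(rows):
    """Group height/diameter ratios by population; return (n, mean, sd) per population."""
    ratios = defaultdict(list)
    for population, height, diameter in rows:
        ratios[population].append(height / diameter)
    return {
        population: (len(values), mean(values), stdev(values))
        for population, values in sorted(ratios.items())
    }


summary = ratio_summary(measurements)
for population, (n, average, sd) in summary.items():
    print(f"population {population}: n={n}, mean H/D={average:.2f}, sd={sd:.2f}")
```

A table like this does not replace a proper test or a graph, but it is often enough to see at a glance which population stands apart.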