Posts Tagged ‘Bioinformatics’


Bio::Blogs #17 (Courtesy of Mr. Claus) is now available

The seventeenth edition of the premier bioinformatics blog carnival, Bio::Blogs, is now available over at Paulo’s blog. Give it a read, why don’t you!


(Oh, and the new Futurama movie, Bender’s Big Score, is out now too! Oh happy day.)


More on data munging…

Nsaunders pointed out last week the sad trend of bioinformaticians spending more and more time parsing data into something useful and less time actually using the data.

Meta servers in particular pose both a problem and a potential solution to some of this. Since it is often helpful to query more than one server, meta servers relay your query to a number of selected servers and so spare you from copying and pasting sequences into each and every server you want to query. @Tome, for example, can take a single sequence and run structure or fold-prediction queries against six different servers simultaneously. MetaPP (although it recently began charging for most types of usage) can query an even larger number of servers with just a few clicks.

This is where the usefulness of meta servers ends, however. Although they may collect your data and put it all in a single location for you, the data is still just as jumbled and heterogeneous as if you had gone to each server and run the queries yourself. You may have saved yourself five minutes of copying and pasting, but you still have many hours of hackery ahead of you before you will be able to do anything useful with the results.

One obvious solution would be for servers to coordinate and come up with some standard format for how data that is similar in nature should be output. This isn’t going to happen anytime soon. A more realistic solution might be to handle the formatting at the meta-server level. Rather than creating web applications that simply relay queries to multiple servers and return the results as a single html file, why not first parse those results and turn them into something more readable?
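To make the idea a little more concrete, here is a rough Python sketch of what "handling the formatting at the meta-server level" could look like. Everything in it is made up for illustration: the two server names, their output formats, and the `Prediction` record are assumptions, not the formats of @Tome, MetaPP, or any real server. The point is only that each server gets its own small parser, and everything downstream sees one common record type.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    server: str      # which (hypothetical) server produced the hit
    template: str    # identifier of the predicted fold or template
    score: float     # score rescaled to 0..1, higher is better


def parse_server_a(text):
    """Hypothetical format A: tab-separated 'template<TAB>percent' lines."""
    hits = []
    for line in text.strip().splitlines():
        template, percent = line.split("\t")
        hits.append(Prediction("server_a", template, float(percent) / 100.0))
    return hits


def parse_server_b(text):
    """Hypothetical format B: 'score=0.87 id=1abc' key=value lines."""
    hits = []
    for line in text.strip().splitlines():
        fields = dict(pair.split("=", 1) for pair in line.split())
        hits.append(Prediction("server_b", fields["id"], float(fields["score"])))
    return hits


# One parser per server; adding a server means adding one entry here.
PARSERS = {"server_a": parse_server_a, "server_b": parse_server_b}


def normalize(raw_results):
    """Turn {server_name: raw_text} into one list sorted by score."""
    merged = []
    for server, text in raw_results.items():
        merged.extend(PARSERS[server](text))
    return sorted(merged, key=lambda p: p.score, reverse=True)
```

Calling `normalize({"server_a": "1abc\t92.5\n", "server_b": "score=0.87 id=1abc\n"})` would give back a single list of `Prediction` records. Putting scores from different servers on one 0..1 scale is itself an assumption here; in practice the scores may not be comparable at all, and a real meta server would probably keep the raw values alongside.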

This wouldn’t be an easy task of course, and since, as nsaunders mentioned, most servers don’t provide any API, it would mean more hacking around with cumbersome html parsing classes. It would, however, save a lot of hours for all the people who run those queries. Rather than spending days writing scripts to parse the data, the bioinformaticians could instead spend days maintaining the improved meta servers, er, move on to more important tasks like interpreting results.
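For a taste of the html hacking involved, here is a minimal sketch using only Python's standard `html.parser` module. It assumes the server returns its results as a reasonably well-formed html table whose first row holds the column names; the column names in the usage note are invented, and real result pages are usually messier than this.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = None      # row currently being built
        self._cell = None     # text pieces of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Use the first row as header names and return one dict per later row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

Fed a results table with header cells `Template` and `Score`, `table_to_records` returns a list of dicts keyed by those names, with all values left as strings. Nested tables and rows that never close their tags would need extra handling, which is exactly the kind of per-server fiddling the meta server would be taking on for everyone else.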

