Bioinformatics tools: a blog

Bioinformatics is one of many multidisciplinary fields in science. A bioinformatics lab obviously requires biological expertise, but it also needs people with experience in data mining, data analysis, software engineering, and computer science, as well as an advanced understanding of statistical methods [1].

Analysis expertise is particularly important in bioinformatics. Since the advent of high-throughput technology, data collection has become faster than ever. Now we are faced with a problem: there is so much data and not enough individuals with the skills to analyze it. This is referred to as a bioinformatics bottleneck, where data is generated much faster than it can be processed [2].

Photo credit: higyou

The bioinformatics bottleneck occurs when only a few individuals are available to analyze the large amounts of data that high-throughput technology can generate.

Because of this bottleneck, the advancement of science is delayed. Biological analysis can be slow, and it may provide only some of the information about a given experiment.

That’s where bioinformatics tools come in. The vast databases which have been built thanks to high-throughput technology are difficult to search. Tools within the BLAST family make database searching easier [3].

Software like MetaboAnalyst and Galaxy allows anyone to run statistical and machine learning models on raw data without writing code.

Visualization and simulation tools are created to help researchers better understand interactions in different fields. For example, molecular modelling tools like Abalone, Amber, and FoldX allow scientists to simulate protein interactions [4], and the Molecular Evolutionary Genetics Analysis (MEGA) software helps people build phylogenetic trees and perform statistical analysis of molecular evolution [5].

Visualization and simulation have always been of particular interest to me, because they add a degree of interpretability to raw data that anyone can understand at a glance. They make boring dataframes accessible to everyone, which is crucial for attracting interest and funding. For this reason, I have developed a visualization tool using data from the Environment Agency (EA).

The data was provided to my group during our ‘Seed of an Idea’ project at the University of Birmingham. It contains the concentrations of over 500 chemicals in different waterways around the UK over the last seventeen years.

We used this data to look for correlations between chemical concentrations and fish populations, and we ultimately hope to submit our findings to the EA to try to get some particularly dangerous chemicals banned from use.
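To give a feel for what a correlation check of this kind can look like, here is a minimal sketch in pandas. It is not the analysis from the project: the column names (`concentration`, `fish_count`) and every number in the table are invented purely for illustration.

```python
import pandas as pd

# Hypothetical yearly summary for a single site. The column names and
# numbers are invented for illustration, not taken from the real dataset.
site = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019],
    "concentration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "fish_count": [50, 40, 30, 20, 10],
})

# Pearson correlation coefficient between the two columns.
# Values near -1 mean fish counts fall as concentration rises.
r = site["concentration"].corr(site["fish_count"])
print(f"Pearson r = {r:.2f}")
```

A strong correlation like this only flags a relationship worth investigating; on its own it does not show that the chemical caused the decline.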

While I was visualizing and pre-processing the data, I generated a series of interactive maps to view the data quickly. My supervisor suggested that I make a tool that could view the entire dataset – not just the area I focused on in my research.

The tool I created lets the user quickly find any information they need from the data, without writing a single line of code. Using the pandas and NumPy libraries, I processed the data so that it could be loaded easily into the mapping software. I then used Tableau to create an interactive map that displays all the data.
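The kind of clean-up that mapping software usually needs can be sketched roughly as follows. This is an assumed example rather than the actual processing script: the column names (`sample_date`, `latitude`, `longitude`, `concentration`), the cleaning rules, and the sample rows are all hypothetical.

```python
import numpy as np
import pandas as pd

def prepare_for_mapping(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with numeric coordinates and a year column."""
    df = raw.copy()
    # Coerce text to numbers; anything unparseable becomes NaN.
    for col in ["latitude", "longitude", "concentration"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Derive the year from the sample date; bad dates become NaN.
    df["year"] = pd.to_datetime(df["sample_date"], errors="coerce").dt.year
    # Assumption: a negative concentration is a data-entry error.
    df["concentration"] = np.where(
        df["concentration"] < 0, np.nan, df["concentration"]
    )
    # A point cannot be plotted without a position, a value, and a year.
    required = ["latitude", "longitude", "concentration", "year"]
    return (
        df.dropna(subset=required)
        .astype({"year": int})
        .reset_index(drop=True)
    )

# Made-up rows: only the first one survives the cleaning rules.
raw = pd.DataFrame({
    "chemical": ["A", "B", "C", "D"],
    "sample_date": ["2010-05-01", "2011-06-15", "not a date", "2012-01-01"],
    "latitude": ["52.48", "bad", "51.5", "53.0"],
    "longitude": ["-1.89", "-1.9", "-0.12", "-2.0"],
    "concentration": ["0.5", "1.2", "0.3", "-1"],
})
cleaned = prepare_for_mapping(raw)
print(cleaned)
```

The cleaned table could then be written out with `cleaned.to_csv(...)` and loaded into a mapping tool as a data source.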

First, the user clicks on the area of the map they want to view.

This will take them to a zoomed-in area of the map, where, by default, every chemical measurement from every year is selected.

From here, the user is able to filter chemicals by their use, using a drop-down menu in the top right.

They are also able to filter the points based on the years the measurements were taken, using a slider at the bottom of the map.

If the user hovers over a point on the map, more information about that point is shown, including the chemical name, its concentration, the year it was measured, and the latitude and longitude at which the measurement was taken.

Finally, the user is able to return home by clicking the button in the lower left, and navigate to a different area from there.
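For readers who do code, the filters and hover details described above have a simple pandas analogue. The sketch below is not how Tableau works internally; the function names (`filter_measurements`, `hover_text`), the column names, and the sample rows are all hypothetical.

```python
import pandas as pd

def filter_measurements(df, use=None, year_range=None):
    """Mimic the drop-down (chemical use) and the year slider."""
    mask = pd.Series(True, index=df.index)
    if use is not None:
        mask &= df["use"] == use
    if year_range is not None:
        first, last = year_range
        # between() includes both end years by default.
        mask &= df["year"].between(first, last)
    return df[mask]

def hover_text(row):
    """Build the kind of summary shown when hovering over a point."""
    return (
        f"{row['chemical']}: {row['concentration']} "
        f"({row['year']}) at {row['latitude']}, {row['longitude']}"
    )

# Made-up measurements for illustration only.
measurements = pd.DataFrame({
    "chemical": ["Chemical A", "Chemical B", "Chemical C"],
    "use": ["pesticide", "pharmaceutical", "pesticide"],
    "year": [2008, 2012, 2016],
    "concentration": [0.5, 1.2, 0.3],
    "latitude": [52.48, 51.5, 53.0],
    "longitude": [-1.89, -0.12, -2.0],
})

selected = filter_measurements(
    measurements, use="pesticide", year_range=(2010, 2020)
)
for _, row in selected.iterrows():
    print(hover_text(row))
```

Leaving both arguments at `None` returns every row, which matches the map's default view where all chemicals and all years are selected.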

This tool can be embedded on any webpage, so it is available for use below.

Bioinformatics tools not only help scientists with visualization and simulation, but can also allow scientists who can’t code to perform statistical analysis and create machine learning models. Tools bridge gaps between the tech side and the biology side of bioinformatics, and they help to close the bottleneck by making analysis more accessible to everyone. This highlights the importance of software developers and data analysts within the field of bioinformatics.
