BIOINFORMATICS 2011 Abstracts


Full Papers
Paper Nr: 14
Title:

DISULFIDE CONNECTIVITY PREDICTION WITH EXTREME LEARNING MACHINES

Authors:

Monther Alhamdoosh, Castrense Savojardo, Piero Fariselli and Rita Casadio

Abstract: Our paper emphasizes the relevance of the Extreme Learning Machine (ELM) in Bioinformatics applications by addressing the problem of predicting disulfide connectivity from protein sequences. We test different activation functions of the hidden neurons and show that, for the task at hand, Radial Basis Functions perform best. We also show that the ELM approach performs better than the Back Propagation learning algorithm in terms of both generalization accuracy and running time. Moreover, we find that, for disulfide connectivity prediction, it is possible to increase the prediction performance by initializing the Radial Basis Function kernels with a k-means clustering algorithm. Finally, the ELM procedure is not only very fast, but the final predicting networks also achieve an accuracy of 0.51 and 0.45, per-bond and per-pattern, respectively. Our ELM results are in line with the state-of-the-art predictors addressing the same problem.

Paper Nr: 18
Title:

MORPHOLOGICAL ANALYSIS OF 3D PROTEINS STRUCTURE

Authors:

Virginio Cantoni, Riccardo Gatti and Luca Lombardi

Abstract: The study of the 3D structure of proteins supports the investigation of their functions and represents an initial step towards protein-based drug design. The goal of this paper is to define a technique, based on the geometrical and topological structure of protein surfaces, for the detection and the analysis of sites of possible protein-protein and protein-ligand interactions. In particular, the aim is to identify concave and convex regions which constitute ‘pockets’ and ‘protuberances’ that can make up the interaction ‘active sites’. A segmentation process is applied to the solvent-excluded surface (SES) through a sequence of propagation steps applied to the region between the protein convex hull and the SES: the first phase generates the set of pockets (and tunnels), while the second (backwards) phase produces the set of protrusions.

Paper Nr: 19
Title:

AUTOMATED COUNTING OF YEAST COLONIES USING THE FAST RADIAL TRANSFORM ALGORITHM

Authors:

Jan Schier and Bohumil Kovář

Abstract: A method for counting yeast colonies in images of Petri dishes, based on the fast radial transform by Loy and Zelinsky, is introduced and evaluated in the paper. The characteristic properties of the images of yeast colonies, as produced by the setup used in the cooperating genetics laboratory, are described, the underlying counting algorithm is reviewed, and the performance of the method is evaluated using a test set of 245 images. The images in this set typically contained between 10 and 70 colonies per dish, with the colonies covering less than 10% of the dish area. The average counting error (missed colonies) on this set was under 4%. A tool implementing the method has been developed in Matlab. The motivation for developing our own colony counting program, instead of using a commercial solution, was to reuse the imaging equipment already existing in the laboratory and to adapt the tool to the needs of this laboratory. The tool provides a batch mode for processing larger image sets prepared beforehand in the darkroom, and it automates the counting process as much as possible. It is available for download or can be requested from the authors of the paper, in both cases free of charge.

Paper Nr: 26
Title:

MINIMUM MUTATION ALGORITHM FOR GAPLESS METABOLIC NETWORK EVOLUTION

Authors:

Esa Pitkänen, Juho Rousu and Mikko Arvas

Abstract: We present a method for inferring the structure of ancestral metabolic networks directly from the networks of observed species and their phylogenetic tree. Our method aims to minimize the number of mutations on the phylogenetic tree, whilst keeping the ancestral networks structurally feasible, i.e., free of reaction gaps. To this end, we present a parsimony-based method that generates metabolic network phylogenies where the ancestral nodes are required to represent gapless metabolic networks, i.e., networks where all reactions are reachable from external substrates. In particular, we introduce the gapless minimum mutation problem: finding phylogenies of gapless metabolic networks when the topology of the phylogenetic tree is given, but the content of ancestral nodes is unknown. The gapless minimum mutation problem is shown to be computationally hard to solve, even approximately. We then propose an efficient dynamic programming based heuristic that combines knowledge of both the metabolic network topology and the phylogeny of species. Specifically, the reconstruction of each ancestral network is guided by the heuristic to minimize the total phylogeny cost. We experiment by reconstructing phylogenies generated under a simple random model and derived from KEGG for a number of fungal species.
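Editorial illustration: the gaplessness condition in this abstract (every reaction reachable from external substrates) can be checked by simple forward propagation. The following is a minimal sketch and not the authors' algorithm; the reaction representation (name mapped to input and output metabolites), the function names, and the toy network are all assumptions made for illustration.

```python
def reachable_reactions(reactions, external_substrates):
    """Return the names of reactions that can fire starting from the
    external substrates.

    `reactions` is a hypothetical representation: a dict mapping a
    reaction name to a pair (input metabolites, output metabolites).
    """
    available = set(external_substrates)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (inputs, outputs) in reactions.items():
            # A reaction fires once all of its inputs are available.
            if name not in fired and set(inputs) <= available:
                fired.add(name)
                available |= set(outputs)
                changed = True
    return fired


def is_gapless(reactions, external_substrates):
    """A network is gapless here if every reaction is reachable."""
    return reachable_reactions(reactions, external_substrates) == set(reactions)


# Toy network (illustrative only): r3 needs metabolite "x", which nothing produces.
toy_network = {
    "r1": (["glucose"], ["g6p"]),
    "r2": (["g6p"], ["f6p"]),
    "r3": (["x"], ["y"]),
}
```

With `glucose` as the only external substrate, `r3` is a reaction gap, so `is_gapless(toy_network, ["glucose"])` is false; removing `r3` makes the network gapless.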

Paper Nr: 32
Title:

A MATHEMATICAL MODEL FOR THE ENHANCED CYTOPLASMIC TRANSPORT - How to Get (Faster) to the Nucleus

Authors:

Luna Dimitrio, Roberto Natalini and Luciano Milanesi

Abstract: We consider a simple model for signal transport in the cytoplasm. Following some recent experimental evidence, the standard diffusion model is supplemented by advection operated through an attachment/detachment mechanism along microtubules. This model is given by a system of partial differential equations which are cast in different dimensions and connected by suitable exchange rules. A numerical scheme is introduced, and some simulations are presented and discussed to show the performance of our model.

Paper Nr: 70
Title:

KINETIC MODELS AND QUALITATIVE ABSTRACTION FOR RELATIONAL LEARNING IN SYSTEMS BIOLOGY

Authors:

Gabriel Synnaeve, Katsumi Inoue, Andrei Doncescu, Hidetomo Nabeshima, Yoshitaka Kameya, Masakazu Ishihata and Taisuke Sato

Abstract: This paper presents a method for enabling the relational learning or inductive logic programming (ILP) framework to deal with quantitative information from experimental data in systems biology. The study of systems biology through ILP aims at improving the understanding of the physiological state of the cell and the interpretation of the interactions between metabolites and signaling networks. A logical model of the glycolysis and pentose phosphate pathways of E. coli is proposed to support our method description. We explain our original approach to building a symbolic model applied to kinetics based on the Michaelis-Menten equation, starting with the discretization of the changes in concentration of some of the metabolites over time into relevant levels. We can then use them in our ILP-based model. Logical formulae on concentrations of some metabolites, which could not be measured during the dynamic state, are produced through logical abduction. Finally, as this results in a large number of hypotheses, they are ranked with an expectation maximization algorithm working on binary decision diagrams.
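Editorial illustration: the two quantitative ingredients named in this abstract, the Michaelis-Menten rate law and the discretization of values into levels, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the threshold-based discretization scheme, the function names, and the example numbers are hypothetical.

```python
def michaelis_menten_rate(substrate, v_max, k_m):
    """Standard Michaelis-Menten rate: v = v_max * [S] / (k_m + [S])."""
    return v_max * substrate / (k_m + substrate)


def discretize(values, thresholds):
    """Map each value to a level index: the number of thresholds it
    meets or exceeds (assumed scheme, for illustration only)."""
    return [sum(1 for t in thresholds if v >= t) for v in values]


# Illustrative numbers only: concentrations mapped to low/medium/high (0/1/2).
example_levels = discretize([0.1, 0.5, 0.9], thresholds=[0.3, 0.7])
```

At a substrate concentration equal to `k_m`, the rate is half of `v_max`, which is a quick sanity check for the first function.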

Paper Nr: 74
Title:

BISTABILITY AND THE COMPLEX DEPLETION PARADOX IN THE DOUBLE PHOSPHORYLATION-DEPHOSPHORYLATION CYCLE

Authors:

Guido Dell'Acqua and Alberto Bersani

Abstract: In this paper we discuss the applicability of the standard quasi steady-state approximation (sQSSA) to complex enzyme reaction networks, like the ones involved in intracellular signal transduction. In particular, we focus on the dynamics of the intermediate complexes, which in the common literature are either ignored or supposed to rapidly become negligible in the quasi steady-state phase, contrary to what really happens. This leads to what we call the “complex depletion paradox”, according to which complexes disappear in the conservation laws, in contrast with the equations of their dynamics. Applying the total quasi steady-state approximation (tQSSA) to the double phosphorylation-dephosphorylation cycle, we show how to solve the apparent paradox without the need for further hypotheses such as substrate sequestration.

Paper Nr: 76
Title:

A NEW LATENT SEMANTIC ANALYSIS BASED METHODOLOGY FOR KNOWLEDGE EXTRACTION FROM BIOMEDICAL LITERATURE AND BIOLOGICAL PATHWAYS DATABASES

Authors:

F. Abate, A. Acquaviva, E. Ficarra and E. Macii

Abstract: Nowadays, a considerable number of genetic and biomedical studies are disseminated on the Web and freely available. While this exciting capability opens the way to new scenarios of cooperative research, it also makes knowledge retrieval and extraction an extremely time-consuming operation. In this context, the development of new tools and algorithms that automatically support scientists in achieving a reliable interpretation of the complex interactions among biological entities is mandatory. In this paper we present a new methodology aimed at quantifying the biological degree of correlation among biomedical terms present in the literature. The proposed method overcomes the limitation of current tools based on public literature information only, by exploiting the trustworthy information provided by biological pathway databases. We demonstrate how to integrate trusted pathway information in a semantic correlation extraction chain based on the UMLS Metathesaurus and relying on PubMed as the literature database. The effectiveness of the obtained results underlines the importance of automatically quantifying the degree of correlation among biomedical terms in order to support scientists' research activity.

Paper Nr: 81
Title:

PREDICTION OF REGULATORY sRNAs IN PROKARYOTES USING MACHINE LEARNING TOOLS

Authors:

Nael Abu-halaweh, Amit Sabnis and Robert Harrison

Abstract: Small RNAs (sRNAs) are vital components of prokaryotic regulatory processes. Many sRNAs exert their function via complementary base pairing with target mRNAs. Existing methods for computational identification of these molecules rely heavily on sequence and secondary structure conservation among known homologs. Machine learning tools, on the other hand, have already been proven to be effective de novo predictors for biomolecules; however, their utility in predicting sRNAs has been severely limited. In this work, we have utilized the sRNA sequence and putative secondary structure information to extract a set of features and tested them with several machine learning classification tools. Our results indicate that these features can be effectively used to distinguish the sRNA sequences from their decoys with very high accuracy, sensitivity and specificity as compared to existing methods. In addition, we have shown that these features can also be used for de novo prediction of sRNAs from an unknown genomic background. The use of machine learning can thus provide a viable medium for accurate computational identification of sRNAs.

Paper Nr: 84
Title:

GAST, A GENOMIC ALIGNMENT SEARCH TOOL

Authors:

Kalle Karhu, Juho Mäkinen, Jussi Rautio, Jorma Tarhio and Hugh Salamon

Abstract: Alignment to a genomic sequence is a common task in modern bioinformatics. By improving the methods used, a significant amount of time and resources can be saved. We have developed a new genomic alignment search tool, called GAST, for sequences of at least 160 nt. GAST is many times faster than the commonly used alignment tools BLAT and Mega BLAST. As the sizes of query sequences and the database increase, the advantage grows. This paper describes the principles of GAST and reports a comparison of GAST with BLAT and Mega BLAST. The effects that the query sequence length and the number of queries have on run times were studied using the full human genome and chromosome 1 of the human genome separately. Additionally, the error tolerance and behaviour of GAST when handling sequences with lower similarity to a database were studied. Lastly, we compared the quality of exon mappings produced by the three tools and the genomic mapping tool GMAP.

Short Papers
Paper Nr: 7
Title:

MODELING CELL PROLIFERATION ACTIVITY OF HUMAN INTERLEUKIN-3 UPON SINGLE RESIDUE REPLACEMENTS

Authors:

Majid Masso and Iosif I. Vaisman

Abstract: The signaling molecule human interleukin-3 (IL-3) is responsible for promoting the growth of a wide range of hematopoietic cell lineages in the bone marrow. In this study, we apply an in silico mutagenesis technique to investigate the effects of single amino acid substitutions in the IL-3 protein on cell proliferation activity. The computational mutagenesis, which utilizes the IL-3 protein structure as well as a knowledge-based, four-body statistical potential, empirically quantifies environmental perturbations at the mutated residue position in IL-3 and at all neighboring positions in the folded structure. In particular, mutated position perturbation scores alone are capable of characterizing IL-3 residues grouped by physicochemical, functional, or structural properties. Additionally, these scores elucidate an IL-3 structure–function relationship based on a collection of 630 single residue replacements for which activity changes were experimentally measured. A random forest classifier trained on this dataset of experimental mutants, whose respective feature vectors include environmental changes at the mutated position and at six nearest neighbors in the IL-3 structure, achieves 80% accuracy and outperforms related state-of-the-art methods.

Paper Nr: 15
Title:

ACCURATE LONG READ MAPPING USING ENHANCED SUFFIX ARRAYS

Authors:

Michaël Vyverman, Joachim De Schrijver, Wim Van Criekinge, Peter Dawyndt and Veerle Fack

Abstract: With the rise of high throughput sequencing, new programs have been developed for dealing with the alignment of a huge amount of short read data to reference genomes. Recent developments in sequencing technology allow longer reads, but the mappers for short reads are not suited for reads of several hundreds of base pairs. We propose an algorithm for mapping longer reads, which is based on chaining maximal exact matches and uses heuristics and the Needleman-Wunsch algorithm to bridge the gaps. To compute maximal exact matches we use a specialized index structure, called enhanced suffix array. The proposed algorithm is very accurate and can handle large reads with mutations and long insertions and deletions.
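Editorial illustration: the abstract names the Needleman-Wunsch algorithm as the tool for bridging gaps between chained maximal exact matches. The following is a minimal textbook sketch of global alignment with a linear gap penalty; it is not the authors' implementation, and the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative defaults, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

In a chaining mapper, a routine like this would only be run on the short unmatched stretches between consecutive exact matches, which keeps the quadratic cost of the table small.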

Paper Nr: 20
Title:

METABOLIC MODELING OF CONVERGING METABOLIC PATHWAYS - Analysis of Non-steady State Stable Isotope-resolve Metabolism of UDP-GlcNAc and UDP-GalNAc

Authors:

Hunter N. B. Moseley, Richard M. Higashi, Teresa W-M. Fan and Andrew N. Lane

Abstract: We have developed a novel metabolic modeling methodology that traces the flow of functional moieties (chemical substructures) through metabolic pathways via the deconvolution of mass isotopologue data of specific metabolites. We have implemented a general simulated annealing/genetic algorithm for parameter optimization called Genetic Algorithm for Isotopologues in Metabolic Systems (GAIMS), with a model selection method developed from the Akaike information criterion. GAIMS is tailored for analysis of ultra-high resolution, high mass-accuracy isotopologue data from Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR-MS) for interpretation of non-steady state stable isotope-resolved metabolomics (SIRM) experiments. We applied GAIMS to a time-course of uridine diphospho-N-acetylglucosamine (UDP-GlcNAc) and uridine diphospho-N-acetylgalactosamine (UDP-GalNAc) isotopologue data obtained from LNCaP-LN3 prostate cancer cells grown in [U-13C]-glucose. The best metabolic model was identified, which revealed the relative contribution of specific metabolic pathways to 13C incorporation from glucose into individual functional moieties of UDP-GlcNAc and UDP-GalNAc. Furthermore, this analysis allows direct comparison of MS isotopologue data with NMR positional isotopomer data for independent experimental cross-verification.

Paper Nr: 31
Title:

ACCURATE LATENCY CHARACTERIZATION FOR VERY LARGE ASYNCHRONOUS SPIKING NEURAL NETWORKS

Authors:

Mario Salerno, Gianluca Susi and Alessandro Cristini

Abstract: The simulation problem of very large fully asynchronous Spiking Neural Networks is considered in this paper. To this end, a preliminary accurate analysis of the latency time is made, applying classical modelling methods to single neurons. The latency characterization is then used to propose a simplified model, able to simulate large neural networks. On this basis, networks with up to 100,000 neurons and more than 100,000 spikes can be simulated in quite a short time with a simple MATLAB program. Plasticity algorithms are also applied to emulate interesting global effects such as Neuronal Group Selection.

Paper Nr: 34
Title:

BIODBLINK: MULTI-LEVEL DATA MATCHING FOR AUTOMATIC GENERATION OF CROSS LINKS AMONG BIOCHEMICAL PATHWAY DATABASES

Authors:

Jyh-Jong Tsay, Bo-Liang Wu and Hou-Ji Dai

Abstract: Most biological databases provide cross links that point to data records describing the same object in other databases. However, as more and more databases become available, manually creating and maintaining cross links becomes very time consuming, if not impossible. Existing databases provide only a small portion of all possible links. In this paper, we present a database cross link server, BioDBLink, that can automatically collect and generate cross links among biological databases. The core of BioDBLink is a data matching technique that identifies and matches data records or elements describing the same object among pathway databases. An experiment on a data set collected from several pathway, enzyme and compound databases shows that our approach is able to identify most of the cross links provided by current databases, discover a large number of missing links, and detect inconsistency and duplicate errors.

Paper Nr: 35
Title:

INFERRING MOBILE ELEMENTS IN S. CEREVISIAE STRAINS

Authors:

Giulia Menconi, Giovanni Battaglia, Roberto Grossi, Nadia Pisanti and Roberto Marangoni

Abstract: We aim at finding all the mobile elements in a genome and understanding their dynamic behavior. Comparative genomics of closely related organisms can provide the data for this kind of investigation. The comparison task requires a huge amount of computational resources, a burden which our approach alleviates by exploiting the high similarity between homologous chromosomes of different strains of the same species. Our case study is for RefSeq and two other strains of S. cerevisiae. Our fast algorithm, called REGENDER, is driven by data analysis. We found that almost all the chromosomes are composed of resident genome (more than 90% is conserved). Most importantly, the inspection of the non-conserved regions revealed that these are putative mobile elements, thus confirming that our method is useful to quickly find mobile elements. The software tool REGENDER is available online at http://www.di.unipi.it/gbattag/regender.

Paper Nr: 39
Title:

APPLYING CONCEPTUAL MODELING TO ALIGNMENT TOOLS - One Step Towards the Automation of DNA Sequence Analysis

Authors:

Maria José Villanueva, Francisco Valverde and Oscar Pastor

Abstract: Nowadays, the search for variations in DNA samples with respect to a reference sequence is performed using several bioinformatic tools. Due to the complexity of the process, none of these tools fulfills all the functionality required by biologists. For that reason, the definition of an integration process between these different tools becomes a mandatory requirement. One interesting issue is that bioinformatic tools do not comply with any standard format for expressing their output reports. As a consequence, the flow among tools must be handled manually. This paper proposes a conceptual model in order to formalize how the output from alignment tools must be produced. This work also provides a textual format based on this conceptual model. Thanks to both contributions, the integration is handled in the problem space and the related technological details are avoided. As a proof of concept of these ideas, the proposed format has been applied in a DNA sequence analysis process which uses two bioinformatic tools.

Paper Nr: 41
Title:

SalamboMiner - A Biomedical Literature Mining Tool for Inferring the Genetics of Complex Diseases

Authors:

Leonor Rib, Ricard Gavaldà, Jose Manuel Soria and Alfonso Buil

Abstract: In the Era of Information, researchers have utilized the Web to make their knowledge readily available. The Web is an important tool for improving communication in the research community. However, the large amount of information available makes it difficult to access the information that is needed. We present SalamboMiner, a text-mining tool that helps biomedical researchers to obtain the information about the genetics of complex diseases which is in the published biomedical literature. The methodology is based on the idea of co-citation: the co-citation of two concepts gives the significance of the relationship between the pair of concepts. In addition, co-citation allows us to infer new relationships that are not explicitly stated in the literature. By using a Bayesian network, we infer the significant relationships between those concepts that are co-cited in two steps.

Paper Nr: 46
Title:

SUBSET SEED EXTENSION TO PROTEIN BLAST

Authors:

Anna Gambin, Sławomir Lasota, Michał Startek, Maciej Sykulski, Laurent Noé and Gregory Kucherov

Abstract: The seeding technique has become central in the theory of sequence alignment, and there are several efficient tools applying seeds to DNA homology search. Recently, a concept of subset seeds has been proposed for similarity search in protein sequences. We experimentally evaluate the applicability of subset seeds to protein homology search. We advocate the use of multiple subset seeds derived from a hierarchical tree of amino acid residues. Our method computes, by an evolutionary algorithm, seeds that are specifically designed for a given protein family. The representation of seeds by deterministic finite automata (DFAs) is developed and built into the NCBI-BLAST software. This extended tool, named SeedBLAST, is compared to the original NCBI-BLAST on the GPCR protein family. Our results demonstrate a clear superiority of SeedBLAST in terms of efficiency, especially in the case of twilight zone hits. SeedBLAST is open source software, freely available at http://bioputer.mimuw.edu.pl/papers/sblast. Supplementary material and a user manual are also provided.

Paper Nr: 51
Title:

IMPROVED BREAST CANCER PROGNOSIS BASED ON A HYBRID MARKER SELECTION APPROACH

Authors:

L. Hedjazi, M.-V. Le Lann, T. Kempowsky-Hamon, F. Dalenc and G. Favre

Abstract: Clinical factors, such as patient age and histo-pathological state, are still the basis of day-to-day decisions in cancer management. However, with high throughput technology, gene expression profiling and proteomic sequences have recently come into widespread use for the management of cancer and other diseases. Through this work, we aim to assess the importance of using both types of data to improve breast cancer prognosis. Nevertheless, two challenges are faced in the integration of both types of information: high dimensionality and heterogeneity of data. The first challenge is due to the presence of a large number of irrelevant genes in microarray data, whereas the second is related to the presence of mixed-type data (quantitative, qualitative and interval) in the clinical data. In this paper, an efficient fuzzy feature selection algorithm is used to alleviate both challenges simultaneously. The obtained results prove the effectiveness of the proposed approach.

Paper Nr: 52
Title:

ON VACCINATION CONTROLS FOR THE SEIR EPIDEMIC MODEL WITH SUSCEPTIBLE PLUS IMMUNE POPULATIONS TRACKING THE WHOLE POPULATION

Authors:

M. De la Sen, S. Alonso-Quesada and A. Ibeas

Abstract: This paper presents a simple continuous-time linear vaccination-based control strategy for a SEIR (susceptible plus infected plus infectious plus removed populations) disease propagation model. The model takes the total population into account as a restraint on disease transmission, since its increase makes contacts between susceptible and infected individuals more difficult. The control objective is for the joint susceptible plus removed-by-immunity population to asymptotically track the total population, while the remaining population (i.e. infected plus infectious) simultaneously tends asymptotically to zero.
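Editorial illustration: the control objective in this abstract (susceptible plus removed tracking the total population while infected plus infectious tend to zero) can be observed in a generic textbook SEIR model. The sketch below is not the model or the control law of the paper: it assumes no births or deaths, a constant per-capita vaccination rate, forward Euler integration, and arbitrary illustrative parameter values.

```python
def simulate_seir(s, e, i, r, beta, sigma, gamma, vacc_rate, dt, steps):
    """Forward-Euler integration of a generic SEIR model with a constant
    per-capita vaccination rate (an assumed control, for illustration).

    s: susceptible, e: infected (not yet infectious), i: infectious,
    r: removed by immunity. The total population n is constant here.
    """
    n = s + e + i + r
    for _ in range(steps):
        new_infections = beta * s * i / n   # contacts scaled by total population
        vaccinated = vacc_rate * s          # susceptibles moved directly to removed
        ds = -new_infections - vaccinated
        de = new_infections - sigma * e
        di = sigma * e - gamma * i
        dr = gamma * i + vaccinated
        s, e, i, r = s + dt * ds, e + dt * de, i + dt * di, r + dt * dr
    return s, e, i, r


# Illustrative parameters only (not taken from the paper).
final_s, final_e, final_i, final_r = simulate_seir(
    s=990.0, e=0.0, i=10.0, r=0.0,
    beta=0.5, sigma=0.2, gamma=0.1, vacc_rate=0.05,
    dt=0.1, steps=5000,
)
```

Because the four derivatives sum to zero, the total population stays constant, and with these parameters the infected and infectious compartments decay towards zero while susceptible plus removed approaches the total.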

Paper Nr: 53
Title:

ROBUSTNESS OF EXON CGH ARRAY DESIGNS

Authors:

Tomasz Gambin, Pawel Stankiewicz, Maciej Sykulski and Anna Gambin

Abstract: Array-comparative genomic hybridization (aCGH) technology enables rapid, high-resolution analysis of genomic rearrangements. With it, genome copy number changes and rearrangement breakpoints can be detected and analyzed at resolutions down to a few kilobases. A recently proposed exon array CGH approach accurately measures copy-number changes of individual exons in the human genome. The crucial and highly non-trivial starting task is the design of an array, i.e. the choice of an appropriate (multi)set of oligos. The success of the whole high-level analysis depends on the quality of the design. Also, the comparison of several alternative array CGH designs constitutes an important step in the development of a new diagnostic chip. In this paper we deal with these two often neglected issues. We propose a new approach to measuring the quality of array CGH designs. Our measures reflect the robustness of rearrangement detection to noise (mostly experimental measurement error). The method is parametrized by the segmentation algorithm used to identify aberrations. We implemented an efficient Monte Carlo method for testing noise robustness within the DNAcopy procedure. The developed framework has been applied to the evaluation of the functional quality of several optimized array designs.

Paper Nr: 55
Title:

A COMPUTATIONAL STRATEGY TO INVESTIGATE RELEVANT SIMILARITIES BETWEEN VIRUS AND HUMAN PROTEINS - Local High Similarities between Herpes and Human Proteins

Authors:

Anna Marabotti, Corrado Cirielli, Daniela Agnese D’Arcangelo, Claudia Giampietri, Francesco Facchiano, Antonio Facchiano and Angelo M. Facchiano

Abstract: Investigating primary sequence and structural features of viral proteins/genes has revealed molecular mimicry and evolutionary relationships linking viruses to eukaryotes. The continuous improvement in sequencing techniques makes the whole genome/proteome of several microorganisms available almost daily, now making possible systematic analyses of evolutionary correlations and accurate phylogeny investigations. In the present study we set up a methodology to identify significant and relevant similarities between viral and human proteomes. To this aim, the following steps were applied: i) identification of local similarity corresponding to continuous identity over fragments at least 8 residues long; ii) filtering of results for statistical significance of the identified similarities, according to BLAST parameters for short sequences; iii) additional filters applied to the BLAST outputs, to select specific viruses. The present study indicates a novel accurate methodology to find relevant similarities among virus and human proteomes, useful to further investigate pathogenic mechanisms underlying infectious and non-infectious diseases.

Paper Nr: 63
Title:

SINGULAR VALUE DECOMPOSITION (SVD) AND BLAST - Quite Different Methods Achieving Similar Results

Authors:

Bráulio Roberto Gonçalves Marinho Couto, Macelo Matos Santoro and Marcos Augusto dos Santos

Abstract: The dominant methods to search for relevant patterns in protein sequences are based on character-by-character matching, performed by software known as BLAST. In this paper, sequences are recoded as a p-peptide frequency matrix that is reduced by singular value decomposition (SVD). The objective is to evaluate the association between statistics used by BLAST and similarity metrics used by SVD (Euclidean distance and cosine). We chose BLAST as a standard because this string-matching program is widely used for searching nucleotide and protein databases. Three datasets were used: mitochondrial-gene sequences, non-identical PDB sequences and a Swiss-Prot protein collection. We built scatter graphs and calculated the Spearman correlation (ρ) with metrics produced by BLAST and SVD. Euclidean distance was negatively correlated with bit score (>-0.6) and positively correlated with E value (>+0.7). Cosine had negative correlation with E value (>-0.7) and positive correlation with bit score (>+0.8). In addition, we performed agreement tests between SVD and BLAST in classifying protein families. For the mitochondrial gene database, we achieved a kappa coefficient of 1.0. For the Swiss-Prot sample there is an agreement higher than 80%. The fact that SVD has a strong correlation to BLAST results may represent a possible core technique within a broader algorithm.
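Editorial illustration: the two similarity metrics named in this abstract, Euclidean distance and cosine, can be computed directly on p-peptide count vectors. The sketch below is a simplification under stated assumptions and not the authors' pipeline: it omits the SVD reduction step entirely, works on raw counts, and uses hypothetical function names and toy sequences.

```python
import math
from collections import Counter


def peptide_counts(sequence, p=2):
    """Count overlapping p-peptides (substrings of length p) in a sequence."""
    return Counter(sequence[i:i + p] for i in range(len(sequence) - p + 1))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))


# Toy sequences (illustrative only).
counts_a = peptide_counts("MKVLAAGMKV")
counts_b = peptide_counts("MKVLSAGMKV")
similarity_ab = cosine_similarity(counts_a, counts_b)
distance_ab = euclidean_distance(counts_a, counts_b)
```

In the approach described by the abstract, these metrics would be applied to the SVD-reduced representation rather than to the raw counts used here.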

Paper Nr: 66
Title:

PadeNA: A PARALLEL DE NOVO ASSEMBLER

Authors:

Gaurav Thareja, Vivek Kumar, Mike Zyskowski, Simon Mercer and Bob Davidson

Abstract: Recent advances in DNA sequencing technology are resulting in ever-larger quantities of sequence information being made available to an increasingly broad segment of the scientific and clinical community. This is in turn driving the need for standard, rapid and easy-to-use tools for genomic reconstruction and analysis. As a step towards addressing this challenge, we present PadeNA (Parallel de Novo Assembler), a parallelized DNA sequence assembler with a graphical user interface. PadeNA is designed using an interface-driven architecture to facilitate code reusability and extensibility, and is provided as part of the open source Microsoft Biology Foundation. Installers and documentation are available at http://research.microsoft.com/bio/.

Paper Nr: 69
Title:

GEOMETRICAL CONSTRAINTS FOR LIGAND POSITIONING

Authors:

Virginio Cantoni, Alessandro Gaggia, Riccardo Gatti and Luca Lombardi

Abstract: The purpose of the activity described here is the morphological, and subsequently the geometrical and topological, analysis of the active sites in protein surfaces for protein-ligand docking. The approach follows a sequence of three steps: i) the solvent-excluded surface is analyzed and segmented into a number of pockets and tunnels; ii) the candidate binding sites are detected through a structural matching of pockets and ligand, both represented through a suitable Extended Gaussian Image modality; iii) the loci of compatible positions of the ligand are identified through mathematical morphology. This representation of ligand and candidate binding pockets, the comparison of the morphological similarity and the identification of potential ligand docking are the novelties of this proposal.

Paper Nr: 71
Title:

TOWARDS FAST 3D NANOPARTICLE LOCALIZATION FOR STUDYING MOLECULAR DYNAMICS IN LIVING CELLS

Authors:

Stefan Sokoll, Klaus Tönnies and Martin Heine

Abstract: Studying molecular dynamics is crucial for understanding biological processes in living cells. In principle, this is achieved by attaching fluorescent particles to molecules of interest and detecting them using fluorescence microscopy. Such analyses require fast optical techniques with a frame rate of at least 20 Hz and a resolution below the diffraction limit in all three spatial dimensions. Current approaches basically rely on determining the correlation between features of the particle’s 2D point spread function (PSF) and the focal distance to the center of the particle. However, they are still unsuitable for live cell imaging, where a refractive index mismatch is present. This mismatch leads to non-stationary optical properties of the particles on which the algorithms rely, necessitating a calibration procedure prior to every experiment. However, such calibration is almost infeasible for particles attached to living cells. We established a spinning disk confocal setup and employ quantum dots (QDs) as fluorescent particles. Corresponding models of the axial PSF features that define the distance to the center of the particle are developed and analyzed in the presence of the refractive index mismatch. We present this analysis as the basis for the future development of a 3D localization technique applicable to living cells.
Download

Paper Nr: 72
Title:

RECONFIGURABLE COMPUTING IP CORES FOR MULTIPLE SEQUENCE ALIGNMENT

Authors:

M. Lakka, A. Desarti, G. Chrysos, E. Sotiriades, I. Papaefstathiou and A. Dollas

Abstract: Multiple Sequence Alignment (MSA) is a principal tool in computational molecular biology. MSA is considered to be a very challenging problem as many software implementations suffer from quadratic time performance. Two of the best known MSA algorithms, which offer high accuracy and great speed, are T-Coffee and MAFFT. Reconfigurable technology provides a dramatic reduction of execution time by taking advantage of high parallelism. It also allows for different problem sizing solutions within a generic intellectual property (IP) core. This paper presents the implementation of the MAFFT and T-Coffee algorithms on present-day Field Programmable Gate Arrays (FPGAs). The performance of the FPGA systems is compared against software implementations, concluding that the parallelism of reconfigurable technology can offer significant computational power to the bioinformatics community.
Download

Paper Nr: 77
Title:

A HIGH ACCURACY CT BASED FEM MODEL OF THE LUMBAR SPINE TO DETERMINE ITS BIOMECHANICAL RESPONSE

Authors:

A. Tsouknidas, N. Michailidis, S. Savvakis, K. Anagnostidis, K.-D. Bouzakis and G. Kapetanos

Abstract: The lumbar spine is the origin of the most frequent complaints among all human body parts, since almost 80% of the population will at some point in life exhibit back-related pathologies, the majority of which will not require invasive surgery. Regardless of the cause or the development of the problem, the in-depth investigation of its cause is of the utmost importance during treatment or preoperative evaluation. In this context, a model of the L1-L5 vertebrae, capable of accurately assessing the biomechanical response of the lumbar spine to human activity as well as externally induced loads, would be an effective tool during the examination of normal or clinical conditions. This study presents a CT based FEM model of the lumbar spine taking into account all function-related boundary conditions such as mechanical property anisotropy, ligaments, contact elements, mesh size, etc. The developed model is capable of comparing the mechanical response of a healthy lumbar spine to any given pathology, which can be easily introduced into the model, thus providing valuable insight into the stress development within the model and predicting critical movements and loads of potential patients.
Download

Paper Nr: 78
Title:

MODELING INTERNAL RADIATION THERAPY

Authors:

Egon L. van den Broek and Theo E. Schouten

Abstract: A new technique is introduced to model (internal) radiation therapy. It is founded on morphological processing, in particular distance transforms. Its formal basis is presented as well as its implementation via the fast exact Euclidean distance (FEED) transform. Its use for all variations of internal radiation therapy is described. In a benchmark, FEED proved to be truly exact as well as faster than a comparable technique. In particular, this 100% accuracy can be of crucial importance for radiation therapy purposes, as the balance between maximization of treatment effect and doses that cause unwanted damage to healthy tissue is fragile. Through the modeling technique presented here this balance can be secured.
Download

Paper Nr: 79
Title:

A THEORETICAL ANALYSIS OF SCALABILITY OF THE PARALLEL GENOME ASSEMBLY ALGORITHMS

Authors:

Munib Ahmed, Ishfaq Ahmad and Samee Khan

Abstract: The rapid growth of sequenced genomic data over the last two decades has far exceeded the advancement of both the algorithms and the computing horsepower required to expeditiously process and analyze it. Several algorithms have been devised and implemented to assist the process of genome fragment assembly, one of the most challenging and computationally intensive processes, which may take weeks for large genomes. A few such algorithms have also been parallelized to speed up the process. However, there is a need to analyze such parallel algorithms using the specific metrics of parallel computing to ascertain their scalability and efficiency. The fact that the problem size can vary from a few million units of data to several billions, along with the vast differences in the degree of repetition in data sets, calls for the ability to establish an association between the nature of the problem and the algorithm that best solves it. This paper analyzes the scalability of two of the most widely used parallel genome assembly algorithms using the isoefficiency metric (Grama, Gupta & Kumar 1993), which will help provide a guideline to determine when and how to choose a particular genome assembly technique based on the nature and the size of the problem being solved.
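
For readers unfamiliar with the metric, the standard isoefficiency relation can be written as below. The symbols are the conventional ones and are assumptions here, not notation taken from this abstract: W is the problem size (serial work), p the number of processors, T_P the parallel run time, T_o(W, p) the total parallel overhead, and E the efficiency to be held constant.

```latex
E = \frac{W}{p\,T_P} = \frac{W}{W + T_o(W,p)} = \frac{1}{1 + T_o(W,p)/W},
\qquad
W = \frac{E}{1-E}\,T_o(W,p) = K\,T_o(W,p)
```

The growth rate of W as a function of p that satisfies the second relation is the isoefficiency function; a slowly growing function indicates a more scalable algorithm.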

Paper Nr: 80
Title:

LITTER EFFECT IN MOUSE PHENOTYPIC STUDIES

Authors:

Petr Simecek, Maria Dzur-Gejdosova, Irena Chvatalova and Jiri Forejt

Abstract: The laboratory mouse is the most common mammalian model organism for research on human body functions and disorders. For experimental purposes, mice selected from inbred strains, developed by many generations of brother-sister crosses, are usually used. Individual mice of a given inbred strain are therefore considered genetically identical. However, our preliminary observations suggest that for a number of phenotypic traits mice originating from the same litter are significantly more similar than mice coming from different litters of the same inbred strain. We estimated the size of this litter effect for a number of traits in several phenotypic studies. By means of simulation we showed that ignoring the litter effect may result in a several-fold higher false positive rate and a severe underestimation of the minimal sample size.
Download

Paper Nr: 92
Title:

BIO-INFORMATICS IN THE LIGHT OF THE MAXIMUM ORDINALITY PRINCIPLE - The Case of Duchenne Muscular Dystrophy

Authors:

Corrado Giannantoni

Abstract: In a previous paper (presented at the Third International Conference on Bioinformatics) we have shown that Protein Folding, although considered as being an “intractable” problem that would require thousands of years to be solved, in reality can be solved in less than 10 minutes when modeled in terms of Incipient Differential Calculus (IDC). Such an evaluation was specifically made with reference to Dystrophin, precisely because, being made up of about 100,000 atoms, it represents the largest protein in a human being. Consequently it can be considered as being the most significant ostensive example in the context of such Informatics problems. The present paper aims to show that the folding of Dystrophin can also be run on a simple PC in less than two hours, as a consequence of very profound “symmetry” properties of the Ordinal Matrices that characterize the mathematical model adopted. The same happens in the case of dynamic interactions, such as Molecular Docking and computer-aided Drug Design, which can be obtained in absolutely comparable computation time. This is also why, by keeping the original reference to Dystrophin, we assumed Duchenne Muscular Dystrophy as the pertinent corresponding example. The paper will also point out that such advantages are strictly referable to a different gnoseological (and mathematical) approach based on the Maximum Ordinality Principle, which can be considered as being the most advanced Ordinal Self-organization Principle for living (and also non-living) Systems.
Download

Paper Nr: 5
Title:

CUDA PERFORMANCE IN DNA ANALYSIS - Analysis of CUDA Architecture Performance in DNA Analysis

Authors:

Daniel Cadete, António dos Anjos, Hamid Reza Shahbazkia and Richard Christen

Abstract: It is shown how NVIDIA CUDA can contribute to the analysis of DNA data. Three approaches are described, one using CPU threads and the other two using CUDA. For both CUDA approaches, advantages and disadvantages related to the CUDA architecture are presented. It is shown that, in some cases, one of these two approaches is competitive with the approach using CPU threads, and that the other approach may produce good results in the near future.

Paper Nr: 10
Title:

IMAGE ANALYSIS COMBINED FLUORESCENCE MICROSCOPY - Examples of ImageJ Software Application in Yeast Studies

Authors:

Evgeny O. Puchkov

Abstract: For Saccharomyces cerevisiae yeast studies, three approaches have been developed. They are based on applying image analysis (ImageJ software) to the treatment of fluorescence microscopy data. The first is a computer-aided fluorescence microscopy procedure for quantifying damaged cells in an ethanol-producing yeast culture. It was shown to be applicable for the assessment of the culture viability. The second is a means of characterizing Brownian motion of the insoluble polyphosphate complexes in the vacuoles. Using this approach, the apparent viscosity in the vacuoles was measured. The third is a method for locating intracellular sites/targets of nucleic acid intercalators. This method may be of help in designing new DNA-targeted drugs and in preliminary studies of their interaction with eukaryotic cells.
Download

Paper Nr: 17
Title:

CHAOS LEVEL INVESTIGATION OF CENTRE-OF-PRESSURE SINGLE-STEP DISPLACEMENT IN STATIC AND DYNAMIC VISUAL CONDITIONS

Authors:

Lili Pei, Shujia Qin, Wei Ding, Lei Miao and Hongyi Li

Abstract: As a convenient and feasible measure of postural control, centre-of-pressure (CoP) trajectories are investigated in most postural research. The characteristics extracted from CoP trajectories provide valuable evidence for exploring the nature of postural control. In this research, Shannon entropy is introduced into CoP trajectory analysis to reveal random characteristics of human upright postural control. In our Shannon entropy analysis, the chaos level of CoP single-step displacement is inspected in static and dynamic visual conditions. Experimental results from twenty-one subjects under four visual conditions indicate that human postural control in upright stance appears more regulated in direction control than in amplitude control. This conclusion has specific significance for postural experiment design and postural control improvement.
Download
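
As a rough illustration of the kind of measurement this abstract describes, the following Python sketch computes the Shannon entropy of the amplitude and the direction of single-step CoP displacements. It is only a sketch under stated assumptions: the function names, the histogram-based probability estimate and the bin count of 16 are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def shannon_entropy(values, bins, value_range):
    """Shannon entropy (in bits) of a histogram estimate of the distribution."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()  # drop empty bins, normalise
    return float(-(p * np.log2(p)).sum())

def step_entropies(cop_xy, bins=16):
    """Entropy of single-step amplitude and direction for an (N, 2) CoP path.

    The bin count is an assumed parameter; the paper's discretisation
    is not given in the abstract.
    """
    steps = np.diff(np.asarray(cop_xy, dtype=float), axis=0)
    amplitude = np.hypot(steps[:, 0], steps[:, 1])
    direction = np.arctan2(steps[:, 1], steps[:, 0])
    h_amp = shannon_entropy(amplitude, bins, (0.0, amplitude.max()))
    h_dir = shannon_entropy(direction, bins, (-np.pi, np.pi))
    return h_amp, h_dir

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path = np.cumsum(rng.normal(size=(500, 2)), axis=0)  # synthetic random walk
    print(step_entropies(path))
```

With this estimate, a lower entropy in one component than in the other would be read as that component being more regulated, which is the kind of comparison the abstract reports between direction and amplitude.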

Paper Nr: 22
Title:

MUTATIONAL DATA LOADING ROUTINES FOR HUMAN GENOME DATABASES - The BRCA1 Case

Authors:

Matthijs van der Kroon, Ignacio Lereu Ramirez, Ana M. Levin, Óscar Pastor and Sjaak Brinkkemper

Abstract: Over the last decades, a large amount of research has been done in the genomics domain, which has generated and is still generating terabytes, if not exabytes, of information stored globally in a very fragmented way. Different databases use different ways of storing the same data, resulting in undesired redundancy and restrained information transfer. In addition, keeping the existing databases consistent and maintaining data integrity is mainly left to human intervention, which is error prone and very costly in both time and money. Identifying a fixed conceptual dictionary in the form of a conceptual model thus seems crucial. This paper presents an effort to integrate the mutational data from the established genomic data source HGMD into a conceptual-model-driven database, HGDB, thereby providing useful lessons to improve the already existing conceptual model of the human genome.
Download

Paper Nr: 25
Title:

INTERACTIVE VISUALIZATION TOOL FOR TUMOR GROWTH SIMULATIONS

Authors:

Rafal Wcislo

Abstract: We present the main requirements and ready-to-use components of an interactive visualization tool for modeling solid tumor proliferation. As the simulation engine it uses the complex automata paradigm, which integrates cellular automata with particle dynamics. To make it sufficiently fast for interactive visualization, we show that the system can be efficiently implemented on multicore workstations with a moderate number of processors controlled by a data-parallel interface such as OpenMP. In the near future the system will be empowered by a combined CPU and GPU computational environment. This in silico lab system is intended for medical laboratories doing research in oncology and/or in anticancer drug design.
Download

Paper Nr: 28
Title:

EXPERIMENTAL RESULTS ON MULTIPLE PATTERN MATCHING ALGORITHMS FOR BIOLOGICAL SEQUENCES

Authors:

Charalampos S. Kouzinopoulos, Panagiotis D. Michailidis and Konstantinos G. Margaritis

Abstract: With the remarkable increase in the number of DNA and protein sequences, it is very important to study the performance of multiple pattern matching algorithms when querying sequence patterns in biological sequence databases. In this paper, we present a performance study of the running time of well known multiple pattern matching algorithms on widely used biological sequence databases containing the building blocks of nucleotides (in the case of nucleic acid sequence databases) and amino acids (in the case of protein sequence databases).
Download

Paper Nr: 29
Title:

MODELING DELAYS IN STATE TRANSITION OF A BISTABLE GENETIC SWITCH UNDER THE INFLUENCE OF EXTRINSIC NOISE

Authors:

Jaroslav Albert and Marianne Rooman

Abstract: Among other functions, bistable genetic switches serve as decision-makers, accepting or rejecting noisy input signals. In some instances, e.g. during developmental stages, it is imperative that, once an input signal is accepted, the gene’s expression remains virtually unchanged for a certain period of time before evolving to its other stationary state. In this paper, we aim to tackle the question of what causes this delay to occur. We look at a particular model of a bistable switch and study the conditions which lead to delayed state transitions. Given that every biological system is subject to noise, it is imperative that any model capable of explaining and predicting these delays is robust against random parameter perturbations. Therefore, in order to test the robustness of the model, we subject the system to random noise and show that for particular combinations of parameter values, its effects on the delays are negligible. It is demonstrated that the ratio of protein to mRNA degradation rates plays a critical role in the system’s confidence to generate accurate delays.
Download

Paper Nr: 30
Title:

THE MAMDANI CONTROLLER IN PREDICTION OF THE SURVIVAL LENGTH IN ELDERLY GASTRIC PATIENTS

Authors:

Hang Zettervall, Elisabeth Rakus-Andersson and Henrik Forssell

Abstract: Strict analytic formulas are the tools derived for determining the formal relationships between a sample of independent variables and a variable which they affect. If we cannot formalize the function tying the independent and dependent variables then we will utilize fuzzy control actions. The algorithm is particularly adaptable to support the problem of prognosticating the survival length for gastric cancer patients. We thus formulate the objective of the current paper as the utilization of fuzzy control action for the purpose of making the survival prognoses.
Download

Paper Nr: 33
Title:

OPTIMIZATION OF A SOLID STATE FERMENTATION BASED ON RADIAL BASIS FUNCTION NEURAL NETWORK AND PARTICLE SWARM OPTIMIZATION ALGORITHM

Authors:

Badia Dandach-Bouaoudat, Farouk Yalaoui, Lionel Amodeo and Françoise Entzmann

Abstract: Radial basis function neural network (RBF) and particle swarm optimization (PSO) are used to model and optimize a solid state fermentation (SSF) for production of the enzyme. Experimental data reported in the literature are used to investigate this approach. The response surface methodology (RSM) is applied to optimize PSO parameters. Using this procedure, two artificial intelligence techniques (RBF-PSO) have been effectively integrated to create a powerful tool for bioprocess modelling and optimization. This paper describes the applications of this approach for the first time in the solid state fermentation optimization.
Download

Paper Nr: 36
Title:

MICROARRAY SYSTEM - A System for Managing Data Produced by DNA-microarray Experiments

Authors:

Alberto Calvi, Pietro Lovato, Simone Marchesini, Barbara Oliboni, Massimo Delledonne and Alberto Ferrarini

Abstract: In this paper, we present the Microarray System which is based on a MIAME-compatible database and allows the users to store and retrieve data produced by experiments made with the DNA-microarray technology. This system was designed and implemented for managing data coming from the Functional Genomics Centre (FGC) of the University of Verona.
Download

Paper Nr: 37
Title:

A METHOD TO IMPROVE THE ACCURACY OF PROTEIN TORSION ANGLES

Authors:

J. C. Calvo, J. Ortega, M. Anguita, J. Taheri and A. Zomaya

Abstract: Protein structure prediction (PSP) is an open problem with many useful applications in disciplines such as Medicine, Biology and Biochemistry. As this problem presents a vast search space where the analysis of each protein structure requires a significant amount of computing time, it is necessary to propose efficient search procedures in this very large space of possible protein conformations. Thus, an important issue is to add vital information (such as rotamers) to the process to decrease its active search space; rotamers give statistical information about torsional angles and conformations. In this paper, we propose a new method to refine the torsional angles of a protein to remake/reconstruct its structures with more resemblance to its original structure. This approach could be used to improve the accuracy of the rotamer libraries and/or to extract information from the Protein Data Bank to facilitate solution of the PSP problem.
Download

Paper Nr: 54
Title:

A COMPUTATIONAL MODELLING APPROACH TO EXPLORE THE ANTI-MICROBIAL PRO-DRUG DELIVERY SYSTEM

Authors:

James T. Murphy, Ray Walshe and Marc Devocelle

Abstract: This article documents simulations using an agent-based modelling approach to analyse the system dynamics of the β-lactamase-dependent therapeutic activation pro-drug delivery system, a novel approach for achieving selective release of anti-microbial drugs for treating antibiotic-resistant bacteria. It is thought that this strategy could be a promising approach for treating β-lactamase over-expressing strains of bacteria that are resistant to traditional β-lactam antibiotics such as penicillin. Test simulations were carried out to investigate the pro-drug system from a theoretical standpoint and assess the effects of key parameters such as half-life, diffusion rate and reaction kinetics on the system behaviour. It is important to obtain a thorough understanding of the complex interplay between the various components involved in the pro-drug delivery system to be able to interpret results from laboratory testing, and ultimately, from the clinical setting. The agent-based model described here represents an important stepping stone in connecting the theoretical and practical understanding of the system as a whole.
Download

Paper Nr: 57
Title:

UNIFIED MODELING OF SEVERAL PERTURBATION EXPERIMENTS IN SYSTEMS BIOLOGY - A Case Study on the Glucose Uptake of Lactococcus Lactis

Authors:

András Hartmann, Susana Vinga and Joao M. Lemos

Abstract: Dynamic modeling of metabolism is one of the main research areas of systems biology. A typical but as yet unresolved problem is the modeling of glucose uptake of Lactococcus lactis bacteria based on in vivo NMR measurements in perturbation experiments. Most modelers focus on the inverse problem, namely identifying the parameters of a set of differential equations using the available dataset. The majority of the available models suffer from the drawback that, even if a perfect fit to a single experiment is achieved, they cannot explain the system’s behavior in different experimental conditions. The aim of this study is to introduce an appropriate method and a model to fit one set of parameters to several different experiments, enabling unified modeling of the glucose decay of the bacteria. With the proposed approach a good overall fit to the dataset was obtained. The results confirm that this could be a future way towards unified modeling of data with heterogeneous experimental conditions.
Download

Paper Nr: 65
Title:

UNREVEALING BIOLOGICAL PROCESS WITH LINEAR ALGEBRA - Extracting Patterns from Noisy Data

Authors:

Bráulio Roberto Gonçalves Marinho Couto, Marcelo Matos Santoro and Marcos Augusto dos Santos

Abstract: Extracting patterns from protein sequence data is one of the challenges of computational biology. Here we use linear algebra to analyze sequences without the requirement of multiple alignments. In this study, the singular value decomposition (SVD) of a sparse p-peptide frequency matrix (M) is used to detect and extract signals from noisy protein data (M = USV^T). The central matrix S is diagonal and contains the singular values of M in decreasing order. Here we give sense to the biological significance of the SVD: the singular value spectrum, visualized as scree plots, reveals the main components, that is, the processes hidden in the database. This information can be used in many applications such as clustering, gene expression analysis, immune response pattern identification, characterization of protein molecular dynamics and phylogenetic inference. The visualization of the singular value spectrum from SVD analysis shows how many processes can be hidden in a database and can help biologists to detect and extract small signals from noisy data.
Download
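
The decomposition named in this abstract can be illustrated with a short Python sketch. It is a toy example under stated assumptions: the helper name, the three made-up sequences, the choice p = 2, the orientation of the matrix (peptides as rows, sequences as columns) and the use of a dense NumPy array instead of a sparse matrix are all hypothetical and not taken from the paper.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def peptide_frequency_matrix(sequences, p=2):
    """Count overlapping p-peptides; rows are peptides, columns are sequences."""
    index = {"".join(k): i for i, k in enumerate(product(AMINO_ACIDS, repeat=p))}
    M = np.zeros((len(index), len(sequences)))
    for j, seq in enumerate(sequences):
        for start in range(len(seq) - p + 1):
            row = index.get(seq[start:start + p])
            if row is not None:  # skip peptides with non-standard letters
                M[row, j] += 1
    return M

# Made-up toy sequences, for illustration only.
sequences = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSGGSGGSG"]
M = peptide_frequency_matrix(sequences, p=2)

# M = U S V^T; NumPy returns the singular values s in decreasing order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(M.shape, s)
```

Plotting `s` against its index gives the scree plot mentioned in the abstract. For realistic databases a sparse representation and a truncated solver would be the more natural choice than the dense decomposition used here.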

Paper Nr: 73
Title:

BINDING FREE ENERGY CALCULATION VIA MOLECULAR DYNAMICS SIMULATIONS FOR A miRNA:mRNA INTERACTION

Authors:

G. Paciello, A. Acquaviva, E. Ficarra, M. A. Deriu, A. Grosso and E. Macii

Abstract: In this paper we present a methodology to evaluate the binding free energy of a miRNA:mRNA complex through Molecular Dynamics-Thermodynamic Integration simulations. We applied our method to the C. elegans let-7 miRNA:lin-41 mRNA complex, known to be a validated miRNA:mRNA interaction, in order to evaluate the energetic stability of the structure. The methodology has been designed to face the various challenges of nucleic acid simulations and binding free energy computations and to allow an optimal trade-off between accuracy and computational cost.
Download

Paper Nr: 98
Title:

TEXTURE ANALYSIS OF MILK PROTEIN GELS USING DIGITAL IMAGE ANALYSIS

Authors:

Juan Pablo Costa, Horacio Castellini, Patricia Risso and Bibiana Riquelme

Abstract: Sodium caseinate (NaCAS) is a very useful ingredient in the food industry because of its nutritional and functional properties. Acidification produces a gel structure as a result of the dissociation and aggregation of caseinic fractions. These protein gels can be formed by slowly reducing the pH through the addition of glucono-delta-lactone (GDL). Depending on its concentration and temperature, the hydrolysis speed of GDL can affect the degree of hardness and elasticity of the formed gel. This study evaluated the effect of different proportions of GDL on the formation and structure of protein gels through analysis of digital images obtained with an inverted conventional microscope and a confocal microscope. The entropy, smoothness and variance decrease with the added GDL quantity, but the uniformity increases. Results confirm that the texture depends on gelification speed, which is directly related to the amount of added GDL. This digital image analysis technique using conventional or confocal microscopy is, therefore, suitable and very useful for the texture analysis of acid gels formed with different GDL/NaCAS ratios.
Download

Paper Nr: 100
Title:

COLOUR CORRECTION FOR ASSESSING PLANT PATHOLOGY USING LOW QUALITY CAMERAS

Authors:

Sion Hannuna, Timo Kunkel, Nantheera Anantrasirichai and Nishan Canagarajah

Abstract: We describe a framework for standardising the colour of plant images taken using both mobile phones and compact cameras. This is with a view to maximising the accuracy of plant pathology diagnosis. Rather than attempt to characterise each camera, we place cheap and easily portable custom colour targets in the scene being captured. We propose a novel weighted least squares formulation for optimising the transformation function deduced for each image, where the relative contribution for each patch is proportional to the number of closely matching pixels in the image to be transformed. We also introduce our custom colour target which has been designed to preferentially map plant colours and facilitate simple automatic extraction from the image being processed. We provide subjective and objective results demonstrating the efficacy of our approach. It is anticipated that the methods described here could be applied to any application where perceptual consistency is of value.
Download
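
The weighted least squares idea in this abstract can be sketched in Python as follows. This is a generic illustration, not the authors' formulation: the affine form of the transformation, the function names and the toy patch values are assumptions, and the weights simply stand in for the "number of closely matching pixels" per patch mentioned above.

```python
import numpy as np

def fit_weighted_colour_transform(measured, reference, weights):
    """Fit an affine RGB transform T (4x3) minimising the weighted squared error.

    measured, reference: (n_patches, 3) colours of the target patches as
    captured and as they should appear. weights: (n_patches,) non-negative.
    """
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.sqrt(np.asarray(weights, dtype=float))[:, None]
    X = np.hstack([measured, np.ones((len(measured), 1))])  # add offset column
    # Scaling both sides by sqrt(weight) turns ordinary into weighted least squares.
    T, *_ = np.linalg.lstsq(w * X, w * reference, rcond=None)
    return T

def apply_colour_transform(T, pixels):
    """Apply the fitted transform to an (n, 3) array of RGB values."""
    pixels = np.asarray(pixels, dtype=float)
    return np.hstack([pixels, np.ones((len(pixels), 1))]) @ T

# Toy data: six hypothetical patches and hypothetical per-patch pixel counts.
measured = np.array([[0.1, 0.2, 0.3], [0.9, 0.1, 0.2], [0.2, 0.8, 0.1],
                     [0.3, 0.3, 0.9], [0.5, 0.5, 0.5], [0.7, 0.6, 0.2]])
true_A = np.array([[1.1, 0.0, 0.05], [0.02, 0.9, 0.0], [0.0, 0.1, 1.2]])
true_b = np.array([0.01, -0.02, 0.03])
reference = measured @ true_A + true_b
weights = np.array([5.0, 1.0, 20.0, 3.0, 8.0, 2.0])

T = fit_weighted_colour_transform(measured, reference, weights)
corrected = apply_colour_transform(T, measured)
```

Patches whose colours occur often in the image receive larger weights, so the fitted transform is most accurate for the colours that matter most in that particular image.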