UPR HPCf

Workshop Abstracts

Abstracts of the talks scheduled for the Bioinformatics and Biostatistics Workshop.

The Challenge of Reverse Engineering Dynamical Models in Systems Biology

John Ambrosiano

Los Alamos National Laboratory

Systems biology is an emerging field that seeks to develop predictive models of biological processes. Presently most explorations are at the molecular and cellular level, but we can be sure these will eventually encompass much larger systems to include whole organisms and organism communities. There are many challenges to reverse engineering dynamical models of biological systems. Many current efforts are devoted to finding good ways of inferring network topology from experimental sources such as gene microarrays or other assays. Our concerns are mainly focused on the problem of fitting a dynamical model to time series data given a reasonable qualitative view of the system topology as a starting point. There are two main challenges that confront us: (1) to find methods of fitting the parameters associated with systems of nonlinear differential equations, particularly methods that are extensible to a wide range of systems, and that are also likely to be computationally scalable; and (2) once a fit is obtained, to harness machine learning methods that would allow one to explore the categories of system behaviors accessible to the model. In this talk I will describe work in progress on the first of these issues, using an example motivated by a problem in intercellular signaling connected with host-pathogen immune response.
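As an illustration of the first challenge only, the following is a minimal sketch of fitting one parameter of a nonlinear differential equation to time series data. It is not the method of the talk: the logistic model, the noise-free synthetic data, the fixed-step Runge-Kutta integrator, the golden-section search and all function names are assumptions chosen to keep the example small.

```python
# Sketch: fit the growth rate r of dx/dt = r * x * (1 - x / capacity)
# to a time series sampled at t = 0, 1, 2, ... (all names are hypothetical).

def simulate(rate, n_samples, x0=0.1, capacity=1.0, substeps=50):
    """Integrate the logistic ODE with classical RK4; return x at t = 0, 1, ..."""
    def rhs(x):
        return rate * x * (1.0 - x / capacity)

    h = 1.0 / substeps
    x = x0
    samples = [x]
    for _ in range(n_samples - 1):
        for _ in range(substeps):
            k1 = rhs(x)
            k2 = rhs(x + 0.5 * h * k1)
            k3 = rhs(x + 0.5 * h * k2)
            k4 = rhs(x + h * k3)
            x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        samples.append(x)
    return samples


def sse(rate, data):
    """Sum of squared errors between the model trajectory and the data."""
    model = simulate(rate, len(data))
    return sum((m - d) ** 2 for m, d in zip(model, data))


def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Synthetic, noise-free "observations" generated with a known rate of 0.8.
observed = simulate(0.8, 11)
fitted_rate = golden_section(lambda r: sse(r, observed), 0.1, 2.0)
```

A one-dimensional search like this does not scale to systems with many parameters, which is the extensibility and scalability concern raised in the abstract.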


Statistical Issues in Admixture Mapping for Disease

Nick J. Patterson, David E. Reich, Neil Hattengadi and David Altshuler

A number of diseases such as prostate cancer, multiple sclerosis, hypertension, type 2 diabetes and lupus show large differences in incidence between European and African populations. In each case there is good evidence that disease incidence has a partial genetic basis. A promising approach to determining genomic regions implicated in disease is to carry out an admixture study in African-Americans. We attempt to assign chromosomal regions in the genomes of African-Americans to European or African population samples, and seek regions where the mean probability of assignment to a European sample is very different from background in a population of African-American patients. We are carrying out such a study in a group of African-Americans with multiple sclerosis (MS). Some delicate statistical issues arise. In particular there is uncertainty in the composition of the parental European and African populations, and further uncertainty in the true population frequencies of the genetic markers used in our study. Misestimation of these critical model parameters can lead to apparent but spurious genomic association with disease. We describe a Bayesian analysis that deals with these difficulties. We also propose a novel Bayesian 'whole-genome' statistic. There has been considerable controversy about how significance levels in genomic disease scans should be handled, and our Bayesian statistic is a new approach bypassing most of the difficulties that have arisen with earlier methods.


Getting Usable Data from Microarrays: The Role of Statisticians

Rafael Irizarry

Johns Hopkins University

In this talk I will give some examples of why I think it is important that statisticians be involved in the preprocessing of microarray data. I will then describe a specific example related to preprocessing Affymetrix GeneChip high density oligonucleotide array raw data. High density oligonucleotide expression array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11-20 pairs of oligonucleotides separately referred to as probes. Typically 12,000 to 20,000 probe sets are arrayed on a silicon chip. RNA samples are prepared, labeled and hybridized to the arrays. Arrays are then scanned, and images produced and analyzed to obtain an intensity value for each probe. These intensities quantify the extent of the hybridization between the labeled target sample and the oligonucleotide probe. A final step to obtain expression measures is to summarize the probe intensities for a given gene in order to quantify the amount of the corresponding mRNA species in the sample. Using two extensive spike-in studies and a dilution study, we performed a careful assessment of the method of summarizing probe level data provided by the current version of the Affymetrix Microarray Suite (MAS 5.0). We found that the performance of the Affymetrix technology can be greatly improved by the use of expression measures derived from empirically motivated statistical models. The advantages of a new expression measure are assessed through bias, variance, sensitivity, and specificity. In particular, the improvements achieved by a 10-fold decrease in variability for low expression levels are demonstrated. A paper describing this example can be found on the web: http://www.biostat.jhsph.edu/~ririzarr/papers
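The summarization step described above can be sketched as follows. This is only an illustrative robust summary (the median of log2 probe intensities), not the expression measure evaluated in the talk and not the MAS 5.0 algorithm; the function name and the intensity values are made up for the example.

```python
import math
import statistics


def summarize_probe_set(intensities):
    """Summarize one probe set as the median of its log2 probe intensities."""
    if any(value <= 0 for value in intensities):
        raise ValueError("intensities must be positive to take logarithms")
    return statistics.median(math.log2(value) for value in intensities)


# Hypothetical probe set with one outlying probe (4000.0).
probe_intensities = [110.0, 120.0, 128.0, 135.0, 4000.0]

median_summary = summarize_probe_set(probe_intensities)
mean_summary = statistics.mean(math.log2(v) for v in probe_intensities)
```

In this toy probe set the median stays at the level of the four concordant probes, while the mean of the log intensities is pulled upward by the single outlier.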


Reverse Engineering of Genetic Networks Including Clustering and Error Correction

Oscar Moreno, Humberto Ortíz, M. Aviñó, D. Bollman, E. Orozco, S. Peña

University of Puerto Rico, Rio Piedras

Many research groups have described genetic networks as networks of Boolean variables, and provided procedures for reverse engineering [1, 7, 3]. Recently, Laubenbacher and colleagues proposed using finite fields to represent genes and proteins in biological networks [4, 5, 6]. We demonstrate how to generalize Boolean genetic networks to finite field genetic network models. We develop a procedure for error-correction of microarray data based on majority logic decoding, and we apply Hamming distance as a metric for clustering. We demonstrate the utility of finite fields by applying the techniques to a data set developed at the University of Puerto Rico, from rats trained in a memory task called conditioned taste aversion [2]. Our method also validates the experimental design of doing 5 replications of the experiment. Finally we also have a very efficient method for reverse engineering.
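To make the two ingredients named above concrete, here is a minimal sketch of a majority vote across replicates and of the Hamming distance between discretized expression profiles. It assumes the data have already been discretized to a small set of levels; the replicate values, the number of levels and the function names are hypothetical, and the authors' actual decoding procedure may differ.

```python
from collections import Counter


def majority_vote(replicates):
    """Consensus profile: at each time point, the most common value across replicates."""
    consensus = []
    for values in zip(*replicates):
        # On a tie, Counter.most_common keeps the value seen first.
        consensus.append(Counter(values).most_common(1)[0][0])
    return consensus


def hamming_distance(profile_a, profile_b):
    """Number of positions at which two equal-length profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(profile_a, profile_b))


# Five hypothetical replicates of one gene, discretized to levels {0, 1, 2}
# at four time points; three replicates each contain one discordant value.
replicates = [
    [0, 1, 2, 2],
    [0, 1, 2, 2],
    [0, 2, 2, 2],
    [0, 1, 2, 1],
    [1, 1, 2, 2],
]

consensus = majority_vote(replicates)
distance_to_other_gene = hamming_distance(consensus, [0, 2, 2, 1])
```

With five replicates and two levels, a vote at any position cannot tie. With three or more levels a 2-2-1 split is possible, and this sketch then resolves it arbitrarily.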


Phylogenetic Trees: Statistical Approaches

Maria Gloria Dominguez Bello

Dept. Biology UPR

DNA is the tie that links all organisms together, and sequences are not independent. We can learn about novel sequences by comparing them with sequences from other organisms and assessing their similarity. Phylogenetic inference attempts to establish relations among taxa and their evolutionary history. The aim of this talk is to stress the need for a multidisciplinary approach (biological, statistical and computational) to improve solutions of phylogenetic analyses in biology. We review currently used distance, maximum parsimony and maximum likelihood methods that attempt to estimate topology and branch length. All of them make assumptions, and none is best for all circumstances. Methods of estimation of variation in the pattern of sequence substitutions also make assumptions. GenBank has already reached 16 million sequences, and sequence comparison and inference of phylogeny demand increasing computer time as more sequences are fed into the databases and models gain complexity. Biologists increasingly use phylogenetic tools without a proper conceptual understanding. Furthermore, theoretical foundations of statistical tests are not completely established, particularly for methods of constructing and testing trees. Statistical principles need to be established with biological sense, and multidisciplinary interaction among biologists, statisticians and computer scientists is required to find appropriate solutions.
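As one example of how a distance method builds in an assumption, the sketch below computes the Jukes-Cantor distance between two aligned sequences. The Jukes-Cantor model assumes equal base frequencies and equal substitution rates among all nucleotides; it is chosen here only for illustration and is not named in the abstract. The sequences and function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences are too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two hypothetical aligned sequences that differ at 2 of 10 sites.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"

observed_proportion = p_distance(seq_1, seq_2)
corrected_distance = jukes_cantor_distance(seq_1, seq_2)
```

The corrected distance exceeds the observed proportion of differences because the model accounts for multiple substitutions at the same site. A different substitution model would give a different correction.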


Dynamical Systems Over Different Finite Fields

M. A. Aviñó and Oscar Moreno

We present a mathematical model, dynamical systems over different finite fields (DSF), and we show that Boolean and discrete genetic models are special cases of DSF. In addition, we prove that a function defined over different finite sets (or finite fields) can be represented as a polynomial function over a finite field. We describe an algorithm that, given the data of a function over different finite fields, obtains all the polynomial functions associated with these data. We apply the algorithm to solve the Reverse Engineering Problem over different finite fields.
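The simplest instance of representing a function as a polynomial over a finite field is Lagrange interpolation in one variable over a prime field GF(p). The sketch below covers only that case, under the assumption that p is prime and the function is given as a full table of values. It is not the authors' algorithm, which handles several variables and different fields, and the function names are hypothetical.

```python
def interpolate_mod_p(values, p):
    """Return coefficients c[0..p-1] with f(x) = sum(c[i] * x**i) mod p.

    values[x] must hold f(x) for every x in 0..p-1, and p must be prime.
    """
    if len(values) != p:
        raise ValueError("need exactly one value per field element")
    coeffs = [0] * p
    for a in range(p):
        # Build the numerator prod_{b != a} (x - b) and the denominator prod (a - b).
        basis = [1]
        denom = 1
        for b in range(p):
            if b == a:
                continue
            product = [0] * (len(basis) + 1)
            for i, c in enumerate(basis):
                product[i] = (product[i] - b * c) % p
                product[i + 1] = (product[i + 1] + c) % p
            basis = product
            denom = (denom * (a - b)) % p
        # Invert the denominator with Fermat's little theorem (valid for prime p).
        scale = values[a] * pow(denom, p - 2, p) % p
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % p
    return coeffs


def evaluate_mod_p(coeffs, x, p):
    """Evaluate the polynomial at x with Horner's rule, modulo p."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result


# Hypothetical update rule over GF(3): f(0) = 1, f(1) = 0, f(2) = 2.
table = [1, 0, 2]
coefficients = interpolate_mod_p(table, 3)
```

For this table the coefficients are [1, 2, 0], so the rule is f(x) = 1 + 2x over GF(3). Setting p = 2 gives the Boolean case, where the same procedure turns a truth table into a polynomial over GF(2).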


On the Applications of Objective Bayesian Model Selection in Statistical Analysis of Gene Expression Data and Phylogenetics

Pericchi L. R., Sisson S. A. and Yang Ch.

Several problems in Biostatistics are really Model Selection problems. Until recently it has not been possible to perform truly objective Bayesian analyses for the comparison of different models seeking to explain the given data; the adoption of improper prior distributions risks leading to improper posterior distributions, and in all cases incorporates arbitrary constants in the analysis. However, the adoption of proper priors may not accurately reflect true ignorance regarding component parameters. This can be especially important in the analysis of microarray gene expression data, as the propagation of such prior beliefs into the posterior can lead to differences in the rankings of genes selected for future investigation. Using recent advances in model comparison technology we advocate the use of Intrinsic and EP priors. There is also a clear problem with type (I or II) errors when repeating hypothesis tests. This is of particular importance in testing for differential expression of microarray data, where the number of genes (and therefore hypotheses) under consideration is measured in the thousands. While genome-wide significance levels may be calculated, this presents a generally unsatisfactory solution to the problem of considering large numbers of dependent tests. We examine a number of methods that propose to circumvent this problem, and conclude that the selection of the overall best model is the justified fully probabilistic solution to overall types of errors. The choice of the best mixture model is also really a model selection problem, and the general theory of EP-priors and Intrinsic Bayes Factors can be applied to mixtures. The previous applications are also relevant to phylogenetic trees, where model selection amounts to choosing the best Bayesian classification.
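The scale of the repeated-testing problem can be illustrated with a short calculation. The sketch assumes independent tests in which every null hypothesis is true, which is a simplification: the abstract stresses that tests on microarray data are dependent. The gene count of 5000 and the function names are hypothetical, and the Bonferroni threshold is shown only as the familiar baseline, not as the approach the authors advocate.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives when every null hypothesis is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that bounds the family-wise error rate by alpha."""
    return alpha / n_tests


n_genes = 5000  # hypothetical number of genes on the array
fwer = familywise_error_rate(0.05, n_genes)
false_positives = expected_false_positives(0.05, n_genes)
per_gene_level = bonferroni_threshold(0.05, n_genes)
```

At a per-test level of 0.05 this setting yields about 250 expected false positives, and the Bonferroni-corrected per-gene level becomes so small that genuinely differentially expressed genes are easily missed.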


Bioinformatics Management System to Integrate and Federate Heterogeneous Biological Databases

Jaime E. Ramírez-Vick, Manuel Rodríguez-Martínez*, Bienvenido Vélez-Rivera, and Pedro I. Rivera-Vega

General Engineering, Electrical and Computer Engineering Departments, UPR - Mayagüez

One of the biggest challenges in bioinformatics and drug discovery today is data access and integration. This issue has become a major bottleneck to R&D productivity for many biotechnology and pharmaceutical companies. The challenge exists because biomedical data sources are geographically distributed, complex and heterogeneous in data types and structures, and are constantly changing. With the unprecedented growth of genomic, proteomic, and other types of scientific data, the challenge now is how large volumes of data can first be retrieved from multiple databases, files, and instruments and then transformed and integrated automatically and flexibly. This process is critical for turning data into knowledge.

Deploying generalized integrated biomedical data services has proven to be a daunting task, particularly for sites with limited technical expertise. An alternative is the development of BioWeb, a Light-Weight Bioinformatics Database Integration Middleware System consisting of a comprehensive set of software tools designed to ease the task of deploying new specialized integrated biomedical data services (SIB) requiring integration of multiple autonomous bioinformatics databases. A SIB is a service capable of answering a relatively narrow set of data requests or queries with high effectiveness and reliability. For instance, a simple SIB may provide consolidated access to both internal and public sequence data. Another, more sophisticated SIB may exploit a patented third-party algorithm capable of accurately inferring protein function by combining sequence, family and structure data from several distributed data sources.

The initial efforts will focus on demonstrating the feasibility of this approach by developing a first generation of BioWeb software components, which include: (1) a BioWeb SQL-based Bioinformatics query language (BQL) interpreter serving as the lingua franca among a collection of BioWeb databases supporting a SIB; (2) a BioWeb extensible search and retrieval engine (SRE) capable of dynamically integrating new modules, called BioConsuls, that implement the interfaces to a heterogeneous set of data sources; and (3) an initial set of database-specific BioConsuls, translation modules responsible for the conversion and integration of data from diverse sources into and from the BQL-based model.


Gene Regulation Networks in Rat Emotional Learning

Humberto Ortiz-Zuazaga, María Alicia Aviñó-Diaz, Oscar Moreno, Sandra Peña de Ortiz

Conditioned Taste Aversion, or CTA, is a behavioral task that measures long-term memory in rats. We have measured gene expression with cDNA microarrays in rats at 4 time points after subjecting them to CTA. We have used a variety of techniques to characterize the expression time-course following CTA, including statistical analysis of differential expression, various clustering techniques, analysis of promoter regions, and reverse engineering of genetic networks.


A Parallel Algorithmic Approach to the Reverse Engineering Problem

Edusmildo Orozco, D. Bollman, O. Moreno

University of Puerto Rico

PhD CISE Program, Department of Mathematics, Mayagüez, PR 00681-9018

A genetic network is a set of genes together with a set of links that represent their interactions. In bioinformatics, the reverse engineering problem is the problem of finding an appropriate network that accounts for all interactions between these genes. O. Moreno et al. propose a finite dynamic network approach that leads to a polynomial interpolation over a finite field. Conceptually, as well as computationally, one important idea is “lifting”: a multivariate polynomial can be seen as a polynomial with only one indeterminate in the appropriate Galois field. In this work we present an efficient parallel algorithm for the solution of the reverse engineering problem based on an O(k log2 k) sequential algorithm due to J. Lipson [1971].

Created by humberto
Last modified 2003-07-23 12:03 PM
 
