Gene Informatics



Evolutionary informatics merges theories of evolution and information, thereby wedding the natural, engineering, and mathematical sciences. Evolutionary informatics studies how evolving systems incorporate, transform, and export information. The Evolutionary Informatics Laboratory explores the conceptual foundations, mathematical development, and empirical application of evolutionary informatics. The principal theme of the lab’s research is teasing apart the respective roles of internally generated and externally applied information in the performance of evolutionary systems.


Intelligent design is the study of patterns in nature best explained as the product of intelligence. So defined, intelligent design seems unproblematic. Archaeology, forensics, and the search for extraterrestrial intelligence (SETI) all fall under this definition. In each of these cases, however, the intelligence in question could be the result of an evolutionary process. But what if patterns best explained as the product of intelligence exist in biological systems? In that case, the intelligence in question would be an un-evolved intelligence. For most people, such an intelligence has religious connotations, suggesting that it, as well as its activities, cannot properly belong to science. Simply put, intelligent design, when applied to biology, seems to invoke ‘spooky’ forms of causation that have no place in science.

Evolutionary informatics eliminates this difficulty associated with intelligent design. By looking to information theory, a well-established branch of the engineering and mathematical sciences, evolutionary informatics shows that patterns we ordinarily ascribe to intelligence, when arising from an evolutionary process, must be referred to sources of information external to that process. Such sources of information may then themselves be the result of other, deeper evolutionary processes. But what enables these evolutionary processes in turn to produce such sources of information? Evolutionary informatics demonstrates a regress of information sources. At no place along the way need there be a violation of ordinary physical causality. And yet, the regress implies a fundamental incompleteness in physical causality’s ability to produce the required information. Evolutionary informatics, while falling squarely within the information sciences, thus points to the need for an ultimate information source qua intelligent designer.


Weasel Ware

A web-based simulation of Dawkins’ Weasel. Does the mere presence of replication, mutation and selection guarantee success in a search? If not, what fraction of all possible fitness functions leads to success? How likely are we to stumble across a successful fitness function if we initialize one at random? The free simulation lab allows users to answer these questions and more. Simply set up your experiment, choose (or optionally design) your fitness function and record your results.
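The replicate, mutate and select loop behind a Weasel-style search can be sketched as follows. This is a minimal illustration, not the code used by Weasel Ware: the population size, mutation rate, the choice to keep the parent in the selection pool, and all function names are assumptions made for the example. The fitness function is a parameter, so a different one can be swapped in to see how the outcome depends on it.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "
TARGET = "METHINKS IT IS LIKE A WEASEL"


def hamming_fitness(candidate, target=TARGET):
    """Count positions where the candidate already matches the target."""
    return sum(c == t for c, t in zip(candidate, target))


def mutate(parent, rate, rng):
    """Copy the parent, replacing each character with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent
    )


def weasel(target=TARGET, fitness=hamming_fitness, offspring=100,
           rate=0.05, max_generations=10000, seed=0):
    """Return (generations used, final string); generations is None on failure."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in target)
    for generation in range(max_generations + 1):
        if parent == target:
            return generation, parent
        children = [mutate(parent, rate, rng) for _ in range(offspring)]
        # Assumption: the parent competes with its children, so fitness never drops.
        parent = max(children + [parent], key=fitness)
    return None, parent


if __name__ == "__main__":
    generations, best = weasel(seed=1)
    print(generations, best)
```

Passing a fitness function that carries no information about the target (for example, one that returns a constant) turns the same loop into a blind search, which is the comparison the questions above are asking about.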


Minivida is a simplified, online simulation of Lenski et al.’s Avida. It allows the evolution and visualization of NAND-logic programs. With it, you can learn how the choice of rewarded tasks and instructions affects the ability to locate EQU operations. Does the process of mutation and selection alone allow the evolution of complex operations, or is the presence of stair-step information necessary in order to guide the process? Run the simulation and see what role user-supplied information plays in the discovery of complex features.
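To make the EQU task concrete, the sketch below builds bitwise equality out of NAND operations alone. It is only an illustration of why EQU counts as a complex operation when NAND is the sole logic primitive; the 32-bit register width and the function names are assumptions, and this is not Minivida's own code.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit registers


def nand(a, b):
    """Bitwise NAND, truncated to the register width."""
    return ~(a & b) & MASK


def equ_from_nand(a, b):
    """Bitwise equality (XNOR) built from five NAND operations."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    xor = nand(n2, n3)     # four NANDs give XOR
    return nand(xor, xor)  # a fifth NAND negates it


if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    print(format(equ_from_nand(a, b), "032b"))
```

The intermediate values (`n1`, `xor` and so on) correspond to simpler logic functions, which is where rewarding intermediate tasks can supply stair-step guidance.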

Ev Ware

A JavaScript implementation of Tom Schneider’s ev, with several modes and upgrades. It allows for real-time, multi-run evolution experiments. With Ev Ware, you can test and limit how much information is imposed on the search by the structure of the program and by particular parameter settings. Discover how likely randomized initializations are to stumble upon targets of your choice. Does ev create information from “scratch” or does it merely reshuffle pre-supplied information? Find out.
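One way to quantify how much information a program's structure contributes to a search is the active information measure used in the lab's papers: the log-ratio of the assisted search's success probability q to the baseline blind search's success probability p. The sketch below only evaluates that formula; the function names and the example probabilities are hypothetical and are not taken from ev or Ev Ware.

```python
import math


def endogenous_information(p):
    """Difficulty of the problem in bits: -log2 of the blind-search success probability."""
    return -math.log2(p)


def exogenous_information(q):
    """Remaining difficulty in bits for the assisted search: -log2 of its success probability."""
    return -math.log2(q)


def active_information(p, q):
    """Bits contributed by the search structure: log2(q / p)."""
    return math.log2(q / p)


if __name__ == "__main__":
    p = 1 / 1024  # hypothetical: blind search hits the target once in 1024 tries
    q = 1 / 2     # hypothetical: the assisted search succeeds half the time
    print(endogenous_information(p), exogenous_information(q), active_information(p, q))
```

In practice p and q would have to be estimated, for example by counting successes over many randomized runs.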


  • The Search for a Search: Measuring the Information Cost of Higher Level Search. William A. Dembski and Robert J. Marks II
  • Life’s Conservation Law: Why Darwinian Evolution Cannot Create Biological Information. William A. Dembski and Robert J. Marks II
  • Conservation of Information in Search: Measuring the Cost of Success. William A. Dembski and Robert J. Marks II
  • Bernoulli’s Principle of Insufficient Reason and Conservation of Information in Computer Search. William A. Dembski and Robert J. Marks II
  • Evolutionary Synthesis of Nand Logic: Dissecting a Digital Organism. Winston Ewert, William A. Dembski and Robert J. Marks II
  • A Vivisection of the ev Computer Organism: Identifying Sources of Active Information. George Montañez, Winston Ewert, William A. Dembski and Robert J. Marks II
  • A Second Look at the Second Law. Granville Sewell
Genetic distance equation

Genetic Markers

First, by significantly scaling up the number of genetic markers, genome-wide NGS approaches enhance the power and resolution for the above-mentioned applications and improve the reliability of conclusions (Steiner et al. 2013).

Second, the application of genomic technologies opens novel axes of investigation (Allendorf et al. 2010; Ouborg et al. 2010). Genome-scale data provide information beyond neutral genetic variation or candidate gene approaches (e.g. major histocompatibility complex genes; Hedrick 1999) to enable screening for selectively important variation and assessing the adaptive potential of populations (Primmer 2009). For example, approaches such as genome-wide scans for selection, association mapping or quantitative trait loci (QTL) mapping can pinpoint loci of relevance for local adaptation of the target population (Steiner et al. 2013), with the potential to conserve evolutionary processes, a long-sought-after goal in conservation biology (Crandall et al. 2000; Fraser and Bernatchez 2001).

Genome-wide analyses

This application makes it possible to address the poorly understood mechanistic basis of inbreeding depression (epistasis, directional dominance versus overdominance, many versus few loci), or to assess the impact of genetic variation on patterns of gene expression and on plastic responses to environmental change.

The early 1990s saw the birth of genomics, as high-throughput techniques coupled with early robotics and bioinformatics enabled large-scale data collection from genomic and expressed sequence tag clones.

We and others have used self-organizing maps (SOMs) to characterize codon usage patterns of a wide variety of bacteria (Kanaya et al. 1998; Wang et al. 2001). We introduced a new feature to the SOM for studies of genomic sequences that makes the learning process independent of the order of data input (Abe et al. 1999), and we characterized codon usage in 60,000 genes from 29 bacterial species (Kanaya et al. 2001). SOMs were particularly useful, not only in searching for horizontally transferred genes, but also in predicting the donor genomes of the transferred genes.

Supported by National Institutes of Health National Human Genome Research Institute grants (HG005097-1 and HG005613-01) and in part by Bill & Melinda Gates Foundation OPP42867 to X.

This manual introduces the basics of next generation sequence (NGS) data to researchers who would like to get started with sequencing their data.
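The order-independent SOM learning mentioned in this section can be illustrated with a batch update, in which every node is recomputed from all input vectors at once instead of being nudged one input at a time. The sketch below is a generic one-dimensional batch SOM over codon usage vectors, written under stated assumptions: the deterministic initialisation along the data's bounding box, the neighbourhood schedule, and every function name are choices made for this example and are not claimed to be the procedure of Abe et al.

```python
import math
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_usage_vector(cds):
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases
            counts[codon] += 1
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in CODONS]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def batch_som(data, n_nodes=4, epochs=10):
    """Train a 1-D SOM with batch updates; the result does not depend on input order."""
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Assumption: deterministic start, nodes spaced along the bounding-box diagonal.
    nodes = [[lo[d] + (hi[d] - lo[d]) * k / max(n_nodes - 1, 1) for d in range(dim)]
             for k in range(n_nodes)]
    for epoch in range(epochs):
        radius = max(n_nodes / 2 * (1 - epoch / epochs), 0.5)
        # Best-matching node for every input, using the nodes from the previous epoch.
        bmus = [min(range(n_nodes), key=lambda k: sq_dist(v, nodes[k])) for v in data]
        new_nodes = []
        for k in range(n_nodes):
            h = [math.exp(-((k - b) ** 2) / (2 * radius ** 2)) for b in bmus]
            total = math.fsum(h)
            # Neighbourhood-weighted mean of all inputs; fsum keeps the sum order-independent.
            new_nodes.append([math.fsum(w * v[d] for w, v in zip(h, data)) / total
                              for d in range(dim)])
        nodes = new_nodes
    return nodes


if __name__ == "__main__":
    genes = ["ATGGCTGCTAAATAA", "ATGGCCGCCAAGTGA", "ATGTTTTTTCTTTAA"]  # toy sequences
    vectors = [codon_usage_vector(g) for g in genes]
    print(batch_som(vectors, n_nodes=3) == batch_som(vectors[::-1], n_nodes=3))
```

Because each epoch sums over the whole data set before any node moves, shuffling the input changes nothing, whereas a conventional online SOM updates after every input and so depends on presentation order.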

The latest versions of the C. brenneri, C. briggsae, C. japonica, C. remanei, and P. pacificus sequences were obtained from the Genome Sequencing Center at The Genome Institute at Washington University (WUSTL). The cb1 browser data were obtained from WormBase.

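What are PGAS models?

PGAS stands for Partitioned Global Address Space, a parallel programming model, discussed here in the context of large-scale genome-wide association studies. It is distinct from the National Heart, Lung, and Blood Institute Programs for Genomics Applications (PGAs) listed later in this section.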

(1) development of animal models and characterization of phenotype in these models;

(2) measurement of gene expression, identification of regulated genes, and identification of single nucleotide polymorphisms (SNPs) in both animal models and human patients for a range of cardiopulmonary disorders;

(3) development of new databases, data analysis procedures, and software tools for cardiovascular genomics. In particular, we review the status of

(1) whole genome sequencing efforts in human, mouse, rat, zebrafish, and dog;

(2) the development of data mining and analysis tools;

(3) the launching of the National Heart, Lung, and Blood Institute Programs for Genomics Applications and Proteomics Initiative;

(4) efforts to characterize the cardiac transcriptome and proteome; and

(5) the current status of computational modeling of the cardiac myocyte.

This view has been used for many genome-wide association study reports.

R packages: Shiny, DT, ggplot, plotly, ggraph, visNetwork, RMySQL, DBI
  • Data sets will be queried from a SQL database on a VM located on Duke’s network
  • CSV files can be made available for participants for whom queries fail (ftp from VM or GitHub, resource location to be provided in session)

ggplot(metadata) + geom_bar(aes(x = genome_size), stat = "bin", binwidth = 0.

As any avid follower of genomics or medical genetics knows, genome-wide association studies (GWAS) have been the dominant tool used by complex disease genetics researchers in the last five years.

Genome bioinformatics and high-throughput sequencing: searching genes and gene functions; genome databases; variation in the genome; sequencing technologies past, present and future (Sanger, shotgun, PacBio, Illumina, toward the $500 human genome); biological applications of sequencing; bioinformatics analysis methods.

Drosophila Informatics

wFleaBase presently archives 528 microsatellite markers [15]. Yet, to generate additional loci for genetic mapping in D. pulex and D. magna, wFleaBase integrates a suite of computational programs that

(i) identifies microsatellites from raw DNA sequencer trace files,

(ii) designs optimal primers for amplifying the markers and

(iii) indexes the amplicon, microsatellite motifs and primer information into the Microsat database [16].

The Microsat database will rapidly grow by applying this pipeline to trace files emerging from the Daphnia genome sequencing project. Returning to the welcome page, the user can instead choose to explore tables containing data extracted from automated BLAST searches against the euGenes database, which includes annotated genome sequences from 10 eukaryotic model organisms.

The Integrative Genomics Viewer (IGV) from the Broad Institute allows you to view several types of data files involved in any NGS analysis that employs a reference genome, including how reads from a dataset are mapped, gene annotations, and predicted genetic variants.
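Step (i) of the microsatellite pipeline described in this section, finding tandem repeats in sequence, can be sketched with a regular expression. This is an illustrative stand-in and not the wFleaBase software: it works on already base-called sequence instead of raw trace files, and the motif length range, the minimum repeat count and the function name are assumptions.

```python
import re


def find_microsatellites(sequence, min_repeats=4, min_unit=2, max_unit=6):
    """Return (start, motif, repeat_count) for tandem repeats in a DNA string.

    The shortest repeating unit is tried first, so ACACACAC is reported as
    (AC)4 instead of (ACAC)2. Homopolymer runs are skipped.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:  # e.g. TTTTTTTT is not a di- to hexanucleotide repeat
            continue
        repeats = (match.end() - match.start()) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    print(find_microsatellites("GGACACACACACTTGATGATGATGATCC"))
```

The reported start positions and motif lengths are what a primer-design step would need in order to pick flanking regions on either side of each repeat.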

We also compare a small interval of human chromosome 7q31 with DNA sequences of four species at different evolutionary distances to demonstrate the multistep process of comparative sequence analysis, and discuss several of the public resources available for these studies.

Knowledge of DNA has allowed us to study the mutational process with nucleotide and phosphodiester bond precision [6]. Our DNA-based technology has made it possible to acquire a growing database of genome sequences that permit us to read the history of evolutionary events preserved in the nucleic acid and protein record.

The “Cost per Genome” graph was generated using the same underlying data as that used to generate the “Cost per Megabase of DNA Sequence” graph; the former thus reflects an estimate of the cost of sequencing a human-sized genome rather than the actual costs for specific genome-sequencing projects.

Production activities are essential to the routine generation of large amounts of quality DNA sequence data that are made available in public databases; the costs associated with production DNA sequencing are summarized here and depicted on the two graphs.
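The relationship between the two graphs can be illustrated with simple arithmetic: an estimated cost per genome is the cost per megabase multiplied by the genome size and by the sequence coverage needed to assemble it. The sketch below is an assumption-laden illustration only; the genome size, the coverage value and the example cost are hypothetical placeholders, and the published estimate may rest on further assumptions not stated in this section.

```python
def cost_per_genome(cost_per_megabase, genome_size_mb=3000, coverage=30):
    """Estimate the cost of one human-sized genome from a per-megabase cost.

    genome_size_mb and coverage are illustrative assumptions: roughly 3,000 Mb
    for a human-sized genome, and a coverage (redundancy) factor that depends
    on the sequencing platform.
    """
    return cost_per_megabase * genome_size_mb * coverage


if __name__ == "__main__":
    # Hypothetical input: 0.01 currency units per megabase of raw sequence.
    print(cost_per_genome(0.01))
```

Because coverage enters as a multiplier, the same cost per megabase yields a different cost per genome on platforms that need more redundancy.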