Software

philentropy

Description: The R package philentropy implements more than 46 fundamental distance and similarity measures to quantify distances between probability density functions, as well as traditional information theory measures. Comparison is a fundamental method of scientific research, leading to more general insights about the processes that generate similarity or dissimilarity. In statistical terms, comparisons between probability functions are performed to infer connections, correlations, or relationships between samples. The philentropy package implements optimized distance and similarity measures for comparing probability functions. These comparisons have their foundations in a broad range of scientific disciplines, from mathematics to ecology. The aim of this package is to provide a base framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.
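To make "comparing probability functions" concrete, the following is a minimal Python sketch of one well-known information-theoretic measure, the Jensen-Shannon divergence between two discrete probability vectors. It is not philentropy's R interface or implementation; the function names `kl_divergence` and `jensen_shannon` are hypothetical and chosen only for this illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, following the usual
    convention 0 * log(0) = 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Both inputs must have the same length and each must sum to 1.
    With base-2 logarithms the result lies between 0 and 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    for name, dist in (("p", p), ("q", q)):
        if any(x < 0 for x in dist) or not math.isclose(sum(dist), 1.0):
            raise ValueError(f"{name} is not a probability vector")
    # Mixture distribution: m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical example distributions over three outcomes.
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.3, 0.5]
    print(jensen_shannon(p, q))
```

Identical distributions give a divergence of 0, and distributions with no overlapping support give the maximum of 1 bit. The measure is symmetric in its arguments, which is one reason it is often preferred over the plain Kullback-Leibler divergence for clustering tasks.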

Webpage: https://hajkd.github.io/philentropy/

metablastr

Description: The R package metablastr harnesses the power of sequence search tools by providing interface functions between R and the standalone (command line tool) versions of several sequence aligners. In addition to providing interface functions, metablastr provides a scalable database backend infrastructure and analytics tools to store and handle the extensive search output generated by these aligners when handling thousands of genomes.

Webpage: https://hajkd.github.io/metablastr/

biomartr

Description: The rapidly growing number of sequenced genomes allows us to perform a new type of biological research. Using a comparative approach, these genomes provide us with new insights into how biological information is encoded at the molecular level and how this information changes over evolutionary time. The first step of any genome-based study, however, is to retrieve genomes and their annotation from databases. To automate this retrieval process on a meta-genomic scale, the biomartr package provides interface functions for genomic sequence retrieval and functional annotation retrieval. The major aim of biomartr is to facilitate computational reproducibility and large-scale handling of genomic data for (meta-)genomic analyses. In addition, biomartr aims to address the genome version crisis. With biomartr, users can control and are informed about the genome versions they retrieve automatically. Many large-scale genomics studies lack this information, and reproducibility and data interpretation become nearly impossible when genome version information is not documented. In particular, biomartr automates the retrieval of genome, proteome, CDS, RNA, repeats, GFF/GTF (annotation), genome assembly quality, and metagenome project data from the major biological databases and performs quality checks of the underlying genome assemblies.

Webpage: https://ropensci.github.io/biomartr/

myTAI

Description: Evolutionary transcriptomics studies can serve as a first approach to screen in silico for the potential existence of evolutionary constraints within a biological process of interest. This is achieved by quantifying transcriptome conservation patterns and their underlying gene sets in biological processes. The exploratory analysis functions implemented in the R package myTAI provide users with a standardized, automated, and optimized framework to detect patterns of evolutionary constraints in any transcriptome dataset of interest.

Webpage: https://github.com/HajkD/myTAI

LTRpred

Description: The LTRpred pipeline is designed for the de novo annotation of functional retrotransposons within any given genome assembly. A classical annotation differs from a functional annotation in that its main goal is to annotate as much of the TE space as possible; as a result, most annotated TE loci consist of truncated, overlapping, or nested TEs. In most cases, such TEs are no longer mobile and instead reflect the historic activity of TEs in the respective genome. However, when aiming to mobilize extant TEs, we need a high-quality annotation of young and functional elements within a genome of interest. LTRpred aims to assist such mobilization efforts by providing a comprehensive pipeline able to detect functional retrotransposons de novo in any genome assembly.

Webpage: https://hajkd.github.io/LTRpred/

orthologr

Description: The comparative method is a powerful approach in genomics research. Based on our knowledge about the phylogenetic relationships between species, we can study the evolution, diversification, and constraints of biological processes by comparing genomes, genes, and other genomic loci across species. The orthologr package aims to provide a framework to perform large-scale comparative genomics studies with R. It aims to be as easy to use as possible, from genomic data retrieval to orthology inference and dN/dS estimation between several genomes. In combination with the R package biomartr, users can retrieve genomes, proteomes, or coding sequences for several species and use them as input for orthology inference and dN/dS estimation with orthologr. The advantage of using biomartr in combination with orthologr is that users can join the new wave of research that promotes and facilitates computational reproducibility in genomics studies and can address the issue of comparing genomes with different genome assembly qualities.

Webpage: https://hajkd.github.io/orthologr/

DIAMOND2

Description: Developed by Benjamin Buchfink, DIAMOND2 is an ultra-fast sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. The key features are:

  • Pairwise alignment of proteins and translated DNA at 500x-20,000x speed of BLAST.
  • Frameshift alignments for long read analysis.
  • Low resource requirements, suitable for running on standard desktops or laptops.
  • Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.
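As an illustration of working with the tabular output format mentioned above, here is a minimal Python sketch that parses BLAST-style tabular results and keeps the highest-scoring hit per query. It assumes the standard 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); if an aligner run was configured with different output fields, the column list must be adjusted. The helper names `parse_tabular` and `best_hit_per_query` and the sample rows are hypothetical and exist only for this example.

```python
import csv
import io

# Assumed column layout: the standard 12 BLAST tabular fields.
BLAST_TABULAR_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}


def parse_tabular(handle):
    """Parse tab-separated alignment hits into a list of typed dicts."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        if len(row) != len(BLAST_TABULAR_COLUMNS):
            raise ValueError(f"expected 12 columns, got {len(row)}: {row}")
        hit = dict(zip(BLAST_TABULAR_COLUMNS, row))
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Return a dict mapping each query ID to its highest-bitscore hit."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Made-up rows for illustration only; not real alignment results.
    sample = (
        "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380.0\n"
        "q1\ts2\t75.0\t180\t45\t2\t10\t189\t1\t180\t1e-40\t160.5\n"
        "q2\ts3\t60.2\t150\t60\t1\t1\t150\t20\t169\t1e-20\t95.1\n"
    )
    for query, hit in best_hit_per_query(parse_tabular(io.StringIO(sample))).items():
        print(query, hit["sseqid"], hit["bitscore"])
```

Selecting by bit score rather than e-value is one possible design choice; either criterion can be swapped in by changing the comparison in `best_hit_per_query`.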

The initial DIAMOND version is described in the following publication:

Buchfink B, Xie C, Huson DH, “Fast and sensitive protein alignment using DIAMOND”, Nature Methods 12, 59-60 (2015). doi:10.1038/nmeth.3176

Webpage: https://github.com/bbuchfink/diamond