Felipe Vaz Peres

Research

Here you can find the main topics that have shaped my scientific work over the years. Each project represents a different stage in this journey, sometimes exploring biological questions through data, and other times developing the computational tools required to answer them.


Sugarcane Pan-omics

This section highlights my two contributions to sugarcane pan-omics research, conducted at the Computational, Evolutionary, and Systems Biology Laboratory. In our first study, we built the sugarcane pan-transcriptome and revealed the variability in protein-coding transcripts across 50 genotypes. However, a substantial portion of the transcriptome, the non-coding RNAs, remained unexplored.

Although ncRNAs represent a significant fraction of the transcriptome in most organisms, studies of them in sugarcane remain limited. This gap raised fundamental questions: How many ncRNAs are present in sugarcane? How diverse are they across genotypes? What roles might they play in gene regulation, development, or stress responses?

During my master’s, I explored a multi-genotype catalog of ncRNAs, investigating their expression and co-expression networks. Together, these studies aim to close the knowledge gap between coding and non-coding RNAs in sugarcane, providing a broader view of transcript variability and potential functional roles across diverse genetic backgrounds.

"Look again at that dot. That's here. That's home. That's us." - Carl Sagan

The Pale Blue Dot, captured by Voyager 1 in 1990, showing Earth as a tiny speck in the vastness of space, a humbling reminder of our place in the cosmos.

Sugarcane Pan-RNAome

This work focuses on characterizing non-coding RNAs (ncRNAs), including long non-coding RNAs (lncRNAs), in sugarcane, addressing gaps in our understanding of their variability, conservation, co-expression patterns, and functional roles.

Thesis Code

Sugarcane Pan-transcriptome

This work provides a robust framework for pan-transcriptome analysis in complex polyploid crops, along with resources to enhance sugarcane breeding programs.

ISMB Talk Code

Computational Reproducibility

Ensuring reproducibility remains one of the major challenges in computational biology and data science. Despite the growing availability of large-scale biological and experimental datasets, many published results cannot be easily replicated due to issues such as incomplete documentation, non-standardized workflows, missing metadata, or dependence on specific computing environments. Whether in the context of gene expression studies or predictive modeling, transparent and reproducible computational practices are essential for building confidence in scientific results.

In my work, I focus on developing robust, efficient, and reproducible workflows that adhere to open science and the FAIR principles (Findable, Accessible, Interoperable, and Reusable).

This section highlights my contributions to reproducible computational research, showcasing the tools, pipelines, and projects I have been involved in, each developed to enhance reproducibility and transparency in large-scale biological data analysis.

KAPT

Nextflow pipeline for automated inference of the Kappaphycus alvarezii proteome through de novo transcriptome assembly and comprehensive functional annotation.

Code

T-M Integration

Nextflow pipeline for automated integration of transcriptomic and microbiome data, performing cross-correlation analysis to uncover potential host transcriptome-microbiome associations.

Code

R2C

Nextflow pipeline for scalable gene co-expression network analysis. It automates the process from data download through quantification to network construction.

Code

seabed symphony

Metagenomics pipeline for biosynthetic gene cluster (BGC) discovery. Designed to identify novel BGCs in the microbiomes of marine sediments.

Code

YAATAP

Snakemake pipeline for de novo transcriptome assembly. It performs every step from raw RNA-seq data download to the final transcriptome assembly and quality assessment.

Code

Software Development

My path into software development began in the biological sciences. Working with the massive datasets produced by modern sequencing technologies, I quickly realized that understanding biology in the twenty-first century requires far more than laboratory skills: it demands the ability to design, automate, and scale analyses across terabytes of information. What began as a necessity soon became a passion.

I am deeply committed to the principles of Free and Open Source Software (FOSS) and enjoy contributing to and learning from open-source communities. Several of my projects and tools are freely available online, reflecting my belief that scientific progress is best achieved through transparency and collaboration.

This section highlights a selection of projects I’ve built, including tools and applications developed beyond the scope of computational biology.

ContFree-NGS

A lightweight, open-source tool designed to remove contaminant sequences from NGS datasets.

Paper Code

paper-trackr

Tired of missing out on cool papers? Stay up to date with paper-trackr!

Demo Code

Hackathon

I see hackathons as the ultimate proving ground, where collaborative coding meets high-level problem solving. Working under intense pressure does more than forge innovative solutions; it forces a rapid, real-time exchange of scientific and technical knowledge. Beyond the code, these sprints deliver the most valuable result of all: a network of brilliant people and lifelong friends who share a passion for building the future.

Below are some of my most rewarding hackathon experiences.

LBB 2025

3rd place at the "Liga Brasileira de Bioinformática 2025", the largest bioinformatics competition in Latin America, focused on solving computational biology challenges.

About Code

Mendelics 2021

3rd place at the "Mendelics Challenge". Developed a fully automated variant calling pipeline in less than 48 hours using real genomic data.

About Code

BIOHACK 2018

2nd place at the "Synthetic Biology Hackathon (BIOHACK)". Designed BIOREMEDYATOR, a bioremediation project presented at Brazil’s largest biotechnology conference.

About

Contact

Interested in collaboration or have questions about my research?
Feel free to reach out!

Email GitHub Google Scholar Curriculum Vitae LinkedIn

Data Scientist | Bioinformatician
