Home | Felipe Vaz Peres

I work at the intersection of machine learning and biology, developing models to improve early cancer detection.

I am particularly interested in computational approaches to cancer research, omics, non-coding RNA, and the development of efficient, reproducible workflows.

RESEARCH

Decoding cancer patterns through explainable AI

Machine learning can capture the complex, non-linear relationships underlying cancer risk. But real clinical impact demands more than predictive accuracy: it demands interpretability and trust.

I am especially drawn to transparent models, where the path to a prediction matters as much as the prediction itself. This philosophy guides my work at Huna, where I develop and validate interpretable models for early cancer detection.

Our research at Huna

[2026]

prostate cancer risk stratification

A PSA density-based model for prostate cancer risk stratification, built to reduce unnecessary MRI and biopsy referrals in resource-limited settings.

ASCO 2026 Abstract

Exploring transcriptomic diversity in sugarcane

How diverse are sugarcane transcriptomes across genotypes, and what functions remain hidden within their coding and non-coding RNAs?

Using transcriptomic data from 50 genotypes, we reconstructed the sugarcane pan-transcriptome, revealing extensive variability in protein-coding transcripts. We further expanded this landscape by integrating coding and non-coding genes into co-expression networks to infer potential functions for poorly annotated ncRNAs. This work was developed at the Computational, Evolutionary, and Systems Biology Laboratory (LabBCES).

Our research at LabBCES

"Look again at that dot. That's here. That's home. That's us." - Carl Sagan

[2022 - 2025]

sugarcane pan-RNAome

Characterization of sugarcane ncRNAs, including lncRNAs, revealing their variability, conservation, co-expression, and functional roles.

Thesis Code

[2019 - 2021]

sugarcane pan-transcriptome

Framework for pan-transcriptome assembly in complex polyploid crops, supporting sugarcane breeding programs.

ISMB Talk Code

Building trust through reproducible workflows

Ensuring reproducibility remains one of the major challenges in computational biology. Many published results are difficult to replicate due to incomplete documentation, non-standardized workflows, or environment dependencies.

In my work, I focus on developing robust workflows that adhere to open science and the FAIR principles (Findable, Accessible, Interoperable, and Reusable).

Pipelines I've built or contributed to for reproducible research

[2025]

KAPT

Automated inference and annotation of the Kappaphycus alvarezii supertranscriptome.

Code

[2025]

T-M integration

Transcriptome-microbiome cross-correlation and host-microbial interaction inference.

Code

[2025]

R2C

Automated gene co-expression network construction and regulatory analysis.

Code

[2024 - 2025]

seabed symphony

Pipeline for novel Biosynthetic Gene Cluster discovery in marine sediment microbiomes.

Code

[2020 - 2021]

YAATAP

Snakemake pipeline for de novo transcriptome assembly and functional annotation.

Code

Open-source tools for science

My path into software development began in the biological sciences. Working with massive sequencing datasets, I realized that modern biology demands the ability to design, automate, and scale computational analyses across terabytes of data.

That experience also shaped my values: I am deeply committed to Free Software principles, believing that progress is best achieved when we have the freedom to run, copy, distribute, study, change, and improve the software that powers our research.