Latest in Science

Hello, welcome to your daily dose of research!
Here you'll find the most recent papers published yesterday, filtered for the topics I follow.

* The curation is done automatically by my open-source project, paper-trackr.
Want your own results? Install it via PyPI and get personalized updates!

bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

General-Purpose Large Language Models, such as DeepSeek V3.2, Have Evolved Protein Design Capabilities

Li, J., Dong, X.

Abstract: General-Purpose Large Language Models (GLLMs), although primarily developed for natural language processing, are increasingly demonstrating emergent capabilities in specialized scientific domains. In this study, we explored the potential of GLLMs, specifically DeepSeek V3.2 Exp in reasoning mode, to perform practical protein engineering tasks without domain-specific biological training. Two representative design problems were addressed: Generation of amino acid sequences predicted to adopt the canonical 4-helix bundle topology, and targeted mutation design to improve protein solubility while preserving core structural integrity. Across 49 generated 4-helix bundle candidates, 40 adopted the desired geometry, with 36 achieving pLDDT scores above 70. Solubility optimization on 50 representative proteins yielded 46 mutants with an average predicted score increase of 0.178, and 29 maintained structural deviations below 3 Angstrom RMSD. These results indicate that general-purpose LLMs such as DeepSeek V3.2 can integrate sequence-structure-property relationships sufficiently to produce viable protein designs. We propose a hybrid workflow that couples GLLM-based mutation generation with established computational validation, offering an accessible route for protein and peptide engineering.

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

A programmable mRNA platform for miRNA detection via miRNA-mRNA2 triplex-mediated ribosomal frameshifting

Chen, Y., Zhao, W., +9 authors, Chen, G.

Abstract: Programmed -1 ribosomal frameshifting (-1 PRF) is a recoding mechanism utilized by viruses to expand their coding capacity and modulate the stoichiometric ratio of -1 frame and 0 frame translation products. The stability of mRNA secondary structure at the ribosomal entry site within the frameshifting stimulating elements (FSEs) determines the frameshifting efficiency. Here, we report the development of a programmable mRNA-based platform that detects specific mature microRNA (miRNA or miR) by converting their presence into a quantifiable protein output through miRNA-triggered -1 PRF. We designed a triplex-forming mRNA (TF-mRNA) platform to selectively trap target miRNAs through the formation of major-groove mRNA-miRNA-mRNA (miR-mRNA2) triplexes. Bio-layer interferometry and fluorescence binding studies confirmed that TF-mRNA forms stable complexes with cognate miRNAs with low nanomolar affinity and prolonged dissociation rate. Critically, the formation of miR-mRNA2 triplex robustly stimulated ribosomal frameshifting in a cell-free dual-luciferase translation system, acting as a miRNA-dependent molecular switch. The generality of this TF-mRNA platform has been verified for several disease-associated purine-rich miRNAs, and it is suitable for targeting a wide range of other purine-enriched miRNAs. This programmable TF-mRNA platform establishes a foundation for developing novel diagnostic tools and synthetic biology circuits that convert the presence of miRNA into a quantifiable protein output.

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

Creating an energy efficient central metabolism for boosting biosynthesis without compromising cell growth of yeast

Ni, X., Wang, H., +4 authors, Zhou, Y.

Abstract: Long-term natural evolution selects glycolysis as the major metabolic mode for rapid cell growth, which however lacks sufficient NADPH supply to drive the biosynthesis of reduced chemicals such as free fatty acids (FFAs). Engineering energy-economical pathways for chemical overproduction always compromises cellular fitness due to the rigidity of cellular metabolism. Here, we successfully replaced glycolysis metabolism with an optimized pentose phosphate pathway (PPP) in an industrial yeast Ogataea polymorpha, for the first time, which enabled a higher energy generation efficiency than glycolysis and a balanced supply of ATP and NADPH. More importantly, we discovered a global carbon metabolism regulator CMR that drives metabolic flux toward glycolysis for energy generation, and its disruption relieved the tight regulation of metabolic flux distribution and significantly enhanced the efficiency of cellular energy generation, which significantly boosted the FFA production by 63% in an FFA-overproducing chassis. The final engineered yeast produced FFAs at a titer of 41.7 g/L, the highest titer reported by microbial fermentation. Our work provides valuable insights into the metabolic regulation mechanisms and a feasible approach for constructing energy efficient metabolism for chemical overproduction.

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

Sequence-to-graph alignment based copy number calling using a network flow formulation

Magalhaes, H., Weber, J., +2 authors, Prodanov, T.

Abstract: Variation of sequence copy number (CN) between individuals can be associated with phenotypical differences. Consequently, CN calling is an important step for disease association and identification, as well as for genome assembly validation. Traditionally, CN calling is done by mapping sequencing reads to a linear reference genome and estimating the CN from the observed read depth. This approach, however, is significantly hampered by sequences and rearrangements not present in a linear reference genome; at the same time simple CN prediction for individual graph nodes does not make use of the graph topology and can lead to inconsistent results. To address these issues, we propose Floco, a method for CN calling with respect to a genome graph using a network flow formulation. Given a graph and alignments against that graph, we calculate raw CN probabilities for every graph node based on the Negative Binomial distribution and the base pair coverage across the node, and then use integer linear programming to compute the CN flow through the whole graph. We tested this approach on 15 aligned datasets, involving three different graphs, as well as HiFi and ONT sequencing reads and linear assemblies split into reads. These results demonstrate that the addition of the network flow formulation increases the accuracy of CN predictions by up to 43% when compared with read depth based estimation alone. Additionally, we observed that concordance between predictions from the three different sequence sources was able to reach 93.2%. Floco fills a gap in CN calling tools specifically designed for genome graphs.
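The first step described in this abstract (raw CN probabilities per graph node from a Negative Binomial model of coverage) can be illustrated with a small sketch. This is not Floco's actual model: the parametrization (mean = CN × haploid coverage, a fixed dispersion `size`, a small `eps` mean for CN 0) and all function names are assumptions made for illustration, and the network flow / integer linear programming step is not shown.

```python
import math

def nb_log_pmf(k, mean, size):
    """Log-probability of count k under a Negative Binomial with the given mean.

    Parametrization (an assumption): variance = mean + mean**2 / size.
    """
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log1p(-p))

def cn_probabilities(observed_cov, haploid_cov, max_cn=4, size=20.0, eps=0.01):
    """Normalized probabilities for CN = 0..max_cn given integer coverage at one node."""
    log_p = []
    for cn in range(max_cn + 1):
        # Hypothetical choice: CN 0 gets a tiny nonzero mean so stray alignments are possible
        mean = max(cn, eps) * haploid_cov
        log_p.append(nb_log_pmf(observed_cov, mean, size))
    m = max(log_p)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(lp - m) for lp in log_p]
    total = sum(weights)
    return [w / total for w in weights]

# Toy node: coverage 31 where one haploid copy is expected to contribute about 15
probs = cn_probabilities(observed_cov=31, haploid_cov=15)
best_cn = max(range(len(probs)), key=probs.__getitem__)
```

Scoring each node independently like this is exactly the kind of per-node prediction the abstract says can be inconsistent across the graph, which is the motivation given for adding the flow formulation on top.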

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

MiRformer: a dual-transformer-encoder framework for predicting microRNA-mRNA interactions from paired sequences

Gu, J., Chen, C., Li, Y.

Abstract: MicroRNAs (miRNAs) are small non-coding RNAs that regulate genes by binding to target messenger RNAs (mRNAs), causing them to degrade or suppressing their translation. Accurate prediction of miRNA-mRNA interactions is crucial for RNA therapeutics. Existing methods rely on handcrafted features, struggle to scale to kilobase-long mRNA sequences, or lack interpretability. We introduce MiRformer, a transformer framework designed to predict not only the binary miRNA-mRNA interaction but also the start and end location of the miRNA binding site in the mRNA sequence. MiRformer employs a dual-transformer encoder architecture to learn interaction patterns directly from raw miRNA-mRNA sequence pairs via the cross-attention between the miRNA-encoder and mRNA-encoder. To scale to long mRNA sequences, we leverage a sliding-window attention mechanism. MiRformer achieves state-of-the-art performance across diverse miRNA-mRNA tasks, including binding prediction, target-site localization, and cleavage-site identification from Degradome sequencing data. The learned transformer attention is highly interpretable and reveals highly contrasting signals for the miRNA seed regions in 500-nt long mRNA sequences. We used MiRformer to simultaneously predict novel binding sites and cleavage sites in 13k miRNA-mRNA pairs and observed that the two types of sites tend to be close to each other, supporting a miRNA-mediated degradation mechanism.
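To make the sliding-window attention idea concrete: each position attends only to neighbors within a fixed window, so the work grows linearly with sequence length rather than quadratically. The sketch below is a generic pure-Python illustration, not MiRformer's implementation; the function name, the symmetric window, and the toy scores are all assumptions.

```python
import math

def sliding_window_attention(scores, window):
    """Row-wise softmax over scores[i][j], restricted to positions with |i - j| <= window.

    Positions outside the window receive a weight of exactly 0.
    """
    n = len(scores)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = scores[i][lo:hi]
        m = max(local)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in local]
        z = sum(exps)
        row = [0.0] * n
        for j, e in zip(range(lo, hi), exps):
            row[j] = e / z
        out.append(row)
    return out

# Toy example: 6 positions, uniform raw scores, window of 1 on each side
weights = sliding_window_attention([[0.0] * 6 for _ in range(6)], window=1)
```

With uniform scores, an interior position spreads its attention equally over itself and its two neighbors, while the first and last positions split it over two entries.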

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

Inferring virtual cell environments using multi-agent reinforcement learning

Kalafut, N. C., He, C., +3 authors, Wang, D.

Abstract: Single cells interact continuously to form a cell environment that drives key biological processes. Cells and cell environments are highly dynamic across time and space, fundamentally governed by molecular mechanisms, such as gene expression. Recent sequencing techniques measure single-cell-level gene expression under specific conditions, either temporally or spatially. Using these datasets, emerging works, such as virtual cells, can learn biologically useful representations of individual cells. However, these representations are typically static and overlook the underlying cell environment and its dynamics. To address this, we developed CellTRIP, a multi-agent reinforcement learning method that infers a virtual cell environment to simulate the cell dynamics and interactions underlying given single-cell data. Specifically, cells are modeled as individual agents with dynamic interactions, which can be learned through self-attention mechanisms via reinforcement learning. CellTRIP also applies novel truncated reward bootstrapping and adaptive input rescaling to stabilize training. We can in-silico manipulate any combination of cells and genes in our learned virtual cell environment, predict spatial and/or temporal cell changes, and prioritize corresponding genes at the single-cell level. We applied and benchmarked CellTRIP on various simulated and real gene expression datasets, including recapitulating cellular dynamic processes simulated by gene regulatory networks and stochastic models, imputing spatial organization of mouse cortical cells, predicting developmental gene expression changes after drug treatment in cancer cells, and spatiotemporal reconstruction of Drosophila embryonic development, demonstrating its outperformance and broad applicability. Interactive manipulation of those virtual cell environments, including in-silico perturbation, can prioritize spatial and developmental genes for single-cell-level changes, enabling the generation of new insights into cell dynamics over time and space.

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

Accurate Probabilistic Reconstruction of Cell Lineage Trees from SNVs and CNAs with ScisTreeCNA

Zhang, H., Wu, Y.

Abstract: Cell lineage tree is a fundamental evolutionary model for single-cell evolution. Inference of cell lineage tree from noisy single-cell DNA data has been studied actively in recent years. Existing methods for cell lineage tree inference can be classified into two categories based on the type of genetic variations they work with: single-nucleotide variants (SNVs) or copy-number aberrations (CNAs). Due to various noises and uncertainties in the data, the existing methods are not fully satisfactory, in part because they only used one type of genetic variant. Single-cell DNA sequencing data with both SNVs and CNAs are becoming available. In principle, joint inference of cell lineage trees from both SNVs and CNAs may lead to more accurate results. However, there is a lack of rigorous models and efficient algorithms for such inference. In this paper, we present a new cell lineage tree inference method, called ScisTreeCNA, that jointly infers cell lineage trees from SNVs and CNAs. A key contribution of ScisTreeCNA is a novel probabilistic model for the joint evolution of SNVs and CNAs in single cell data. Based on this model, ScisTreeCNA implemented several efficient algorithms for accelerating probabilistic inference of cell lineage tree. Experiments on both simulated and real biological data show that ScisTreeCNA consistently outperforms existing methods in the accuracy of the inferred cell lineage trees. ScisTreeCNA is available at https://github.com/haotianzh/ScisTreeCNA.

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

Sample-specific haplotype-resolved protein isoform characterization via long-read RNA-seq-based proteogenomics

Wissel, D., Sheynkman, G. M., Robinson, M. D.

Abstract: Protein isoform inference from bottom-up mass spectrometry (MS) relies on database search strategies that assume the reference protein database accurately reflects the full repertoire of genetic and transcriptomic states present in the sample being analyzed. Long-read RNA sequencing (lrRNA-seq) now enables simultaneous recovery of complete transcript structures and the genetic variants present on each molecule, offering a direct route to allele-specific isoforms, yet this capability has not been fully leveraged to improve MS-based proteogenomics workflows. Here, we develop an end-to-end workflow for constructing and searching haplotype-resolved, sample-specific proteomes using matched lrRNA-seq and MS data. We benchmark multiple phasing algorithms on PacBio lrRNA-seq from Genome-in-a-Bottle samples and identify methods that achieve high phasing accuracy and completeness on transcriptomic reads. Our open-source, modular Snakemake pipeline performs variant calling, read-based phasing, isoform discovery, haplotype-resolved proteome construction, MS search, and downstream annotation. To demonstrate its utility, we apply the workflow to an induced pluripotent stem cell line (WTC11) and to an osteoblast differentiation time course, showing that haplotype-resolved databases enable detection of variant and splice peptides, allele-specific protein isoforms, and linked variants not detectable with reference-only proteomes. Together, our results demonstrate that lrRNA-seq-based phasing is feasible and effective for proteogenomics and provide a practical framework for allele-resolved proteome characterization in dynamic or disease-relevant settings.

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

Deconvolution of Sparse-count RNA Sequencing Data for Tumor Cells Using Embedded Negative Binomial Distributions

Montierth, M., Yan, H., +15 authors, Wang, W.

Abstract: Estimating tumor-specific transcript proportions from mixed bulk samples has potential to inform novel biology. However, estimation accuracy using existing methods in sparse-count data such as microRNA-seq and spatial transcriptomics has yet to be established. We generated a mixed small RNA benchmark dataset to demonstrate analytical challenges. To resolve them, we developed DeMixNB, a semi-reference-based deconvolution model assuming a sum of negative binomial distributions. Applications to miRNA-seq from 856 patients with breast cancer and 3,755 spatial spots from lung cancer generated either clinical or mechanistic insights into tumor cell plasticity. This supports the important utility of DeMixNB to investigate cancer RNomes.
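The phrase "a sum of negative binomial distributions" can be read as: the observed count in a mixed sample is the sum of a tumor count and a non-tumor count, each Negative Binomial. Under that reading, the probability of an observed count is a discrete convolution of the two component distributions. The sketch below shows only that building block; the independence assumption, the mean/dispersion parametrization, and every name are illustrative assumptions, and DeMixNB's actual model and estimation procedure are not described in the abstract.

```python
import math

def nb_pmf(k, mean, size):
    """Negative Binomial pmf with variance = mean + mean**2 / size (assumed parametrization)."""
    p = size / (size + mean)
    return math.exp(math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                    + size * math.log(p) + k * math.log1p(-p))

def mixed_pmf(y, tumor_mean, normal_mean, tumor_size, normal_size):
    """P(Y = y) for Y = T + N, with T and N independent Negative Binomial counts."""
    return sum(nb_pmf(t, tumor_mean, tumor_size) * nb_pmf(y - t, normal_mean, normal_size)
               for t in range(y + 1))

# Toy check on a sparse-count setting: small means, moderate dispersion
pmf = [mixed_pmf(y, tumor_mean=2.0, normal_mean=1.0, tumor_size=5.0, normal_size=5.0)
       for y in range(80)]
total_mass = sum(pmf)
expected_count = sum(y * q for y, q in enumerate(pmf))
```

In a deconvolution setting, a likelihood of this form would be evaluated per feature and maximized over the unknown tumor component; how the reference component is obtained is outside what this sketch covers.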

Read full paper
bioinformatics, synthetic biology Nov 24, 2025 bioRxiv

Reinforcement learning for adaptive control of phenotypically heterogeneous bacterial populations

Kratz, J., Wen, Z., +1 author, Carja, O.

Abstract: Bacterial populations display extraordinary resilience to antibiotic stress, driven by diverse physiological states that allow some cells to persist and later repopulate. This phenotypic heterogeneity, amplified by environmental fluctuations, undermines the effectiveness of conventional fixed-dose treatment regimens. To address this challenge, we introduce a reinforcement learning (RL) framework that discovers adaptive treatment strategies using only experimentally accessible, population-level measurements. The RL agent learns to infer the hidden physiological state of the population and leverages this knowledge to maintain control even under conditions not encountered during training. Moreover, when granted control over nutrient availability, an important driver of physiological change often overlooked in antibiotic treatment protocols, the agent consistently drives population extinction, surpassing adaptive protocols based solely on drug dynamics. This computational framework offers a powerful, data-driven approach for designing adaptive treatment strategies to counter the growing threat of antimicrobial resistance.

Read full paper
bioinformatics, cancer biology Nov 24, 2025 bioRxiv

TAp73 mediates anti-tumor immunity through regulation of lipid metabolism in the lung tumor microenvironment

Ackerman, H. D., Rubio, V. Y., +23 authors, Flores, E. R.

Abstract: While immunotherapy has become the standard of care for lung adenocarcinoma (LUAD) patients without actionable genomic alterations, only a subset of patients benefits from a long-lasting response to immunotherapy. Activation of p53-related signals has emerged as a potential mediator of the lung tumor microenvironment (TME). Given that mutant-p53 interacts with p73 extensively and TAp73-deficient mice develop LUAD, we engineered a mouse model with conditional deletion of TAp73 to understand the interactions of the p53 family in the TME and in metabolic pathways that impact anti-tumor immunity. We demonstrated that TAp73 exerts a tumor-suppressive role in KrasG12D-driven LUAD by regulating lipid metabolism in the TME. We identified a TAp73-driven transcriptional signature involving genes in the arachidonic acid metabolism pathway operational in tumor-associated macrophages that favors T-cell activation and thus anti-tumor immunity. Similar transcriptional changes are seen in macrophages from LUAD patients with p53 mutations and in association with response to immunotherapy.

Read full paper
bioinformatics, cancer biology Nov 24, 2025 bioRxiv

In vivo assessment of differential toxicity of cancer treatment drugs in Fanconi Anemia

Grompe, M., Dorrell, C., +2 authors, MacMillan, A.

Abstract: Fanconi Anemia (FA) is a DNA repair disorder with a very elevated risk of cancer, especially squamous cell carcinomas (SCC). Many cancer chemotherapy agents induce DNA damage, are highly toxic in FA, and cannot be safely used in this population. The potential differential toxicity to FA patients of many new drugs being explored for use in SCC is unknown. To evaluate such compounds of unknown toxicity for use in FA cancers, we developed a sensitive in vivo bone marrow repopulation competition assay in mice. We found that afatinib, alisertib and everolimus exhibited no significant differential toxicity in this system. Therefore, these drugs are candidates for chemotherapy of cancers in human FA patients. Our competitive repopulation assay provides a robust method to screen novel chemotherapy agents for their safety in FA.

Read full paper
bioinformatics, cancer biology Nov 24, 2025 bioRxiv

Multimodal Approach for Identification and Validation of Hepatocellular Carcinoma Targets for Radiotheranostics

Homan, P., Chung, J.-Y., +14 authors, Escorcia, F. E.

Abstract: Identifying tumor selective targets is critical for the development of precision diagnostic and therapeutic agents in oncology. Despite advances in precision oncology elsewhere, there are no FDA-approved hepatocellular carcinoma (HCC)-selective treatments. HCC is the most common type of liver cancer and accounts for significant morbidity and mortality worldwide. Here, we sought to integrate bulk (371 cases) and single cell RNA sequencing (scRNAseq, n=2 datasets, 34 cases, 102,956 cells) of patient samples to enrich for molecules that are overexpressed in HCC, which could serve as HCC-selective targets. To guide definitions of tumor and normal cell clusters with higher fidelity, we also imported a normal liver scRNAseq dataset. Using this integrated approach, we identified several HCC-selective plasma membrane molecules. To validate these targets, we performed immunohistochemical staining of HCC and normal tissue microarrays and confirmed HCC-selective staining of identified targets. Next, we verified the presence of these targets in several commercially available HCC cell lines by flow cytometry and western blot. Finally, we designed, engineered, and tested novel antibody-based positron emission tomography (immunoPET) agents to these targets in various murine models of liver cancer. Our findings confirm that we can leverage this multimodal approach to identify and validate HCC-selective targets, which can be used to develop tumor-selective diagnostic and therapeutic radiopharmaceuticals, or radiotheranostics, and other precision oncology agents.

Read full paper
bioinformatics, cancer biology Nov 24, 2025 bioRxiv

Sphingosine-1-phosphate receptor modulators resensitize FLT3-ITD acute myeloid leukemia cells with NRAS mutations to FLT3 inhibitors

Baer, M. R., Chatterjee, A., +7 authors, Silvestri, G.

Abstract: FLT3 inhibitor efficacy in AML with FLT3-ITD is short-lived, frequently due to new mutations, most commonly in NRAS. Sphingosine kinase 1 (SPHK1), which phosphorylates sphingosine to generate sphingosine-1-phosphate (S1P), is upregulated and localized to the plasma membrane in RAS-mutated cells. We studied S1P and FLT3 co-targeting to overcome FLT3 inhibitor resistance in NRAS-mutated FLT3-ITD AML cells. NRAS-mutated FLT3-ITD AML cell lines and patient blasts were treated with FLT3 inhibitors and/or S1P receptor (S1PR) modulators. FLT3 inhibitor sensitivity was assessed by immunoblotting, cytotoxicity and apoptosis assays. Co-treatment was also assessed in vivo in an orthotopic mouse model. Downstream RAS and SPHK1 effectors were measured by immunoblotting and qRT-PCR. The S1PR modulators fingolimod (FTY720) and mocravimod (KRP-203) resensitized FLT3-ITD-expressing MOLM-14 and MV4-11 human AML cells with G12D, G12S, Q61K or Q61H, but not G12C, and patient blasts with G13D or G13V NRAS mutations to FLT3 inhibitors. Moreover, FTY720 co-treatment resensitized G12D NRAS-mutated M14(R)701 cells to gilteritinib in vivo. Co-treatment inactivated ERK, transcriptionally downregulated SPHK1, and inactivated downstream AKT, p70S6K and BAD, with inactivation abrogated by constitutive SPHK1 expression. The clinically applicable S1PR modulators fingolimod and mocravimod resensitize NRAS-mutated FLT3-ITD AML cells to FLT3 inhibitors, supporting potential clinical efficacy of these combinations.

Read full paper
machine learning, cancer biology Nov 24, 2025 PubMed

Prediction of graft loss in living donor liver transplantation during the early postoperative period.

Raiki Yoshimura, Naotoshi Nakamura, +6 authors, Tomoharu Yoshizumi

Abstract: Liver transplantation is almost the only way to save patients with end-stage liver disease. Particularly, living donor liver transplantation (LDLT) has gained importance in recent years thanks to the shorter waiting times and better graft quality than with deceased donor liver transplantation (DDLT). However, some patients experience graft loss due to unexpected infections, sepsis, or immune-mediated rejection of the transplanted organ. An urgent need exists to clarify which patients experience graft loss. Several models have been proposed, but most analyze the classic DDLT, and knowledge about LDLT is lacking. In this study, we retrospectively analyzed clinical data from 748 patients who underwent LDLT. By adapting machine learning methods, we predicted early graft loss (within 180 days postoperatively) with better performance than conventional models. The model enabled us to stratify a highly heterogeneous sample of patients into five groups. By focusing on survival time, we next categorized the patients into three groups with early, intermediate, and late or no graft loss. Notably, we identified the intermediate-loss group as a distinct population similar to the early-loss population but with different survival times. Additionally, by proposing a hierarchical prediction method, we developed an approach to distinguish these populations using data up to 30 days postoperatively. Our findings will enable the early identification of individuals at risk of graft loss, particularly those in the early- and intermediate-loss groups. This will allow for appropriate patient care, such as switching to DDLT, identifying other living donors for LDLT, or preparing for re-transplantation, leading to a bottom-up improvement in transplant success rates.

Read full paper
machine learning, cancer biology Nov 09, 2025 PubMed

OmniCLIC: A Unified Omics Contrastive Learning Framework for Effective Integration and Classification of Multiomics Data.

Mingzhou Zhang, Xuzeng Liu, +2 authors, Yunhe Wang

Abstract: Integrating multiomics data for cancer subtype classification remains a critical yet challenging task due to the high dimensionality, heterogeneity, and limited interpretability of omics features. To address these limitations, we propose OmniCLIC, a unified Omics Contrastive Learning and Integration Classification framework that enables end-to-end multiomics integration, feature learning, and prediction. Three key components are proposed in OmniCLIC: OmniNet, a customized MLP with feature-wise scaling, neural tangent parametrization, and regularization strategies for omics-specific representation learning; a contrastive learning module that jointly optimizes supervised contrastive and cross-entropy losses; and OCDN, a decision-level fusion module that captures interomics correlations via a cross-modal correlation tensor to enhance generalization. We evaluate OmniCLIC on four benchmark cancer data sets, where it consistently outperforms state-of-the-art methods in accuracy and robustness across both binary and multiclass multiomics data sets. Furthermore, OmniCLIC enables biologically meaningful interpretation by identifying key molecular features through its built-in scaling layers. Functional enrichment analyses on the selected features reveal subtype-specific pathways, such as epithelial morphogenesis and PI3K-Akt signaling, aligned with known cancer biology. We further extend OmniCLIC to single-cell multiomics data (RNA+ATAC, RNA+ADT), where it still outperforms existing methods, verifying the framework's generalizability across multiomics data at different scales.
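The joint objective mentioned in this abstract (supervised contrastive loss optimized together with cross-entropy) can be sketched in a few lines. This uses the standard supervised contrastive formulation, where each anchor is pulled toward same-class samples and pushed away from all others. It is not OmniCLIC's code: the `weight` that balances the two terms, the temperature value, and all function names are hypothetical, and embeddings are assumed to be L2-normalized already.

```python
import math

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized embeddings (lists of floats)."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled dot products between the anchor and every other sample
        sims = {a: sum(x * y for x, y in zip(embeddings[i], embeddings[a])) / temperature
                for a in range(n) if a != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    loss = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        loss += log_z - row[y]
    return loss / len(logits)

def joint_loss(embeddings, logits, labels, weight=0.5):
    # `weight` is a hypothetical hyperparameter balancing the two terms
    return cross_entropy(logits, labels) + weight * supcon_loss(embeddings, labels)

# Toy batch: two tight clusters; labels that match the clusters give a lower contrastive loss
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
loss_matching = supcon_loss(emb, [0, 0, 1, 1])
loss_mismatched = supcon_loss(emb, [0, 1, 0, 1])
```

The contrastive term shapes the embedding space while the cross-entropy term trains the classifier head, which is the usual reason for optimizing the two together.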

Read full paper