Engineering high-titer lentiviral vectors for robust expression of RNA-based gene circuits
Abstract: Lentiviral vectors enable efficient delivery of genetic cargoes for gene and cell therapies. With their ~10-kb packaging limit, lentiviral vectors can encode multiple transcription units, supporting delivery of compact gene circuits. RNA-based devices offer highly compact control including ligand-responsive induction and closed-loop regulation. However, RNA devices such as ribozymes and splicing switches may interfere with vector production via activity on the single-stranded RNA genome. Here, we examine the impact of gene syntax and genetic parts to define design strategies for two-gene vectors encoding RNA devices. We find that titer decreases with genetic parts that interfere with transcription or processing of the viral transcript during production. Compared to initial vectors, our best-performing design boosts titer more than 30-fold, enabling fine-scale tuning of expression to optimize cell-fate conversion within a non-monotonic landscape. Together, this work illuminates principles for constructing two-gene lentiviral vectors with both high titer and robust expression, enhancing efficacy for downstream applications.
Bayesian-Steered Structure Prediction of Mechanical Biomolecules Using Twisted Diffusion
Abstract: Deep learning approaches have revolutionized protein structure prediction. These tools are trained using experimental data and recapitulate reported conformations, but there is great interest in predicting conformations that may be functionally relevant although experimentally underrepresented. Since many modern structure prediction tools use generative artificial intelligence diffusion models, we reframe the search for alternative molecular conformations as that of sampling from a diffusion distribution conditioned using any arbitrary Bayesian likelihood. We implement a twisted diffusion sampler in Boltz-2 to sample this conditioned distribution and demonstrate the utility of this approach, which does not require any additional training of the neural network, by implementing a diffusion analog of steered molecular dynamics simulations applied to mechanical systems. We can reproduce predicted stretched states of fragments of DNA, the muscle protein titin, and the inner-ear protocadherin-15 protein, as well as open states of the MscL ion channel consistent with experimental results. We expect that steered structure predictions will help sample underrepresented and non-equilibrium conformations for many macromolecular systems.
Engineering orthogonal quorum sensing circuits using LuxR-type systems in yeast consortia
Abstract: Engineered microbial communities hold significant biotechnological potential because their collective metabolism can produce functions beyond those achievable by individual strains. However, multicellular synthetic gene circuits require orthogonal communication systems that enable precise, programmable signaling between cells. Quorum sensing (QS), where cells both produce and detect small diffusible signal molecules, offers a natural framework for such intercellular communication. Yet the construction of complex multicellular circuits for applications such as biobased production is currently hampered by the limited number of orthogonal QS channels available in yeast. Here, we expand the QS toolkit in Saccharomyces cerevisiae by characterizing four LuxR-type biosensors based on EsaR, LasR, TraR and RpaR, alongside the previously established LuxR biosensor. We functionally expressed acyl-CoA-dependent HSL synthases in yeast, producing a diverse range of aliphatic and aromatic HSL signals. LuxR and RpaR were compatible with in vivo ligand production and were established as orthogonal QS signaling pairs with the synthases MesI and RpaI, respectively. Co-culture experiments demonstrated QS-dependent intercellular signaling, with 3.9-fold and 6.4-fold induction relative to monocultures. Together, these results establish a modular and extensible platform for orthogonal intercellular communication in yeast, enabling the construction of multicellular synthetic gene circuits.
A Comprehensive Benchmarking of Spatial Deconvolution and Domain Detection Methods across Diverse Tissues and Spatial Transcriptomic Technologies
Abstract: Spatial transcriptomic technologies enable high-resolution characterization of gene expression patterns and reconstruction of cellular architecture within tissue contexts. Two key computational problems have emerged for analyzing these datasets: spatial deconvolution, for disentangling cell-type compositions at spatial locations, and spatial domain detection, for identifying spatially coherent regions within a tissue section. Although numerous methods have been developed for each task, a comprehensive and unified benchmarking study spanning diverse tissue types, spatial resolutions, and technological platforms remains lacking, hindering informed method selection by end users and impeding future methodological advancements. Here, we present spDDB (https://github.com/Zafar-Lab/spDDB), a comprehensive benchmarking framework for spatial deconvolution and domain detection methods across a large and diverse collection of datasets spanning multiple tissues, technologies, and biological conditions. We evaluated 21 deconvolution methods, including seven recently developed approaches, across 37 datasets curated from brain, cancer, and organ tissues encompassing four distinct technologies. To enable rigorous evaluation, we introduced SynthST, a deep graph attention autoencoder-based simulator that generates realistic spatial cell-type distributions from spatial transcriptomic data, and employed a suite of spatial bivariate metrics, including a novel bivariate Geary's C metric, alongside rare cell-type and cell-shape characterization metrics, for multidimensional performance assessment. While Cell2location, RCTD and SONAR emerged as top-performing methods for spatial deconvolution across tissue types, deconvolution performance varied substantially based on tissue architecture, spatial technology, dataset scale, and cell type diversity.
For domain detection, we benchmarked 18 methods across 36 datasets spanning six spatial technologies, identifying PROST, BASS, and SpaceFlow as the leading approaches, while revealing notable limitations of existing methods in handling large-scale datasets. Finally, we provide practical guidelines to assist end users in selecting optimal methods for both tasks across diverse experimental settings.
An improved generic schema for high fidelity data linkage and sample tracing across complex multi-assay medical entomology studies
Abstract: Evidence-based decision making on malaria vector control strategies increasingly relies on triangulation of data, which requires informatics systems that can integrate data from complex, multi-stage studies involving mosquitoes. This manuscript describes a performance evaluation of an extended version of the generic schema underpinning the VBDs360 platform, specifically improved to accommodate multiple distinct entomological assays spanning the field, insectary, and laboratory. The utility of this extension, with respect to high-fidelity data linkage and robust sample traceability across complex entomological workflows, was evaluated through a case study conducted in southern Tanzania. Wild female mosquitoes were collected from 40 locations across more than 4,000 square km and then reared through multiple generations in an insectary before derived iso-female lineages were tested for phenotypic susceptibility to a pyrethroid insecticide. Such multi-generational lineages (F0 to Fn; where n is greater than or equal to 2) were propagated to prevent non-heritable maternal effects on phenotype and produce enough progeny for standard WHO susceptibility assays. All samples were subsequently archived in a molecular laboratory, where all F0 specimens were tested for sibling species identity. A paper-based implementation of the extended schema enabled successful integration of 77,017 lines of data distributed across 6 different tables that spanned 3 distinct field, insectary, and laboratory workflows, implemented by three different teams working in different locations. At each step, fully independent and redundant primary and secondary keys enabled high-fidelity error correction and sample tracing. Consistently perfect linkage between assay design and sample sorting data was achieved for F0 wild-caught adults, with 100% of 66,108 records successfully linked between field capture and morphological categorization.
This complete traceability extended to the propagation of derived Fn lineages, with all 100 and 243 records from 9 adult-derived and 13 larval-derived lineages, respectively, correctly linked. Insecticide susceptibility phenotyping further confirmed 100% linkage for 5,654 records between exposure history and recorded mortality outcome data in the insectary. Although such cross-cleaned linkages to sample analysis and storage data recorded by the laboratory team were not entirely perfect and could be improved, they were nevertheless of very high fidelity (97.3% (1,967/2,022) for F0 samples and 99.3% (437/440) for Fn samples). Overall, this pilot application of the extended generic schema ensured robust data provenance and minimized transcription errors in this complex study distributed across multiple teams and locations. These findings demonstrate how this generic informatics framework may be scaled and adapted to support data integrity across diverse, large-scale, multi-team entomological research workflows.
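The linkage percentages reported in this abstract amount to counting how many downstream records carry a key that resolves to a known upstream record. The following is a minimal Python sketch of that check, not the VBDs360 implementation; the field names (`sample_id`, `parent_id`) and the toy records are hypothetical and chosen only for illustration.

```python
def linkage_rate(child_records, parent_keys, key="parent_id"):
    """Return (linked, total, percent) for child records whose key
    matches one of the known parent keys."""
    parent_keys = set(parent_keys)
    linked = sum(1 for rec in child_records if rec.get(key) in parent_keys)
    total = len(child_records)
    percent = 100.0 * linked / total if total else 0.0
    return linked, total, percent


# Hypothetical toy data: three field captures and four laboratory records,
# one of which points at a key that was never issued in the field.
field_captures = ["F0-001", "F0-002", "F0-003"]
lab_records = [
    {"sample_id": "L-1", "parent_id": "F0-001"},
    {"sample_id": "L-2", "parent_id": "F0-003"},
    {"sample_id": "L-3", "parent_id": "F0-999"},  # unlinked record
    {"sample_id": "L-4", "parent_id": "F0-002"},
]
linked, total, percent = linkage_rate(lab_records, field_captures)

# The same arithmetic applied to the counts quoted in the abstract.
f0_percent = round(100 * 1967 / 2022, 1)
fn_percent = round(100 * 437 / 440, 1)
```

Applied to the quoted counts, this arithmetic gives the 97.3% and 99.3% figures stated for the F0 and Fn laboratory linkages.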
Phylogenomic coupling of F1 chemosensory and archaellum systems across archaea and monoderm bacteria
Abstract: Archaellum-associated motility has been viewed as solely archaeal, yet new findings in Chloroflexota prompt a broader perspective. By analysing a curated set of ~22,000 NCBI reference genomes alongside 2,397 archaeal and 226 archaellum-encoding Chloroflexota genomes, this study systematically characterises the co-distribution of archaellum loci with chemosensory system (CSS) classes. Maximum-likelihood phylogeny of 3,727 F1-type CheA proteins reveals three major clades, with Clade 1 comprising ~80% monoderm representation, uniting archaeal and monoderm bacterial lineages in a shared evolutionary grouping. Overall, this work suggests that not only archaeal-type motility but also the F1-CSS-based sensing system may have been transferred from Archaea to Chloroflexota via horizontal gene transfer, and that the two systems have shared an evolutionary trajectory.
GatorDuo: Global-Consistency Dual-Graph Refinement With Pseudo-Label Agreement for Spatial Transcriptomics
Abstract: Spatial transcriptomics (ST) measures gene expression together with spatial coordinates, enabling spatial domain identification of coherent tissue regions. Many recent approaches rely on graph-based modeling to combine spatial neighborhoods and transcriptomic (gene-expression) similarity, yet neighborhood construction is often unreliable under sparsity and technical noise. As a result, spurious cross-domain shortcut edges can persist in static graphs and propagate misleading signals during message passing, ultimately blurring domain boundaries and weakening cluster separability. In this paper, we propose GatorDuo, a topology-aware dual-graph contrastive self-supervised framework for robust spatial domain identification that couples gene-expression similarity with spatial proximity through complementary neighborhood graphs. GatorDuo introduces global-consistency-based graph refinement that uses a pseudo-label agreement mask to suppress cross-domain shortcut edges in both views, thus stabilizing neighborhood topology for representation learning. To avoid manual tuning of domain resolution, GatorDuo further employs a contextual bandit reinforcement-learning strategy to adaptively select the clustering granularity (the number of clusters) used for refinement. The refined view-specific embeddings are integrated via a hybrid-routing Mixture-of-Experts (MoE) module to generate a unified embedding, optimized with contrastive objectives augmented by an MoE-alignment term. Across eight public benchmarks spanning sequencing- and imaging-based ST at spot and single-cell resolution, and compared with ten representative baselines, GatorDuo consistently delivers strong and robust spatial domain identification performance across multiple clustering metrics, while yielding informative unified embeddings that can support downstream biological analyses.
Disease-guided functional gene mapping across species reveals translational correspondences beyond sequence orthology
Abstract: Selecting the correct mouse gene to model a human disease phenotype is critical for translational research, yet sequence-based orthology can fail when genes have been lost, duplicated, or functionally rewired between species. Here we present BRIDGE (Biological Rank Integration for Disease Gene Equivalence), a sequence-free framework that identifies functional mouse equivalents of human disease genes. BRIDGE integrates 3.37 million disease-gene associations, biological pathways, and Gene Ontology annotations into a unified heterogeneous graph with 94,897 nodes and approximately 8.3 million edges. The graph is encoded by a heterogeneous graph transformer and combined with fused Gromov-Wasserstein alignment and multi-strategy reciprocal rank fusion. On two sequence-independent benchmarks, BRIDGE achieves Recall@5 of 61.8-66.7%, compared with 0.0-20.1% for Ensembl Compara. We validate BRIDGE through case studies including neutrophil pathway rewiring (CXCL8 to Cxcl1/2/5), acute-phase divergence (CRP to Apcs), and immune checkpoint substitution (LILRB2 to Pirb), and demonstrate complementarity with sequence methods in drug-translation analysis. Prospective validation of 30 novel predictions against three independent data modalities, including tissue expression, cell-type expression, and phenotype concordance, shows that BRIDGE picks are favored in 64 of 65 orthogonal tests (sign test P = 3.6 x 10^-10) and significantly outperform tested baselines including Ensembl Compara, BLAST RBH, and ESM-2. BRIDGE provides a benchmarked framework for functional cross-species gene mapping in disease-model design.
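Two ingredients named in this abstract, reciprocal rank fusion and Recall@k, can be illustrated generically. The sketch below uses the standard reciprocal rank fusion formula (each item scores the sum of 1 / (k + rank) over the input rankings) and one common definition of Recall@k (the fraction of relevant items found in the top k). It is not the BRIDGE implementation: the smoothing constant `k=60`, the candidate rankings, and the choice of Recall@k definition are assumptions made for illustration, with only the gene names taken from the CXCL8 case study.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; each item scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def recall_at_k(ranked, relevant, k=5):
    """Fraction of relevant items that appear in the top-k of a ranking."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical candidate rankings for human CXCL8 from three strategies.
rankings = [
    ["Cxcl1", "Cxcl2", "Il6", "Cxcl5"],
    ["Cxcl2", "Cxcl1", "Cxcl5", "Tnf"],
    ["Cxcl5", "Cxcl2", "Cxcl1", "Il6"],
]
fused = reciprocal_rank_fusion(rankings)
recall_top3 = recall_at_k(fused, {"Cxcl1", "Cxcl2", "Cxcl5"}, k=3)
```

Fusing on ranks rather than raw scores lets strategies with incomparable score scales be combined without calibration, which is the usual motivation for this scheme.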
BiLSTM-Powered Bilinear Attention for Protein-Ligand Prediction
Abstract: Rapid and accurate prediction of protein-ligand binding is essential for drug discovery. While generative AI has driven rapid advancements in structure-based approaches, sequence-based methods remain significantly faster and more cost-effective. Here, we present a weakly supervised deep learning framework integrating graph convolutional networks (GCN) for molecular encoding and bidirectional long short-term memory (BiLSTM) for protein modeling. The latter represents long-range dependencies better than the widely used convolutional neural network (CNN). Leveraging a bilinear attention network (BAN), this model learns protein-ligand pairwise interactions without requiring three-dimensional structural supervision. Trained solely on affinity labels from the publicly available BindingDB dataset, the model classified binders and non-binders with an AUROC of 0.96 and an AUPRC of 0.95. The model generates interpretable attention maps that serve as a "GPS" to locate binding sites. Remarkably, despite the lack of structural training data, it can pinpoint key contact residues confirmed by crystal structures. Our method could function as a scalable filter for giga-scale libraries, allowing rapid screening of drug candidates with direct structural insights into the protein-ligand interface.
Systematic Regional Bias is Widespread in ChIP-seq
Abstract: Robust and reproducible results are essential for confident scientific analysis. We demonstrate that transcription factor (TF) Chromatin Immunoprecipitation coupled with sequencing (ChIP-seq) suffers from systematic bias that may threaten its reproducibility: 80% of 200+ condition-matched, dual-replicate experiments in ENCODE contain genomic regions of systematic bias. We observe this regional bias even between replicates produced within the same experiment, resulting in thousands of unreplicated peaks, which often contain valuable biological data. We provide evidence that regional bias may lead to qualitative differences in TF biology inferred by different experiments; we discovered eight TFs with binding activity in compact chromatin that was identified by one experiment, yet systematically absent from others. To mitigate the effects of bias, we derive simple but effective metrics to quantify the quality of data within biased regions and demonstrate that they can be used for the robust integration of data from multiple experiments.
Endocrine therapy-specific lineage and partial epithelial-mesenchymal reprogramming defines divergent resistant cell-states in ER+ breast cancer
Abstract: Acquired resistance to endocrine therapy remains a primary obstacle in the clinical management of estrogen receptor-positive (ER+) breast cancer. While resistance is frequently accompanied by transcriptional rewiring and lineage plasticity, how specific pharmacological modalities dictate divergent resistance trajectories remains poorly understood. Here, we integrate multi-omic profiling, spanning bulk and single-cell transcriptomes, chromatin architecture (Hi-C), and the cistrome, to systematically compare the mechanisms involved in adaptive resistance to selective estrogen receptor modulators (SERMs, e.g., tamoxifen) and degraders (SERDs, e.g., fulvestrant), and the mechanism driven by constitutive ESR1 mutation, to characterize how the mode of ER perturbation influences lineage identity and epithelial-mesenchymal state. We found that tamoxifen-resistant (TamR) cells occupy a distinct transcriptional state characterized by coordinated luminal erosion, partial basal lineage activation, and stabilization of a partial epithelial-mesenchymal (pEMT) program. In contrast, fulvestrant-resistant (FulR) cells primarily suppress ER signaling without extensive lineage reprogramming. Finally, ESR1 mutant cells recapitulate ligand-driven ER hyperactivation with limited engagement of mesenchymal and basal gene expression programs. Chromatin profiling further revealed that SERM resistance is accompanied by higher-order genome reorganization, including A-to-B compartment switching at luminal regulators such as GATA3 and ESR1, redistribution of ER and FOXA1 binding, and consequent activation of a pEMT program.
Furthermore, we show that SERM-induced reprogramming is accompanied by a distinct mode of immune evasion in which the reprogrammed cells do not engage classical T-cell exhaustion programs but instead exhibit coordinated loss of major histocompatibility complex (MHC) class I antigen presentation and establishment of pro-tumorigenic signaling that strongly predicts adverse survival outcomes in patient cohorts. Together, these findings indicate that endocrine resistance does not converge on a single molecular endpoint but instead reflects drug-specific adaptive states defined by ER signaling context, lineage identity, and chromatin architecture. Our study establishes the basal-pEMT axis as a coordinated, epigenetically encoded module of SERM-induced plasticity and reframes endocrine resistance as a multidimensional evolutionary process shaped by therapeutic mechanisms of action.
Single-cell and deep learning identify hypoxia-responsive lncRNAs predicting outcomes in colorectal cancer.
Abstract: Emerging evidence highlights hypoxia-responsive long non-coding RNAs (lncRNAs) as potential modulators in tumor biology. In this study, we explored the significance of a hypoxia-responsive lncRNA molecular signature (HRLPMS) and the therapeutic implications of hypoxia-responsive lncRNAs in colorectal cancer (CRC). To assess the significance of HRLPMS, we integrated bulk transcriptomic and proteomic data, single-cell RNA-seq (scRNA-seq), spatial transcriptomics (ST) data, therapy-specific clinical cohorts, and our in-house data. We further evaluated tumor microenvironment (TME) characteristics, somatic variations, and drug sensitivity, and applied multiple machine learning (ML) and deep learning (DL) algorithms to validate the prognostic power of HRLPMS. Pan-cancer analysis revealed that HRLPMS functions as a risk factor across most cancer types. In CRC, HRLPMS was associated with chromosomal instability, adverse pathological characteristics, and poor survival outcomes, as confirmed by Cox, ML, and DL models. This signature was notably enriched in immune and stromal cell populations, such as fibroblasts. Distinct patterns of somatic variation were observed between the high- and low-HRLPMS groups. Cell-state analysis indicated that low-HRLPMS cells, characterized by immune and inflammatory features, predominated during early-to-middle pseudotime, whereas high-HRLPMS cells emerged later, exhibiting angiogenesis and extracellular matrix (ECM) remodeling characteristics. Further analysis demonstrated that APP-CD74 interactions may mediate immunosuppression and tumor progression. Furthermore, high-HRLPMS patients showed evidence of benefit from fluorouracil plus bevacizumab and a trend toward improved response to preoperative chemoradiotherapy. We found that HRLPMS represents a promising prognostic tool for CRC, with the potential to refine therapeutic strategies and enhance patient outcomes through tailored treatment approaches.
Performance of large language models (GPT-5.2, Gemini 3 Pro, Claude Sonnet 4.6 and Grok 4.1) on the Fellowship of The Royal College of Surgeons Urology Part A examination.
Abstract: To assess and compare the performance of four contemporary frontier large language models (LLMs), namely GPT-5.2 (OpenAI), Gemini 3 Pro (Google DeepMind), Claude Sonnet 4.6 (Anthropic) and Grok 4.1 (xAI), on a simulated Fellowship of The Royal College of Surgeons Urology (FRCS(Urol)) Part A examination, evaluating overall accuracy, subspecialty-level performance, output consistency and response time.
Decoding (digital) histopathology: The building blocks for computational researchers.
Abstract: Computational Pathology is a novel discipline at the intersection of pathology and computer science, driven by the recent advances in machine learning and image analysis. Nevertheless, combining the insights from both disciplines remains challenging, particularly due to differences in technical background and language between pathologists and engineers. It is acknowledged that literature translating fundamental pathology concepts for computer scientists remains limited, which further complicates the understanding of the field, especially for those entering the field. In this context, and aligned with the mission of the European Society of Digital and Integrative Pathology (ESDIP) to promote education and interdisciplinary collaboration in digital and computational pathology, this work aims to provide a comprehensive yet accessible guide to pathology for computational scientists and other researchers. Herein, we present an overview of the pathology laboratory workflow, digital pathology and whole-slide imaging, diagnostic fundamentals of neoplastic and nonneoplastic diseases, and current applications of AI in pathology. This guide is designed as a practical reference and educational resource to support computer scientists new to the field and to promote more effective collaboration between medical and computational communities.
MultiSP deciphers tissue structure and multicellular communication from spatial multi-omics data.
Abstract: Recent breakthroughs in spatial multi-omics enable simultaneous profiling of different modalities while preserving tissue architecture, providing unprecedented opportunities to explore tissue complexity. However, due to the sparse and noisy nature of the data, interpreting these complex tissue structures and cellular communication remains challenging. We present MultiSP, a deep learning framework that enhances data representation through efficient spatial and feature similarity fusion, modality-specific probabilistic generative modeling, and cross-modality adversarial learning. Applied to various spatial multi-omics datasets, it outperforms existing methods in capturing biologically interpretable spatial domains. MultiSP also denoises spatial data, uncovers modality-specific spatial variations, and reveals gene regulation mechanisms. In the tumor microenvironment, it unravels fine-resolution cellular distribution maps, such as spatially neighboring macrophage-enriched sub-regions with distinct prognosis outcomes. Additionally, MultiSP facilitates the inference of spatially multimodal cell-cell communication. Together, MultiSP serves as a powerful framework for uncovering spatially multimodal heterogeneity and communication by integrating complementary information from multiple modalities.