Accepted to BioMed @ CVPR 2025
Authors: Akshay Murthy, Mengmeng Zhang, Aashrita Koyyalamudi, Shanmukhi Kannamangalam
Biological LLMs trained on vast genomic data can, when given carefully crafted inputs, produce sequences highly similar to those of harmful viruses or bacteria, creating dual-use risks. This paper analyzes biosafety concerns in genomic language models, examining how such models can be manipulated into generating DNA sequences that resemble pathogenic organisms despite existing safety measures. We propose mitigation strategies that include rigorous safety alignment during model training, robust output filtering, and stringent access controls. [arXiv link TBA]

