Marjan Niazpoor
Bringing Biomedical Artificial Intelligence into Practice: Supporting Experimental Hypothesis Validation through Graph-Based Bioassay Retrieval and Large Language Model Reasoning
Marjan Niazpoor explains how knowledge graphs and large language models can help identify suitable bioassays for validating structured biomedical hypotheses, the topic of her recently submitted master's thesis.
In recent years, advances in artificial intelligence, natural language processing, and knowledge graphs have produced a flood of biomedical hypotheses. However, while hypothesis generation has advanced rapidly, the step of experimental validation remains a bottleneck. In biomedicine, validation often requires carefully choosing the right assay, yet this process is still largely manual, error-prone, and limited by the expertise of individual researchers.
Bioassays are laboratory experiments used to test how biological systems respond to different substances, such as drugs, proteins, or chemicals. They come in many forms: ELISA assays measure protein levels, firefly luciferase assays probe gene expression, and antibody-based assays detect specific molecules. Each type is designed for a different kind of biological question.

This thesis addresses the challenge of connecting computational hypotheses with these real laboratory tests. To do so, we developed a workflow that retrieves and ranks relevant bioassays from PubChem [1] that could be used to validate structured biomedical hypotheses. The workflow comprises several key steps: embedding the descriptions of both assays and hypotheses, extracting biomedical entities and linking them to UMLS (Unified Medical Language System) concepts, building integrated knowledge graphs (KGs), and computing cosine similarity to find the best assay–hypothesis matches. Finally, the top-ranked assays are passed to a large language model (LLM) for justification and classification into levels of support. Figures 1 and 2 illustrate the full workflow and provide a more detailed overview of the approach.
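The retrieval-and-ranking step can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: in the actual workflow the vectors come from a text-embedding model (e.g. Qwen) applied to assay and hypothesis descriptions, whereas the short toy vectors and assay labels below are placeholders.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_assays(hypothesis_vec, assay_vecs, top_k=3):
    """Return the top_k (assay_id, similarity) pairs, best first."""
    scored = [(aid, cosine_similarity(hypothesis_vec, vec))
              for aid, vec in assay_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy example: three placeholder assay embeddings vs. one hypothesis embedding.
hypothesis = [1.0, 0.0, 1.0]
assays = {
    "AID_A": [1.0, 0.1, 0.9],  # semantically close to the hypothesis
    "AID_B": [0.0, 1.0, 0.0],  # unrelated
    "AID_C": [0.5, 0.5, 0.5],  # partially related
}
ranking = rank_assays(hypothesis, assays)
```

In the full workflow, the top-ranked assays from this step are what get handed to the LLM for justification.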
The workflow successfully identified bioassays relevant to each hypothesis, not only finding direct experimental matches but also uncovering indirect or adaptable assays, which reflects how validation often happens in real laboratory practice. For instance, in the hypothesis GRN → ASSOCIATION → NEOPLASMS, the workflow retrieved luciferase assays related to miRNAs (AIDs 493174, 493175, and 493179) that, with minor experimental adjustments, can be used to measure GRN expression levels, exactly as described in the literature. Another example is MPTP → INCREASES → neuron death, where the workflow retrieved AIDs 1411177, 1411176, and 61001, which are widely recognized in the literature as gold-standard experiments for validating this hypothesis.
In comparative testing, the Qwen embedding model [2] produced more realistic similarity scores than the domain-specific Simonlee (Clinical ModernBERT) model [3]. In addition, GPT outperformed other large language models in providing clear and accurate justifications, especially when guided by the curated candidate list generated by the pipeline.
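How a curated candidate list might be packaged into an LLM prompt can be sketched as follows. This is a hypothetical illustration: the support levels, wording, and helper name `build_justification_prompt` are assumptions for this sketch, not the thesis's exact prompt.

```python
# Illustrative support levels; the thesis's actual classification scheme may differ.
SUPPORT_LEVELS = [
    "direct support",
    "indirect support",
    "adaptable with modifications",
    "not relevant",
]

def build_justification_prompt(hypothesis, candidates):
    """Assemble an LLM prompt from a hypothesis and ranked candidates.

    candidates: list of (assay_id, description, similarity) tuples,
    best match first.
    """
    lines = [
        f"Hypothesis: {hypothesis}",
        "",
        "Candidate bioassays (ranked by cosine similarity):",
    ]
    for aid, description, sim in candidates:
        lines.append(f"- AID {aid} (similarity {sim:.2f}): {description}")
    lines.append("")
    lines.append(
        "For each assay, justify its relevance to the hypothesis and "
        "classify it as one of: " + ", ".join(SUPPORT_LEVELS) + "."
    )
    return "\n".join(lines)
```

Constraining the LLM to a short, pre-ranked list like this is what lets it produce focused justifications instead of free-form speculation.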
Limitations remain, particularly in data quality and metadata coverage. Many PubChem assay descriptions are vague or incomplete, cell/tissue context is often missing, and widely used commercial assays are not represented. Additionally, the workflow is computationally demanding and relies on high-performance computing resources. Nonetheless, this study demonstrates that hypothesis-to-assay retrieval is feasible and can make validation more systematic and reproducible.
Overall, this thesis presents a proof of concept for bridging the gap between in-silico hypothesis generation and real-world experimental validation. As the scientific community faces an ever-growing flood of data and AI-generated hypotheses, the next step is to bring these predictions back into the laboratory, helping researchers identify the most suitable bioassays for testing whether a hypothesis truly holds. With further improvements in assay curation, BioAssay Ontology (BAO) annotations, and the integration of cellular context, this workflow could evolve into an interactive system where researchers can input hypotheses and receive ranked, well-justified assay suggestions to guide experimental validation.
Citations
[1] Kim, S., Chen, J., Cheng, T., et al. (2025). PubChem 2025 update. Nucleic Acids Res., 53(D1), D1516–D1525. https://doi.org/10.1093/nar/gkae1059
[2] Zhang, Y., Li, M., Long, D., et al. (2025). Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv preprint arXiv:2506.05176. https://arxiv.org/abs/2506.05176
[3] Lee, S. A., Wu, A., Chiang, J. N. (2025). Clinical ModernBERT: An Efficient and Long Context Encoder for Biomedical Text. arXiv preprint arXiv:2504.03964. https://arxiv.org/abs/2504.03964