Shengyao (Arvin) Zhuang 庄胜尧
I am an Applied Scientist at Amazon’s AGI group, specializing in advancing web search technologies that power Amazon’s product ecosystem. My research focuses on information retrieval, large language model–based neural rankers, and natural language processing. Previously, I was a Postdoctoral Researcher at CSIRO’s Australian e-Health Research Centre, where I developed LLM-driven search systems for the medical domain. I earned my Ph.D. in Computer Science from the University of Queensland’s ielab, under the supervision of Professor Guido Zuccon.
Publications
2026
LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
Published in Preprint, 2026. Full paper
2025
Tevatron 2.0: Unified document retrieval toolkit across scale, language, and modality
Published in Preprint, 2025. Full paper
Set-encoder: Permutation-invariant inter-passage attention for listwise passage re-ranking with cross-encoders
Published in Preprint, 2025. Full paper
Report from the 4th Strategic Workshop on Information Retrieval in Lorne (SWIRL 2025)
Published in Preprint, 2025. Full paper
SIGIR-AP 2025 Tutorial on Retrieval and Ranking with LLMs (R2LLMs)
Published in Preprint, 2025. Full paper
Rethinking On-policy Optimization for Query Augmentation
Published in Preprint, 2025. Full paper
ReSLLM: Large language models are strong resource selectors for federated search
Published in Preprint, 2025. Full paper
Rank-R1: Enhancing reasoning in LLM-based document rerankers via reinforcement learning
Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin, Guido Zuccon
Published in arXiv preprint arXiv:2503.06034, 2025. Full paper
Rank-DistiLLM: Closing the effectiveness gap between cross-encoders and LLMs for passage re-ranking
Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen
Published in Preprint, 2025. Full paper
R2LLMs: Retrieval and Ranking with LLMs
Published in Preprint, 2025. Full paper
MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed
Published in Preprint, 2025. Full paper
Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models
Published in Preprint, 2025. Full paper
LLM-VPRF: Large Language Model Based Vector Pseudo Relevance Feedback
Published in Preprint, 2025. Full paper
An investigation of prompt variations for zero-shot LLM-based rankers
Shuoqi Sun, Shengyao Zhuang, Shuai Wang, Guido Zuccon
Published in Preprint, 2025. Full paper
The impact of auxiliary patient data on automated chest x-ray report generation and how to incorporate it
Published in Preprint, 2025. Full paper
Document screenshot retrievers are vulnerable to pixel poisoning attacks
Published in Preprint, 2025. Full paper
Distillation versus contrastive learning: How to train your rerankers
Published in Preprint, 2025. Full paper
2D Matryoshka training for information retrieval
Published in Preprint, 2025. Full paper
Corpus subsampling: Estimating the effectiveness of neural retrieval models on large corpora
Published in Preprint, 2025. Full paper
BrowseComp-Plus: A more fair and transparent evaluation benchmark of deep-research agent
Zijian Chen*, Xueguang Ma*, Shengyao Zhuang*, Ping Nie, Kai Zou, Andrew Liu, Joshua Green, Kshama Patel, Ruoxi Meng, Mingyi Su, Sahel Sharifymoghaddam, Yanxi Li, Haoran Hong, Xinyu Shi, Xuye Liu, Nandan Thakur, Crystina Zhang, Luyu Gao, Wenhu Chen, Jimmy Lin
Published in arXiv preprint arXiv:2508.06600, 2025. Full paper
VISA: Retrieval augmented generation with visual source attribution
Xueguang Ma, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Wenhu Chen, Jimmy Lin
Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025. Full paper
2024
Large Language Models for Stemming: Promises, Pitfalls and Failures
Shuai Wang, Shengyao Zhuang, and Guido Zuccon.
Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), 2024. Short paper
Leveraging LLMs for Unsupervised Dense Retriever Ranking
Ekaterina Khramtsova, Shengyao Zhuang (equal contribution), Mahsa Baktashmotlagh, and Guido Zuccon (Best Paper Honorable Mention Award)
Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), 2024. Full paper
Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
Shengyao Zhuang, Bevan Koopman, Xiaoran Chu, and Guido Zuccon.
Published in Proceedings of the 2nd International ACM SIGIR Conference on Information Retrieval in the Asia Pacific (SIGIR-AP ’24), 2024. Full paper
Starbucks: Improved Training for 2D Matryoshka Embeddings
Shengyao Zhuang, Shuai Wang, Bevan Koopman, and Guido Zuccon.
Published in arXiv, 2024. Full paper
Zero-shot generative large language models for systematic review screening automation
Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon
Published in Preprint, 2024. Full paper
Team IELAB at TREC Clinical Trial Track 2023: Enhancing clinical trial retrieval with neural rankers and large language models
Published in Preprint, 2024. Full paper
Starbucks-v2: Improved Training for 2D Matryoshka Embeddings
Published in Preprint, 2024. Full paper
Revisiting Document Expansion and Filtering for Effective First-Stage Retrieval
Published in Preprint, 2024. Full paper
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin, and Guido Zuccon.
Published in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Full paper
Large language models based stemming for information retrieval: Promises, pitfalls and failures
Published in Preprint, 2024. Full paper
Embark on DenseQuest: A System for Selecting the Best Dense Retriever for a Custom Collection
Published in Preprint, 2024. Full paper
Does Vec2Text Pose a New Corpus Poisoning Threat?
Published in Preprint, 2024. Full paper
Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation
Published in Preprint, 2024. Full paper
AgAsk: an agent to help answer farmer’s questions from scientific documents
Published in Preprint, 2024. Full paper
FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation
Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon
Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024. Full paper
2023
Teaching Pre-Trained Language Models to Rank Effectively, Efficiently, and Robustly
Shengyao Zhuang.
Published in The University of Queensland, 2023. Thesis
Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval
Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon and Daxin Jiang.
Published in Proceedings of the 1st International ACM SIGIR Conference on Information Retrieval in the Asia Pacific (SIGIR-AP ’23), 2023. Full paper
A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
Shengyao Zhuang, Honglei Zhuang, Bevan Koopman and Guido Zuccon.
Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), 2023. Full paper
Selecting which Dense Retriever to use for Zero-Shot Search
Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang and Guido Zuccon.
Published in Proceedings of the 1st International ACM SIGIR Conference on Information Retrieval in the Asia Pacific (SIGIR-AP ’23), 2023. Full paper
Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval
Shengyao Zhuang, Linjun Shou, Guido Zuccon
Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023. Short paper
Exploring the Representation Power of SPLADE Models
Joel Mackenzie, Shengyao Zhuang (equal contribution), Guido Zuccon
Published in Proceedings of the 2023 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’23), 2023. Short paper
Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models
Guido Zuccon, Harrisen Scells, Shengyao Zhuang
Published in Proceedings of the 2023 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’23), 2023. Short paper
Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
Shengyao Zhuang, Bing Liu, Bevan Koopman and Guido Zuccon.
Published in Findings of the Association for Computational Linguistics: EMNLP 2023, 2023. Short paper
Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon and Daxin Jiang.
Published in The First Workshop on Generative Information Retrieval at SIGIR2023, 2023. Full paper
2022
To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers
Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. Short paper
Reduce, Reuse, Recycle: Green Information Retrieval Research
Harry Scells, Shengyao Zhuang, Guido Zuccon (Best Paper Honorable Mention Award)
Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. Perspective paper
Implicit Feedback for Dense Passage Retrieval: A Counterfactual Approach
Shengyao Zhuang, Hang Li and Guido Zuccon
Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. Full paper
CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos
Shengyao Zhuang, Guido Zuccon
Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. Full paper
Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training
Shengyao Zhuang, Guido Zuccon
Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. Demo paper
Reinforcement Online Learning to Rank with Unbiased Reward Shaping
Shengyao Zhuang, Zhihao Qiao, Guido Zuccon
Published in Information Retrieval Journal (IRJ), 2022. Journal paper
Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
Published in Proceedings of the 44th European Conference on Information Retrieval (ECIR ’22), 2022. Reproducibility paper
Pseudo-Relevance Feedback with Dense Retrievers in Pyserini
Hang Li, Shengyao Zhuang, Xueguang Ma, Jimmy Lin, Guido Zuccon
Published in Proceedings of the 26th Australasian Document Computing Symposium (ADCS ’22), 2022. Demo paper
Robustness of Neural Rankers to Typos: A Comparative Study
Shengyao Zhuang, Xinyu Mao, Guido Zuccon (Best Paper Award)
Published in Proceedings of the 26th Australasian Document Computing Symposium (ADCS ’22), 2022. Short paper
Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls
Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
Published in ACM Transactions on Information Systems (TOIS), 2022. Journal paper
2021
Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
Shengyao Zhuang, Guido Zuccon
Published in arXiv preprint, 2021. Full paper
IELAB at TREC Deep Learning Track 2021
Published in Preprint, 2021. Full paper
TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
Shengyao Zhuang, Guido Zuccon
Published in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), 2021. Full paper
How do Online Learning to Rank Methods Adapt to Changes of Intent?
Shengyao Zhuang, Guido Zuccon
Published in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), 2021. Full paper
BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval
Shuai Wang, Shengyao Zhuang, Guido Zuccon
Published in Proceedings of the 2021 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’21), 2021. Full paper
Effective and Privacy-preserving Federated Online Learning to Rank
Shuyi Wang, Bing Liu, Shengyao Zhuang, Guido Zuccon (Best Student Paper Award)
Published in Proceedings of the 2021 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’21), 2021. Full paper
Dealing with Typos for BERT-based Passage Retrieval and Ranking
Shengyao Zhuang, Guido Zuccon
Published in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. Short paper
Federated Online Learning to Rank with Evolution Strategies: A Reproducibility Study
Shuyi Wang, Shengyao Zhuang, Guido Zuccon
Published in Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, 2021. Reproducibility paper
Deep Query Likelihood Model for Information Retrieval
Shengyao Zhuang, Hang Li, Guido Zuccon
Published in Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, 2021. Short paper
2020
IELAB for TREC Conversational Assistance Track (CAsT) 2020
Published in Preprint, 2020. Full paper
Counterfactual Online Learning to Rank
Shengyao Zhuang, Guido Zuccon
Published in Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, 2020. Full paper
