Shengyao (Arvin) Zhuang 庄胜尧
I am an Applied Scientist at Amazon’s AGI group, specializing in advancing web search technologies that power Amazon’s product ecosystem. My research focuses on information retrieval, large language model–based neural rankers, and natural language processing. Previously, I was a Postdoctoral Researcher at CSIRO’s Australian e-Health Research Centre, where I developed LLM-driven search systems for the medical domain. I earned my Ph.D. in Computer Science from the University of Queensland’s ielab, under the supervision of Professor Guido Zuccon.
Publications
2026
- LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
arXiv
Zhichao Xu, Shengyao Zhuang, Crystina Zhang, Xueguang Ma, Yijun Tian, Maitrey Mehta, Jimmy Lin, Vivek Srikumar
2025
- 2D Matryoshka training for information retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shuai Wang, Shengyao Zhuang, B. Koopman, G. Zuccon
- An investigation of prompt variations for zero-shot LLM-based rankers
European Conference on Information Retrieval (ECIR)
Shuoqi Sun, Shengyao Zhuang, Shuai Wang, Guido Zuccon
- BrowseComp-Plus: A more fair and transparent evaluation benchmark of deep-research agent
arXiv
Zijian Chen*, Xueguang Ma*, Shengyao Zhuang*, Ping Nie, Kai Zou, Andrew Liu, Joshua Green, Kshama Patel, Ruoxi Meng, Mingyi Su, Sahel Sharifymoghaddam, Yanxi Li, Haoran Hong, Xinyu Shi, Xuye Liu, Nandan Thakur, Crystina Zhang, Luyu Gao, Wenhu Chen, Jimmy Lin
- Corpus subsampling: Estimating the effectiveness of neural retrieval models on large corpora
European Conference on Information Retrieval (ECIR)
Maik Fröbe, Andrew Parry, Harrisen Scells, Shuai Wang, Shengyao Zhuang, G. Zuccon, Martin Potthast, Matthias Hagen
- Distillation versus contrastive learning: How to train your rerankers
European Conference on Information Retrieval (ECIR)
Zhichao Xu, Zhiqi Huang, Shengyao Zhuang, Vivek Srikumar
- Document screenshot retrievers are vulnerable to pixel poisoning attacks
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Ekaterina Khramtsova, Xueguang Ma, B. Koopman, Jimmy Lin, G. Zuccon
- LLM-VPRF: Large Language Model Based Vector Pseudo Relevance Feedback
arXiv
Hang Li, Shengyao Zhuang, B. Koopman, G. Zuccon
- Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models
arXiv
Jieran Li, Xiuyuan Hu, Yang Zhao, Shengyao Zhuang, Hao Zhang
- MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed
arXiv
J. Zhan, Crystina Zhang, Shengyao Zhuang, Xueguang Ma, Jimmy Lin
- R2LLMs: Retrieval and Ranking with LLMs
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
G. Zuccon, Shengyao Zhuang, Xueguang Ma
- Rank-DistiLLM: Closing the effectiveness gap between cross-encoders and LLMs for passage re-ranking
European Conference on Information Retrieval (ECIR)
Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen
- Rank-R1: Enhancing reasoning in LLM-based document rerankers via reinforcement learning
arXiv
Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin, Guido Zuccon
- Report from the 4th Strategic Workshop on Information Retrieval in Lorne (SWIRL 2025)
SIGIR Forum
Johanne R Trippas, J. Culpepper, Mohammad Aliannejadi, James Allan, Enrique Amigó, Jaime Arguello, Leif Azzopardi, P. Bailey, Jamie Callan, Rob Capra, Nick Craswell, Bruce Croft, Jeff Dalton, Gianluca Demartini, Laura Dietz, Zhicheng Dou, C. Eickhoff, Michael Ekstrand, Nicola Ferro, Norbert Fuhr, Dorota Glowacka, Faegheh Hasibi, Danula Hettiachchi, Rosie Jones, J. Kamps, Noriko Kando, Sarvnaz Karimi, Makoto P. Kato, B. Koopman, Yiqun Liu, Chenglong Ma, Joel Mackenzie, Maria Maistro, Jiaxin Mao, Dana McKay, Bhaskar Mitra, Stefano Mizzaro, Alistair Moffat, Josiane Mothe, I. Ounis, Lida Rashidi, Yongli Ren, Mark Sanderson, Rodrygo L. T. Santos, Falk Scholer, Chirag A Shah, Laurianne Sitbon, Ian Soboroff, Damiano Spina, Paul Thomas, Julián Urbano, Arjen P. de Vries, Ryen W. White, Abby Yuan, Hamed Zamani, Oleg Zendel, Min Zhang, Shengyao Zhuang, Justin Zobel, Guido Zuccon
- ReSLLM: Large language models are strong resource selectors for federated search
The Web Conference (WWW)
Shuai Wang, Shengyao Zhuang, B. Koopman, G. Zuccon
- Rethinking On-policy Optimization for Query Augmentation
arXiv
Zhichao Xu, Shengyao Zhuang, Xueguang Ma, Bingsen Chen, Yijun Tian, Fengran Mo, Jie Cao, Vivek Srikumar
- SIGIR-AP 2025 Tutorial on Retrieval and Ranking with LLMs (R2LLMs)
SIGIR-AP
G. Zuccon, Shengyao Zhuang, Xueguang Ma, B. Koopman
- Set-encoder: Permutation-invariant inter-passage attention for listwise passage re-ranking with cross-encoders
European Conference on Information Retrieval (ECIR)
Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, B. Koopman, G. Zuccon, Benno Stein, Martin Potthast, Matthias Hagen
- Tevatron 2.0: Unified document retrieval toolkit across scale, language, and modality
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Xueguang Ma, Luyu Gao, Shengyao Zhuang, J. Zhan, Jamie Callan, Jimmy Lin
- The impact of auxiliary patient data on automated chest X-ray report generation and how to incorporate it
Annual Meeting of the Association for Computational Linguistics (ACL)
Aaron Nicolson, Shengyao Zhuang, Jason Dowling, Bevan Koopman
- VISA: Retrieval augmented generation with visual source attribution
Annual Meeting of the Association for Computational Linguistics (ACL)
Xueguang Ma, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Wenhu Chen, Jimmy Lin
2024
- AgAsk: an agent to help answer farmer’s questions from scientific documents
International Journal on Digital Libraries (IJDL)
B. Koopman, Ahmed Mourad, Hang Li, A. Vegt, Shengyao Zhuang, Simon Gibson, Y. Dang, David Lawrence, G. Zuccon
- Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Xinyu Mao, Shengyao Zhuang, B. Koopman, G. Zuccon
- Does Vec2Text Pose a New Corpus Poisoning Threat?
arXiv
Shengyao Zhuang, B. Koopman, G. Zuccon
- Embark on DenseQuest: A System for Selecting the Best Dense Retriever for a Custom Collection
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Ekaterina Khramtsova, Teerapong Leelanupab, Shengyao Zhuang, Mahsa Baktashmotlagh, G. Zuccon
- FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon
- Large Language Models for Stemming: Promises, Pitfalls and Failures
arXiv
Shuai Wang, Shengyao Zhuang, and Guido Zuccon
- Large language models based stemming for information retrieval: Promises, pitfalls and failures
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shuai Wang, Shengyao Zhuang, G. Zuccon
- Leveraging LLMs for Unsupervised Dense Retriever Ranking
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Ekaterina Khramtsova, Shengyao Zhuang (equal contribution), Mahsa Baktashmotlagh, and Guido Zuccon (Best Paper Honorable Mention Award)
- PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin, and Guido Zuccon
- Revisiting Document Expansion and Filtering for Effective First-Stage Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Watheq Mansour, Shengyao Zhuang, G. Zuccon, Joel Mackenzie
- Starbucks-v2: Improved Training for 2D Matryoshka Embeddings
arXiv
Shengyao Zhuang, Shuai Wang, B. Koopman, G. Zuccon
- Starbucks: Improved Training for 2D Matryoshka Embeddings
arXiv
Shengyao Zhuang, Shuai Wang, Bevan Koopman, and Guido Zuccon
- Team IELAB at TREC Clinical Trial Track 2023: Enhancing clinical trial retrieval with neural rankers and large language models
Text Retrieval Conference (TREC)
Shengyao Zhuang, B. Koopman, G. Zuccon
- Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
SIGIR-AP
Shengyao Zhuang, Bevan Koopman, Xiaoran Chu, and Guido Zuccon
- Zero-shot generative large language models for systematic review screening automation
European Conference on Information Retrieval (ECIR)
Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon
2023
- A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Honglei Zhuang, Bevan Koopman and Guido Zuccon
- Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Linjun Shou, Guido Zuccon
- Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models
International Conference on the Theory of Information Retrieval (ICTIR)
Guido Zuccon, Harrisen Scells, Shengyao Zhuang
- Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
arXiv
Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon and Daxin Jiang
- Exploring the Representation Power of SPLADE Models
International Conference on the Theory of Information Retrieval (ICTIR)
Joel Mackenzie, Shengyao Zhuang (equal contribution), Guido Zuccon
- Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Shengyao Zhuang, Bing Liu, Bevan Koopman and Guido Zuccon
- Selecting which Dense Retriever to use for Zero-Shot Search
SIGIR-AP
Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang and Guido Zuccon
- Teaching Pre-Trained Language Models to Rank Effectively, Efficiently, and Robustly
The University of Queensland (PhD Thesis)
Shengyao Zhuang
- Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval
SIGIR-AP
Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon and Daxin Jiang
2022
- Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Guido Zuccon
- CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Guido Zuccon
- Implicit Feedback for Dense Passage Retrieval: A Counterfactual Approach
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Hang Li and Guido Zuccon
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
European Conference on Information Retrieval (ECIR)
Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
- Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls
ACM Trans. Inf. Syst. (TOIS)
Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
- Pseudo-Relevance Feedback with Dense Retrievers in Pyserini
Australasian Document Computing Symposium (ADCS)
Hang Li, Shengyao Zhuang, Xueguang Ma, Jimmy Lin, Guido Zuccon
- Reduce, Reuse, Recycle: Green Information Retrieval Research
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Harry Scells, Shengyao Zhuang, Guido Zuccon (Best Paper Honorable Mention Award)
- Reinforcement Online Learning to Rank with Unbiased Reward Shaping
Information Retrieval Journal (IRJ)
Shengyao Zhuang, Zhihao Qiao, Guido Zuccon
- Robustness of Neural Rankers to Typos: A Comparative Study
Australasian Document Computing Symposium (ADCS)
Shengyao Zhuang, Xinyu Mao, Guido Zuccon (Best Paper Award)
- To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
2021
- BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval
International Conference on the Theory of Information Retrieval (ICTIR)
Shuai Wang, Shengyao Zhuang, Guido Zuccon
- Dealing with Typos for BERT-based Passage Retrieval and Ranking
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Shengyao Zhuang, Guido Zuccon
- Deep Query Likelihood Model for Information Retrieval
European Conference on Information Retrieval (ECIR)
Shengyao Zhuang, Hang Li, Guido Zuccon
- Effective and Privacy-preserving Federated Online Learning to Rank
International Conference on the Theory of Information Retrieval (ICTIR)
Shuyi Wang, Bing Liu, Shengyao Zhuang, Guido Zuccon (Best Student Paper Award)
- Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
arXiv
Shengyao Zhuang, Guido Zuccon
- Federated Online Learning to Rank with Evolution Strategies: A Reproducibility Study
European Conference on Information Retrieval (ECIR)
Shuyi Wang, Shengyao Zhuang, Guido Zuccon
- How do Online Learning to Rank Methods Adapt to Changes of Intent?
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Guido Zuccon
- IELAB at TREC Deep Learning Track 2021
Text Retrieval Conference (TREC)
Shengyao Zhuang, Hang Li, Guido Zuccon
- TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Shengyao Zhuang, Guido Zuccon
2020
- Counterfactual Online Learning to Rank
European Conference on Information Retrieval (ECIR)
Shengyao Zhuang, Guido Zuccon
- IELAB for TREC Conversational Assistance Track (CAsT) 2020
Text Retrieval Conference (TREC)
Sebastian Cross, Hang Li, Shengyao Zhuang, Ahmed Mourad, Guido Zuccon, Bevan Koopman
