Xiao (Brandon) Han

Ph.D. Student

UoSurrey, CVSSP


I am on the job market for Spring 2024. 😬

I am a third-year Ph.D. student at the University of Surrey under the supervision of Prof. Tao Xiang and Prof. Yi-Zhe Song. I also work closely with Dr. Xiatian Zhu. Before coming to Surrey, I obtained my bachelor’s degree from Zhejiang University in 2020. I have also gained academic experience at the University of Michigan, Westlake University, and Fudan University.

I am broadly interested in the field of Deep Learning. My current research interest lies at the intersection of Computer Vision and Natural Language Processing (i.e., vision-language). My research goal is to build multi-modal AI systems that can be used in real-world applications (e.g., e-commerce platforms). My expertise includes, but is not limited to:

  • Vision-language pre-training and (parameter-efficient) adaptation;
  • Vision-language downstream tasks (e.g., uni-/cross-/multi-modal image retrieval, image captioning, text-based/guided 2D/3D content generation/editing);
  • Some specific tasks (e.g., person ReID).

For more details, see my academic CV.

Feel free to poke me if you want to discuss, collaborate, or just say hi. 😊


  • 21/03/2023: Our FAME-ViL paper was selected as a highlight paper at CVPR 2023 (top 2.5% of 9155 submissions)!
  • 24/02/2023: 😆 Our paper on a multi-task vision-and-language model for fashion tasks was accepted by CVPR 2023.
  • 03/07/2022: 😆 Our paper on fashion-focused vision-and-language representation learning was accepted by ECCV 2022.
  • 30/06/2022: 😉 Our team won second place in the eBay eProduct Visual Search Challenge (FGVC9, CVPR 2022).


Journal & Conference


HeadSculpt: Crafting 3D Head Avatars with Text

A versatile pipeline for generating and editing 3D head avatars with textual prompts.

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

A versatile and efficient multi-task model for fashion-focused V+L tasks.

Large-Scale Product Retrieval with Weakly Supervised Representation Learning

The second-place solution for the 2nd eBay eProduct Visual Search Challenge (FGVC9-CVPR2022).

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

A versatile and flexible framework for fashion-focused V+L representation learning.

UIGR: Unified Interactive Garment Retrieval

A unified framework and benchmark for two interactive garment retrieval tasks.

Copyright © Xiao (Brandon) Han · Last updated in June 2023 · Powered by the Academic theme for Hugo.