Cross-modal Retrieval

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

A versatile and efficient multi-task model for fashion-focused V+L tasks.

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

A versatile and flexible framework for fashion-focused V+L representation learning.

Text-Based Person Search with Limited Data

Addresses the data scarcity problem in text-based person search (TBPS) through transfer learning and contrastive learning.

