Image Captioning

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

A versatile and efficient multi-task model for fashion-focused vision-and-language (V+L) tasks.


Copyright © Xiao (Brandon) Han · Last updated June 2023