This is a useful dataset; thanks to the organizers for releasing it.
The source code for the baseline method TIRG has been released. I tested it on this dataset and got a mean score of around 0.30-0.31. Here's a quick guide to get started:
https://github.com/lugiavn/notes/blob/master/fashioniq_tirg.md
You could add more bells and whistles, or simply throw it into your ensemble to get a better score.
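If you go the ensemble route, one simple option is to average each candidate image's score across models and re-rank. Here's a rough sketch of that idea. It is not taken from the TIRG code: the function name `ensemble_rank`, the dict-of-scores format, and the example ids and numbers are all made up for illustration, and it assumes the models' scores are already on comparable scales (normalize them first if they aren't).

```python
from typing import Dict, List


def ensemble_rank(score_lists: List[Dict[str, float]], top_k: int = 50) -> List[str]:
    """Average per-candidate scores across models and return the top_k candidate ids.

    Hypothetical helper: each dict maps a candidate image id to one model's
    similarity score for a single query (higher means a better match).
    """
    if not score_lists:
        return []
    # Keep only candidates that every model has scored.
    common = set(score_lists[0])
    for scores in score_lists[1:]:
        common &= set(scores)
    # Plain unweighted mean of the scores for each candidate.
    averaged = {c: sum(s[c] for s in score_lists) / len(score_lists) for c in common}
    # Sort by descending averaged score; break ties by id so the order is deterministic.
    ranked = sorted(averaged, key=lambda c: (-averaged[c], c))
    return ranked[:top_k]


# Made-up scores for one query from two models (illustration only).
tirg_scores = {"img_a": 0.9, "img_b": 0.4, "img_c": 0.1}
other_scores = {"img_a": 0.2, "img_b": 0.8, "img_c": 0.3}
print(ensemble_rank([tirg_scores, other_scores], top_k=2))
```

With real models you would probably want weighted averaging or rank-based fusion instead, but the unweighted mean is an easy starting point.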