This repository has been consolidated into model-runner. All future development, issues, and pull requests should be directed there. Please visit the new repository for the latest updates and to ...
Abstract: Vision-language pre-training models have demonstrated outstanding performance on a wide range of multimodal tasks. Nevertheless, they remain susceptible to multimodal adversarial examples.