Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service. (arXiv:2311.05863v1 [cs.CR])
cs.CR updates on arXiv.org
Recent advances in vision-language pre-trained models (VLPs) have
significantly improved visual understanding and cross-modal analysis
capabilities. Companies now offer multi-modal Embedding as a Service (EaaS)
built on VLPs (e.g., CLIP-based models), services that require large amounts
of training data and compute to achieve high performance. However, existing
studies indicate that EaaS is vulnerable to model extraction attacks, which
can inflict substantial losses on the owners of VLPs. Protecting the
intellectual property and commercial ownership of VLPs is therefore
increasingly crucial yet …