nanoVLM Image-to-Text Generator

This demo showcases nanoVLM, a lightweight vision-language model from Hugging Face. Upload an image and enter a query, and the demo generates four text descriptions of the image. It is built on the nanoVLM repository and uses the pretrained checkpoint lusxvr/nanoVLM-222M. Because it is small (about 222M parameters, as the checkpoint name indicates), nanoVLM is designed for efficient image-to-text generation, which makes it well suited to resource-constrained environments.