This demo showcases nanoVLM, a lightweight vision-language model by Hugging Face.
Upload an image and provide a query to generate four text descriptions.
The model is based on the nanoVLM repository and uses the pretrained model lusxvr/nanoVLM-222M.
nanoVLM is designed for efficient image-to-text generation, which makes it well suited to resource-constrained environments.
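
The upload-and-query flow described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: `describe_image`, `make_stub_generator`, and the `generate_one` callback are hypothetical names introduced here, and the stub stands in for a real call to the nanoVLM model so that the snippet runs without any model weights. The only detail taken from the description is that four outputs are produced per request.

```python
from itertools import count
from typing import Any, Callable, List

# The demo text says four descriptions are generated per request.
NUM_DESCRIPTIONS = 4


def describe_image(
    image: Any,
    query: str,
    generate_one: Callable[[Any, str], str],
    n: int = NUM_DESCRIPTIONS,
) -> List[str]:
    """Collect n descriptions by calling a single-sample generator n times.

    `generate_one` is a hypothetical callback; in a real app it would wrap
    the model's own generation routine.
    """
    if image is None:
        raise ValueError("an image is required")
    if not query.strip():
        raise ValueError("a non-empty query is required")
    return [generate_one(image, query) for _ in range(n)]


def make_stub_generator() -> Callable[[Any, str], str]:
    """Return a stand-in generator (assumption: replaces the real model call)."""
    counter = count(1)

    def generate_one(image: Any, query: str) -> str:
        # Ignores the image; only numbers each output so the results differ.
        return f"[stub description {next(counter)}] {query}"

    return generate_one


results = describe_image(
    image=b"fake-image-bytes",
    query="What is in this image?",
    generate_one=make_stub_generator(),
)
for line in results:
    print(line)
```

Passing the generator in as a callback keeps the input checks and the four-sample loop separate from the model itself, so the stub can later be swapped for a function that wraps the pretrained checkpoint.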