Question 1

What is the input and output format for this model?

Accepted Answer

Input: an image and a text prompt (e.g., "Locate all the instances that matches the following description: personcar."). Output: a JSON list of detections with labels and bounding boxes, plus an optional annotated PNG.

Question 2

How does this model compare in speed to the official PyTorch version?

Accepted Answer

On CPU, the recommended q8_0 GGUF is ~3.9× faster than official PyTorch f32 and box-identical. On GPU (NVIDIA GB10), it is ~1.9–2.1× faster than official bf16.

Question 3

What quantization levels are available and which is recommended?

Accepted Answer

Available: f16, q8_0, q6_k, q5_k, q4_k. The q8_0 GGUF is recommended as the sweet spot: box-identical to f32, less than half the size, and ~3.9× faster than official PyTorch.

Question 4

What are the license terms for using this model?

Accepted Answer

The model weights are under NVIDIA's license (not standard open-source). The GGUF repository is MIT-licensed. You must comply with NVIDIA's license for the weights.

Question 5

How can I call this model via the gigarouter API?

Accepted Answer

Use the gigarouter OpenAI-compatible endpoint with your API key. Send a request with the image and prompt to the hosted model endpoint.

Task	Open-vocabulary detection / visual grounding
Architecture	Qwen2.5-3B (LM) + MoonViT (vision) + 2-layer MLP projector
Parameters	3B
License	NVIDIA license (weights); MIT (GGUF repo)

dtype	size	infer	vs official f32	boxes
f16	9.15 GB	13.68 s	1.7×	identical
q8_0	6.26 GB	6.07 s	3.9×	identical
q6_k	5.51 GB	5.77 s	4.1×	identical
q5_k	5.10 GB	5.11 s	4.6×	sub-pixel
q4_k	4.72 GB	4.29 s	5.5×	sub-pixel

Locate Anything 3B

specs

about this model

Performance

Quantization Policy

best for

FAQ

related object detection models