Bulbasaur
Mihaiii/Bulbasaur
published Apr 2024 · updated Apr 2024
Bulbasaur is a distilled embedding model for semantic-autocomplete, based on gte-tiny and fine-tuned on the qa-assistant dataset.
specs
| Spec | Value |
| --- | --- |
| Task | Embeddings |
| Architecture | Distilled BERT (gte-tiny base) |
| Parameters | 22.7M |
| Embedding Dimension | 384 |
| Max Tokens | 512 |
| License | Unknown |
about this model
Mihaiii/Bulbasaur is an embedding model that converts text into dense vector representations, optimized for semantic-autocomplete and related similarity search tasks. It is a distilled version of gte-tiny, fine-tuned on the qa-assistant dataset—a curated collection of 7,174 question-answer pairs (5,768 training, 1,406 test) with associated relevance scores.
The underlying gte-tiny architecture uses a BERT model with 22.7 million parameters, producing 384-dimensional embeddings via mean pooling. According to the base model’s documentation, gte-tiny achieves performance comparable to thenlper/gte-small at roughly half the model size, making Bulbasaur a compact and efficient choice for retrieval and ranking pipelines.
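As a rough illustration of the mean pooling mentioned above, the sketch below averages per-token vectors while skipping padded positions. It assumes the usual attention-mask convention (1 for real tokens, 0 for padding); the function name `mean_pool` and the toy 4-dimensional vectors are made up for the example, whereas real Bulbasaur vectors have 384 dimensions.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1, ignoring padding."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask value is required per token")
    # Keep only the vectors that belong to real (non-padding) tokens.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    # Average each dimension across the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three 4-dimensional token vectors; the last one is padding.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
```

With a hosted API this step happens on the server, so the sketch is only meant to show what "mean pooling" refers to.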
The model accepts English text only and truncates inputs longer than 512 tokens. On gigarouter it is offered as a managed, OpenAI-compatible API, so no local installation or inference code is required.
best for
- Semantic autocomplete for search bars
- Lightweight sentence embedding for English text
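One way the autocomplete use case could work is to embed the candidate suggestions ahead of time, embed the user's partial query, and rank candidates by cosine similarity. The sketch below assumes the vectors are already available; the helper names and the toy 3-dimensional vectors are hypothetical stand-ins for real 384-dimensional embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_suggestions(
    query_vec: List[float],
    candidates: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k candidate texts, most similar to the query first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for precomputed suggestion embeddings.
candidates = {
    "reset my password": [0.9, 0.1, 0.0],
    "change billing address": [0.0, 0.2, 0.9],
    "recover account access": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for the embedding of the typed text
ranked = rank_suggestions(query, candidates, top_k=2)
```

For a large suggestion list, a vector index would likely replace the linear scan, but the ranking idea stays the same.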
FAQ
Bulbasaur is designed for semantic-autocomplete, such as suggesting completions in search bars based on meaning.
Bulbasaur produces 384-dimensional embeddings and supports up to 512 tokens per input.
Bulbasaur is a distilled version of gte-tiny, which itself has 22.7M parameters and is about half the size of gte-small.
Bulbasaur supports English text only.
Use the gigarouter OpenAI-compatible endpoint with your API key to send text and receive embeddings.
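A minimal sketch of such a call, using only the Python standard library and the request and response shape of the OpenAI embeddings API (`POST /embeddings` with `model` and `input`, and a `data` list in the response). The base URL below is a placeholder, and using `Mihaiii/Bulbasaur` as the model identifier is an assumption; neither is confirmed by this page, so check the gigarouter documentation for the real values.

```python
import json
import urllib.request
from typing import List

# Placeholder values (assumptions): replace with the real gigarouter base URL
# and the model identifier the service expects.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "Mihaiii/Bulbasaur"


def build_embedding_request(
    texts: List[str],
    api_key: str,
    base_url: str = BASE_URL,
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(response_body: bytes) -> List[List[float]]:
    """Extract the vectors from an OpenAI-style response, ordered by input index."""
    body = json.loads(response_body)
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embedding_request(["how do I reset my password"], api_key="YOUR_API_KEY")

# To actually send the request once the placeholders are filled in:
# with urllib.request.urlopen(request) as response:
#     vectors = parse_embeddings(response.read())
```

An official OpenAI client library pointed at the same base URL should work equally well if the endpoint is fully compatible; the standard-library version just keeps the example dependency-free.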
Bulbasaur is currently being benchmarked and onboarded as a hosted, OpenAI-compatible API.