Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX
DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF
published May 2026 · updated Jun 2026
Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX is a vlm model that expands the Qwen 3.6 27B base to 40B parameters with Deckard/Heretic uncensored finetuning, Claude 4.6 Opus reasoning distillation, and dual-imatrix GGUF quants for high-precision, uncensored text and vision tasks.
specs
| Task | Text generation with vision (multimodal), reasoning, coding, creative writing |
| Architecture | Dense causal language model with vision encoder, 96 layers, 1275 tensors |
| Parameters | 40B (expanded from 27B) |
| License | Unspecified (uncensored, no safety alignment) |
about this model
DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF is a 40-billion-parameter dense vision-language model (VLM) built from Qwen 3.6 27B, expanded and fine-tuned for uncensored, high-reasoning tasks with variable-length thinking and 256K native context.
Architecture and training
The model expands the original 64-layer, 27B Qwen 3.6 to 96 layers and 1275 tensors (approximately 40B parameters). Training proceeds in stages: first uncensored (Heretic), then on five Deckard internal datasets for character, intelligence, depth, observation, and point of view, followed by expansion to 40B, and finally distillation on a Claude 4.6 Opus high-reasoning dataset to shorten and stabilize reasoning. Vision capabilities are preserved and require an mmproj file for image inputs.
Key strengths
- Fully uncensored with no safety alignment; no content restrictions.
- Variable-length reasoning — shorter for simple queries, deeper for complex ones.
- NEO-CODE-Di-IMatrix-MAX quants engineered for balance and precision, benchmarked against BF16 full precision: IQ2_M at 83-84%, IQ4XS at 94%, Q8_0 HIGH at 98.4%.
- Outperforms the base Qwen 3.6 27B model in 6 out of 7 benchmarks in instruct mode.
Benchmark results (instruct mode)
| Benchmark | This model (mxfp8) | Base Qwen 3.6 27B (mxfp8) |
|---|---|---|
| ARC-c | 0.651 | 0.647 |
| ARC/e | 0.816 | 0.803 |
| BoolQ | 0.908 | 0.910 |
| HellaSwag | — | 0.773 |
| OBQA | — | 0.450 |
| PIQA | — | 0.806 |
| WinoGrande | — | 0.742 |
Note: instruct mode yields stronger scores; this model exceeds the base in 6 of 7 benchmarks despite a minor regression on BoolQ.

best for
- ·Uncensored creative writing and character-driven fiction with deep narrative control
- ·Complex coding and agentic frontend workflows with long context (up to 256K tokens)
- ·Multimodal reasoning combining image input with extended, uncensored text generation
FAQ
It supports 256K tokens natively.
No, safety alignment is removed — it is fully uncensored and unfiltered.
The card suggests a minimum of Q4_K_S (non-imatrix) or IQ3_S (imatrix) or higher; for toolcalls, Q5/Q6 minimum.
Yes, it has a vision encoder and requires an mmproj file placed in the same folder as the GGUF for image processing.
Use the gigarouter OpenAI-compatible endpoint with your API key, selecting the model name as listed on the platform.
We're benchmarking and onboarding Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.