Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX

DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF

published May 2026 · updated Jun 2026

Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX is a vlm model that expands the Qwen 3.6 27B base to 40B parameters with Deckard/Heretic uncensored finetuning, Claude 4.6 Opus reasoning distillation, and dual-imatrix GGUF quants for high-precision, uncensored text and vision tasks.

status

coming soon

API providers

downloads / mo

519.4K

license

apache-2.0

specs

Task	Text generation with vision (multimodal), reasoning, coding, creative writing
Architecture	Dense causal language model with vision encoder, 96 layers, 1275 tensors
Parameters	40B (expanded from 27B)
License	Unspecified (uncensored, no safety alignment)

about this model

DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF is a 40-billion-parameter dense vision-language model (VLM) built from Qwen 3.6 27B, expanded and fine-tuned for uncensored, high-reasoning tasks with variable-length thinking and 256K native context.

Architecture and training

The model expands the original 64-layer, 27B Qwen 3.6 to 96 layers and 1275 tensors (approximately 40B parameters). Training proceeds in stages: first uncensored (Heretic), then on five Deckard internal datasets for character, intelligence, depth, observation, and point of view, followed by expansion to 40B, and finally distillation on a Claude 4.6 Opus high-reasoning dataset to shorten and stabilize reasoning. Vision capabilities are preserved and require an mmproj file for image inputs.

Key strengths

Fully uncensored with no safety alignment; no content restrictions.
Variable-length reasoning — shorter for simple queries, deeper for complex ones.
NEO-CODE-Di-IMatrix-MAX quants engineered for balance and precision, benchmarked against BF16 full precision: IQ2_M at 83-84%, IQ4XS at 94%, Q8_0 HIGH at 98.4%.
Outperforms the base Qwen 3.6 27B model in 6 out of 7 benchmarks in instruct mode.

Benchmark results (instruct mode)

Benchmark	This model (mxfp8)	Base Qwen 3.6 27B (mxfp8)
ARC-c	0.651	0.647
ARC/e	0.816	0.803
BoolQ	0.908	0.910
HellaSwag	—	0.773
OBQA	—	0.450
PIQA	—	0.806
WinoGrande	—	0.742

Note: instruct mode yields stronger scores; this model exceeds the base in 6 of 7 benchmarks despite a minor regression on BoolQ.

Model architecture diagram for Qwen3.6-40B

best for

·Uncensored creative writing and character-driven fiction with deep narrative control
·Complex coding and agentic frontend workflows with long context (up to 256K tokens)
·Multimodal reasoning combining image input with extended, uncensored text generation

FAQ

What is the context length of this model?

It supports 256K tokens natively.

Is this model censored or safety-aligned?

No, safety alignment is removed — it is fully uncensored and unfiltered.

What quant quality should I use for best results?

The card suggests a minimum of Q4_K_S (non-imatrix) or IQ3_S (imatrix) or higher; for toolcalls, Q5/Q6 minimum.

Does this model support image inputs?

Yes, it has a vision encoder and requires an mmproj file placed in the same folder as the GGUF for image processing.

How do I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, selecting the model name as listed on the platform.

not yet live

We're benchmarking and onboarding Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit