MLIR Dialect Literature 2023-2026

§1 Provenance

MLIR project: https://mlir.llvm.org/. Started by Chris Lattner and colleagues at Google (2018); open-sourced and contributed to the LLVM project in 2019.
Wikipedia MLIR page: https://en.wikipedia.org/wiki/MLIR_(software).
Triton (OpenAI): https://github.com/openai/triton. Recent paper "ML-Triton: A Multi-Level Compilation and Language Extension to GPU Programming" (2025), arXiv: https://arxiv.org/pdf/2503.14985.
IREE (Google): https://iree.dev/, https://github.com/iree-org/iree.
Mojo (Modular): https://docs.modular.com/mojo/. Vision page: https://docs.modular.com/mojo/vision/. KGEN compiler discussed at https://forum.modular.com/t/mlir-dialect-import-for-mojo/774.
Awesome list: https://github.com/coderonion/awesome-mojo-max-mlir.
Ramalho et al., "Mitigating the MLIR Learning Curve" (2024 conference paper; exact venue still to be verified).

§2 Technique / contribution

MLIR is a multi-level intermediate representation framework. Where LLVM IR is a single fixed IR, MLIR is a meta-IR with dialects: user-defined sets of operations, types, and attributes. A program traverses several dialects on its way to machine code, each lowering pass converting one dialect's ops into another's.
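As a rough illustration of the dialect-plus-lowering idea, here is a minimal Go sketch of a toy IR in which ops carry a dialect name and a lowering pass rewrites ops of one dialect into ops of another. The Op struct, the Pattern type, and the toy.fma op are hypothetical and exist only for this note; real MLIR ops also carry types, attributes, and regions, and real rewrites go through MLIR's C++ pattern infrastructure. Only the arith.mulf and arith.addf names are taken from upstream MLIR.

```go
package main

import (
	"fmt"
	"strings"
)

// Op is a toy operation: a dialect-qualified name, operand names, and one result.
// This is an illustrative stand-in, not MLIR's actual data model.
type Op struct {
	Dialect  string
	Name     string
	Operands []string
	Result   string
}

func (o Op) String() string {
	return fmt.Sprintf("%s = %s.%s %s", o.Result, o.Dialect, o.Name, strings.Join(o.Operands, ", "))
}

// Pattern rewrites one op into ops of a lower dialect.
// ok is false when the pattern does not apply to the op.
type Pattern func(op Op) (out []Op, ok bool)

// Lower applies the first matching pattern to each op; unmatched ops pass through.
func Lower(prog []Op, patterns []Pattern) []Op {
	var out []Op
	for _, op := range prog {
		rewritten := false
		for _, p := range patterns {
			if repl, ok := p(op); ok {
				out = append(out, repl...)
				rewritten = true
				break
			}
		}
		if !rewritten {
			out = append(out, op)
		}
	}
	return out
}

// toyFMAToArith lowers the hypothetical toy.fma(a, b, c) into arith.mulf + arith.addf.
func toyFMAToArith(op Op) ([]Op, bool) {
	if op.Dialect != "toy" || op.Name != "fma" || len(op.Operands) != 3 {
		return nil, false
	}
	tmp := op.Result + "_mul"
	return []Op{
		{Dialect: "arith", Name: "mulf", Operands: []string{op.Operands[0], op.Operands[1]}, Result: tmp},
		{Dialect: "arith", Name: "addf", Operands: []string{tmp, op.Operands[2]}, Result: op.Result},
	}, true
}

func main() {
	prog := []Op{{Dialect: "toy", Name: "fma", Operands: []string{"%a", "%b", "%c"}, Result: "%r"}}
	for _, op := range Lower(prog, []Pattern{toyFMAToArith}) {
		fmt.Println(op)
	}
}
```

A full pipeline would chain several such passes, each one removing one dialect from the program until only the target dialect remains.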

Triton dialect (OpenAI, 2021+)

Triton is a Python-embedded DSL for writing custom GPU kernels.
Triton-MLIR is the MLIR-based reimplementation of Triton's compiler (Microsoft contributed substantially to this).
Introduces dialects representing Triton's blocks, warps, and memory spaces. Lowers via the gpu, nvvm, and llvm standard MLIR dialects.
Now used by vLLM, Mamba, DeepSpeed, and many other ML frameworks.

IREE dialect (Google, 2020+)

Maps ML graphs (TFLite, PyTorch, ONNX) onto MLIR's linalg, affine, scf, and vector dialects.
Performs aggressive fusion, tiling, vectorization before lowering to LLVM IR or SPIR-V.
IREE adds its own flow, stream, hal dialects for inter-device scheduling.
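To make "fusion" and "tiling" concrete, here is a hand-written Go sketch of what the two transformations do to an elementwise computation. It is not IREE code and says nothing about how IREE implements these passes; the function names and the tile parameter are made up for this note. The unfused version materializes a temporary buffer between two loops; the fused and tiled version computes the same values in one pass over fixed-size chunks.

```go
package main

import "fmt"

// unfused computes out = a*b + c in two passes with a temporary buffer,
// the shape a graph of separate "mul" and "add" ops would naively produce.
// All three slices are assumed to have the same length.
func unfused(a, b, c []float32) []float32 {
	tmp := make([]float32, len(a))
	for i := range a {
		tmp[i] = a[i] * b[i]
	}
	out := make([]float32, len(a))
	for i := range a {
		out[i] = tmp[i] + c[i]
	}
	return out
}

// fusedTiled computes the same result in a single pass (fusion) and walks the
// data in chunks of size tile (tiling), so each chunk can stay cache-resident.
// A tile size below 1 is clamped to 1.
func fusedTiled(a, b, c []float32, tile int) []float32 {
	if tile < 1 {
		tile = 1
	}
	out := make([]float32, len(a))
	for start := 0; start < len(a); start += tile {
		end := start + tile
		if end > len(a) {
			end = len(a)
		}
		for i := start; i < end; i++ {
			out[i] = a[i]*b[i] + c[i]
		}
	}
	return out
}

func main() {
	a := []float32{1, 2, 3, 4, 5}
	b := []float32{2, 2, 2, 2, 2}
	c := []float32{1, 1, 1, 1, 1}
	fmt.Println(unfused(a, b, c))
	fmt.Println(fusedTiled(a, b, c, 2))
}
```

The inner loop of fusedTiled is the part a vectorizing compiler would then turn into SIMD instructions. Note that a compiler may contract a*b + c into a fused multiply-add on some targets, so results on inputs that are not exactly representable can differ in the last bit between the two versions.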

Mojo / KGEN (Modular, 2023+)

Mojo is a systems-programming language with Python syntax. The compiler (KGEN, "kernel generator") is built atop MLIR Core.
Mojo deliberately does not use the linalg/affine/scf ML dialects. It uses MLIR Core only.
Mojo code can directly express MLIR dialect operations as inline syntax: it is described as "syntactic sugar for MLIR."
Supports out-of-tree custom dialects, making Mojo a candidate as a general-purpose MLIR frontend.

Quake (NVIDIA CUDA Quantum), CIRCT (hardware design), Polygeist (C-to-MLIR)

Quake: MLIR dialect for quantum-circuit compilation.
CIRCT: MLIR for digital-hardware design (Chisel moved its backend here in 2023).
Polygeist: lifts C/C++ into the affine dialect for polyhedral analysis.

§3 Where it shines, where it fails

Shines:

Modular: a small custom dialect plus standard lowerings yields LLVM-quality codegen.
Multi-target: the same dialect can lower to CPU, GPU, TPU, or FPGA via different lowering paths.
Active community, well-staffed (LLVM Foundation, Google, NVIDIA, Apple, Modular all contribute).
The dialect pattern lets a language frontend stay small while inheriting world-class optimization.

Fails:

Steep learning curve. Ramalho et al. (2024) note: building a new dialect or pass means delving into C++ templates and TableGen.
Build complexity: a full MLIR build pulls in roughly 1 GB of LLVM dependencies.
Pure-Go integration is essentially impossible. MLIR is C++ to its core.
Stability: the linalg/tensor dialects in particular churn fast.
Heavyweight: not "naive" at all. This is the polar opposite of MEP-42's stated phase-1 goal.

§4 Status (May 2026)

MLIR is the foundation for most new ML compilers, including all of the projects surveyed in §2.
Triton-MLIR is the production Triton compiler since 2023.
IREE has shipping mobile and edge deployments.
Mojo is reported to have reached 1.0 stable in 2025 (release date still to be verified against Modular's release notes); KGEN is in active development with new dialect work each quarter.
MLIR governance was formalized in late 2024 with an area-team structure to manage the "dialect zoo."
The ML-Triton 2025 paper (arXiv 2503.14985) proposes a multi-level extension to Triton, suggesting the dialect approach is still evolving.

§5 Engineering cost for Mochi

This is the expensive end of the spectrum.

A Mochi-as-MLIR-dialect implementation would require:

3 months: define mochi dialect (operations, types, attributes). Use TableGen for declarative ops.
6 months: write lowering passes: mochi -> arith + memref + scf -> llvm.
3 months: build-system integration (Mochi compiler shells out to mlir-opt and mlir-translate).
6 months: hardening to production stability.

Total: ~18 months. Plus the build dependency on LLVM/MLIR (~1 GB of C++ source).

Compared with copy-and-patch at ~8 weeks for a working JIT, MLIR at ~18 months (~78 weeks) is 9-10x more expensive.

When MLIR makes sense for Mochi: only if Mochi grows GPU/TPU/accelerator support. For CPU-only naive native, MLIR is overkill.

§6 Mochi adaptation note

compiler3/ir/ could be lifted into an MLIR mochi dialect.
runtime/vm3/arenas.go would expose runtime hooks that the lowered LLVM IR calls.
runtime/vm3/cell.go would map to an MLIR custom type !mochi.cell.
The build would shift from pure-Go to Go+C++ (via cgo or shell-out to LLVM tools).

This conflicts with the project preference to stay pure-Go-no-cgo. Recommendation: defer MLIR to MEP-50+ when accelerator support becomes a priority.

§7 Open questions for MEP-42

Is GPU/TPU support a phase-1 goal? If no, skip MLIR.
If yes, do we adopt Triton's dialect or define our own?
Can we shell out to mlir-opt from a Go binary without cgo? (Yes, via subprocess, but the dependency is heavy.)
What is our story for Mojo interop? If Mojo becomes the lingua franca of MLIR frontends, Mochi could expose a Mojo binding.

§8 References

MLIR project page: https://mlir.llvm.org/.
MLIR users list: https://mlir.llvm.org/users/.
ML-Triton paper (2025): https://arxiv.org/pdf/2503.14985.
Mojo vision: https://docs.modular.com/mojo/vision/.
IREE project: https://iree.dev/.
Triton on GitHub: https://github.com/openai/triton.
"Deep Engineering #9: Unpacking MLIR and Mojo with Ivo Balbaert": https://deepengineering.substack.com/p/deep-engineering-9-unpacking-mlir.
Awesome Mojo / MAX / MLIR: https://github.com/coderonion/awesome-mojo-max-mlir.