# no-magic

**Repository Path**: long-train/no-magic

## Basic Information

- **Project Name**: no-magic
- **Description**: no-magic
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-12
- **Last Updated**: 2026-03-16

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

[![no-magic](./assets/banner.png)](https://github.com/Mathews-Tom/no-magic)

---

![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue?style=flat-square&logo=python&logoColor=white)
![License: MIT](https://img.shields.io/github/license/Mathews-Tom/no-magic?style=flat-square)
![Algorithms](https://img.shields.io/badge/algorithms-41-orange?style=flat-square)
![Version](https://img.shields.io/badge/version-v2.0.0-blue?style=flat-square)
![Zero Dependencies](https://img.shields.io/badge/dependencies-zero-brightgreen?style=flat-square)
![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen?style=flat-square)
![GitHub stars](https://img.shields.io/github/stars/Mathews-Tom/no-magic?style=flat-square)
![Last Commit](https://img.shields.io/github/last-commit/Mathews-Tom/no-magic?style=flat-square)

---

# no-magic

**Because `model.fit()` isn't an explanation.**

---

## What This Is

`no-magic` is a curated collection of single-file, dependency-free Python implementations of the algorithms that power modern AI. Each script is a complete, runnable program that trains a model from scratch and performs inference — no frameworks, no abstractions, no hidden complexity.

Every script in this repository is an **executable proof** that these algorithms are simpler than the industry makes them seem. The goal is not to replace PyTorch or TensorFlow — it's to make you dangerous enough to understand what they're doing underneath.

## See It In Action

### 01 — Foundations (14 scripts)

| Algorithm | Key idea |
| --- | --- |
| Autoregressive GPT | Token-by-token generation |
| RNN vs GRU | Vanishing gradients and gating |
| LSTM | 4-gate memory highway |
| BPE Tokenizer | Iterative pair merging → vocabulary |
| Word Embeddings | Contrastive learning → semantic clusters |
| RAG Pipeline | Retrieve → augment → generate |
| BERT | Bidirectional attention + `[MASK]` prediction |
| Convolutional Net | Sliding kernels → feature maps |
| ResNet | F(x) + x = gradient highway |
| Vision Transformer | Image patches as tokens |
| Diffusion | Noise → data via iterative denoising |
| VAE | Encode → sample z → decode |
| GAN | Generator vs discriminator minimax |
| Optimizers | SGD vs Momentum vs Adam convergence |

**Comparison scripts:** [attention_vs_none.py](01-foundations/attention_vs_none.py) · [rnn_vs_gru_vs_lstm.py](01-foundations/rnn_vs_gru_vs_lstm.py)
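To give a taste of how small these ideas really are: the BPE Tokenizer's core loop (count adjacent token pairs, merge the most frequent pair, repeat) fits in a few stdlib-only lines. The following is an illustrative sketch in the spirit of the repo, not the actual `microtokenizer.py`; the helper name `bpe_merges` is invented here.

```python
from collections import Counter

def bpe_merges(text, num_merges):
    """Learn BPE merges from a toy corpus string (illustrative sketch)."""
    tokens = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of tokens.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        # Replace each occurrence of the best pair with one merged token.
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_merges("abababcab", 2)
print(tokens, merges)  # → ['abab', 'ab', 'c', 'ab'] [('a', 'b'), ('ab', 'ab')]
```

Two merges are enough to see the mechanism: frequent substrings collapse into single vocabulary entries, which is all a real tokenizer does at scale.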

### 02 — Alignment & Training (10 scripts)

| Algorithm | Key idea |
| --- | --- |
| LoRA Fine-tuning | Low-rank weight injection |
| QLoRA | 4-bit base + full-precision adapters |
| DPO Alignment | Preferred vs. rejected → policy update |
| PPO (RLHF) | Clipped policy gradient for alignment |
| GRPO | Group-relative rewards, no critic |
| REINFORCE | ∇log P(a) × reward = gradient |
| Mixture of Experts | Sparse routing to specialist MLPs |
| Batch Normalization | Normalize activations → stable training |
| Dropout | Kill neurons → prevent overfitting |

**Comparison scripts:** [adam_vs_sgd.py](02-alignment/adam_vs_sgd.py)
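The LoRA entry above boils down to one equation: y = Wx + (α/r)·B·A·x, where the pretrained weight W stays frozen and only the small matrices A and B train. Here is a stdlib-only sketch of that forward pass with toy shapes; the function names and values are invented for illustration, and this is not the repository's `microlora.py`.

```python
def matvec(M, v):
    """Matrix-vector product over plain Python lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """y = Wx + (alpha/r) * B(Ax): frozen base plus a trainable low-rank path."""
    base = matvec(W, x)                 # frozen pretrained path
    delta = matvec(B, matvec(A, x))     # down-project with A, up-project with B
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy shapes: W is 2x2 (frozen), A is 1x2, B is 2x1, so the update has rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]  # pretrained weight (identity here for clarity)
A = [[1.0, 1.0]]              # down-projection (initialized small in practice)
B = [[0.5], [0.5]]            # up-projection (initialized to zero in practice)
y = lora_forward(W, A, B, [2.0, 3.0], alpha=1.0, r=1)
print(y)  # → [4.5, 5.5]
```

The point of the shapes: for a d×d weight, LoRA trains only 2·d·r parameters instead of d², which is where the efficiency comes from.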

### 03 — Systems & Inference (13 scripts)

| Algorithm | Key idea |
| --- | --- |
| Attention Mechanism | Q·Kᵀ → softmax → weighted V |
| Flash Attention | Tiled O(N) memory computation |
| RoPE | Position via rotation matrices |
| KV-Cache | Memoize keys/values — stop recomputing |
| PagedAttention | OS-style paged KV-cache memory |
| Quantization | Float32 → Int8 = 4x compression |
| Beam Search | Tree search with top-k pruning |
| Checkpointing | O(n) → O(√n) memory via recompute |
| Model Parallelism | Tensor + pipeline across devices |
| State Space Models | Linear-time selective state transitions |
| Vector Search | Exact vs LSH approximate search |
| BM25 | TF → TF-IDF → BM25 evolution |
| Speculative Decoding | Draft fast, verify once |
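The Attention Mechanism entry is the whole idea in one line: scores = Q·Kᵀ/√d, weights = softmax(scores), output = weights·V. A stdlib-only sketch of scaled dot-product attention (illustrative, not the repository's `microattention.py`):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q·Kᵀ/√d)·V, pure stdlib."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by √d.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the weight-averaged value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                      # one query, aligned with the first key
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
print(out)  # first value dominates, since the query matches the first key
```

Every systems trick in this tier (KV-cache, Flash Attention, PagedAttention) is an optimization of exactly this computation.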

### 04 — Agents & Planning (2 scripts)

| Algorithm | Key idea |
| --- | --- |
| Monte Carlo Tree Search | UCB1 tree search + random rollouts |
| ReAct Agent | Thought → Action → Observation |
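MCTS selection hinges on the UCB1 score named above: a node's average reward plus an exploration bonus that shrinks as the node gets visited. A minimal sketch of that scoring rule (the function name and the numbers are invented for illustration; this is not the repository's MCTS script):

```python
import math

def ucb1(total_reward, visits, parent_visits, c=1.41):
    """UCB1 score used in MCTS selection: exploitation + exploration terms."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

# Child A: well explored with a decent average; child B: barely explored.
a = ucb1(total_reward=6.0, visits=10, parent_visits=20)
b = ucb1(total_reward=1.0, visits=2, parent_visits=20)
print(a, b)  # B scores higher: the exploration bonus favors rarely tried moves
```

At every step MCTS descends the tree by picking the child with the highest UCB1 score, which is how it balances exploiting known-good moves against exploring uncertain ones.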
> All algorithms have animated visualizations. Full 1080p60 videos in [Releases](https://github.com/Mathews-Tom/no-magic/releases).
> Video source scenes in [`videos/scenes/`](videos/scenes/) — built with [Manim](https://www.manim.community/).

### Rendering Videos Locally

All visualizations can be rendered from source. System dependencies: `cairo`, `pango`, `ffmpeg`, and optionally `gifsicle` for GIF optimization.

```bash
# Install Manim (one-time)
pip install -r videos/requirements.txt

# macOS system deps (one-time)
brew install cairo pango ffmpeg gifsicle

# Ubuntu/Debian system deps (one-time)
sudo apt-get install -y libcairo2-dev libpango1.0-dev ffmpeg gifsicle
```

**Using the Python renderer** (`render_all.py`):

```bash
# Render all scenes — full 1080p60 MP4 + 480p GIF previews
python videos/render_all.py

# Render specific scenes only
python videos/render_all.py microattention microgpt microlora

# Full MP4s only (no GIFs)
python videos/render_all.py --full-only

# GIF previews only (faster)
python videos/render_all.py --preview-only

# Custom quality (low/medium/high/4k)
python videos/render_all.py --quality medium

# Skip GIF optimization step
python videos/render_all.py --preview-only --skip-optimize
```

**Using the shell renderer** (`render.sh`):

```bash
bash videos/render.sh                   # all scenes (MP4 + GIF)
bash videos/render.sh microattention    # single scene
bash videos/render.sh --preview-only    # GIF previews only
bash videos/render.sh --full-only       # MP4s only
```

Output lands in `videos/renders/` (MP4) and `videos/previews/` (GIF). Full rendering details in [`videos/README.md`](videos/README.md).

## Philosophy

Modern ML education has a gap. There are thousands of tutorials that teach you to call library functions, and there are academic papers full of notation. What's missing is the middle layer: **the algorithm itself, expressed as readable code**.
This project follows a strict set of constraints:

- **One file, one algorithm.** Every script is completely self-contained. No imports from local modules, no `utils.py`, no shared libraries.
- **Zero external dependencies.** Only Python's standard library. If it needs `pip install`, it doesn't belong here.
- **Train and infer.** Every script includes both the learning loop and generation/prediction. You see the full lifecycle.
- **Runs in minutes on a CPU.** No GPU required. No cloud credits. Every script completes on a laptop in reasonable time.
- **Comments are mandatory, not decorative.** Every script must be readable as a guided walkthrough of the algorithm.

We are not optimizing for line count — we are optimizing for understanding. See `CONTRIBUTING.md` for the full commenting standard.

## Who This Is For

- **ML engineers** who use frameworks daily but want to understand the internals they rely on.
- **Students** transitioning from theory to practice who want to see algorithms as working code, not just equations.
- **Career switchers** entering ML who need intuition for what's actually happening when they call high-level APIs.
- **Researchers** who want minimal reference implementations to prototype ideas without framework overhead.
- **Anyone** who has ever stared at a library call and thought: _"but what is it actually doing?"_

This is not a beginner's introduction to programming. You should be comfortable reading Python and have at least a surface-level familiarity with ML concepts. The scripts will give you the depth.

## What You'll Find Here

The repository is organized into four tiers based on conceptual dependency:

### 01 — Foundations (14 scripts)

Core algorithms that form the building blocks of modern AI systems. GPT, RNN, LSTM, BERT, CNN, ResNet, ViT, GAN, VAE, diffusion, embeddings, tokenization, RAG, and optimizer comparison. Includes comparison scripts for attention mechanisms and recurrent architectures.
See [`01-foundations/README.md`](01-foundations/README.md) for the full algorithm list, timing data, and roadmap.

### 02 — Alignment & Training Techniques (10 scripts)

Methods for steering, fine-tuning, and aligning models after pretraining. LoRA, QLoRA, DPO, PPO, GRPO, REINFORCE, MoE, batch normalization, dropout/regularization, and optimizer comparison.

See [`02-alignment/README.md`](02-alignment/README.md) for the full algorithm list, timing data, and roadmap.

### 03 — Systems & Inference (13 scripts)

The engineering that makes models fast, small, and deployable. Attention variants, Flash Attention, KV-cache, PagedAttention, RoPE, quantization, beam search, checkpointing, parallelism, SSMs, vector search, BM25, and speculative decoding.

See [`03-systems/README.md`](03-systems/README.md) for the full algorithm list, timing data, and roadmap.

### 04 — Agents & Planning (2 scripts)

Autonomous reasoning and decision-making. Monte Carlo Tree Search for strategic planning and ReAct agents for tool-augmented reasoning loops.

See [`04-agents/README.md`](04-agents/README.md) for the full algorithm list, timing data, and roadmap.

## How to Use This Repo

```bash
# Clone the repository
git clone https://github.com/Mathews-Tom/no-magic.git
cd no-magic

# Pick any script and run it
python 01-foundations/microgpt.py
```

That's it. No virtual environments, no dependency installation, no configuration. Each script will download any small datasets it needs on first run.
### Minimum Requirements

- Python 3.10+
- 8 GB RAM
- Any modern CPU (2019-era or newer)

### Quick Start Path

If you're working through the scripts systematically, this subset builds core concepts incrementally:

```text
microtokenizer.py   → How text becomes numbers
microembedding.py   → How meaning becomes geometry
microgpt.py         → How sequences become predictions
micrornn.py         → How recurrence models sequences
microlstm.py        → How gated memory solves vanishing gradients
microbert.py        → How bidirectional context differs from autoregressive
microconv.py        → How spatial filters extract features
microvit.py         → How transformers see images
microbatchnorm.py   → How normalizing activations stabilizes training
microlora.py        → How fine-tuning works efficiently
microdpo.py         → How preference alignment works
microattention.py   → How attention actually works (all variants)
microrope.py        → How position gets encoded through rotation
microquant.py       → How models get compressed
microflash.py       → How attention gets fast
microssm.py         → How Mamba models bypass attention entirely
microreact.py       → How agents reason with tools
```

Each tier's README has the full algorithm list with measured run times for that category.

## Learning Resources

### Challenges

"Predict the behavior" exercises that test your understanding of the algorithms. 5 challenges covering attention, GPT, GAN, DPO, and optimizer edge cases. Each challenge presents a code snippet and asks you to reason about the output before running it.

See [`challenges/README.md`](challenges/README.md) for the full challenge set.

### Flashcards

Anki-compatible flashcard decks for spaced repetition review. 147 cards across 3 tiers (foundations, alignment, systems), covering key concepts, equations, and design decisions from every script.

```bash
# Generate the Anki deck
python resources/flashcards/generate_anki.py
```

See [`resources/flashcards/`](resources/flashcards/) for the raw card data and generation script.
### Learning Path

Structured tracks for different goals — 6 learning tracks ranging from weekend sprints to a full 20-hour curriculum. Each track orders scripts by conceptual dependency and includes time estimates, prerequisites, and milestone markers.

See [`LEARNING_PATH.md`](LEARNING_PATH.md) for the full guide.

## Translations

Comment translations for 6 languages: Spanish, Portuguese, Chinese, Japanese, Korean, and Hindi. The code stays in English — only comments, docstrings, section headers, and print statements are translated.

See [`TRANSLATIONS.md`](TRANSLATIONS.md) for full status and contributor guide. Want to help translate? See the [translation guide](translations/README.md).

## Dependency Graph

How the algorithms connect conceptually. Arrows mean "understanding A helps with B" — not code imports (every script is fully self-contained).

```mermaid
graph LR
    %% --- Style definitions ---
    classDef foundations fill:#4a90d9,stroke:#2c5f8a,color:#fff
    classDef alignment fill:#e8834a,stroke:#b35f2e,color:#fff
    classDef systems fill:#5bb55b,stroke:#3a823a,color:#fff

    %% === 01-FOUNDATIONS ===
    subgraph F["01 — Foundations"]
        TOK["Tokenizer"]
        EMB["Embedding"]
        OPT["Optimizer"]
        RNN["RNN / GRU"]
        CONV["Conv Net"]
        GPT["GPT"]
        BERT["BERT"]
        RAG["RAG"]
        DIFF["Diffusion"]
        VAE["VAE"]
        GAN["GAN"]
    end

    %% === 02-ALIGNMENT ===
    subgraph A["02 — Alignment"]
        BN["BatchNorm"]
        DROP["Dropout"]
        LORA["LoRA"]
        QLORA["QLoRA"]
        DPO["DPO"]
        REINF["REINFORCE"]
        PPO["PPO"]
        GRPO["GRPO"]
        MOE["MoE"]
    end

    %% === 03-SYSTEMS ===
    subgraph S["03 — Systems"]
        ATTN["Attention"]
        FLASH["Flash Attn"]
        ROPE["RoPE"]
        KV["KV-Cache"]
        PAGED["PagedAttn"]
        QUANT["Quantization"]
        BEAM["Beam Search"]
        CKPT["Checkpointing"]
        PAR["Parallelism"]
        SSM["SSM / Mamba"]
    end

    %% --- Foundation internals ---
    TOK --> GPT
    EMB --> RAG
    RNN --> GPT
    OPT --> GPT
    GPT --> BERT
    DIFF -.-> VAE
    DIFF -.-> GAN

    %% --- Foundations → Alignment ---
    GPT --> LORA
    GPT --> DPO
    GPT --> PPO
    GPT --> MOE
    GPT --> GRPO
    LORA --> QLORA
    REINF --> PPO
    REINF --> GRPO
    OPT --> BN
    OPT --> DROP

    %% --- Foundations → Systems ---
    GPT --> ATTN
    GPT --> KV
    GPT --> QUANT
    GPT --> BEAM
    GPT --> SSM
    RNN --> SSM
    ATTN --> FLASH
    ATTN --> ROPE
    KV --> PAGED

    %% --- Cross-tier into QLoRA ---
    QUANT --> QLORA

    %% --- Apply styles ---
    class TOK,EMB,OPT,RNN,CONV,GPT,BERT,RAG,DIFF,VAE,GAN foundations
    class BN,DROP,LORA,QLORA,DPO,REINF,PPO,GRPO,MOE alignment
    class ATTN,FLASH,ROPE,KV,PAGED,QUANT,BEAM,CKPT,PAR,SSM systems
```

**Legend:** Foundations · Alignment · Systems — Solid arrows = strong prerequisite, dashed arrows = conceptual comparison.

## Inspiration & Attribution

This project is directly inspired by [Andrej Karpathy's](https://github.com/karpathy) extraordinary work on minimal implementations — particularly [micrograd](https://github.com/karpathy/micrograd), [makemore](https://github.com/karpathy/makemore), and the `microgpt.py` script that demonstrated the entire GPT algorithm in a single dependency-free Python file. Karpathy proved that there's enormous demand for "the algorithm, naked." `no-magic` extends that philosophy across the full landscape of modern AI/ML.

## How This Was Built

In the spirit of transparency: the code in this repository was co-authored with Claude (Anthropic). I designed the project — which algorithms to include, the four-tier structure, the constraint system, the learning path, and how each script should be organized — then directed the implementations and verified that every script trains and infers correctly end-to-end on CPU.

I'm not claiming to have hand-typed every algorithm from scratch. The value of this project is in the curation, the architectural decisions, and the fact that every script works as a self-contained, runnable learning resource. The line-by-line code generation was collaborative. This is how I build in 2026. I'd rather be upfront about it.

## Contributing

Contributions are welcome, but the constraints are non-negotiable. See `CONTRIBUTING.md` for the full guidelines. The short version:

- One file. Zero dependencies. Trains and infers.
- If your PR adds a `requirements.txt`, it will be closed.
- Quality over quantity. Each script should be the **best possible** minimal implementation of its algorithm.

## License

MIT — use these however you want. Learn from them, teach with them, build on them.

---

_The constraint is the product. Everything else is just efficiency._

_v2.0.0 — March 2026_