Implement a reasoning LLM in PyTorch from scratch, step by step
Updated Jun 12, 2026 · Jupyter Notebook
(ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
Solving Inequality Proofs with Large Language Models.
An official implementation of "SPARK: Synergistic Policy And Reward Co-Evolving Framework"
Prompt-engineering study on LLM math reasoning (GSM8K) and code generation (HumanEval): zero/few-shot, self-consistency, self-verification, and experiments on prompt quality, complexity, demonstrations, and diversity.
ArmLLM 2025 solutions covering ViT from scratch, SigLIP–Qwen LaTeX OCR, GRPO reasoning post-training, inference-time reasoning strategies, and adversarial vision attacks.
AI Benchmark knowledge base: a comprehensive collection of the benchmark question sets that major AI companies use to test model performance.
A minimal JEPA-based language model demonstrating latent-space reasoning on GSM8K using a single decoder-only Transformer.
STaR × S1 math pipeline on Qwen2.5-1.5B with LoRA and a strict "Final:" answer format; ~20–30% accuracy (OpenR1-Math split).
Data cleaning and structuring pipeline for math reasoning tasks using Qwen3-0.6B for LLM post-training.
A controlled LoRA finetuning study on process supervision for mathematical reasoning with Qwen2.5-Math-7B-Instruct.
Paired-framing pilot on observable self-doubt in math reasoning models
Beyond English-Only GRPO: a multi-seed controlled study of training-language and auxiliary-reward effects in sub-3B math reasoning (GRPO + LoRA, single GPU).
GRPO (Group Relative Policy Optimization) implemented from scratch in PyTorch, with 10 ablation experiments.
Verifier-backed math reasoning lab for proof flaw localization, minimal repair, and RL-style evaluation.
NLP course final project (2026), Nanjing Normal University, supervised by Kong Li (孔力): GSM8K math QA with Seq2Seq, Transformer, and LLM approaches.
MSc capstone: mechanistic interpretability of Chain-of-Thought reasoning in LLMs via SAE features and logprob signals
Comprehensive framework for mathematical reasoning research with dual research capabilities
Small-scale Implementation and Extension of “The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning” (NeurIPS '25)
Unofficial PyTorch reproduction of LIMO: Less is More for Reasoning.