Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper presentation for CSE 5469
state space models, selective SSMs, linear-time sequence modeling, hardware-aware parallel scan
Venue: CSE 5469, Ohio State University
Duration: 25 minutes
Slides: Live deck · Source
Paper: Gu, A., & Dao, T. (2024). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. COLM 2024.
Abstract
Gu & Dao introduce Mamba, a selective state space model (SSM) that matches or exceeds Transformer quality on language tasks while scaling linearly rather than quadratically in sequence length. The key insight is input-dependent state transitions: unlike prior SSMs (S4, H3), whose parameters are constant across the sequence (time-invariant), Mamba computes the SSM parameters as functions of the current token, giving the model content-aware memory selection analogous to attention, but without attention's quadratic query-key comparisons.
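To make the selectivity idea concrete, here is a minimal sequential sketch of a selective SSM recurrence, not the authors' implementation: the weights `W_delta`, `W_B`, `W_C`, the diagonal state matrix `A`, and the simplified per-channel shapes are assumptions for exposition. The released Mamba code uses learned linear projections and a fused kernel, but the per-token parameterization is the same idea.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_ssm(x, A, W_delta, W_B, W_C):
    """Sequential reference for a selective SSM layer (simplified, hypothetical shapes).

    x       : (L, D)  input sequence: L tokens, D channels
    A       : (D, N)  fixed diagonal state matrix (negative entries), per channel
    W_delta : (D, D)  token -> per-channel step size Delta_t
    W_B     : (N, D)  token -> input projection B_t
    W_C     : (N, D)  token -> output projection C_t
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))            # one N-dimensional state per channel
    y = np.zeros((L, D))
    for t in range(L):
        # Selectivity: Delta, B, C are functions of the current token x[t],
        # whereas in S4/H3 they are constant across the whole sequence.
        delta = softplus(W_delta @ x[t])     # (D,)
        B_t = W_B @ x[t]                     # (N,)
        C_t = W_C @ x[t]                     # (N,)
        # Zero-order-hold discretization: A_bar = exp(Delta * A)
        A_bar = np.exp(delta[:, None] * A)           # (D, N)
        B_bar = delta[:, None] * B_t[None, :]        # simplified Euler step for B
        # Linear recurrence on the state, then a per-channel readout
        h = A_bar * h + B_bar * x[t][:, None]
        y[t] = h @ C_t                               # (D,)
    return y

# toy usage with random (hypothetical) weights
L, D, N = 16, 4, 8
rng = np.random.default_rng(0)
out = selective_ssm(rng.standard_normal((L, D)),
                    -np.exp(rng.standard_normal((D, N))),   # negative diagonal A
                    rng.standard_normal((D, D)),
                    rng.standard_normal((N, D)),
                    rng.standard_normal((N, D)))
```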
This presentation covers the structured state space foundation, the selectivity mechanism and why it breaks the convolutional form that makes prior SSMs fast, the hardware-aware parallel scan that recovers efficiency despite that loss, and a critical read of the empirical claims: where linear-time scaling genuinely matters, where Transformers remain competitive, and what the architecture leaves open for follow-up work (Mamba-2, hybrid models).
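Why the scan matters: once Delta_t, B_t, C_t vary per token, the layer can no longer be computed as one global convolution (the trick that makes S4 fast), but the recurrence h_t = A_bar_t * h_{t-1} + B_bar_t * x_t is still associative over pairs (A_bar_t, B_bar_t * x_t), so all L states can be computed with a parallel prefix scan. Below is a toy illustration of that associativity, not the paper's fused, memory-aware kernel; the Hillis-Steele doubling formulation and the diagonal-coefficient shapes are assumptions chosen for clarity, not efficiency.

```python
import numpy as np

def scan_linear_recurrence(A, b):
    """All states of h_t = A_t * h_{t-1} + b_t (diagonal A_t, h_0 = 0).

    A, b : arrays of shape (L, ...) holding the per-step coefficients
           (e.g. A_bar and B_bar * x_t from the selective SSM sketch above).

    Uses the associative combine (A1, b1) o (A2, b2) = (A2*A1, A2*b1 + b2)
    with a Hillis-Steele doubling scan: O(log L) sequential rounds, each a
    fully parallel elementwise update.
    """
    A, b = A.copy(), b.copy()
    L, step = A.shape[0], 1
    while step < L:
        # Combine every element with the element `step` positions earlier.
        # The right-hand side tuple is evaluated before assignment, so the
        # old values of A are used consistently for both updates.
        A[step:], b[step:] = A[step:] * A[:-step], A[step:] * b[:-step] + b[step:]
        step *= 2
    return b   # b[t] now equals h_t

# sanity check against the naive sequential recurrence
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 0.99, size=(32, 4))
b = rng.standard_normal((32, 4))
h, ref = np.zeros(4), []
for t in range(32):
    h = A[t] * h + b[t]
    ref.append(h)
assert np.allclose(scan_linear_recurrence(A, b), np.array(ref))
```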
References
- Gu & Dao (2024). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. COLM.
- Gu et al. (2022). Efficiently Modeling Long Sequences with Structured State Spaces. ICLR.
- Dao & Gu (2024). Transformers are SSMs: Generalized Models and Efficient Algorithms through Structured State Space Duality. ICML.