Overview
llama2.c is a minimalist, educational "fullstack" project for training and running inference on Llama 2 LLMs. The core component is `run.c`, a single ~700-line C file implementing an inference engine for the Llama 2 architecture with no external dependencies beyond the C standard library. The repository also provides PyTorch code for training models from scratch. The focus is on simplicity and transparency, making it well suited for learning the inner workings of LLMs and for deploying small, specialized models on resource-constrained devices. Larger checkpoints (up to Llama 2 7B) can be loaded, but because inference currently runs in plain float32, performance is only practical for small models.
Common tasks
