Overview
llama2.c is a minimalist, educational "fullstack" project for training and running inference on Llama 2 LLMs. The core component is `run.c`, a single ~700-line C file implementing an inference engine for the Llama 2 architecture with no external dependencies beyond the C standard library. The repository also provides PyTorch code for training models from scratch. The focus is on simplicity and transparency, making it well suited for learning the inner workings of LLMs and for deploying small, specialized models on resource-constrained devices. Larger checkpoints (up to Llama 2 7B) can be loaded, but because inference currently runs in plain float32, performance is only practical for small models.
Common tasks
