Nepali GPT — GPT-2 Style Language Model
- Built a GPT-2 style Nepali language model in PyTorch with multi-head self-attention, causal masking, a SentencePiece tokenizer, and autoregressive generation
- Trained on 6.4 million rows of Nepali text on an RTX 3050 (4 GB), reaching a perplexity of ~60 with stable convergence in ~4 hours
- Used float16 mixed precision and gradient accumulation to fit within GPU memory, with cosine decay during training and top-k sampling for generation
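
A minimal sketch, not the project's actual code, of three pieces named in the bullets: the perplexity metric, a cosine decay schedule, and top-k sampling. It is written in plain Python for readability, although the project itself uses PyTorch. All function names, parameters, and example values are illustrative assumptions.

```python
import math
import random
from typing import Sequence


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy_nats)


def cosine_decay(step: int, total_steps: int, peak: float, floor: float = 0.0) -> float:
    """Cosine schedule: `peak` at step 0, falling smoothly to `floor` at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


def top_k_sample(logits: Sequence[float], k: int, rng: random.Random) -> int:
    """Sample a token id, restricted to the k highest-scoring logits."""
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax numerators over the kept logits (shifted by the max for stability).
    best = logits[top[0]]
    weights = [math.exp(logits[i] - best) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

Under this definition, a perplexity of about 60 corresponds to a mean cross-entropy of roughly ln 60 ≈ 4.09 nats per token. The value is measured per tokenizer token, so it is not directly comparable across models that use different tokenizers.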