「 ML/NLP 」
February 22, 2025
Words count: 3.7k · Reading time: 3 mins.
Trying to Understand MoE: How LLMs Get Both Bigger and Smarter
Every few months, a new paper or model release sends a shockwave through the NLP community. Recently, it was all about models with “trillions of parameters.” My first reaction was, “How is that even possible?” The computational cost to run a dense model of that size would be astronomical. The answer, as I learned after a deep dive with my reading group, lies in a clever architecture called Mixture of Experts (MoE).
The “Committee o...
「 ML/NLP 」
December 20, 2024
Words count: 4.3k · Reading time: 4 mins.
How I Shrank My LLM: A Student’s Dive into Model Quantization
One of the most exciting and frustrating moments in my Master’s program was when I finally got my hands on a powerful, pre-trained language model. The excitement came from its incredible capabilities; the frustration came when I realized it was too big to run on my university-provided GPU for any serious fine-tuning. This sent me down the rabbit hole of model compression, and my first major stop was quantization.
What Exactly is Qua...
「 ML/NLP 」
October 25, 2024
Words count: 4.7k · Reading time: 4 mins.
On Data Contamination in LLMs: A Grad Student’s Perspective
In my NLP seminar last semester, a recurring theme was the integrity of our evaluation benchmarks. We spent weeks discussing how to measure progress, but one topic that really stuck with me was data contamination: the subtle, almost accidental way we can end up “cheating” on our tests. It’s a problem that seems technical on the surface but cuts to the very core of our field’s credibility.
The Core of the Problem: When Test Data Becomes...