「 ML/NLP 」
September 30, 2024
Word count
4.6k
Reading time
4 mins.
Making Sense of Multimodality: How Models See and Read
For the longest time in my NLP studies, the world was made of text. Then came the rise of multimodal AI, and suddenly models could see, hear, and read all at once. For my seminar on advanced models, I had to do a deep dive into how exactly you get a model to understand both an image and a sentence at the same time. It turns out there are a few competing philosophies, each with its own flavor.
The First Big Question: When Do We Mix the Ingr...