Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails

“The devil is in the details,” they say. And so is the beauty, the thinking, the “but …”.

Maybe that’s why the phrase “elevator pitch” gives me a shiver.

It might have started back at AMD, when I was a young, aspiring engineer, joining every “Women in This or That” club I could find. I was searching for the feminist ideas I’d first found among women’s rights activists in Iran — hoping to see them alive in “lean in”-era corporate America. Naive, I know.

Later, as I ventured through academic papers and policy reports, I discovered the world of Executive Summaries and Abstracts. I wrote many, and read many, and I always knew that if I wanted to actually learn, digest, challenge, and build on a paper, I needed to go to the methodology section, to limitations, footnotes, appendices. That, I felt, was how I should train my mind to do original work.

Interviewing is also a big part of my job at Taraaz, researching social and human rights impacts of digital technologies including AI. Sometimes, from an hour of conversation, the most important finding is just one sentence. Or it’s the silence between sentences: a pause, then a longer pause. That’s sometimes what I want from an interview — not a perfectly written summary of “Speaker A” and “Speaker B” with listed main themes. If I wanted those, I would run a questionnaire, not an interview.

I’m not writing to dismiss AI-generated summarization tools. I know there are many benefits. But if your job as a researcher is to bring critical thinking, subjective understanding, and a novel approach to your research, don’t rely on them.

And here’s another reason why: Last year at Mozilla Foundation, I had the opportunity to go deep on evaluating large language models. I built multilingual AI evaluation tools and ran experiments. But summarization kept nagging at me. It felt like a blind spot in the AI evaluation world.

Let me show you an example from the tool I made last year.

Project 1: Bilingual Shadow Reasoning

... continue reading