New method lets DeepSeek and other models answer ‘sensitive’ questions

Published on: 2025-04-22 02:13:26

It is tough to remove bias, and in some cases outright censorship, from large language models (LLMs). One such model, DeepSeek from China, has alarmed politicians and some business leaders over its potential danger to national security. A select committee of the United States Congress recently released a report calling DeepSeek “a profound threat to our nation’s security” and detailing policy recommendations.

While there are ways to work around bias through Reinforcement Learning from Human Feedback (RLHF) and fine-tuning, the enterprise risk management startup CTGT says it has another way. CTGT has developed a method that it says removes 100% of the bias and censorship baked into some language models. In a paper, Cyril Gorlla and Trevor Tuttle of CTGT said that their framework “directly locates and modifies the internal features responsible for censorship.”
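The paper's details aren't reproduced here, but as a rough illustration of what "locating and modifying internal features" can look like in practice, below is a minimal sketch of one common feature-ablation technique: projecting a hypothesized "refusal" direction out of a layer's activations with a forward hook. This is not CTGT's actual method; `refusal_direction`, the toy `block`, and all dimensions are assumptions made for illustration.

```python
# Hypothetical sketch of feature-level intervention, NOT CTGT's code.
# Assumes a direction in hidden space has already been identified
# (e.g., by probing contrastive prompts) as correlating with refusals.
import torch
import torch.nn as nn

hidden_dim = 64

# Toy stand-in for one transformer block's output projection.
block = nn.Linear(hidden_dim, hidden_dim)

# Assumed: a unit vector representing the "censorship/refusal" feature.
refusal_direction = torch.randn(hidden_dim)
refusal_direction = refusal_direction / refusal_direction.norm()

def ablate_direction(module, inputs, output):
    # Remove the component of the activation along the refusal direction:
    # h' = h - (h . d) d, leaving all orthogonal features untouched.
    coeff = output @ refusal_direction              # per-example projection
    return output - coeff.unsqueeze(-1) * refusal_direction

handle = block.register_forward_hook(ablate_direction)

h = torch.randn(2, hidden_dim)                      # fake hidden states
out = block(h)
# The refusal component is now (numerically) zero in the block's output.
print((out @ refusal_direction).abs().max())        # ~0
```

Because the hook only subtracts one component of the activation, everything orthogonal to the assumed direction passes through unchanged, which is why this family of techniques can alter a specific behavior without retraining the model.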