Introduction
Large language models (LLMs) are artificial intelligence (AI) systems that use natural language processing (NLP) to process and generate text for translation, information retrieval, conversational interaction, and summarisation. Countering violent extremism (CVE) efforts are also making use of LLMs to automatically generate counter-speech (interventions that directly address, or offer alternatives to, extremist content), with the aim of dissuading individuals from extremist narratives. Scholars have argued that LLMs may assist practitioners in identifying hate speech, creating targeted counter-speech content, and engaging with users to deliver counter-speech. This article outlines the potential of LLMs as a vehicle for counter-speech, whilst highlighting the challenges that must be overcome.
Potential Benefits
Using LLMs to automatically generate counter-speech could help meet the increasing demand for counter-speech online: LLMs can supplement the design and dissemination of counter-speech and may scale more easily than solely human-operated interventions. When identifying where counter-speech is needed, LLMs could automatically detect hate speech. Counter-speakers have also shown interest in using AI to gather relevant information to assist them in designing counter-speech content. Using LLMs to help counter-speakers identify hate speech and construct counter-speech responses could therefore be a valuable time-saving strategy.
It is important to reduce the strain on counter-speakers, who can become victims of hate speech themselves. Since counter-speech requires considerable time and effort and often involves engaging with upsetting content, some counter-speakers have reported feeling overwhelmed and experiencing negative impacts on their mental health. As such, counter-speakers have suggested that using AI for counter-speech campaigns could help protect their well-being, as it can offer them anonymity and reduce the amount of harmful content that they must identify and review, helping them to remain more emotionally detached.
Potential Challenges
Whilst using LLMs for counter-speech can reduce the physical and mental load on counter-speakers, there are important practical limitations and ethical challenges that should be considered.
There are questions surrounding the functionality of LLMs and their ability to produce relevant and credible counter-speech for a variety of audiences. Generating counter-speech requires large amounts of data, language comprehension, emotional intelligence, and contextual knowledge, which can be challenging to achieve with machine learning alone. As such, some LLM-generated counter-speech has been found to include factual or grammatical errors, which may limit the effectiveness of the counter-speech within a CVE campaign.
Counter-speakers have expressed concerns that a lack of human involvement during the design and dissemination of counter-speech may undermine its authenticity and credibility. For instance, disclosing that counter-speech was written by AI has been found to significantly reduce users' perceived trust in the counter-speech. Alternatively, if the use of AI is not disclosed, transparency concerns arise from users being unaware that LLMs are involved. It is therefore important that LLM-generated counter-speech is manually reviewed to ensure it is authentic, accurate, and appropriate for the target audience.
Using fully automated design and delivery methods for counter-speech raises a number of ethical considerations, including user privacy and bias. Privacy is a key concern for LLMs, which can gather highly personal information about individuals. For example, when conversing with a chatbot, individuals may not realise that their conversation can be read by real people. When designing counter-speech campaigns that use LLMs, user privacy should be carefully considered and protected as far as possible.
Biases can be built into LLMs by their developers, from the datasets used to train them, or during real-world implementation. Biases can result in harmful and inaccurate information being shared, which can encourage discrimination. For example, a translation LLM on Facebook incorrectly translated “good morning” written in Arabic as “hurt them” and “attack them”, resulting in a Palestinian individual being arrested by Israeli authorities. Diverse groups of individuals need to be involved in the development and deployment of LLMs to ensure that biases are mitigated and that LLMs do not generate harmful or incorrect content.
The Effectiveness of LLM-Generated Counter-Speech
There is limited empirical research that assesses the effectiveness of LLM-generated counter-speech as part of a CVE campaign. Most studies focus on the ability of LLMs to generate counter-speech, leaving measures of effectiveness mostly restricted to evaluations of the counter-speech’s written quality. There are, however, a smaller number of studies that do assess the practical effectiveness and impact of LLM-generated counter-speech.
Chung et al. (2021) tested an NLP tool designed to assist NGO practitioners in countering Islamophobia on Twitter in English, French, and Italian. The tool detected Islamophobia and then automatically composed counter-speech responses. The practitioners who tested the tool gave positive feedback and felt it was an innovative aid to counter-speech writing. Importantly, some practitioners emphasised that the tool should not entirely replace manual writing but should instead assist them, as modifications to the generated counter-speech were often necessary. The tool was still considered useful, as writing counter-speech from scratch reportedly took longer than modifying the counter-speech that the tool generated.
Bilewicz et al. (2021) assessed whether counter-speech messages that were generated and delivered by a bot (disguised as a real male user) could be effective at reducing verbal aggression within two subreddits. When verbal aggression was detected, the bot delivered counter-speech by directly replying to the user who posted the aggressive content. User comments posted 60 days before and after the intervention were analysed, and a control group of users from other subreddits, who were not targeted by the bot, was used for comparison. Users displayed a lower proportion of verbal aggression after the intervention than before, whereas the proportion of verbal aggression in the control group remained largely the same throughout. This study suggests that bots could be used as part of counter-speech interventions.
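The intervention pattern in this study — detect verbal aggression, then reply directly to the author — can be illustrated with a short sketch. Everything here is hypothetical: the cue list stands in for a real aggression classifier, the reply text is invented, and the once-per-author deduplication is an assumed design choice rather than a detail reported by Bilewicz et al.

```python
# Stand-in cues for an aggression classifier; a real system would use a
# trained model rather than string matching.
AGGRESSION_CUES = {"idiot", "moron", "shut up"}
COUNTER_MESSAGE = "There's no need for insults; let's keep this discussion civil."

def is_aggressive(comment: str) -> bool:
    return any(cue in comment.lower() for cue in AGGRESSION_CUES)

def run_intervention(comments: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Reply at most once per aggressive author; returns (author, reply) pairs."""
    replied: set[str] = set()
    replies: list[tuple[str, str]] = []
    for author, text in comments:
        if author not in replied and is_aggressive(text):
            replies.append((author, COUNTER_MESSAGE))
            replied.add(author)  # avoid spamming the same user
    return replies

replies = run_intervention([
    ("u1", "You're an idiot."),
    ("u2", "I disagree, here's why..."),
    ("u1", "Shut up already."),
])
print(len(replies))  # u1 receives one reply despite two aggressive comments
```

Rate-limiting of this kind matters in practice: a bot that replies to every aggressive comment risks being perceived as spam, which could produce the backfire effects discussed below.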
Bär et al. (2024) explored whether LLM-generated counter-speech was effective in reducing hate on Twitter (X). Compared to manually written counter-speech, the LLM-generated counter-speech was found to be less effective in reducing online hate, and was also associated with an increase in hate posts after users viewed it. The researchers suggest that users may have recognised that the counter-speech was LLM-generated, causing them to react negatively. These findings highlight the counter-productive backfire effects that can arise from using LLMs to generate counter-speech.
Conclusion
LLMs can generate counter-speech in response to a range of extremist content, which can offer some protection to counter-speakers and may reduce the human resources needed to design and deliver counter-speech campaigns. However, there are important limitations associated with the use of LLMs to generate counter-speech that must be considered. The potentially limited functionality of LLMs can result in counter-speech that contains inaccuracies, and the design and use of LLMs raises user privacy and bias concerns that can have harmful implications.
The small number of studies that assess the practical effectiveness of LLM-generated counter-speech offer mixed findings. LLMs show some promise in assisting practitioners with counter-speech writing and may potentially help to reduce verbal aggression online. It is important to consider that using LLMs to generate counter-speech may backfire and create increased hostility. Any use of LLMs within counter-speech campaigns needs to be accompanied by rigorous evaluation, risk assessment, and human oversight to ensure that the counter-speech being generated is relevant, factual, and non-harmful.
Ellie Rogers is a PhD candidate at Swansea University within the Cyber Threats Research Centre (CYTREC). Her research focuses on the algorithmic amplification of counter-speech as a response to online extremism.
Read more
Baele, S. J., & Brace, L. (2024). AI Extremism: Technologies, Tactics, Actors. Vox-Pol Network of Excellence. https://voxpol.eu/wp-content/uploads/2024/04/DCUPN0254-Vox-Pol-AI-Extremism-WEB-240424.pdf
Bär, D., Maarouf, A. & Feuerriegel, S. (2024). Generative AI may backfire for counterspeech. arXiv:2411.14986. https://doi.org/10.48550/arXiv.2411.14986
Bilewicz, M., Tempska, P., Leliwa, G., Dowgiałło, M., Tańska, M., Urbaniak, R. & Wroczyński, M. (2021). Artificial intelligence against hate: Intervention reducing verbal aggression in the social network environment. Aggressive Behavior, 47(3), 260-266. https://doi.org/10.1002/ab.21948
Blasiak, K., Risius, M., & Matook, S. (2021, December 1). ‘Social Bots for Peace’: A Dual-Process Perspective to Counter Online Extremist Messaging. Forty-Second International Conference on Information Systems, Austin 2021. International Conference on Information Systems, Austin, Texas. https://www.researchgate.net/publication/354906975_Social_Bots_for_Peace_A_Dual-Process_Perspective_to_Counter_Online_Extremist_Messaging
Briggs, R. & Feve, S. (2013). Review of programs to counter narratives of violent extremism. Institute for Strategic Dialogue. https://www.publicsafety.gc.ca/lbrr/archives/cn28580-eng.pdf
Buerger, C. (2024). Counterspeech: A literature review. Dangerous Speech Project. https://www.dangerousspeech.org/libraries/counterspeech-a-literature-review
Chung, Y. (2022). Counter Narrative Generation for Fighting Online Hate Speech [unpublished doctoral dissertation]. University of Trento. https://iris.unitn.it/retrieve/handle/11572/338563/544707/PhD_Thesis_YiLingChung.pdf
Chung, Y., Sinem Tekiroğlu, S., Tonelli, S. & Guerini, M. (2021). Empowering NGOs in countering online hate messages. Online Social Networks and Media, 24, 100150. https://doi.org/10.1016/j.osnem.2021.100150
Coltri, M. A. (2024). The Ethical Dilemma with Open AI ChatGPT: Is It Right or Wrong to Prohibit It? Athens Journal of Law, 10(1), 119-130. https://doi.org/10.30958/ajl.10-1-6
Davey, J., Tuck, H., & Amarasingam, A. (2019). An imprecise science: Assessing interventions for the prevention, disengagement and de-radicalisation of left and right-wing extremists and Countering Political Extremism. Institute for Strategic Dialogue. https://www.isdglobal.org/wp-content/uploads/2019/11/An-imprecise-science-1.pdf
Dennis, A. R., Kim, A., Raimi, R., & Ayabakan, S. (2020). User reactions to COVID-19 screening chatbots from reputable providers. Journal of the American Medical Informatics Association, 27(11), 1727–1731. https://doi.org/10.1093/jamia/ocaa167
Donath, J. (2020). Ethical Issues in Our Relationship with Artificial Entities. In Markus D. Dubber, Frank Pasquale, and Sunit Das (Eds), The Oxford Handbook of Ethics of AI (pp. 52-73). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190067397.013.3
Gebru, T. (2020). Race and Gender. In M.D. Dubber, F. Pasquale & S. Das (Eds.), The Oxford Handbook of Ethics and AI (pp. 252-269). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190067397.013.16
Howard, J. W. (2021). Terror, Hate and the Demands of Counter-Speech. British Journal of Political Science, 51(3), 924–939. https://doi.org/10.1017/S000712341900053X
Irfan, M., Abdulaziz, Z. & Anwar, M. (2024). Unleashing transformative potential of artificial intelligence (AI) in countering terrorism online radicalization extremism and possible recruitment. Global Strategic & Securities Studies, 8(4), 1-15. http://dx.doi.org/10.31703/gsssr.2023(VIII-IV).01
Mukherjee, A. (2023). AI and Ethics: A computational perspective. IOP Publishing. https://iopscience.iop.org/book/mono/978-0-7503-6116-3
Mun, J., Buerger, C., Liang, J. T., Garland, J. & Sap, M. (2024). Counterspeakers’ Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery, 742, 1-22. https://doi.org/10.1145/3613904.3642025
Naveed, H., Ullah Khan, A., Shi, Q., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2025). A Comprehensive Overview of Large Language Models. ACM Transactions on Intelligent Systems and Technology, 16(5), 1-72. https://doi.org/10.1145/3744746
Susser, D. (2020). Ethical Considerations for Digitally Targeted Public Health Interventions. American Journal of Public Health, 110(S3), S290-S291. https://doi.org/10.2105/AJPH.2020.305758
Tanprasert, T., Fels, S. S., Sinnamon, L., & Yoon, D. (2024). Debate Chatbots to Facilitate Critical Thinking on YouTube: Social Identity and Conversational Style Make A Difference. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, 805, 1–24. https://doi.org/10.1145/3613904.3642513
Vallecillo-Rodríguez, M. E., Ráez, A. M. & Martín-Valdivia, M. T. (2023). Automatic counter-narrative generation for hate speech in Spanish. Procesamiento del Lenguaje Natural, 71, 227-245. https://doi.org/10.26342/2023-71-18
Wu, C., Wang, Y., Zhang, Y., Wang, H., & Pang, Y. (2025). Confront hate with AI: How AI-generated counter speech helps against hate speech on social media? Telematics and Informatics, 101, 102304. https://doi.org/10.1016/j.tele.2025.102304
Copyright Information
As part of CREST’s commitment to open access research, this text is available under a Creative Commons BY-NC-SA 4.0 licence. Please refer to our Copyright page for full details.
IMAGE CREDITS: Copyright ©2025 A.Armistead / CREST (CC BY-SA 4.0)