Samia Touileb
Position
Associate Professor, Natural Language Processing
Research
Samia Touileb is an Associate Professor in Natural Language Processing (NLP). Before this, she was a researcher in MediaFutures WP5 on Norwegian Language Technologies and a postdoc at the Language Technology Group (LTG), Department of Informatics, University of Oslo. She holds a PhD in NLP from the University of Bergen and has worked on NLP research and applications for almost a decade.
Her main research interests are bias and fairness in NLP, information extraction, summarization, and the application of NLP and machine learning methods to tasks in social science research. Her work focuses mainly on under- and mid-resourced languages such as Norwegian.
Publications
Academic chapter/article/Conference paper
- Touileb, Samia; Murstad, Jeanett; Mæhlum, Petter et al. (2024). EDEN: A Dataset for Event Detection in Norwegian News.
- Barnes, Jeremy Claude; Touileb, Samia; Mæhlum, Petter et al. (2023). Identifying Token-Level Dialectal Features in Social Media.
- Samuel, David; Kutuzov, Andrei; Touileb, Samia et al. (2023). NorBench – A Benchmark for Norwegian Language Models.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2023). Measuring normative and descriptive biases in language models using census data.
- Sheikhi, Ghazaal; Touileb, Samia; Khan, Sohail Ahmed (2023). Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models.
- Sheikhi, Ghazaal; Opdahl, Andreas Lothe; Touileb, Samia et al. (2023). Making sense of nonsense: Integrated gradient-based input reduction to improve recall for check-worthy claim detection.
- Olsen, Helene Bøsei; Touileb, Samia; Velldal, Erik (2023). Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus.
- You, Huiling; Touileb, Samia; Øvrelid, Lilja (2023). JSEEGraph: Joint Structured Event Extraction as Graph Parsing.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2022). Occupational Biases in Norwegian and Multilingual Language Models.
- Touileb, Samia; Nozza, Debora (2022). Measuring Harmful Representations in Scandinavian Language Models.
- You, Huiling; Samuel, David; Touileb, Samia et al. (2022). EventGraph: Event Extraction as Semantic Graph Parsing.
- You, Huiling; Samuel, David; Touileb, Samia et al. (2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction.
- Touileb, Samia (2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes.
- Mæhlum, Petter; Kåsen, Andre; Touileb, Samia et al. (2022). Annotating Norwegian language varieties on Twitter for Part-of-speech.
- Touileb, Samia (2022). NERDz: A Preliminary Dataset of Named Entities for Algerian.
- Kutuzov, Andrei; Touileb, Samia; Mæhlum, Petter et al. (2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian.
- Barnes, Jeremy; Mæhlum, Petter; Touileb, Samia (2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use.
- Touileb, Samia; Barnes, Jeremy (2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2021). Using Gender- and Polarity-Informed Models to Investigate Bias.
- Touileb, Samia (2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews.
- Lison, Pierre; Barnes, Jeremy; Hubin, Aliaksandr et al. (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach.
- Adouane, Wafia; Touileb, Samia; Bernardy, Jean-Philippe (2020). Identifying Sentiments in Algerian Code-switched User-generated Comments.
- Rodina, Julia; Bakshandaeva, Daria; Fomin, Vadim et al. (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian.
- Barnes, Jeremy Claude; Touileb, Samia; Øvrelid, Lilja et al. (2019). Lexicon information in neural sentiment analysis: a multi-task learning approach.
- Velldal, Erik; Øvrelid, Lilja; Bergem, Eivind Alexander et al. (2018). NoReC: The Norwegian Review Corpus.
- Touileb, Samia; Pedersen, Truls Andre; Sjøvaag, Helle (2018). Automatic identification of unknown names with specific roles.
- Touileb, Samia; Salway, Andrew (2014). Constructions: a new unit of analysis for corpus-based discourse analysis.
Feature article
Popular scientific lecture
- Goodwin, Morten; Touileb, Samia; Bøhn, Einar Duenger (2023). Blir vi overflødige? En samtale om kunstig intelligens og utdanning [Will we become redundant? A conversation about artificial intelligence and education].
- Touileb, Samia (2023). Hva er ChatGPT og hvordan fungerer det og lignende verktøy? [What is ChatGPT, and how do it and similar tools work?]
- Touileb, Samia (2023). Store språkmodeller: muligheter og utfordringer [Large language models: opportunities and challenges].
- Touileb, Samia (2023). Sosiale og etiske utfordringer med språkmodeller [Social and ethical challenges with language models].
Academic anthology/Conference proceedings
Academic article
- Blum, Sophie; Koudijs, Raoul; Ozaki, Ana et al. (2023). Learning Horn envelopes via queries from language models.
- Touileb, Samia; Steskal, Lubos (2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling.
- Salway, Andrew; Touileb, Samia; Tvinnereim, Endre (2014). Inducing Information Structures for Data-driven Text Analysis.
- Salway, Andrew; Touileb, Samia (2014). Applying grammar induction to text mining.
Lecture
- Touileb, Samia (2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet [ChatGPT: the technology, the dataset, and what we (don't) know].
- Touileb, Samia; Schjøll, Anita; Throndsen, Eivind et al. (2023). The Ethics of Large Language Models.
- Touileb, Samia (2023). The Societal and Ethical Implications of Language Models.
- Touileb, Samia; Fahlvik, Morten; Berg, John Arthur (2023). ChatGPT & AI in education.
- Touileb, Samia (2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet [ChatGPT: the technology, the dataset, and what we (don't) know].
- Touileb, Samia (2023). Sosiale og etiske utfordringer med språkmodeller som ChatGPT [Social and ethical challenges with language models such as ChatGPT].
- Touileb, Samia; Åkernes, Hanne Louise (2023). Når kunstig intelligens inntar redaksjonen [When artificial intelligence enters the newsroom].
- Touileb, Samia; Lemaire, Pauline Marguerite (2023). Big Science Gullgruve eller fallgruve? [Big Science: gold mine or pitfall?]
- Touileb, Samia (2023). Benchmarking the societal and ethical implications of large language model.
- Touileb, Samia (2023). Demystifying ChatGPT and language models.
- Touileb, Samia; Duarte, Katherine (2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.
- Touileb, Samia; Elgesem, Dag; Steskal, Lubos (2012). Networks of texts and people.
Academic lecture
- Touileb, Samia (2023). Large Language models: What are they, and what are their ethical implications?
- Sjøvaag, Helle; Pedersen, Truls Andre; Touileb, Samia (2018). Operationalising Diversity for Big Data Policy Research.
- Pedersen, Truls Andre; Touileb, Samia; Sjøvaag, Helle (2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.
- Iversen, Magnus Hoem; Pedersen, Truls Andre; Stavelin, Eirik et al. (2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.
- Touileb, Samia (2013). Inducing local grammars from n-grams.
Poster
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2021). Using Gender- and Polarity-informed Models to Investigate Bias.
- Touileb, Samia; Pedersen, Truls Andre; Sjøvaag, Helle (2018). Automatically identifying names of unrecognized politicians.
- Touileb, Samia; Steskal, Lubos (2015). A computational approach to organize and analyze online communication data.
- Salway, Andrew; Hofland, Knut; Touileb, Samia (2013). Applying Corpus Techniques to Climate Change Blogs.
Doctoral dissertation
Projects
OPINION COST action: https://www.cost.eu/actions/CA21129/
MediaFutures: https://mediafutures.no/2021/01/20/postdoc-samia-touileb/
NorDial: https://github.com/jerbarnes/nordial