Sentiment Analysis for Arabizi: A Multilingual Jargon on Social Media

After almost two years at the lab, I am pleased to finally share my research with my friends. Last Thursday, 25th of May, I presented my work in NLP titled "Sentiment Analysis for Arabizi: A Multilingual Jargon".

Arabizi is a transcription of naturally dialectal Arabic in Latin script, quite common in mobile texting and social media. Like Modern Standard Arabic (MSA), or Al-Fus·ha, it is rich in morphology. Unlike MSA, it has no standard orthography, since it transcribes a spoken language; it is often code-switched with English or French; it is found within multilingual streams of social data; and it is under-resourced, lacking classical NLP tools and resources such as lexicons, stemmers, parsers, and labelled datasets.

In the talk, I described a pilot case study that we conducted last year to analyse the usage of Arabizi in Twitter data, and then moved on to the challenges of processing and extracting sentiment from Arabizi. I presented some preliminary results, along with my current and future lines of research.

I thank Rania Islambouli, Omar Farhat, and Omar Osman for their contributions in annotating a Lebanese-dialect Arabizi dataset, which is being prepared and will be released publicly soon on project-rbz.com. You can watch the talk (20 min) via the link below.

Related Links: