THE PURPOSE OF DEEP LEARNING MODEL USING EMBEDDING TECHNIQUE IN ARABIC SENTIMENT ANALYSIS

AOUMEUR, Nour El Houda

Please use this address to cite this document: http://dspace.univ-tiaret.dz:80/handle/123456789/14321

Title:	THE PURPOSE OF DEEP LEARNING MODEL USING EMBEDDING TECHNIQUE IN ARABIC SENTIMENT ANALYSIS
Author(s):	AOUMEUR, Nour El Houda
Keywords:	Arabic Sentiment Analysis, Classical Arabic, Feature Extraction, Machine Learning, Word2Vec, Deep Learning
Publication date:	2-Dec-2023
Publisher:	Université IBN KHALDOUN-Tiaret
Abstract:	Social media, widely used by Internet users to express their opinions on a given topic, has become one of the main sources of information for analysts. Sentiment analysis (SA) is a growing research area that applies natural language processing (NLP) and machine learning (ML) tools to identify and label opinionated text. It is an important task in fields related to data analysis and information mining. In this study, books by the most famous Arab authors were read, and each sentence was manually extracted and labeled. The aim of this research was to build a new Classical Arabic dataset (CASAD). Features are then extracted from this dataset using Word2Vec-style word embedding techniques, which can capture deep relationships that represent features of formal Arabic. Six machine learning techniques, namely support vector machine (SVM), logistic regression (LR), naive Bayes (NB), K-nearest neighbors (KNN), latent Dirichlet allocation (LDA), and classification and regression tree (CART), are used to evaluate the features for Classical Arabic. In addition, statistical techniques such as validation and reliability are used to evaluate the labels of this dataset. Finally, using the six machine learning algorithms under 10-fold cross-validation, our tests evaluated the classification rate of the feature extraction matrix for two and three classes. The results showed that logistic regression with Word2Vec was the most effective in predicting the occurrence of polarizing topics.
Description:	Social media, which Internet users widely use to express their opinions on particular topics, has become one of the main sources of information for analysts. Sentiment analysis (SA) is a growing research field in natural language processing (NLP) and machine learning (ML) tools for identifying and classifying texts that carry opinions. Sentiment analysis is important in fields related to data analysis and information mining. In this study, the books of the most famous Arab writers were read, and each sentence was manually extracted and labeled. The aim of this research was to create a new Classical Arabic dataset (CASAD). In addition, features are extracted from these datasets using word embedding techniques equivalent to Word2vec, which can extract the deep relationships that represent features of formal Arabic. Some machine learning techniques, such as support vector machine (SVM), logistic regression (LR), naive Bayes (NB), K-nearest neighbors (KNN), latent Dirichlet allocation (LDA), and classification and regression tree (CART), are used to evaluate the features for Classical Arabic. In addition, statistical techniques such as validation and reliability are used to evaluate the labels of this dataset. Finally, using six machine learning algorithms in 10-fold cross-validation, our tests evaluated the classification rate of the feature extraction matrix for two and three classes, and the results showed that logistic regression with Word2Vec was the most effective in predicting the occurrence of topics that carry polarity.
URI/URL:	http://dspace.univ-tiaret.dz:80/handle/123456789/14321
Collection(s):	Doctorate
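
The evaluation protocol summarized in the abstract (sentence features built from word embeddings, then scored by a classifier under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the author's implementation. The tiny hand-written embedding table stands in for trained Word2Vec vectors, the English tokens and labels are toy data, and all function names are hypothetical. Only K-nearest neighbors, one of the six classifiers named in the abstract, is shown, and averaging word vectors into a sentence vector is an assumed choice that the record does not confirm.

```python
import math
import random

# Assumption: toy 2-D vectors standing in for trained Word2Vec embeddings.
TOY_EMBEDDINGS = {
    "good": [1.0, 0.1], "great": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2],
    "book": [0.0, 1.0],
}


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k nearest training vectors (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    labels = [label for _, label in nearest]
    # sorted() makes tie-breaking deterministic across runs.
    return max(sorted(set(labels)), key=labels.count)


def k_fold_accuracy(xs, ys, n_folds=10, k=3, seed=0):
    """Mean accuracy over n_folds folds; each sample is held out exactly once.

    Requires len(xs) >= n_folds so that no fold is empty.
    """
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                   for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)


# Toy two-class corpus (hypothetical data, 10 sentences per class).
positive = [["good", "book"], ["great", "book"], ["good", "great"],
            ["great"], ["good"]] * 2
negative = [["bad", "book"], ["awful", "book"], ["bad", "awful"],
            ["awful"], ["bad"]] * 2
sentences = positive + negative
labels = ["positive"] * len(positive) + ["negative"] * len(negative)

features = [sentence_vector(s, TOY_EMBEDDINGS, dim=2) for s in sentences]
accuracy = k_fold_accuracy(features, labels, n_folds=10, k=3)
print(f"mean 10-fold accuracy: {accuracy:.2f}")
```

In a fuller pipeline the embedding table would come from a model trained on the corpus, and the same cross-validation loop would be repeated for each classifier so that their mean accuracies can be compared on identical folds.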

File(s) in this document:

File	Description	Size	Format
AOUMEUR NOUR EL HOUDA .02.12.2023.pdf		2 MB	Adobe PDF
