The first global datathon to fight propaganda in the news finished last night. Hack the News Datathon was co-organized by the Data Science Society and the Qatar Computing Research Institute, HBKU, with further support from A Data Pro, who took care of the data annotation. More than 300 data science enthusiasts, experts, and scholars from over 50 countries registered to participate in the challenge, which ran over the weekend of 25-27 January 2019. The final presentation took place on January 29, 2019, when the winners were announced.
“Having a practical tool that can detect the use of propaganda in the news is important and may impact the way readers consume news in the future,” adds Dr. Giovanni da San Martino, datathon co-organizer from QCRI.
The datathon was held online via a dedicated platform and hosted on-site in Doha, Bangalore, Riyadh, and Sofia, the last in partnership with the Sofia Development Association. The effort was part of the Tanbih project of the Qatar Computing Research Institute (QCRI), developed in collaboration with MIT-CSAIL, which aims to uncover stance, bias, and propaganda in the news, thus limiting the effect of “fake news”.
The Datathon gathered domain experts and data scientists, who formed 40 teams to develop an intelligent system across three levels of difficulty: detecting propaganda at the sentence and article level and recognizing the type of propaganda used. The propaganda instances in the Hack the News Datathon dataset were annotated by the global media and business intelligence provider A Data Pro.
“I expect that this datathon will start a very promising research direction in the fight against propaganda,” says Dr. Preslav Nakov, datathon co-organizer from QCRI. “Similarly to fighting spam, fighting disinformation is an adversarial problem, where malicious actors constantly change and improve their strategies.”
The Datathon was indeed the start of the fight against propaganda via AI. Most of the teams concentrated on the classification tasks, where participants achieved some remarkable results in the short time they had. A specially developed leaderboard evaluated the accuracy of their algorithms and ranked participants according to the quality of their predictions.
The teams that reached the finals with the highest scores on the Datathon’s leaderboard for the classification tasks were team Data Exploiters from Qatar, team Stark and team Astea-Wombats from Bulgaria, team Data_Titans and Data Monks from India, team Lama from Turkey, and team Leopard with participants from Italy and the UK.
“Nice exploration of various BERT and ElMo models, and good results. I like it that there are results for different models,” comments Dr. Nakov on the Data Exploiters team article.
Other approaches to the classification tasks used vectorizers such as Count and TF-IDF, machine learning algorithms such as Logistic Regression, Decision Tree, Random Forest, SVM, and Naive Bayes, and evaluation measures such as the F1 score.
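To make the F1 score mentioned here concrete, the following is a minimal sketch of how it can be computed for a binary propaganda classifier. It is not the Datathon’s actual leaderboard code: the function name `precision_recall_f1` and the label lists are hypothetical, chosen only for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 1 = propaganda, 0 = not propaganda.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 balances precision against recall, it is often preferred over plain accuracy when one class (here, propaganda) is much rarer than the other.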
“Good work. It is nice that you show that a simple model can achieve top performance in this task,” says Dr. Barrón-Cedeño from Qatar University to team Stark, which achieved the best score on the first level of the tasks and showed that complicated algorithms do not always perform best.
The third task was the most difficult, so fewer teams attempted it. However, there were some impressive results from team FlipFlops and team Antiganda from Bulgaria and team PIG (Propaganda Identification Group) from Germany, all of which qualified for the finals.
The second stage of the challenge required the finalist teams to make a video presentation of their algorithms so that the international jury could decide on the winners. Some of the world’s most renowned experts in NLP and text mining cast their votes. Among them were Mitra Mohtarami and Ramy Baly from MIT, Pavel Nikolov from TU Sofia, Gian Marco De Francisco from ISI Foundation, Boryana Pelova from Sofia University, Iryna Gurevych from Technische Universität Darmstadt, and Peter Cochrane from The University of Suffolk, UK. The core team behind the task definition and the data preparation also participated in the final decision.
The finalist teams had one day to prepare their presentations and to answer questions raised by Dr. Giovanni da San Martino, Dr. Alberto Barrón-Cedeño, and Dr. Preslav Nakov from Qatar Computing Research Institute, HBKU, Dr. Laura Tolosi-Halacheva, and Viktor Senderov, who were the initiators of the case and the Datathon.
Third place went to team Lama from Istanbul.
“Very good job. It is good to see how BERT performs in brand-new tasks,” said Ramy Baly of the team’s effort.
Second place went to the Bulgarian team Astea Wombats; Laura Tolosi was particularly happy to see a combination of different models at work and the use of external resources.
First place, along with more than 4,000 USD in cash and services, went to the Propaganda Identification Group (PIG) team from Germany, which used a convolutional neural network (CNN) and long short-term memory (LSTM) to identify propaganda.
“You are a clear winner in the hardest Task 3, and you did reasonably well in Task 1,” said Viktor Senderov in a comment on their article.
The money for the cash award was raised via a crowdfunding campaign. This was the first crowdfunded Datathon, and we are extremely thankful to everyone who contributed to the cause. We would not have succeeded without you!
The Data Science Society, a global community of people involved in data science and supported by data-driven companies such as Telelink, WorldQuant, A Data Pro, GemSeek, and Ontotext, aims to motivate people to improve their skills in the field. Hack the News Datathon was one more good example of how people can learn by doing and experimenting with real data.
“We hope that this datathon will contribute to breaking the bad practices of how news is presented and read. We stand behind the open-source culture, and thus all models developed during the datathon and the dataset will be made publicly available, so that they can be used by anyone interested, including researchers, businesses, and even individual news consumers,” says Sergi Sergiev, founder of the Data Science Society.
All the work on fighting propaganda with AI, including the programming code, is officially published and shared on the Society’s data platform with everyone interested in NLP and propaganda identification.
The Data Science Society plans its next Global Datathon for 12-14 April, which will give participants the chance to explore many cases from multiple industries. The datathon will be challenging, so for all curious minds who have just started to develop their data skills in text mining, the Society plans one more great initiative: an educational Data Monthly Challenge, which will be online and free of charge. The educational challenge is not a competition but training for you. It starts on 26 April, and several universities from India, Bulgaria, and France have confirmed their participation. You are welcome to participate as well!