The Islamic State – Text Mining

I’ve become fascinated with the texts that the Islamic State has recently been publishing, especially after I discovered “Hijrah to the Islamic State” and “The Islamic State” online. They were a bit hard to find, since the US and UK are understandably trying to clamp down on both, but when I found “Hijrah” and read through it, I had to laugh. Let’s put it this way: these books are clearly aimed at… uh, how do we phrase this… naive teenagers, to get them to join.

In addition to its three ebooks (the two above plus “Revival of the Caliphate”, 2014), ISIS has also started regularly publishing e-magazines, two in fact. One, Dabiq, seems to target would-be jihadists abroad, while the other, the weekly Islamic State Report, appears to be a more general publication.

I’ve been playing around a little, teaching myself some basic text mining techniques using Python (I learned from here if you’re interested). Anyhow, I’ve processed the documents and these are the results. I present to you… the 50 most common words (excluding common function words such as “the” and “and”) used in Dabiq and in the ISIS eBooks, respectively:

 

[Figures: Word Frequencies, Dabiq; Word Frequencies, eBooks]

 

Some pretty interesting stuff! There are a few things to note here. First of all, my text mining abilities are pretty crude at the moment, so a lot of random stuff slipped in. There’s the “t”, which shows up in both lists: my text mining script automatically strips apostrophes and other special characters wherever they appear and splits the word into parts at that point (e.g. “don’t” becomes “don” and “t”). This is sometimes useful, such as for web addresses (we don’t want http://www.amazon.com/products/als;dfjal88328 to count as a single word, for example, nor a Twitter handle). Because of this, we get some other random bits as well, such as “al”, which is simply ال, the Arabic equivalent of the word “the”, or “shi”, which is the first part of the word “Shi’ite”, i.e. a member of one of the two major sects of Islam, the other being Sunni (which ISIS follows, by the way).
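A minimal sketch of this kind of tokenizing and counting might look like the following. It is an illustration, not the actual script used here: the `tokenize` and `top_words` names, the tiny stopword set, the sample sentence, and the choice of a regex that keeps only runs of ASCII letters are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption); a real run would use a fuller list.
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "we"}

def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters.

    Apostrophes, slashes, dots, digits and so on act as separators, so
    "don't" splits into "don" and "t", and a URL splits into pieces such
    as "http", "www" and "com". Arabic-script characters are dropped too.
    """
    return re.findall(r"[a-z]+", text.lower())

def top_words(text, n=50, stopwords=STOPWORDS):
    """Return the n most frequent tokens that are not stopwords."""
    counts = Counter(tok for tok in tokenize(text) if tok not in stopwords)
    return counts.most_common(n)

# Hypothetical sample text, just to show the splitting behaviour.
sample = "We don't stop. Visit http://www.example.com and don't forget the al-Sham report."
print(tokenize("don't"))
print(top_words(sample, n=5))
```

Splitting on every non-letter is the crude part: it is what lets fragments like “t”, “al”, and “shi” into the counts, and it is also what breaks URLs into countable pieces.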

Anyhow, having done this, what conclusions can we draw from this text mining analysis? Here are a few:

1. ISIS is internet savvy.

OK, yes, we all know this. But this is simply more proof. Check it out: in the ISIS eBooks, “com” and “http” make it into the top 50 easily. But what are they? Well, these mysterious “com” and “http” tokens are simply the pieces of web addresses like http://www.xxx.com, which my script splits apart! The fact that “com” and “http” are so frequent is an indication of just how internet-savvy (and internet-based) ISIS is, as we all know well. And of course, “twitter” is mentioned over a hundred times in these three eBooks. Go figure. (If you dig into the actual material, you’ll find that they advise you to contact this brother or that sister via Twitter.)

2. Dabiq is mainly propagating religious theology.

To be honest, I haven’t really read through any issues thoroughly. But from this, we can see that many of the most frequently appearing words are theologically oriented: prophet, muslim, jihad, kufr. Contrast this with the eBooks, where words like “fight”, “weapons”, and “Iraq” pop up a lot more.

3. Dabiq mixes in a lot of Anglicized Arabic.

That is, it uses Arabic words and phrases transliterated into English, such as “wa” (و), the Arabic word for “and”, “sallam” (سلام), the Arabic word for “peace”, and so on. This, again, is not nearly as common in the eBooks.

4. ISIS eBooks are more practically focused, and oriented towards people who are less familiar with Islam and Arabic.

From 2. and 3., we can see that Dabiq is more geared towards those familiar with ISIS religious ideology and Arabic – and also that the eBooks are less religious and more practically oriented.

5. ISIS is a group focused mainly on Iraq, Syria, and Shi’ites – and less on the West or America or the rest of the world.

The fact that “shi” (i.e. “Shi’ites”) occurs so frequently in the eBooks, while “America” and “Western” (which do appear in these documents, by the way) don’t show up nearly as much, seems to indicate that ISIS spends more of its time focusing on the Shi’ites. And of course, “Syria”, “Sham”, “Iraq”, and “Syrian” are some of the most common words in both Dabiq and the eBooks. No surprises there, but again, it seems to indicate that most of ISIS’s energy is being directed towards the internal struggle within Iraq and Syria against the Shi’ites, i.e. the Alawite regime of Bashar al-Assad in Syria and the Shi’ite-dominated coalition government in Iraq. So, who’s ISIS’s worst enemy? America might be high up the list, but Iran is as well.
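The contrasts in points 2 to 5 come from eyeballing two top-50 lists side by side. One way to make that comparison a little less eyeball-dependent, sketched here purely as an assumption and not as what was done for this post, is to compare each word’s share of its own corpus, since the two collections are not the same size. The function names and the toy token lists are hypothetical placeholders.

```python
from collections import Counter

def relative_freq(tokens):
    """Map each token to its share of all tokens in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def distinctive(tokens_a, tokens_b, n=10):
    """Words from corpus A, ranked by how much larger their share is in A than in B."""
    fa, fb = relative_freq(tokens_a), relative_freq(tokens_b)
    diffs = {w: fa[w] - fb.get(w, 0.0) for w in fa}
    return sorted(diffs, key=diffs.get, reverse=True)[:n]

# Toy placeholder tokens (not real data) standing in for two tokenized corpora.
corpus_a = ["alpha", "alpha", "beta", "gamma"]
corpus_b = ["beta", "gamma", "gamma", "delta"]
print(distinctive(corpus_a, corpus_b, n=2))
```

Calling `distinctive` in both directions would give a list of words that lean towards each corpus, which is roughly the comparison the points above make by hand.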

 

All in all, not too many original insights, but some pretty interesting confirmation, through some basic text mining, of things we already know. Stay tuned for more updates!
