Link prediction algorithms to enhance criminal network analysis
Link prediction are common exercises in network analysis (1). In general, link prediction aims at identifying possible unknown connections among the nodes of a network due to limited data or predicting the links in the future based on the current observation of a network. Normally, link prediction approaches rely on the existing information on the topology of a network to predict possible ties between nodes in the network. However, there is a variety of techniques and approaches to identify the possible links (2,3). Also, the type of relations observed may influence the shape of a network and, consequently, the way in which links are established. In other words, link prediction in a global network of airports may follow different rules than predicting a link in a network of friends.
In criminal networks, link prediction may help investigators identifying possible connections between two individuals which have so far remained hidden or unobserved. Link predictions may indicate investigators possible connections which have so far remained.
The task, however, is particularly challenging. Investigation data is naturally noisy (e.g. irrelevant information, unclear recordings) and incomplete (e.g. criminals avoiding easily detectable communications). These issues may potentially affect the precision and robustness of link prediction algorithms. If the observed information is incomplete, it may bias the predicted links.
It is not surprising, therefore, that applications of link prediction to criminal networks are in their infancy. Very few studies applied link prediction methods to criminal networks (4,5). It is therefore important to reflect on the effectiveness and robustness of link predictions in criminal networks.
A recently published study has explored the robustness of different link prediction strategies across different types of network data. The study is authored by Francesco Calderoni (from Transcrime and Università Cattolica del Sacro Cuore of Milan, partner of Roxanne) and Salvatore Catanese (University of Messina) Pasquale De Meo (University of Messina), Annamaria Ficara (University of Palermo), and Giacomo Fiumara (University of Messina) (6). The study explored the performance of different link prediction algorithms on a mafia organization, based on information extracted from judicial documents of operation “Montagna”. The four-year investigation by the Italian law enforcement agencies targeted 52 individuals charged with participation the Sicilian Mafia. The data were made publicly available for further use and exploration (7,8).
The data extracted enabled to reconstruct two networks: the network from meetings (Fig.1) and the network from telephone calls (Fig. 2), respectively. The figures show that the two networks have a different topology or structure: the meeting network is denser whereas the telephone network is sparser. This is due to many factors, but especially to the nature of the two types of communications. While it is possible and frequent to meet with several people at the same time, most telephone calls normally involve only two speakers.
Figure 1. The meeting network in operation Montagna.
Note to the figure: Nodes which represent the members of the ‘‘Mistretta” family and the ‘‘Batanesi” family are highlighted in violet and orange, respectively. Circled nodes correspond to the subjects investigated for having promoted, organized, and directed the mafia association (leaders). The red and yellow circled nodes refer to bosses of Mafia families of other districts. The white knots represent the other subjects considered to be: (i) close to the association and (ii) not classifiable in any of the previous categories, but nevertheless useful for the purposes of the Mafia-type association and the realization of its plans. The width of the edges is proportional to the number of meetings and the size of the nodes to their degree.
Figure 2. The telephone call network in operation Montagna.
Note to the figure: Color and line coding as in the previous figure.
On these networks, the study conducted two experiments. First, it applied several link prediction algorithms and observed that link prediction algorithms leveraging the full graph topology (such as the Katz score) provide accurate results even on very sparse networks such as the telephone call network. Second, it examined how the noisy and incomplete nature of criminal networks may affect the accuracy of link prediction algorithms through extensive simulations. The experimental findings showed that the meeting network slightly differs from graphs in which up to 15% of new edges were added. On the contrary, the telephone call network exhibits strong differences between the observed and the new experimentally enhanced network. This may suggest that the available information on the telephone call may be incomplete and may need some more edges which were neglected or hidden during the investigations.
Overall, the study suggests that the soundness of link predictions is relatively high provided that only a limited amount of knowledge about connections is hidden or missing, and the unobserved edges follow some kind of generative law. The different results on the meeting and telephone call networks indicate that the specific features of a network should be taken into careful consideration.
The results have implications also for Roxanne project. For effective and robust link prediction tools, it is important that the input data is reasonably complete (no partial input and no systematic removal of specific actors). Furthermore, the amount of observed links must be relatively high, otherwise the missing information may led to inaccurate predictions.
References
- Link prediction. In: Wikipedia [Internet]. 2020 [cited 2020 Oct 6]. Available from: https://en.wikipedia.org/w/index.php?title=Link_prediction&oldid=976910832
- Martínez V, Berzal F, Cubero J-C. A Survey of Link Prediction in Complex Networks. ACM Comput Surv. 2016 Dec 20;49(4):69:1–69:33.
- Pandey B, Bhanodia PK, Khamparia A, Pandey DK. A comprehensive survey of edge prediction in social networks: Techniques, parameters and challenges. Expert Systems with Applications. 2019 Jun 15;124:164–81.
- Berlusconi G, Calderoni F, Parolini N, Verani M, Piccardi C. Link Prediction in Criminal Networks: A Tool for Criminal Intelligence Analysis. PLOS ONE. 2016 Apr 22;11(4):e0154244.
- Crandell I, Korkmaz G. Link Prediction in the Criminal Network of Albuquerque. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2018. p. 564–7.
- Calderoni F, Catanese S, De Meo P, Ficara A, Fiumara G. Robust link prediction in criminal networks: A case study of the Sicilian Mafia. Expert Systems with Applications. 2020 Dec 15;161:113666.
- Cavallaro L, Ficara A, Meo PD, Fiumara G, Catanese S, Bagdasar O, et al. Disrupting resilient criminal networks through data analysis: The case of Sicilian Mafia. PLOS ONE. 2020 ago;15(8):e0236476.
- Ficara A, Cavallaro L, De Meo P, Fiumara G, Catanese S, Bagdasar O, et al. Social Network Analysis of Sicilian Mafia Interconnections. In: Cherifi H, Gaito S, Mendes JF, Moro E, Rocha LM, editors. Complex Networks and Their Applications VIII. Cham: Springer International Publishing; 2020. p. 440–50. (Studies in Computational Intelligence).