Social Network Analysis for Criminology in ROXANNE

Social network analysis provides an essential set of analytic tools for studying people’s behavior: social influence analysis, community detection, link prediction, and cross-network analysis are examples of such tools. In criminology, law enforcement agencies can benefit from them. In ROXANNE, we focus on several real crime cases and build a specialized tool for criminology that includes state-of-the-art methods for each of these tasks.

Social Network Analysis (SNA) is a way of understanding human behavior through people’s relations and interactions. In criminology, SNA views social relationships in terms of network theory: nodes represent individual actors within the network, and edges represent relationships between them, such as offender movements, co-offending ties, and crime-group membership. SNA is well suited to law enforcement because crime and victimization are not randomly distributed across people or places, so knowing whom a person associates with can help predict that person’s future movements. Moreover, victims and offenders are often connected in multiple ways and play varying roles, both in criminal events (victim, offender, co-offender, or witness, with roles often swapping across circumstances) and in daily social life (acquaintance, family member, or partner). Because crime and victimization are both embedded within larger social networks, SNA has widespread applications in crime analysis. In ROXANNE, our goal is to further investigate different SNA concepts and employ state-of-the-art methods to analyze criminal networks (Figure 1). Among the concepts and methods implemented so far, this post explains social influence analysis, community detection, link prediction, and cross-network analysis.



Figure 1: Network analysis scheme in ROXANNE


Social Influence Analysis 

We focus on the overall global influence of each individual over the other individuals in the criminal social network. Using centrality measures, we assign each individual a relative importance score that estimates their influence compared to other individuals. An individual’s influence in a network can be measured by various metrics; each has its own intuition and thus quantifies a particular aspect of the individual’s importance within the network. In ROXANNE, we have implemented several of these centrality measures, including degree, closeness, betweenness, PageRank, and hub score. Figure 2 highlights the most influential nodes in an anonymized criminal network with 124 nodes and 178 edges, built from individuals’ SMS and phone-call communications in a real investigation case.
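To make the centrality measures above concrete, here is a minimal pure-Python sketch of three of them (degree, closeness, and PageRank) on a toy undirected graph. It is not the ROXANNE implementation: the edge list `EDGES` and all function names are hypothetical, and in practice a library such as NetworkX already provides these measures, as well as betweenness and HITS hub scores.

```python
from collections import deque

# Hypothetical toy network: an undirected edge means the two individuals
# exchanged at least one SMS or phone call (not data from Figure 2).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of BFS shortest-path distances).

    Matches the standard definition on connected graphs.
    """
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; every undirected edge is followed both ways."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in adj}
        for u, nbrs in adj.items():
            share = damping * rank[u] / len(nbrs)  # split u's rank among its neighbors
            for w in nbrs:
                new_rank[w] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for name, measure in [("degree", degree_centrality),
                          ("closeness", closeness_centrality),
                          ("pagerank", pagerank)]:
        scores = measure(adj)
        print(name, max(scores, key=scores.get))
```

On this toy graph, node A (three contacts) ranks highest under all three measures, while D, the only link to E, has a higher closeness score than B or C despite having the same degree. This illustrates how each measure captures a different aspect of importance.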



Figure 2: Most influential nodes in a mobile network


Because ground truth is usually unavailable in practice, social influence analysis methods are mainly evaluated qualitatively, by manually examining the correlation and influence (if any) between the behavior of highly ranked individuals and the behavior of the individuals who interact with them.


Community Detection

Individuals within a network usually form cohesive groups whose internal interaction is denser and more frequent than their interaction with the rest of the network. However, these community structures are hidden, since they are not explicitly defined and individuals do not publicly reveal their community membership. Using attributes and metadata, we uncover these hidden structures with several established community detection methods. Figure 3 shows the results of community detection on the same network as in Figure 2. In ROXANNE, we have implemented various community detection methods, such as the k-clique-based method, spectral clustering, matrix factorization, and hierarchical clustering.
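As an illustration of the k-clique-based approach, the sketch below implements clique percolation in the spirit of Palla et al. by brute force: it enumerates all k-cliques, joins cliques that share k-1 nodes, and reports each connected group of cliques as a community. The function name and the toy edge list are hypothetical, and the exhaustive enumeration only suits small graphs; it is not the method as implemented in ROXANNE.

```python
from itertools import combinations

# Hypothetical toy network: two triangles sharing an edge, plus a separate
# triangle attached by a single bridge edge (D-E).
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("E", "G")]


def k_clique_communities(edges, k=3):
    """Brute-force clique percolation: exponential in k, for illustration only."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # 1. Enumerate every k-subset of nodes that is fully connected.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # 2. Union-find: two k-cliques are adjacent if they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # 3. Each connected group of cliques becomes one (possibly overlapping) community.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


if __name__ == "__main__":
    for community in k_clique_communities(EDGES, k=3):
        print(sorted(community))
```

Here the triangles ABC and BCD share two nodes and merge into one community, while EFG stays separate because the bridge D-E is not part of any triangle. Note that clique percolation allows a node to belong to several communities, and nodes that belong to no k-clique are left unassigned.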



Figure 3: Main communities in a mobile network


Link Prediction

People interact and communicate with each other through many channels that are not always observable, and they form new relations and interactions over time. These interactions take many forms, which law enforcement can hardly map exhaustively, and agencies typically capture only a portion of suspects’ interactions with other people. Moreover, criminals have everyday lives that involve interactions with family, friends, partners, and colleagues who are innocent, and it is not ethical to include these people’s personal information in investigations. In criminology, it is therefore crucial to uncover missing information and predict how the network will evolve, in order to prevent further crimes. The process of “skimming” information during a criminal investigation risks losing relevant links. Studies in the field have shown that the criminal investigation process is relatively consistent in maintaining information about the main suspects, whereas data on marginal suspects may be filtered out and lost. In ROXANNE, we have implemented various similarity scores to measure the likelihood that an edge forms; these include Jaccard similarity, Adamic-Adar, Soundarajan-Hopcroft similarity, and preferential attachment similarity. Figure 4 presents link prediction results for a single node.
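The similarity scores above can be sketched in a few lines of Python. The block below computes Jaccard, Adamic-Adar, and preferential attachment scores from node neighborhoods and ranks the non-neighbors of one node, similar in spirit to the single-node view in Figure 4. The edge list and function names are hypothetical; Soundarajan-Hopcroft similarity is omitted because it additionally requires community labels.

```python
import math

# Hypothetical toy network (not the Figure 4 data).
EDGES = [("S1", "S2"), ("S1", "S3"), ("S2", "S4"),
         ("S3", "S4"), ("S4", "S5"), ("S2", "S5")]


def neighbors(edges):
    """Undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """|common neighbors| / |union of neighbors|."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over common neighbors, so rarely connected
    shared contacts weigh more. A common neighbor of two distinct nodes
    has degree >= 2, hence log(degree) > 0."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def preferential_attachment(adj, u, v):
    """Product of the two degrees: well-connected nodes attract new links."""
    return len(adj[u]) * len(adj[v])


def rank_candidates(adj, node, score, top=3):
    """Score every node not yet linked to `node`; return the best (score, node) pairs."""
    candidates = [w for w in adj if w != node and w not in adj[node]]
    scored = [(score(adj, node, w), w) for w in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by name
    return scored[:top]


if __name__ == "__main__":
    adj = neighbors(EDGES)
    for score in (jaccard, adamic_adar, preferential_attachment):
        print(score.__name__, rank_candidates(adj, "S1", score))
```

In this toy graph, S4 shares two of S1’s three combined neighbors, so Jaccard similarity ranks the missing link S1-S4 first (2/3, versus 1/3 for S1-S5).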



Figure 4: Results of link prediction for the node SID_25 in the mobile network. Yellow dashed edges illustrate the results of the link prediction method based on Jaccard similarity.


Cross-network Analysis

Most criminal network studies rely on a single source of data, for example, telephone logs, wiretapped calls, meetings, co-offending records, or police stop records. Existing studies have analyzed different relations extracted either from the same data source (e.g., Calderoni & Superchi compared wiretap networks and meeting networks derived from arrest warrants) or from various data sources (e.g., Rostami & Mondani compared communications extracted from intelligence, surveillance, and co-offending data). In the network analysis literature, identifying the same entities across multiple heterogeneous networks is known as user identity linkage across networks. Cross-network entity matching, also referred to as “Cross-Domain Entity Resolution” or “Entity Linkage”, is the problem of finding entries that refer to the same entity across different data sources. It is used to join different datasets or networks based on information about entities whose observed and recorded properties (location, name, time) can differ across datasets, as in the investigation cases in ROXANNE. Because most studies rely on a single data source, cross-network entity matching is rarely considered a distinct challenge. In the few studies that use multiple sources, entity matching is performed manually through names or other unequivocal identifiers present in the data.

In ROXANNE, the cross-network entity matching module includes three components:

(1) Graph Sampling: Because ground truth is currently lacking, we use graph sampling approaches to split the nodes of an original network into two sub-networks that share a number of overlapping nodes. We implement three node sampling methods: Degree Based Node Sampler, Random Node Sampler, and PageRank Based Node Sampler.

(2) Node Embeddings for Feature Extraction: We examine typical node embedding methods for extracting feature vectors, including DeepWalk, Node2Vec, and SDNE.

(3) Custom Neural Network for Matching Relevant Nodes across Sub-Networks: Our model learns a mapping function that ideally transforms the feature vector of an overlapping node in the first graph into its equivalent feature vector in the second graph. The mapping is trained with a triplet loss function.
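The following sketch illustrates components (1) and (3) under simplifying assumptions: a random node sampler that splits one network into two sub-networks sharing a set of overlapping nodes, and a triplet loss of the common hinge form max(0, d(a, p) - d(a, n) + margin). The function names, the `overlap_fraction` parameter, and the use of squared Euclidean distance are assumptions made for illustration. The embedding step and the neural mapping function are not shown, and ROXANNE’s samplers may work differently.

```python
import random


def random_overlapping_split(edges, overlap_fraction=0.3, seed=0):
    """Hypothetical random node sampler (not ROXANNE's Random Node Sampler).

    A fraction of the shuffled nodes is placed in both sub-networks (the
    known overlapping nodes); the remaining nodes are split evenly. Each
    sub-network keeps only edges whose two endpoints it contains.
    """
    nodes = sorted({n for edge in edges for n in edge})
    rng = random.Random(seed)  # seeded for reproducible splits
    rng.shuffle(nodes)
    n_shared = int(len(nodes) * overlap_fraction)
    shared = set(nodes[:n_shared])
    rest = nodes[n_shared:]
    half = len(rest) // 2
    nodes_a = shared | set(rest[:half])
    nodes_b = shared | set(rest[half:])
    edges_a = [(u, v) for u, v in edges if u in nodes_a and v in nodes_a]
    edges_b = [(u, v) for u, v in edges if u in nodes_b and v in nodes_b]
    return edges_a, edges_b, shared


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-form triplet loss with squared Euclidean distances:
    max(0, ||anchor - positive||^2 - ||anchor - negative||^2 + margin).
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    path = [(i, i + 1) for i in range(20)]  # toy 21-node path graph
    graph_a, graph_b, overlapping = random_overlapping_split(path)
    print(len(graph_a), len(graph_b), sorted(overlapping))
    # Mapped embedding of a node from graph A vs. the same node (positive)
    # and a different node (negative) in graph B.
    print(triplet_loss([0.1, 0.9], [0.2, 0.8], [0.9, 0.1]))
```

During training, `anchor` would be the mapped embedding of a node from the first sub-network, `positive` the embedding of the same node in the second sub-network, and `negative` the embedding of a different node. The loss reaches zero once the true match is closer to the anchor than the negative by at least the margin.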


Ethical Implications of Network Analysis

Network analysis methods can vastly improve the measurement of key concepts associated with the social environment of potential or current offenders in criminological research. However, several conditions must hold before network techniques can be trusted on crime data, and we try to mitigate the associated risks through feedback from other work packages:

  • Most official co-offending data fails to consider the larger environment in which offenders are embedded. It omits offenders who may have been instrumental in a crime but went undetected, as well as all offenders involved in undetected crimes. Thus, one open question is whether, and to what extent, the co-offending patterns found in official data carry over to undetected crimes.
  • Wiretap data perhaps pose the opposite problem: to be reliable, they require that group members speak without fear of being overheard, that no significant number of group members be missing from the data, and that a large number of conversations of several different types be available. Network data, like any other data, must be subject to rigorous quality controls. Furthermore, network analysis may include innocent people or uncover an unnecessary, intrusive amount of information about a suspect’s personal life that the investigation does not need.
  • Another risk is the creation of potentially artificial subgroups. While community detection methods help in understanding the social structure of a network, they may create the illusion of clearly defined groups within a network when none of the individuals consider themselves actual “members” of a group.
  • Network data alone, without the benefit of context or nodal attributes, yields helpful but sometimes tentative conclusions. For example, network visibility, as measured by centrality, is not necessarily associated with a leadership position or high social status in the network.
  • Much of the available network data is biased toward representing the networks of specific individuals rather than others. The interpretation of results must preserve the nuances introduced by such biases: patterns emerging from an individual’s ego network should be interpreted only with respect to that individual.



  1. Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets (Vol. 8). Cambridge: Cambridge University Press.
  2. Tang, J., et al. (2009). Social influence analysis in large-scale networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  3. Tang, L., & Liu, H. (2010). Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1), 1-137.
  4. Palla, G., Derényi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 814-818.
  5. Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems.
  6. Yang, J., & Leskovec, J. (2013). Overlapping community detection at scale: A nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining.
  7. Berlusconi, G., et al. (2016). Link prediction in criminal networks: A tool for criminal intelligence analysis. PLoS ONE, 11(4).
  8. Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211-230.
  9. Soundarajan, S., & Hopcroft, J. (2012). Using community information to improve the precision of link prediction methods. In Proceedings of the 21st International Conference on World Wide Web.
  10. Barabási, A.-L., et al. (2002). Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3-4), 590-614.
  11. Calderoni, F., & Superchi, E. (2019). The nature of organized crime leadership: Criminal leaders in meeting and wiretap networks. Crime, Law and Social Change, 72(4), 419-444.
  12. Rostami, A., & Mondani, H. (2015). The complexity of crime network data: A case study of its consequences for crime control and the study of networks. PLoS ONE, 10(3), e0119309.
  13. Adamic, L. A., Lukose, R. M., Puniyani, A. R., & Huberman, B. A. (2001). Search in power-law networks. Physical Review E, 64(4), 046135.
  14. Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences, 102(12), 4221-4224.
  15. Leskovec, J., & Faloutsos, C. (2006). Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 631-636).
  16. Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 701-710).
  17. Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 855-864).
  18. Wang, D., Cui, P., & Zhu, W. (2016). Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1225-1234).