Discovering criminal and terrorist networks is the primary task of law enforcement and intelligence agencies across Europe.

Criminals and terrorists use voice communication over different media. While personal communication within their networks usually takes place over standard mobile networks and VoIP services (Skype, Google Hangouts and others), a significant share of their voice data may also appear in public media, typically as hate or propaganda speech on YouTube, Facebook or other social media channels. Determining and tracking target identities across such channels is extremely difficult, and speaker identification (SID) techniques (such as those investigated in the European SiiP project) might not be effective in such challenging environments when they consider isolated data from one speaker only.

Link analysis (data-analysis techniques used to evaluate relationships or connections between nodes) has long been used in both intelligence and investigation work. Ultimately, law enforcement agencies (LEAs) are not interested in isolated individuals, but in whole criminal or terrorist networks. The situation can be compared to the early days of Internet search: AltaVista, Excite and others achieved some results and market uptake, but the whole domain changed when Google started to exploit the relations between web pages (PageRank, combined with content metrics such as TF-IDF, etc.). In this project, we expect a similar breakthrough.

Consider the following example: with the current best SID technology, at an equal error rate of 1%, we obtain about 5 false alarms and 5 misses on a set of 500 analyzed recordings. These can still be audited by human experts. For the analysis of millions of recordings from different media, however, the number of errors grows in proportion and manual auditing is no longer feasible. Such techniques alone are not powerful enough, and we are convinced that only with link analysis can the field make a significant leap forward.
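
The scaling argument can be made concrete with a little arithmetic. The sketch below assumes, as the example above implicitly does, that the false-alarm rate and the miss rate both equal the equal error rate and that each is applied to the same number of trials. The function name and the one-million figure are illustrative assumptions, not project results.

```python
def expected_errors(num_trials, eer):
    """Expected (false alarms, misses) when both error rates equal the EER.

    Simplifying assumption: the EER is applied to `num_trials` non-target
    trials and to `num_trials` target trials, as in the example in the text.
    """
    false_alarms = num_trials * eer
    misses = num_trials * eer
    return false_alarms, misses


# 500 recordings (the example in the text) versus a hypothetical one million.
for n in (500, 1_000_000):
    fa, miss = expected_errors(n, eer=0.01)
    print(f"{n:>9} recordings: about {fa:.0f} false alarms and {miss:.0f} misses")
```

At the larger scale the expected error count is in the tens of thousands, which is well beyond what human experts can audit by hand.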

This project proposes to combine the strengths of speaker data mining and link analysis to provide LEAs with an efficient tool to track and uncover criminals and terrorists. Rather than processing speaker data in isolation, the project will rely on the following:

Make massive use of the conversational nature of speech data: if we know that A often speaks to B, then detecting A on one side of a call automatically increases the prior probability of B on the other side, even if the acoustic evidence is not reliable (due, for example, to illness, channel change or noise). Reliable diarization (determining who spoke when in the conversation) will be developed in the project as a crucial component of this analysis.
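
The effect of a link-derived prior can be sketched with Bayes' rule in odds form. This is only an illustration of the idea, not the project's actual scoring method. The function name and all numbers below are hypothetical.

```python
def posterior(prior, likelihood_ratio):
    """Posterior probability of a speaker hypothesis (Bayes' rule, odds form).

    prior            -- probability of the hypothesis before hearing the audio
    likelihood_ratio -- acoustic evidence: P(audio | speaker) / P(audio | not speaker)
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values, for illustration only.
weak_acoustic_lr = 2.0  # noisy channel: the audio only mildly favours speaker B
base_prior = 0.001      # no link information: B is one of many candidates
link_prior = 0.30       # A was detected on the call, and A often speaks to B

print("without link information:", posterior(base_prior, weak_acoustic_lr))
print("with link information:   ", posterior(link_prior, weak_acoustic_lr))
```

With the same weak acoustic evidence, the hypothesis that B is on the call goes from negligible to plausible once the prior reflects the known relation between A and B.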

Use of call content. Standard text-independent speaker identification ignores the content of the call, while a simple sentence such as “Peter speaking” heard on two different calls can completely change the game. Developing perfect speech-to-text (S2T) engines for all possible languages is out of the scope of the project, but commercially available ones will be deployed to generate relevant content information, which will be combined with acoustic speaker information. For languages without S2T support, language-independent techniques such as universal phoneme sets or automatically discovered acoustic units (acoustic unit discovery, AUD) will be used.
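
As a minimal sketch of how such a content cue could link calls, the snippet below groups calls by a self-introduction found in their transcripts. The transcripts, the call identifiers and the deliberately narrow pattern are all hypothetical. A real system would work on S2T output with recognition errors and would need far more robust matching.

```python
import re
from collections import defaultdict

# Hypothetical speech-to-text output, keyed by call identifier.
transcripts = {
    "call_001": "Hello, Peter speaking. Is the delivery ready?",
    "call_002": "Yes? Peter speaking.",
    "call_003": "This is Anna, who is this?",
}

# Deliberately narrow pattern: a capitalised word followed by "speaking".
SELF_INTRO = re.compile(r"\b([A-Z][a-z]+) speaking\b")


def calls_by_self_introduction(transcripts):
    """Group call identifiers by the name a speaker introduces themselves with."""
    groups = defaultdict(list)
    for call_id, text in transcripts.items():
        for name in SELF_INTRO.findall(text):
            groups[name].append(call_id)
    return dict(groups)


print(calls_by_self_introduction(transcripts))
```

Calls that share a self-introduction become candidates for the same speaker, and this content cue can then be combined with the acoustic score.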

Meta-information is crucial for link analysis. Some of it is readily available (phone and IMEI numbers, geographical information, time-stamps), but targets are aware that such information is collected and have developed ways to falsify or obscure it (one-shot usage of prepaid SIM cards, use of Internet anonymization services, etc.). A significant amount of meta-information can, however, be extracted automatically from the speech signal, for example the age, gender and accent of call participants. Identification of a pimp in a child sexual exploitation network, for instance, can be helped by the fact that his calls are predominantly to persons in the under-20 age category. Another useful piece of meta-information is the acoustic environment: even if a speaker changes his cell phone number every day, he is not likely to change his favorite car. Detecting that a call took place in a given car can help the investigation.

Through time-relation analysis, a classical problem of speaker recognition (a speaker who says very little in a call) can be turned into an advantage, as the very fact that he speaks little becomes an identifying cue. Hierarchy and trust can also be partially inferred from this analysis.
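
One simple time-relation feature is each speaker's share of the talk time in a call, computed from diarization output. The sketch below assumes diarization segments are available as (speaker label, start, end) tuples. The segment values and labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical diarization output: (speaker label, start in s, end in s).
segments = [
    ("spk_A", 0.0, 12.5),
    ("spk_B", 12.5, 14.0),
    ("spk_A", 14.0, 31.0),
    ("spk_B", 31.0, 32.0),
]


def talk_time_share(segments):
    """Return each speaker's fraction of the total speech time in one call."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    total_speech = sum(totals.values())
    if total_speech == 0:
        return {}  # no speech detected, nothing to compare
    return {spk: dur / total_speech for spk, dur in totals.items()}


for speaker, share in talk_time_share(segments).items():
    print(f"{speaker}: {share:.1%} of speech time")
```

A speaker whose share stays consistently low across many calls gives too little audio for acoustic modelling, but the pattern itself is a feature that can be attached to the corresponding node in the network.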

Data will be crucial for the project's success. As this project cannot count on large amounts of real investigation or wire-tap data, most of the R&D work will be done on data from public resources: media and social networks. However, we count on exercises performed on real data by the LEAs participating in the consortium, which will provide the developers with valuable feedback.

The result of the project will be a prototype system capable of:

ingesting a significant amount of voice data from different media, along with meta-information;

analyzing this data in an unsupervised or lightly supervised way;

presenting the resulting network analysis and converting it to forms that can be integrated with standard investigation software solutions, such as IBM i2 Analyst's Notebook.

Approach with Underlying Technologies

  1. Speech processing, which will involve multiple technologies: (i) speaker identification, to establish relations between different audio sources and to potentially determine whether a speaker is among a set of known individuals; (ii) multilingual automatic speech recognition, for rapid and accurate speech-to-text processing of raw audio materials.
  2. Natural language processing to identify entities such as locations, persons and companies from multilingual textual input.
  3. Video and geographical meta-data processing to make use of other visual and spatial information (such as identifying faces, places, backgrounds or a geographical position) which may accompany the auditory and textual data.
  4. Network analysis to establish connections between these results, enrich them with data from other available sources and a priori knowledge, and analyse the final network to make sense of the cases.
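
To illustrate the kind of computation network analysis involves, the sketch below ranks the nodes of a small contact graph with a power-iteration PageRank. PageRank is used here only as a familiar example of link-based ranking. The source does not state which network measures the project uses, and the call counts and node names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted, undirected contact graph.

    edges -- dict mapping (node_a, node_b) to a positive number of observed calls.
    Assumes no self-loops; every node in an edge therefore has a neighbour.
    """
    # Build a symmetric weighted adjacency structure.
    weights = {}
    for (a, b), w in edges.items():
        weights.setdefault(a, {})
        weights.setdefault(b, {})
        weights[a][b] = weights[a].get(b, 0) + w
        weights[b][a] = weights[b].get(a, 0) + w

    nodes = sorted(weights)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out_total = sum(weights[v].values())
            # Distribute v's rank to its neighbours in proportion to call counts.
            for u, w in weights[v].items():
                new_rank[u] += damping * rank[v] * w / out_total
        rank = new_rank
    return rank


# Hypothetical call counts between five speakers.
calls = {("A", "B"): 12, ("A", "C"): 7, ("A", "D"): 3, ("D", "E"): 1}
ranking = pagerank(calls)
for node in sorted(ranking, key=ranking.get, reverse=True):
    print(node, round(ranking[node], 3))
```

In this toy graph the most connected speaker comes out on top. On real data, such scores would be one input among several for the analyst, alongside the meta-information and a priori knowledge mentioned above.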

Objectives of ROXANNE

  1. Develop a ROXANNE analytics platform enhancing investigation capabilities especially for large criminal cases.
  2. Improve identification of persons of interest by developing a two-way interface between multimodal (speech/text and video processing) technologies and criminal network analysis.
  3. Enhance criminal network analysis technology to significantly reduce network size and to aid decision making by police practitioners.
  4. Develop a dashboard for visualisation of investigation output and integrate with existing tools.
  5. Deploy and evaluate the ROXANNE platform on real criminal cases, and help LEAs adopt the technology in their daily work.
  6. Comply with EU and INTERPOL legal and ethical frameworks through internal expert analysis and external ethics and stakeholder boards.

Consortium Partners

Overall Framework