Fig. 3 | Crime Science

Fig. 3

From: Unsupervised identification of crime problems from police free-text data

Topic number determination through a two-step search process. a Long-range search: LDA models were generated for a range of topic numbers (k), starting at 2, ending at 100 and stepping by 5, and the Cv coherence score of each model was calculated and plotted. k = 17 (red line) was selected as the topic number that yielded the highest Cv coherence score. b Narrow-range search: to check that this topic number consistently gave a high coherence score, LDA models were trained 10 times for each topic number, and the coherence scores are shown as box-and-whisker plots with the median in yellow. At this stage the k selected in the long-range search was compared with models trained with topic numbers up to 6 steps greater than it. From this narrow-range search, k = 21 was identified as giving the highest and most consistent coherence score and was selected for further experiments. The red box indicates the score distribution for the final selected k
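
The two-step search described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `score(k, seed)` callable is a hypothetical stand-in for "train an LDA model with k topics and return its Cv coherence", and picking the narrow-range winner by highest median is an assumption, since the caption does not state the exact selection rule.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: score(k, seed) -> coherence of one model with k topics.
ScoreFn = Callable[[int, int], float]


def long_range_search(score: ScoreFn, start: int = 2, stop: int = 100,
                      step: int = 5, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Score one model per k on a coarse grid and return the best k."""
    scores = {k: score(k, seed) for k in range(start, stop + 1, step)}
    best_k = max(scores, key=lambda k: scores[k])
    return best_k, scores


def narrow_range_search(score: ScoreFn, k0: int, width: int = 6,
                        runs: int = 10) -> Tuple[int, Dict[int, List[float]]]:
    """Score `runs` models for each k in [k0, k0 + width].

    Assumption: the winner is the k with the highest median score;
    the per-k lists are what a box-and-whisker plot would show.
    """
    runs_by_k = {k: [score(k, seed) for seed in range(runs)]
                 for k in range(k0, k0 + width + 1)}
    best_k = max(runs_by_k, key=lambda k: statistics.median(runs_by_k[k]))
    return best_k, runs_by_k


def toy_coherence(k: int, seed: int) -> float:
    """Synthetic score for demonstration only; not real coherence data."""
    rng = random.Random(k * 1000 + seed)
    return 0.6 - 0.0005 * (k - 20) ** 2 + rng.uniform(-0.001, 0.001)


if __name__ == "__main__":
    coarse_k, _ = long_range_search(toy_coherence)
    final_k, runs_by_k = narrow_range_search(toy_coherence, coarse_k)
    spread = statistics.pstdev(runs_by_k[final_k])
    print(f"coarse k = {coarse_k}, final k = {final_k}, spread = {spread:.4f}")
```

In practice `score` would wrap an actual topic-model training and coherence computation; repeating runs with different seeds is what lets the narrow-range step judge consistency as well as the headline score.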
