Cluster analysis of regional indicators using the DBSCAN algorithm
Keywords:
region, correlation coefficient Phi_K, clustering, DBSCAN algorithm, CART algorithm, classification tree
Abstract
Regional economies play an increasingly important role in the development of the national economy. Uneven economic development at the meso level carries risks for various markets and industries, which calls for effective methods of identifying regional clusters and of assessing the interconnections among regional economic determinants. For this study, we compiled data on 25 indicators reflecting the investment, resource, production, and financial performance components of the socio-economic development of Russian regions. Using machine learning algorithms (XGBoost, Gradient Boosting, and CART), we identified the most significant factor for assessing regional sustainability and, by calculating the non-linear correlation coefficient Phi_K, established the regional development indicators associated with it. The DBSCAN algorithm identified two regional clusters; per capita consumption, the demographic load, and the level of urbanization were the significant factors for clustering the regions. The significance of these clustering criteria was established by constructing a classification tree.
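As an illustration of the density-based clustering step, the following is a minimal pure-Python sketch of DBSCAN, not the authors' implementation. The data points are invented, standardized values for three hypothetical features (per capita consumption, demographic load, urbanization), and the parameters `eps` and `min_pts` are assumptions chosen only to make the toy example work.

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Hypothetical standardized indicators per region:
# (per capita consumption, demographic load, urbanization)
points = [
    (0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (0.1, 0.2, 0.1), (0.0, 0.1, 0.2),
    (3.0, 3.0, 3.0), (3.1, 3.2, 3.0), (3.2, 3.0, 3.1), (3.0, 3.1, 3.2),
    (10.0, -5.0, 0.0),  # an isolated region, expected to be flagged as noise
]

labels = dbscan(points, eps=0.5, min_pts=3)  # eps and min_pts are assumed values
n_clusters = len({label for label in labels if label != -1})
print(labels, n_clusters)
```

On this toy data the two dense groups form separate clusters and the isolated point is labeled as noise. In practice a library implementation would normally be used, and the resulting labels could then serve as the target of a classification tree to see which indicators separate the clusters.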