A study of cross-linguistic sequential type features based on decision tree model

Authors

  • Zijun Wang

Keywords:

decision tree model, OOB evaluation, common correlation coefficient, quantifiers of language order, cross-lingual, classification efficiency

Abstract

Driven by advances in information technology and computer science, quantitative methods such as decision trees and random forests, together with multidimensional research perspectives, play an increasingly important role in linguistic typology. This paper proposes four methods for calculating the classification weights of decision trees: out-of-bag (OOB) evaluation, sample data correlation coefficient evaluation, chi-square evaluation, and mutual information evaluation. These weights are applied to the classification computed by each individual decision tree. The final result is obtained by letting all decision trees take part in the classification, which effectively avoids overfitting. Because the trees are constructed relatively independently of one another, they can also be built in parallel, which improves the classification efficiency of the model. Based on this decision tree model, cross-linguistic sequential (word-order) type features within the Indo-European language family are classified. The results show that the common correlation coefficient of the decision tree model is 0.85, and that the dominant order information produced by the random forest model based on weighted decision trees matches the dominant orders recorded in WALS exactly, an accuracy of 100%. The model also distinguishes well between the languages of the individual branches of the Indo-European family. The approach is therefore well suited to the study of sequential typology and can accurately capture cross-linguistic sequential features.
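To make the weighting idea concrete, the following is a minimal sketch of a forest whose trees vote with weights equal to their out-of-bag accuracy, the first of the four weighting schemes the abstract lists. It is not the paper's implementation: the one-level trees (stumps), the function names `train_stump`, `fit_weighted_forest` and `predict`, and the toy word-order rows with labels `group_1` and `group_2` are all assumptions made for illustration and are not taken from WALS.

```python
import random
from collections import Counter, defaultdict


def train_stump(rows, labels, feature):
    """One-level tree: map each value of one categorical feature to its majority label."""
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[feature]][label] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    table = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    return lambda row: table.get(row[feature], default)


def fit_weighted_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples; weight each by its out-of-bag accuracy."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]   # rows this tree never saw
        feature = rng.randrange(n_features)              # random feature per tree
        tree = train_stump([rows[i] for i in bag], [labels[i] for i in bag], feature)
        if oob:
            weight = sum(tree(rows[i]) == labels[i] for i in oob) / len(oob)
        else:
            weight = 0.0                                 # no held-out rows to score on
        forest.append((tree, weight))
    return forest


def predict(forest, row):
    """Weighted vote: every tree takes part, scaled by its out-of-bag weight."""
    votes = defaultdict(float)
    for tree, weight in forest:
        votes[tree(row)] += weight
    return max(votes, key=votes.get)


# Hypothetical toy data: (basic clause order, adposition type) -> made-up group label.
rows = [
    ("SOV", "postposition"), ("SOV", "postposition"),
    ("SOV", "postposition"), ("SOV", "preposition"),
    ("SVO", "preposition"), ("SVO", "preposition"),
    ("SVO", "preposition"), ("SVO", "postposition"),
]
labels = ["group_1"] * 4 + ["group_2"] * 4

forest = fit_weighted_forest(rows, labels)
print(predict(forest, ("SOV", "postposition")))
```

Each tree is built from its own bootstrap sample and does not depend on any other tree, so the loop in `fit_weighted_forest` could be distributed across workers, which is the parallelism the abstract refers to. Replacing the out-of-bag accuracy with a correlation, chi-square, or mutual-information score would only change how `weight` is computed.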

Published

2023-07-01

Section

Articles