PREDICTING THE H-INDEX INCREASE FOR RESEARCH JOURNALS USING COST-SENSITIVE SELECTIVE NAIVE BAYES CLASSIFIERS

Reycardo Henglie, Yunianto Purnomo, Jusia Amanda Ginting

Abstract


The machine learning community is interested not only in maximizing classification accuracy, but also in minimizing the distance between the actual and the predicted class. Several ideas, such as the cost-sensitive learning approach, have been proposed to address this problem. In this paper, we propose two greedy wrapper forward cost-sensitive selective naive Bayes approaches. Both approaches readjust the probability thresholds of each class to select the class with the minimum expected cost. The first algorithm (CS-SNB-Accuracy) considers adding each variable to the model and measures the performance of the resulting model on the training data. The variable that most improves the accuracy, that is, the percentage of well-classified instances between the readjusted class and the actual class, is permanently added to the model. In contrast, the second algorithm (CS-SNB-Cost) considers adding variables that reduce the misclassification cost, that is, the distance between the readjusted class and the actual class. We have tested our algorithms in the area of bibliometric index prediction. Given the popularity of the well-known h-index, we have built several prediction models to forecast the annual increase of the h-index for Neurosciences journals over a four-year time horizon. Results show that our approaches, particularly CS-SNB-Accuracy, achieved higher accuracy values than the analyzed cost-sensitive classifiers and Bayesian classifiers. Furthermore, CS-SNB-Cost always achieved a lower average cost than all analyzed cost-sensitive and cost-insensitive classifiers. These cost-sensitive selective naive Bayes approaches outperform the selective naive Bayes in terms of both accuracy and average cost, so the cost-sensitive learning approach could also be applied in other probabilistic classification approaches.
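The minimum-expected-cost decision rule that both CS-SNB variants rely on can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function name, the NumPy representation, and the example ordinal cost matrix (cost equal to the distance between class indices) are our own assumptions.

```python
import numpy as np

def min_expected_cost_class(posteriors, cost_matrix):
    """Return the class index with minimum expected misclassification cost.

    posteriors  : shape (n_classes,), class posteriors P(c | x),
                  e.g. from a (selective) naive Bayes model.
    cost_matrix : shape (n_classes, n_classes); cost_matrix[i, j] is the
                  cost of predicting class j when the true class is i.
    """
    # Expected cost of predicting j: sum_i P(i | x) * cost_matrix[i, j]
    expected_costs = posteriors @ cost_matrix
    return int(np.argmin(expected_costs))

# Ordinal cost matrix for 3 classes: the cost grows with the distance
# between the predicted and the actual class.
classes = np.arange(3)
cost = np.abs(classes[:, None] - classes[None, :]).astype(float)

posteriors = np.array([0.40, 0.15, 0.45])
# The plain argmax prediction is class 2, but the middle class has the
# lowest expected cost, so the readjusted prediction is class 1.
print(min_expected_cost_class(posteriors, cost))  # prints 1
```

In the greedy wrapper described above, a rule like this would replace the plain most-probable-class prediction when each candidate feature subset is scored, either by accuracy (CS-SNB-Accuracy) or by average cost (CS-SNB-Cost).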

Keywords


CS-SNB-Accuracy, CS-SNB-Cost, bibliometrics, classification, predicted distances





DOI: http://dx.doi.org/10.30813/j-alu.v7i1.6028



p-ISSN 2620-620X
e-ISSN 2621-9840

 
