Topic modelling of spa visitor reviews using the example of Gellért Spa and Swimming Pool
DOI:
https://doi.org/10.15170/MM.2021.55.04.03
Keywords:
natural language processing, topic modelling, latent Dirichlet distribution, Gellért Spa and Swimming Pool
Abstract
THE AIMS OF THE PAPER
The study presents the results of computer-based topic modelling of guest reviews written by visitors to the Gellért Spa and Swimming Pool between 2004 and 2021. From a tourism marketing point of view, the analysis of guest reviews is particularly important, especially for attractions with a high turnover of visitors. The Gellért is an iconic heritage bath of Budapest and Hungary, a health tourism attraction that many visitors to Budapest regard as an unmissable experience. This is reflected in the more than 10,000 guest reviews written in almost 30 languages on Tripadvisor over the last decade and a half, a volume too large to analyse comprehensively without machine support.
METHODOLOGY
All reviews written on Tripadvisor between 2005 and 2021 were downloaded using a dedicated app. Reviews written in languages other than English were translated into English using Google Translate. The resulting corpus was analysed using structural topic modelling, which builds on latent Dirichlet allocation, a rapidly developing method in text mining, to identify the topics that recur across guest reviews. The analysis was carried out in the statistical software R using the stm package for structural topic models.
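To illustrate the kind of model involved, the following is a minimal sketch of plain latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not the authors' procedure: the study used the stm package in R, which adds document-level covariates and uses a different estimation method. The toy reviews, the function name `fit_lda`, and the hyperparameter values are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit plain LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (theta, top_words): per-document topic proportions and the
    most frequent words currently assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables updated during sampling.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Conditional probability of each topic for this token
                # (unnormalised; random.choices normalises the weights).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    return theta, top_words


# Invented toy reviews, already lower-cased and tokenised (not study data).
reviews = [
    "beautiful building architecture mosaic".split(),
    "architecture beautiful pool building".split(),
    "dirty changing room cleanliness".split(),
    "cleanliness dirty floor room".split(),
    "beautiful mosaic pool dirty room".split(),
]

theta, top_words = fit_lda(reviews, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` gives the estimated share of each topic in one review. On a corpus this small the topics are unstable; the sketch only shows the mechanics of assigning words to topics.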
MOST IMPORTANT RESULTS
The modelling identified 12 typical themes across the guest reviews. These were compared with an earlier word frequency analysis of the same corpus, which showed that similar themes could be identified using both methodologies. We also separately analysed how the share of opinions on service features that guests rated largely negatively changed over the period studied. In the written guest reviews of the Gellért Spa, the proportion of themes related to hygiene and cleanliness increased, while the proportion of themes related to guest communication, also rated largely negatively, decreased.
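A change in a theme's share over time can be summarised by averaging the per-review topic proportions by year and fitting a straight line through the yearly means. The abstract does not state how the trend was computed, so the sketch below is only one simple possibility. The years, the proportions, and the helper names `yearly_mean_share` and `linear_slope` are invented for illustration and are not study results.

```python
from collections import defaultdict


def yearly_mean_share(years, shares):
    """Average a topic's per-review share within each year."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, share in zip(years, shares):
        totals[year] += share
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical review years and share of a "hygiene" topic in each review.
years = [2010, 2010, 2015, 2015, 2020, 2020]
hygiene_share = [0.10, 0.20, 0.20, 0.30, 0.40, 0.30]

means = yearly_mean_share(years, hygiene_share)
slope = linear_slope(list(means.keys()), list(means.values()))
print(means)
print(f"change in share per year: {slope:.3f}")
```

A positive slope indicates a growing share of the theme and a negative slope a shrinking one. A structural topic model can instead include the review date as a covariate on topic prevalence directly.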