Journal of Literature Advances

ISSN Online: 3066-0998
Frequency: Instant publication
Email: jla@hillpublish.com
Article Open Access http://dx.doi.org/10.26855/jla.2025.06.008

Information Extraction through Artificial Intelligence in Historical Texts

Gakis Panagiotis*, Tsalidis Christos, Stamouli Alexia-Foteini, Zgolompi Ismini, Stamatopoulos Ioannis

University of the Peloponnese, Tripoli 22100, Greece.

*Corresponding author: Gakis Panagiotis

Published: August 19, 2025

Abstract

Advances in technology, particularly artificial intelligence, have benefited the humanities and strengthened fields such as computational linguistics and natural language processing (NLP). From Winograd’s pioneering SHRDLU system to modern tools like Mnemosyne, significant progress has been made in the machine understanding and analysis of language. NLP includes tasks such as information extraction and retrieval, named entity recognition (e.g., persons, locations, events), and the organization of large linguistic datasets. Recognizing named entities is crucial for understanding and organizing information from unstructured texts. Although NLP research on the Greek language is still limited, it shows promise. The General State Archives of Greece (GSA), the main institution for preserving the country's historical and administrative records, faces challenges caused by inconsistent categorization and thematic labeling. Techniques such as lemmatization and normalization, together with semantic tools (like SKOS and Mobi), offer solutions to these issues. They enable a more consistent, hierarchical organization of thematic categories, improving search, navigation, and access for both researchers and the general public. They also support interoperability with national infrastructures such as SearchCulture.gr. The application of ontologies contributes to smarter management of cultural information and ensures unified access to public knowledge.

Keywords

Information extraction; Information retrieval; Named entities; General State Archives of Greece; Historical texts


How to cite this paper

Gakis Panagiotis, Tsalidis Christos, Stamouli Alexia-Foteini, Zgolompi Ismini, Stamatopoulos Ioannis. (2025). Information Extraction through Artificial Intelligence in Historical Texts. Journal of Literature Advances, 2(1), 48-56.

DOI: http://dx.doi.org/10.26855/jla.2025.06.008