The program SanskritTagger

SanskritTagger generates lexical and part-of-speech analyses of digital Sanskrit texts using a stochastic language model. It was used to build the annotated text corpus from which the Digital Corpus of Sanskrit (DCS) was extracted. Please note that some parts of the program interface and of the help system are still in German. They will be rewritten in English over the coming months.
SanskritTagger is described in the following publications:
  • Oliver Hellwig: SanskritTagger, a stochastic lexical and POS tagger for Sanskrit. In: Proceedings of the First International Sanskrit Computational Linguistics Symposium, pp. 37-46.
  • Oliver Hellwig: Performance of a lexical and POS tagger for Sanskrit. In: Proceedings of the Fourth International Sanskrit Computational Linguistics Symposium, pp. 162-172.

License

SanskritTagger is distributed as freeware under a permissive license. License terms are displayed during installation.

You are encouraged to share annotated data created using SanskritTagger with the scientific community. Please refer to the description of data synchronisation in the help file of SanskritTagger.

Downloading SanskritTagger

The download page explains in detail how to download and install SanskritTagger.
