India’s manuscripts comprise one of the world’s largest heritage collections. India holds an unparalleled treasure of over 10 million manuscripts on palm leaves, birch bark, and handmade paper. These texts cover fields from medicine and philosophy to the arts and governance, embodying the Bhāratīya Gyan Paramparā (Indian Knowledge Tradition). Recognising the significance of this heritage, the Government of India has launched the Gyan Bharatam Mission (GBM). Under the aegis of Gyan Bharatam, the Ministry of Culture, Government of India, is organising the International Conference on "Reclaiming India’s Knowledge Legacy Through Manuscript Heritage" from 11th to 13th September 2025 at Bharat Mandapam, New Delhi.
As part of the above International Conference, the Ministry of Culture, Government of India, is pleased to launch Gyan-Setu, a national AI innovation challenge. This public call invites individuals, start-ups, researchers, and institutions to propose advanced AI solutions for preserving and opening up India’s manuscript legacy.
The goal is to develop tools that automate the cataloguing, digitization, archiving, deciphering, and dissemination of ancient texts. AI has already shown promise in this domain: for example, optical character recognition (OCR) and machine learning can restore faded or damaged script, vastly improving on purely manual transcription. By harnessing AI, the initiative aims to accelerate conservation efforts, enhance accessibility, and share India’s centuries‑old knowledge with scholars and the public worldwide.
Participants should propose solutions in one or more of the following focus areas, describing how AI and related technology will address each:
AI-Based Cataloguing: Development of automated pipelines that use artificial intelligence to generate descriptive metadata and classify manuscript collections, creating a robust catalogue for Bharatiya Gyan Parampara datasets.
Proposed approaches may leverage machine learning and natural language processing to identify languages, scripts, authorship, palaeographic features, and subject domains.
For instance, deep learning algorithms could process large-scale digitized corpora to automatically assign language, script, author, and thematic metadata, enabling scalable cataloguing and interoperability with digital library infrastructures, with greater speed and consistency than manual methods.
Such systems can incorporate computer vision techniques to analyse digitized manuscript images for script recognition and layout segmentation, while semantic models can parse the extracted text, directly supporting the objectives of the Gyan Bharatam Mission for nationwide manuscript survey, preservation, and documentation.
EXPECTED DELIVERABLE : Proposed solutions should demonstrate a detailed catalogue (for Bharatiya Gyan Parampara) together with its creation process, augmented with AI components that in turn aid the generation of AI-trainable datasets.
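As a rough illustration of automatic script metadata, the sketch below guesses the dominant script of already-transcribed text from Unicode character names and stores it in a minimal catalogue record. It is a simple heuristic, not the image-based script recognition described above, and the `CatalogueRecord` schema, the `detect_script` and `build_record` helpers, and the identifier `MS-0001` are all hypothetical names introduced for this example. Using the first word of the Unicode character name as the script label is only an approximation (multi-word script names would be truncated).

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass


@dataclass
class CatalogueRecord:
    # Hypothetical minimal schema; a real catalogue would carry many more fields.
    manuscript_id: str
    script: str
    char_count: int


def detect_script(text: str) -> str:
    """Guess the dominant script from Unicode character names (heuristic)."""
    counts = Counter()
    for ch in text:
        # Count only letters (L*) and combining marks (M*); skip spaces,
        # digits, and punctuation.
        if unicodedata.category(ch)[0] in ("L", "M"):
            name = unicodedata.name(ch, "")
            if name:
                # Assumption: the first word of the name approximates the script.
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def build_record(manuscript_id: str, text: str) -> CatalogueRecord:
    """Create a catalogue entry with automatically assigned script metadata."""
    return CatalogueRecord(manuscript_id, detect_script(text), len(text))


# "MS-0001" is a made-up identifier used only for this example.
record = build_record("MS-0001", "धर्म")
print(record)
```

In a fuller pipeline, the same record would presumably also carry language, author, and subject fields filled in by trained models rather than by a rule of this kind.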
Digitization Enhancement & AI-trainable dataset creation: Development of advanced image processing and optical character recognition (OCR) pipelines for text extraction from manuscript sources.
Proposed approaches may involve designing or fine-tuning OCR models specifically adapted to ancient and medieval Indian scripts, multilingual contexts, and cursive or handwritten styles.
Techniques could include deep neural networks for image enhancement of degraded or faded pages, and layout analysis to segment text from marginalia, illustrations, or annotations.
Handwriting recognition models can be trained on diverse script samples to capture regional variations and historical orthographies.
EXPECTED DELIVERABLE : Proposed solutions must provide a detailed design and architecture for the enhancement pipeline that leads to the generation of datasets appropriate for training AI models.
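To make the enhancement stage more concrete, the sketch below binarizes a grayscale page with Otsu's method, a classical thresholding technique often used as a baseline before OCR. This is a deliberately simple stand-in for the deep-learning enhancement mentioned above, not a substitute for it. The page is assumed to be a list of rows of integer intensities in the range 0 to 255, and the function names and the tiny sample page are illustrative only.

```python
def otsu_threshold(pixels):
    """Return the intensity threshold that maximizes between-class variance."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]          # pixels with intensity <= t
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue               # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break                  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels):
    """Map ink (intensity <= threshold) to 0 and background to 255."""
    t = otsu_threshold(pixels)
    return [[0 if value <= t else 255 for value in row] for row in pixels]


# Tiny illustrative "page": dark ink values next to a light background.
page = [[30, 40, 200],
        [35, 210, 220]]
clean = binarize(page)
```

A global threshold of this kind tends to struggle with uneven staining or lighting, which is one reason learned enhancement models are attractive for degraded manuscripts.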
Digital Archiving: Development of intelligent systems for search, retrieval, and semantic linking across distributed digital manuscript repositories.
Proposed approaches may employ artificial intelligence techniques for multilingual full-text search, cross-lingual information retrieval, and semantic query expansion.
Beyond basic keyword matching, knowledge graphs may be constructed to represent entities such as authors, works, subjects, locations, and historical periods, allowing automated linking of manuscripts with shared attributes or intertextual references.
AI-driven chat interfaces could also mediate archive interaction, offering scholars a natural-language gateway into complex, interconnected collections.
EXPECTED DELIVERABLE : The proposed solutions must demonstrate retrieval-augmented generation (RAG)-like capabilities for search functionalities.
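The retrieval half of such a system can be sketched with a small TF-IDF index and cosine similarity, as below. This is a minimal illustration under stated assumptions: the three catalogue descriptions and their identifiers are invented placeholders, tokenization is plain whitespace splitting, and the generation step that would consume the retrieved passages is not shown. A real multilingual archive would more likely use learned cross-lingual embeddings in place of TF-IDF.

```python
import math
from collections import Counter

# Placeholder catalogue descriptions, invented for this example only.
DOCS = {
    "ms1": "ayurveda treatise on medicinal plants",
    "ms2": "commentary on grammar and phonetics",
    "ms3": "astronomy tables for planetary positions",
}


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Compute smoothed IDF weights and a TF-IDF vector per document."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return idf, vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, idf, vectors, k=2):
    """Return the ids of the k documents most similar to the query."""
    counts = Counter(tokenize(query))
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    ranked = sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)
    return ranked[:k]


idf, vectors = build_index(DOCS)
top = retrieve("medicinal plants", idf, vectors, k=1)
```

The retrieved passages would then be passed to a language model as context, so that generated answers can cite the manuscripts they draw on.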
Script Deciphering: Development of automated recognition, transcription, and transliteration pipelines for rare, historic, and underrepresented Indic scripts.
Many manuscripts are written in scripts with minimal or no digital infrastructure, such as Grantha, Modi, Sharada, Newari, or Kharoshthi, for which no mature OCR or font support currently exists.
Proposed approaches may leverage computer vision and deep learning architectures for image-to-text modelling, including convolutional-recurrent OCR frameworks, attention-based sequence transducers, and transformer-based handwriting recognition systems.
Training can incorporate parallel corpora, palaeographic samples, and synthetic data augmentation to overcome limited annotated resources.
EXPECTED DELIVERABLE : The proposed solutions must demonstrate automated palaeographic classification and identification of manuscripts, whose output can then feed transcription and transliteration pipelines for downstream tasks.
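For the transliteration stage, one simple starting point is a code point offset between Unicode blocks that share the ISCII-derived layout. The sketch below maps Devanagari to Gujarati purely as an illustration; the choice of target script and the function name are assumptions made for this example, the offset mapping is only approximate for some signs, and the rare scripts named above would need curated mapping tables and trained recognition models rather than a fixed offset.

```python
import unicodedata

# Unicode block boundaries (Devanagari: U+0900..U+097F, Gujarati starts at U+0A80).
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
GUJARATI_START = 0x0A80


def devanagari_to_gujarati(text: str) -> str:
    """Shift Devanagari code points into the Gujarati block (approximate)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            candidate = chr(cp - DEVANAGARI_START + GUJARATI_START)
            # Keep the original character when the target code point is
            # unassigned (for example, the danda has no Gujarati counterpart).
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)  # leave spaces, digits, and other scripts untouched
    return "".join(out)


print(devanagari_to_gujarati("कमल"))
```

The same structure extends to a lookup table per source and target script, which is the more realistic design once characters do not line up one to one.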
Knowledge Dissemination: Development of AI-driven tools and platforms, built on datasets created from manuscripts, for creating Indic-rich narratives in multilingual and multimodal formats to disseminate Bharatiya Gyan Parampara.
Proposed approaches may employ multilingual natural language processing for automatic translation, transliteration, and abstractive summarization of complex texts into modern Indian and global languages.
Language models fine-tuned on parallel corpora can be adapted to render manuscripts originally in Sanskrit, Prakrit, Persian, or vernacular languages into Hindi, English, or other widely spoken languages, while retaining cultural and semantic nuance. Speech technologies such as text-to-speech and voice-based conversational agents could enable oral exploration of manuscript content, which is particularly valuable for public outreach and inclusivity.
EXPECTED DELIVERABLE : The proposed solutions must demonstrate AI-driven tool and platform capabilities in which authentic manuscripts and classical texts are integrated with multilingual and multimodal capabilities to deliver credible, manuscript-referenced, AI-based Bharatiya Gyan Parampara content to students, researchers, and curious citizens.
AI-Infused Bharatiya Gyan Parampara Library: Development of a digital library populated with AI-generated text, audio, and video, built on authentic Indic datasets originating from manuscripts and connecting them with modern science, to engage 21st-century youth in multilingual and multimodal formats.
Proposed approaches may employ AI-based tools for automatic creation of the said content in modern Indian and global languages.
Proposals are encouraged to draw on available domain expertise.
EXPECTED DELIVERABLE : The proposed solutions must demonstrate successful creation of the said library content, along with proof of the authenticity of the source content from which the audio, text, and video are created. The process, the validation of source content, and the final quality of the library will be given utmost importance for this specific focus area.
The top three proposals, as determined by the jury, will be formally recognized at the conference and conferred with awards for excellence in AI-driven manuscript preservation and accessibility.
Beyond recognition, winning teams may also receive grant funding, incubation support, and opportunities for pilot deployment in collaboration with national repositories, research institutions, and cultural organizations.
Additional commendations may be awarded for innovation in specific thematic areas such as script decipherment, multilingual OCR, or public engagement platforms.
For further details, clarifications, or submission guidelines, participants may contact the organizing committee member listed below, and should submit their proposals by September 1, 2025 (5:30 pm).
Prof. Ramesh C. Gaur
Member, Organizing Committee
Email: gbmconference@gmail.com
Tel: +91-11-2344 6557
Mobile: +91 99061 78739