Gyan-Setu: National AI Innovation Challenge for India’s Manuscript Heritage

India holds one of the world’s largest manuscript heritage collections, with over 10 million manuscripts on palm leaves, birch bark, and handmade paper. These texts cover fields from medicine and philosophy to the arts and governance, embodying the Bhāratīya Gyan Paramparā (Indian Knowledge Tradition). Recognising the significance of this heritage, the Government of India has launched the Gyan Bharatam Mission (GBM). Under the aegis of Gyan Bharatam, the Ministry of Culture, Government of India, is organising the International Conference on "Reclaiming India’s Knowledge Legacy Through Manuscript Heritage" from 11th to 13th September 2025 at Bharat Mandapam, New Delhi.

As part of this International Conference, the Ministry of Culture, Government of India, is pleased to launch Gyan-Setu, a national AI innovation challenge. This public call invites individuals, start-ups, researchers, and institutions to propose advanced AI solutions for preserving and opening up India’s manuscript legacy.

The goal is to develop tools that automate the cataloguing, digitization, archiving, deciphering, and dissemination of ancient texts. AI has already shown promise in this domain: for example, optical character recognition (OCR) and machine learning can help restore faded or damaged script, greatly improving on purely manual transcription. By harnessing AI, the initiative aims to accelerate conservation efforts, enhance accessibility, and share India’s centuries‑old knowledge with scholars and the public worldwide.

Focus Areas for AI Solutions

Participants should propose solutions in one or more of the following focus areas, describing how AI and related technology will address each:

  • 1. AI-based Cataloguing & Dataset Creation - Automated metadata creation, classification, and language/script identification.
  • 2. Digitization Enhancement - Image processing, OCR, and multilingual text extraction.
  • 3. Digital Archiving - Intelligent search, retrieval, and semantic linking across repositories.
  • 4. Script Deciphering - AI-assisted recognition of rare and historic scripts.
  • 5. Knowledge Dissemination - Translation tools, interactive platforms, & public engagement applications.
  • 6. AI-Infused Bharatiya Gyan Parampara Library - Integrating Bharatiya Gyan Parampara with modern science using AI.

Focus Area 1: AI-Based Cataloguing & Dataset Creation

AI-Based Cataloguing: Development of automated pipelines that use artificial intelligence to generate descriptive metadata and classify manuscript collections, creating a robust catalogue for Bharatiya Gyan Parampara datasets.

Proposed approaches may leverage machine learning and natural language processing to identify languages, scripts, authorship, palaeographic features, and subject domains.

For instance, deep learning algorithms could process large-scale digitized corpora to automatically assign language, script, author, and thematic metadata, enabling scalable cataloguing and interoperability with digital library infrastructures, with greater speed and consistency than manual methods.

Such systems can incorporate computer vision techniques to analyse digitized manuscript images for script recognition and layout segmentation, while semantic models can parse the extracted text, directly supporting the objectives of the Gyan Bharatam Mission for nationwide manuscript survey, preservation, and documentation.
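As a rough illustration of the script identification step, the sketch below guesses the dominant script of an already transcribed string from Unicode character names. It is a minimal baseline under stated assumptions, not a proposed method: the function name `identify_script` is hypothetical, the input is assumed to be Unicode text rather than an image, and taking the first word of a character name as a script label is a heuristic that fails for multi-word script names.

```python
import unicodedata
from collections import Counter


def identify_script(text: str) -> str:
    """Guess the dominant script of a transcribed string (hypothetical helper).

    Heuristic: the first word of a letter's Unicode name is treated as its
    script label, e.g. "DEVANAGARI LETTER KA" -> "DEVANAGARI".
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, and combining signs
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return "UNKNOWN"
    # The most frequent label wins; mixed-script pages would need finer handling.
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(identify_script("धर्मक्षेत्रे"))
    print(identify_script("dharma"))
```

A production cataloguing pipeline would more likely classify scripts from page images with a trained model; this text-level check is only useful once a transcription exists.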

EXPECTED DELIVERABLE: Proposed solutions should demonstrate a detailed catalogue (for Bharatiya Gyan Parampara) along with its AI-augmented creation process, which in turn aids the generation of AI-trainable datasets.

Focus Area 2: Digitization Enhancement & AI-Trainable Dataset Creation

Digitization Enhancement & AI-trainable dataset creation: Development of advanced image processing and optical character recognition (OCR) pipelines for text extraction from manuscript sources.

Proposed approaches may involve designing or fine-tuning OCR models specifically adapted to ancient and medieval Indian scripts, multilingual contexts, and cursive or handwritten styles.

Techniques could include deep neural networks for image enhancement of degraded or faded pages, and layout analysis to segment text from marginalia, illustrations, or annotations.
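For orientation, a classical (non-neural) baseline for separating ink from a faded background is Otsu thresholding, sketched below in plain Python. This is not the deep-learning enhancement described above, only a simple reference point against which learned models are often compared. The function names and the flat list of 8-bit greyscale values are assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance.

    `pixels` is assumed to be a flat list of 8-bit greyscale values.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]           # pixels with value <= t
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue                # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break                   # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Map pixels at or below the Otsu threshold to 0 (ink), others to 255."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


if __name__ == "__main__":
    # Toy low-contrast "page": 30 faded-ink pixels and 70 background pixels.
    page = [120] * 30 + [180] * 70
    print(otsu_threshold(page), binarize(page).count(0))
```

Global thresholding tends to fail on unevenly stained or illuminated leaves, which is one reason learned enhancement models are of interest for this focus area.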

Handwriting recognition models can be trained on diverse script samples to capture regional variations and historical orthographies.

EXPECTED DELIVERABLE: Proposed solutions must provide a detailed design and architecture for the enhancement pipeline that leads to the generation of datasets suitable for training AI models.

Focus Area 3: Digital Archiving

Digital Archiving: Development of intelligent systems for search, retrieval, and semantic linking across distributed digital manuscript repositories.

Proposed approaches may employ artificial intelligence techniques for multilingual full-text search, cross-lingual information retrieval, and semantic query expansion.

Beyond basic keyword matching, knowledge graphs may be constructed to represent entities such as authors, works, subjects, locations, and historical periods, allowing automated linking of manuscripts with shared attributes or intertextual references.
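The idea of linking manuscripts through shared attributes can be sketched with a small in-memory graph. Everything in the example is hypothetical: the record identifiers, the placeholder attribute values, and the `build_links` helper are invented for illustration, and a real system would more likely use a graph database or RDF store.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical catalogue records; identifiers and values are placeholders.
RECORDS = {
    "MS-001": {"author": "Author A", "subject": "medicine"},
    "MS-002": {"author": "Author A", "subject": "philosophy"},
    "MS-003": {"author": "Author B", "subject": "medicine"},
}


def build_links(records):
    """Link every pair of manuscripts that share an attribute value.

    Returns a dict mapping (id_a, id_b) to the set of shared attributes.
    """
    # Inverted index: (attribute, value) -> manuscripts carrying that value.
    index = defaultdict(set)
    for ms_id, attrs in records.items():
        for key, value in attrs.items():
            index[(key, value)].add(ms_id)

    # Each index entry with two or more manuscripts yields graph edges.
    links = defaultdict(set)
    for (key, value), ids in index.items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(f"{key}={value}")
    return dict(links)


if __name__ == "__main__":
    for pair, shared in sorted(build_links(RECORDS).items()):
        print(pair, sorted(shared))
```

The edge labels record why two manuscripts are connected, which matters for scholars who need to distinguish, say, shared authorship from shared subject matter.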

AI-driven chat interfaces could also mediate archive interaction, offering scholars a natural-language gateway into complex, interconnected collections.

EXPECTED DELIVERABLE: The proposed solutions must demonstrate retrieval-augmented generation (RAG)-like capabilities for search.
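To make the retrieval half of a RAG-style search concrete, the sketch below ranks passages by bag-of-words cosine similarity and assembles a prompt that cites the retrieved passages. It is a toy under stated assumptions: the passage texts and identifiers are invented, whitespace tokenisation stands in for proper multilingual embeddings, and the generation step (passing the prompt to a language model) is deliberately left out.

```python
import math
from collections import Counter

# Hypothetical passages keyed by placeholder identifiers.
PASSAGES = {
    "P1": "treatise on herbal medicine and diet",
    "P2": "commentary on grammar and metre",
    "P3": "notes on medicine for fevers",
}


def vectorize(text):
    """Bag-of-words term counts using naive whitespace tokenisation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    """Return ids of the top-k passages with non-zero similarity to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), pid) for pid, text in passages.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first
    return [pid for score, pid in scored[:k] if score > 0]


def build_prompt(query, passages, k=2):
    """Assemble a prompt that grounds the answer in the retrieved passages."""
    context = "\n".join(f"[{pid}] {passages[pid]}" for pid in retrieve(query, passages, k))
    return f"Answer using only the cited passages.\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("herbal medicine", PASSAGES))
```

Keeping passage identifiers in the prompt is what lets a generated answer point back to specific manuscripts rather than to the model's general knowledge.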

Focus Area 4: Script Deciphering

Script Deciphering: Development of automated recognition, transcription, and transliteration pipelines for rare, historic, and underrepresented Indic scripts.

Many manuscripts are written in scripts with minimal or no digital infrastructure, such as Grantha, Modi, Sharada, Newari, or Kharoshthi, for which no mature OCR or font support currently exists.

Proposed approaches may leverage computer vision and deep learning architectures for image-to-text modelling, including convolutional-recurrent OCR frameworks, attention-based sequence transducers, and transformer-based handwriting recognition systems.

Training can incorporate parallel corpora, palaeographic samples, and synthetic data augmentation to overcome limited annotated resources.
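As one hedged example of synthetic data augmentation, the sketch below produces shifted and partially faded copies of a tiny binary glyph bitmap. The glyph, the `augment` helper, and its parameters are hypothetical; real pipelines typically work on greyscale page crops and add further distortions such as rotation, elastic warping, and simulated stains.

```python
import random

# Hypothetical 5x5 binary glyph (1 = ink, 0 = background).
GLYPH = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]


def augment(glyph, rng, max_shift=1, dropout=0.05):
    """Return a randomly shifted copy of `glyph` with some ink pixels dropped.

    `rng` is a random.Random instance so that results are reproducible.
    """
    height, width = len(glyph), len(glyph[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            inside = 0 <= src_y < height and 0 <= src_x < width
            if inside and glyph[src_y][src_x]:
                # Simulate faded ink by dropping a fraction of ink pixels.
                if rng.random() >= dropout:
                    out[y][x] = 1
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    for row in augment(GLYPH, rng):
        print("".join("#" if v else "." for v in row))
```

Passing a seeded `random.Random` keeps generated training sets reproducible, which helps when comparing recognition models across runs.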

EXPECTED DELIVERABLE: The proposed solutions must demonstrate automated palaeographic classification and identification of manuscripts that can feed transcription and transliteration pipelines for downstream tasks.

Focus Area 5: AI-Driven Tool/Platform for Knowledge Dissemination

Knowledge Dissemination: Development of AI-driven tools or platforms, built on datasets created from manuscripts, that produce Indic-rich narratives in multilingual and multimodal formats for disseminating Bharatiya Gyan Parampara.

Proposed approaches may employ multilingual natural language processing for automatic translation, transliteration, and abstractive summarization of complex texts into modern Indian and global languages.

Language models fine-tuned on parallel corpora can be adapted to render manuscripts originally in Sanskrit, Prakrit, Persian, or vernacular languages into Hindi, English, or other widely spoken languages, while retaining cultural and semantic nuance. Speech technologies such as text-to-speech and voice-based conversational agents could enable oral exploration of manuscript content, which is particularly valuable for public outreach and inclusivity.

EXPECTED DELIVERABLE: The proposed solutions must demonstrate AI-driven tool and platform capabilities in which authentic manuscripts and classical texts are integrated with multilingual and multimodal features to deliver credible, manuscript-referenced, AI-based Bharatiya Gyan Parampara content to students, researchers, and curious citizens.

Focus Area 6: AI-Infused Bharatiya Gyan Parampara Library

AI-Infused Bharatiya Gyan Parampara Library: Development of a digital library populated with AI-generated text, audio, and video built on authentic Indic datasets derived from manuscripts, connecting this knowledge with modern science to engage 21st-century youth in multilingual and multimodal formats.

Proposed approaches may employ AI-based tools for automatic creation of the said content in modern Indian and global languages.

Applicants are encouraged to draw on available domain expertise.

EXPECTED DELIVERABLE: The proposed solutions must demonstrate successful creation of the said library content, along with proof of the authenticity of the source on which the audio, text, and video are based. The process, the validation of source content, and the final quality of the library will be given utmost importance for this focus area.
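One possible way to make generated content traceable to its source is to store a cryptographic fingerprint of the source transcription alongside each generated item. The sketch below is only an assumption about how such a record might look: the `ProvenanceRecord` fields, the identifiers, and the helper names are hypothetical, and a hash shows that a source text is unchanged, not that the manuscript itself is authentic.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical link between a generated item and its source text."""
    content_id: str      # id of the generated text, audio, or video item
    source_id: str       # id of the manuscript transcription it was built from
    source_sha256: str   # fingerprint of that transcription at generation time


def fingerprint(source_text: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded source text."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def make_record(content_id: str, source_id: str, source_text: str) -> ProvenanceRecord:
    """Create a provenance record for a newly generated library item."""
    return ProvenanceRecord(content_id, source_id, fingerprint(source_text))


def verify(record: ProvenanceRecord, source_text: str) -> bool:
    """Check that the source text still matches the recorded fingerprint."""
    return record.source_sha256 == fingerprint(source_text)


if __name__ == "__main__":
    record = make_record("audio-001", "MS-001", "sample transcription")
    print(verify(record, "sample transcription"))
    print(verify(record, "sample transcription (edited)"))
```

Expert validation of the source content would still be needed; the record only ensures that reviewers and users are looking at the same text the generated item was built from.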

Proposal Submission

  1. Applicants, including individual researchers, start-ups, academic groups, and cultural institutions, are invited to submit a concept note (maximum 6 pages) outlining their proposed AI-driven solution.
  2. The concept note should clearly articulate:
    • a. The technical approach (e.g., machine learning models, computer vision, natural language processing, multimodal integration);
    • b. The targeted manuscript collections, languages, or scripts (particularly those of historic or underrepresented significance);
    • c. The development plan, including methodology, milestones, team composition, and timeline;
    • d. Evidence of originality, feasibility, and scalability of the approach; and
    • e. The expected contribution to India’s manuscript heritage, in terms of preservation, accessibility, or public engagement.
  3. Entries must align with one or more of the Focus Areas. Submissions may also highlight cross-cutting innovations that span multiple focus areas.
  4. All proposals will undergo a structured, multi-stage evaluation process designed to ensure both technical rigor and cultural relevance.
    • a. In the first stage, an expert screening committee comprising domain specialists will assess concept notes for technical soundness, innovation, feasibility, relevance to India’s manuscript heritage, and potential for scalability across diverse collections.
    • b. In the second stage, shortlisted teams must present detailed system designs that showcase core functionality, such as relevant platform capabilities, along with documentation of the methodologies and datasets used.
    • c. In the final stage, selected teams will demonstrate their solutions to a technical jury during the Gyan Bharatam Conference (September 11-13, 2025, New Delhi), where evaluation will emphasize practical usability, interoperability with digital library infrastructures, and long-term sustainability.

Evaluation and Awards

The top three proposals, as determined by the jury, will be formally recognized at the conference and conferred with awards for excellence in AI-driven manuscript preservation and accessibility.

Beyond recognition, winning teams may also receive grant funding, incubation support, and opportunities for pilot deployment in collaboration with national repositories, research institutions, and cultural organizations.

Additional commendations may be awarded for innovation in specific thematic areas such as script decipherment, multilingual OCR, or public engagement platforms.

Timeline & Contact

  1. The Call for Proposals for Gyan-Setu is formally announced on 25th August 2025.
  2. The submission deadline for concept notes is September 1, 2025, after which proposals will undergo a structured review and shortlisting process.
  3. A screening committee will review the proposals and announce the shortlisted proposals for stage 2 on September 2, 2025.
  4. On September 5, 2025, shortlisted teams must present detailed system designs that showcase core functionality, such as relevant platform capabilities, along with documentation of the methodologies and datasets used.
  5. In the final stage, selected teams will demonstrate their solutions to a technical jury during the Gyan Bharatam International Conference, being held from 11th to 13th September 2025 at Bharat Mandapam, New Delhi.
  6. The final results and awards will be formally announced during the conference, in the presence of scholars, technologists, and cultural policymakers.

For further details, clarifications, or submission guidelines, participants may contact the organising committee member below. Proposals must be submitted by September 1, 2025 (5:30 pm) to gbmconference@gmail.com.

Prof. Ramesh C. Gaur

Member, Organizing Committee

Email: gbmconference@gmail.com

Tel: +91-11-2344 6557

Mobile: +91 99061 78739