About me

🔍 I’m a fifth-year Ph.D. candidate at Rochester Institute of Technology (RIT), conducting research at the Document and Pattern Recognition Lab (DPRL), under the mentorship of Dr. Richard Zanibbi.

💡 My work centers on designing fast, efficient, and interpretable parsers for recognizing complex mathematical and chemical formulas. I explore graphical notations across multiple formats, including PDFs, typeset images, and handwritten strokes. Using graph attention-based techniques, I aim to improve how contextual information is processed while preserving a natural and interpretable graph representation.

🎯 My goal is to deliver high accuracy in formula recognition through models that are not only faster but also easier to interpret than traditional encoder-decoder architectures.

💻 Recently, I developed ChemScraper, a molecule diagram parser that extracts characters and graphics directly from PDF molecule images. Using typesetting instructions and simple graph transformations, it generates both visual and chemical graphs without the need for OCR, GPUs, or vectorization. ChemScraper offers a practical way to create fine-grained, annotated datasets for training visual parsers, and it also provides a visual parser that parses raster molecule images directly.

🌐 Research interests: Pattern recognition, recognition of graphical structures, computer vision, speaker understanding, large language models, multi-modal deep learning, natural language processing.


Get a PDF copy of my CV here


News

  • (Sept 3, 2024): Gave an oral presentation on “ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing” at the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, Athens, Greece.

  • (May 2024): A revised paper on ChemScraper has been published at the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, Athens, Greece – Journal Track. The paper describes (1) a fast and accurate technique for parsing born-digital (vector) PDF images, and (2) its use in creating training data for a new approach to visual parsing of molecule diagrams in raster images (i.e., pixel-based, such as PNGs). Code is available.

  • (Nov 14, 2023): A paper describing the ChemScraper parser for molecular diagrams in PDF drawing instructions (‘born-digital’) is available on arXiv here. The system can also generate annotated training data for visual parsers that recognize raster images (i.e., pixel-based, such as PNGs). A link to the associated code is provided in a footnote in the paper.

  • (Sept 12, 2023): Co-presented a poster on “ChemScraper: Extracting Molecule Diagrams from PDF Vector and Raster Images with CDXML and SMILES Output” at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at University of Illinois Urbana-Champaign (UIUC).

  • (Aug 22-23, 2023): Gave a poster presentation in Poster Session 1 and at the Doctoral Consortium of the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.

  • (June 28, 2023): Co-presented a poster on “ChemScraper: Extracting Molecule Diagrams from PDF Vector Images with Page-Level CDXML (ChemDraw) and SMILES Output” at the NSF Annual Review Meeting at University of Illinois Urbana-Champaign (UIUC).

  • (Apr 17, 2023): Gave a Research Idea Ring (RIR) talk on “Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas” at RIT.

  • (Apr 2023): Our paper on the Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas was accepted for publication at the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.

  • (Sept 27-28, 2022): Co-presented a poster on “Reconstructing the Structure of Molecular Diagrams in PDF Documents using a CNN-Attention-Based Parsing Model” at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at University of Illinois Urbana-Champaign (UIUC).

  • (Sep 5, 2021): Gave a guest lecture on “Bayesian Decision Theory” for RIT’s undergraduate course Intro to Machine Learning (40 students).

  • (Aug 28, 2022): Successfully completed my Applied Scientist internship at Amazon (Alexa AI). Started as a Graduate Teaching Assistant (GTA) for the undergraduate course CSCI-335 Machine Learning.

  • (May 23, 2022): Started as an Applied Scientist Intern at Amazon (Alexa AI), working with the Alexa Perceptual Technologies - Speaker Understanding team to improve speaker identification on Alexa devices.

  • (Apr 7, 2022): Gave a Research Idea Ring (RIR) talk on “A Fast and Interpretable Context-aware Parser for Isolated Formulas and Chemical Diagrams” at RIT.

  • (Sep 9, 2021): Gave a virtual poster presentation on the MathSeer extraction pipeline at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.

  • (Sep 2021): MathSeer extraction pipeline released. This tool extracts formula locations and content from PDF documents. The pipeline is available from GitLab and includes improved versions of SymbolScraper, ScanSSD (now ScanSSD-XYc), and QD-GGA. The pipeline was prepared by Ayush K. Shah, Abhisek Dey, Matt Langsenkamp, and Prof. Zanibbi.

  • (Apr 2021): Our paper on the MathSeer formula extraction and evaluation pipeline was accepted for publication at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.

  • (May 2021): Successfully defended and passed my Ph.D. Research Potential Assessment (RPA).

  • (Aug 2020): Joined Rochester Institute of Technology (RIT) for a Ph.D. in Computing and Information Sciences. Started as a Graduate Research Assistant (GRA) at the Document and Pattern Recognition Lab (DPRL) under Prof. Richard Zanibbi.

  • (Jan 2020): Promoted to Machine Learning Engineer Level 1 at Fusemachines Nepal.

  • (Aug 2019): Promoted to Machine Learning Engineer Associate at Fusemachines Nepal.

  • (Aug 2019): Graduated from Kathmandu University as a Computer Engineer.

  • (Jun 2019): Started working as a Machine Learning Engineer Trainee at Fusemachines Nepal.