About me
I'm a fifth-year Ph.D. candidate at Rochester Institute of Technology (RIT), conducting research at the Document and Pattern Recognition Lab (DPRL), under the mentorship of Dr. Richard Zanibbi.
My work centers on designing fast, efficient, and interpretable parsers that recognize mathematical formulas and chemical diagrams in documents across multiple formats, including PDFs, typeset images, and handwritten strokes. Using graph attention-based techniques in a multi-task learning framework, I aim to make better use of contextual information while preserving a natural and interpretable graph representation.
Recent Projects:
- Multimodal Chemical Search: A multimodal search tool for retrieving chemical reactions, molecular structures, and associated text from scientific literature, linking visual and textual representations of chemical information.
- ChemScraper: A molecule diagram parser for extracting molecular graphics from PDFs.
- MathDeck: A math-aware search system supporting formulas & text.
Research interests: Pattern recognition, recognition of graphical structures, computer vision, speaker understanding, large language models, multi-modal deep learning, natural language processing.
News
2025
Submitted our paper, “Multimodal Search in Chemical Documents and Reactions,” to the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25), Padua, Italy. The system is available online at ReactionMiner.
2024
Successfully defended and passed my Ph.D. dissertation proposal on “Parsing of Math Formulas and Chemical Diagrams using Graph-Based Representation and Attention Models”.
Gave an oral presentation on “ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing” at the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, Athens, Greece.
A revised paper on ChemScraper has been published in the Journal Track of the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, Athens, Greece. The paper describes (1) a fast and accurate technique for parsing born-digital (vector) PDF images, and (2) its use to create training data for a new approach to visual parsing of molecule diagrams in raster images (i.e., pixel-based, such as PNGs). Code is available and the system is online at ChemScraper.
2023
A paper describing the ChemScraper parser for molecular diagrams in PDF drawing instructions (“born-digital”) is available on arXiv. The system can also generate annotated training data for visual parsers that recognize raster images (i.e., pixel-based, such as PNG). A link to the associated code is provided in a footnote in the paper.
Co-presented a poster on “ChemScraper: Extracting Molecule Diagrams from PDF Vector and Raster Images with CDXML and SMILES Output” at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at the University of Illinois Urbana-Champaign (UIUC).
Presented a poster at Poster Session 1 and the Doctoral Consortium of the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.
Co-presented a poster on “ChemScraper: Extracting Molecule Diagrams from PDF Vector Images with Page-Level CDXML (ChemDraw) and SMILES Output” at the NSF Annual Review Meeting at the University of Illinois Urbana-Champaign (UIUC).
Gave a Research Idea Ring (RIR) talk on “Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas” at RIT.
Our paper, “Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas,” was accepted for publication at the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.
2022
Co-presented a poster on “Reconstructing the Structure of Molecular Diagrams in PDF Documents using a CNN-Attention-Based Parsing Model” at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at the University of Illinois Urbana-Champaign (UIUC).
Gave a guest lecture on “Bayesian Decision Theory” for RIT's undergraduate course, Intro to Machine Learning (40 students).
Completed my Applied Scientist internship at Amazon (Alexa AI). Started as a Graduate Teaching Assistant (GTA) for the undergraduate course CSCI-335 Machine Learning.
Started as an Applied Scientist Intern at Amazon (Alexa AI). Worked on the Alexa Perceptual Technologies - Speaker Understanding team to improve speaker identification in Alexa devices.
Gave a Research Idea Ring (RIR) talk on “A Fast and Interpretable Context-aware Parser for Isolated Formulas and Chemical Diagrams” at RIT.
2021
Gave a virtual poster presentation on the MathSeer extraction pipeline at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.
MathSeer extraction pipeline released. This tool extracts formula locations and content from PDF documents. The pipeline is available from GitLab and includes improved versions of SymbolScraper, ScanSSD (now ScanSSD-XYc), and QD-GGA. The pipeline was prepared by Ayush K. Shah, Abhisek Dey, Matt Langsenkamp, and Prof. Zanibbi.
Successfully defended and passed my Ph.D. Research Potential Assessment (RPA) on “Recognition of Mathematical Formulas”.
Our paper on the “MathSeer formula extraction and evaluation pipeline” was accepted for publication at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.
2020
Joined Rochester Institute of Technology (RIT) for a Ph.D. in Computing and Information Sciences. Started as a Graduate Research Assistant (GRA) at the Document and Pattern Recognition Lab (DPRL) under Prof. Richard Zanibbi.
Promoted to Machine Learning Engineer Level 1 at Fusemachines Nepal.
2019
Promoted to Machine Learning Engineer Associate at Fusemachines Nepal.
Graduated from Kathmandu University as a Computer Engineer.
Started working as a Machine Learning Engineer Trainee at Fusemachines Nepal.