About me
I'm a fifth-year Ph.D. candidate at Rochester Institute of Technology (RIT), conducting research at the Document and Pattern Recognition Lab (DPRL) under the mentorship of Dr. Richard Zanibbi.
My work centers on designing fast, efficient, and interpretable parsers for recognizing mathematical formulas and chemical diagrams across multiple formats, including PDFs, typeset images, and handwritten strokes. Through graph attention-based techniques and the integration of Large Language Models (LLMs), I aim to enhance how contextual information is processed while preserving a natural and interpretable graph representation.
My goal is to deliver high accuracy in math formula and chemical diagram recognition with models that are both faster and easier to interpret than traditional encoder-decoder architectures. By leveraging LLMs, I've also improved recognition accuracy in math and chemical diagram parsing.
Recently, I developed ChemScraper, a molecule diagram parser that extracts characters and graphics directly from PDF molecule images. Using typesetting instructions and simple graph transformations, it generates both visual and chemical graphs without the need for OCR, GPUs, or vectorization. ChemScraper offers a practical way to create fine-grained, annotated datasets for training visual parsers, and it also provides a visual parser that parses raster molecule images directly.
In addition to my work at DPRL, I interned at Amazon with the Alexa Speaker Understanding team, where I focused on improving speech recognition and speaker identification using LLMs and generative AI models for speech synthesis, combined with a semi-supervised approach. This experience strengthened my skills in applying LLMs to real-world problems in speech technologies.
Research interests: Pattern recognition, recognition of graphical structures, computer vision, speaker understanding, large language models, multi-modal deep learning, natural language processing.
Get a PDF copy of my CV here
News
(Sept 3, 2024): Gave an oral presentation on "ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing" at the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, Athens, Greece.
(May 2024): A revised paper on ChemScraper has been published at the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, Athens, Greece (Journal Track). The paper describes (1) a fast and accurate technique for parsing born-digital (vector) PDF images, and (2) its use to create training data for a new approach to visual parsing of molecule diagrams in raster images (i.e., pixel-based, such as PNGs). Code is available.
(Nov 14, 2023): A paper describing the ChemScraper parser for molecular diagrams in PDF drawing instructions ("born-digital") is available on arXiv here. The system can also generate annotated training data for visual parsers that recognize raster images (i.e., pixel-based, such as PNG). A link to the associated code is provided in a footnote in the paper.
(Sept 12, 2023): Co-presented a poster on "ChemScraper: Extracting Molecule Diagrams from PDF Vector and Raster Images with CDXML and SMILES Output" at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at University of Illinois Urbana-Champaign (UIUC).
(Aug 22-23, 2023): Gave a poster presentation at Poster Session 1 and at the Doctoral Consortium of the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.
(June 28, 2023): Co-presented a poster on "ChemScraper: Extracting Molecule Diagrams from PDF Vector Images with Page-Level CDXML (ChemDraw) and SMILES Output" at the NSF Annual Review Meeting at University of Illinois Urbana-Champaign (UIUC).
(Apr 17, 2023): Gave a Research Idea Ring (RIR) talk on "Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas" at RIT.
(Apr 2023): Our paper on the Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas was accepted for publication at the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.
(Sept 27-28, 2022): Co-presented a poster on "Reconstructing the Structure of Molecular Diagrams in PDF Documents using a CNN-Attention-Based Parsing Model" at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at University of Illinois Urbana-Champaign (UIUC).
(Sep 5, 2021): Gave a guest lecture on "Bayesian Decision Theory" for RIT's undergraduate course, Intro to Machine Learning (40 students).
(Aug 28, 2022): Completed my Applied Scientist internship at Amazon (Alexa AI). Started as a Graduate Teaching Assistant (GTA) for the undergraduate course CSCI-335 Machine Learning.
(May 23, 2022): Started as an Applied Scientist Intern at Amazon (Alexa AI). Worked on the Alexa Perceptual Technologies - Speaker Understanding team to improve speaker identification in Alexa devices.
(Apr 7, 2022): Gave a Research Idea Ring (RIR) talk on "A Fast and Interpretable Context-aware Parser for Isolated Formulas and Chemical Diagrams" at RIT.
(Sep 9, 2021): Gave a virtual poster presentation on the MathSeer extraction pipeline at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.
(Sep 2021): MathSeer extraction pipeline released. This tool extracts formula locations and content from PDF documents. The pipeline is available from GitLab and includes improved versions of SymbolScraper, ScanSSD (now ScanSSD-XYc), and QD-GGA. The pipeline was prepared by Ayush K. Shah, Abhisek Dey, Matt Langsenkamp, and Prof. Zanibbi.
(Apr 2021): Our paper on the MathSeer formula extraction and evaluation pipeline was accepted for publication at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.
(May 2021): Successfully defended and passed my Ph.D. Research Potential Assessment (RPA).
(Aug 2020): Joined Rochester Institute of Technology (RIT) for a Ph.D. in Computing and Information Sciences. Started as a Graduate Research Assistant (GRA) at the Document and Pattern Recognition Lab (DPRL) under Prof. Richard Zanibbi.
(Jan 2020): Promoted to Machine Learning Engineer Level 1 at Fusemachines Nepal.
(Aug 2019): Promoted to Machine Learning Engineer Associate at Fusemachines Nepal.
(Aug 2019): Graduated from Kathmandu University as a Computer Engineer.
(Jun 2019): Started working as a Machine Learning Engineer Trainee at Fusemachines Nepal.