About me

🔍 I’m a fourth-year Ph.D. candidate at Rochester Institute of Technology (RIT), actively involved in cutting-edge research at the Document and Pattern Recognition Lab (DPRL), under the guidance of Prof. Richard Zanibbi.

💡 My current focus is developing a fast, interpretable visual parser for math and chemical formulas. I specialize in recognizing graphical notations, including complex math and chemical formulas, across media such as born-digital PDFs, typeset images, and handwritten strokes. I am exploring graph attention-based task interaction techniques to enrich contextual information while maintaining a natural and interpretable graph representation.

🎯 The ultimate goal? Achieving competitive accuracy in recognizing math and chemical formulas with a model that is faster and more interpretable than traditional encoder-decoder setups.

💻 Recently, I completed ChemScraper, a molecule diagram parser that extracts characters and graphics from PDF molecule images using typesetting instructions, then applies simple graph transformation algorithms to convert them into visual and then chemical graphs, all without OCR, GPU, or vectorization. ChemScraper’s speed and reliable accuracy enable it to contribute significantly to creating fine-grained annotated datasets for training visual parsers.

🌐 Research interests: pattern recognition, computer vision, deep learning, and speech and natural language processing.


Get a PDF copy of my CV here


News

  • (Nov 14, 2023): A paper describing the ChemScraper parser for molecular diagrams in PDF drawing instructions (‘born-digital’) is available on arXiv here. The system can also generate annotated training data for visual parsers that recognize raster images (i.e., pixel-based, such as PNG). A link to the associated code is provided in a footnote in the paper.

  • (Sept 12, 2023): Co-presented a poster on “ChemScraper: Extracting Molecule Diagrams from PDF Vector and Raster Images with CDXML and SMILES Output” at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at University of Illinois Urbana-Champaign (UIUC).

  • (Aug 22-23, 2023): Presented a poster at Poster Session 1 and the Doctoral Consortium of the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.

  • (June 28, 2023): Co-presented a poster on “ChemScraper: Extracting Molecule Diagrams from PDF Vector Images with Page-Level CDXML (ChemDraw) and SMILES Output” at the NSF Annual Review Meeting at University of Illinois Urbana-Champaign (UIUC).

  • (Apr 17, 2023): Gave a Research Idea Ring (RIR) talk on “Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas” at RIT.

  • (Apr 2023): Our paper on the Line-of-sight with Graph Attention Parser (LGAP) for Math Formulas was accepted for publication at the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California.

  • (Sept 27-28, 2022): Co-presented a poster on “Reconstructing the Structure of Molecular Diagrams in PDF Documents using a CNN-Attention-Based Parsing Model” at the Molecule Maker Lab Institute (MMLI) All-Institute Retreat at University of Illinois Urbana-Champaign (UIUC).

  • (Sep 5, 2021): Gave a guest lecture on “Bayesian Decision Theory” for RIT’s undergraduate course Intro to Machine Learning (40 students).

  • (Aug 28, 2022): Completed my Applied Scientist internship at Amazon (Alexa AI). Started as a Graduate Teaching Assistant (GTA) for the undergraduate course CSCI-335 Machine Learning.

  • (May 23, 2022): Started as an Applied Scientist Intern at Amazon (Alexa AI). Worked on the Alexa Perceptual Technologies - Speaker Understanding team to improve speaker identification in Alexa devices.

  • (Apr 7, 2022): Gave a Research Idea Ring (RIR) talk on “A Fast and Interpretable Context-aware Parser for Isolated Formulas and Chemical Diagrams” at RIT.

  • (Sep 9, 2021): Presented a poster (virtually) on the MathSeer extraction pipeline at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.

  • (Sep 2021): MathSeer extraction pipeline released. This tool extracts formula locations and content from PDF documents. The pipeline is available from GitLab and includes improved versions of SymbolScraper, ScanSSD (now ScanSSD-XYc), and QD-GGA. The pipeline was prepared by Ayush K. Shah, Abhisek Dey, Matt Langsenkamp, and Prof. Zanibbi.

  • (Apr 2021): Our paper on the MathSeer formula extraction and evaluation pipeline was accepted for publication at the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland.

  • (May 2021): Successfully defended and passed my Ph.D. Research Potential Assessment (RPA).

  • (Aug 2020): Joined Rochester Institute of Technology (RIT) for a Ph.D. in Computing and Information Sciences. Started as a Graduate Research Assistant (GRA) at the Document and Pattern Recognition Lab (DPRL) under Prof. Richard Zanibbi.

  • (Jan 2020): Promoted to Machine Learning Engineer Level 1 at Fusemachines Nepal.

  • (Aug 2019): Promoted to Machine Learning Engineer Associate at Fusemachines Nepal.

  • (Aug 2019): Graduated from Kathmandu University as a Computer Engineer.

  • (Jun 2019): Started working as a Machine Learning Engineer Trainee at Fusemachines Nepal.