Research

Research Interests:

Pattern recognition, recognition of graphical structures, computer vision, speaker understanding, large language models, multi-modal deep learning, natural language processing.

Current work:

My current work focuses on developing a fast, interpretable visual parser for math and chemical formulas. I am exploring graph attention-based task interaction techniques to enhance contextual information while maintaining a natural, interpretable graph representation. The goal is to recognize graphical notations, including complex math and chemical formulas, across media such as born-digital PDFs, typeset images, and handwritten strokes.

Past/ongoing research works:

MultiModal Chemical Search, a system for searching chemical reactions, molecular structures, and text in scientific literature. It integrates text, SMILES, and reaction-based queries, linking extracted reaction details with molecular diagrams and textual descriptions. The interface provides structured reaction and molecule cards for easy navigation and retrieval, supporting chemists in literature exploration and data extraction. Code is open-source and available here.
ChemScraper, a fast and accurate molecule diagram parser that uses characters and graphics extracted from born-digital (vector) PDF images, without the need for OCR, GPU, or vectorization. Its outputs are used to create training data for a new approach to visual parsing of molecule diagrams in raster images (i.e., pixel-based formats like PNGs) using a multi-task, segmentation-aware convolutional neural network (CNN). Code is open-source and available here.
MathDeck project, a system for searching PDF documents in a portion of the ACL Anthology, incorporating both formulas and text, displaying matched words and formulas in context. Its user-friendly interface includes formula ‘chips’ for easy formula creation, search, reuse and annotation. MathDeck supports both LaTeX and visual formula editing.
Built a new open-source math formula extraction pipeline for PDF files
Adopted distributed parallelization across multiple GPUs and implemented a custom dataloader with dynamic batch sizes to fully utilize the GPUs, increasing the speed of the math formula parser sixfold
Built new tools for visualizing and evaluating parsing results and for fine-grained error analysis
Worked on SymbolScraper, a PDF symbol extractor that identifies precise bounding box locations in born-digital PDF documents
Wrote an API that recognizes handwritten and typeset formulas and outputs the corresponding LaTeX and MathML
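
The dynamic batch size idea in the dataloader item above can be illustrated with a minimal sketch. This is not the pipeline's actual dataloader: the function name `dynamic_batches`, the `budget` parameter, and the cost model (batch size times the largest sample in the batch, as with padding) are all assumptions made for illustration.

```python
from typing import List, Sequence


def dynamic_batches(sizes: Sequence[int], budget: int) -> List[List[int]]:
    """Group sample indices so each padded batch stays within `budget`.

    Hypothetical cost model: every sample is padded to the largest one in
    its batch, so a batch costs (number of samples) * (largest sample size).
    """
    # Sort indices by size so similarly sized samples share a batch.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        # Sizes are ascending, so the new sample is the largest in the batch.
        padded_cost = (len(current) + 1) * sizes[idx]
        if current and padded_cost > budget:
            batches.append(current)
            current = []
        # A sample larger than the budget still gets a batch of its own.
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# Example: sizes could be pixel counts of formula images (made-up values).
sizes = [100, 900, 120, 80, 1000, 110]
batches = dynamic_batches(sizes, budget=1000)
print(batches)
```

Small inputs end up in large batches and large inputs in small ones, which keeps memory use per batch roughly constant. In PyTorch, a list of index lists like this could be passed to `DataLoader` through its `batch_sampler` argument.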

Ayush Kumar Shah
