Research
Research Interests:
Pattern recognition, recognition of graphical structures, computer vision, speaker understanding, large language models, multi-modal deep learning, natural language processing.
Current work:
My work centers around designing fast, efficient, and interpretable parsers for recognizing mathematical formulas and chemical diagrams across multiple formats, including PDFs, typeset images, and handwritten strokes. Through graph attention-based techniques and the integration of Large Language Models (LLMs), I aim to enhance how contextual information is processed while preserving a natural and interpretable graph representation.
Past/ongoing research works:
- ChemScraper, a fast and accurate molecule diagram parser using characters and graphics extracted from born-digital (vector) PDF images—without the need for OCR, GPU, or vectorization. It uses these outputs to create training data for a new approach to visual parsing of molecule diagrams in raster images (i.e., pixel-based formats like PNGs) using a multi-task, segmentation-aware convolutional neural network (CNN).
- MathDeck project, a system for searching PDF documents in a portion of the ACL Anthology, incorporating both formulas and text, displaying matched words and formulas in context. Its user-friendly interface includes formula ‘chips’ for easy formula creation, search, reuse and annotation. MathDeck supports both LaTeX and visual formula editing.
- Built a new open-source math formula extraction pipeline for PDF files
- Adopted distributed parallelization across multiple GPUs and implemented a custom dataloader with dynamic batch sizes to fully utilize the GPUs, which increased the speed of the math formula parser sixfold
- Built new tools for visualizing and evaluating parsing results and for fine-grained error analysis
- Worked on SymbolScraper, a PDF symbol extractor that identifies precise bounding box locations in born-digital PDF documents
- Wrote an API that recognizes handwritten and typeset formulas and outputs the corresponding LaTeX and MathML