Searching the ACL Anthology with Math Formulas and Text

Published in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Recommended citation: B. Amador, M. Langsenkamp, A. Dey, A. K. Shah, and R. Zanibbi, “Searching the ACL Anthology with Math Formulas and Text,” in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, in SIGIR ’23. New York, NY, USA: Association for Computing Machinery, Jul. 2023, pp. 3110–3114. doi: 10.1145/3539618.3591803.

[url] [pdf] [poster] [code]

Abstract:

Mathematical notation is a key analytical resource for science and technology. Unfortunately, current math-aware search engines require LaTeX or template palettes to construct formulas, which can be challenging for non-experts. Also, their indexed collections are primarily web pages where formulas are represented explicitly in machine-readable formats (e.g., LaTeX, Presentation MathML). The new MathDeck system searches PDF documents in a portion of the ACL Anthology using both formulas and text, and shows matched words and formulas along with other extracted formulas in-context. In PDF, formulas are not demarcated: a new indexing module extracts formulas using PDF vector graphics information and computer vision techniques. For non-expert users and visual editing, a central design feature of MathDeck’s interface is formula ‘chips’ usable in formula creation, search, reuse, and annotation with titles and descriptions in cards. For experts, LaTeX is supported in the text query box and the visual formula editor. MathDeck is open-source, and our demo is available online.



.bib:

@inproceedings{10.1145/3539618.3591803,
author = {Amador, Bryan and Langsenkamp, Matt and Dey, Abhisek and Shah, Ayush Kumar and Zanibbi, Richard},
title = {Searching the ACL Anthology with Math Formulas and Text},
year = {2023},
isbn = {9781450394086},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3539618.3591803},
doi = {10.1145/3539618.3591803},
booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {3110–3114},
numpages = {5},
keywords = {mathematical information retrieval (mir), multimodal retrieval, latex, pdf, math-aware search},
location = {Taipei, Taiwan},
series = {SIGIR '23}
}
