Semantic Recognition on Table Images from Visually Rich Documents

Xiao, Bin

Files

Xiao_Bin_2024_thesis.pdf (64.21 MB)

Date

2024-10-18

Authors

Xiao, Bin

Publisher

Université d'Ottawa / University of Ottawa

Creative Commons License

Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

Visually rich documents are widely used because they are easy for human readers to consume. However, the number of such documents has grown far beyond what humans can manage efficiently, whether the goal is to organize them, extract critical information, or mine useful knowledge, so tools that manage and interpret these documents automatically are needed. A document typically contains different types of components, such as regular text areas, tables, and figures, and these components usually require different processing methods. Because tables are commonly used to summarize vital information, and their two-dimensional layout and complex structures make semantic recognition challenging, this thesis focuses on semantic recognition for tables. With the development of Large Language Models (LLMs), applying LLMs to semantic recognition tasks has become a popular choice, as many studies have demonstrated their remarkable capabilities on such tasks. Although multimodal LLMs can process images directly, their performance on semantic recognition tasks over images containing dense text still lags far behind that of text-only LLMs. This thesis therefore proposes applying text-only LLMs to semantic recognition tasks on table images from visually rich documents. Because tables in visually rich documents usually appear as images, the thesis introduces a complete solution that bridges the modality gap between table images and text-only LLMs, comprising Table Detection (TD) and Table Structure Recognition (TSR) models, and uses Table Question Answering (Table-QA) as an example semantic recognition task. Comprehensive experiments are conducted on various datasets for the TD, TSR, and Table-QA models, and the experimental results demonstrate the superiority of the proposed solution.

Keywords

Table Detection, Table Structure Recognition, Table Question Answering, Large Language Model

URI

http://hdl.handle.net/10393/49773
https://doi.org/10.20381/ruor-30630

Collections

- Theses, 2011 -
