On May 14 this year, OpenAI released GPT-4o. By late May, an LLM price war was underway: major vendors cut the prices of their flagship models by over 90%, and many consumer applications became entirely free. LLM-powered QA bots are no longer exclusive to high-end tech players. As the barrier to entry falls, even non-technical users like us, who don't write professional prompts, can use LLMs conversationally to take over some tasks that once required human labor.
However, as the number of users increases, the “unsatisfactory cases” shared online have become quite varied. On social media platforms, we often see complaints about LLMs being “not accurate,” “not intelligent,” or even “nonsensical.”
For instance, professionals in the financial sector have told us that they often use LLMs for reading, summarizing, and extracting information from annual reports. While LLMs serve as quite good “intelligent assistants” in summarization, they frequently encounter hallucination issues when specific information extraction is required. LLMs sometimes fail to locate and extract the correct content from documents, fabricate information, or respond based on internet searches rather than the provided documents.
When LLMs start to “make things up,” users face double the trouble—imagine relying on AI to improve efficiency while spending more time correcting the AI, making it hard to tell who is working for whom.
We speculate that there are several possible reasons for this issue: First, some annual reports are hundreds of pages long, and the context length required for input and output might exceed the token limits of LLMs, leading to inaccurate or fabricated generation. Second, annual reports often contain many charts; if these charts are embedded as images in PDF files, LLMs might not correctly parse them, resulting in lost key data.
When LLMs encounter more "non-professional" scenarios, such as inventory forms with handwritten corrections or poorly scanned medical test reports, they often perform unsatisfactorily. Poor table recognition, data extraction errors, and failed information reorganization are obstacles that hinder our use of LLMs to "reduce workload."
After going through the effort of photographing, scanning, and uploading a bunch of documents, the results often turn out to be riddled with errors. It's no wonder there's frequent criticism: teaching an LLM to work can be more time-consuming than doing it manually!
So, are there ways to improve the effectiveness of LLMs for everyday consumer use?
Is the LLM not performing well? Or are we just not using it correctly?
The answer to both questions is no.
Firstly, training and optimizing LLMs require substantial computational power and data support. Over the past two years, LLMs have been growing at an astonishing pace, and we have no doubt they will change existing work practices. However, any new technology or tool entering society inevitably goes through a period of adjustment and collision.
Secondly, as LLM-related products enter various fields of work and daily life, it is unrealistic to expect everyone to possess detailed technical knowledge, such as understanding prompt engineering and its logic, to communicate effectively with AI. Technological development and updates are meant to benefit more people, not create knowledge barriers.
For ordinary users, the closer an LLM's interaction mode is to natural human communication, the better it serves us.
In the cases mentioned above, there are a couple of ways to improve the effectiveness of current LLM-powered QA bots: first, reduce the context length given to the LLM by manually pre-screening the information; second, communicate with the LLM using more effective prompts (an illustrative example follows). However, both methods require users to invest more effort, or to wait for LLM products to be optimized and iterated, which is not the ideal solution we are looking for.
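As a purely illustrative sketch (not an official template, and the fields are our own invention), a more effective extraction prompt pins down the source, the fields wanted, and the fallback behavior:

```text
From the attached annual report only (do not search the web), extract:
1. Total revenue for the most recent fiscal year
2. Net profit margin
For each figure, cite the page or section it comes from. If a value is
not in the document, answer "not found" instead of guessing.
```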
That said, when facing problems with LLM file recognition and information extraction, there is a convenient fix from another angle. Most current LLMs support multimodal interaction, but when users upload scanned documents or files with complex layouts, the responses are often poor. The core problem is that structure gets mangled or lost during file parsing, which in turn corrupts the generated answer. If table recognition goes wrong, for example, the correspondence between rows and columns becomes confused, and the precise figures the table contains become useless for any subsequent understanding.
For LLMs, file formats that carry structural information, such as Markdown and JSON, are better-suited inputs. Markdown in particular closely matches the formats commonly seen in LLM training data, making it clear and "friendly" to the model.
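As a simple illustration (the test names and values here are invented, not taken from any real report), a lab-result table rendered in Markdown makes every row-column relationship explicit:

```markdown
| Test              | Result | Unit   | Reference Range | Flag |
|-------------------|--------|--------|-----------------|------|
| Hemoglobin        | 11.2   | g/dL   | 13.0-17.5       | ↓    |
| White blood cells | 9.8    | 10^9/L | 4.0-10.0        |      |
```

The same table in JSON would carry identical structure as an array of objects; either way, the model no longer has to guess which value belongs to which test.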
Therefore, an external file parsing tool that is efficient, accurate, and compatible with a wide range of formats makes an excellent assistant for everyday LLM use.
We recommend a user-friendly, LLM-friendly parsing tool: TextIn Document Parser. It converts a wide range of file formats (images, PDFs, Doc/Docx, web pages, etc.) into Markdown or JSON, parsing a 100-page document in as little as 1.5 seconds. It also works on mobile devices, so there is no need to sit holding your phone while a long document uploads and processes. In addition, TextIn Document Parser offers high parsing accuracy and handles complex tables and scanned documents well, so there is no need to struggle with warped or dimly lit phone photos.
Let’s take a look at its practical application.
Is the parsing tool effective?
A practical parsing tool not only needs to be accurate and efficient in professional fields but also must meet everyday usage needs and address common scanning and parsing challenges.
One such challenge is hospital lab reports. For individuals, hunting down unfamiliar metrics in a lengthy health check report or a stack of lab documents can be overwhelming. Healthcare professionals are more specialized, but they handle far larger volumes of such data, making LLMs potentially very useful in this field.
However, medical reports vary in format, are often poorly photographed, and lack standardized table structures, all of which pose significant difficulties for document parsing.
The image above is a screenshot of a blood test report. As a simple test, we uploaded the PDF version of the report to a commonly used domestic LLM and asked it to extract several key pieces of information we were interested in.
Comparing the answers against the original image, the LLM's responses were clearly not ideal. For the first data point, the file's lack of clarity caused a parsing error, and the LLM returned a value the document does not support. For the second, the LLM misread a downward arrow (↓) as the digit 1. For the third, the answer was read from the wrong row of the table. From these results, the most likely source of the faulty generations is the file parsing step.
We uploaded the same document to the PDF-to-Markdown tool in TextIn Tools and obtained the parsing result in Markdown. The preview shows that the tool converted the file's data into a cleanly structured table that the LLM can understand.
We then uploaded the Markdown file to the LLM and posed the same questions.
This time, for the three previously incorrect questions, the LLM provided clear and correct answers, including information on arrows in the test report and reference value ranges.
It is evident that an accurate and efficient parsing tool significantly improves the effectiveness of using LLM-powered QA bots in everyday applications.
In other complex formatted images and documents, such as forms and invoices, the TextIn Document Parser also performs well. For example, common quotation sheets with merged cells and stamp obstructions are accurately and completely parsed by the PDF to Markdown tool.
Overcoming these challenges allows TextIn Tools to help us use AI tools anytime and anywhere in our daily lives, reducing manual, repetitive labor, and saving time, effort, and hassle. We believe this is the true significance of LLM development for each of us.
Try the Current Version of TextIn's PDF to Markdown Parser
If you have an immediate need, you can use TextIn's document parser on demand.
Developers can register an account on the TextIn platform and try the latest version of TextIn's PDF-to-Markdown parser at any time.
Visit: http://textin.ai/experience/pdf_to_markdown
If you want to call it from code, you can also visit the corresponding API documentation:
http://textin.ai/document/pdf_to_markdown
The platform provides a Playground to help developers test the API before integrating it.
Click the "API Debug" button on the page to enter the debugging page.
Here you can configure the request parameters; after initiating a call, the results appear on the right side.
If you want to make calls from Python, you can refer to the general sample code on the platform, or join our Discord group https://discord.gg/s2N3SAh9 for more complete demo code. A rough sketch of such a call appears below.
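For orientation only, here is a minimal sketch of what an HTTP call to a PDF-to-Markdown endpoint could look like. The endpoint URL, header name, and response schema in this snippet are placeholder assumptions of ours, not TextIn's documented API; always defer to the official API documentation and the platform's sample code.

```python
# Minimal sketch of posting a PDF to a PDF-to-Markdown parsing API.
# NOTE: the endpoint URL, header name, and response schema below are
# placeholder assumptions, not TextIn's documented interface. See
# http://textin.ai/document/pdf_to_markdown for the real parameters.
import requests

API_URL = "https://api.example.com/pdf_to_markdown"  # placeholder endpoint
API_KEY = "your-api-key"                             # issued after registration

def pdf_to_markdown(path: str) -> str:
    """Send a local PDF as the request body and return parsed Markdown."""
    with open(path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"x-api-key": API_KEY},  # header name is an assumption
            data=f.read(),
        )
    resp.raise_for_status()
    # Assumed response shape: {"markdown": "..."}; adjust to the documented schema.
    return resp.json()["markdown"]

if __name__ == "__main__":
    markdown = pdf_to_markdown("blood_test_report.pdf")
    print(markdown[:500])  # preview the first 500 characters
```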
Our PDF-to-Markdown parser currently offers a free trial quota of 1,000 PDF pages, which you can claim by joining our Discord group. We welcome everyone to talk with our team and share opinions or suggestions.