Generating a PDF from HTML might seem like a simple task that takes little time and effort. However, the reality is somewhat different, and finding the best solution can often be challenging.
Let’s consider this hypothetical scenario:
We have a React app and would like to create a PDF from either the entire page or just part of it. The PDF document could contain charts, tables, images and/or plain text, and should be laid out without cut-off or overlapping content. We also want a button on the page that lets us save the document.
In this article, I will walk you through some different solutions whilst outlining the pros and cons of each. We will start with the simplest method then graduate to the most complex.
Generally speaking, a browser can already save and print PDFs from our pages: just press Ctrl/Cmd + P to open the print dialog, where you can adjust the appearance of the document before saving it.
A button on the page can open the same dialog by calling window.print() in its click handler.
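A minimal sketch of such a click handler in TypeScript. It relies only on the standard window.print() call; the Printable type and the createPrintHandler name are made up for this illustration so that the handler can also be exercised outside a browser.

```typescript
// Anything exposing print(); in the browser this is the global `window`.
interface Printable {
  print(): void;
}

// Returns a click handler that opens the browser's print dialog,
// from which the page can be saved as a PDF.
function createPrintHandler(target: Printable): () => void {
  return () => target.print();
}

// Possible wiring in a React component (hypothetical markup):
//   <button onClick={createPrintHandler(window)}>Save as PDF</button>
```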
Should we wish to change its appearance, hide certain items or change the size of elements in the PDF, we can write CSS print rules inside an @media print block, which applies only when the page is printed.
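As an illustration, the print rules could be assembled and attached from TypeScript as sketched below. The selectors (.navbar, .save-button, .report) and the buildPrintCss helper are assumptions for the example, not names from the app in the scenario.

```typescript
// Builds a print-only stylesheet that hides the given selectors and lets
// the main content use the full page width.
function buildPrintCss(hiddenSelectors: string[], contentSelector: string): string {
  const lines: string[] = [];
  if (hiddenSelectors.length > 0) {
    lines.push(`  ${hiddenSelectors.join(", ")} { display: none; }`);
  }
  lines.push(`  ${contentSelector} { width: 100%; font-size: 12pt; }`);
  return `@media print {\n${lines.join("\n")}\n}\n`;
}

// Hypothetical selectors, for illustration only.
const printCss = buildPrintCss([".navbar", ".save-button"], ".report");

// In the browser the rules could then be attached with:
//   const style = document.createElement("style");
//   style.textContent = printCss;
//   document.head.appendChild(style);
```

The same rules can of course live in a regular stylesheet; building them in code is only convenient when the list of hidden elements changes at runtime.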
We might also want to manage page breaks and eliminate overlaps. This can be achieved with the CSS fragmentation properties, such as break-inside: avoid to keep an element on one page and break-before: page to start an element on a new page.
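A small sketch of those page-break rules, again generated from TypeScript. The helper name and the selectors are hypothetical; the legacy page-break-* properties are included next to the newer break-* ones as a fallback for older browsers.

```typescript
// Builds print rules that keep some elements in one piece and force
// others to start on a fresh page.
function buildPageBreakCss(keepTogether: string[], startOnNewPage: string[]): string {
  const rules: string[] = [];
  if (keepTogether.length > 0) {
    // Ask the browser not to split these elements across two pages.
    rules.push(`  ${keepTogether.join(", ")} { break-inside: avoid; page-break-inside: avoid; }`);
  }
  if (startOnNewPage.length > 0) {
    // Always begin these elements at the top of a new page.
    rules.push(`  ${startOnNewPage.join(", ")} { break-before: page; page-break-before: always; }`);
  }
  return `@media print {\n${rules.join("\n")}\n}\n`;
}

// Hypothetical usage: charts and table rows stay whole, each section starts a page.
const pageBreakCss = buildPageBreakCss([".chart", "tr"], [".section"]);
```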
Here’s an excellent article describing more of what you can do using these print rules in CSS.
For something small and simple this is an ideal solution, and one where adding libraries would be over-engineering. It is less suitable when the generated PDF has to be accessed from code, for example to store it somewhere.
Pros:
There are no external libraries
It is simple to implement
It does not overload the user's machine
It allows the user to select and search text
Cons:
It can be problematic rendering identical results in different browsers
The save option can be hard to find because the print dialog differs from browser to browser
The generated PDF cannot be accessed from code
The PDF document content is dependent upon the size of the browser window
Here’s another straightforward solution: take a screenshot of the page or element and convert it to a PDF document by way of a Canvas and an image. Two libraries are involved:
html2canvas for creating a screenshot from HTML and generating Canvas from it
jsPDF to transform an image to PDF
The Canvas => Image step needs no library: the standard canvas.toDataURL() method returns the image as a data URL, which jsPDF can then place on a page.
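The whole chain could be sketched as follows. To keep the example standalone, html2canvas and jsPDF are not imported; the function receives them through parameters typed with small stand-in interfaces (CanvasLike, PdfLike), which are assumptions of this sketch. The jsPDF calls it relies on are addImage, save and internal.pageSize.getWidth.

```typescript
// Structural stand-ins for the parts of html2canvas and jsPDF used here.
// A real app would import the libraries instead:
//   import html2canvas from "html2canvas";
//   import { jsPDF } from "jspdf";
interface CanvasLike {
  width: number;
  height: number;
  toDataURL(type?: string): string;
}

interface PdfLike {
  internal: { pageSize: { getWidth(): number } };
  addImage(data: string, format: string, x: number, y: number, w: number, h: number): unknown;
  save(filename: string): unknown;
}

// Scale an image to the full page width while keeping its aspect ratio.
function fitToPageWidth(imageWidth: number, imageHeight: number, pageWidth: number) {
  return { width: pageWidth, height: (imageHeight * pageWidth) / imageWidth };
}

// HTML => Canvas => Image => PDF.
async function exportElementToPdf<E>(
  element: E,
  render: (element: E) => Promise<CanvasLike>,
  pdf: PdfLike,
  filename: string
): Promise<void> {
  const canvas = await render(element);
  const imageData = canvas.toDataURL("image/png"); // the vanilla Canvas => Image step
  const pageWidth = pdf.internal.pageSize.getWidth();
  const { width, height } = fitToPageWidth(canvas.width, canvas.height, pageWidth);
  pdf.addImage(imageData, "PNG", 0, 0, width, height);
  pdf.save(filename);
}

// Possible browser usage (element id and "pdf-mode" class are hypothetical):
//   const render = (el: HTMLElement) =>
//     html2canvas(el, { onclone: (doc) => doc.body.classList.add("pdf-mode") });
//   await exportElementToPdf(document.getElementById("report")!, render,
//     new jsPDF("p", "mm", "a4"), "report.pdf");
```

Note that an image taller than the page is simply cut off at the bottom in this sketch.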
It is also possible to adjust styles before the HTML => Canvas transformation, for example through the onclone option of html2canvas, which receives a cloned document that can be modified without touching the live page.
However, if the HTML is lengthy you might want to place different elements on separate pages. To do this, take a screenshot of each element and add each one to its own page of a single PDF document.
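A possible shape for the multi-page variant is shown below, under the same assumptions as before: html2canvas and jsPDF are passed in through stand-in interfaces defined for this sketch, and addPage is the jsPDF call that starts a new page.

```typescript
// Structural stand-ins; a real app would pass html2canvas and a jsPDF instance.
interface PageCanvas {
  width: number;
  height: number;
  toDataURL(type?: string): string;
}

interface MultiPagePdf {
  internal: { pageSize: { getWidth(): number } };
  addPage(): unknown;
  addImage(data: string, format: string, x: number, y: number, w: number, h: number): unknown;
  save(filename: string): unknown;
}

// Renders every element to its own page of one PDF document.
async function exportElementsToPdf<E>(
  elements: E[],
  render: (element: E) => Promise<PageCanvas>,
  pdf: MultiPagePdf,
  filename: string
): Promise<void> {
  for (let i = 0; i < elements.length; i++) {
    const canvas = await render(elements[i]);
    // A new jsPDF document already has one blank page, so only add
    // pages from the second element onwards.
    if (i > 0) {
      pdf.addPage();
    }
    const pageWidth = pdf.internal.pageSize.getWidth();
    const height = (canvas.height * pageWidth) / canvas.width;
    pdf.addImage(canvas.toDataURL("image/png"), "PNG", 0, 0, pageWidth, height);
  }
  pdf.save(filename);
}

// Possible browser usage (the ".pdf-page" selector is hypothetical):
//   const pages = Array.from(document.querySelectorAll<HTMLElement>(".pdf-page"));
//   await exportElementsToPdf(pages, (el) => html2canvas(el), new jsPDF("p", "mm", "a4"), "report.pdf");
```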
This way, you can create decent PDFs that look just like the original HTML, with control over both the document’s appearance and the elements included in it. The downside, however, is that the text still cannot be selected or searched, because each page is just an image.
Pros:
The result looks very similar to the original HTML
It is easy to implement
The generated PDF is accessible from code
Cons:
The user is unable to select and search text
The PDF document content is dependent upon the size of the browser window
External packages are required
jsPDF (mentioned above), PDFKit and React-pdf are all libraries you can use to build a PDF directly in a React app. One problem remains, however: the layout and styles must be written specifically for the PDF document rather than reused from the page.
In our scenario, then, this solution is also insufficient since we prefer simply to copy our HTML with minor changes, not rewrite it with variations on the same design. Still, it is a useful option if you want to create a PDF from scratch using information from another source.
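With React-pdf, for instance, the document is described with the library's own components, such as Document, Page, View and Text, and styled through its StyleSheet API rather than with regular CSS.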
Pros:
The generated PDF is accessible from code
The PDF document content is not dependent upon the browser window size
The result is identical in different browsers
The user can select and search text
Cons:
The existing on-page HTML cannot be reused
It can be quite time-consuming
The code is likely to contain two different variations of the same design
The last solution uses Puppeteer. Because its code runs on the back end, it is the most complex option and unlike any of the fully client-side ones mentioned above.
Puppeteer is a Node.js library that controls a headless Chrome or Chromium browser; in other words, it gives you a browser you can run from Node.js. According to its documentation, it can be used to generate screenshots and PDFs of pages.
A typical script navigates directly to the URL, changes some styles and generates a PDF file.
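Those three steps could be sketched like this. The Puppeteer calls used are launch, newPage, goto, addStyleTag, pdf and close; to keep the sketch standalone, Puppeteer itself is passed in through a launch parameter, and the BrowserLike and PageLike types are simplified stand-ins written for this example, not Puppeteer's own typings.

```typescript
// Simplified stand-ins for the parts of Puppeteer's Page and Browser used here.
interface PageLike {
  goto(url: string, options: { waitUntil: "networkidle0" }): Promise<unknown>;
  addStyleTag(options: { content: string }): Promise<unknown>;
  pdf(options: { format: "A4"; printBackground: boolean }): Promise<Uint8Array>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<unknown>;
}

// Opens the URL in a headless browser, injects extra CSS and returns the PDF bytes.
async function renderPagePdf(
  launch: () => Promise<BrowserLike>,
  url: string,
  extraCss: string
): Promise<Uint8Array> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has settled so charts and images are loaded.
    await page.goto(url, { waitUntil: "networkidle0" });
    // Change some styles before printing, e.g. hide buttons.
    await page.addStyleTag({ content: extraCss });
    return await page.pdf({ format: "A4", printBackground: true });
  } finally {
    // Always release the browser process, even if a step above fails.
    await browser.close();
  }
}

// Possible server-side usage (URL and CSS are hypothetical):
//   import puppeteer from "puppeteer";
//   const bytes = await renderPagePdf(() => puppeteer.launch(),
//     "http://localhost:3000/report", ".save-button { display: none; }");
```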
Once created, the document is sent back to the front end. On the client side, it is then fetched, converted to a Blob and saved.
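On the client, the fetch, Blob and save steps might look like the sketch below. The /api/pdf endpoint and the downloadPdf helper are hypothetical, and fetch and the saving routine are passed in as parameters so the flow can be read and tested without a browser.

```typescript
// The part of a fetch Response this sketch needs.
interface ResponseLike<B> {
  ok: boolean;
  status: number;
  blob(): Promise<B>;
}

// Fetches the generated PDF, converts the response to a blob and hands it
// to a save routine.
async function downloadPdf<B>(
  endpoint: string,
  fetchFn: (url: string) => Promise<ResponseLike<B>>,
  saveBlob: (blob: B, filename: string) => void,
  filename: string
): Promise<void> {
  const response = await fetchFn(endpoint);
  if (!response.ok) {
    throw new Error(`PDF request failed with status ${response.status}`);
  }
  const blob = await response.blob();
  saveBlob(blob, filename);
}

// Possible browser usage, saving through a temporary download link:
//   const saveBlob = (blob: Blob, name: string) => {
//     const link = document.createElement("a");
//     link.href = URL.createObjectURL(blob);
//     link.download = name;
//     link.click();
//     URL.revokeObjectURL(link.href);
//   };
//   await downloadPdf("/api/pdf", (url) => fetch(url), saveBlob, "report.pdf");
```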
This seems the most comprehensive solution, as it offers the greatest number of benefits and can address even the most difficult cases. Moreover, the generated document allows text to be selected and searched, and it can be saved on the server without any additional API calls.
Pros:
The generated PDF is accessible from code
The PDF document content is not dependent upon the size of the browser window
It allows the user to select and search text
The result looks very similar to the original HTML
It does not overload the user's machine
Cons:
It requires client and server-side code
Implementation can be complicated in some cases
As we’ve seen, generating PDFs from HTML can be problematic. But it need not be, and the examples above are just a few options for you to consider when tackling the issue. With a bit of trial and error you’ll breach the impasse, so do try tinkering with some different options to determine which solution works best for you.
And... good luck!
If you would like to know more, contact us - hello@start-up.house