Aspose.Pdf allows developers to convert Html pages to PDF documents. Starting with version 2.9.0.0 for .NET, Aspose.Pdf also supports converting Aspx pages to PDF documents through the BindHTML method. As with Html-to-PDF conversion, only simple Aspx pages can be converted at present. Aspx-to-PDF conversion works the same way as Html-to-PDF conversion; an example is given at the end of this section.
Limitations on current version:
How to Convert Html?
Aspose.Pdf offers the Pdf class, which has two methods for this purpose: BindHTML and BindHTMLFromUrl. These methods bind an Html document into the Aspose.Pdf DOM, which can then be saved as a PDF document.
There are several ways to convert an Html file to a PDF document. The input Html can be supplied in different forms and converted with a single method call. The examples below show each option:
Html File as a String
A local Html file can be converted by passing its physical path, as a String, to the BindHTML method of the Pdf class. Then call the Save method of the Pdf class to save the Html content as a PDF document.
Code Snippet
[C#]
Pdf pdf = new Pdf();
pdf.BindHTML(@"C:/xml/Test.html");
pdf.Save(@"C:/xml/Test.pdf");
[VB.NET]
Dim pdf As Pdf = New Pdf()
pdf.BindHTML("C:/xml/Test.html")
pdf.Save("C:/xml/Test.pdf")
Html File as a Stream
Developers can also wrap an Html file in a Stream object and pass that Stream to the BindHTML method of the Pdf class. The example below uses the MemoryStream class to hold the Html content as a Stream.
Example:
[C#]
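using System.IO;
using Aspose.Pdf;
Pdf pdf = new Pdf();
// Fill the MemoryStream with Html content; here it is read from a local file
MemoryStream ms = new MemoryStream(File.ReadAllBytes(@"C:/xml/Test.html"));
pdf.BindHTML(ms);
pdf.Save(@"C:/xml/Test.pdf");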
[VB.NET]
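Imports System.IO
Imports Aspose.Pdf
Dim pdf As Pdf = New Pdf()
' Fill the MemoryStream with Html content; here it is read from a local file
Dim ms As MemoryStream = New MemoryStream(File.ReadAllBytes("C:/xml/Test.html"))
pdf.BindHTML(ms)
pdf.Save("C:/xml/Test.pdf")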
[Java]
import java.io.*;
Pdf pdf = new Pdf();
File cssInHtml = new File("examples/resources/cssinner.html");
try (FileInputStream in = new FileInputStream(cssInHtml);
     FileOutputStream out = new FileOutputStream("exampleOutput/HtmlExample_cssInHtml.pdf")) {
    pdf.bindHTML(in, cssInHtml.toURI().toURL()); // File.toURL() is deprecated
    pdf.save(out);
}
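Html markup that is already held in memory, rather than in a file, can also be supplied as a stream. The helper below is a hypothetical sketch, not part of Aspose.Pdf: it wraps a String in an InputStream, assuming UTF-8 encoding, much like filling a MemoryStream in .NET.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper, not an Aspose.Pdf API: exposes Html markup held in a
 * String as an InputStream.
 */
final class HtmlStreams {
    private HtmlStreams() {
    }

    /** Encodes the markup as UTF-8 (an assumption) and wraps the bytes in a stream. */
    static InputStream fromString(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }
}
```

Such a stream could then take the place of the FileInputStream in the Java example, for instance `pdf.bindHTML(HtmlStreams.fromString(html), baseUrl)` with placeholder `html` and `baseUrl` values; that bindHTML accepts any InputStream, not only a FileInputStream, is an assumption.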
Html File as a TextReader
Developers can also encapsulate an Html file as a TextReader object and then pass it to the BindHTML method of Pdf class.
Code Snippet
[C#]
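using System.IO;
using Aspose.Pdf;
Pdf pdf = new Pdf();
// TextReader is abstract, so use a concrete subclass such as StreamReader
TextReader tr = new StreamReader(@"C:/xml/Test.html");
pdf.BindHTML(tr);
pdf.Save(@"C:/xml/Test.pdf");
tr.Close();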
[VB.NET]
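Imports System.IO
Imports Aspose.Pdf
Dim pdf As Pdf = New Pdf()
' TextReader is abstract, so use a concrete subclass such as StreamReader
Dim tr As TextReader = New StreamReader("C:/xml/Test.html")
pdf.BindHTML(tr)
pdf.Save("C:/xml/Test.pdf")
tr.Close()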
Html File as a Web URL
A common way is to pass the address of the Html page, as a String, to the BindHTMLFromUrl method of the Pdf class.
Code Snippet
[C#]
Pdf pdf = new Pdf();
pdf.BindHTMLFromUrl(@"http://www.Aspose.com/Test.html");
pdf.Save(@"C:/xml/Test.pdf");
[VB.NET]
Dim pdf As Pdf = New Pdf()
pdf.BindHTMLFromUrl("http://www.Aspose.com/Test.html")
pdf.Save("C:/xml/Test.pdf")
An overload of BindHTMLFromUrl also takes the HTTP method used to open the HTTP connection, such as GET, POST, PUT or PROPFIND, when binding the Html file to the PDF document.
Code Snippet
[C#]
Pdf pdf = new Pdf();
pdf.BindHTMLFromUrl(@"http://www.Aspose.com/Test.html","GET");
pdf.Save(@"C:/xml/Test.pdf");
[VB.NET]
Dim pdf As Pdf = New Pdf()
pdf.BindHTMLFromUrl("http://www.Aspose.com/Test.html","GET")
pdf.Save("C:/xml/Test.pdf")
Supported Html Tags
Tag Types | Tags
Skeleton | <html>, <head>, <body>
Text (Nature based) | <b>, <big>, <i>, <small>, <sub>, <sup>, <tt>, <u>, <s>, <strike>
Text (Content based) | <cite>, <code>, <em>, <kbd>, <samp>, <strong>, <var>, <dfn>
Table | <table>, <caption>, <tbody>, <tfoot>, <thead>, <tr>, <td>, <th>
List (Definition based) | <dl>, <dt>, <dd>
List (Unordered) | <ul>, <li>
List (Ordered) | <ol>, <li>
Others | <address>, <blockquote>, <br>, <font>, <hr>, <p>, <pre>, <span>, <a>, <center>, <form>, <xmp>, <img>, <div>
Note: The conversion of <blockquote>, <form>, <dl> and <xmp> is less accurate than that of the other tags, because it may introduce unnecessary line breaks.
Mapping of Each Html Tag to Aspose.Pdf Xml
<a>
Processing the anchor tag is complex, because there are two kinds of anchor: named anchors and regular anchors.
For a named anchor, we write an element with the corresponding ID. (As a special case, if the next element is an <h1>, we omit this element altogether and put the ID on the <h1> instead.)
For a regular anchor whose href attribute value starts with a hash mark (#), we create a link with an internal destination. An anchor with a hash (#) can also be used to refer to a relative PDF document. Otherwise, we create a link with an external destination URL. The linked text is colored blue.
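The anchor rules above can be restated as a short Java sketch. The class, enum and method names are hypothetical and only mirror the described decisions; they are not Aspose.Pdf APIs. The sketch assumes that an anchor without an href attribute is a named anchor, and models the <h1> special case as a choice of which element receives the ID.

```java
import java.util.Map;

/** Hypothetical restatement of the <a> handling rules; not an Aspose.Pdf API. */
final class AnchorSketch {
    enum Kind { NAMED, INTERNAL_LINK, EXTERNAL_LINK }

    private AnchorSketch() {
    }

    /** Classifies an anchor from its attributes (attribute names in lower case). */
    static Kind classify(Map<String, String> attrs) {
        String href = attrs.get("href");
        if (href == null) {
            // Assumption: an anchor without href is a named anchor that only carries an ID
            return Kind.NAMED;
        }
        // "#section" points inside the document; anything else is an external URL
        return href.startsWith("#") ? Kind.INTERNAL_LINK : Kind.EXTERNAL_LINK;
    }

    /** A named anchor followed by an <h1> puts its ID on the <h1> instead of itself. */
    static String idOwner(String nextElementTag) {
        return "h1".equalsIgnoreCase(nextElementTag) ? "h1" : "a";
    }
}
```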
<address>
An <address> element is rendered in italics.
<body>
If there is a title attribute, its value is displayed at the beginning as the PDF document title. If there is a text attribute, the inner text color is set to the color it specifies. The inner content is then processed recursively.
<b> <kbd> <strong>
For bold elements, we just change the font weight by setting IsTrueTypeFontBold to true.
<big>
The <big> element is handled with a relative font size, so a <big> element inside another <big> element is even bigger, just as it is in Html.
<blockquote>
A blockquote is indented on both sides: its MarginLeft and MarginRight are both set to 2cm.
<br>
A break element is handled by inserting a carriage return.
<center>
If you use <center>, it creates a new paragraph that's centered on the page.
<cite>
The <cite> element is rendered in italics by setting IsTrueTypeFontItalic to true.
<code>
<code> is meant to be rendered inline in a monospaced font, but Aspose.Pdf does not currently support a monospaced font for it.
<dl>
We don't do anything with the <dl> element itself; we just handle the elements it contains, and ignore any text that appears directly in the <dl>.
<dd>
We render it as indented.
<dt>
It is rendered as bold text.
<em> <i> <var>
These elements are rendered in italics by setting IsTrueTypeFontItalic to true.
<font>
For the <font> element, the color, face and size attributes are handled. The color works if it is one of the twelve colors supported by Aspose.Pdf or is given in RGB decimal format (in other words, if you set the color to #ffffff, you're out of luck). The face attribute works if Aspose.Pdf supports that font. The size attribute supports values such as size="14" (in points), size="+1" and size="-1", and is resolved as follows:
If the size attribute contains the string pt (for example, size="24pt"), the value is used as it is, with the "pt" unit removed.
If the size attribute begins with a plus or minus sign, such as size="+2" or size="-1", a relative font size is used. For example, a relative value of 1 is mapped to a font size increase of 2.
As a last resort, the font size is set to 12pt.
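The size rules can be sketched as a small resolver. This is a hypothetical illustration, not Aspose.Pdf code: it assumes each relative step changes the size by 2pt (reading the example above as "a step of 1 means 2pt larger"), treats a plain number such as "14" as points, and falls back to 12pt for anything it cannot parse.

```java
/** Hypothetical resolver for <font size="...">; names and the 2pt step are assumptions. */
final class FontSizeSketch {
    static final double DEFAULT_SIZE_PT = 12.0;
    static final double STEP_PT = 2.0; // assumed size change per relative step

    private FontSizeSketch() {
    }

    /** Resolves a non-null size attribute against the parent font size in points. */
    static double resolve(String size, double parentPt) {
        String s = size.trim();
        try {
            if (s.endsWith("pt")) {
                // "24pt": use the value as it is, without the unit
                return Double.parseDouble(s.substring(0, s.length() - 2).trim());
            }
            if (s.startsWith("+") || s.startsWith("-")) {
                // "+1" / "-1": relative to the parent size
                int steps = Integer.parseInt(s);
                return parentPt + steps * STEP_PT;
            }
            // "14": a plain number is taken as points
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            // Last resort: anything unrecognized becomes 12pt
            return DEFAULT_SIZE_PT;
        }
    }
}
```

Keeping the relative case tied to the parent size is what makes nested <big> or <small> elements grow or shrink cumulatively.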
<h1> <h2> <h3> <h4> <h5> <h6>
We render these tags as shown in the table below. Each heading element is converted to an Aspose.Pdf Heading element of the corresponding level, and an ID is generated for it.
HTML Tag | Font size | Heading Level | Other
<h1> | 24pt | 1 | Add a break before the text
<h2> | 20pt | 2 | None
<h3> | 18pt | 3 | None
<h4> | 16pt | 4 | None
<h5> | 14pt | 5 | None
<h6> | 12pt | 6 | Text is italicized
<hr>
We render an <hr> as a carriage return, so no horizontal rule is drawn. Because <hr> is an empty element, there are no child elements to process.
<img>
For the <img> element, we use the src attribute value as it comes from the Html. We also check for width and height attributes and, if they are present, use their numeric values: height="300px" is used as 300 with the unit dropped, and height="300" is used as it is.
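A minimal sketch of this width/height parsing, assuming a value is either a plain number or a number followed by px. The class name is hypothetical, and any other value (such as a percentage) is simply ignored here, which is an assumption rather than documented behavior.

```java
import java.util.OptionalDouble;

/** Hypothetical parser for <img> width/height values; not an Aspose.Pdf API. */
final class ImageSizeSketch {
    private ImageSizeSketch() {
    }

    static OptionalDouble parseDimension(String value) {
        if (value == null) {
            return OptionalDouble.empty(); // attribute not present
        }
        String s = value.trim();
        if (s.endsWith("px")) {
            s = s.substring(0, s.length() - 2).trim(); // drop the unit, keep the number
        }
        try {
            return OptionalDouble.of(Double.parseDouble(s));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty(); // unsupported form, e.g. "50%"
        }
    }
}
```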
<ol> <li>
We handle an ordered list with two complications. First, if the list appears inside another list (either an <ol> or a <ul>), we don't put any vertical space after it. Second, we indent the list according to how deeply it is nested: if there is no ancestor <ol> or <ul>, 2 spaces are added before the <li> text; otherwise, the number of spaces added before the <li> text is twice the number of ancestor <ol> or <ul> elements. The numbering of the <li> items restarts for each nested list.
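The indentation and numbering rules can be sketched as follows. The names are hypothetical, and the sketch assumes the ancestor count is taken for the <li> element (so an item of a top-level list has one <ol> ancestor and gets 2 spaces) and that labels look like "1. ", "2. ".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of ordered-list indentation and numbering; not an Aspose.Pdf API. */
final class ListIndentSketch {
    private ListIndentSketch() {
    }

    /** Spaces before the <li> text: 2 with no list ancestor, otherwise ancestors * 2. */
    static int indentSpaces(int listAncestors) {
        return Math.max(1, listAncestors) * 2;
    }

    /** Renders the items of one <ol>; numbering starts again at 1 for every list. */
    static List<String> renderOrdered(List<String> items, int listAncestors) {
        String indent = " ".repeat(indentSpaces(listAncestors));
        List<String> lines = new ArrayList<>();
        int number = 1; // restarted for each (possibly nested) list
        for (String item : items) {
            lines.add(indent + number + ". " + item);
            number++;
        }
        return lines;
    }
}
```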
<p>
It is rendered as a basic paragraph.
<pre>
Pre-formatted text is rendered in a monospaced font; its inner text is copied unchanged.
<samp>
Sample text is rendered in a slightly larger monospaced font (as with size="+1").
<small>
The <small> element is rendered with a relative font size, so putting one <small> element inside another creates really small text. The font size is converted the same way as for the <big> tag.
<strike>
For strikethrough text, we set the text decoration property IsStrikeOut to true.
<sub>
For subscript text, we use the IsBaseline = "true" property and decrease the font size to 5.
<sup>
For superscript text, we use the IsBaseline = "false" property and decrease the font size to 5.
<table>
Tables are a hassle. The main problem is converting the cols attribute into the ColumnWidths attribute of the <Table> element. If the <table> tag has a cols attribute whose value is larger than 9, we use that value as the ColumnWidths attribute value as it is. If its value is smaller than 9, we take it as the number of columns and evaluate the average column width as:
column width = (page width - page margin left - page margin right) / @cols
where @cols is the value of the cols attribute, that is, the number of columns.
If the <table> tag has no cols attribute, we evaluate the ColumnWidths attribute value as:
column width = (page width - page margin left - page margin right) / column number
The column number is computed from the <td> children of the first <tr>: the cells that have no colspan (column span) attribute, together with the cells that do have one. We also process the <caption> child of <table>, but only to display it.
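The column-width rules can be put together in a Java sketch. It is an illustration under stated assumptions, not Aspose.Pdf code: all lengths share one unit, a cols value of exactly 9 is treated like the "smaller than 9" case, a cell with a colspan counts as that many columns (a null entry means no colspan), and the ColumnWidths value is written as space-separated widths.

```java
import java.util.List;

/** Hypothetical sketch of the ColumnWidths computation; names are illustrative. */
final class ColumnWidthSketch {
    private ColumnWidthSketch() {
    }

    /** Column count from the first <tr>: 1 per cell without colspan, else its colspan. */
    static int columnCount(List<Integer> firstRowColspans) {
        int count = 0;
        for (Integer span : firstRowColspans) {
            count += (span == null) ? 1 : span;
        }
        return count;
    }

    /** (page width - page margin left - page margin right) / columns */
    static double averageWidth(double pageWidth, double marginLeft, double marginRight, int columns) {
        return (pageWidth - marginLeft - marginRight) / columns;
    }

    /** cols above 9 is passed through; otherwise equal widths for cols or the counted columns. */
    static String columnWidths(Integer cols, List<Integer> firstRowColspans,
                               double pageWidth, double marginLeft, double marginRight) {
        if (cols != null && cols > 9) {
            return String.valueOf(cols);
        }
        int columns = (cols != null) ? cols : columnCount(firstRowColspans);
        double width = averageWidth(pageWidth, marginLeft, marginRight, columns);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(width);
        }
        return sb.toString();
    }
}
```

For example, a 500-unit-wide page with 50-unit margins and no cols attribute, whose first row has a plain cell, a colspan="2" cell and another plain cell, yields four columns of 100 units each.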
<tfoot> <thead> <tbody>
These elements are processed only for their child <tr> tags.
<td>
We render the <td> as a Cell in a Row of a Table. Each Cell has the attributes PaddingTop = "3", PaddingBottom = "3", PaddingLeft = "3" and PaddingRight = "3". If there is a border attribute, we also generate a Border element in the Cell; the border is black and has a line width of 1. If the <td> has a valign attribute, the VerticalAlignment of the Cell is set accordingly.
Note: Aspose.Pdf doesn’t process the align attribute yet.
<th>
We render the <th> as a Cell in a Row of a Table too. Each Cell has the attributes PaddingTop = "3", PaddingBottom = "3", PaddingLeft = "3" and PaddingRight = "3". If there is a border attribute, we also generate a Border element in the Cell; the border is black and has a line width of 1. If the <th> has a valign attribute, the VerticalAlignment of the Cell is set accordingly.
Note: Aspose.Pdf doesn’t process the align attribute yet.
<tr>
We render the <tr> as a Row in a <Table>.
<tt>
Teletype text is meant to be rendered in a monospaced font, but Aspose.Pdf does not yet apply a monospaced font to it.
<u>
For underlined text, we use the text decoration property: IsUnderline = "true".
<ul> <li>
List items inside unordered lists are easy to handle: we just use the Unicode character • for the bullet. The indentation before the <li> text is the same as for <ol>.
Convert Aspx to PDF
[C#]
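using System.IO;
using System.Net;
// In the code-behind of the Aspx page: download the page's rendered Html and convert it
Aspose.Pdf.Pdf pdf = new Aspose.Pdf.Pdf();
string surl = this.Request.Url.ToString();
WebClient webClient = new WebClient();
byte[] myDataBuffer = webClient.DownloadData(surl);
MemoryStream postStream = new MemoryStream(myDataBuffer);
pdf.BindHTML(postStream);
pdf.Save(@"c:\test.pdf");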
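[VB.NET]
Imports System.IO
Imports System.Net
' In the code-behind of the Aspx page: download the page's rendered Html and convert it
Dim pdf As Aspose.Pdf.Pdf = New Aspose.Pdf.Pdf()
Dim surl As String = Request.Url.ToString()
Dim webClient As WebClient = New WebClient()
Dim myDataBuffer As Byte() = webClient.DownloadData(surl)
Dim postStream As MemoryStream = New MemoryStream(myDataBuffer)
pdf.BindHTML(postStream)
pdf.Save("c:\test.pdf")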