PDF Viewer

Introduction

This article discusses how to create a .NET PDF Viewer control that is not dependent on Acrobat
software being installed.

Fundamental Concepts
The basic steps that need to take place in order to view a PDF document:

1. Get a page count of the PDF document that needs to be viewed to define your page
number boundaries (iTextSharp or PDFLibNET)
2. Convert the PDF document (specific page on demand) to a raster image format
(Ghostscript API or PDFLibNET)
3. (Deprecated) Extract only the current frame to be viewed from the raster image
(FreeImage.NET)
4. Convert the current frame to be viewed into a System.Image
5. Display the current frame in a PictureBox control

Several utility classes were created or added from others which expose functionality needed from
the various helper libraries.

• GhostScriptLib.vb (contains methods to convert PDF to TIFF for viewing and printing)
• AFPDFLibUtil.vb (contains methods to convert PDF to System.Image for viewing and printing, as well as methods to create a bookmark TreeView)
• iTextSharpUtil.vb (contains methods for getting the PDF page count, converting images to searchable PDF, and extracting PDF bookmarks into TreeNodes)
• PrinterUtil.vb (contains methods for sending images to printers)
• ImageUtil.vb (contains methods for image manipulation such as resize, rotation, conversion, etc.)
• TesseractOCR.vb (contains methods for Optical Character Recognition from images)
• PDFViewer.vb (contains the Viewer user control)

I was tempted to move every function over to PDFLibNET (XPDF), which is faster, but after a lot
of testing I decided to use both Ghostscript and PDFLibNET. Ghostscript is used for printing, "PDF
to image" conversion, and as a secondary renderer in case of XPDF incompatibility.
PDFLibNET is used for quick PDF-to-screen rendering, searching, and bookmarks.

Using the Code


This project consists of 7 DLLs that must all be in the same directory:

• FreeImage.dll
• FreeImageNET.dll
• gsdll32.dll
• itextsharp.dll
• PDFLibNET.dll
• tessnet2_32.dll
• PDFView.dll

Due to file size restrictions, I could not include the Ghostscript 8.64 DLL (gsdll32.dll) in the
source code. Please download the Win32 Ghostscript 8.64 package from sourceforge.net and
place the file "gsdll32.dll" into the \PDFView\lib directory where the other DLLs already exist.

To place a PDF viewer control on a form:

Dim PDFFileName As String = "MyPDF.pdf"
Dim PDFViewer As New PDFView.PDFViewer

' Specify whether you want to see bookmarks in the control
' Bookmarks are enabled by default
' PDFViewer.AllowBookmarks = False 'Disable bookmarks

' Get the page count of the PDF document if you want to
' conditionally set properties of the PDFViewer control
' Dim PageCount As Integer = PDFViewer.PageCount(PDFFileName)

' To use Ghostscript, set UseXPDF = False
' Ghostscript is slower, but is more compatible and has higher quality rendering
' To use XPDF, set UseXPDF = True
' XPDF is quite a bit faster than Ghostscript since there is no file I/O involved
' PDFViewer.UseXPDF = False 'Disables use of XPDF and associated features

' PDFViewer displays the file as soon as the FileName property is set
' File can be a PDF or a TIFF
PDFViewer.FileName = PDFFileName

PDFViewer.Dock = DockStyle.Fill 'Autosize the viewer control

Me.Controls.Add(PDFViewer)

The essential part of this solution is extracting the current frame to be viewed from a multi-frame
(or single frame) image. At first I used System.Drawing to implement it. I found this to be
slower than other C++ solutions that use DIBs (Device Independent Bitmaps) to perform graphic
conversions.

' Requires: Imports System.IO, System.Drawing, System.Drawing.Drawing2D, System.Drawing.Imaging
Public Shared Function GetFrameFromTiff(ByVal Filename As String, _
    ByVal FrameNumber As Integer) As Image
    ' The stream must stay open for as long as the source bitmap is in use
    Using fs As FileStream = File.Open(Filename, FileMode.Open, FileAccess.Read)
        Using bm As System.Drawing.Bitmap = _
            CType(System.Drawing.Bitmap.FromStream(fs), System.Drawing.Bitmap)
            ' Select the requested page of the multi-frame TIFF
            bm.SelectActiveFrame(FrameDimension.Page, FrameNumber)
            ' Copy the frame into a new bitmap that does not depend on the stream
            Dim temp As New System.Drawing.Bitmap(bm.Width, bm.Height)
            Using g As Graphics = Graphics.FromImage(temp)
                g.InterpolationMode = InterpolationMode.NearestNeighbor
                g.DrawImage(bm, 0, 0, bm.Width, bm.Height)
            End Using
            Return temp
        End Using
    End Using
End Function
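
The function above does not validate FrameNumber, so a caller may want to read the frame count first and keep the index in range. The following is only a minimal sketch using System.Drawing; GetTiffFrameCount and ClampFrame are hypothetical helper names and are not part of the project's ImageUtil class.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging

Public Module TiffFrameSketch
    ' Hypothetical helper: returns the number of pages (frames) in a TIFF file.
    Public Function GetTiffFrameCount(ByVal fileName As String) As Integer
        Using img As Image = Image.FromFile(fileName)
            Return img.GetFrameCount(FrameDimension.Page)
        End Using
    End Function

    ' Hypothetical helper: keeps a zero-based frame index inside 0 .. frameCount - 1.
    Public Function ClampFrame(ByVal requested As Integer, ByVal frameCount As Integer) As Integer
        If frameCount <= 0 Then Return 0
        Return Math.Max(0, Math.Min(requested, frameCount - 1))
    End Function
End Module
```

A caller could then pass ClampFrame(requestedFrame, GetTiffFrameCount(path)) as the frame number instead of the raw value.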

I then tried implementing FreeImage with a .NET wrapper which gave it a little speed boost.
FreeImage also has a ton of image conversion functions which may come in handy if you
wanted to extend this into an editor.

Public Shared Function GetFrameFromTiff2(ByVal Filename As String, _
    ByVal FrameNumber As Integer) As Image
    ' Open the multi-page file and lock the requested page
    Dim dib As FIMULTIBITMAP = FreeImage.OpenMultiBitmapEx(Filename)
    Dim page As FIBITMAP = FreeImage.LockPage(dib, FrameNumber)
    ' Convert the FreeImage page into a .NET bitmap
    GetFrameFromTiff2 = FreeImage.GetBitmap(page)
    page.SetNull()
    FreeImage.CloseMultiBitmapEx(dib)
End Function

I ended up implementing PDFLibNET, which gave it a substantial speed boost since the number of
file I/O operations was reduced. Another streamlined routine for extracting one page from a
PDF was added to the Ghostscript utility class as well.

AFPDFLibUtil.vb

Public Shared Sub DrawImageFromPDF(ByRef pdfDoc As AFPDFLibNET.AFPDFDoc, _
    ByVal PageNumber As Integer, ByRef oPictureBox As PictureBox)
    If pdfDoc IsNot Nothing Then
        pdfDoc.CurrentPage = PageNumber
        pdfDoc.CurrentX = 0
        pdfDoc.CurrentY = 0
        pdfDoc.RenderDPI = RENDER_DPI
        pdfDoc.RenderPage(oPictureBox.Handle.ToInt32())
        oPictureBox.Image = Render(pdfDoc)
    End If
End Sub

Public Shared Function Render(ByRef pdfDoc As AFPDFLibNET.AFPDFDoc) As Bitmap
    If pdfDoc IsNot Nothing Then
        Dim backbuffer As New Bitmap(pdfDoc.PageWidth, pdfDoc.PageHeight)
        ' Using disposes the Graphics object, so no separate Dispose call is needed
        Using g As Graphics = Graphics.FromImage(backbuffer)
            Dim lhdc As Integer = g.GetHdc().ToInt32()
            pdfDoc.RenderHDC(lhdc)
            g.ReleaseHdc()
        End Using
        Return backbuffer
    End If
    Return Nothing
End Function

GhostScriptLib.vb

Public Shared Function GetPageFromPDF(ByVal filename As String, _
    ByVal PageNumber As Integer, Optional ByVal ToPrinter As Boolean = False) As Image
    Dim converter As New ConvertPDF.PDFConvert
    Dim Converted As Boolean = False
    converter.RenderingThreads = Environment.ProcessorCount
    converter.OutputToMultipleFile = False
    If PageNumber > 0 Then
        converter.FirstPageToConvert = PageNumber
        converter.LastPageToConvert = PageNumber
    Else
        GetPageFromPDF = Nothing
        Exit Function
    End If
    converter.FitPage = False
    converter.JPEGQuality = 70
    If ToPrinter = True Then 'Settings for decent print quality
        converter.TextAlphaBit = -1
        converter.GraphicsAlphaBit = -1
        converter.ResolutionX = PRINT_DPI
        converter.ResolutionY = PRINT_DPI
    Else 'Settings for screen resolution
        converter.TextAlphaBit = 4
        converter.GraphicsAlphaBit = 4
        converter.ResolutionX = VIEW_DPI
        converter.ResolutionY = VIEW_DPI
    End If
    converter.OutputFormat = COLOR_PNG_RGB
    Dim input As System.IO.FileInfo = New FileInfo(filename)
    Dim output As String = System.IO.Path.GetTempPath & Now.Ticks & ".png"

    Converted = converter.Convert(input.FullName, output)

    If Converted Then
        GetPageFromPDF = New Bitmap(output)
        ImageUtil.DeleteFile(output)
    Else
        GetPageFromPDF = Nothing
    End If
End Function
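
For readers who want to see what a single-page conversion looks like at the Ghostscript level, here is a rough sketch of the equivalent command-line switches. It is an illustration only: BuildSinglePageArgs is a hypothetical helper, and the assumption that COLOR_PNG_RGB corresponds to the png16m device and that the wrapper's properties map onto these switches is mine, not something the project's source confirms.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module GhostscriptArgsSketch
    ' Hypothetical helper: builds switches that render one page to a 24-bit PNG.
    Public Function BuildSinglePageArgs(ByVal pdfPath As String, ByVal outputPath As String, _
        ByVal pageNumber As Integer, ByVal dpi As Integer) As String
        Dim args As New List(Of String)
        args.Add("-dNOPAUSE")                              ' do not pause between pages
        args.Add("-dBATCH")                                ' exit when processing is done
        args.Add("-sDEVICE=png16m")                        ' assumed device for RGB PNG output
        args.Add("-r" & dpi.ToString())                    ' resolution in dots per inch
        args.Add("-dFirstPage=" & pageNumber.ToString())   ' first page to render
        args.Add("-dLastPage=" & pageNumber.ToString())    ' last page to render (same page)
        args.Add("-dTextAlphaBits=4")                      ' text anti-aliasing
        args.Add("-dGraphicsAlphaBits=4")                  ' graphics anti-aliasing
        args.Add("-sOutputFile=""" & outputPath & """")    ' quoted in case the path has spaces
        args.Add("""" & pdfPath & """")                    ' input file goes last
        Return String.Join(" ", args.ToArray())
    End Function
End Module
```

The resulting string could be handed to the Ghostscript console executable (gswin32c.exe in the Win32 package) through System.Diagnostics.Process.Start, which is a different route from the in-process gsdll32.dll calls this project uses.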

In the PDFViewer code, a page number is specified and:

• The page is loaded from the PDF file and converted to a System.Image object.
• The PictureBox is updated with the image.

Private Function ShowImageFromFile(ByVal sFileName As String, _
    ByVal iFrameNumber As Integer, ByRef oPictureBox As PictureBox, _
    Optional ByVal XPDFDPI As Integer = 0) As Image
    oPictureBox.Invalidate()
    If mUseXPDF Then 'Use AFPDFLib (XPDF)
        If ImageUtil.IsPDF(sFileName) Then
            If XPDFDPI > 0 Then
                AFPDFLibUtil.DrawImageFromPDF(mPDFDoc, iFrameNumber + 1, _
                    oPictureBox, XPDFDPI)
            Else
                AFPDFLibUtil.DrawImageFromPDF(mPDFDoc, iFrameNumber + 1, _
                    oPictureBox)
            End If
        End If
    Else 'Use Ghostscript if PDF or use System.Drawing if TIFF
        If ImageUtil.IsPDF(sFileName) Then 'convert one frame to a tiff for viewing
            oPictureBox.Image = _
                ConvertPDF.PDFConvert.GetPageFromPDF(sFileName, iFrameNumber + 1)
        ElseIf ImageUtil.IsTiff(sFileName) Then
            oPictureBox.Image = ImageUtil.GetFrameFromTiff(sFileName, iFrameNumber)
        End If
    End If
    oPictureBox.Update()
    Return oPictureBox.Image
End Function
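
Each page change assigns a new image to the PictureBox, and bitmaps hold unmanaged GDI+ resources until they are disposed. One possible way to release the previous page promptly is sketched below. ReplaceImage is a hypothetical helper, and whether the actual control already does something similar is not shown in this article.

```vbnet
Imports System.Drawing
Imports System.Windows.Forms

Public Module PictureBoxSketch
    ' Hypothetical helper: swaps in a new page image and releases the previous one.
    Public Sub ReplaceImage(ByVal box As PictureBox, ByVal newImage As Image)
        Dim oldImage As Image = box.Image
        box.Image = newImage
        ' Dispose the old image only if it is a different object from the new one
        If oldImage IsNot Nothing AndAlso oldImage IsNot newImage Then
            oldImage.Dispose()
        End If
    End Sub
End Module
```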

Points of Interest
This project was made possible due to various open source libraries that others were kind enough
to distribute freely. I would like to thank all of the Ghostscript, FreeImage.NET, iTextSharp,
TessNet, and AFPDFLib (PDFLibNet) developers for their efforts.

History
• 19th June, 2009: 1.0 Initial release
• 22nd June, 2009: Updated source code to correctly scale printed pages to the Printable Page Area of the printer that is selected
• 7th July, 2009: Updated source code to use AFPDFLib (XPDF) or Ghostscript for PDF rendering
• 15th July, 2009: Updated source code to use PDFLibNet (XPDF ver 3.02pl3) and added search/export options
• 22nd July, 2009: Added "Image to PDF" import, password prompt for encrypted PDF files, fallback rendering to Ghostscript if XPDF fails, latest version of PDFLibNet with various bug fixes applied, and LZW compression for "PDF to TIFF" export
• 20th August, 2009: Major changes:
    ◦ Added the ability to convert images into a searchable PDF (OCR is English only for now)
    ◦ Added the ability to export a PDF to an HTML Image Viewer
    ◦ Pages are only rendered at the DPI needed to fill the Viewer window (good speed increase)
    ◦ Rotated page settings are kept while viewing the document
    ◦ Added the ability to convert images into an encrypted PDF
    ◦ Changed bookmark tree generation to use recursion
    ◦ Multiple bug fixes (see SVN log on the repository)
• 5th October, 2009:
    ◦ Fixed problem with incorrect configuration error with PDFLibNet.dll
    ◦ Removed dependencies on FreeImage