Update: I updated the links again. pdf.js has moved to a new location on github.
Why?
While traveling to the Firefox 4 launch parties in Seoul and Taipei all the way from California, we killed a lot of time by brainstorming cool things to do with the web platform. Like many before us, we were wondering why nobody had implemented a PDF reader in HTML5/JavaScript. The kinds of operations a PDF reader needs to be fast at (rendering text, drawing lines, blitting images) need to be fast in browsers too, so browsers are already highly optimized for them.
Building an HTML5-based PDF renderer would also answer the question of whether the web platform and in particular canvas and SVG APIs are complete enough to efficiently and faithfully render PDFs.
Displaying PDFs directly in the browser would definitely improve the user’s experience. There are literally millions (billions?) of PDFs floating around the web, and on many devices loading PDFs switches to a different application (e.g. Preview on OS X and PDF View on Android). Also, external PDF readers and many plugins don’t support important PDF features well, including content links and fetch-as-you-go (HTTP range requests).
External readers and plugins are also forced to reinvent their own user interaction paradigms, meaning for example that users might scroll HTML pages in one way with one set of heuristics in the browser, but a totally different way in an external PDF reader.
It’s important to note that we’re not trying to promote PDF to a first-class web citizen like HTML5 is. Instead we hope that a browser-native PDF renderer written on the web platform allows web technologies to subsume PDF.
Benefits
The traditional approach to rendering PDFs in a browser is to use a native-code plugin, either Adobe’s own PDF Reader or other commercial renderers, or some open source alternative (e.g. poppler). From a security perspective, this enlarges the trusted code base, and because of that Google’s Chrome browser goes through quite some pain to sandbox the PDF renderer to avoid code injection attacks. An HTML5-based implementation is completely immune to this class of problems.
Project Status
We have been developing pdf.js in the open (on github.com), albeit quietly, for about a month now. We were waiting on the completion of some major features (Type1 fonts, gradients, etc.) before communicating pdf.js more broadly. We’ve been taken by surprise by the early and intense interest in our work, so we decided to blog and talk about our project earlier than we initially planned.
As part of our project plan, we are initially focused on achieving pixel-perfect rendering of a single PDF, a 2009 paper on trace compilation we submitted to the ACM SIGPLAN PLDI conference. Just as the TraceMonkey work described in the paper led the way for JavaScript JITs, we hope pdf.js opens the door to implementing legacy formats on top of the web platform.
If you want to see a demo of pdf.js, click on this link. There are still glitches and rendering artifacts, but you will get the picture. We are still missing Type1 PostScript fonts, which Vivien Nicolas is working on.
Along the way, we had to add some new interfaces to the HTML5 canvas element, and figure out how to implement some difficult features of the PDF spec in JavaScript. See Chris’s post for a general technological overview, and Shaon’s post for details on rendering “shading patterns”.
What’s next?
We intend to use pdf.js to render PDFs “natively”, within Firefox itself. Our most immediate goal is to implement the most commonly used PDF features so we can render a large majority of the PDFs found on the web. We believe we can reach that point in less than 3 months (the entire code so far is less than one month old, and it already renders a large set of PDF features).
Initially we will make a Firefox extension available to interested users that enables inline PDF rendering using pdf.js, but our ultimate goal is of course shipping pdf.js with Firefox. This will result in a substantial improvement in both usability and security for our users. pdf.js uses only safe web languages and doesn’t contain any native code pieces attackers could exploit.
Open Source
We want pdf.js to be a community driven and governed open-source project. We’ll use it for Firefox, but we think there are many cool applications for it. We would love to see it embedded in other browsers or web applications; because it’s written only in standards-compliant web technologies, the code will run in any compliant browser. We are licensing pdf.js under a very liberal 3-clause BSD license and we welcome external contributors. We are looking forward to your ideas or code to make pdf.js better! Take a look at our github and our wiki, or talk to us on IRC in #pdfjs.
Chris Jones and Andreas Gal (and the pdf.js team)
OK This is very cool.
I’m evaluating the potential of client-side JavaScript as a natural language processing platform, and what would be especially useful would be to have client-side PDF-to-text functionality. I think that (and the general lack of JavaScript libraries for NLP) are the two main blocks on the road to getting this stuff started. My JavaScript chops aren’t terribly good, but as a Perl hacker of minor accomplishments, my ability to tape existing libraries together is second to none :). On the other hand I am trying to port some of the good Perl NLP tools over to JS right now.
It’s been done before: http://blogs.gnome.org/alexl/2011/03/15/gtk-html-backend-update/
Watch for the evince part of the demo.
That is not the same thing. That is using an X11 windowing system running a PDF application to render to an HTML 5 canvas, rather than to the screen. It requires significant server-side resources and is inappropriate given the goals of this project. Broadway is an intriguing project in its own right, but cannot be included by default into any browser.
A lot of people have misinterpreted that same post. Various other erroneous assertions have been made about how much of a “game changer” it is.
It’s a very interesting project but its possible uses are limited by performance. For things like remote access or testing apps before downloading it’s awesome. However it has very little relevance to client-side rendering of PDF files.
Interesting idea and project, I’ll follow its evolutions.
I just find it annoying that the PDF document renders as an image in the canvas element. It doesn’t allow plain text search or selection/copy/paste of text.
Canvas is our first backend. We are planning an SVG backend which will allow search, text selection and accessibility (it’s also a bit slower, so we will probably render canvas and then build an SVG DOM on demand). Chris’ blog post explains this in some detail.
And also provide better accessibility – though I just heard mo are working on remaining Canvas a11y bugs – ftw
I understand the security benefits here, but surely shipping a native PDF reader within firefox would allow quicker rendering (which is what I want as a user).
Actually I am not so sure about that. Canvas is highly optimized (using OpenGL and GPU acceleration), whereas most plugins use CPU-based pixel pushing. The first page of the document currently renders in 60ms on my machine. Acrobat Reader takes almost 2 seconds to open. So I think we can come up with something pretty competitive here when it comes to performance.
Canvas doesn’t use OpenGL for anything but composition into the page (yet). It only uses GPU acceleration when Firefox uses Direct2D, i.e., on Windows Vista and 7.
Nothing is being shown on the demo page with Safari, just the grey background. When I test with chrome it’s ok.
Very nice idea!
Yes, current Safari seems to not support typed arrays (trunk does I believe).
Same problem in the new Safari on 10.7 Lion DP4.
Hello,
I cannot say it strong enough how important this is. Thank you.
And as practically all print/press work revolves around PDF/X-1a:2001, please, are you considering supporting it?
We are aiming to support all frequently used PDF features. I haven’t looked at the PDF/X spec yet, but it’s definitely something that is within the scope of our project.
Cool!
PDF/X-1a is PDF 1.3 (i.e. no transparency), no RGB (just CMYK, spot colors and gray-scale) and mandatory ICC profiles and fully embedded fonts (i.e. no substitutions).
In other words, the aim of “1a” is for a print to come out as close to the author’s intent as possible (e.g. no color shifts and no wrongly substituted or dropped characters).
Hey, This is Tim from Mozilla Taiwan.
Rendering PDF in Javascript is extremely interesting, and I am impressed that you guys can make things work within a month. Should invite you guys to Asia more :P
Hey Tim. Yes, our Asia trip was clearly very fruitful :) See you at the next Firefox community party, maybe :)
Sorry for the stupid question, but how (in one high level sentence) in HTML5+JS are you reading the binary PDF file?
We are using XMLHttpRequest and have it return an array of bytes which we then parse in JS.
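As a minimal sketch of what "parse an array of bytes in JS" can look like, the snippet below reads the version out of the `%PDF-x.y` header that every PDF file starts with. The helper name `readPdfVersion` and the sample bytes are assumptions made for illustration, not code from pdf.js; the commented-out XMLHttpRequest lines show one way the bytes could arrive in a browser and are not executed here.

```javascript
// Hypothetical helper (not from pdf.js): returns the version string from
// the "%PDF-x.y" header, or null if the bytes do not start with "%PDF-".
function readPdfVersion(bytes) {
  const magic = '%PDF-';
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic.charCodeAt(i)) return null;
  }
  let version = '';
  for (let i = magic.length; i < bytes.length; i++) {
    const c = bytes[i];
    if (c === 0x0a || c === 0x0d) break; // stop at the end of the header line
    version += String.fromCharCode(c);
  }
  return version;
}

// In a browser the bytes could come from XMLHttpRequest (not run here):
//   const xhr = new XMLHttpRequest();
//   xhr.open('GET', 'paper.pdf');          // 'paper.pdf' is a placeholder URL
//   xhr.responseType = 'arraybuffer';
//   xhr.onload = () => readPdfVersion(new Uint8Array(xhr.response));
//   xhr.send();

// Sample bytes spelling "%PDF-1.4" followed by a newline.
const sample = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34, 0x0a]);
console.log(readPdfVersion(sample));
```

Everything after the header (objects, cross-reference table, streams) is parsed from the same byte array in the same spirit, just with far more cases to handle.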
“Building an HTML5-based PDF renderer would also answer the question of whether the web platform and in particular canvas and SVG APIs are complete enough to efficiently and faithfully render PDFs.”
Hasn’t Scribd already answered this question? True, we don’t know how they do it, since their conversion runs server-side, but they do output something that is HTML5, according to http://en.wikipedia.org/wiki/Scribd#Technology.
http://crocodoc.com/ seems to do server-side PDF conversion and HTML rendering as well. We are looking for maximum visual quality, anti-aliasing, sub-pixel font rendering etc. We already ran into a few missing APIs in canvas (specific gradients, dashed lines), so we do expect to push the envelope on canvas and we will propose some small additions to the standard.
Cool … I guess stuff like the Type1 font support will be available as a separate library?
Yes. The Type1 support seems worthy of a stand-alone library.
I love the goals of your project. Unfortunately the text is too blurry to be comfortably read at this point (rendered in Chrome 12, various zoom levels). I hope you are able to improve that aspect. Good work and good luck!
Cool! Quick question, I figure the answer is no but I’m interested in your ideas on this, will it be possible to only render the pages requested and not have to download the whole file to the user’s browser? There’s a lot of BIG pdfs out there and the advantage to having them processed on the server side is that you can use an image viewer like BookReader.js to just load the pages the user navigates to. I suppose the work on this project could be implemented to render svg from a pdf using a node.js server and then delivered on request to the browser…
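The post mentions fetch-as-you-go via HTTP range requests, which is the standard mechanism for downloading only part of a file. As a rough sketch, a client asks for a byte span with a `Range` header, and a server that supports ranges answers with status 206 and just those bytes. The helper name `byteRangeHeader` is hypothetical, and whether pdf.js will fetch pages this way is not stated in this thread.

```javascript
// Hypothetical helper: builds the request header for `length` bytes
// starting at `offset`. The end position in a Range header is inclusive.
function byteRangeHeader(offset, length) {
  if (!Number.isInteger(offset) || !Number.isInteger(length) ||
      offset < 0 || length <= 0) {
    throw new RangeError('offset must be >= 0 and length must be > 0');
  }
  return { Range: `bytes=${offset}-${offset + length - 1}` };
}

// Header for the first KiB of a file.
console.log(byteRangeHeader(0, 1024));
```

A viewer would still need the PDF's cross-reference table (stored near the end of the file) to know which byte spans belong to which page, so this is only the transport half of the problem.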
I absolutely love this project and hope it does well. The only issue I see is that it currently does not work in IE 9. I know, most developers don’t like IE however the majority of companies use IE. If this can work in IE 9+ then I could see this as a huge winner with a lot more appeal. Are y’all going to be working on getting it to work with IE 9+?
That’s probably a question you should ask Dean Hachamovitch. IE9 doesn’t implement typed arrays I believe, which is probably why things don’t work. I heard some rumors this will change with IE10. IE9 is a pretty small market share (5%?) so we aren’t too worried about that. They will catch up with the HTML5 spec eventually.
Too bad typed arrays are nowhere in the HTML5 specs. They’re in WebGL, and not native to HTML5 per se. On the other side so called image data arrays are present in all HTML5 browsers, as they’re actually native to any browser supporting Canvas. If you would use that instead of the actually non-HTML5-standard typed arrays, your library would run in IE9 too.
So much about Mozilla and their projects being standards compliant.
WebGL is an open standard. Every relevant modern browser implements it. Microsoft has the choice to get with the program, or be left behind. That said, we are happy to take a patch that makes pdf.js work without typed arrays. It’s probably not a lot of work.
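A patch of the kind described here would probably start with feature detection. The following is only a sketch under that assumption: `makeByteBuffer` is a hypothetical name, and the second parameter exists purely so the fallback path can be exercised in an environment that does have typed arrays.

```javascript
// Hypothetical helper: allocates a zero-filled byte buffer, using a typed
// array when available and a plain Array otherwise. Pass null as the
// second argument to force the plain-Array fallback.
function makeByteBuffer(
  length,
  TypedCtor = typeof Uint8Array !== 'undefined' ? Uint8Array : null
) {
  if (TypedCtor) return new TypedCtor(length);
  const buf = new Array(length);
  for (let i = 0; i < length; i++) buf[i] = 0;
  return buf;
}

const fast = makeByteBuffer(4);           // Uint8Array where supported
const fallback = makeByteBuffer(4, null); // plain Array of zeros
console.log(fast.length, fallback.length);
```

One caveat with the fallback: a plain Array does not wrap values to 0..255 on assignment, so code relying on that behavior would need to mask explicitly (for example with `value & 0xff`).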
With “every relevant browser” you obviously mean Chrome and Firefox, which even together only own about 1/3 of the browser market. IE, which owns 2.5 times the share of Firefox and 4.5 times that of Chrome (source: http://marketshare.hitslink.com/browser-market-share.aspx?qprid=0), is obviously considered non-relevant by you. Interesting redefinition of the word “relevant”.
Also, Mozilla was always the first to blame web incompatibility issues on IE’s proprietary extensions to HTML and JavaScript. Firefox was always free to implement those extensions but never did, because “they were not part of the official (W3C) specs”. Now that the tables have turned (proprietary extensions are being introduced on top of the official specs), you obviously no longer have a problem with providing them, or even promoting them in your own projects, even though they lead to the same type of incompatibility issues (which, by the way, Microsoft worked so hard to eliminate with IE9).
Hypocrisy much?
@ff2:
Seriously: Leave these guys alone. They are building something cool here.
It’s their project, and they are free to drive wherever they want.
They probably chose the implementation details for a reason and not to punish you. Microsoft is sliding into irrelevance anyway. IE9 wasn’t adopted and probably never will be, so maybe MS manages to push IE 10 (including support for the feature needed here) to more than a couple of their users…
Once this project matures it will be an inspiration for similar projects anyway, so you’ll benefit from it in any case…
Short version: Shut up! ;-)
Could you please set up a Twitter account for this project and give updates?
We already have that in place! You can follow @pdfjs.
@pdfjs carries the commit log. It’d be helpful to have a feed for major updates, etc…
HA HA HA … another open source project that was done where I will take the code and use it in a consulting gig to make thousands of dollars and no one who made the code will get a dime … I LOVE OPEN SOURCE
You are welcome to use our source. We appreciate it if you contribute changes back to us, but it’s not required. Good luck with your project!
>It’s important to note that we’re not trying to promote PDF to a first-class web citizen like HTML5 is.
Actually, PDF is referenced numerous times in the normative portion of the HTML5 specification – thus by definition it ALREADY IS a first class web citizen in the same way that JPEG, PNG and GIF are.
Where’s the donate (buy me a beer) link? This effort deserves it!
https://donate.mozilla.org/page/contribute/openwebfund
Would this allow PDFs to be more easily indexed by Google?
No, I think for that server-side PDF decoding works best.
Are there any plans for defining an API to produce PDFs in JavaScript too? I’d imagine that a lot of the code could be shared, and such functionality would be highly useful. Perhaps it could be written in a way that is not browser dependent, making it work in other environments, such as Node too.
I am aware of jsPDF, but its support of PDF features is very basic, and development seems to have come to a halt:
http://code.google.com/p/jspdf/
I think there are already a few JS libraries for that. One of them is called jspdf.
Have you tried PDFKit? I don’t know how much work it’d take to get it running in the browser, but that’s an actively developed Node.js PDF-generation library that seems pretty powerful.
Oops. Forgot the link.
http://devongovett.github.com/pdfkit/
I like the idea, but I think that this will stay just that – a demo. Developing a full real-world PDF viewer in JS is not practical due to:
a) complexity,
b) accuracy,
c) performance issues, and
d) browser incompatibilities.
PDF supports stuff such as JBIG2, JPEG2000, CCITTFax, AES, RC2, Type1, CID, CFF, MMaster, 14 Blend Modes (not supported by canvas), Trans groups, ICC profiles, huge images, huge files, Tensor/Coons/Gouraud and even radial shading, etc. Then there is text selection and a gazillion other issues that I can’t think of off the top of my head.
There are several C++ implementations of PDF renderers out there. I think we have shown that JS performs roughly as well as C++ for this task, so the rest is hard work and software engineering.
I don’t think you did :) A blank page would render even faster! At the same time I will not try to disprove you since you will have to find out for yourself. In any case I fully support your project!
how is it different from http://trapeze.xyrka.com/ ?
I just happened to stumble on this on github today and I must say that I am really excited by where this project is in only a month of development, and even more excited by where it may be in a few more. Keep up the great work and know you have a bunch of positive support behind you!
Looks great so far, our ear is definitely to the ground on this one over here at sociomantic labs GmbH in Berlin, Germany. Keep up the good work and keep all us eager developers posted!
Would love to see IMG and other contexts that take a SRC attribute be able to display a PDF!
The HTML5 specification, section 4.8.1 (‘img’), clearly states:
Images can thus be static bitmaps (e.g. PNGs, GIFs, JPEGs), single-page vector documents (single-page PDFs, XML files with an SVG root element), animated bitmaps (APNGs, animated GIFs), animated vector graphics (XML files with an SVG root element that use declarative SMIL animation), and so forth.
Some browsers already support the use of PDF in this way while others do not.
Some browsers… which? And what should happen for a multi-page PDF — error, or truncate to single page?
Ideally one could control what page was displayed with a URL hash, e.g.
src="doc.pdf#22"
We added #page hashes already. Check out the multi-page viewer.
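For illustration, turning a URL hash into a page number can be as small as the sketch below. The function name `pageFromHash` is hypothetical, and accepting both the bare `#22` form suggested above and a `#page=22` form is an assumption; the exact format the multi-page viewer accepts is not spelled out in this thread.

```javascript
// Hypothetical helper: maps a URL hash to a valid 1-based page number.
// Falls back to page 1 for a missing or malformed hash and clamps the
// result to the document's page count.
function pageFromHash(hash, pageCount) {
  const match = /^#(?:page=)?(\d+)$/.exec(hash);
  if (!match) return 1;
  const page = parseInt(match[1], 10);
  return Math.min(Math.max(page, 1), pageCount);
}

console.log(pageFromHash('#22', 30));
console.log(pageFromHash('#page=3', 30));
```

In a browser the first argument would typically be `window.location.hash`, re-read on the `hashchange` event so that links within the same document keep working.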
That looks very exciting
What about PDF fillable forms and PDF fillable forms with automatic calculation ?
It seems that poppler doesn’t handle that because it doesn’t handle JavaScript… (1)
Does it mean 1) that pdf.js could handle it and 2) that poppler could use pdf.js for this purpose?
Thanks
(1) see https://bugs.freedesktop.org/show_bug.cgi?id=14433 from https://bugzilla.gnome.org/show_bug.cgi?id=480324
Rendering PDF files inside a browser by default should be punished with the death penalty for the person responsible for that decision.
ff22: “Mozilla” never complained about proprietary IE extensions that were based on draft standards, or proposed for standardization because they filled a needed gap (offsetLeft, e.g.). IE4 had some good stuff in it, and shame on Microsoft and Netscape for not standardizing it. At least, that has been my position since mozilla.org was founded.
XHR is another example. Not the greatest API design but it filled a needed gap, and Mozilla implemented it early, well before any standard was proposed.
Where Microsoft went wrong was (a) abusing its monopoly, for which it was convicted in the U.S.; (b) not de-jure standardizing the stuff it pushed into de-facto standards; (c) not collaborating in standards bodies very well (this covers Ecma until the Harmony era, as well as w3c).
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-canvas-element.html#canvaspixelarray may or may not be usable by pdf.js. It imposes rounding overhead and that also creates a semantic difference from power-of-two modular chopping used by the WebGL typed arrays.
Anyway, typed arrays are a de-facto standard rumored to be coming to IE, so settle down. Standards are made by experimenting with *non-proprietary* (open draft spec) but also *not yet standardized* (in order to avoid design by committee and premature standardization) extensions.
/be
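The semantic difference mentioned in this comment, between canvas pixel arrays and WebGL typed arrays, can be seen directly by storing out-of-range values. The sketch below uses `Uint8ClampedArray` as a stand-in for `CanvasPixelArray`; that substitution is an assumption made so the example runs in current engines (it is the type that later took over the role of canvas pixel storage).

```javascript
// Uint8Array truncates the fraction and wraps modulo 256;
// Uint8ClampedArray rounds to nearest and clamps to the range 0..255.
const wrapped = new Uint8Array(3);
const clamped = new Uint8ClampedArray(3);

[300, -5, 1.5].forEach((value, i) => {
  wrapped[i] = value;
  clamped[i] = value;
});

console.log(Array.from(wrapped)); // [44, 251, 1]
console.log(Array.from(clamped)); // [255, 0, 2]
```

A parser that relies on wrap-around arithmetic when filling byte buffers would therefore produce different results if it were switched to the clamping array type without other changes.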
It seems that my previous message was not saved, so here it is:
Thank you for this initiative that looks promising.
Will pdf.js be able to handle PDF fillable forms (and also PDF fillable forms with automatic calculation) ?
If yes, could that benefit poppler? There is a bug open against poppler concerning this: https://bugs.freedesktop.org/show_bug.cgi?id=14433
(from evince https://bugzilla.gnome.org/show_bug.cgi?id=480324 )
Thanks
>An HTML5-based implementation is completely immune to this class of problems.
Uh. Are you serious? How many known security bugs does SpiderMonkey have?
>Initially we will make a Firefox extension available to interested users
>that enables inline PDF rendering using pdf.js
Please do, it will allow those of us who don’t use nightly FF builds to try it.
Also pdf.js is a bad name, there’s already a project with this name. viewpdf.js might have been a better name.
Hi Andreas,
When I test it in Chrome, the text is messed up in strange charts. Is this a known problem and is or will it be fixed?
Great open source solution btw :D!!
grtz.
Yeah, we are bringing up test-coverage on chrome, that should help track down regressions. Currently most of us use Firefox for obvious reasons.
I am a little concerned about one point.
The reason that many sites use PDFs is for “pixel perfect” printing.
I know that there are ways to achieve this with HTML but it is still not as reliable and portable as PDF.
I opened the demo in FF5 in a Windows VM and it is not rendering text. Lack of, or poor, hardware acceleration might be an issue, I guess?
However the print preview is still far from what I would expect.
I would challenge you to get this part right, esp. if you want to make it part of FF by default.
It would be hard for me to continue recommending FF to clients if PDF Printing was not of an acceptable quality and I suspect many others would be in the same position.
We aren’t anywhere near printing support yet. Our plan is to use SVG for this, and we will have to work on the FF printing code a bit to make sure you can print without FF adding headers and footers. It’s not a particularly hard or exciting problem, we just didn’t get to it yet.
Hiya Andreas — I talked to Shaon about this earlier, but an interesting approach taken at Adobe was to write a LLVM backend, Posix API, etc., to port existing C/C++ (they demo’d Doom, Quake, etc.). From there, they got the Python interpreter and python programs. Doing the same here, where the focus is on rendering API (e.g., proxy OpenGL calls to WebGL), would be interesting and have much more of a legacy format reach. (Though, as seen with Scribd and others, supporting documents is useful!)
Either way, good luck :)
Hey Leo. We actually tried that. Google for emscripten. Alon compiled poppler with that. It does work, but it’s megabytes of code, and it’s really slow (seconds to minutes to render something). Our work translates a high level API (PDF) to a high level API (canvas). poppler does pixel pushing and font rasterization, which is a much more computationally intensive task, and it runs in an emulated environment, so it’s a lot slower. But it’s definitely a possible way of doing it, and for some really rarely used formats it might be quite applicable.
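To make the "high level API to high level API" point concrete, here is a toy sketch that maps a handful of PDF path operators (`m`, `l`, `re`, `h`, `S`, `f`) onto the canvas 2D calls they resemble. It is not pdf.js code: the names `drawContentStream` and `recordingContext` are hypothetical, the tokenizer only understands numbers and those six operators, and a recording object stands in for a real canvas context so the example runs outside a browser.

```javascript
// Each PDF operator consumes the operands that preceded it.
const OPERATORS = new Map([
  ['m', (ctx, a) => ctx.moveTo(a[0], a[1])],
  ['l', (ctx, a) => ctx.lineTo(a[0], a[1])],
  ['re', (ctx, a) => ctx.rect(a[0], a[1], a[2], a[3])],
  ['h', (ctx) => ctx.closePath()],
  // Painting ends the current path in PDF, so start a fresh canvas path.
  ['S', (ctx) => { ctx.stroke(); ctx.beginPath(); }],
  ['f', (ctx) => { ctx.fill(); ctx.beginPath(); }],
]);

// Hypothetical interpreter for a whitespace-separated content stream.
function drawContentStream(source, ctx) {
  let operands = [];
  ctx.beginPath();
  for (const token of source.trim().split(/\s+/).filter(Boolean)) {
    const op = OPERATORS.get(token);
    if (op) {
      op(ctx, operands);
      operands = [];
    } else {
      const value = Number(token);
      if (Number.isNaN(value)) throw new Error('unsupported operator: ' + token);
      operands.push(value);
    }
  }
}

// Stand-in for CanvasRenderingContext2D that records every call made on it.
function recordingContext() {
  const calls = [];
  const ctx = { calls };
  for (const name of ['beginPath', 'moveTo', 'lineTo', 'rect', 'closePath', 'stroke', 'fill']) {
    ctx[name] = (...args) => { calls.push([name, ...args]); };
  }
  return ctx;
}

const ctx = recordingContext();
drawContentStream('10 20 m 110 20 l S', ctx);
console.log(ctx.calls);
```

Each operator becomes one or two method calls, and the browser's own (natively implemented) canvas code does the rasterization. A real renderer additionally has to handle the graphics state, coordinate transforms (PDF's y axis points up, canvas's points down), text, images and many more operators.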
Two needed features:
1. The ability to fill-in PDF forms (as previously mentioned by avatar4antistress).
2. The ability to print PDF documents without alteration as to size, line breaks, fonts, etc, as required by the U.S. IRS for tax forms.
That’s quite impressive. For simple documents, the usability of your JS renderer is already at least on par with Acrobat Reader (startup time, keyboard focus/keyboard navigation issues, …). I’m looking forward to better search functionality, though.
What about selection mapping? The interface knows which page is being viewed, so it presumably knows which letter boxes fall into the selected area. This can be mapped back to a selection via a selection.createRange on the underlying textual data. (unless PDF.js ‘forgets’ the original text, but then how are we to query the PDF object for its delicious content? =)
This will require taking phantom instructions into account, where the text selected may not necessarily be the text as it gets shown in the pdf (I use ghost text to effect copy-pasting of Japanese phrases without the furigana, phonetic guide text, that is visible in the pdf on grammar.nihongoresources.com).
Also, I have to say the idea of an SVG version doesn’t sound very appealing…SVG is again a graphics template that happens to also be able to do text, much like canvas (except canvas has even less of an idea what text is, thanks to the Canvas 2D API committee). What about an XHTML+CSS3 version, instead? Since ultimately it’s about arranging boxes of boxes, there’s no inherent problem other than the graphics (for which canvas is of course entirely intended)
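The selection mapping idea raised at the start of this comment can be sketched as a rectangle-overlap test over per-glyph boxes. Everything here is hypothetical: the `textInSelection` name, the `{ char, x, y, width, height }` glyph shape, and the assumption that the renderer keeps glyph boxes in reading order. The ghost-text case described above would need an extra layer that maps visible glyphs to the text that should actually be copied.

```javascript
// Hypothetical helper: returns the characters whose boxes overlap the
// selection rectangle, in the order the glyphs were recorded.
function textInSelection(glyphs, sel) {
  return glyphs
    .filter((g) =>
      g.x < sel.x + sel.width && g.x + g.width > sel.x &&
      g.y < sel.y + sel.height && g.y + g.height > sel.y)
    .map((g) => g.char)
    .join('');
}

// Three glyphs laid out left to right on one line (made-up coordinates).
const glyphs = [
  { char: 'P', x: 0, y: 0, width: 10, height: 12 },
  { char: 'D', x: 10, y: 0, width: 10, height: 12 },
  { char: 'F', x: 20, y: 0, width: 10, height: 12 },
];

console.log(textInSelection(glyphs, { x: 12, y: 0, width: 15, height: 12 }));
```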
Excuse me for a moment while I have a nerdgasm …
This is sooo frickin’ cool!!!!!!!!!
Thanks. I feel better now. :-)
This version does not support Cyrillic documents.
Maybe you will fix this in one of the next releases?
Can you send us a document and a screenshot of what it should look like? gal at mozilla dot com Thanks!
To Johan Geerts and Mark: regarding Safari and the demo here: http://people.mozilla.org/~gal/test.html, it seems that the canvas is appearing over the page nav toolbar and that the first page of this PDF is initially rendering blank (white). If you zoom in/out (cmd +) or make Safari’s browser window very wide, you’ll see the nav tool bar at the top of the page, just under the top of the canvas. You can navigate to the next page (which renders correctly) and then back to page 1 and it will be rendered correctly. There is another version of this demo posted here: http://karthikk.net/projects/pdfjs/web/viewer.html that doesn’t exhibit the initial blank page problem with Safari. However, the issue with the canvas appearing over the nav toolbar happens on both pages.
I’d like to second what Daniel Hendrycks said: could we have a Twitter feed for major updates? Firefox doing PDF rendering is important to me (I’m a PhD student so I’m forever opening and closing PDFs in journals. I’ve been using FF in 32-bit mode on OS X just so I can continue to use this: http://code.google.com/p/firefox-mac-pdf/updates/list), but I don’t need the detail of the commit log…
Question: are you aware of any similar effort to render MS Word and other Office formats/documents in the same manner?
I don’t think so. But there are native code projects that do that, and I am sure the approach would work there as well.
Assuming license compatibility isn’t an issue (It’s AGPLed), there’s WebODF for rendering OpenDocument files (text, spreadsheet, and presentation, last I checked) in the browser.
http://www.webodf.org/
My company has been working on a Flash-based PDF reader for the last few months, so we can appreciate the challenges you’re facing! Especially when it comes to decoders such as CCITT, JBIG and JPX… code which is just not available for JavaScript/ActionScript/Haxe. We’re rendering 99.9% of PDFs “pixel perfect”; the challenge is mostly performance and getting the code to parse/decode/render as quickly as possible.
We’d be happy to contribute our CCITT decoder to your effort. It’s written in Haxe, so you’ll need to port it.
Keep up the great work!
Nice and useful article, thanks.
A question about the implementation of pdf.js:
Would it require its own page, or could I render a PDF in, say, a div?
(looks awesome by the way. pretty exciting.)
It works in an iframe; in a div using innerHTML … maybe with some changes.
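The iframe approach mentioned above could look roughly like the sketch below. It is only an illustration: `buildViewerUrl` and `embedPdf` are hypothetical helper names, the iframe size is arbitrary, and it assumes the viewer page reads the PDF location from a `file` query parameter.

```javascript
// Hypothetical helper: build a viewer URL that points at a given PDF.
// Assumption: the viewer page takes the PDF location in a "file" query parameter.
function buildViewerUrl(viewerUrl, pdfUrl) {
  const separator = viewerUrl.includes("?") ? "&" : "?";
  return viewerUrl + separator + "file=" + encodeURIComponent(pdfUrl);
}

// Hypothetical helper: place the viewer inside an existing element (e.g. a div)
// by appending an iframe to it, instead of giving the PDF its own page.
function embedPdf(container, viewerUrl, pdfUrl) {
  const frame = container.ownerDocument.createElement("iframe");
  frame.src = buildViewerUrl(viewerUrl, pdfUrl);
  frame.style.width = "100%";   // arbitrary sizing for the sketch
  frame.style.height = "600px";
  container.appendChild(frame);
  return frame;
}
```

In a page, something like `embedPdf(document.getElementById("pdf-holder"), "web/viewer.html", "docs/paper.pdf")` would then show the PDF inside that div; the element id and both paths are placeholders.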
It’s very exciting, I think. Personally I like the Opera browser. But I know nothing about HTML. I’ve found interesting information about that. Here tutapoint.com Review
Looks like a great project, and I can see some great applications for my project. Just reviewed the main JavaScript file. Any chance you can keep the code within a single namespace?
We just added that.
Hi, this is a really interesting idea. This is great!!
How can I donate to this project?
You can support the Mozilla Foundation if you would like to donate, or help us build it on GitHub!
Hello!
It is an amazing tool!
Taking a look on GitHub, I can see a crypto.js file. Does it mean pdf.js will have support for encrypted files, or for transmitting them over a secured channel?
Thank you for this big effort :)
Hello! This post could not be written any better! Reading through this post reminds me of my previous room mate! He always kept talking about this. I will forward this article to him. Fairly certain he will have a good read. Thanks for sharing!
Do you intend to display PDF signatures, and validate them, in your renderer? Signed PDFs are a great means to “virtualize” contracts and many non-repudiable documents. I wish that when a web application delivers a signed PDF to the user, the browser would validate and display the signature in addition to the printable PDF content.
PDF signatures aren’t very reliable, so I am not sure this is an urgent feature for the Mozilla team, but it’s open source. If someone submits a patch, we would definitely take it.
Very interesting, and it moves us even closer to HTML as the platform for real use cases. Of course the reason for PDFs is to print, not view (or should be), but as so much content is in PDF this will be a great boon.
The demo link is 404.
As much as I detest the PDF format, and feel that it is inappropriate for the web, I have to admit that this is useful and pretty cool. ;)
Hi! I’ve done a port of this code so I can read PDFs pulled from a Fedora repository on a Drupal site. It works well in all modern browsers, but the file isn’t loaded on my iPad. Are there known problems with the iPad?
We don’t do automated tests with Mobile Safari, but feel free to file a bug on GitHub.
Well, that’s pretty useful for me. I am thinking of developing an HTML5 web application for ebooks. Following you on GitHub.
Thanks for telling us about the benefits of rendering PDF with HTML5 and JavaScript. This tutorial contains great information. Thanks for sharing.
Would the Google crawlers be able to read the content for relevant words, for SEO purposes? Everything I’ve read before says that if you offer a PDF as a download, you also need to duplicate its content in your page markup, because web crawlers can’t get to it.
I’d love to check out the demo, but I’m getting a 404 error on the demo link: http://people.mozilla.org/~gal/test.html
http://andreasgal.github.com/pdf.js/web/viewer.html is the new demo page
Hi Andreas,
You may wish to update the link in the following sentence above: “If you want to see a demo of pdf.js, click on this link.”
Best,
I installed the latest version on FFb4 from here
http://andreasgal.github.com/pdf.js/extensions/firefox/pdf.js.xpi
While it installed fine, it rendered PDFs blank. Is that because I’m on a Mac?
I use a Mac as well, so that can’t be the reason.
I used the latest pdf.js.xpi on FF8b4. While it works on many PDFs, it doesn’t work on all. For example, bulletin_20111023.pdf won’t render.
Note: that PDF was created in Adobe Publisher.
Hi!
Thanks for this awesome tool!
But in my tests I have found an issue…
When a PDF created in OpenOffice is used, the viewer shows corrupted text, while a PDF created with MS Office works fine. I used the same font in both files (Times New Roman).
I noticed that the Font objects created for each file are different: the encoding for the alphabet (a-z, A-Z) is missing in the OpenOffice file.
I’m posting links so the test can be reproduced:
– http://dl.dropbox.com/u/26617411/object_Font_in_MS_Office.docx
– http://dl.dropbox.com/u/26617411/object_Font_in_OpenOffice.docx
– http://dl.dropbox.com/u/26617411/MSoffice_pdf.pdf
– http://dl.dropbox.com/u/26617411/OpenOffice_pdf.pdf
– http://dl.dropbox.com/u/26617411/error_image.PNG
PS.: forgive my poor English…
Wonderful. It looks great and is perfect for building web apps.