Accessibility of in-browser PDF Viewer #6179

Open
brennanyoung opened this issue Feb 15, 2022 · 7 comments


@brennanyoung

I welcome the appearance of data on in-browser PDF viewers, resulting from #2212.

Different browsers handle PDF in different ways, and the various PDF viewers have different capabilities, which can affect accessibility conformance.

AFAIK, only pdf.js (used by Mozilla) makes an effort to communicate PDF content to the accessibility tree (i.e. the subset of the DOM which is communicated to assistive tech). It translates PDF/UA tags into ARIA roles and attributes.

If people are deciding to use or not use PDF-in-browser on the basis of caniuse data, I believe they should be informed of different levels of accessibility support.
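
To illustrate the kind of tag-to-role translation described above, here is a minimal sketch. The table and the function name `aria_for_pdf_tag` are hypothetical and deliberately incomplete; this is not pdf.js's actual mapping, just one plausible shape for it.

```python
import re

# Hypothetical, simplified table (an assumption, not pdf.js's real mapping).
ROLE_BY_TAG = {
    "L": "list",
    "LI": "listitem",
    "Table": "table",
    "TR": "row",
    "TH": "columnheader",  # assumption: a real viewer would consult the Scope attribute
    "TD": "cell",
    "Link": "link",
    "Figure": "figure",
}


def aria_for_pdf_tag(tag):
    """Return (role, attributes) for a PDF structure tag.

    Generic or unknown tags yield (None, {}), meaning no explicit ARIA role.
    """
    numbered_heading = re.fullmatch(r"H([1-6])", tag)
    if numbered_heading:
        # <H1>-<H6> carry their level in the tag name.
        return "heading", {"aria-level": numbered_heading.group(1)}
    if tag == "H":
        # Unnumbered heading: the level would have to be derived from nesting.
        return "heading", {}
    return ROLE_BY_TAG.get(tag), {}
```

A viewer that exposes nothing to the accessibility tree is equivalent to every tag falling through to `(None, {})`, which is the difference a support table would need to capture.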

@LifeIsStrange

LifeIsStrange commented Mar 4, 2022

Note that a website can embed PDF.js, and it works in Chromium browsers since it is an HTML5 renderer.
However, most users probably prefer the default experience of their browser.

@brennanyoung
Author

@LifeIsStrange That's exactly the kind of information that would be useful to see on caniuse.

BTW, pdf.js relies on aria-owns to construct an accessibility tree. Quite a cunning solution, except that aria-owns is not supported at all in Safari.

Given that Acrobat and Preview also fail to generate such a tree, this means that at time of writing there are no PDF viewers that run on any of Apple's platforms (inside or out of browser) which communicate the tree to the system level accessibility API.

This has an impact on the de facto portability of this nominally portable file format.
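
As a rough model of why aria-owns matters here, the sketch below re-parents nodes the way aria-owns is meant to. It assumes a flat layer of rendered text spans plus a separate layer of structure nodes that adopt them; the node ids, the dictionary layout and the function name are all hypothetical.

```python
def accessibility_children(dom, supports_aria_owns=True):
    """Map each node id to its children in the accessibility tree.

    dom: {node_id: {"children": [ids], "owns": [ids]}}; both keys are optional.
    With aria-owns support, an owned node leaves its DOM parent and is
    appended to its owner's children. Without it, the DOM order is exposed.
    """
    owned = set()
    if supports_aria_owns:
        for node in dom.values():
            owned.update(node.get("owns", []))

    tree = {}
    for node_id, node in dom.items():
        kids = [c for c in node.get("children", []) if c not in owned]
        if supports_aria_owns:
            kids += node.get("owns", [])
        tree[node_id] = kids
    return tree


# Hypothetical page: text spans rendered flat, structure nodes kept elsewhere.
dom = {
    "page": {"children": ["textLayer", "structLayer"]},
    "textLayer": {"children": ["t1", "t2"]},
    "structLayer": {"children": ["h1", "p1"]},
    "h1": {"owns": ["t1"]},
    "p1": {"owns": ["t2"]},
    "t1": {},
    "t2": {},
}

with_owns = accessibility_children(dom, supports_aria_owns=True)
without_owns = accessibility_children(dom, supports_aria_owns=False)
```

In `with_owns` the heading and paragraph nodes contain the text; in `without_owns` they are empty and the text stays in an unstructured flat layer, which is the situation described for Safari.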

@Malvoz
Contributor

Malvoz commented Apr 9, 2022

@brennanyoung
Author

brennanyoung commented Sep 12, 2022

Just an FYI: if you open a semantically well-formed HTML5 document in Chrome and print to PDF (using the default mechanism for this), you will get a nearly semantically well-formed PDF/UA. Headings and lists are getting tagged correctly, at least.

However, there are still some issues - I reported several on the Chromium bug database yesterday. Lots of bogus <NonStruct> tags are getting generated, which are relatively harmless (similar to role="generic").

Unfortunately, several meaningful semantics such as article and section are also getting mapped to <NonStruct>, even though PDF/UA has <Art> and <Sect> available.

Creating PDF is not really the bread-and-butter of caniuse, but this is a REALLY good development. It means that Chrome is a viable authoring tool for accessible PDF, which plays well with (e.g.) Acrobat and NVDA.

However, the default PDF view in Chrome does not seem to generate a proper accessibility tree at time of writing.

@LifeIsStrange

LifeIsStrange commented Sep 12, 2022

> @LifeIsStrange That's exactly the kind of information that would be useful to see on caniuse.
>
> BTW, pdf.js relies on aria-owns to construct an accessibility tree. Quite a cunning solution, except that aria-owns is not supported at all in Safari.
>
> Given that Acrobat and Preview also fail to generate such a tree, this means that at time of writing there are no PDF viewers that run on any of Apple's platforms (inside or out of browser) which communicate the tree to the system level accessibility API.
>
> This has an impact on the de facto portability of this nominally portable file format.

@jensimmons friendly ping

@brennanyoung
Author

Update - I'm having some success with semantic browsing in Preview and VoiceOver! Not sure what has changed or when. (The PDF document used matters a great deal, of course.) I haven't seen any announcements from Apple about this feature.
It's very obvious that things behave differently from the web, but at least there is a minimal implementation. I hope it will be fleshed out.

@brennanyoung
Author

brennanyoung commented Sep 13, 2022

Sketching out a test profile for consumption (not authoring).

This will not be exhaustive, but it will get us moving. I'm using the nomenclature as it appears in Acrobat, or in the Tagged PDF Best Practice Guide.
I've broken these into categories, but the breakdown is open to adjustment. I imagine one test PDF per category, or something like that. (Please advise on the wisdom of this, or offer any suggestions for further enrichment/value.)

I imagine each of these as pass/fail. A "pass" is if the AT announces the element and (if non-generic) the role. For tree exclusions, a "pass" is if the AT does not announce the content.

I expect that we will need to document/express partial pass (with remarks) in some cases, but we'll cross that bridge later.
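
The pass/fail rule above could be written down roughly as follows. This is only a sketch: the `Observation` fields are hypothetical, and treating "passes the basic check but has remarks" as a partial pass is my assumption, since the partial case is not defined yet.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    PARTIAL = "partial"


@dataclass
class Observation:
    """What a tester saw for one tag with one viewer/AT combination."""
    tag: str
    announced_content: bool
    announced_role: bool
    generic: bool = False   # e.g. Div, Span: no role expected
    excluded: bool = False  # tree exclusions: content should stay silent
    remarks: str = ""


def evaluate(obs):
    if obs.excluded:
        ok = not obs.announced_content
    elif obs.generic:
        ok = obs.announced_content
    else:
        ok = obs.announced_content and obs.announced_role
    if not ok:
        return Outcome.FAIL
    # Assumption: a pass that needed remarks is recorded as a partial pass.
    return Outcome.PARTIAL if obs.remarks else Outcome.PASS
```

For example, `evaluate(Observation("H1", True, False))` is a fail, because a heading whose text is read without its role does not meet the rule.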

Essential metadata (level A)

Note: as in HTML, the Lang attribute may be applied to almost any other tag, including those with generic semantics such as Div and Span, so we should test for SC 3.1.2 "Language of Parts" too, especially with a mixed-language document. A "pass" here would be (e.g.) for a screen reader's speech synth to use the correct phonemes (if available). Not sure if there are similar criteria that could be used for (e.g.) Braille devices. Advice welcome.
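
For the mixed-language test, the expected result for each run of text can be worked out by letting Lang inherit down the tag tree, as it does in HTML. A minimal sketch, assuming a hypothetical dictionary representation of the structure tree:

```python
def effective_lang(node, inherited=None):
    """Return [(text, lang)] for every text run under node.

    A node's own "lang" overrides whatever it inherits from its ancestors.
    """
    lang = node.get("lang", inherited)
    runs = []
    if "text" in node:
        runs.append((node["text"], lang))
    for child in node.get("children", []):
        runs.extend(effective_lang(child, lang))
    return runs


# Hypothetical test document: English paragraph with a Danish span.
doc = {
    "tag": "Document",
    "lang": "en",
    "children": [
        {
            "tag": "P",
            "text": "The Danish word for hello is",
            "children": [{"tag": "Span", "lang": "da", "text": "hej"}],
        }
    ],
}

runs = effective_lang(doc)
```

Here `runs` pairs the paragraph text with "en" and the span text with "da", which is what a tester would listen for in the speech output.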

Basic Block Level Semantics

Required for WCAG SC 4.1.2: Name, Role, Value

  • Paragraph (<P>)
  • Heading (<H> and <H1>-<H6>)
  • Heading Level (<H1>-<H6>)
  • Lists, List Items, label and body (<L>, <LI>, <Lbl>, <LBody>)
  • Citation (<BlockQuote>)

Inline semantics

Required for WCAG SC 1.3.1: Info and Relationships and SC 4.1.2: Name, Role, Value.

  • Span (<Span>)
  • Inline quote (<Quot>)

Links, References and Annotations

Required for WCAG SC 1.3.1: Info and Relationships and SC 4.1.2: Name, Role, Value.

  • Hyperlinks (<Link>, OBJR)
  • Cross reference (<Reference>)
  • Footnote/Endnote (<Reference>, <Lbl>)
  • Annotation (<Annot>, OBJR)
  • Table of contents (<TOC>, <TOCI>)
  • Index (<Index> containing <L>)
    (we might consider an additional test for an AT making a successful "round trip" to and from the linked/referenced item)

Structural Semantics

Required for WCAG SC 1.3.1: Info and Relationships and SC 4.1.2: Name, Role, Value.

  • Generic wrapper (<Div>)
  • Data table basics (<Table>, <TR>, <TD>)
  • Data table extras (<THead>, <TH>, <TBody>, <TFoot>, TH scope attribute, TD ColSpan attribute)
  • Nested lists
  • Document structure (<Document>, <Part>, <Art>, <Sect>)

Text Alternatives

Required for WCAG SC 1.1.1: Non-text Content.

  • Captioned figure (<Figure>, <Caption>)
  • Captioned table (<Table>, <Caption>)
  • Alt text (Alt and ActualText attributes)
  • Expansion of abbreviations/acronyms ('E' attribute)

Exclusions from the Accessibility Tree

  • Open (<NonStruct>) i.e. excluded but does not hide contents from tree (≅ role="generic")
  • Closed (<Private>) i.e. excluded and hides its contents from tree (≅ aria-hidden="true")
  • Suppressed for readability (Artifact) i.e. AT may announce but not by default (somewhat similar to aria-details?)
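
The three exclusion behaviours can be compared with a small pruning sketch. It assumes a hypothetical dictionary tree and treats Artifact as if it were a tag for simplicity; none of this reflects how any particular viewer is implemented.

```python
def prune(node, include_artifacts=False):
    """Return the list of nodes that replace `node` in the accessibility tree.

    NonStruct: the wrapper disappears, its children are spliced into the parent.
    Private:   the wrapper and everything inside it disappear.
    Artifact:  dropped unless the user has asked the AT to include artifacts.
    """
    tag = node["tag"]
    if tag == "Private" or (tag == "Artifact" and not include_artifacts):
        return []

    children = []
    for child in node.get("children", []):
        children.extend(prune(child, include_artifacts))

    if tag == "NonStruct":
        return children

    kept = {"tag": tag, "children": children}
    if "text" in node:
        kept["text"] = node["text"]
    return [kept]


# Hypothetical document exercising all three cases.
doc = {
    "tag": "Document",
    "children": [
        {"tag": "NonStruct", "children": [{"tag": "P", "text": "Body"}]},
        {"tag": "Private", "children": [{"tag": "P", "text": "Hidden"}]},
        {"tag": "Artifact", "text": "Page 3"},
    ],
}

default_view = prune(doc)[0]
verbose_view = prune(doc, include_artifacts=True)[0]
```

In `default_view` only the "Body" paragraph survives, now a direct child of the document; `verbose_view` additionally keeps the "Page 3" artifact.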

To be considered/explained/understood before testing

  • Code (<Code>) Tagged PDF Best Practice Guide has low expectations about how ATs may handle this. Do we agree?
  • Bibliography (<BibEntry>) - presumably there are features for book metadata such as date, publisher, ISBN etc.?
  • Asian writing tags (<Ruby>, <RB>, <RT>, <RP>, <Warichu>, <WT>, <WP>)
  • Form elements (<Form> ... but what about operable elements? Needs a deep dive.)
  • Sidebar (has no explicit semantic in PDF/UA, but can be implied with certain structures. Should we test for this?)
  • We might want to test for other metadata such as Subject, Author and Keywords, but I don't know how ATs are supposed to handle those.
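
If we go with one test PDF per category, the profile above could be kept as a small manifest. The sketch below copies the category names and success criteria from the lists above; the file names and the helper `categories_for` are placeholders I made up.

```python
# Category -> WCAG success criteria and test file (file names are placeholders).
TEST_PROFILE = {
    "Basic Block Level Semantics": {"sc": ["4.1.2"], "file": "block.pdf"},
    "Inline semantics": {"sc": ["1.3.1", "4.1.2"], "file": "inline.pdf"},
    "Links, References and Annotations": {"sc": ["1.3.1", "4.1.2"], "file": "links.pdf"},
    "Structural Semantics": {"sc": ["1.3.1", "4.1.2"], "file": "structure.pdf"},
    "Text Alternatives": {"sc": ["1.1.1"], "file": "alternatives.pdf"},
}


def categories_for(success_criterion):
    """List the categories whose tests bear on a given success criterion."""
    return [
        name
        for name, entry in TEST_PROFILE.items()
        if success_criterion in entry["sc"]
    ]
```

For instance, `categories_for("1.3.1")` lists the three categories that would need retesting if a viewer changes how it exposes relationships.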
