Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .Data.Headers in GetRemote #12521

Open
irkode opened this issue May 19, 2024 · 2 comments
Open

Add .Data.Headers in GetRemote #12521

irkode opened this issue May 19, 2024 · 2 comments
Assignees
Milestone

Comments

@irkode
Copy link

irkode commented May 19, 2024

Support Link Response fields in GetRemote Result object

Discourse 49894

Issue

Different APIs use different Response-Header fields to return additional data about the Response.

Especially paginating REST APIs (GitHub, GitLab guess there are many) used in combination with GetRemote are affected.

Hugo currently only provides access to the Content-Type filed through it's Data Interface in
resource Object returned from GetRemote

I would like to have see the Link Header fields for Pagination at the resource.

  • start
  • first
  • next
  • prev(ious)
  • last
  • self

Link Response Header Field (RFC 5988)

RFC has a complete grammar in section 5

Just the key points for the stuff used in pagination here:

Link                 = "Link" ":" #link-value
  link-value         = "<" URI-Reference ">" *( ";" link-param )
    link-param       = ( "rel" "=" relation-types )             <-- the are more keywords than "rel" and more types
      relation-types = relation-type
                 |     <"> relation-type *( 1*SP relation-type ) <">

relation types are defined in
section 622

commonly used types are

  • self - the link to the page itself
  • help - usually links to the api documentation
  • stylesheet - for XML delivery or transformation
  • start - slighty different definition than first (non native speaker) - and there is no end type
    ;-)

So we have multiple types in a rel field that may link to the same Url. And two different types
for the same (prev and previous) Hurray.

Example from GitHub

This is a one line string, newlines added for readability

Link:
<https://api.github.com/repositories/11180687/issues?page=1&per_page=2>; rel="prev",
<https://api.github.com/repositories/11180687/issues?page=3&per_page=2>; rel="next"
<https://api.github.com/repositories/11180687/issues?page=233&per_page=2>; rel="last",
<https://api.github.com/repositories/11180687/issues?page=1&per_page=2>; rel="first"
<https://www.example.com; rel="self">
<https://www.example.com; rel="self stylesheet">

Implementation thoughts

Implementation effort varies on the degree of Link attribute support

  1. only rel=
    1.2 all types
    1.3 list of types (minimal: next,prev(ios),first,last,start)
    2 complete Link RFC
    3 all or some other Headers

I would vote for 1.2 maybe 1.1 so it's full supported. Guess you have some Go libraries already in
use. The grammar for the Link attribute seems to regex capture handlebar.

Easy access

should be dead simple $resource.Links.First $resource.Links "first" ... no puzzling around with
walking a structure or index function.

Should not fail

If the link cannot be parsed completely it should not fail but store it in a String representation
maybe Resource.Header.LinkRaw ...

Field that may be worth adding

  • Content-Language might be interesting, so one can handle the process according to Hugo site
    languages.

Further considerations

For simplicity all fields exposed are directly available at the .Data field.

Could be moved to a .Data.Header storing the Response headers. Just one field currently for the
Response-Header, so a deprecation now and duplicate/link it to .Data.Headers. or an interface,
getter ... i'm not in Go. The new link Field might be widely used because of paginating APIs
utilized with GetRemote and Content Adapters. imho if there should be a change, early is better ;-)

Dunno if these will be a conflict candidate later but at lest a point to consider

Resources

@irkode
Copy link
Author

irkode commented May 20, 2024

a few hours of 😴 and a quit lesson 🧑‍🎓 in building Hugo

I would say that the most practical approach would be to add needed Fields only.

the Link is returned as unicoded string and to extract only the needed rel types it looks quite simple.
In Perl I would use a little regex magic to create fields for every "rel=" type for each link.
regex 101

<(.*?)>\s*;\s*.*?(rel="(first|last|next|(prev(:?ious)?)|start|self)(\s+(first|last|next|(prev(:?ious)?)|start|self))?").*?(,|$)

I could imagine that it would be good to have the unparsed Link stored, too.

.Data.Links [
  rawLink = res.Header.Get("Link")
  first = "https...  
  last = "https"
  ...
]

the presence of each link type is determined by the query result. So only available links are in the Response. The first page fe. wouldn't provide a previous link in the response header field

tested with

just added a res.Header.Get("Link") to responseToData and we get a Raw string looks like Unicode.

Dumped in a Template:

   {{ $github_rest_api_branches := "https://api.github.com/repos/gohugoio/hugo/branches?per_page=10&page=10" }}
   {{ $github_rest_options := dict
      "headers" (dict "Accept" "application/vnd.github+json" "Content-Type" "application/json")
   }}
   {{ with resources.GetRemote $github_rest_api_branches $github_rest_options }}
      {{ warnf (debug.Dump .Data) }}
   {{ end }}

will result in:

{
  "ContentLanguage": "",
  "ContentLength": -1,
  "ContentType": "application/json; charset=utf-8",
  "Link": "\u003chttps://api.github.com/repositories/11180687/issues?per_page=10\u0026page=9\u003e; rel=\"prev\", \u003chttps://api.github.com/repositories/11180687/issues?per_page=10\u0026page=11\u003e; rel=\"next\", \u003chttps://api.github.com/repositories/11180687/issues?per_page=10\u0026page=47\u003e; rel=\"last\", \u003chttps://api.github.com/repositories/11180687/issues?per_page=10\u0026page=1\u003e; rel=\"first\"",
  "Status": "200 OK",
  "StatusCode": 200,
  "TransferEncoding": null
}```

@bep bep changed the title Add pagination link header fields to the resource object returned by GetRemote Add .Data.Headers in GetRemote May 20, 2024
@bep bep added this to the v0.127.0 milestone May 20, 2024
@bep bep self-assigned this May 20, 2024
@irkode
Copy link
Author

irkode commented May 21, 2024

Just a thought - What about an Opt-In for complex header fields....

{{ $opts := dict
  "method" "post"
  "body" `{"complete": true}` 
  "headers" (dict  "Content-Type" "application/json")
  "responseHeaderFields= (slice "Links", ...)
}}

Will limit the performance impact if one has many Get-Remote Calls without need for links.
An opt-In would lower this and avoid a performance regression in current usage.

@bep bep modified the milestones: v0.127.0, v0.128.0 Jun 8, 2024
@bep bep modified the milestones: v0.128.0, v0.129.0 Jun 21, 2024
@bep bep modified the milestones: v0.129.0, v0.131.0 Jul 22, 2024
@bep bep modified the milestones: v0.131.0, v0.133.0 Aug 9, 2024
@bep bep modified the milestones: v0.133.0, Unscheduled Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants