|
| 1 | +--- |
| 2 | +title: Traversing with Pagination | GitHub API |
| 3 | +--- |
| 4 | + |
| 5 | +# Traversing with Pagination |
| 6 | + |
| 7 | +* TOC |
| 8 | +{:toc} |
| 9 | + |
| 10 | +The GitHub API provides a vast wealth of information for developers to consume. |
| 11 | +Most of the time, you might even find that you're asking for _too much_ information, |
| 12 | +and in order to keep our servers happy, the API will automatically [paginate the requested items][pagination]. |
| 13 | + |
| 14 | +In this guide, we'll make some calls to the GitHub Search API, and iterate over |
| 15 | +the results using pagination. You can find the complete source code for this project |
| 16 | +in the [platform-samples][platform samples] repository. |
| 17 | + |
| 18 | +## Basics of Pagination |
| 19 | + |
| 20 | +To start with, it's important to know a few facts about receiving paginated items: |
| 21 | + |
| 22 | +1. Different API calls respond with different defaults. For example, a call to |
| 23 | +[list GitHub's public repositories](http://developer.github.com/v3/repos/#list-all-public-repositories) |
| 24 | +provides paginated items in sets of 30, whereas a call to the GitHub Search API |
| 25 | +provides items in sets of 100. |
| 26 | +2. You can specify how many items to receive (up to a maximum of 100); but, |
| 27 | +3. For technical reasons, not every endpoint behaves the same. For example, |
| 28 | +[events](http://developer.github.com/v3/activity/events/) won't let you set a maximum for items to receive. |
| 29 | +Be sure to read the documentation on how to handle paginated results for specific endpoints. |
| 30 | + |
| 31 | +Information about pagination is provided in [the Link header](http://tools.ietf.org/html/rfc5988) |
| 32 | +of an API call. For example, let's make a curl request to the search API, to find |
| 33 | +out how many times Mozilla projects use the phrase `addClass`: |
| 34 | + |
| 35 | + curl -I "https://api.github.com/search/code?q=addClass+user:mozilla" |
| 36 | + |
| 37 | +The `-I` parameter indicates that we only care about the headers, not the actual |
| 38 | +content. In examining the result, you'll notice some information in the Link header |
| 39 | +that looks like this: |
| 40 | + |
| 41 | + Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next", |
| 42 | + <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last" |
| 43 | + |
| 44 | +Let's break that down. `rel="next"` says that the next page is `page=2`. This makes |
| 45 | +sense, since by default, all paginated queries start at page `1`. `rel="last"` |
| 46 | +provides some more information, stating that the last page of results is on page `34`. |
| 47 | +Thus, we have 33 more pages of information about `addClass` that we can consume. |
| 48 | +Nice! |
| 49 | + |
| 50 | +Keep in mind that you should **always** rely on these link relations provided |
| 51 | +to you. Don't try to guess or construct your own URL. Some API calls, like [listing |
| 52 | +commits on a repository][listing commits], use pagination results that are based |
| 53 | +on SHA values, not numbers. |
| 54 | + |
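To rely on the link relations programmatically, the header value first has to be split into its `rel` names and URLs. The following is a minimal sketch of that step using only plain Ruby; `parse_link_header` is a hypothetical helper name, and the pattern assumes the simple `<url>; rel="name"` shape shown in the header above rather than every form the Link header specification allows.

```ruby
# Turn a Link header value into a { "rel" => "url" } hash.
# NOTE: parse_link_header is a hypothetical helper for illustration only.
# It assumes each entry looks like: <url>; rel="name"
def parse_link_header(header)
  header.scan(/<([^>]+)>\s*;\s*rel="([^"]+)"/)
        .map { |url, rel| [rel, url] }
        .to_h
end

header = '<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next", ' \
         '<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last"'

links = parse_link_header(header)
puts links["next"]
puts links["last"]
puts links.key?("prev") # false: the first page has no previous page
```

Looking relations up by name, instead of assembling URLs by hand, keeps the code working for endpoints whose page links are not numeric.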
| 55 | +### Navigating through the pages |
| 56 | + |
| 57 | +Now that you know how many pages there are to receive, you can start navigating |
| 58 | +through the pages to consume the results. You do this by passing in a `page` |
| 59 | +parameter. By default, `page` always starts at `1`. Let's jump ahead to page 14 |
| 60 | +and see what happens: |
| 61 | + |
| 62 | + curl -I "https://api.github.com/search/code?q=addClass+user:mozilla&page=14" |
| 63 | + |
| 64 | +Here's the link header once more: |
| 65 | + |
| 66 | + Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=15>; rel="next", |
| 67 | + <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last", |
| 68 | + <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=1>; rel="first", |
| 69 | + <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=13>; rel="prev" |
| 70 | + |
| 71 | +As expected, `rel="next"` is at 15, and `rel="last"` is still 34. But now we've |
| 72 | +got some more information: `rel="first"` indicates the URL for the _first_ page, |
| 73 | +and more importantly, `rel="prev"` lets you know the page number of the previous |
| 74 | +page. Using this information, you could construct some UI that lets users jump |
| 75 | +between the first, previous, next, or last list of results in an API call. |
| 76 | + |
| 77 | +### Changing the number of items received |
| 78 | + |
| 79 | +By passing the `per_page` parameter, you can specify how many items you want |
| 80 | +each page to return, up to 100 items. Let's try asking for 50 items about `addClass`: |
| 81 | + |
| 82 | + curl -I "https://api.github.com/search/code?q=addClass+user:mozilla&per_page=50" |
| 83 | + |
| 84 | +Notice what it does to the header response: |
| 85 | + |
| 86 | + Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&per_page=50&page=2>; rel="next", |
| 87 | + <https://api.github.com/search/code?q=addClass+user%3Amozilla&per_page=50&page=20>; rel="last" |
| 88 | + |
| 89 | +As you might have guessed, the `rel="last"` information says that the last page |
| 90 | +is now 20. This is because we are asking for more information per page about |
| 91 | +our results. |
| 92 | + |
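If you build the initial request URL in code rather than typing it into `curl`, the query string can be assembled with Ruby's standard `URI` module. This is only a sketch: `search_url` is a hypothetical helper, and it covers just the first request. Later pages should still come from the link relations in the response.

```ruby
require "uri"

# Build the URL for the first page of a code search with an explicit page size.
# NOTE: search_url is a hypothetical helper for illustration only.
def search_url(query, per_page: nil)
  params = { q: query }
  params[:per_page] = per_page if per_page
  "https://api.github.com/search/code?" + URI.encode_www_form(params)
end

puts search_url("addClass user:mozilla", per_page: 50)
```

`URI.encode_www_form` takes care of escaping, so the space and the colon in the query are encoded the same way they appear in the Link header URLs.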
| 93 | +## Consuming the information |
| 94 | + |
| 95 | +You don't want to be making low-level curl calls just to be able to work with |
| 96 | +pagination, so let's write a little Ruby script that does everything we've |
| 97 | +just described above. |
| 98 | + |
| 99 | +As always, first we'll require [GitHub's Octokit.rb][octokit.rb] Ruby library, and |
| 100 | +pass in our [personal access token][personal token]: |
| 101 | + |
| 102 | + #!ruby |
| 103 | + require 'octokit' |
| 104 | + |
| 105 | + # !!! DO NOT EVER USE HARD-CODED VALUES IN A REAL APP !!! |
| 106 | + # Instead, set and test environment variables, like below |
| 107 | + client = Octokit::Client.new :access_token => ENV['MY_PERSONAL_TOKEN'] |
| 108 | + |
| 109 | +Next, we'll execute the search, using Octokit's `search_code` method. Unlike |
| 110 | +using `curl`, we can also immediately retrieve the number of results, so let's |
| 111 | +do that: |
| 112 | + |
| 113 | + #!ruby |
| 114 | + results = client.search_code('addClass user:mozilla') |
| 115 | + total_count = results.total_count |
| 116 | + |
| 117 | +Now, let's grab the number of the last page, similar to the `page=34>; rel="last"` |
| 118 | +information in the link header. Octokit.rb supports pagination information through |
| 119 | +an implementation called "[Hypermedia link relations][hypermedia-relations]." |
| 120 | +We won't go into detail about what that is, but, suffice to say, the response |
| 121 | +returned by `client.last_response` has a hash called `rels`, which can contain |
| 122 | +information about `:next`, `:last`, `:first`, and `:prev`, depending on which page |
| 123 | +you're on. Each relation also exposes its URL, which you can read by calling, for |
| 124 | +example, `rels[:last].href`. |
| 125 | + |
| 126 | +Knowing this, let's grab the page number of the last result, and present all |
| 127 | +this information to the user: |
| 128 | + |
| 129 | + #!ruby |
| 130 | + last_response = client.last_response |
| 131 | + number_of_pages = last_response.rels[:last].href.match(/page=(\d+)$/)[1] |
| 132 | + |
| 133 | + puts "There are #{total_count} results, on #{number_of_pages} pages!" |
| 134 | + |
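A pattern anchored to the end of the URL depends on `page` being the final query parameter. As a hedged alternative, the query string can be parsed with Ruby's standard `URI` module so that parameter order does not matter. `page_number_from` is a hypothetical helper, and it returns `nil` when the `page` value is missing or not numeric (as with SHA-based pagination).

```ruby
require "uri"

# Read the numeric `page` parameter from a pagination URL.
# NOTE: page_number_from is a hypothetical helper for illustration only.
# Returns nil if there is no `page` parameter or it is not a plain number.
def page_number_from(href)
  query  = URI.parse(href).query.to_s
  params = URI.decode_www_form(query).to_h
  page   = params["page"].to_s
  page =~ /\A\d+\z/ ? page.to_i : nil
end

# `page` is deliberately not the last parameter here.
href = "https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34&per_page=50"
puts page_number_from(href)
```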
| 135 | +Finally, let's iterate through the results. You could do this with a loop `for i in 1..number_of_pages.to_i`, |
| 136 | +but instead, let's follow the `rels[:next]` headers to retrieve information from |
| 137 | +each page. For the sake of simplicity, let's just grab the file path of the first |
| 138 | +result from each page. To do this, we'll need a loop; and at the end of every loop, |
| 139 | +we'll retrieve the data set for the next page by following the `rels[:next]` information. |
| 140 | +The loop will finish when there is no `rels[:next]` information to consume (in other |
| 141 | +words, we are at `rels[:last]`). It might look something like this: |
| 142 | + |
| 143 | + #!ruby |
| 144 | + loop do |
| 145 | +   puts last_response.data.items.first.path |
| 146 | +   break if last_response.rels[:next].nil? # stop once the last page has been printed |
| 147 | +   sleep 4 # back off from the API rate limiting; don't do this in Real Life |
| 148 | +   last_response = last_response.rels[:next].get |
| 149 | + end |
| 149 | + end |
| 150 | + |
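The shape of this traversal can be tried out without touching the network. The sketch below simulates three pages with a plain `Struct`; `FakePage`, `PAGES`, and `each_page` are hypothetical stand-ins, not part of Octokit.rb. The point it illustrates is the ordering: handle the current page first, then move on only if a next link exists, so the final page is not skipped.

```ruby
# A stand-in for a paginated response: some items plus a pointer to the next page.
# NOTE: FakePage, PAGES, and each_page are hypothetical and exist only so the
# loop can run without network access.
FakePage = Struct.new(:items, :next_key)

PAGES = {
  1 => FakePage.new(["a.js", "b.js"], 2),
  2 => FakePage.new(["c.js"], 3),
  3 => FakePage.new(["d.js", "e.js"], nil) # last page: no "next" link
}

# Yield every page in order, following next pointers until none remains.
def each_page(first_key)
  key = first_key
  while key
    page = PAGES.fetch(key)
    yield page          # process the current page, including the last one
    key = page.next_key # nil on the last page ends the loop
  end
end

first_items = []
each_page(1) { |page| first_items << page.items.first }
puts first_items.inspect
```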
| 151 | +Changing the number of items per page is extremely simple with Octokit.rb. Simply |
| 152 | +pass a `:per_page` option to the `search_code` call. After that, the rest of |
| 153 | +your code should remain intact: |
| 154 | + |
| 155 | + #!ruby |
| 156 | + require 'octokit' |
| 157 | + |
| 158 | + # !!! DO NOT EVER USE HARD-CODED VALUES IN A REAL APP !!! |
| 159 | + # Instead, set and test environment variables, like below |
| 160 | + client = Octokit::Client.new :access_token => ENV['MY_PERSONAL_TOKEN'] |
| 161 | + |
| 162 | + results = client.search_code('addClass user:mozilla', :per_page => 100) |
| 163 | + total_count = results.total_count |
| 164 | + |
| 165 | + last_response = client.last_response |
| 166 | + number_of_pages = last_response.rels[:last].href.match(/page=(\d+)$/)[1] |
| 167 | + |
| 168 | + puts last_response.rels[:last].href |
| 169 | + puts "There are #{total_count} results, on #{number_of_pages} pages!" |
| 170 | + |
| 171 | + puts "And here's the first path for every set" |
| 172 | + |
| 173 | + loop do |
| 174 | +   puts last_response.data.items.first.path |
| 175 | +   break if last_response.rels[:next].nil? # stop once the last page has been printed |
| 176 | +   sleep 4 # back off from the API rate limiting; don't do this in Real Life |
| 177 | +   last_response = last_response.rels[:next].get |
| 178 | + end |
| 178 | + end |
| 179 | + |
| 180 | +## Constructing Pagination Links |
| 181 | + |
| 182 | +Normally, with pagination, your goal isn't to concatenate all of the possible |
| 183 | +results, but rather, to produce a set of navigation, like this: |
| 184 | + |
| 185 | + |
| 186 | + |
| 187 | +Let's sketch out a micro-version of what that might entail. |
| 188 | + |
| 189 | +From the code above, we already know we can get the `number_of_pages` in the |
| 190 | +paginated results from the first call: |
| 191 | + |
| 192 | + #!ruby |
| 193 | + require 'octokit' |
| 194 | + |
| 195 | + # !!! DO NOT EVER USE HARD-CODED VALUES IN A REAL APP !!! |
| 196 | + # Instead, set and test environment variables, like below |
| 197 | + client = Octokit::Client.new :access_token => ENV['MY_PERSONAL_TOKEN'] |
| 198 | + |
| 199 | + results = client.search_code('addClass user:mozilla') |
| 200 | + total_count = results.total_count |
| 201 | + |
| 202 | + last_response = client.last_response |
| 203 | + number_of_pages = last_response.rels[:last].href.match(/page=(\d+)$/)[1] |
| 204 | + |
| 205 | + puts last_response.rels[:last].href |
| 206 | + puts "There are #{total_count} results, on #{number_of_pages} pages!" |
| 207 | + |
| 208 | + |
| 209 | +From there, we can construct a beautiful ASCII representation of the number boxes: |
| 210 | + |
| 211 | + #!ruby |
| 212 | + numbers = "" |
| 213 | + for i in 1..number_of_pages.to_i |
| 214 | +   numbers << "[#{i}] " |
| 215 | + end |
| 216 | + puts numbers |
| 217 | + |
| 218 | +Let's simulate a user clicking on one of these boxes, by constructing a random |
| 219 | +number: |
| 220 | + |
| 221 | + #!ruby |
| 222 | + random_page = Random.new |
| 223 | + random_page = random_page.rand(1..number_of_pages.to_i) |
| 224 | + |
| 225 | + puts "A User appeared, and clicked number #{random_page}!" |
| 226 | + |
| 227 | +Now that we have a page number, we can use Octokit to explicitly retrieve that |
| 228 | +individual page, by passing the `:page` option: |
| 229 | + |
| 230 | + #!ruby |
| 231 | + clicked_results = client.search_code('addClass user:mozilla', :page => random_page) |
| 232 | + |
| 233 | +If we wanted to get fancy, we could also grab the previous and next pages, in |
| 234 | +order to generate links for back (`<<`) and forward (`>>`) elements: |
| 235 | + |
| 236 | + #!ruby |
| 237 | + prev_page_href = client.last_response.rels[:prev] ? client.last_response.rels[:prev].href : "(none)" |
| 238 | + next_page_href = client.last_response.rels[:next] ? client.last_response.rels[:next].href : "(none)" |
| 239 | + |
| 240 | + puts "The prev page link is #{prev_page_href}" |
| 241 | + puts "The next page link is #{next_page_href}" |
| 242 | + |
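Putting the number boxes and the back and forward elements together, a navigation row might be rendered along these lines. This is a rough sketch under assumptions: `nav_bar` and its `window` parameter are hypothetical, the current page is marked with asterisks purely for illustration, and only a few neighbouring pages are shown so the row stays short when there are many pages.

```ruby
# Render a compact ASCII navigation row for the given page.
# NOTE: nav_bar and its `window` parameter are hypothetical, for illustration only.
def nav_bar(current, total, window: 2)
  first = [current - window, 1].max
  last  = [current + window, total].min

  parts = []
  parts << "<<"  if current > 1  # a previous page exists
  parts << "..." if first > 1    # pages hidden before the window
  (first..last).each do |i|
    parts << (i == current ? "[*#{i}*]" : "[#{i}]")
  end
  parts << "..." if last < total # pages hidden after the window
  parts << ">>"  if current < total # a next page exists
  parts.join(" ")
end

puts nav_bar(14, 34)
puts nav_bar(1, 34)
```

In a real application, the targets behind `<<` and `>>` would come from `rels[:prev]` and `rels[:next]` rather than from arithmetic on the page number.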
| 243 | +[pagination]: /v3/#pagination |
| 244 | +[platform samples]: https://github.com/github/platform-samples/tree/master/api/ruby/traversing-with-pagination |
| 245 | +[octokit.rb]: https://github.com/octokit/octokit.rb |
| 246 | +[personal token]: https://help.github.com/articles/creating-an-access-token-for-command-line-use |
| 247 | +[hypermedia-relations]: https://github.com/octokit/octokit.rb#pagination |
| 248 | +[listing commits]: http://developer.github.com/v3/repos/commits/#list-commits-on-a-repository |