hoffa.medium.com
With persistent UDF support in BigQuery, now you can: Create user-defined SQL and JavaScript functions. Reference these functions across queries and in logical views. Create org-wide libraries of business logic within shared datasets. For example, now anyone can call this number parsing function: SELECT fhoffa.x.parse_number('one hundred fifty seven') , fhoffa.x.parse_number('three point 5') , fhoffa
1 row in 3 seconds is not impressive. 5 billion rows in 56 seconds is. Let’s see how we went from a to b: BigQuery strengths: Throughput, not latency. If you are expecting results in less than one second, BigQuery is not the right tool: Usually BigQuery won’t return results in less than one second. This is because BigQuery was built for throughput, not for latency. A race car will always be faster than
Stack Overflow published an article analyzing the “top weekend programming languages”. One of their data scientists, Julia Silge, did an awesome job, but she only analyzed Stack Overflow tags. Many questions were raised on reddit and Hacker News, and I’m going to use data from GitHub’s commits to find them an answer. The top weekend languages 2016. Source: GitHub+
The rules: Data source: GitHub files stored in BigQuery. Stars matter: We’ll only consider the top 400,000 repositories, by number of stars they got on GitHub during the period Jan-May 2016. No small files: Files need to have at least 10 lines that start with a space or a tab. No duplicates: Duplicate files only have one vote, regardless of how many repos they live in. One vote per file: Some files us
Some people have asked “can we analyze GitHub issues on BigQuery?”. Good news: While that data is not part of the new open source GitHub code on BigQuery dataset, you can find this in the classic GitHub Archive dataset. Let’s quickly find the 10 repos with the most comments in their issues: SELECT repo.name, COUNT(*) c FROM [githubarchive:month.201606] WHERE type IN ( 'IssueCommentEvent') GROUP BY
All the open source code in GitHub is now available in BigQuery. Go ahead, analyze it all. In this post you’ll find the related resources I know of so far: Update: I know I said all, but it’s not all. I’m updating the answers to these and other questions at github.com/fhoffa/analyzing_github. The pipeline mirrors code from: Projects that have a clear open source license. Forks and/or un-notable pr
The Google Analytics team just announced Data Studio: their free, new, Data Visualization Product. Read their post for more details, but since I just got a new load of reddit comments from /u/Stuck_in_the_Matrix loaded in BigQuery, I’m eager to show you how to connect both. The first surprise with Data Studio: It’s really easy to connect it to BigQuery: Add a data source, choose BigQuery. To work w