These are my notes on building a custom Dataflow template in Java. For how to create a template in Python, see the following post.
www.case-k.jp
Environment setup
First, install Maven and set up a build environment for Java programs.
tar xzvf apache-maven-3.6.3-bin.tar
- Dockerfile
FROM google/cloud-sdk
RUN apt-get update && apt-get install -y vim
# Use ENV rather than appending exports to ~/.bash_profile: values set with ENV
# persist across build steps and into the running container, while
# "RUN source ~/.bash_profile" only affects that single build step.
ENV M3_HOME=/tmp/working/apache-maven-3.6.3
ENV PATH=$M3_HOME/bin:$PATH
RUN gcloud config set project <project ID>
- docker-compose.yml
version: "3"
services:
  cloudsdk:
    build: .
    tty: true
    container_name: cloudsdk
    volumes:
      - $PWD:/tmp/working
    working_dir: /tmp/working
Maven – Download Apache Maven
Maven – Installing Apache Maven
A super-introduction to Maven for absolute beginners (【超初心者向け】Maven超入門) - Qiita
Frequently used Maven commands (よく使うMavenコマンド集) - Qiita
Quickstart using Java and Apache Maven | Cloud Dataflow | Google Cloud
Creating the template
Now that the build environment is ready, let's create the template.
Creating a sample template
The template feature is specific to Dataflow rather than part of Apache Beam. For the package, you need to pass "-Dpackage=com.google.cloud.teleport.templates".
The shared module "common" in the following directory is more complete than the module that is run from the command line, but here I select the package with mvn and run it.
github.com
Apache Beam 2.22.0-SNAPSHOT
Cloud Dataflow Runner
DataflowTemplates/WordCount.java at master · GoogleCloudPlatform/DataflowTemplates · GitHub
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.beam \
    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
    -DarchetypeVersion=2.20.0 \
    -DgroupId=org.example \
    -DartifactId=word-count-beam \
    -Dversion="0.1" \
    -Dpackage=com.google.cloud.teleport.templates \
    -DinteractiveMode=false
Defining options
Define the parameters needed for template creation in the options interface.
Beam Programming Guide
public interface WordCountOptions extends PipelineOptions {

  /**
   * By default, this example reads from a public dataset containing the text of King Lear. Set
   * this option to choose a different input file or glob.
   */
  @Description("Path of the file to read from")
  @Default.String("gs://apache-beam-samples/shakespeare/kinglear.txt")
  String getInputFile();

  void setInputFile(String value);

  /** Set this required option to specify where to write the output. */
  @Description("Path of the file to write to")
  @Required
  String getOutput();

  void setOutput(String value);

  /**
  @Description("Staging for the pipeline")
  @Default.String("gs://streaming-datatransfer-dev/staging/")
  String setStagingLocation();
  void setStagingLocation(String value);
  */

  //@Description("Template for the pipeline")
  //@Default.String("gs://streaming-datatransfer-dev/templates/WordCountTmp.java")
  String getTemplateLocation();

  void setTemplateLocation(String value);
}
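To make it clearer why the interface only declares getters and setters, here is a minimal sketch of the idea using nothing but the JDK. As far as I understand, Beam backs options interfaces with a dynamic proxy and derives the flag name from the getter (getInputFile becomes --inputFile). The names DemoOptions and DemoOptionsFactory are hypothetical stand-ins I made up for illustration. They are not Beam classes, and this sketch skips everything Beam's real factory does around defaults, validation and type conversion.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an options interface; mirrors two WordCountOptions properties. */
interface DemoOptions {
    String getInputFile();
    void setInputFile(String value);
    String getOutput();
    void setOutput(String value);
}

/** Hypothetical factory (not a Beam class): backs an options interface with a map. */
final class DemoOptionsFactory {
    private DemoOptionsFactory() {}

    /** Parses "--name=value" arguments and returns a proxy implementing the interface. */
    static <T> T fromArgs(Class<T> iface, String... args) {
        Map<String, String> values = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            values.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            String name = method.getName();
            if (name.startsWith("get") && methodArgs == null) {
                // getInputFile() reads the "inputFile" entry (null if never set).
                return values.get(propertyName(name));
            }
            if (name.startsWith("set") && methodArgs != null && methodArgs.length == 1) {
                // setInputFile(v) overwrites the "inputFile" entry.
                values.put(propertyName(name), (String) methodArgs[0]);
                return null;
            }
            throw new UnsupportedOperationException(name);
        };
        return iface.cast(
            Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    /** Turns an accessor name such as "getInputFile" into the property name "inputFile". */
    static String propertyName(String accessor) {
        String bare = accessor.substring(3);
        return Character.toLowerCase(bare.charAt(0)) + bare.substring(1);
    }

    public static void main(String[] args) {
        DemoOptions options = fromArgs(
            DemoOptions.class, "--inputFile=gs://bucket/in.txt", "--output=gs://bucket/out");
        System.out.println(options.getInputFile());
        System.out.println(options.getOutput());
    }
}
```

The point of the sketch is only the naming convention: each getter/setter pair corresponds to one command-line flag, which is why adding getTemplateLocation/setTemplateLocation is enough to accept --templateLocation.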
Apparently, if you call ".waitUntilFinish();" the template still gets created, but an error is thrown afterwards. This is also recorded as an Apache Beam issue.
Use "p.run()" on its own to create the template.
Exception in thread "main" java.lang.UnsupportedOperationException: The result of template creation should not be used.
https://issues.apache.org/jira/browse/BEAM-2400
// p.run().waitUntilFinish();
p.run();
Generating the template
mvn -Pdataflow-runner compile exec:java \
    -Dexec.mainClass=com.google.cloud.teleport.templates.WordCount \
    -Dexec.args="--project=<project id> \
                 --tempLocation=gs://<project id>/tmp \
                 --templateLocation=gs://<project id>/templates/WordCountTmp \
                 --output=gs://<project id>/output \
                 --runner=DataflowRunner"
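The -Dexec.args string is easy to get wrong by hand, since a stray space inside a gs:// path splits it into two arguments. As a rough sketch, the flags above could be assembled programmatically. TemplateArgs, gcsPath and build are hypothetical names of my own, not part of Maven or Beam, and the separate bucket parameter is an assumption (the command above uses the project ID as the bucket name).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of Beam or Maven): builds the value for -Dexec.args. */
final class TemplateArgs {
    private TemplateArgs() {}

    /** Joins a bucket and an object path into gs://bucket/path, rejecting embedded whitespace. */
    static String gcsPath(String bucket, String path) {
        String b = bucket.trim();
        String p = path.trim();
        while (p.startsWith("/")) {
            p = p.substring(1); // avoid gs://bucket//path
        }
        String full = "gs://" + b + "/" + p;
        if (b.isEmpty() || full.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("Invalid GCS location: '" + bucket + "', '" + path + "'");
        }
        return full;
    }

    /** Builds the same five flags used in the mvn command, in the same order. */
    static String build(String project, String bucket, String templateName) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("project", project);
        opts.put("tempLocation", gcsPath(bucket, "tmp"));
        opts.put("templateLocation", gcsPath(bucket, "templates/" + templateName));
        opts.put("output", gcsPath(bucket, "output"));
        opts.put("runner", "DataflowRunner");

        StringJoiner joined = new StringJoiner(" ");
        opts.forEach((name, value) -> joined.add("--" + name + "=" + value));
        return joined.toString();
    }

    public static void main(String[] args) {
        // Placeholder values; replace with your own project, bucket and template name.
        System.out.println("-Dexec.args=\"" + build("my-project", "my-bucket", "WordCountTmp") + "\"");
    }
}
```

Running main prints a single -Dexec.args="..." value that can be pasted into the mvn command. A LinkedHashMap is used so the flags come out in insertion order.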
Creating templates | Cloud Dataflow | Google Cloud
Quickstart using Java and Apache Maven | Cloud Dataflow | Google Cloud
Finally, confirm that the template was created on GCS.
Thoughts
I may simply have failed to find the right page in the Google documentation, but following the documented procedure did not work for me, so it took a while to track down the cause.