-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-5332: [R] Update R package README with richer installation instructions #4322
Conversation
Confirmed that this fixes the installation on sparklyr too: https://travis-ci.org/nealrichardson/sparklyr/jobs/533045007 |
Probably worth waiting for a review from @romainfrancois... Those flags were added for the initial commit from Romain: ea8940a#diff-2816240325a8e9e6df50fb7c1faa8113 and then removed again by Romain with a comment pointing to @jeroen. I don't see other R packages using the path explicitly. So maybe there is some abnormal configuration in Travis and your machine or maybe there is something deeper going on. I did spot this article: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Unix_002dalikes -- In any case, probably worth getting more eyes on this change. |
You shouldn't hardcode your local installation path in the generic makefile. What problem are you trying to solve exactly? This works for me on macos:
And then in R: remotes::install_github("apache/arrow/r") |
That works for me, too, but that's not the problem I'm trying to solve here. See https://issues.apache.org/jira/browse/ARROW-5332 for details. Installing the R package using the locally built C++ library is failing, not only on my machine but also on Travis on Ubuntu for the sparklyr package. It was not my intention to hardcode my path; I merely bisected git history to find out when this stopped working, and I landed on that line in that diff. A couple of alternatives that come to mind:
|
If it works for the brewed libarrow, and for arrow 0.13, I think this is a problem in the C++ library? Either way I think hardcoding a local path in the r bindings is not the solution. |
See relevant rpath-related build options
Not sure if changing those fixes the issue, but this is the first rpath issue we've seen in quite some time |
@nealrichardson is libarrow in your LD_LIBRARY_PATH/DYLD_LIBRARY_PATH for your local install? If the library is in a non-standard location then you need to do that |
Like I said, setting Stepping back, I wonder if it would make sense for us to provide an installation script that compiles with the appropriate flags that the R package needs (leaving aside this path issue, it seems that R doesn't currently need Gandiva, Flight, et al.). The R README could recommend using it to install from source, and other packages like |
For posterity, I came across this discussion around the original hard-coding of the rpath: #2489 (comment). |
On Linux, you need to add the directory where your diff --git a/.travis.yml b/.travis.yml
index 77574796..62211779 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -74,6 +74,7 @@ matrix:
- ARROW_ENABLED="true"
- ARROW_VERSION="devel"
- ARROW_SOURCE="build"
+ - LD_LIBRARY_PATH="/usr/local/lib"
- JAVA_VERSION=oraclejdk8
- name: "Deps Devel (tidyverse, r-lib, forge)"
env: R_DEVEL_PACKAGES="true"
@@ -92,6 +93,7 @@ matrix:
- ARROW_ENABLED="true"
- ARROW_VERSION="devel"
- ARROW_SOURCE="build"
+ - LD_LIBRARY_PATH="/usr/local/lib"
- JAVA_VERSION=oraclejdk8
before_install: See https://travis-ci.org/kou/sparklyr/builds/534033939 for the result with the change. Note that R build is broken on the current master because R needs follow changes at e68ca7f . You need to use 1ed608a for now: diff --git a/ci/arrow-build.sh b/ci/arrow-build.sh
index 247df096..a6fa4f43 100644
--- a/ci/arrow-build.sh
+++ b/ci/arrow-build.sh
@@ -9,6 +9,7 @@ sudo apt install -y -V autoconf
if [[ $ARROW_VERSION == "devel" ]]; then
git clone https://github.com/apache/arrow
cd arrow/cpp
+ git checkout 1ed608aa94f0d22cfa69c81687006b4ebaefa258
else
wget https://arrowlib.rstudio.com/dist/arrow/arrow-$ARROW_VERSION/apache-arrow-$ARROW_VERSION.tar.gz
tar -xvzf apache-arrow-$ARROW_VERSION.tar.gz
|
FYI: Setting |
I agree with @kou that this particular issue (making sure your system can "see" the installed C++ libraries) is something that developers need to be aware of (which we should document adequately, of course) -- I'm skeptical that a one-size-fits-all and cross-platform solution can be developed. For Python we modify LD_LIBRARY_PATH to point to the custom install location, as do the GLib and Ruby tests arrow/ci/travis_script_python.sh Line 27 in d8e4763
For development it is not recommended to install development libraries into default system locations because it contaminates any packages installed by package managers such as |
Thanks @kou! I'll rework this patch to just update the R README with appropriate installation guidelines, and I'll take care of submitting the sparklyr Travis patch, unless you beat me to it. More generally,
|
Thanks!
If If If we build, install and use |
This may be an out of scope topic of this but it may be better that we provide an install script for Travis CI.
We can do the same thing for Apache Arrow. |
Thanks @kou for the explanation, that's really helpful. I agree that it would be a good idea to provide and manage an install script, or set of scripts, for use in other projects on Travis and otherwise. I'll make a new Jira for that. I'll limit this ticket to be about the R installation docs/README. One last thought here: what if |
I'm sorry but I couldn't understand. |
I meant, let's assume this workflow for installing, adapted from the R package README git clone https://github.com/apache/arrow.git
mkdir arrow/cpp/build
cd !$
cmake .. -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install
cd ../../r
R CMD INSTALL . That will apparently fail unless I do |
The build system isn't able to modify the environment like that. Such a modification would not be permanent, anyway, so it would be lost the next time you open a terminal. It's better to put such modifications in a shell script that is executed like |
@nealrichardson do you really need to build arrow from source? Perhaps you can install the arrow for linux from the bintray repositories as described here: https://arrow.apache.org/install/ After that you can simply use Maybe the readme should emphasize that building R from source is only needed if you are hacking on libarrow itself. For developing the R bindings, it is safer to use the libarrow binaries. |
@jeroen yeah I do need to install from source unless there are nightly builds available for download somewhere. I was looking into adding bindings for features in the C++ library that have been added since the 0.13 release. Plus I was interested to see if the installation instructions were still valid... @wesm alright, I'll update the README here and ticket any future work on installation scripts. Thanks all for your patience and help. |
@nealrichardson Thanks. I understand. We can't do it as @wesm said. How about changing the instruction to the following in README.md? cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/local/arrow -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install
export LD_LIBRARY_PATH=$HOME/local/arrow/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=\$HOME/local/arrow/lib:\$LD_LIBRARY_PATH" >> ~/.$(basename $SHELL)rc @jeroen I agree to use Arrow C++ package for using the R bindings. But I don't agree to use Arrow C++ package for developing the R bindings. Because Arrow C++ API may be changed and added after the latest release. The R bindings should follow these changes and new features. |
476274d
to
e21732a
Compare
Thanks all for your help here. I've taken a different approach and just updated the README to reflect what appears to be best practice now, plus some pointers to help if things go wrong. No build changes or paths hard-coded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also update the title of this pull request?
|
||
On macOS, you may install using homebrew: | ||
The current release of Apache Arrow is 0.13. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the current release version automatically on our release process when we want to embed the current release version.
https://github.com/apache/arrow/blob/master/dev/release/00-prepare.sh#L96-L110
https://github.com/apache/arrow/blob/master/dev/release/00-prepare-test.rb#L126-L139
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created https://issues.apache.org/jira/browse/ARROW-5415 for this.
|
||
```r | ||
# install.packages("remotes") # Or install "devtools", which includes remotes | ||
remotes::install_github("apache/arrow/r", ref="76e1bc5dfb9d08e31eddd5cbcc0b1bab934da2c7") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use tag name instead of commit ID?
Does remotes::install_github("apache/arrow/[email protected]")
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we can, but there were commits after the 0.13.0 arrow release that were needed as Romain, Jeroen, and Javier tried to get the package released and accepted by CRAN, so unfortunately this commit is effectively the 0.13.0 R package release point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that any artifact that has not been voted on by the Arrow PMC is not an official release. I think it's fine for CRAN builds to be "unofficial" until we get the development workflow sorted out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted; just trying to be pragmatic and give installation instructions that work. We'll be sure to keep R on the "official" path going forward.
r/README.Rmd
Outdated
brew install apache-arrow | ||
``` | ||
|
||
### From source | ||
On Linux, see the [Arrow project installation page](http://arrow.apache.org/install/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use https
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, of course, sorry
|
||
``` | ||
On macOS, you may install the C++ library using [Homebrew](https://brew.sh/): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this instruction to https://github.com/apache/arrow/blob/master/site/install.md as a separated pull request?
Then we can remove this section from this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://issues.apache.org/jira/browse/ARROW-5416 created. (Though I probably won't remove this from the R README even once it's added there, just for convenience.)
r/README.Rmd
Outdated
dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: @rpath/libarrow.14.dylib | ||
``` | ||
|
||
try setting the environment variable `LD_LIBRARY_PATH` to wherever Arrow C++ was put in `make install`, e.g. `export LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package. An alternative solution on macOS is to rebuild the C++ library adding the `-DARROW_INSTALL_NAME_RPATH=OFF` option to the `cmake` command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use DYLD_LIBRARY_PATH
on macOS. LD_LIBRARY_PATH
isn't used on macOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised (and also fixed the http->https), PTAL
r/README.Rmd
Outdated
|
||
try setting the environment variable `LD_LIBRARY_PATH` to wherever Arrow C++ was put in `make install`, e.g. `export LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package. An alternative solution on macOS is to rebuild the C++ library adding the `-DARROW_INSTALL_NAME_RPATH=OFF` option to the `cmake` command. | ||
|
||
For any other build/configuration challenges, see the [C++ developer guide](http://arrow.apache.org/docs/developers/cpp.html#building). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use https
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I'm getting these http
URLs because I'm copying from the web browser, and it's not redirecting to https
like most sites do. I made https://issues.apache.org/jira/browse/ARROW-5417 for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thanks @kou! 🎉 |
These extra flags got dropped in a44d1f5#diff-2816240325a8e9e6df50fb7c1faa8113R21. This change resolves the issue for me locally on macOS, and I'll confirm that it fixes the Ubuntu build for sparklyr and that it doesn't break our Travis build.