Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-5332: [R] Update R package README with richer installation instructions #4322

Closed
wants to merge 4 commits into from

Conversation

nealrichardson
Copy link
Member

These extra flags got dropped in a44d1f5#diff-2816240325a8e9e6df50fb7c1faa8113R21. This change resolves the issue for me locally on macOS, and I'll confirm that it fixes the Ubuntu build for sparklyr and that it doesn't break our Travis build.

@nealrichardson
Copy link
Member Author

Confirmed that this fixes the installation on sparklyr too: https://travis-ci.org/nealrichardson/sparklyr/jobs/533045007

@javierluraschi
Copy link
Contributor

Probably worth waiting for a review from @romainfrancois... Those flags were added for the initial commit from Romain: ea8940a#diff-2816240325a8e9e6df50fb7c1faa8113 and then removed again by Romain with a comment pointing to @jeroen. I don't see other R packages using the path explicitly. So maybe there is some abnormal configuration in Travis and your machine or maybe there is something deeper going on. I did spot this article: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Unix_002dalikes -- In any case, probably worth getting more eyes on this change.

@jeroen
Copy link
Contributor

jeroen commented May 16, 2019

You shouldn't hardcode your local installation path in the generic makefile. What problem are you trying to solve exactly? This works for me on macos:

brew install apache-arrow

And then in R:

remotes::install_github("apache/arrow/r")

@nealrichardson
Copy link
Member Author

That works for me, too, but that's not the problem I'm trying to solve here. See https://issues.apache.org/jira/browse/ARROW-5332 for details. Installing the R package using the locally built C++ library is failing, not only on my machine but also on Travis on Ubuntu for the sparklyr package.

It was not my intention to hardcode my path; I merely bisected git history to find out when this stopped working, and I landed on that line in that diff.

A couple of alternatives that come to mind:

  1. Determine that -rpath location dynamically and maybe only include it when pkg-config is used
  2. Figure out what's different about how https://github.com/apache/arrow/blob/master/ci/travis_before_script_cpp.sh builds and installs and do that since that works. I noticed, for example, that it installs with -DARROW_INSTALL_NAME_RPATH=OFF. I tried that and that appeared to work locally for me, but I still got a failure to install on sparklyr Travis. Fortunately, arrow's CMake run now prints a build configuration summary so we can look to see what's different; unfortunately, the summary is over 80 lines long because there are so many flags.

@jeroen
Copy link
Contributor

jeroen commented May 17, 2019

If it works for the brewed libarrow, and for arrow 0.13, I think this is a problem in the C++ library? Either way I think hardcoding a local path in the r bindings is not the solution.

@wesm
Copy link
Member

wesm commented May 17, 2019

See relevant rpath-related build options

define_option(ARROW_RPATH_ORIGIN "Build Arrow libraries with RATH set to \$ORIGIN" OFF)

Not sure if changing those fixes the issue, but this is the first rpath issue we've seen in quite some time

@wesm
Copy link
Member

wesm commented May 17, 2019

@nealrichardson is libarrow in your LD_LIBRARY_PATH/DYLD_LIBRARY_PATH for your local install? If the library is in a non-standard location then you need to do that

@nealrichardson
Copy link
Member Author

Like I said, setting -DARROW_INSTALL_NAME_RPATH=OFF (specifically, adding it to the cmake command given on the R README), following what the Arrow Travis build does, resolves the issue for me locally, without hardcoding an rpath in the R makevars. But when I added that to sparklyr's Travis build script, it still failed. I haven't further dug into how its Travis build setup differs from the Arrow build setup.

Stepping back, I wonder if it would make sense for us to provide an installation script that compiles with the appropriate flags that the R package needs (leaving aside this path issue, it seems that R doesn't currently need Gandiva, Flight, et al.). The R README could recommend using it to install from source, and other packages like sparklyr (and I expect there will be more) could use it too. That way it would also be something we own and maintain, rather than have others' maintain their own bespoke scripts that drift and cause bug reports like this.

@nealrichardson
Copy link
Member Author

For posterity, I came across this discussion around the original hard-coding of the rpath: #2489 (comment).

@kou
Copy link
Member

kou commented May 18, 2019

ARROW_INSTALL_NAME_RPATH is for macOS. So the Travis CI failure is unrelated because it's on Linux.

On Linux, you need to add the directory where your libarrow.so to LD_LIBRARY_PATH. The following patch will fix the failure:

diff --git a/.travis.yml b/.travis.yml
index 77574796..62211779 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -74,6 +74,7 @@ matrix:
         - ARROW_ENABLED="true"
         - ARROW_VERSION="devel"
         - ARROW_SOURCE="build"
+        - LD_LIBRARY_PATH="/usr/local/lib"
         - JAVA_VERSION=oraclejdk8
     - name: "Deps Devel (tidyverse, r-lib, forge)"
       env: R_DEVEL_PACKAGES="true"
@@ -92,6 +93,7 @@ matrix:
         - ARROW_ENABLED="true"
         - ARROW_VERSION="devel"
         - ARROW_SOURCE="build"
+        - LD_LIBRARY_PATH="/usr/local/lib"
         - JAVA_VERSION=oraclejdk8
 
 before_install:

See https://travis-ci.org/kou/sparklyr/builds/534033939 for the result with the change.

Note that R build is broken on the current master because R needs follow changes at e68ca7f . You need to use 1ed608a for now:

diff --git a/ci/arrow-build.sh b/ci/arrow-build.sh
index 247df096..a6fa4f43 100644
--- a/ci/arrow-build.sh
+++ b/ci/arrow-build.sh
@@ -9,6 +9,7 @@ sudo apt install -y -V autoconf
 if [[ $ARROW_VERSION == "devel" ]]; then
   git clone https://github.com/apache/arrow
   cd arrow/cpp
+  git checkout 1ed608aa94f0d22cfa69c81687006b4ebaefa258
 else
   wget https://arrowlib.rstudio.com/dist/arrow/arrow-$ARROW_VERSION/apache-arrow-$ARROW_VERSION.tar.gz
   tar -xvzf apache-arrow-$ARROW_VERSION.tar.gz

@kou
Copy link
Member

kou commented May 18, 2019

FYI: Setting LD_LIBRARY_PATH by users is a general task when users install a library to non /usr/lib directory. It's not Arrow specific task.

@wesm
Copy link
Member

wesm commented May 18, 2019

I agree with @kou that this particular issue (making sure your system can "see" the installed C++ libraries) is something that developers need to be aware of (which we should document adequately, of course) -- I'm skeptical that a one-size-fits-all and cross-platform solution can be developed.

For Python we modify LD_LIBRARY_PATH to point to the custom install location, as do the GLib and Ruby tests

export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH

For development it is not recommended to install development libraries into default system locations because it contaminates any packages installed by package managers such as brew. To motivate why this is a problem, there is no make uninstall so if you want to remove the development libraries you have to delete them manually which is error prone

@nealrichardson
Copy link
Member Author

Thanks @kou! I'll rework this patch to just update the R README with appropriate installation guidelines, and I'll take care of submitting the sparklyr Travis patch, unless you beat me to it.

More generally,

  1. I wonder if the default for ARROW_INSTALL_NAME_RPATH is wrong, because I see lots of places that have to set it to OFF
  2. I still think we can do better at collecting our C++ dev installation best practices in one or more scripts that we recommend people to use, or at least start with. Maybe not for Arrow C++ devs who want more control but at least for developers of the Python, Ruby, and R bindings who want to work off of the latest C++ lib and can't just install the published packages, and for those building libraries on top of those bindings (like the sparklyr case). It may not be one-size-fits-all, but I think it's a small number of variations we need to care about. We have the core of this already in the scripts in /ci. I'll make a Jira ticket and we can discuss there.

@kou
Copy link
Member

kou commented May 19, 2019

I'll take care of submitting the sparklyr Travis patch

Thanks!

I wonder if the default for ARROW_INSTALL_NAME_RPATH is wrong, because I see lots of places that have to set it to OFF

ARROW_INSTALL_NAME_RPATH changes install_name of libarrow.dylib.

If ARROW_INSTALL_NAME_RPATH is used, the install_name is @rpath/libarryw.dylib. If we use@rpath, we can put libarrow.dylib anywhere. Because we can replace @rpath by DYLD_LIBRARY_PATH environment variable at run-time.

If ARROW_INSTALL_NAME_RPATH isn't used, the install_name is ${CMAKE_INSTALL_PREFIX}/lib/libarrow.dylib. If we use the static path, we can put libarrow.dylib to only ${CMAKE_INSTALL_PREFIX}/lib. (make install does this.)

If we build, install and use libarrow.dylib on the same machine, we can use the static path (we can use -DARROW_INSTALL_NAME_RPATH=OFF.) But if we build libarrow.dylib on A and install and use libarrow.dylib on B, we should use @rpath. Because we can't assume where libarrow.dylib is installed.

@kou
Copy link
Member

kou commented May 20, 2019

This may be an out of scope topic of this but it may be better that we provide an install script for Travis CI.
I did this for my product Groonga: https://github.com/groonga/groonga/blob/master/data/travis/setup.sh
Users can install Groonga by curl --silent --location https://github.com/groonga/groonga/raw/master/data/travis/setup.sh | sh: https://github.com/pgroonga/pgroonga/blob/master/.travis.yml#L43
If users want to use master instead of released version, users just define GROONGA_MASTER=yes variable:

We can do the same thing for Apache Arrow.

@nealrichardson
Copy link
Member Author

Thanks @kou for the explanation, that's really helpful.

I agree that it would be a good idea to provide and manage an install script, or set of scripts, for use in other projects on Travis and otherwise. I'll make a new Jira for that. I'll limit this ticket to be about the R installation docs/README.

One last thought here: what if make install did some version of export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_CPP_INSTALL/lib at the end? I see some version of that in 5 of our ci/ scripts and recommended in the docs for several language bindings. For the use case where someone is building/installing the C++ library in order to then build some other project, it would help, and it wouldn't hurt anything else (IIUC).

@kou
Copy link
Member

kou commented May 21, 2019

what if make install did some version of export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_CPP_INSTALL/lib at the end?

I'm sorry but I couldn't understand.
Do you want to know which is recommended, LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_CPP_INSTALL/lib or LD_LIBRARY_PATH=$ARROW_CPP_INSTALL/lib:$LD_LIBRARY_PATH?

@nealrichardson
Copy link
Member Author

I meant, let's assume this workflow for installing, adapted from the R package README

git clone https://github.com/apache/arrow.git
mkdir arrow/cpp/build
cd !$
cmake .. -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install
cd ../../r
R CMD INSTALL .

That will apparently fail unless I do export LD_LIBRARY_PATH=/usr/local/bin before I R CMD INSTALL ., otherwise it won't find libarrow. So I can add that export as a step in the recommended instructions. But why not have make install set that env var for me?

@wesm
Copy link
Member

wesm commented May 21, 2019

The build system isn't able to modify the environment like that. Such a modification would not be permanent, anyway, so it would be lost the next time you open a terminal. It's better to put such modifications in a shell script that is executed like source arrow_set_env.sh or similar (that's how I do development locally, anyway)

@jeroen
Copy link
Contributor

jeroen commented May 21, 2019

@nealrichardson do you really need to build arrow from source? Perhaps you can install the arrow for linux from the bintray repositories as described here: https://arrow.apache.org/install/

After that you can simply use remotes::install_github("apache/arrow/r") in R.

Maybe the readme should emphasize that building R from source is only needed if you are hacking on libarrow itself. For developing the R bindings, it is safer to use the libarrow binaries.

@nealrichardson
Copy link
Member Author

@jeroen yeah I do need to install from source unless there are nightly builds available for download somewhere. I was looking into adding bindings for features in the C++ library that have been added since the 0.13 release. Plus I was interested to see if the installation instructions were still valid...

@wesm alright, I'll update the README here and ticket any future work on installation scripts.

Thanks all for your patience and help.

@kou
Copy link
Member

kou commented May 22, 2019

@nealrichardson Thanks. I understand.

We can't do it as @wesm said.

How about changing the instruction to the following in README.md?

cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/local/arrow -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install
export LD_LIBRARY_PATH=$HOME/local/arrow/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=\$HOME/local/arrow/lib:\$LD_LIBRARY_PATH" >> ~/.$(basename $SHELL)rc

@jeroen I agree to use Arrow C++ package for using the R bindings. But I don't agree to use Arrow C++ package for developing the R bindings.

Because Arrow C++ API may be changed and added after the latest release. The R bindings should follow these changes and new features.

@nealrichardson
Copy link
Member Author

Thanks all for your help here. I've taken a different approach and just updated the README to reflect what appears to be best practice now, plus some pointers to help if things go wrong. No build changes or paths hard-coded.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also update the title of this pull request?


On macOS, you may install using homebrew:
The current release of Apache Arrow is 0.13.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to update the current release version automatically on our release process when we want to embed the current release version.
https://github.com/apache/arrow/blob/master/dev/release/00-prepare.sh#L96-L110
https://github.com/apache/arrow/blob/master/dev/release/00-prepare-test.rb#L126-L139

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


```r
# install.packages("remotes") # Or install "devtools", which includes remotes
remotes::install_github("apache/arrow/r", ref="76e1bc5dfb9d08e31eddd5cbcc0b1bab934da2c7")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use tag name instead of commit ID?

Does remotes::install_github("apache/arrow/[email protected]") work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we can, but there were commits after the 0.13.0 arrow release that were needed as Romain, Jeroen, and Javier tried to get the package released and accepted by CRAN, so unfortunately this commit is effectively the 0.13.0 R package release point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that any artifact that has not been voted on by the Arrow PMC is not an official release. I think it's fine for CRAN builds to be "unofficial" until we get the development workflow sorted out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted; just trying to be pragmatic and give installation instructions that work. We'll be sure to keep R on the "official" path going forward.

r/README.Rmd Outdated
brew install apache-arrow
```

### From source
On Linux, see the [Arrow project installation page](http://arrow.apache.org/install/).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use https?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course, sorry


```
On macOS, you may install the C++ library using [Homebrew](https://brew.sh/):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add this instruction to https://github.com/apache/arrow/blob/master/site/install.md as a separated pull request?
Then we can remove this section from this pull request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://issues.apache.org/jira/browse/ARROW-5416 created. (Though I probably won't remove this from the R README even once it's added there, just for convenience.)

r/README.Rmd Outdated
dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: @rpath/libarrow.14.dylib
```

try setting the environment variable `LD_LIBRARY_PATH` to wherever Arrow C++ was put in `make install`, e.g. `export LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package. An alternative solution on macOS is to rebuild the C++ library adding the `-DARROW_INSTALL_NAME_RPATH=OFF` option to the `cmake` command.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use DYLD_LIBRARY_PATH on macOS. LD_LIBRARY_PATH isn't used on macOS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised (and also fixed the http->https), PTAL

r/README.Rmd Outdated

try setting the environment variable `LD_LIBRARY_PATH` to wherever Arrow C++ was put in `make install`, e.g. `export LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package. An alternative solution on macOS is to rebuild the C++ library adding the `-DARROW_INSTALL_NAME_RPATH=OFF` option to the `cmake` command.

For any other build/configuration challenges, see the [C++ developer guide](http://arrow.apache.org/docs/developers/cpp.html#building).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use https?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I'm getting these http URLs because I'm copying from the web browser, and it's not redirecting to https like most sites do. I made https://issues.apache.org/jira/browse/ARROW-5417 for that.

@nealrichardson nealrichardson changed the title ARROW-5332: [R] Fix for failure to find libarrow on R package installation ARROW-5332: [R] Update R package README with richer installation instructions May 24, 2019
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou closed this in 184b8de May 24, 2019
@nealrichardson
Copy link
Member Author

Thanks @kou! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants