Solving Gem Installation Timeouts When Building Docker Images
Recently we adopted Docker to automate our deployment process. Everything worked great on small projects, and we really enjoyed it.
However, we ran into a gem installation timeout problem when migrating a big project. The error happened randomly, with the following message:
Gem::RemoteFetcher::UnknownHostError: timed out (https://rubygems.org/gems/<name of the gem>.gem)
This error is quite common when installing gems directly on a local machine or server. Normally we just rerun bundle install, because it picks up the remaining installation work from the point where it was interrupted. The Docker image build also fails when this error happens, but the next docker build cannot resume from the failed point: a RUN instruction that fails leaves no cached layer behind, so the whole bundle install step starts over from scratch.
We searched online and tried the following methods:
1. Try another gem source. We tried http://rubygems.org/ and https://ruby.taobao.org/, but neither of them solved the problem.
2. Install the gems into a volume. This mitigates the problem, since the volume acts as a "cache" of the gems that are already installed. However, it also means that we have to install gems when starting the container instead of when building the image, which can significantly slow down startup. Moreover, the installation can still fail, so we may have to start the container several times. Therefore, this solution did not meet our expectations.
Finally, we came up with a very simple solution. First, let's look at the Dockerfile:
WORKDIR /app/
COPY Gemfile* /app/
COPY scripts /app/scripts/
RUN bash scripts/bundle.sh
Instead of calling bundle install directly, we call a shell script. The script reruns bundle install whenever the previous run fails, and it runs bundle install at most 5 times:
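#!/usr/bin/env bash
N=0
STATUS=1
until [ "${N}" -ge 5 ]
do
  bundle install --without development test --jobs 4 --deployment && STATUS=0 && break  # stop on the first successful run
  echo 'Try bundle again ...'
  N=$((N + 1))  # count the failed attempt
  sleep 1
done
exit ${STATUS}  # a non-zero status makes the RUN step, and therefore the build, fail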
When a bundle install run fails and fewer than 5 runs have been made, the next run continues from the point where the previous one failed. In our observation, the whole bundle install process normally failed about twice, so we cap it at 5 runs here; feel free to adjust this number according to the complexity of your project. Capping the retries also means the Docker image build fails fast if some other issue is preventing the gems from being installed correctly.
With this approach we got everything we wanted: gems are installed while the image is being built, and the installation can continue from the failed point when a timeout error occurs. Hopefully this solution will help you too.
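If you want to tune the attempt count per project, the loop in bundle.sh can also be factored into a small reusable function that takes the number of attempts and the delay as parameters. This is only a minimal sketch, not the script we use in production; the function name `retry` and its argument order are hypothetical choices made for illustration. Like the original script, it returns the exit status of the last failed attempt, so a Docker RUN step that calls it still fails once the attempts are used up.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Bundler or Docker):
#   retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS runs have failed.
# COMMAND always runs at least once. Returns 0 on success, otherwise the exit
# status of the last failed attempt.
retry() {
  local max_attempts=$1
  local delay=$2
  shift 2

  local attempt=1
  local status
  while true; do
    # The && list keeps this safe under `set -e`; on failure $? is COMMAND's status.
    "$@" && return 0
    status=$?

    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "Giving up after ${attempt} attempt(s): $*" >&2
      return "${status}"
    fi

    echo "Attempt ${attempt} failed with status ${status}, retrying in ${delay}s ..." >&2
    attempt=$((attempt + 1))
    sleep "${delay}"
  done
}
```

With this helper defined in bundle.sh, the loop could be replaced by a single call such as `retry 5 1 bundle install --without development test --jobs 4 --deployment`, and the first argument becomes the knob to adjust for larger projects.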