Example of Selenium with Python on Docker with the latest Firefox

This is an example of how to run Selenium with Python on Docker, using the latest Firefox and its geckodriver.

At the time of writing, the official Python Docker image is based on Debian 10 (Buster):

$ docker run -i --rm python:3.9 cat /etc/os-release | grep VERSION=
VERSION="10 (buster)"

In this image, the Firefox package available from the default repositories is the ESR (Extended Support Release) version. It is similar to an LTS release on Ubuntu, so it lags behind the regular release channel. At the time of this writing, the latest version is 87 but the ESR is 78.10.

$ docker run -it --rm python:3.9 sh -c "apt update; apt search firefox | grep firefox-esr/stable"
firefox-esr/stable 78.10.0esr-1~deb10u1 amd64

In addition, geckodriver, which Selenium needs in order to drive Firefox, is not included. That's why I had to install both manually. Here is an example Dockerfile.

FROM python:3.9

ENV DEBIAN_FRONTEND=noninteractive
ENV GECKODRIVER_VER=v0.29.0
ENV FIREFOX_VER=87.0

# Base packages and Python libraries
RUN set -x \
   && apt-get update \
   && apt-get upgrade -y \
   && apt-get install -y \
       firefox-esr \
   && pip install \
       requests \
       selenium

# Add latest Firefox
RUN set -x \
   && apt-get install -y \
       libx11-xcb1 \
       libdbus-glib-1-2 \
   && curl -sSLO https://download-installer.cdn.mozilla.net/pub/firefox/releases/${FIREFOX_VER}/linux-x86_64/en-US/firefox-${FIREFOX_VER}.tar.bz2 \
   && tar -jxf firefox-* \
   && mv firefox /opt/ \
   && chmod 755 /opt/firefox \
   && chmod 755 /opt/firefox/firefox

# Add geckodriver
RUN set -x \
   && curl -sSLO https://github.com/mozilla/geckodriver/releases/download/${GECKODRIVER_VER}/geckodriver-${GECKODRIVER_VER}-linux64.tar.gz \
   && tar zxf geckodriver-*.tar.gz \
   && mv geckodriver /usr/bin/

COPY ./app /app

WORKDIR /app

CMD ["python", "./main.py"]
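
The two download URLs in the Dockerfile are built from the `FIREFOX_VER` and `GECKODRIVER_VER` variables. As a rough sketch of how they expand, the following Python snippet reproduces the same templates. The function names are hypothetical and only illustrate the substitution. Whether the resulting URL exists for a given version is not checked here.

```python
# URL templates copied from the Dockerfile above; helper names are hypothetical.
FIREFOX_URL = (
    "https://download-installer.cdn.mozilla.net/pub/firefox/releases/"
    "{ver}/linux-x86_64/en-US/firefox-{ver}.tar.bz2"
)
GECKODRIVER_URL = (
    "https://github.com/mozilla/geckodriver/releases/download/"
    "{ver}/geckodriver-{ver}-linux64.tar.gz"
)


def firefox_url(version: str) -> str:
    """Expand the Firefox tarball URL for a version such as '87.0'."""
    return FIREFOX_URL.format(ver=version)


def geckodriver_url(version: str) -> str:
    """Expand the geckodriver tarball URL for a release tag such as 'v0.29.0'."""
    return GECKODRIVER_URL.format(ver=version)


if __name__ == "__main__":
    # Print the URLs the Dockerfile would download for its default versions.
    print(firefox_url("87.0"))
    print(geckodriver_url("v0.29.0"))
```

Printing the expanded URLs like this is a quick way to check them by hand before changing the version variables and rebuilding the image.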

And here is an example of using Selenium from Python code. [app/scraper.py]

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import time

# Firefox binary path (must be an absolute path)
FIREFOX_BINARY = FirefoxBinary('/opt/firefox/firefox')

# Firefox profile
PROFILE = webdriver.FirefoxProfile()
PROFILE.set_preference("browser.cache.disk.enable", False)
PROFILE.set_preference("browser.cache.memory.enable", False)
PROFILE.set_preference("browser.cache.offline.enable", False)
PROFILE.set_preference("network.http.use-cache", False)
PROFILE.set_preference("general.useragent.override", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0")

# Firefox options
FIREFOX_OPTS = Options()
FIREFOX_OPTS.log.level = "trace"    # Debug
FIREFOX_OPTS.headless = True
GECKODRIVER_LOG = '/geckodriver.log'

class Scraper:
    def __init__(self):
        # These keyword arguments are the Selenium 3 style API; newer Selenium 4
        # releases expect the binary, profile, and log settings on Options and Service.
        ff_opt = dict(
            firefox_binary=FIREFOX_BINARY,
            firefox_profile=PROFILE,
            options=FIREFOX_OPTS,
            service_log_path=GECKODRIVER_LOG,
        )
        self.DRIVER = webdriver.Firefox(**ff_opt)

    def scrape(self, link):
        """Return the page source of `link`, or None if loading fails."""
        try:
            self.DRIVER.get(link)
            time.sleep(5)  # just in case: give the page time to finish loading
            html = self.DRIVER.page_source

            return html

        except Exception as e:
            # The error is only printed, so the caller receives None.
            print(e)
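
The fixed `time.sleep(5)` in `scrape` always waits the full five seconds, even when the page is ready sooner. One alternative is to poll a condition until it holds or a timeout expires. The following is a minimal sketch of that idea using only the standard library. The name `wait_until` and its parameters are hypothetical and not part of Selenium, which ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Condition met: hand the truthy value back to the caller.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Not ready yet: pause briefly before checking again.
        time.sleep(interval)
```

With a driver at hand, the condition could be something like `lambda: driver.execute_script("return document.readyState") == "complete"`. Which condition actually signals "ready" depends on the page being scraped.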

Use the class above like this. [app/main.py]

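from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed (the Dockerfile above does not install it)
from scraper import Scraper

SCRAPER = Scraper()
link = "https://example.com"  # placeholder URL
html = SCRAPER.scrape(link)  # page source as a string
pretty_html = BeautifulSoup(html, "html.parser").prettify()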
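
Since `scrape` catches exceptions and only prints them, it returns `None` when the page fails to load, and anything chained onto the result then fails with a less helpful error. The sketch below shows one way to guard against that. `FakeScraper` and `fetch_html` are hypothetical names; the fake class only stands in for `Scraper` so the example runs without a browser.

```python
from typing import Optional


class FakeScraper:
    """Hypothetical stand-in for Scraper so the sketch runs without a browser."""

    def __init__(self, pages: dict):
        self.pages = pages

    def scrape(self, link: str) -> Optional[str]:
        # Mirrors Scraper.scrape: page source on success, None on failure.
        return self.pages.get(link)


def fetch_html(scraper, link: str) -> str:
    """Return the page source, raising instead of passing None along."""
    html = scraper.scrape(link)
    if html is None:
        raise RuntimeError(f"failed to scrape {link}")
    return html


if __name__ == "__main__":
    scraper = FakeScraper({"https://example.com": "<html><body>ok</body></html>"})
    print(fetch_html(scraper, "https://example.com"))
```

Raising here is just one option. Returning a default value or retrying would work as well, depending on what the caller needs.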

One more tip: on the ubuntu:20.04 Docker image, the latest Firefox package can be installed instead of the ESR. So another approach is to set up the Python environment on top of that image.

$ docker run -it --rm ubuntu:20.04 sh -c "apt update; apt search firefox | grep ^firefox/"
firefox/focal-updates,focal-security 88.0+build2-0ubuntu0.20.04.1 amd64
