Browser automation using Docker and Python
A few weeks ago, I was working on a new report for a customer's game. The platform providing the campaign reports didn't offer a public API, or any kind of developer access key, for generating reports on demand.
However, it was possible to request these reports through its dashboard. While it might seem odd to rely on a UI for downloading such reports, it was the only way to access the customer's valuable data.
Let's define the requirements for this project:
- It should be a standalone Python script for easy execution and integration with existing ETL libraries
- It shouldn't require extra software on the server except for the Docker package (which is quite flexible)
Now we're ready to build something functional. In this post, I use specific library versions to access the Docker daemon, chosen to match the packages installed on CentOS in my example.
My requirements.txt:
```
docker==2.1.0
splinter==0.7.7
timeout-decorator==0.3.3
```
Splinter is a nice library that wraps browser drivers for automating interactions with web pages.
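To give a feel for the API before we add Docker to the mix, here is a minimal sketch of driving a locally installed browser with splinter. It assumes Firefox and its webdriver are available on the machine, and the URL is just a placeholder:

```python
from splinter import Browser

# Minimal splinter sketch: drive a locally installed browser.
# Assumes Firefox and its webdriver are present on this machine.
with Browser('firefox') as browser:
    browser.visit("http://www.example.com")  # placeholder URL
    print(browser.title)                     # splinter exposes the page title
```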
Let's define a class for running a Google Chrome container. Later, we'll use it to access web pages via the splinter library.
```python
import time

import docker
import timeout_decorator


class _ChromeContainer:
    '''
    _ChromeContainer handles running a Chrome Docker container
    in the background.
    Requires Docker service on the machine to pull and run images.
    '''

    def __init__(self):
        self.__image_name = "selenium/standalone-chrome:3.10.0"
        self.__client = docker.from_env()

    def run(self):
        '''
        Starts up a Docker container with chromedriver and waits
        for it to reach the running state.
        '''
        client = self.__client
        # Publish the container's 4444/tcp port on a random free host port.
        self.container = client.containers.run(self.__image_name,
                                               detach=True,
                                               ports={'4444/tcp': None})

        @timeout_decorator.timeout(120)
        def waiting_up(client: docker.client.DockerClient, container):
            # Poll the container state until Docker reports it as running.
            while True:
                container.reload()
                if container.status == "running":
                    break
                time.sleep(1)

        waiting_up(client, self.container)

    def quit(self):
        '''
        Kills and deletes the container
        '''
        self.container.kill()
        self.container.remove()  # delete the stopped container so it doesn't pile up

    @property
    def public_port(self):
        # Look up the host port that Docker assigned to 4444/tcp.
        container = self.container
        return container.attrs["NetworkSettings"]["Ports"]["4444/tcp"][0]["HostPort"]
```
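Before wiring the class into a worker, it can be smoke-tested on its own. A rough usage sketch, assuming the Docker daemon is reachable and the image can be pulled:

```python
# Rough standalone check of _ChromeContainer (assumes a reachable Docker
# daemon and that selenium/standalone-chrome:3.10.0 can be pulled).
chrome = _ChromeContainer()
chrome.run()
try:
    print("Selenium hub available on host port", chrome.public_port)
finally:
    chrome.quit()  # kill and remove the container even if something fails
```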
Now we're ready to use splinter and _ChromeContainer to automate our task.
```python
import timeout_decorator
import docker
from splinter import Browser


class Worker:
    def __init__(self):
        self.__chrome_container = _ChromeContainer()

    def process(self):
        # Start the Chrome container, then attach a remote splinter browser
        # to the Selenium hub it exposes.
        self.__chrome_container.run()
        self.__web_client = Browser('remote',
                                    url="http://127.0.0.1:{}/wd/hub".format(self.__chrome_container.public_port),
                                    browser='chrome')
        # Example for login request:
        try:
            self.__login()
        finally:
            # Always release the browser session and the container,
            # even if the automation step fails.
            self.__web_client.quit()
            self.__chrome_container.quit()

    def __login(self):
        self.__web_client.visit("http://www.example.com/login")
        self.__web_client.fill('developer_session[email]', 'EXAMPLE_USERNAME')
        self.__web_client.fill('developer_session[password]', 'EXAMPLE_PASSWORD')
        button = self.__web_client.find_by_id('developer_session_submit')
        button.click()
```
This is just an example; you can extend your Worker class with further steps in the style of __login to accomplish more complex automation tasks, as sketched below.
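For illustration only, here is what such an extra step might look like, shown as a fragment of the Worker class defined above, together with a minimal entry point that satisfies the "standalone script" requirement from the top of the post. The method name, URL, and element id are hypothetical placeholders, not the real dashboard's markup:

```python
class Worker:
    # ... methods from above (process, __login, ...) ...

    def __download_report(self):
        # Hypothetical extra step: the URL and element id are placeholders,
        # not the real platform's markup.
        self.__web_client.visit("http://www.example.com/reports")
        self.__web_client.find_by_id('export_report_button').click()


# Minimal entry point so the file runs as a standalone script.
if __name__ == "__main__":
    Worker().process()
```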
Thank you for reading! :)