A few weeks ago, I was working on a new report for a customer's game. The platform that provided the campaign reports had no public API for generating reports on demand, and it offered no developer access key of any kind.
However, these reports could be requested through the platform's web dashboard. Relying on a UI to download reports may seem odd, but it was the only way to get at the customer's valuable data.
Let's define the requirements for this project:
- It should be a standalone Python script for easy execution and integration with existing ETL libraries
- It shouldn't require any extra software on the server besides Docker, which can run everything else we need (including the browser) in a container
Now we're ready to build something functional. In this post, I pin specific library versions because the Docker SDK has to be compatible with the Docker version installed from the CentOS packages on my server.
My requirements.txt:
```
docker==2.1.0
splinter==0.7.7
timeout-decorator==0.3.3
```
Splinter is a Python library that wraps browser drivers (built on Selenium WebDriver) behind a simple API for automating interactions with web pages.
Let's define a class for running a Google Chrome container. Later, we'll use it to access web pages via the splinter library.
```python
import time

import docker
import timeout_decorator


class _ChromeContainer:
    '''_ChromeContainer handles running a Chrome Docker container
    in the background.
    Requires the Docker service on the machine to pull and run images.'''

    def __init__(self):
        self.__image_name = "selenium/standalone-chrome:3.10.0"
        self.__client = docker.from_env()
        self.container = None

    def run(self):
        '''Starts a Docker container with chromedriver and waits for it to reach the running state'''
        client = self.__client
        # Publish the container's Selenium port 4444 on a random free host port
        self.container = client.containers.run(
            self.__image_name,
            detach=True,
            ports={'4444/tcp': None})

        @timeout_decorator.timeout(120)
        def waiting_up(client: docker.client.DockerClient, container):
            # Refresh the container's attributes until Docker reports it as running
            while True:
                container.reload()
                if container.status == "running":
                    break
                time.sleep(1)

        waiting_up(client, self.container)

    def quit(self):
        '''Kills and deletes the container'''
        self.container.kill()
        self.container.remove()

    @property
    def public_port(self):
        '''Host port that Docker mapped to the container's port 4444'''
        container = self.container
        return container.attrs["NetworkSettings"]["Ports"]["4444/tcp"][0]["HostPort"]
```
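Keep in mind that `waiting_up` only waits until Docker reports the container as running; the Selenium server inside may still be starting, so the first connection attempt can fail. Below is a minimal sketch of an extra readiness check, assuming the server answers HTTP 200 on a status endpoint once it is ready. The helper name `wait_for_http` is hypothetical. It uses only the standard library and polls against a deadline instead of using `timeout_decorator`, which by default relies on signals and therefore only works in the main thread.

```python
import http.client
import time
import urllib.request


def wait_for_http(url, timeout=120.0, interval=1.0):
    """Poll url until it answers with HTTP 200 or timeout seconds elapse.

    Returns True once the endpoint responds with status 200, False on timeout.
    """
    # Bypass any HTTP proxy configured in the environment: the hub runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused, reset or malformed response: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

After `run()`, you could call `wait_for_http("http://127.0.0.1:{}/wd/hub/status".format(container.public_port))`. The `/wd/hub/status` path is my assumption about the Selenium 3 server in this image, so check it against the image you actually use.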
Now we're ready to use splinter and _ChromeContainer to automate our task.
```python
import timeout_decorator
import docker
from splinter import Browser


class Worker:
    def __init__(self):
        self.__chrome_container = _ChromeContainer()

    def process(self):
        self.__chrome_container.run()
        try:
            # Connect splinter to the Selenium server running inside the container
            self.__web_client = Browser(
                'remote',
                url="http://127.0.0.1:{}/wd/hub".format(self.__chrome_container.public_port),
                browser='chrome')
            try:
                # Example for login request:
                self.__login()
            finally:
                self.__web_client.quit()
        finally:
            # Always remove the container, even if the browser failed to start
            self.__chrome_container.quit()

    def __login(self):
        self.__web_client.visit("http://www.example.com/login")
        self.__web_client.fill('developer_session[email]', 'EXAMPLE_USERNAME')
        self.__web_client.fill('developer_session[password]', 'EXAMPLE_PASSWORD')
        button = self.__web_client.find_by_id('developer_session_submit')
        button.click()
```
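Hard-coding credentials as in `__login` works for a demo, but a script that runs inside an ETL pipeline usually receives them from its environment. Here is a small sketch, assuming hypothetical environment variable names `REPORT_USERNAME` and `REPORT_PASSWORD`:

```python
import os


def load_credentials(user_var="REPORT_USERNAME", password_var="REPORT_PASSWORD"):
    """Return (username, password) read from environment variables.

    The variable names are hypothetical defaults. Raises KeyError listing every
    missing or empty variable, so the job fails before starting any container.
    """
    missing = [name for name in (user_var, password_var) if not os.environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return os.environ[user_var], os.environ[password_var]
```

Inside `__login`, you would then call `username, password = load_credentials()` and pass those values to `fill` instead of the placeholder strings.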
This is just an example, and you can extend it with similar steps like __login in your Worker class to accomplish more complex automation tasks.
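One way to keep such extensions manageable is to write each step as a function that takes the browser, then run the steps in order with guaranteed cleanup. The sketch below does not depend on splinter itself; `run_steps` and `open_reports_page` (with its placeholder URL) are hypothetical names, and the step only uses `visit` and `url` from the browser object.

```python
def run_steps(browser, steps, cleanup):
    """Run each step(browser) in order, then always call cleanup().

    Returns the steps' results in order. If a step raises, the remaining
    steps are skipped, cleanup() still runs, and the exception propagates.
    """
    results = []
    try:
        for step in steps:
            results.append(step(browser))
    finally:
        cleanup()
    return results


# Hypothetical step in the style of Worker.__login; the URL is a placeholder.
def open_reports_page(browser):
    browser.visit("http://www.example.com/reports")
    return browser.url
```

In `Worker.process`, a call such as `run_steps(self.__web_client, [login, open_reports_page], self.__web_client.quit)`, with `login` rewritten as a module-level step function, could take the place of the inner `try`/`finally`.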
Thank you for reading! :)