Selenium Grid 4: Do you really need it?


Hi there! We continue our series of articles on efficient browser automation infrastructure; the full list of articles is published on our website. Over the last few months we have been asked a lot about the upcoming Selenium Grid 4 release. People often ask how our tools differ from Selenium Grid and how this new release, in development for two years already, can affect their work. Today we are going to dive into the details of the Selenium implementation to understand whether you need Selenium Grid at all.

Selenium Protocol

Selenium has existed since 2004, and the way your tests send commands to Selenium has hardly changed since then. In Selenium test code you describe a browser automation scenario using basic commands: starting a browser, opening a page, searching for elements on the page, typing or clicking on elements, executing JavaScript code, taking a screenshot and so on. Every time you invoke such a command, behind the scenes your Selenium library (also called a Selenium client) sends an HTTP request to a running Selenium server and receives a response with the result. For example, a typical request for starting a browser looks like the following:

# Request from the test
POST /session HTTP/1.1
Content-Type: application/json; charset=utf-8
Host: localhost:4444
Connection: Keep-Alive
Accept-Encoding: gzip

{
  "capabilities": {
    "alwaysMatch": {
      "browserName": "chrome"
    }
  }
}


# Response from Selenium
HTTP/1.1 200 OK
Content-Length:680
Content-Type:application/json; charset=utf-8
cache-control:no-cache

{"value":{"capabilities":{"acceptInsecureCerts":false,"browserName":"chrome","browserVersion":"87.0.4280.88","chrome":{"chromedriverVersion":"87.0.4280.88 (89e2380a3e36c3464b5dd1302349b1382549290d-refs/branch-heads/4280@{#1761})","userDataDir":"/tmp/.com.google.Chrome.u7j35F"},"goog:chromeOptions":{"debuggerAddress":"localhost:46289"},"networkConnectionEnabled":false,"pageLoadStrategy":"normal","platformName":"linux","proxy":{},"setWindowRect":true,"strictFileInteractability":false,"timeouts":{"implicit":0,"pageLoad":300000,"script":30000},"unhandledPromptBehavior":"dismiss and notify","webauthn:virtualAuthenticators":true},"sessionId":"ad4ce7d25e8c33f578cf1464592e0970"}}

As you can see, commands and results are transferred as JSON, and every command has a separate HTTP endpoint. For example:

POST /session # Start a browser
POST /session/<id>/url # Open a page in the browser
GET /session/<id>/screenshot # Take a screenshot
DELETE /session/<id> # Stop a browser

The full list of endpoints together with the JSON data format used constitutes the Selenium protocol. Two versions of this protocol exist. The initial version had existed since the creation of Selenium and, because JSON was used as the serialization format, was simply called the Selenium JSON Wire Protocol. When Selenium became the de-facto worldwide browser automation standard, its protocol was reworked and published as a W3C standard in June 2018. This updated version of the protocol is usually referred to as the W3C WebDriver protocol. You can take a look at this standard here.
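To make the command-to-endpoint mapping concrete, here is a minimal Python sketch of how a Selenium client could build the W3C "New Session" body and pick the right HTTP endpoint for a command. The helper names are ours, for illustration only; they are not part of any real Selenium client.

```python
import json

def new_session_payload(browser_name):
    """Build the W3C 'New Session' request body shown above."""
    return {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}

def endpoint(command, session_id=None):
    """Return (HTTP method, path) for a few common WebDriver commands."""
    table = {
        "new_session": ("POST", "/session"),
        "open_url": ("POST", f"/session/{session_id}/url"),
        "screenshot": ("GET", f"/session/{session_id}/screenshot"),
        "quit": ("DELETE", f"/session/{session_id}"),
    }
    return table[command]

# The JSON body a client would POST to /session to start Chrome
body = json.dumps(new_session_payload("chrome"))
```

A real client would then send `body` in an HTTP request to the Selenium server, exactly as in the capture above.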

What are the differences between the two protocols? Frankly speaking, there are two main changes:

  1. A slightly different list of HTTP endpoints. The most popular commands, like starting a browser or taking a screenshot, kept the same HTTP endpoints. Some endpoints were renamed: for example, the GET /session/:sessionId/element/:id/size command for getting page element dimensions became GET /session/:sessionId/element/:id/rect. Some redundant endpoints were removed: for example, the POST /session/:sessionId/moveto endpoint for moving the mouse to a specified position was removed in favor of the more generic actions API, POST /session/:sessionId/actions, which allows initiating complicated sequences of actions on the web page.
  2. A slightly different JSON data format. For example, this is how a new browser request looks in the Selenium JSON Wire protocol:
# Request from the test
POST /wd/hub/session HTTP/1.1
Content-Type: application/json; charset=utf-8
Host: localhost:4444
Connection: Keep-Alive
Accept-Encoding: gzip

{
  "desiredCapabilities": {
    "browserName": "firefox"
  }
}

# Response from Selenium
HTTP/1.1 200 OK
Date: Tue, 23 Feb 2021 12:59:30 GMT
Server: Jetty/5.1.x (Linux/3.13.0-24-generic amd64 java/1.8.0_151)
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache
Cache-Control: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Length: 546
Content-Type: application/json; charset=utf-8

{"state":null,"sessionId":"085a40cc-896e-425d-b0d8-be1dbeb7ff88","hCode":361822176,"value":{"applicationCacheEnabled":true,"rotatable":false,"handlesAlerts":true,"databaseEnabled":true,"version":"47.0.1","platform":"LINUX","nativeEvents":false,"acceptSslCerts":true,"webdriver.remote.sessionid":"085a40cc-896e-425d-b0d8-be1dbeb7ff88","webStorageEnabled":true,"locationContextEnabled":true,"browserName":"firefox","takesScreenshot":true,"javascriptEnabled":true,"cssSelectorsEnabled":true},"class":"org.openqa.selenium.remote.Response","status":0}

Compare this to the previous example using the W3C protocol and you will notice how they differ. In the older protocol the request JSON has a desiredCapabilities key, whereas the newer one uses just capabilities. In the newer protocol the entire contents of the response are wrapped in a value key, which differs from what we see in the JSON Wire protocol.
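These two differences are easy to capture in code. Below is a minimal, illustrative Python sketch (the function names are ours, not from any real library) that converts a JSON Wire new-session body to the W3C format and extracts the session ID from either response shape:

```python
def to_w3c(jsonwire_body):
    """Convert a JSON Wire new-session body to the W3C format."""
    caps = jsonwire_body["desiredCapabilities"]
    return {"capabilities": {"alwaysMatch": caps}}

def session_id(response):
    """Extract the session ID from either protocol's response."""
    if "sessionId" in response:
        return response["sessionId"]           # JSON Wire: top-level key
    return response["value"]["sessionId"]      # W3C: wrapped in "value"

old = {"desiredCapabilities": {"browserName": "firefox"}}
new = to_w3c(old)
```

Bridging logic of roughly this shape is what late Selenium 3 servers had to carry in order to speak both protocol dialects at once.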

Selenium 4 vs Selenium 3

I hope you now understand the differences between the two Selenium protocol versions. Knowing this, let's compare how Selenium 4 differs from Selenium 3. Adopting a new Selenium protocol version means that every browser maintainer, such as Google, Mozilla, Microsoft or Apple, has to support this protocol on the browser side. This adoption process can take months or even years, so the latest releases of Selenium 3 supported both the older JSON Wire protocol and the newer W3C protocol. Now this adoption process seems to be complete, and thus Selenium 4, compared to Selenium 3, has the following changes:

  1. Selenium 4 uses the W3C WebDriver protocol only. This is mainly an acknowledgment of the fact that all browser vendors now fully support this protocol version. From the user's point of view there is no change; it only matters to developers of Selenium infrastructure solutions and programming libraries built on top of Selenium.
  2. More features in Selenium client-side libraries. For example, the official libraries now support relative locators, which appear to be syntactic sugar on top of the W3C WebDriver protocol. Experimental support for the Chrome DevTools protocol was also added. We discussed this protocol in detail in one of our previous articles.
  3. Improved documentation. This is a must-have for such a mature project as Selenium; the lack of documentation was technical debt from previous releases.
  4. Selenium 4 comes with a new version of Selenium Grid. We'll describe how it differs from Selenium Grid 3 in the next section.

Looking at this list again, the first three items don't seem complicated to implement. Supporting a new JSON data format, adding syntactic sugar or generating a client-side library from a ready-to-use specification are all medium-complexity tasks for a mid-level software engineer. We know that Selenium 4 has been in development since 2018. So what feature could take more than two years to implement? The biggest architectural change in Selenium is the updated Selenium Grid. Let's see how Selenium Grid 4 and Selenium Grid 3 differ.

Selenium Grid 4 vs Selenium Grid 3

Any test automation engineer who has tried to deploy Selenium Grid at least once knows that the current stable version (Selenium Grid 3) consists of two main parts: a hub and a node.

A hub is a service that handles test requests and automatically forwards them to a node where a browser is actually running. We can say that a hub is a kind of load balancer, whereas a node is a kind of worker. Our first attempts to deliver a highly available Selenium infrastructure date back to 2012. Initially we experimented with Selenium Grid and quickly understood that it does not fit this task well:

  1. Selenium Hub is too hungry. Usually a load balancer is a lightweight component that does not consume a lot of computing resources. The Java-based Selenium Hub consumed 2 GB of memory for a dozen browsers running in parallel. We now know that the Selenium protocol uses JSON over HTTP, and the average traffic for one browser session ranges from hundreds of bytes to a dozen kilobytes per second. It was a mystery to us how such a small amount of traffic could consume so much memory. (A deep dive into the Selenium Hub source code showed that, because of an inefficient implementation, the same request data was copied several times in memory.)
  2. Selenium Hub is too slow. We were shocked by how slow Selenium Hub was. Later we understood the reason: Selenium Grid uses two-way (duplex) communication between the hub and the nodes. Nodes connect (register) to the hub during startup; after that the hub starts sending frequent ping requests to the nodes to make sure they are alive. Such an approach clutters the network channel and slows things down.
  3. Selenium Hub is a single point of failure. The Selenium Grid 3 architecture implies that only one Selenium Hub can run at a time. All nodes and all tests connect to the same hub. Sooner or later, with a growing number of tests being executed, your hub runs out of computing resources and the browser automation conveyor completely stops. This also prevents users from deploying fault-tolerant Selenium Grid installations across multiple datacenters: even if you distribute Selenium nodes across datacenters, the Selenium Hub will only run in one of them.
  4. The list of running browsers is stored in Selenium Hub memory. Having the hub as a single point of failure would be less harmful if the list of running browsers were stored externally: in that case we could, in theory, resume running browser sessions after a hub crash or freeze. Unfortunately, Selenium Hub stores the list of browser sessions in memory, and in case of a crash you have to restart all affected tests from scratch.
  5. It is impossible to reconfigure Selenium Grid without a restart. In a big cluster, all configuration should be applied without interrupting running processes. With Selenium Grid, any change required restarting services and thus breaking currently running tests.

Facing all these issues and needing a reliable Selenium cluster, in 2015–2016 we created and open-sourced several solutions for building such a cluster (GridRouter, published in 2015, and Ggr, published in 2016). The main ideas behind these solutions were:

  1. A load balancer instead of a hub. Instead of two-way communication we switched to one-way communication from the load balancer to the machine with the browser.
  2. Stateless architecture. The browser list was not stored at all: neither in memory, nor in a database or key-value storage. This allowed running as many browsers as needed.
  3. Multiple instances. Because of the stateless approach we were able to deploy as many copies of the load balancer as needed.
  4. Hot configuration reload. Modern Unix-based operating systems provide signals as one way to tell a running application to do something. We used this feature to automatically reload and apply the load balancer configuration without stopping it.
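The hot reload idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual Ggr code (which is written in Go): the process re-reads its configuration file whenever it receives SIGHUP, without restarting and without touching running sessions.

```python
import os
import signal
import tempfile

class Balancer:
    """Toy load balancer that reloads its config file on SIGHUP."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.config = self._load()
        # Re-read the config when SIGHUP arrives, without restarting
        signal.signal(signal.SIGHUP, lambda signum, frame: self.reload())

    def _load(self):
        with open(self.config_path) as f:
            return f.read()

    def reload(self):
        self.config = self._load()

# Demo: change the config, then reload it by sending SIGHUP to ourselves
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as f:
    f.write("browsers: v1")
    path = f.name

balancer = Balancer(path)
with open(path, "w") as f:
    f.write("browsers: v2")
os.kill(os.getpid(), signal.SIGHUP)  # what `kill -HUP <pid>` would do
```

In production the signal would come from a deployment script or process manager (`kill -HUP <pid>`) after the configuration file is updated.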

More on this can be found in some of our first articles (one, two) published 4 years ago (!). The new Selenium Grid 4 architecture is said to be designed to overcome all the issues listed above. Knowing that an efficient solution to the existing Selenium Grid issues was open-sourced 5 years ago, let's now take a look at how Selenium Grid 4 differs from Selenium Grid 3.

This is the Selenium Grid 4 architecture diagram from its official documentation. You can see that instead of the hub and nodes of Selenium Grid 3, Selenium Grid 4 has up to six independent components: a Router, a Session Queue, a Distributor, a Session Map, an Event Bus and one or several Nodes. The purpose of these components is as follows:

  1. Nodes still have the same goal: they launch available browsers and execute test commands. This is where your test is actually executed. A new feature is that Nodes can be short-lived (one node per browser session) and run in a Kubernetes cluster (so-called "one-shot nodes").
  2. The Session Map is responsible for storing the list of running browser sessions. By default this component still stores the list in memory; for a distributed cluster it can use reliable storage such as Redis or a relational database (MySQL, PostgreSQL and so on).
  3. The Session Queue stores the list of new browser session requests waiting for a free slot. This list is always stored in memory.
  4. The Distributor takes requests from the Session Queue and distributes them across all available Nodes. This component is stateless and relies on the stability of the Session Map.
  5. The Event Bus is used for communication between the Session Map, the Distributor and the Nodes. By default, events are also kept in memory; for a distributed cluster this component is expected to work with the ZeroMQ messaging library.
  6. Finally, the Router is the entry point to the cluster, similar to Nginx in reverse proxy mode. Its main purpose is to proxy new browser requests to the Session Queue and Distributor, existing browser session requests (e.g. clicking an element) to the respective Node, and service requests (such as managing active Nodes and the session queue state) to the Distributor and Session Queue respectively.
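To make the Router's job concrete, here is a rough, illustrative Python sketch of that routing logic (our own simplification, not actual Selenium Grid code): new session requests go to the queue, commands for existing sessions go to the owning Node looked up in the Session Map, and everything else is treated as a service request.

```python
import re

SESSION_RE = re.compile(r"^/session/([^/]+)")

def route(method, path, session_map):
    """Decide which Grid 4 component handles an incoming request."""
    if method == "POST" and path == "/session":
        return "session-queue"                 # new browser request
    match = SESSION_RE.match(path)
    if match:
        # Existing session command: look up the node owning this session
        return session_map.get(match.group(1), "unknown-session")
    return "distributor"                       # service/management request

# Hypothetical Session Map contents: session ID -> node that owns it
sessions = {"ad4ce7d2": "node-1"}
```

Note how even this toy version needs the Session Map to answer correctly, which illustrates why that component's reliability matters so much for the whole cluster.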

Knowing how the Selenium Grid 4 architecture looks, let's see whether it is now suitable for reliable Selenium clusters:

  1. Selenium Grid 4 is still hungry. Yes, instead of the clumsy and buggy Jetty web server Selenium Grid now uses Netty, which should be somewhat faster. But it is still plain old Java, requiring a complete operating system thread for every HTTP request and at least 1 CPU and 1 GB of RAM for each process. If we need a distributed cluster with at least two copies of each component installed in separate datacenters, such a cluster consumes a dozen CPUs just in idle mode. With Kubernetes short-lived nodes, every browser session additionally requires up to 1 CPU for the node process. So hungry literally means expensive.
  2. Selenium Grid 4 requires more expertise than you might imagine. In a distributed setup, in addition to the six components you will need to maintain a database and a messaging server. That means you should have experience in configuring database replication and in delivering fault tolerance for messaging. These are among the most complicated domains and require a deep understanding of how they work. This is usually not what a test automation or DevOps engineer wants to do. Databases and messaging servers also constantly exchange service information and replication traffic; when running in multiple datacenters you will have to additionally pay for this traffic.
  3. Selenium Grid 4 is too complicated. In 2016 we already had a running Selenium cluster with 5000+ browsers working in parallel, using only two reliable components: several instances of the Ggr load balancer and several hundred instances of the Selenoid server. This installation required no storage at all and used simple text configuration files.
  4. Selenium Grid 4 is still not fault-tolerant. Currently the session queue is stored entirely in memory, making it impossible to deploy two or more copies of this queue. While all other parts, including databases and messaging servers, can be made reliable, this component is a domino that can bring down the entire cluster.
  5. Tightly coupled architecture. Although it is still impossible to reconfigure Selenium Grid without a restart, the main problem of the new architecture is that it is tightly coupled. There are too many relations between components, and any configuration change in one component can potentially require restarting others.

So it looks like the Selenium Grid developers created a clutch of microservices because they wanted to play the microservices game that is so fashionable nowadays. Instead of simplifying things and working on speed and stability, they progressively added features like distributed tracing and GraphQL that are only needed because of the overcomplicated architecture and give no value in the browser automation domain. What you get is a solution that can be more complicated to deploy than the environment you are testing. Our cluster with 5000+ browsers running in parallel added tracing just by uploading text log files to Elasticsearch and querying them with Kibana. This required not a single line of code and worked out of the box.

Moon vs Selenium Grid 4

Returning to the beginning of this article, let's now compare Selenium Grid 4 with our flagship solution called Moon.

  • Simplicity. First of all, take a look at the Moon architecture:

Instead of six independent components there is one that does not require anything else. You simply start it and it works.

  • No external components required. You need to start neither a database nor a messaging server. Moon is completely stateless: when you run several copies of Moon in different datacenters, they do not need to share information about running browser sessions. This approach is reliable and allows running as many copies of Moon behind a load balancer as needed. You can start and stop copies during test execution and not a single test will be interrupted.
  • Automated scaling. Moon runs in a Kubernetes cluster only and launches a clean browser in a container for every browser request from your tests. Every such browser starts super fast and, unlike a Selenium Grid node, does not require any registration procedure to start working, so it will continue to work even in case of network connectivity issues inside Kubernetes. The most attractive feature of Moon is automated scaling: browsers usually consume a lot of computing resources, and a Kubernetes cluster with Moon automatically shrinks and grows depending on the number of running tests. This allows you to pay only for the computing resources you are really using.
  • Lightweight and resource-efficient. In idle mode Moon needs at most 1 CPU per replica, so for a fault-tolerant cluster you need 2 CPUs. Compare this to the 12 CPUs of Selenium Grid 4.
  • A collection of images for all recent browser versions. Selenium Grid is distributed as Docker images versioned by Selenium release, with some recent Chrome or Firefox version packed inside. But in real-life scenarios you often need to test in Firefox X and Chrome Y. Moon comes with a big collection of ready-to-use images including Firefox, Chrome, Opera, Microsoft Edge, Safari and Android. Once a new browser version is released, we immediately build a new image ourselves.
  • Powerful user interface. Selenium Grid 4 still has a very poor user interface showing only basic information about running browser sessions.

Moon comes with a powerful user interface allowing you to interact with browsers:

More than that, you can easily create sessions manually to quickly do exploratory testing in the desired browser version.

  • Reconfiguration with no downtime. Contrary to Selenium Grid, any manipulation of Moon (adding new browser versions, adding users, even updating the Moon version itself) can be done with no downtime. Your tests continue to work as usual.
  • Playwright and Puppeteer support. Moon is the only solution in the world supporting parallel execution of Playwright and Puppeteer tests out of the box. We described this in our previous articles (one, two).
  • First-class mobile emulation support. Instead of spending a lot of money on real phones or Android emulators, you can easily catch up to 80% of bugs with the first-class mobile emulation recently added to Moon. More on this is described here.
  • Video recording with sound. Moon seems to be the only solution able to record any sound generated by your web application. This allows you not only to see your application but also to hear how it sounds. An example of a recorded video can be found here.
  • Ability to upload everything to an S3 bucket. Test execution reports usually require attaching logs and videos of browser sessions. Moon automatically uploads such logs and videos to S3-compatible storage. This type of storage is now supported by all popular cloud platforms and can also be set up in your corporate network.
  • Enterprise-level authentication support. Moon has enterprise-level security features. Authentication and authorization can vary from plain-text lists of users to modern approaches like OpenID Connect integrated with your corporate LDAP / Active Directory.

Conclusion

In this article we talked about Selenium Grid 4 and how it differs from previous releases. We hope you now have sufficient information to decide whether Selenium Grid is suitable for you. In case you would like to use Moon but don't have sufficient DevOps experience, we offer a solution called Moon Cloud: we deploy and maintain a dedicated Moon cluster for you, and you only run the tests.
