Integrating Automated Software Testing into CI Environments at BitSight

In this talk, Alex Marchini - Senior Test Engineer at BitSight - will explain how their team has integrated automated tests into their continuous integration environment. He will discuss the team's philosophy for setting up test environments, as well as how they create mabl tests that run efficiently in a testing pipeline.

Transcription

Alex Marchini

I'm Alex, thanks for coming to my session. I'm a Senior Test Engineer at BitSight and I’ve been there for about two and a half years - we are headquartered in Boston. If you haven't heard of BitSight, we are the market leader in cybersecurity ratings, which you can think of as a credit rating for a company's cybersecurity posture.

So today, I'm going to talk about how we integrated automated software testing into our CI environments and our CI pipeline. I'll be focusing mostly on how we've integrated the mabl tests in the presentation today. So a quick agenda. First, I'll start with some team background and give you some context on our decision-making and philosophies. Then into the meat of the pipeline itself with setting up a test environment and test data, how to kick off the mabl tests, how to report results, how we make the mabl tests efficient for this type of pipeline, and then zoom out to a higher level, the overall developer experience when it comes to testing.

So first up, some team background. So here at BitSight, we don't have a dedicated QA team, we don't have a quality engineering department or any dedicated test writers or test runners. All the developers are responsible for doing all their own testing on their code and their new features. So it's my team's responsibility as a core infrastructure team to build the tools and the infrastructure that the developers will use to get all the testing done that they need. So basically, the internal developers at our company are my customers, or my team's customers. So it's imperative that the software testing tools we create for them are efficient, easy to iterate with, and that the results are clear. So that means creating new tests is easy, testing environments are configurable and easy to use, et cetera.

So the driving force for basically all our decisions in how we build these is the developer experience. The developer experience to us is just as important as the actual quality of the tests themselves. If the developers are not engaged with the tests, if they see them as a hindrance rather than an asset, they'll become disengaged from testing and the quality of your product is going to suffer as a result. So that's kind of a theme I'm going to keep up as I explain each part of our pipeline. But it is the driving force behind most of how we build our integration testing tools.

So to get into the meat of the test pipeline a little bit, first, I want to explain the two types of environments we use to support testing in a pipeline like this. So first, a static environment. So you probably have something like this, it's probably your dev or your QA environment, or maybe a staging environment. It's just an environment that's always available, it's always up. It's continually updated or deployed to by multiple teams, or maybe every team in the company. For us, we run a small set of acceptance tests against this environment during our prod deployment. So every time we deploy to prod, we also deploy to this static test environment or dev environment. Then there should basically be a small acceptance test to make sure the environment isn't down or something, but it's not a real functional test.

Then we also have these ephemeral environments. So these ephemeral environments are only brought up while they're needed, on-demand by one developer or a small team of developers, and then they're destroyed automatically when they're not needed anymore. These are the environments we actually use to run tests before we merge any code. So with that in mind, first is a visualization of our static environment. So we use Kubernetes. If you haven't seen Kubernetes before, basically, you have a cluster with a bunch of namespaces in it, and a namespace just gives context to a group of pods, or a group of containers. So that group of pods would be our development environment. So in our example environment here, we have the main API, we have front-end assets, and we have a bunch of microservices. This is given a static URL.

So in our example, we're going to use dev.environment.com. So if you're on our VPN, or behind our firewall, you can go to Chrome, type in dev.environment.com, and you'll hit this static dev environment. So this is running 24/7. It's there for a lot of reasons, but one of the reasons is to support the testing pipelines that we have. So I mentioned that you have to be behind a VPN or a firewall to actually access this environment. So how do mabl tests do it, because they're obviously external?

So I'm calling this stage zero in our pipeline because we have this running 24/7 before any tests are kicked off. We have our static dev environment. We also have this tool called mabl link. So the mabl link is a secure linking tool provided by mabl that allows network traffic to tunnel from the external mabl cloud into your internal environments. So I'm not going to go over how to deploy the mabl link. It's really simple and mabl has tons of docs, like for a generic Docker deploy, for Kubernetes, and also for Windows. Before you can run any tests against internal environments, you need this mabl link deployed somewhere to tunnel all the network traffic. So again, stage zero, this is what we have running 24/7 in our Kubernetes cluster: our static test environment, and our mabl link ready to go.

So the next thing would be a developer actually requesting that some tests get run. So we're calling that stage one in our pipeline. So we have two ways that developers can kick off tests: they can either run a command on their local machine with a CLI tool, or they can just go to our Jenkins server and manually kick off this pipeline. Either way, it's going to be the same pipeline. But the first thing that happens when they kick it off is that an ephemeral test environment gets created. So that pipeline takes a few parameters. But the most important ones are that the developer selects which services and which branches are deployed to their ephemeral test environment. So this keeps the deployment efficient. So the least efficient way to do this, obviously, would be that for every test run, we deploy a complete environment with its own main API, its own front-end assets, and every single microservice that makes our application function. But again, that's extremely inefficient, and if the developer hasn't made changes to most of those services, what's the point of deploying them to this test environment? But at the same time, you want the environment you're testing to actually function. You don't want a bunch of broken pages in your application because you haven't deployed half your microservices.

So what we've done is we've set up our test environments so that any service that isn't deployed to that environment will fall back to communicating with the static dev environment. So in our example here, we have our ephemeral test environment, which by the way also gets its own unique URL. So in our example here, we have test1234.environment.com. So if you type that into a browser on our VPN, you'd hit this ephemeral test environment. If you come to a page on that test environment that has a chart on it, and the data for that chart is calculated by microservice B, our application knows that it doesn't have microservice B available in its own namespace, and it will automatically fall back to the host of microservice B in our static dev environment. So that's how we keep our ephemeral environments small and as light as possible, only testing what developers want to test, while also maintaining the full functionality of our app by falling back to this dev environment, where we'll have the latest code that should also be on production.
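
The talk doesn't say how the fallback is implemented, so the following is only a minimal sketch of the idea, assuming each service is reached through a standard Kubernetes cluster DNS name. The namespace name `dev`, the service names, and the `EphemeralEnvironment` class are all hypothetical.

```python
from dataclasses import dataclass, field

STATIC_NAMESPACE = "dev"  # hypothetical name of the always-on dev namespace


@dataclass
class EphemeralEnvironment:
    """An on-demand test namespace holding only the services a developer chose."""

    namespace: str
    # service name -> branch deployed into this namespace
    deployed: dict[str, str] = field(default_factory=dict)

    def resolve_host(self, service: str) -> str:
        """Return the in-cluster host for a service, falling back to static dev."""
        target = self.namespace if service in self.deployed else STATIC_NAMESPACE
        # Kubernetes service DNS names look like <service>.<namespace>.svc.cluster.local
        return f"{service}.{target}.svc.cluster.local"


env = EphemeralEnvironment("test1234", {"main-api": "feature/new-chart"})
print(env.resolve_host("main-api"))        # deployed here, so the ephemeral namespace serves it
print(env.resolve_host("microservice-b"))  # not deployed here, so static dev serves it
```

The point of the design is that the developer only pays the deployment cost for the services they changed, while every other request still reaches a working service.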

Now we have our test environment ready. But there's one issue with it still, it's been deployed, but it has a kind of a generic minified version of our database. Depending on the test that might get run, you don't know what kind of data you need to have. So you're probably not going to have all the right data in the database you've deployed this new environment with. So the next thing we have to do is actually set up some test data that we know we're going to need for the use cases that we're going to be testing. So for us, we came up with an internal data seeding app to do this.

It took some time to develop this internal app. But basically, it's just a Node.js module that will call the real API, so the API that's been deployed to our test environment, to actually set up all the data. It sets up the data in a real way, the way that it would be set up on prod, by hitting real endpoints that exist on prod as well. This may seem like a big extra project that you have to do to create this pipeline, but it's well worth it. It's paid dividends because in my experience, and definitely at BitSight, we have a lot of different customer types. They have different billing codes, they have tons of different feature flags that they could have turned on, different preferences. Each use case that you test may need a different set of those preferences.

So for example, let's say your application has an enterprise-type customer and a personal-type customer. If a developer goes to create a new test, and they need one of those types of customers, you don't want them to need to know all the background that goes into creating those customers in a real way. It's much easier to just have it all contained in an app that can be updated and maintained separately by a testing team or QA department, and that is always ready to go, not just for mabl tests or end-to-end tests, but for any application that might need some quick data created on any environment, on an ephemeral test environment, or even if you wanted to create something quick on your staging or dev environment.

At this stage, we run a script that executes our data seeding app, which creates a bunch of test users and test customers in our ephemeral test environment. Just as an example, this is kind of what it looks like: we pull in a Node.js package that we've developed, and it has some functions that are simple for developers creating new tests to use, like "create enterprise tester", and you give it an email. So each one of these will create a separate customer, and now we have usernames that we know we can use to log in during our mabl tests to actually execute the use cases we want.
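
The team's real tool is a Node.js package whose code isn't shown in the talk. As a rough illustration of the same idea, here is a small Python sketch of a seeding helper that creates customers by calling the deployed API. The endpoint paths, payload fields, and the `DataSeeder` class are hypothetical, and the HTTP call is injected so the sketch runs without a network.

```python
from typing import Callable

# post(url, payload) -> parsed response body; injected so the sketch runs offline
Post = Callable[[str, dict], dict]


class DataSeeder:
    """Creates test customers through the application's own API (hypothetical endpoints)."""

    def __init__(self, base_url: str, post: Post):
        self.base_url = base_url.rstrip("/")
        self.post = post

    def _create_tester(self, customer_type: str, email: str) -> dict:
        # Create the customer first, then a user that belongs to it.
        customer = self.post(f"{self.base_url}/api/customers", {"type": customer_type})
        user = self.post(
            f"{self.base_url}/api/users",
            {"email": email, "customer_id": customer["id"]},
        )
        return {"customer_id": customer["id"], "username": user["email"]}

    def create_enterprise_tester(self, email: str) -> dict:
        return self._create_tester("enterprise", email)

    def create_personal_tester(self, email: str) -> dict:
        return self._create_tester("personal", email)


calls: list[str] = []


def fake_post(url: str, payload: dict) -> dict:
    """Stand-in for a real HTTP client: echoes the payload with a generated id."""
    calls.append(url)
    return {"id": len(calls), **payload}


seeder = DataSeeder("https://test1234.environment.com", fake_post)
tester = seeder.create_enterprise_tester("enterprise@example.com")
print(tester)
```

A test author only needs to know which kind of customer they want; the billing codes, feature flags, and preferences behind that customer type stay inside the seeding tool.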

Now at this point, we have our ephemeral environment ready to go, and we've seeded a bunch of test data in it. So now we just need to kick off the tests. So for kicking off mabl tests, mabl provides a bunch of different ways to actually kick off the tests remotely. They have a command line interface, an API, and a bunch of various CI plugins for Jenkins, et cetera. For us, with the way we orchestrate our CI environments, the easiest way is just to use the API. Obviously, if you're building something like this and you need to kick off the mabl tests, just choose the tool that's easiest for you. But whatever tool you use, you're pretty much going to be passing in the same parameters and the same data. So going along with my example using the API, we pass in the environment and application ID. So these are the IDs set in mabl, just to tell mabl which environment and application you're actually executing the tests against.

So those are just static strings you're going to pull out of mabl. Then the plan labels are the actual test plans in mabl, again, that you want to execute. So for our example, we're just going to be executing our end-to-end tests. Then you can override some variables of that test plan. So first, we only run our end-to-end tests on Chrome. I'll talk more about cross-browser testing a little bit later. For this, we just run Chrome. Then this URL parameter is kind of the most important one. So the test plan, by default, is maybe set to execute against your static dev or test environment. But obviously, we want the tests to execute against the ephemeral environment that has the actual new code that the developer wants to test. So we are just passing the URL of the test environment that we just created. So for our example, it was test1234.environment.com, so that the tests actually execute against the correct application. Then we have our link agent label. So this is just a static string that you've given your link agent when you deployed it.

Then, a source control tag. So this is a feature of mabl that's really important for developer experience. If you're not aware, mabl has its own branching feature. So within mabl itself, you can create a branch off of master and then make any updates to tests or create new tests, and it won't affect what all your other developers are running by default on the main branch. So if a developer has a new feature they're building and they need to build a new mabl test for it, or say they're changing a use case or editing a feature and it changes one of the tests in mabl, they can develop both at the same time on their feature branch and the mabl branch. Then when their feature goes to main, or it goes to production, or wherever, they can also merge their mabl branch, and that makes it really simple to keep your mabl tests and your source code in sync with each other.

Then the last part is, of course, just calling the API. So we call this events/deployment endpoint, we give it our mabl API key, and we pass the data we just defined, and that kicks off the tests. So this is basically where we're at now in the pipeline: we have the mabl cloud with a bunch of Chrome browsers running our end-to-end tests. Those all communicate with the mabl link, which has access to our internal environments, so that the browsers can actually communicate with our internal environment and display our application.
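
To make the parameters described above concrete, here is a hedged sketch of building the deployment event request. The endpoint path comes from the talk; the JSON field names (`environment_id`, `application_id`, `plan_labels`, `plan_overrides`, `source_control_tag`, `link_agent_label`), the placement of the link agent label, and the basic-auth scheme with the user name `key` are written from memory and should be checked against mabl's API reference before use. All IDs and labels are placeholders.

```python
import base64
import json
import urllib.request

MABL_DEPLOYMENT_URL = "https://api.mabl.com/events/deployment"


def build_deployment_event(
    environment_id: str,
    application_id: str,
    test_url: str,
    branch: str,
    link_label: str,
) -> dict:
    """Assemble the event body; field names are assumptions, not verified."""
    return {
        "environment_id": environment_id,
        "application_id": application_id,
        "plan_labels": ["end-to-end"],       # which test plans to run
        "plan_overrides": {
            "browser_types": ["chrome"],     # end-to-end tests run on Chrome only
            "uri": test_url,                 # point the plan at the ephemeral environment
        },
        "link_agent_label": link_label,      # assumed field name and placement
        "source_control_tag": branch,        # mabl branch holding the matching tests
    }


def build_request(api_key: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request, assuming basic auth as key:<api_key>."""
    token = base64.b64encode(f"key:{api_key}".encode()).decode()
    return urllib.request.Request(
        MABL_DEPLOYMENT_URL,
        data=json.dumps(event).encode(),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


event = build_deployment_event(
    "ENV_ID", "APP_ID", "https://test1234.environment.com", "my-feature-branch", "internal-link"
)
request = build_request("MABL_API_KEY", event)
print(json.dumps(event, indent=2))
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the payload construction separate from the network call makes it easy to log or unit test exactly what the pipeline is about to send.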

The next step is polling for test results. So if you're using the command line interface or a CI plugin, I think you can skip this step. I think those will just automatically wait for your tests to complete and give you the results back. But for the API, basically, you call one endpoint to kick them off, and then you have to call a separate endpoint to get the status of the tests that you kicked off. So we just poll every 10 seconds to see the status of the tests we just kicked off.

At the bottom here is an example of the state that gets returned every time we poll it. So we have the state of each test. Some of them are running, some are completed, one's failed. We basically just wait for all of those to no longer be running. Then we know that the tests we kicked off have completed.

The last step is actually reporting the results to the developer. So this closes the loop for the developer that went and kicked off the tests. We report back to them with the status of the tests they kicked off. So the first thing we do is update a build status on the developer's branch with just a pass or fail. So basically, they just get a green check or a red X on their branch status for that build, depending on whether it passed or failed, and they're actually not allowed to merge anything if it has any failures.
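
The polling loop described above can be sketched as follows. This is only an illustration: the state names (`queued`, `running`, `completed`, `failed`) are assumptions based on how the talk describes the response, and the function that fetches states is injected rather than calling a real endpoint.

```python
import time
from typing import Callable

IN_PROGRESS = {"queued", "running"}  # assumed names for unfinished states


def wait_for_results(
    fetch_states: Callable[[], list[str]],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until no test is in progress, then return the final states."""
    waited = 0.0
    while True:
        states = fetch_states()
        # An empty list means nothing has been scheduled yet, so keep waiting.
        if states and not any(state in IN_PROGRESS for state in states):
            return states
        if waited >= timeout_seconds:
            raise TimeoutError("tests did not finish before the timeout")
        sleep(interval_seconds)
        waited += interval_seconds


# Canned responses standing in for three successive calls to a results endpoint.
responses = iter([
    ["running", "running"],
    ["completed", "running"],
    ["completed", "failed"],
])
final_states = wait_for_results(lambda: next(responses), sleep=lambda _seconds: None)
build_passed = all(state == "completed" for state in final_states)
print(final_states, build_passed)
```

A timeout is worth having in a loop like this so that a stuck run fails the build instead of holding a CI executor indefinitely.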

Then if the developer has a merge request or a pull request open, we also use a tool called Jinkies, which is just a really simple Jenkins plugin tool that allows you to just call an API. So we just call the GitLab API to post a comment on the merge request, with a link back to the Jenkins job, and then just a quick status on how many tests passed and how many failed. Actually, if any mabl tests failed, we also post a link back to the mabl failure. So the developer has a link right there, if anything failed, to click and go straight to mabl and see what failed. So that's pretty much the end of the pipeline.
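
As an illustration of the reporting step, the sketch below builds the comment text and the GitLab merge request notes URL (`POST /projects/:id/merge_requests/:merge_request_iid/notes`, with the comment sent as the `body` parameter). The comment wording, host names, project path, and links are made up for the example, and nothing is actually sent.

```python
from urllib.parse import quote


def merge_request_notes_url(gitlab_url: str, project_path: str, merge_request_iid: int) -> str:
    """Build the GitLab notes endpoint for a merge request."""
    # GitLab accepts a URL-encoded project path in place of the numeric project id.
    project = quote(project_path, safe="")
    base = gitlab_url.rstrip("/")
    return f"{base}/api/v4/projects/{project}/merge_requests/{merge_request_iid}/notes"


def format_comment(job_url: str, passed: int, failed: int, failure_links: list[str]) -> str:
    """Summarize a test run: counts, a link to the CI job, and a link per failure."""
    lines = [
        f"mabl end-to-end tests: {passed} passed, {failed} failed",
        f"Jenkins job: {job_url}",
    ]
    lines += [f"Failure: {link}" for link in failure_links]
    return "\n".join(lines)


notes_url = merge_request_notes_url("https://gitlab.example.com", "group/app", 42)
comment = format_comment(
    "https://jenkins.example.com/job/e2e/17/",
    passed=3,
    failed=1,
    failure_links=["https://mabl.example.com/results/enterprise-test"],
)
print(notes_url)
print(comment)
```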

To recap, we have the dev environment and the mabl link running at all times. Whenever a developer requests a test run, the first thing that happens is our ephemeral test environment gets created with just the services and the code that the developer changed or actually wants to test. Then we set up the test data in the ephemeral environment that the mabl tests need to actually execute. We kick off the tests, we poll for the results, and then we report the results to the developer. So that whole process takes about 30 minutes.

But we know that end-to-end tests are the slowest and the most expensive tests to execute. We want this to be as efficient as possible because a lot of times the developer is going to be kicking off these tests and just sitting there, basically waiting to get the results to know if their code is good to merge or not. So we need to keep that in mind when we're building the mabl tests themselves.

Next, I want to talk about how we make the mabl tests efficient to run in a pipeline like this. So basically, for the end-to-end tests, we organize the tests based on business focus. So for this, if you're coming from Selenium or something and transitioning to mabl, you definitely want to drop any page object mentality when it comes to your basic end-to-end tests. You need to focus strictly on business use cases for these tests and then organize your tests according to that.

If we go back to my example of having personal and enterprise users, you can think of those as two different lines of business. Those are two different types of use cases. So we've separated them into four tests, which cover two enterprise use cases and two personal customer use cases. Then we run them all in parallel in one stage. So this is what the test plan looks like. Basically, it's a bunch of tests executed in one stage, all in parallel. So obviously, running them in parallel is faster. But it also makes ownership of each individual test much more clear. So again, interpreting results is really important for a developer's experience. When they see a test fail, if they see, for example, an enterprise test fail, it's very easy to understand which team owns that test, because it's tied directly to a line of business. If these were all just kind of jumbled into one test and run in sequence, obviously, it would not only be slower, but if a step failed, it would be at least one or two extra steps for a developer to dig into it and figure out which test failed, which part of the application it was executing, and which team owns that part of the application. Whereas this way, it's pretty obvious right off the bat.

Then I mentioned before that we also don't mix any cross-browser testing into our end-to-end tests. It's definitely tempting, especially with a tool like mabl where you can just say, hey, run this on every browser. But you want to keep your tests as focused and as pinpointed as possible. So these are specifically tests to validate business use cases that have been defined. So we split our cross-browser testing into a completely separate test plan with separate tests. So with these, we still adopt that page mentality. These are really just pages where we either have some legacy code that we're worried about being flaky or failing on newer versions of browsers, or we've had regressions in the past on certain pages where they fail on one browser, but not another. So these tests are just really simple. They just go to different pages in our application and make sure that they work in every browser, but they're not tied to any specific use case. We still split them by line of business because in general certain lines of business may own certain pages. That's not always completely the case. But we still try to split them up and organize them by line of business as much as possible, so it's easy to assign ownership. So that's pretty much it.

Next, I want to zoom out a little bit and go to a higher level of what our developer experience looks like, and what we've accomplished with pipelines like this. So basically at the start, a developer requests that all the tests are run. Then every testing suite that we have is kicked off in parallel. So the two on the right are the ones I've talked about today, the mabl end-to-end and cross-browser tests. We also have other basic tests like static analysis, security, and our basic unit and integration tests. But all of this is done in a 30-40 minute runtime. Like I mentioned before, end-to-end tests can easily get out of hand and run for a really long time. Deploying for them can take really long too if you're trying to deploy a full environment, and since it is an end-to-end test, you want everything to run, so it's as close to the real use case as possible.

But if you're setting your tests up like this, you can't have end-to-end tests be the weak link in this chain. When developers kick off tests, they're going to be waiting on each one of these to complete before they have full confidence that their pull request, or their feature and their new code, is validated. So they do have to wait on all of them to complete before they can merge anything. If they're just waiting on end-to-end tests while everything else is completed, that feels like kind of a waste of time for a developer. So once all these are complete, they're all reported to the developer's merge request for their branch, each in a separate comment just like I showed, and then if they all passed, their new code is validated, and they can merge their branch. If any one of these fails, they can kick each one of these off individually rather than having to start the whole pipeline over. So that's pretty much our developer experience for our testing pipeline.

So lastly, I just want to talk about some key takeaways, just our philosophies on how we build this, and how I think quality engineering departments should build their testing tools. So I want to go back to that old quality assurance trope that higher testability is going to equate to higher quality. So every single step of this, just having this pipeline in general, having environments that are fast to bring up, being able to create any test data you need quickly and easily without knowing all the underlying systems, and then also just understanding the results of tests so that you know what's broken, is all part of increasing testability. We know that increasing testability is going to lead to higher quality.

That also ties in, of course, with developer experience. So for the developer, do they get their test results fast? Are test results flaky? Can they have confidence in them? Are the results easy to interpret? These are all part of the developer's experience when it comes to testing. If they have a bad experience with testing, they're going to be disengaged, and they're going to see the tests as a gatekeeper, a hindrance to getting new features and new code out, instead of seeing them as an asset, which they really should be. So build your testing tools to be an asset, build this pipeline to be an asset and not a big, slow, flaky hindrance.

Then when you build tools, also, don't build them and forget them. Embrace your inner Steve Ballmer, as I like to say. Always think about your developers, always engage with them, do blind, anonymous surveys or polls if you need to, to discover what their pain points are with your testing experience, and address those pain points directly. So that's pretty much it. So I'll open it for questions right now. But I also want to plug some positions in engineering here at BitSight. So on my team specifically, we're hiring a performance engineer. So we're growing like crazy, we just closed 250 million in funding from Moody's, we have a ton of new customers, we have bigger customers coming on, and if you think you're up to the challenge of helping us scale to keep our application performant with more and bigger customers, feel free to reach out to me on LinkedIn or check out bitsight.com/careers. We also have tons of full-stack, front-end, back-end, security engineering, and data science positions. So I encourage you, if you're interested in the company at all, to check out the website. I'll wait for any questions.

Gabe Alvarez-Millard

Thank you, Alex. I loved how you flipped the QA script here a little to talk about developer experience. A very powerful insight that we don't hear as often. So what questions does everyone have? Please remember to submit those in the Q&A panel on the right side of your screen. We have a few questions already lined up. So let's get started. So Alex, how long does the testing cycle take, including spinning up the environment that you outlined?

Alex Marchini

So the whole thing is about 30-40 minutes. It can depend a little bit on what the developer is deploying and how much stuff they need to deploy, but a general timeline is 30-40 minutes, and we kind of keep strictly to that. We run tests every so often to keep track of our test velocity, and if it starts kind of creeping up, we start taking on tasks that try to either build more concurrency into our pipelines or just reduce the overall testing time.

Gabe Alvarez-Millard

Another viewer asks, curious, how many tests do you have in your end-to-end tests?

Alex Marchini

So we have six tests, and then we also split the use cases up into flows in our mabl tests. So I think the number of actual use cases we're testing is 65-ish in our mabl tests, and it's a bunch less, I think it's like 20-30, in our cross-browser tests. But for end-to-end, we're testing about 65 use cases.

Gabe Alvarez-Millard

Great. Thank you. Another question was: how do you onboard new team members when it comes to mabl?

Alex Marchini

Honestly, we wait for them to have to interact with mabl. At first we had our own built-in, homegrown end-to-end test framework with just Puppeteer, and one of the reasons we chose mabl over it is that with mabl it is much easier to interpret results. They provide side-by-side screenshots of the last time the test passed. So the only time we really need to onboard someone to mabl is if they're creating a new test, or they actually need to update a test. We don't really need to onboard people for just interpreting mabl results, because mabl makes it pretty easy. So we just play it by ear: when someone needs to interact with mabl, we help them out.

Gabe Alvarez-Millard

Awesome. Thank you. Another question is: how long did it take your team to get mabl integrated into your pipeline?

Alex Marchini

So we had the environment set up and everything was already good to go. I think the thing that took the longest when we switched from our homegrown end-to-end framework to mabl was rewriting the tests because, like I mentioned, we dropped that page object mentality we had before. So again, it was a process of engaging with developers and the different lines of business to understand the use cases that they felt were most important to their line of business, and that they felt they needed to validate to gain confidence that they're not breaking important features. So that was the longest part, which involved engaging with them to figure out what they think are the most important use cases to test for, and then building those in mabl. Other than that, just plugging it into our already existing pipeline was very easy, because we can just call the API to kick off tests and wait for results. That's a really small part of it.
