At Chewy, the most trusted and convenient online destination for pet parents, our growth over the last few years left us with a test process that didn’t scale. We relied on manual testing and a set of tools that no longer met our needs. Learn about how Chewy changed its practices and brought in mabl to meet its evolving needs.

Transcription

Russell

Hi everyone. Hopefully, most of you know about Chewy, but if not, our mission is to be the most trusted and convenient online destination for pet parents and partners everywhere. You're probably familiar with our blue boxes that you see quite often, or you're a pet parent and a customer yourself, and if so, I thank you for it. You might not know about all our different offerings, so before we get into it, I'll use this time to give a little background on us. We are an online e-commerce solution for pet parents. It's not just dogs and cats; we also have selections for other pets like birds, lizards, and animals like that.

Service is an area that we put a lot of effort into, and it's why we are the leader. We have also gotten into pet health experiences: we have a 'Connect with a Vet' service, we have pharmacy compounding, we just recently launched an adoption finder, and we also offer veterinarian-based information through petmd.com.

Then just quickly about me: first, I'm a pet parent myself. If you have pets, or you have children who are trying to push you and convince you to get a pet, taking a job at Chewy seems to give them the impetus; they think there's some divine intervention happening. That is my puppy Abby, who will be joining us on our journey today. I'm a Director of Engineering here at Chewy, and my organization represents the content platform, which is our content management system and digital asset manager. Once we serve up that content, I also handle search engine optimization and search engine marketing, so basically any content interaction with the search engine is where we optimize. My org also runs two content sites at Chewy. One is BeChewy, which is a digital lifestyle magazine for pet parents, and the other is PetMD, which we just talked about. Prior to Chewy, I worked for Amazon Alexa, where I ran the data lake associated with Alexa utterances and the processing of them. I've worked at ezCater, which is basically an online e-commerce site for business users, kind of like a commercial version of DoorDash or GrubHub, and I was in ad tech at Jumptap/Millennial Media earlier in my career.

I do need to make this plug: we are hiring on my team and also across the spectrum. We have 1,450 openings today. Specifically, we're looking for engineers, managers, data scientists, and product managers. Our four key locations for the technical side of our world are Dania Beach, Florida; Boston; Minneapolis; and Seattle. Our career site is available, and if you're interested in learning more, just reach out to me.

So, getting all that context out of the way, let's talk about the challenge that we faced when I joined Chewy. I had four teams rolled up into me and one quality engineer. She basically broke up her time across those four teams, manually testing functionality across three different sites; as functionality was completed, she would go in and validate it and then move on to the next piece. She did an amazing job doing this, but talking to her, she told me it was a really challenging position. We were mobile-first, and we support five different browsers on both desktop and mobile.

So you can see the permutations start getting quite high, and having one quality engineer across three different sites became very challenging. The release cadence for chewy.com at the time was weekly, and PetMD and BeChewy were daily. So this created quite a challenge. It looked to me like we were not going to be able to expand our testing without significantly hiring more quality engineers, and we were basically doing all manual testing in my domain. So this seemed to be a problem. My dog Abby here is looking at this a little funny; we were hiking and she was about to cross some water, and she's like, "Are you sure you want me to do this?" So it was definitely a challenge that we could have handled a better way.

So that was the challenge we faced, but unfortunately we also weren't staying static; we were moving forward. Like most startups that have become successful, we started off with a monolith, and we are moving to single-page applications across the chewy.com ecosystem. So the code that we would have to test, specifically on the SEO side, would multiply, as we operate in a somewhat independent, team-by-team fashion across different code bases on the chewy.com website. For example, the product listing page and the product detail page are going to be two separate code bases, and some of the functionality is core while some of it is unique to each of those pages. This is Abby with a friend, just to show that there are multiple dogs, and we have increased challenges when there are multiple code bases. That weekly release was almost all manual; there was some automation going on, but there was a lot of sign-off required to get that weekly release out.

So this is one of the wiki pages that we created weekly, showing all the people in all the domains and all the areas and all the browsers that needed to sign off for us to release to chewy.com. As you can see, my dog Abby here is just exhausted thinking about trying to wrangle all those people. We rotate who the release manager is, and the release manager there isn't really focused on quality; they're focused on wrangling 25 different people to sign off. This was a nightmare. I looked at this, the team looked at this, and we're like, we need to do better. This is just not something that is scalable for us, and as we continued to write functionality, we were going to be testing a narrower and narrower percentage of our codebase. So you can see why Abby's exhausted; I'm exhausted just looking at the sheet, and it's been a while since I've actually been a part of this release cycle. This was not something that we could sustain, so we needed to do something better, and I challenged my team to do so.

So the first thing we did was look at what tools we were using internally. We were doing manual testing for the most part on my team, but we heard anecdotal stories of what was going on elsewhere within the company, so we wanted to see what toys we had. Here's Abby, playing with a bunch of toys that she has. Notice we've got a couple of balls here; I'll come back to that in a little bit. We found five different tools that were being used and poked around. One was BrowserStack. TestProject seemed to be the one with the most usage. Some people were testing APIs with RestAssured, and there were some people dabbling with Cypress and TestCafe. In talking to those different teams, what we found was that they love the open source aspect of it, and who can complain about free, with a community working behind it? But then we started to have issues. We were trying to do mobile web testing, for example, and all we wanted to do was be able to integrate it into our build and be able to adjust the viewport, so that we could test different browsers at different sizes and see how the responsive design worked.

We had trouble with that. We also had trouble trying to do all of that plus use VPN access. Most of what we were getting back from TestProject was to use emulators or to layer BrowserStack on top of it. So we tried this, and we were still having trouble. We tried to reach out to both of those products; they were pointing at each other, they were pointing us to forums, it was email support, and it was just really a struggle for us. In the end, we gave up; we couldn't find a solution that worked for us. We also found that the mobile web browsers that were supported were lacking across many of the products, and that analytics were lacking: we just didn't have the ability to see how well the end-to-end tests performed across repeated runs over time, or to integrate that into the tools we were working with. So we really struggled with that. There had to be a solution here; it did not seem like we were inventing a new problem.
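For concreteness, here is a minimal sketch of the kind of viewport-driven responsive check we were trying to run as part of the build. It uses Cypress, one of the tools teams were dabbling with; the URL, breakpoints, and selectors are illustrative placeholders rather than anything from our actual tests.

```typescript
// responsive-smoke.cy.ts -- illustrative only, not Chewy's actual suite.
// Run the same basic check at several viewport sizes so responsive
// layouts are exercised on every build.

const viewports: Array<[number, number]> = [
  [375, 667],   // small phone
  [768, 1024],  // tablet
  [1440, 900],  // desktop
];

describe('responsive smoke test', () => {
  viewports.forEach(([width, height]) => {
    it(`renders the main navigation at ${width}x${height}`, () => {
      cy.viewport(width, height);
      cy.visit('https://www.example.com'); // hypothetical URL
      cy.get('header').should('be.visible');
    });
  });
});
```

The point of a check like this is that the same test body runs once per viewport inside the regular build, which is exactly the integration we could not get working with the tools we had.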

So we decided to run a bake-off. I had used Ghost Inspector in a previous life and had a really positive experience with it, and we had heard about mabl. Actually, I knew about mabl because when I was at ezCater, we moved out of an office space and mabl moved in; so that put mabl on my radar, just one of those random things. We looked at mabl's webpage and saw that Lola, which had just recently been bought by Capital One, was a customer, and I had some contacts over there. So we reached out to hear about their experience with mabl, and we heard a lot of positive things, as well as areas we would need to work on. So we ran the bake-off. Here's Abby; she's all ready to get going and dig in. We tested all four of these products and created a matrix with dimensions for our criteria: things like mobile web support, the ability to hit our lower environments through VPN, what analytics support there was, integrations with our existing tooling (Confluence, Jenkins, and Slack), the level of support we could get, and a couple of other criteria. But in general, those were the main criteria. We ran it, and mabl came out ahead as the clear winner.

So once we chose mabl and procured it, we had to decide what we needed to do: how we were going to roll this out and how we were going to be successful. This is Abby the first time she saw snow. She was a little apprehensive, but she quickly decided it was ice cream falling from the sky, and she has loved snow to this day. The approach we took was to roll it out to a single team and identify the base functionality that we wanted to test. The SEO team was a perfect one, as we had test cases already written for that weekly build. We built a base test suite in both desktop and mobile and ran it for a while, just to make sure we were happy with how it was running. Then we attempted to change the culture of the team: instead of relying on the quality engineer, we tried to get the team to write tests and own them rather than throwing things over the fence, trying to be more agile versus waterfall. So we got them on board and bought in, and once we felt successful with one team, we moved on to the second and the third. We're towards the end of the third one and will move on to the fourth team relatively soon. So this has been successful for us. We shared our learnings throughout the organization and with leadership, and we also have an automated software testing guild where we shared some of our learnings as well.

So, some of the results that we found. We found some bugs that we had not known about, which is always good. Specifically, when you're dealing with SEO functionality, bugs can have a really detrimental effect on the amount of traffic getting sent our way. This is Abby; she found a tennis ball there, so she's pretty excited.

This is Abby; she's pretty happy because we became much more efficient and we're covering more functionality. The amount of testing required to do that weekly storefront release was about two to three engineering days. We got it down to about an hour per week to run that base core functionality. So basically, we've recouped about half a week of engineering time that was going to testing core functionality, and we've applied it to covering more functionality on a weekly basis and to other work. At this point, the quality engineer is actually working on other things in other programs that she was unable to get to before, because we just did not have the bandwidth. The one hour is because, as of last week, we were kicking this test suite off manually and running it. We've actually since integrated with Jenkins, and now it's running automatically each week, so that's even better. Awesome stuff there, and it's why I'm happy, Abby's happy, all good in that regard.
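As a rough illustration of how that kind of automated weekly kick-off can be wired up, the sketch below shows a CI step notifying mabl that a deployment happened, which is what triggers the associated test plans. This is not our actual pipeline; the endpoint, payload fields, and environment variable names are assumptions that should be checked against mabl's deployment events documentation.

```typescript
// notify-mabl.ts -- illustrative sketch of kicking off mabl tests from CI.
// Assumes Node 18+ (global fetch) and mabl's deployment events API; verify
// the endpoint and payload fields against mabl's documentation before use.

const apiKey = process.env.MABL_API_KEY;       // hypothetical credential stored in Jenkins
const environmentId = process.env.MABL_ENV_ID; // hypothetical environment identifier

async function notifyMabl(): Promise<void> {
  const response = await fetch('https://api.mabl.com/events/deployment', {
    method: 'POST',
    headers: {
      // mabl's REST API uses basic auth with the literal username "key"
      Authorization: 'Basic ' + Buffer.from(`key:${apiKey}`).toString('base64'),
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ environment_id: environmentId }),
  });
  if (!response.ok) {
    throw new Error(`mabl deployment event failed with status ${response.status}`);
  }
  console.log('Deployment event created; the associated mabl plans will run.');
}

notifyMabl().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

In Jenkins, a weekly schedule on the job that runs a step like this replaces the manual kick-off.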

We changed the culture. This is Abby on her first vacation. It took her a little while to understand that we were away from home, but she loved being by the beach. She did not like the sign that said to stay off the beach and away from the water. But we did change the culture. The teams, at least mine, have become self-sufficient, so they don't need a day-to-day quality engineer. They are writing tests, they understand the value of it, and they're thinking more end to end in terms of their responsibility as engineers: not "Hey, I take it until it's code complete and then I hand it off to somebody else," but owning being an engineer end to end, which I think is positive across the board. Some of them have really taken to the tool and are looking for ways to integrate it elsewhere and talking to other teams about its value. So, all positive in that regard.

So let's talk about the lessons learned here. I've talked about a lot of positive things, but there are some things we've taken away where we've hit road bumps and that require some thought on our part about how to do things differently. So here's Abby on the first day of school; my son is across the street there. She did not take well to not having people in the house all day, every day, the way she did over the last year. So she's adapted, and we've had to adapt. One of the things we've learned is that interest is high, but adoption is slower. Like I talked about, we went to the automated software testing guild, we demonstrated what we've been doing, and we said, "Hey, this is great. You should all try it," and they're like, "That's cool. That's great. Yeah, I would like a demo." And mabl's been great about doing individualized demos through our customer success representative. But teams are set in their ways, so we've really struggled with expanding this much outside of my teams. There are a couple of teams that are starting to pick it up, but in general, it's been really hard.

The other thing about it is the short term versus the long term: "I've got to get my release out, I don't have time for this, I don't want to fight to add it to the agenda." So that's been a struggle. We took a long-term view on it, and one of our operating principles internally is to think big, and that is what we've done. But some teams have it a little bit harder, with different deadlines and pressures, and so that has been a struggle. Changing culture is hard. Even just starting off with the quality engineer, I think she looked at it like, well, if we're going to automation, what does this mean for my role? My new responsibilities are not necessarily what I signed up for. So there's an emotional impact of changing culture. Some of the engineers were a little bit resistant to getting up and running on this; it's like, "Testing? I've never had to write a test before." So thinking about how you roll things out, emotionally, is a really big challenge. People's roles do change, and changing how things work and the way things have always operated takes some effort. I would encourage you to think about how to do that; we constantly battle it.

One of the lessons we learned from some teams is that they really like open source, and they really like writing code. Going back to the picture from before, where my dog had four or five different balls that we had gotten her from Chewy: she still loves tennis balls. They're free, they're always at the park, and she just loves them. So some teams we have worked with really love open source and love writing code instead of the recording aspect of it. We have been unable to change that, so we have multiple solutions running around within Chewy. I'm not trying to impose our solution on them, but I think there are some benefits we have uncovered that some of the other tools they're using lack, and it's an area where we constantly have conversations.

As I've mentioned, changing culture is hard. This is Abby hiding under the blanket. I've brought in a couple of different tools across the spectrum that I had success with in the past, and some have taken off like wildfire while others have been a lot harder to get a foothold with; mabl has been one of those. My teams have been really enthusiastic about it and have really taken to it, and I appreciate their open minds on it. But like I said, some of the other teams we've approached have really struggled with changing tools; they've kind of bounced around on it and haven't really put priority into thinking about it. There are still many teams on that sign-off sheet that are either doing manual testing or some combination, where they keep bouncing around on different tools. We've demoed this to probably five different organizations, and one or two of them have really picked up on it. We've given demos to leadership, and leadership has been excited about it. But they also want us to keep shipping software, and while in some cases they've really pushed to move towards automated software testing, when other priorities have come up it has sometimes been put on the back burner. So it's not an easy problem. I think that's probably the biggest area when you're trying to change your software development lifecycle, the tools that you use, and people's roles. I can't emphasize this enough: that's probably the biggest challenge that we face across the board in trying to change how we release. So I wish you all luck. I'm happy to answer some questions about our usage of mabl, our journey, and how we've done it, or any questions about Chewy or my dog Abby. I do appreciate you listening in, and I'm happy to help in any way possible.

Leah Pemberton  

Thank you so much, Russell. That was great. We do have a couple of questions from the audience so far. So just to dive in: could you expand on your strategy for how you worked to change the mentality of your team to incorporate testing a little bit more? If you could talk about how you saw the evolution of that idea and how you got buy-in from engineers, that would be great.

Russell  

Sure. For us, I think, one, me being enthusiastic about it in my org probably helped. I got buy-in from my leaders, and they understood the benefit of it, so having leadership on board really helped. There were a couple of engineers who basically became evangelists: we gave them demos, we gave them access, and we had them partner with the QE. Together, they really started demonstrating the value, jumping in and writing tests, and showing the team and leadership on the technical side. I think, in general, if you start small and grow from there, and really work to get a couple of your senior or influential engineers on board, that really helps, as they lead by example.

Leah Pemberton  

Great, thank you. For our next question: could you describe what your quality engineer's responsibilities are day to day since you've started to implement mabl?

Russell  

Sure. So she's still working with the fourth team to write their base set of tests. But she's gone from basically writing tests on an ad hoc basis, time slicing across teams, to now running this test suite. She's going to morph into a role where I've been calling her a digital librarian: she will own the platform, ensure the tests are well organized, and ensure there's a set of shared components (we should not be rewriting or re-recording how to log in to our website, for example), and she will be responsible for test data. She will also be responsible for increasing our test coverage, analytics, and such. Basically, I view her over time as a digital librarian and a product manager of our automated software test suite.
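To illustrate the componentization idea in code form (mabl itself handles this kind of reuse with shared flows in its trainer, so this is only an analogue), a shared login step might look like the following Cypress-style TypeScript sketch; the selectors, route, and command name are hypothetical.

```typescript
// login-command.ts -- illustrative sketch of a shared "login" component.
// Selectors, route, and command name are hypothetical; the point is that
// login is defined once and reused instead of being re-recorded per test.

declare global {
  namespace Cypress {
    interface Chainable {
      login(email: string, password: string): Chainable<void>;
    }
  }
}

Cypress.Commands.add('login', (email: string, password: string) => {
  // cy.session caches the signed-in state so repeated tests stay fast
  cy.session([email], () => {
    cy.visit('/login');                               // hypothetical route
    cy.get('[data-test="email"]').type(email);
    cy.get('[data-test="password"]').type(password, { log: false });
    cy.get('[data-test="submit"]').click();
  });
});

export {};
```

Individual tests would then call cy.login(user, pass) instead of repeating, or re-recording, the same steps everywhere.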

Leah Pemberton  

Great, thank you. The next question I have for you here is: how do you determine whether something in your application needs to be tested automatically, or are you still incorporating manual software testing into your strategy? Are there cases for both?

Russell  

So I do get pushback quite a bit from either the engineers or some of the managers who report to me, who think there is value in quality engineers. Actually, I missed that side of it earlier, and I apologize: they do provide quite a bit of value, and they look at things a little bit differently than engineers do. So there are instances where I think they bring a lot of value in writing unique or manual tests. But it's more about creating the test case than about running the test itself; they just have a different way, a different spidey sense, of looking at a problem: "We should look at this edge case, and we should cover that edge case." In general, I try to push back as hard as possible that there should always be a way to automate. Now, there are certain edge cases where there is difficulty in actually crafting the automation, or there's a limitation in the tool, or it takes some effort. There have been cases where we have struggled to automate a test, and we've gone back to mabl; they've taken the feedback and incorporated it back into the product, or given us strategies on how to automate it. But in general, I try to push my team to always automate, because if you do it manually, you can do it once or only a few times and not come back to it, and you're basically adding your own testing debt.

Leah Pemberton  

That makes sense. Thank you. You mentioned your base test suite earlier in the presentation. How did you prioritize what was in that initial suite of tests that you automated?

Russell  

So we basically had a set of manual software tests already, but we worked with the product managers and our quality engineer and looked at what our core set of functionality is and what a set of sanity tests across a set of browsers, both mobile and desktop, would look like. Basically: if we were to run a test right before we released to production, what would it be, whether it was manual or automated?

Leah Pemberton  

You also mentioned a testing guild earlier. What topics do you typically talk about, and are you meeting frequently? What does that meeting cadence look like, and what does the agenda look like?

Russell  

Sure. We meet monthly. The agenda is supplied by members of The Guild. It goes across all of our engineering teams; it's a voluntary organization of either software leaders or quality engineers. The last couple of months have been about tools and strategies that different teams have been using, so we did a demo of what we have been up to recently. The topic for yesterday was actually integrating into our build, and some of the different tooling that people have set up for kicking off the automation from Jenkins. A couple of months ago, we looked at Rest Assured and how people were doing API testing, for example.

Leah Pemberton  

That's great. Thank you. Collaboration is a huge topic as well, so it's always great to hear that sort of thing. We probably have time for just one or two more questions, so if anything else pops into your mind, definitely ask it now. But Russell, what were some of the criteria you used in your tool bake-off evaluation?

Russell  

Sure. So we wanted to be able to test across all of our environments and browsers. On browser support, we did not want to use an emulator, because we thought the weight of that and the issues associated with integrating it into our build were just a non-starter. So we needed to be able to test on all of our browsers, both desktop and mobile, and we needed to be able to test our lower environments, so VPN access or some sort of whitelisting needed to be possible. Some analytics support was important to us. I wanted the ability to record tests; if we could write tests in code as well, that would be a bonus. Some componentization was important to us: like I said, I don't want to be rewriting and re-recording the ability to navigate parts of our site, so you should be able to have a component library. Support that wasn't just a forum or some sort of email-based support was also a criterion for me.