Corporate travel solutions provider ITS transforms the complex (and headache-inducing) booking process into a simple, delightful experience for business travelers. To do so, they need to streamline complicated travel policies and complex algorithms into the best options for business travelers. In this session, VP of Engineering Barti Somaasundaram will share insights into the ITS approach to testing a critical, re-architected web application, and the impact it had on business travelers.
Hello, everyone, we are happy to have you back for today's second virtual session of Experience. And I'm really excited to be hosting this session with ITS, all about building better business travel experiences, all through optimizing your functional test coverage.
So before we get started, I will just cover a few housekeeping items. If you have any questions throughout the presentation, drop those in our Q&A panel. Additionally, you're able to upvote questions, so if there's one that really interests you and you want us to prioritize it towards the end of today's presentation, we'll do that. And for any comments, discussion, or chatting you want to do, feel free to use the chat feature. You'll find both on the right side of the session module when you have your video minimized. We'll leave time at the end for questions. And with that, Barti, I will hand it over to you. Take it away.
Thank you Leah. Hello, everyone. Thank you all for joining this session. Today, we will be focusing on some of the needs of online travel. And I'd like to give you a little bit of view about some of the engineering and testing aspects that we have at ITS. So if we look at the travel industry as a whole, you can imagine it is a very vast industry with several products needed to manage different functionalities. For the industry, like, for example, if you look at air travel, and then you look at airlines, you have inventory to manage your reservation, you have revenue management, you have crew management, so you have many different aspects within the travel industry.
So ITS focuses on the corporate travel and all the functionalities needed to manage the corporate travel. So this discussion will be focusing on sort of like an online and digital experience and then how ITS has managed to do that.
So to give a brief introduction about me, I lead the software development and test engineering teams at ITS. I've been with ITS for about a year and a half now, and prior to that I was with Sabre and Amadeus. All of my background is in software development. I'm going to the next slide here. Okay, so just a quick view on ITS. We are a 40-year-old organization. We started off as a travel agency in 1983, and since then we have evolved into many different aspects of travel technology. Our core products provide solutions for corporate travel. We are headquartered in Dallas, and we have our technology teams based out of Dallas, India, and also Bolivia. So, software competencies, or maybe product lines: our core product is corporate travel and all aspects of corporate travel.
So we have a booking engine. You can imagine it to be something like an Orbitz or Expedia, but tailor-made for corporate travel, so there are several policies and several rules which need to be applied. We have an event management system. If an organization wants to manage an event, for example if Amazon would like to host an event and have employees travel from all over the world, they will be able to register their event within our application and send out domains to all the attendees, and the attendees will be able to make their online bookings and then manage their bookings through our product.
We have an IROPS platform. IROPS is basically a disruption management platform. So if you're an airline and you have a flight that gets canceled, let's say due to bad weather or mechanical failure, all the 200 passengers on that plane will need to be accommodated and rebooked onto different itineraries and different planes. That process is managed by our product. We also have a mid-office/ticketing platform. After a booking is made and before the booking is ticketed, there are several quality checks which need to be applied on the booking. If you book personal travel, most of the time you would have noticed that as soon as you make the booking, you get a notification or mail saying that your booking is confirmed, and then you get an email after about 10 minutes saying that your booking is ticketed. In those 10 to 15 minutes, all of these quality checks get applied on your booking, and we have a tool or platform to manage that. And then we are exploring blockchain technology for some of the use cases we have. We believe that these use cases are game changers within the travel industry. This product is in an inception stage; I would probably call it a startup within our own teams. So those are some of the key product lines we have at ITS.
So moving on to some of the customers. We have American Airlines as one of our key customers, so we have business.a.com powered by ITS. Spirit and Frontier use our tool for their IROPS, or disruption management. We have recently been working on an implementation with EasyJet, the first carrier in Europe. And we have several corporate customers; Amazon has managed some of their events through our platform. That's just to give you a view of where we are and who our customers are.
Okay, so this discussion will be focusing more on a platform called tripeasy. This is our Corporate Online Booking Engine. As I mentioned before, it is like an Expedia or an Orbitz, but very specifically made for corporate travel. Users can come to the site, enter the origin and destination, select the dates, make an air booking or hotel booking, look at the itineraries, choose the itinerary which they like, and make their booking. So that's on a high level. But in the background, there are several things which go on, which we will take a look at in the next few slides.
So if you look at the product flow, I can categorize the whole of tripeasy into, let's say, five different categories, or maybe five different product flows within the entire application. One is air travel. If you're a business traveler, almost always you will have air travel. The user will come to the site, select the origin and destination, request the itineraries, look at the itineraries, choose an itinerary, and make a booking. Sometimes the users will have a hotel to book or maybe a car to book; sometimes it might be hotel only, and sometimes it might be multi-product, which is air, car, and hotel. Corporations would like their employees to use specific companies for hotels and specific companies for cars, where they have a negotiated rate or maybe a discounted rate which they would like to use, mainly from a cost perspective or maybe from a preferred-vendor perspective. Then one of the key things is a rules or policy engine.
So when it comes to corporate travel, there are several policies and several rules which get applied on your itineraries. For example, one of the rules could be that if you're flying within the US, you cannot spend more than $500. Or you can only book business class if your travel exceeds, let's say, six hours. Every corporation will have a different set of rules. It is important that we apply all of those rules on top of the itineraries and then show the results based on the policies. Once the user makes a selection, we go to the booking process, where the chosen itinerary goes through repricing, the price is rechecked, and then a PNR is made. A PNR is a passenger name record, which is sort of like a document for your entire travel. We also have an admin section where the users will be able to manage their profiles, update their passport, address, and all those things. And then there is a section for our travel managers or travel arrangers. In some of the smaller companies it is normal to have a travel manager who will manage or maybe make the bookings for other employees, so they will be able to manage the company, update the profile of the company, and so on. So these are some of the high-level flows that we have within tripeasy.
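The two example rules mentioned above (a $500 cap on US domestic fares, and business class only for trips over six hours) can be sketched as small predicate functions. This is only an illustrative sketch: the `Itinerary` fields, the function names, and the sample data are assumptions and do not describe how the ITS policy engine is actually built.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Itinerary:
    # Hypothetical fields, chosen only to express the two example rules.
    price_usd: float
    cabin: str              # "economy" or "business"
    duration_hours: float
    domestic_us: bool


# A rule returns True when the itinerary is within policy.
Rule = Callable[[Itinerary], bool]


def max_domestic_fare(limit_usd: float) -> Rule:
    """Domestic US itineraries may not cost more than limit_usd."""
    return lambda it: not it.domestic_us or it.price_usd <= limit_usd


def business_only_over(min_hours: float) -> Rule:
    """Business class is allowed only when travel exceeds min_hours."""
    return lambda it: it.cabin != "business" or it.duration_hours > min_hours


def apply_policy(itineraries: List[Itinerary], rules: List[Rule]) -> List[Itinerary]:
    """Keep only the itineraries that satisfy every rule."""
    return [it for it in itineraries if all(rule(it) for rule in rules)]


# Hypothetical policy built from the two examples in the talk.
EXAMPLE_POLICY = [max_domestic_fare(500.0), business_only_over(6.0)]

SAMPLE = [
    Itinerary(420.0, "economy", 3.5, True),      # within policy
    Itinerary(650.0, "economy", 3.5, True),      # over the domestic cap
    Itinerary(480.0, "business", 4.0, True),     # business on a short trip
    Itinerary(2100.0, "business", 11.0, False),  # long-haul business, allowed
]

IN_POLICY = apply_policy(SAMPLE, EXAMPLE_POLICY)
```

Because each corporation has a different set of rules, keeping each rule as a separate function means a company's policy is just a list that can be assembled from configuration.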
So, I wanted to give you a quick view of our backend functionality. The users will come into the application through a browser, so we are a browser-based application. And then we have a mobile app for iOS and Android. The users can come to the site through in-site as well. And we have a couple of other flows where there is an authorization required: once a booking is made, it goes through an authorization where a manager or an approver approves the request. Once the user comes into our site, these are some of the major functionalities that we have in the backend or in the middleware. One of the major things is shopping, where we contact the third-party vendors to get the itineraries, sort the itineraries, apply the business rules, and all those things.
So I would say about 70% of our application is basically shopping and the content related to shopping. A couple of other examples: we have an arranger or administrative role functionality where the users will be able to manage their profiles, the rules that they would like to apply, and things like that. We have something called a product marketplace, which holds the preferences an organization would like to have. For example, your organization would like you to travel only using a specific airline, or maybe stay in a specific hotel, so all those settings can be applied there. Then we have several databases to store all of our content. And one of the key things that we do is that we get our content from many different third-party vendors. For example, the air itineraries will always come from a GDS, or global distribution system, like Sabre or Amadeus. And we have several third-party vendors for hotels and cars.
So when the user makes a request, in real time we contact all of these APIs, get the data, consolidate the data, apply the business rules, and then return it to the user. All of these things are done while the user is waiting for the next set of results to show up on the screen.
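The real-time flow described above (contact every vendor, consolidate, apply business rules, return results) is commonly written as a concurrent fan-out. The sketch below uses Python's `asyncio.gather` with fake vendor functions; the vendor names, the offer fields, and the single price rule are assumptions made for illustration, and the talk does not say how ITS implements this step.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Offer = Dict[str, Any]
Vendor = Callable[[str, str], Awaitable[List[Offer]]]


async def fake_gds(origin: str, destination: str) -> List[Offer]:
    """Stand-in for a GDS air search; a real call would be a network request."""
    await asyncio.sleep(0.01)
    return [{"vendor": "gds", "type": "air", "price_usd": 410.0},
            {"vendor": "gds", "type": "air", "price_usd": 780.0}]


async def fake_hotel_vendor(origin: str, destination: str) -> List[Offer]:
    """Stand-in for a third-party hotel API."""
    await asyncio.sleep(0.01)
    return [{"vendor": "hotels", "type": "hotel", "price_usd": 150.0}]


async def failing_vendor(origin: str, destination: str) -> List[Offer]:
    """Simulates a vendor outage so the search can degrade gracefully."""
    raise ConnectionError("vendor unavailable")


async def shop(origin: str, destination: str, vendors: List[Vendor],
               max_price_usd: float) -> List[Offer]:
    # Query every vendor concurrently instead of one after another.
    results = await asyncio.gather(
        *(vendor(origin, destination) for vendor in vendors),
        return_exceptions=True)
    offers: List[Offer] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # skip vendors that failed and show what we have
        offers.extend(result)
    # Apply a (hypothetical) business rule, then sort cheapest first.
    allowed = [o for o in offers if o["price_usd"] <= max_price_usd]
    return sorted(allowed, key=lambda o: o["price_usd"])


def run_example() -> List[Offer]:
    vendors = [fake_gds, fake_hotel_vendor, failing_vendor]
    return asyncio.run(shop("DFW", "JFK", vendors, max_price_usd=500.0))
```

With concurrent calls, the user waits roughly as long as the slowest vendor rather than the sum of all vendor response times, which matters when everything happens while the results page is loading.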
Okay, so our application was written, let's say, about 15 years ago. For most of our application, we can call it a legacy application, and most of the application sat within a monolith. As part of the modernization efforts, we decided to go ahead and re-architect some of the key components of the application. We've been working on this for about a year. And one of the major parts that we went out and re-architected is the air path.
So, as I mentioned, air is about 65%-70% of our whole search and booking experience, and that is a significant part to go and re-architect, redesign, and then modernize. The requirement here is to make sure that the application performance is improved. Because all of this back-end processing is happening in real time, we want to make sure that we reduce the number of seconds drastically from what we had before. The other thing is that we want to make sure that the application scales to unpredictable traffic patterns.
So on a regular day you will have, let's say, 10 to 50 users on the site at any given time. In case of a disruption or maybe a flight cancellation, we can expect several users to come into the site at the exact same time. So we want to make sure that we meet all of those NFRs (non-functional requirements) in terms of application performance and scaling the application. And anytime we go into a major re-architecture, it is absolutely important that the user experience and the functionality we have in the application are exactly the same before and after the re-architecture. That is the other important thing that we wanted to manage. And even though our application is mostly off the shelf, we have certain customer-specific use cases that we needed to ensure, and that is the other requirement as well.
So overall, we plan the development effort for six months with about two months of testing. And we relied heavily on automation. And that's where mabl comes into the picture. So today the application is live. And with some of our first customers migrated into the application, we are still working on migrating the other customers and then we plan to migrate the entire set of all of our customers in the next few months. And by the end of January, the whole airflow should be on this new architecture.
For some of you who might be interested in the engineering or maybe the architecture behind the scenes, I just wanted to give a high-level view. It is based on microservices. We containerized the application and split the air monolith into a few different microservices. We used Node.js to do most of our back-end development. We are an AWS shop and we use a lot of AWS components: we use AWS ECR to publish and store container images, and we use AWS Fargate, AWS's serverless compute engine for containers, to deploy and manage the containers.
At runtime, we use Redis cache to store the transient data. The performance we get out of Redis cache is extraordinary; it is capable of handling several thousand, or maybe up to several hundred thousand, transactions per second. So relying on that, we use DynamoDB to store our static data. With all of this, our core aim is to rely as little as possible on RDS, or a traditional relational database.
And one of our goals is that from the time the user hits search, as we go through all of these backend transactions, we should not hit the data store. Our goal is to make sure that we keep everything in memory, and that the overall response time is extraordinarily fast. So this is the backend architecture, just to give you a high-level view.
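One common way to keep lookups in memory and off the data store is a read-through cache with an expiry time. The sketch below is an assumption-heavy illustration: a plain dictionary stands in for Redis, `CountingStore` stands in for a database, and the TTL policy is invented here, since the talk does not describe the caching strategy in detail.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """In-process stand-in for a cache such as Redis (hypothetical design)."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # served from memory
        value = self._loader(key)                # slow path: hit the data store
        self._entries[key] = (now + self._ttl, value)
        return value


class CountingStore:
    """Fake data store that counts how often it is queried."""

    def __init__(self) -> None:
        self.reads = 0

    def load(self, key: str) -> str:
        self.reads += 1
        return f"value-for-{key}"


def demo() -> Tuple[int, int]:
    fake_time = [0.0]
    store = CountingStore()
    cache = ReadThroughCache(store.load, ttl_seconds=60.0,
                             clock=lambda: fake_time[0])
    for _ in range(100):
        cache.get("policy:acme")     # 1 store read, then 99 memory hits
    reads_while_fresh = store.reads
    fake_time[0] = 61.0              # the entry has now expired
    cache.get("policy:acme")         # triggers exactly one reload
    return reads_while_fresh, store.reads
```

In this toy run, one hundred searches cause a single data store read, which is the effect the goal above is aiming for.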
Okay, so in terms of functional coverage, we have nine websites to manage. We have tripeasy.com, which is our core functionality, and then we have a site for American Airlines and Spirit. As part of every release, we need to make sure that we test every single site. We have three environments: QA, stage, and production. Once the build comes out of the iteration, it goes into QA, then stage, and then to production. And we need to validate several roles. Every role has certain privileges and permissions within the application, so we need to make sure that works. And we need to make sure that the UI is validated. We have several sorting filters, and we have different logos for each of the customers.
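The coverage described above is essentially a matrix of sites, environments, and roles. A minimal sketch of enumerating that matrix follows; the site and role names are placeholders (the talk names only some of the nine sites and does not list the roles), so treat them as assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical names: the talk mentions nine sites, three environments,
# and several roles, but does not list them all.
SITES = ["tripeasy", "american-airlines", "spirit"]
ENVIRONMENTS = ["qa", "stage", "production"]
ROLES = ["traveler", "arranger", "approver"]


def build_matrix(sites: Sequence[str], environments: Sequence[str],
                 roles: Sequence[str]) -> List[Tuple[str, str, str]]:
    """Return one (site, environment, role) tuple per combination to verify."""
    return list(product(sites, environments, roles))


MATRIX = build_matrix(SITES, ENVIRONMENTS, ROLES)
```

With three of each, the matrix already has 27 combinations; nine sites with the same three environments and three roles would give 9 × 3 × 3 = 81, which illustrates why this kind of coverage is hard to sustain manually.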
We need to make sure that we test all of that. In terms of the test plan, we have 80 plans in mabl and 800 test cases overall. We've been using mabl for about three years now, and we have been improving our overall test coverage. Today the regression runs in about two and a half hours, down from about, let's say, a day and a half of manual testing to run the regression. This is very, very important, as we get the results of the build quality in about three hours. And we of course add new test cases as and when we have new releases going into production. Okay, so the mabl usage in a release cycle. We do biweekly releases, so we release into production every two weeks. I know that different organizations will have different release cycles, and we are very big on speed to market.
And it is very important that we release often, so we do it every two weeks. Our automation runs in three hours, and we get the results of our build quality in three hours. We do have about a week's worth of the cycle, but as and when we find any issues in QA or stage, we will be able to run the whole regression and then get the output within a short amount of time.
And we run our full regression in our QA and staging environments. We also run more or less a regression in production, but it is sort of a minimized version of the regression, whatever we can run in production. The other thing is that we do three deployments into our QA environment every single day. As the developers are working in an iteration and checking in code, the deployment to the QA environment happens every three hours. We run a sort of mabl sanity automation on top of these builds, and we get the output of the smoke tests almost immediately. It gives an immediate feedback loop to the developers to make sure that the build quality is stable and that at any given time we will be able to move into a QA or stage environment. So that's how we use mabl from a release cycle perspective.
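The feedback loop after each QA deployment can be thought of as a simple gate over the sanity results. The sketch below assumes the results are already available as plain records; it does not call mabl or any real CI system, and the `SanityResult` type, test names, and message format are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SanityResult:
    name: str
    passed: bool
    critical: bool


def build_is_promotable(results: List[SanityResult]) -> bool:
    """A build may move on only if every critical sanity test passed."""
    return all(r.passed for r in results if r.critical)


def failure_summary(results: List[SanityResult]) -> str:
    """Short message suitable for an email or chat notification."""
    failed = [r.name for r in results if not r.passed]
    if not failed:
        return "Sanity run: all tests passed"
    return "Sanity run: failed -> " + ", ".join(failed)


# Hypothetical results from one of the three daily QA deployments.
MORNING_RUN = [
    SanityResult("air_search", passed=True, critical=True),
    SanityResult("air_booking", passed=False, critical=True),
    SanityResult("logo_check", passed=True, critical=False),
]
```

Here `build_is_promotable(MORNING_RUN)` is false because a critical flow broke, and `failure_summary` gives developers the name of the failing test right away.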
These are some of the graphs I'd like to quickly go through here. This gives a view of how our sanity runs look in our QA environment. We do three runs a day, and you can see that some days it fails and some days it's all good. We look at this dashboard on a regular basis. And here is a similar metric in a different view. We integrated it with Outlook, so all the developers get an email showing the quality of the build, and if something breaks, they can go check and then work on a fix or whatever they have to do. And this is one of the new functionalities I think mabl has introduced, which is integration with Teams. Our sanity automation is integrated with Teams, so we have visibility by email and also on Teams. From a metrics perspective, we use New Relic, so these graphs are from New Relic. I'm not sure how much you can see. But we use mabl for sort of performance and also load testing as well. We bump users into the site, we increase the users from 10 to 50 to 100 users or so, and we are able to see how the system utilization is.
So once we run our tests using mabl, we compare the results in New Relic, and we want to make sure that from release to release we get predictable metrics in New Relic as well. These are some of the graphs that we have in New Relic.
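Checking for predictable metrics from release to release can be reduced to comparing each metric against the previous release with a tolerance. The sketch below is an assumption: the metric names, the values, and the 10% threshold are invented for illustration, the metrics are ones where higher means worse, and no New Relic API is involved.

```python
from typing import Dict, List


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.10) -> List[str]:
    """Names of metrics that got worse by more than `tolerance` (a fraction).

    Assumes higher values are worse, e.g. response time or CPU utilization.
    Metrics missing from `current` are skipped rather than flagged.
    """
    worse = []
    for name, old_value in baseline.items():
        new_value = current.get(name)
        if new_value is None:
            continue
        if new_value > old_value * (1.0 + tolerance):
            worse.append(name)
    return sorted(worse)


# Hypothetical numbers for two consecutive releases under the same load.
PREVIOUS_RELEASE = {"search_p95_ms": 1800.0, "cpu_percent": 55.0}
THIS_RELEASE = {"search_p95_ms": 2300.0, "cpu_percent": 57.0}

FLAGGED = regressions(PREVIOUS_RELEASE, THIS_RELEASE)
```

In this example only the search response time is flagged, since it grew by more than 10% while CPU stayed within tolerance.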
Okay, so this is probably my last slide. In terms of the summary and retrospective, some of the lessons learned, or what I'd like to share from our experience of using mabl and going through the re-architecture: anytime you go into a re-architecture, or even from a regular release perspective, having a core sanity run on a regular basis is very, very important. Different organizations use different ways to measure that and to get the constant feedback loop back into the development organization. For us, even though we have other means as well, we are using mabl to run the automation on a day-to-day basis and on every release.
And one of the other things is that automation is absolutely important. We went through the re-architecture over a period of, let's say, four to six months of development. After the first three months of development, once the build was more or less getting stable, there was a need to run several regression runs in a day, and I think that would be impossible without automation. In our case, we are at 100 percent automation. In your automation journey, you might be at different stages, but in my view, I would encourage all of us to go towards 100 percent automation as much as possible.
And the other thing is that, for us, measuring the API performance is absolutely required as well. Our application relies on several internal APIs as well as third-party APIs, and it is important to make sure that we provide the same performance and scalability to an end user. So that is something that is important on a regular release basis. When we went through the re-architecture, what we felt is that having a dedicated scrum team helped us a lot. Anytime you have a dedicated scrum team, the tendency is that we need to pull the top developers and top QA engineers into it. It might not be possible in many cases, but I would strongly encourage it. Based on what we saw, it is important to have a dedicated team, undisturbed by anything else, fully focused on the modernization or re-architecture effort.
And in our automation journey, what we're looking at as next steps is API automation. I think mabl has introduced the Jenkins integration as well, so that's something that we're looking at next. The other thing that we're looking at will be mobile automation. So I guess those are the key points that I wanted to share with you today. That is the presentation, and if you have any questions, I can take those.
Thank you so much. I have a pile of questions for you here already. So how about I just dive in? If that sounds good? Sure. So with this big complex application you have, do you know off the top of your head, how many automated tests you're running on a regular basis? How often those are running and how long they run on average?
Yes, yes. So as of now we have sanity runs, which happen three times a day. We have about, I would say, 10 to 15 test cases which are the core sanity test cases; if any of those cases break, it's an absolute break. So we run those 10 to 15 test cases three times a day, after every deployment. From a release perspective, we run about 800 test cases across about 80 plans. So that's what we have.
Great, thank you. The next question here is how did you determine the best test cases to automate?
So in my view, there are these core flows. To give an example, in our case a user will come to the site, enter the origin and destination, go through the booking process, select an itinerary, and make a booking, so we automate the whole flow. Whatever the user will do on a normal basis, we automate that flow. We have these core flows like an air flow, hotel flow, and car flow, and all of these core flows need to be automated. In my view, every single edge case needs to be automated as well. It might be an extreme or maybe a stretch goal, but that's the core benefit of automation: the things that we would not run through manual testing on a regular basis should be automated. And then any kind of boring or mundane tasks must also be automated. I guess it depends on every company and where they stand in that automation process. But if you ask me, I would go towards 100% automation.
Thanks, Barti. We'll have time for one more question here. You've got a couple of others, so maybe we'll follow up with some folks after the session today. But you got a couple of questions around production environments, and I would love to dig into this a little bit. So it's always tricky running automated tests on production environments, except for some sanity tests. How do you manage that test data and other data accounts for automation?
Yes, yeah. So when it comes to the production environment, there are certain things that we cannot do. But we prepare the cases that will be run in production in such a way that we still achieve the functionality and still cover the production environment, let's say, for the most part. One of the examples that I can give is an air booking. In a QA or staging environment, we can go through the entire booking process: we can actually make a booking, we can actually ticket the booking, we can do all of this. When it comes to a production environment, we cannot do that, because the actual booking is a real booking with real dollars spent. So in this case, what we do is go up to the payment page and just stop at that point. This way, we are able to test, I would say, 80% of the functionality, and we are also able to use automation to the maximum extent in production.
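Stopping at the payment page in production can be expressed as an environment-aware cut-off on the booking flow. The step names below are hypothetical and the flow is simplified; this is only a sketch of the idea, not the actual test definition used at ITS.

```python
from typing import List

# Hypothetical, simplified booking flow.
BOOKING_FLOW = ["search", "select_itinerary", "traveler_details",
                "payment_page", "submit_payment", "ticketing"]

# Assumption: every step after the payment page spends real money.
LAST_SAFE_STEP_IN_PRODUCTION = "payment_page"


def steps_for(environment: str, flow: List[str] = BOOKING_FLOW) -> List[str]:
    """Return the steps a test may execute in the given environment."""
    if environment != "production":
        return list(flow)                 # QA and stage run the full flow
    cutoff = flow.index(LAST_SAFE_STEP_IN_PRODUCTION) + 1
    return flow[:cutoff]                  # production stops at the payment page
```

For example, `steps_for("qa")` returns all six steps, while `steps_for("production")` ends at `payment_page`, so the same test definition can be reused across environments without creating a real booking.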
Great, thank you so much. We'll follow up with folks on some of the other questions that really got a lot of upvotes here, but thanks to everybody for joining today's session. There are a couple of others starting in a few minutes, so hope to see you there. Thanks for joining, everybody.
Sounds good. Thank you very much. Thank you, mabl.