Tuesday, July 3, 2018

Testing Tour Stop #14: Pair Evaluating a Visual Regression Testing Framework with Mirjana

Yesterday I enjoyed yet another stop on my testing tour. This time I was joined by Mirjana Andovska. In the beginning of our session she shared that she was the lone tester at her company, just as I have been for a long time. She saw my tweets about my testing tour and knew she had to try that as well, so she signed up. How awesome is that?!

When asking what she would like to pair on, Mirjana answered automation. I found out that she has a lot more experience in automation than I have, so I was really glad that she wanted to pair with me. She was especially interested in visual regression testing, mutation testing, and automated security testing. All great areas of expertise I would love to learn more about! In the end we decided to go for visual regression testing. Mirjana was so nice to even prepare playground projects so we could start our session right away and get to know the available frameworks better.

Learning While Walking Through

As Mirjana kindly took over preparation, our first step was to walk through the available projects and see what's already there and what she learned so far. She had revived an older playground project of hers using the Galen Framework, as well as created a new project to try out Gemini as visual regression testing tool. Unfortunately, it wasn't as easy as expected to get Gemini running on her computer, as it shows a dependency towards node-gyp which has a known issue on her operating system Windows 10 requiring permission to install further software.

We decided to go with the Galen playground project first and learn more about this framework before maybe trying to get Gemini running on my laptop running macOS. But first of all: why Galen and Gemini? Mirjana referred to a quite recent overview of the different frameworks available for visual regression testing. Based on her team's needs and what she read about the tools, she found that Galen and Gemini looked most interesting to check out first.

The playground project Mirjana provided was based on the getting started documentation and the first project tutorial for the Galen Framework. She had already implemented a first test for a sample website. We only had to run Selenium locally as standalone server, execute the tests and see how it was working. Just by Mirjana walking me through the playground project, we both learned more about how the framework works, while extending our knowledge on the way.

Which Framework to Chose?

The main purpose of our session was to learn more about the Galen framework and evaluate it. We wanted to discover pros and cons, as well as learn the answers to the following questions.
  • How easy is it to set it up and configure?
  • How easy is it to write a test?
  • How easy is it to maintain the tests?
  • How easily can you understand the resulting reports? Can you quickly see what is wrong so you can fix it based on this information?
With every step we found out more. Here's what we learned.
  • The spec files included CSS locators for objects as well as the test specification. We noted for later to find out whether we could also have the locators separate from the test specification.
  • Galen takes screenshots of the whole page as well as of the single elements to be located. Using images of only part of the page for comparison was pretty nice. However, when looking at the different menu items of a sample navigation bar, we found that the images were cut at different places, sometimes even cutting the menu item text. We felt this was quite strange, so we added it to our list of things to be investigated later.
  • The test report can be defined. We tried the html report, which included a heat map visualizing what the tests covered; pretty nice. However, the report captured the console output only from the framework, but not from the application itself which would make it easier to see the information needed to reproduce a bug.
  • The test run didn't close the browser in the end, so we noted that we would need to take care of this ourselves.
  • We wondered how to add more specs in one test runner file. We postponed this question for later investigation.
  • We learned that we can specify tests in JavaScript as well as in other languages like Java.
  • We saw options to test for desktop or mobile. We decided to not dive deeper here for now.
  • We found that it's really easy to run the tests not only locally but also on different servers, or on services like BrowserStack or similar.
  • We ran the tests on BrowserStack, and the tests failed due to differences. At first we assumed that the differences were due to the different operating systems Windows 7 and 10 as the Chrome version was the same. However, when looking at the compared images, we saw that the expected version showed a scrollbar where the actual image did not.
  • This led to the question how the expected comparison image was created. Maybe on first run? Maybe when running the "dump" command we found? Or did it take the expected image from the last test run?
  • We had a deeper look at the command to create a page dump.
    • The Galen documentation told us that "With a page dump you can store information about all your test objects on the page together with image samples."
    • We ran the command and waited. With time passing, we began to wonder whether it was still running or not, especially as it didn't provide any log output anymore. We decided to let it run a bit more and found that it took about three minutes.
    • We learned that the dump command, unlike the test command, did indeed close the Chrome browser after it ran through. The dumping processes generated lots of files: html, json, png, js, css files and more.
    • We discovered that the dump report gave us spec suggestions, but only if we selected two or more defined areas like the viewport and a navigation bar element. It seemed the provided suggestions should always refer to how an area related to another one.
    • Mirjana thought this would come in handy when a developer missed to align all navigation elements. I added that we could use this feature to explore the page. As the tests are quite slow in nature, we might only automate a part of them as a smoke test and explore around that manually.
    • If you'd like to have a look yourself, here's the official example page of a dump report, ready to be played with:
  • Checking out the official documentation, we discovered the "check" command to run a single page test.
    • Here we learned that there are also options to create a TestNG, JSON or JUnit report.
    • The html report resulting from the check command showed a different structure than when executing "test", which we found interesting.
  • We still had not seen how the reference images were really created and wanted to test our assumptions. Sometimes you need to recheck the basics also at a later stage.
    • Documentation told us that the dump creates images that should be used for comparison.
    • When experimenting, we found that for "check" the stored reference images were actually used for comparison. However, when running the spec files as "test", it seemed to take the image of the last run as reference. Might this be a bug? It's always interesting to check out the reported issues when evaluating a framework.
At the end of our session, I asked Mirjana about the pros and cons she sees regarding the Galen Framework from what we learned so far.

LikedSo-soDidn't like
  • Easy to change configuration and run parameters, providing the flexibility to run the tests anywhere and in different combinations
  • Specifications for objects
  • Suggested specs feature; she shared that I opened her eyes for how to use it for exploring as well
  • Java or other options to write tests; depending on qualification or the type of project it's good to have the freedom to chose
  • Easy to run on Jenkins
  • Easy framework setup
  • Really nice documentation
  • A little bit confusing tests, maybe related to not being used to JavaScript too much
  • JavaScript files as additional layer of configuration besides the Galen config, the run command and the spec
  • Structure of the tests; we kind of tackled them but need to read more as we would also need to maintain them
  • You have to be careful with indentation when writing tests as you get an exception if it differed from expectation; especially if you use another editor for fast editing
  • Different reporting depending on the command used

All in all, Mirjana shared that from her perspective much depends on the purpose of these tests in the first place. Each test should give us an answer to a question, manual as well as automated. Sometimes it's not a yes or no. Developers might not be used to that, but reports can give us much more information than true or false, like a percentage of how much the result differs from the expectation.

As shared earlier, Mirjana had also set up a Gemini playground project. However, the test project failed to run due to missing dependencies which need to be installed what is not easy if you lack the required permissions. Also, it was hard to search and find this kind of information. She allocated the same timebox to explore both frameworks, but she didn't get as far with Gemini as she did with Galen.

In her experience, you only have a limited timebox like one day to try things out. You don't have infinite time or budget or resources. This is usually an important parameter when evaluating frameworks. However, it also depends on the skills of the one who tries them. This means, if you don't have the most knowledgeable person do it, it's a trade-off. I think that this is a puzzle you cannot solve as you only discover what kind skills, knowledge, experience you really need while you're already at it, not before. Mirjana gave the valuable advice that when you want to evaluate a tool, you need to have a familiar environment as system under test. At best it's a real project which is very simple so you can write simple tests, otherwise you will lose time inspecting the elements and learn about the application.

The Value of Learning Together

In the middle of the session Mirjana asked me: "Did you check the time? Because I'm having fun! :D" In the end, we spent 150 minutes testing together instead of the originally allocated 90 minutes.

I found the session really interesting and valuable. Mirjana had it really well prepared which I really appreciate a lot! So far I had only watched demos of visual regression testing but never saw it running for an actual product. By pairing up, I now had this opportunity and I'm thankful for it. In my experience I get way farther in less time this way than when I'm on my own.

Mirjana shared that in between her daily work she looks for ways to make her life easier. She tries to experiment, see what is new, what we can do more. Visual regression testing was one of the topics on her todo list. She had started the Galen playground project some time ago and realized by coming back to it now that she had grown a lot and learned a lot more since then. She hasn't seen the actual use of such a framework yet either, and also setting up a JavaScript project is not her everyday work. By doing it together I gained more insights about visual regression testing from her and could give more ideas back to her. Now we both have a starting point. And that's awesome.

No comments:

Post a Comment