Let me say first that I love what I can do with test automation. It has definitely become an art over the years I've been doing it.
One of the very first pain points I encountered was with synchronization. In those days I had to build my own tools; this was just before SilkTest went to Beta 1. I tried sleeps. They worked poorly.
And when I started at Home Depot, I got into quite the argument about using sleep statements. A co-worker, Clay, got bent out of shape because I had put a sleep of 1/3 second into a routine.
Now, mind you, what this routine did was poll for multiple different controls, every third of a second, until one of several conditions was met. He insisted "no sleep statements!". So I took him for a walk through the inside of WebDriver, where it does exactly the same thing. It was a good example of somebody following a rule because there's a rule, not because it was warranted in that case.
Sleep statements are almost never a good solution in test automation. My case was unusual: there was no other way to check all of those conditions at essentially the same time.
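To make the distinction concrete, here is a minimal sketch of the kind of polling routine I'm describing; the helper name and defaults are illustrative, not the original code:

import time

def wait_for_any(conditions, timeout=30, interval=1.0 / 3):
    """Poll a list of zero-argument condition callables every `interval` seconds
    until one of them returns True, or `timeout` seconds have elapsed."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        for condition in conditions:
            if condition():
                return condition
        time.sleep(interval)  # the "forbidden" sleep, used only as a polling interval
    raise TimeoutError("none of the expected conditions was met within %s seconds" % timeout)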
But when anybody puts a sleep into a test outside a case like that, I call them on it. I know just how bad careless use of sleep statements can be: I have made tests take far longer than necessary by stretching a fixed wait to cover all the various response times from the application under test (AUT).
So instead, I ask folks to look at "how do you as a human know the AUT is ready to continue?" Is it because a field has become populated? We have an assert_not_empty() to cover that condition. Is it because a control exists? We have assert_exist() to cover that condition. Are we waiting for it to have a specific value? We have assert_value() to cover that one. All of these validation routines take a timeout as an argument.
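Each of those assertions can be built on the same polling idea. Here's a rough sketch of what an assert_value with a timeout might look like; the name matches the routine mentioned above, but the body and defaults are my own illustration, not the framework's actual code:

import time

def assert_value(control, expected, timeout=20, interval=0.5):
    """Wait until the control's value equals `expected`, failing after `timeout` seconds."""
    deadline = time.time() + timeout
    last_seen = None
    while time.time() < deadline:
        last_seen = control.value()   # assumes the control object exposes a value() accessor
        if last_seen == expected:
            return
        time.sleep(interval)
    raise AssertionError("expected %r, last saw %r after %s seconds" % (expected, last_seen, timeout))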
I don't believe in rules for their own sake. In fact, I think the fewer rules we have, the faster development goes. Everything about our framework is about velocity: reducing the time it takes to build and maintain tests. But "don't use sleep except in the most unusual of circumstances" is one rule I do keep.
Monday, October 29, 2018
Keeping Your Perspective
For those of us who've been in IT for many years, the number of new and exciting technologies, paradigms, methodologies and philosophies around us right now can seem overwhelming. Ideas that started in small shops, startups and incubators are now reaching into even the most conservative of industries. Even my own industry, dental insurance, is starting to adopt agile practices.
It can be difficult to find your way through this barrage of new ideas: We want continuous integration and continuous delivery and your framework needs to support our particular flavour of agile but it also needs to support our legacy waterfall apps but we're not going to call them waterfall any more and also we need better reporting and traceability and and and ...
At times like this, I find it very helpful to take a step back and re-orient myself around a single, simple tenet:
My job, as an automation engineer, is to support quality.
Let's unpack that a little. Quality means different things to different people and organizations. You might measure quality with defect metrics, you might have some sort of federally mandated guidelines, and you hopefully have a set of functional and non-functional requirements you are gauging against. Regardless of how you measure it, your job, as an automation engineer, is to do everything you can to help ensure quality.
Functionally, test automation is a component of overall QA. If your core QA practices are shaky, the best automation in the world will not save you. Everything you do as an automation engineer needs to ultimately serve to bolster QA. Whether you are part of a small team or have an impact across the enterprise, this holds true.
I use this tenet every day. We are in the midst of developing a new enterprise-wide automation framework. Keeping my perspective on "support quality" helps me filter through the options when choosing technologies and methodologies. It helps me remember who the stakeholders are when I'm designing elements of the framework, such as reporting. It helps me figure out the how when tasked with something like adding testing to our CI setup. Hopefully it can help you too.
Sunday, October 21, 2018
The Pain Of Test Automation
"So what do YOU think are the biggest pain points for test automation?" Someone asked me. I've been thinking about that for the last several weeks. This is the list I came up with:
Synchronization - Making sure the test doesn't get ahead of itself, and making sure that it can continue.
Resilience - Recovering quickly after the application under test changes.
Interface Proliferation - When test automation libraries get too big.
Networking Problems - Possibly my number one issue at my current gig.
Looking Stuff Up - How much time is spent looking up how to do things. A lot more than most people think.
Data Management - Getting consistent data into the application under test, mocking external interfaces, and so on.
Artifact Aging - What test artifacts to hold on to and for how long.
Reading and Maintaining Other People's Code - Coding standards, training, and so on.
The next few posts will look at each of those in a little more depth, including how I deal with each of them in my current work.
Stay tuned!
Tuesday, June 6, 2017
An Oddity of History
I was watching Crash Course Computer Science on YouTube. It's been kind of amusing, and I have learned a few things, but there just isn't much I didn't already know... with the episodes shown so far, anyway.
But I ran across the example below, which was illustrating the 'else' clause:
if x then
    print a
else
    print b
And I had this weird 'wait a minute, I often don't do it that way!' moment... At first, I briefly had a judgement that my way was better. When I asked myself why I thought that, my CPU came back with a shrug.
This is the kind of structure I have historically often used:
r = a
if x then
    r = b
print r
And as I sat there scratching my head, I realized that long ago, MS-DOS 2.1 batch files didn't support else. In fact, the modern cmd.exe command set does support else, though I wasn't able to track down when that was added (or whether it had been there all along and was just poorly documented).
My approach requires an extra variable's worth of memory. When you put all the commands on the same line, mine is slightly shorter. I don't think either of these differences qualifies as better. And I guess one of the marks of a good engineer is not holding onto the old way of doing things just because that's what they learned. Yay for having dispelled an obsolete assumption!
Tuesday, January 3, 2017
Libraries and API design
Some time back, I ran across this: http://www.softwaretestinghelp.com/test-automation-frameworks-selenium-tutorial-20/
And I was taken aback at the description of the framework, because the author was ignoring the parts of framework development that seem to me to be the most important.
Consider: if we use just Selenium and write our tests directly against it, then everybody we hire who knows Selenium (and our target language) will be able to write automated tests. We will probably set up page objects, and that's about the minimal set of things to know. Right?
Well, yes, but...
In my experience, many QA organizations would prefer to be able to graduate successful testers into at least some automation. This often means training less experienced testers to write automation.
And if your framework is more complex than the above (and most of them are), then there's going to be some kind of a learning curve, even for more experienced automators.
My job as an architect is to 'design and maintain the framework'. And how well I do that design part will directly influence how productive the people who use that framework will be.
At my first automation gig, there were 3 of us, and we all were pretty good programmers, so we just started banging out tests. We had a set of shared page objects, and wrote any other functions that helped us out as we needed them.
This meant that, at different times, we each wrote something to remove all non-numeric characters from a string. Because we happened to need that, and because there was no orchestration of our work. No good standards for code sharing, no limitations on what we could create. Doh!
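(A helper like that is a one-liner in Python; here is one way it could look, as an illustration rather than our actual code:)

import re

def digits_only(value):
    """Strip every non-numeric character from a string."""
    return re.sub(r'\D', '', value)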
At one point, I even ran into a test I'd written a month earlier that implemented some function or other (I no longer remember what) that I'd just implemented over again because I'd forgotten that it existed. My own code!
This was when I learned about interface proliferation. The tendency of good programmers to try and reuse code had led to more interface endpoints than we could keep track of.
From then on, I started looking at test automation interface design as its own discipline. And not only the design of the interfaces, but how I communicate about them as well. What documentation I create, and where.
These two pages have strongly influenced my API designs:
http://martinfowler.com/bliki/HumaneInterface.html
http://martinfowler.com/bliki/MinimalInterface.html
Now, I try to create minimal interfaces, and the interfaces I do create require the bare minimum of data to do the most common action. So for instance, when I started in my current gig, they had created a function called 'does_control_exist(selector)' to wrap the fact that Selenium has no exist function. They also had a 'wait_for_control_to_exist(selector, time)'. Two interfaces to determine existence. What if it were just .exist([time])? The [time] is in brackets because it's optional. Two calls compressed into one.
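Here's a minimal sketch of what that collapsed interface might look like; the class name, the helper method, and the 20-second default are illustrative assumptions, not the framework's real code:

import time
from selenium.webdriver.common.by import By

class Control(object):
    DEFAULT_TIMEOUT = 20  # seconds; an assumed default

    def __init__(self, driver, css_selector):
        self.driver = driver
        self.css_selector = css_selector

    def _find(self):
        # hypothetical lookup; returns every element matching the selector
        return self.driver.find_elements(By.CSS_SELECTOR, self.css_selector)

    def exist(self, timeout=None):
        """Poll for the control; True if it appears before the timeout expires, else False."""
        deadline = time.time() + (timeout or self.DEFAULT_TIMEOUT)
        while time.time() < deadline:
            if self._find():
                return True
            time.sleep(0.5)
        return False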
It's also the case that when I used our framework's click(), I'd often get 'control not found' if the code tried to click on something that existed more than once. And there was nothing to tell me if a control was unique or not.
Now I have .click([time]), so a call might look like thingie.click(), which does the following:
since no timeout was provided, use 20 seconds
get the element for the selector
assert that the item exists
assert that the item is unique
assert that the item is visible
assert that the item is enabled
element.click()
And that happens each time we click on something. This is how most test automation tools worked before Selenium.
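Continuing the Control sketch above, a click() wrapper along those lines might look something like this; again, the assertions and messages are my own illustration, not the actual framework code:

    def click(self, timeout=None):
        """Find the element, assert that it is ready, then click it."""
        timeout = timeout or self.DEFAULT_TIMEOUT          # no timeout provided -> use the default
        deadline = time.time() + timeout
        matches = self._find()
        while not matches and time.time() < deadline:      # wait for at least one match to exist
            time.sleep(0.5)
            matches = self._find()
        assert matches, "control %s does not exist" % self.css_selector
        assert len(matches) == 1, "control %s is not unique (%d matches)" % (self.css_selector, len(matches))
        element = matches[0]
        assert element.is_displayed(), "control %s is not visible" % self.css_selector
        assert element.is_enabled(), "control %s is not enabled" % self.css_selector
        element.click()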
So my page object may look something like this:
class PageHome(MyPageClass):
    complete_page_button = ButtonClass("css=[id='Next-Button']")

    def complete_page(self):
        if self.complete_page_button.exist() is False:
            self.invoke_page()
        self.complete_page_button.click()
        ...
So for exist, we now have one function instead of two. For click, we've collapsed at least six actions into one function. This means my test automation engineers have fewer endpoints to remember, and for the most common use of an endpoint, no additional parameters are required. I can get new people up to speed faster, and we can debug failed tests more quickly because we get more useful feedback in the logs.
For me, this is what framework design is all about.
Wednesday, October 12, 2016
Python and how to get the instance name of a variable
I needed the name of an instance in order to do some error reporting. I'm very lucky that each of these things I need to report on will only have one instance.
So of course, the first thing I did was ask Google. And after a couple of weeks of checking now and again, I was unable to find any solution already published. I found a lot of "instances don't have names!", which clearly isn't true, but makes sense from the perspective that the instance name is a pointer to the thing, and there's no backwards-pointing property in the thing being pointed at.
But as with most things, there's always a way. And it turns out that in Python, it's actually pretty straightforward:
import gc

def instance_names(self):
    referrers = gc.get_referrers(self)
    result = []
    dict_of_things = {}
    for item in referrers:
        if isinstance(item, dict):
            dict_of_things = item
            for k, v in dict_of_things.items():
                if v == self:
                    result.append(k)
    if not result:
        result = ['unnamed instance']
    return result
Returns a list of all matching instance names. Now, it's possible to create an instance without a name, such as via a generator or a lambda, and I haven't tested what it will return in those cases, as that wasn't what I needed this for. It also fails for getting the name of a function, but that can be gotten in other ways.
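To illustrate, here's a quick usage sketch of my own (assuming the function above is defined at module level; the Widget class is just an example):

class Widget(object):
    pass

# attach the helper so it can be called as a method
Widget.instance_names = instance_names

my_widget = Widget()
print(my_widget.instance_names())   # expected output: ['my_widget']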
I hope this helps someone! :)
Python, Selenium and the dreaded "Timed out receiving message from renderer"
We had a problem: around 15% of the time, tests would fail, and when we went to SauceLabs to look at the problem, all we had was a blank browser page with "data;" in the address field.
Folks here had been ignoring this and had simply pronounced the automated tests to be flaky and unreliable. Given the experience I have, this was galling.
I looked in the selenium output, and saw "Timed out receiving message from renderer".
So I looked on Google, and found lots of folks reporting this problem. Several bugs have been filed, all closed with "can't duplicate". The issue is at least four years old as of this writing. I tried changing timeouts, using try/except, and every other thing listed, but the improvements were either nonexistent or very modest.
I have solved it, but the solution is an *awful* hack. The one thing it has going for it: our failures have dropped from 15% to zero. (Testing done using a suite with 10,000 cases in it.)
Here's the code at the center of the solution:
webdriver.get('about://blank')
my_script = 'var a = document.createElement("a");' \
            'var linkText = document.createTextNode("%s");' \
            'a.appendChild(linkText);' \
            'a.title = "%s";' \
            'a.href = "%s";' \
            'document.body.appendChild(a);' % \
            (url_to_use, url_to_use, url_to_use)
webdriver.execute_script(my_script)
webdriver.set_page_load_timeout(20)
webdriver.click_element_by_text('css=a', url_to_use)
if page.loaded() is False:
    webdriver.click_element_by_text('css=a', url_to_use)
    if page.loaded() is False:
        webdriver.click_element_by_text('css=a', url_to_use)
        if page.loaded() is False:
            webdriver.click_element_by_text('css=a', url_to_use)
In the above, page is my page object. The method loaded() checks controls to see if the page has finished loading. And click_element_by_text fetches matching elements and iterates through them to determine whether they have the text specified. If an element does, it clicks it. (Sorry I can't include that code, but it belongs to work, and would make this sample way too long.)
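For readers who want to try this with plain Selenium, here's a minimal sketch of what a similar helper could look like; it's my illustration (using a raw CSS selector rather than the 'css=' prefix above), not the work code:

from selenium.webdriver.common.by import By

def click_element_by_text(driver, css_selector, text):
    """Click the first element matching the selector whose visible text equals `text`."""
    for element in driver.find_elements(By.CSS_SELECTOR, css_selector):
        if element.text == text:
            element.click()
            return True
    return False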
In my experiments, it came to look as though the integration between the driver and the browser (at least on Chrome) can get into a state where Chrome has failed to load the page but never tells the driver about it, so the driver just eventually times out.
about://blank - I used this because it should render an internally generated page every time. On Chrome, it's a "This site can't be reached" error, which works just fine. Firefox and IE also generate errors or blank pages. But the assumption was that internally stored pages should load every time. And so far, they do.
So by adding the target link and clicking it, I'm bypassing that tight integration.
I've watched the code run, and I've seen it have to retry once in a while, but so far, never more than once.
Please note this has only been tested on Chrome.
I hope somebody finds this helpful! :)