Smarter Acceptance Testing with Personas

A while back, I gave a talk on the combination of Cucumber, PhantomJS, and WebDriver. There is a project on GitHub that contains the sample code. This is a small follow-up to that talk, with an idea for how to manage your Cucumber scripts.

Writing BDD

To make the best use of BDD, you should write very rich steps. This is entirely the point of using behavior frameworks. In fact, if your BDD scenarios can't be used as documentation, you've done something wrong. I'm going to assume a knowledge of the Given/When/Then structure. Let's take a look at a bad example:

Scenario: User Signup
  Given A user is signing up for our site,
  When he enters his First Name,
   And enters his Last Name,
   And enters his Email,
   And re-enters his Email,
   And enters a new Password,
   And specifies his Gender,
   And enters his Birthday,
   And submits his request,
  Then an account is created,
   And account name is set as the email address.
   And a confirmation email is sent to Fred.

Now, this would work, but we're writing a long list of very fine-grained steps. Compare with something like this:

Scenario: User Signup
  Given A user is signing up for our site
    And the form is filled out correctly 
  When he submits his request
  Then an account is created
   And a confirmation email is sent.

Remember, the point of BDD is to bring the scenario closer to how you would describe the behavior when talking to somebody. You wouldn't list every single field in conversation, so make your code-behind more powerful!
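To make "more powerful code-behind" concrete, here is a minimal sketch of a helper that a coarse step like "the form is filled out correctly" could call. Everything in it is an assumption for illustration: the field names simply mirror the fine-grained steps above, and FakeSignupForm stands in for whatever page object actually drives the browser.

```ruby
# Hypothetical field list for the signup form; the names mirror the
# fine-grained steps in the first scenario and are assumptions.
SIGNUP_FIELDS = %i[first_name last_name email email_confirmation
                   password gender birthday].freeze

# Stand-in for a page object. A real one would drive the browser through
# WebDriver; this one only records what was typed.
class FakeSignupForm
  attr_reader :values

  def initialize
    @values = {}
  end

  def fill_in(field, value)
    @values[field] = value
  end
end

# Fills every signup field from a hash of user data, so one coarse step can
# replace the seven fine-grained ones. Raises KeyError if a field is missing.
def fill_out_signup_form(form, data)
  SIGNUP_FIELDS.each { |field| form.fill_in(field, data.fetch(field)) }
  form
end

# In a Cucumber step file this might be wired up as (hypothetical):
#   Given(/^the form is filled out correctly$/) do
#     fill_out_signup_form(@form, @user_data)
#   end

# Made-up sample data, for illustration only.
sample_user = {
  first_name: 'Test', last_name: 'User',
  email: 'test@example.com', email_confirmation: 'test@example.com',
  password: 's3cret!', gender: 'unspecified', birthday: '1980-01-01'
}
form = fill_out_signup_form(FakeSignupForm.new, sample_user)
puts form.values.keys.length # prints 7
```

The scenario stays readable, and the list of fields lives in exactly one place.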

Test Data

The biggest chronic problem with testing of any sort is test data. That is, to fully exercise a complex system, you need your data in a known state, and you need it to be consistent. Creating this data is difficult enough. A tool like Flyway can assist with setup/teardown/reset of data. But there's another problem: How do we know which pre-canned test data set to use? With enough test cases and enough scenarios to exercise, it can be difficult to remember the exact configuration of the test data. Or, if you are writing a test that uses a data set that is 99% correct, changing even that 1% can break other tests that were relying on the data staying as it was. So how do we deal with this?

In the BDD world, we have a fascinating option. Remember way back when we were all getting on board the Agile/Scrum train? The proper procedure is to write user stories in a manner like so:

As a non-administrative user, I want to modify my own schedules but not the schedules of other users.

This could be translated directly into Gherkin:

Scenario: Schedule Management - User
  Given a non-administrative user
  When I attempt to modify my calendar
  Then the modification is successful

Scenario: Schedule Management - Other Users
  Given a non-administrative user
  When I attempt to modify another user's calendar
  Then the modification is not successful

This looks fine, but "non-administrative user" is quite vague. And what happens when other people want to use code like that? How can we make this more extensible without sacrificing readability or maintainability? Enter personas.


The user story from above is the classic way most of us were taught to write user stories. However, when Agile was still young, the idea of personas was quite popular. From this site, we can see:

A persona [...] defines an archetypical user of a system, an example of the kind of person who would interact with it. [P]ersonas represent fictitious people which are based on your knowledge of real users.

Personas are different [from actors] because they describe an archetypical instance of an actor. In a use case model we would have a Customer actor, yet with personas we would instead describe several different types of customers to help bring the idea to life.

It is quite common to see a page or two of documentation written for each persona. The goal is to bring your users to life by developing personas with real names, personalities, motivations, and often even a photo. In other words, a good persona is highly personalized.

This is actually a very powerful construct in our BDD world. Instead of generically referring to "a user" or "an admin," we can refer to personas. Let's take that signup example again, but apply a persona:

Scenario: User Signup
  Given Fred Jacobsen is signing up for our site
    And the form is filled out correctly 
  When he submits his request
  Then an account is created
    And a confirmation email is sent.

Not very different, right? But behind the scenes can be a different story. Let's look at what it might have looked like before:

Given /^A user is signing up for our site$/ do
  # Search by attributes; the result depends on whatever data matches today
  user = find_user(admin: false, valid: true, ...)
end

Compare to:

Given /^(\w+ \w+) is signing up for our site$/ do |name|
  # Look the persona up by full name, e.g. "Fred Jacobsen"
  user = user_lookup(name)
end

In the first case, you need to search for a user. If the data changes underneath you, this function may not return the same result every time. In the second case, however, a lookup by name returns the same entry every time. Let's take a look at the Schedule Management example again, but with personas:

Scenario: John can change his own schedule
  Given John Doe is using our app
  When John attempts to modify his calendar
  Then the modification is successful

Scenario: John cannot change Jane's schedule
  Given John Doe is using our app
  When John attempts to modify Jane Smith's schedule
  Then the modification is not successful
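The name-based lookup behind these steps can be sketched as follows. This is a minimal illustration, not the talk's actual code: the PERSONAS hash stands in for rows in the test database, the attributes shown for Jane Smith are assumed, and USING_APP is a hypothetical step pattern that accepts a two-word name.

```ruby
# Hypothetical persona registry. In a real suite these entries would be rows
# in the test database; a frozen hash stands in for it here.
PERSONAS = {
  'John Doe'   => { admin: false, status: 'active' }.freeze,
  'Jane Smith' => { admin: false, status: 'active' }.freeze
}.freeze

# Looks a persona up by its full name. An unknown name is an error rather
# than nil, so a typo in a scenario fails loudly.
def user_lookup(name)
  PERSONAS.fetch(name) { raise ArgumentError, "unknown persona: #{name}" }
end

# Step pattern that captures a two-word persona name such as "John Doe".
USING_APP = /^(\w+ \w+) is using our app$/

match = USING_APP.match('John Doe is using our app')
puts user_lookup(match[1])[:status] # prints active
```

Because the lookup key is the persona's name, the same scenario always runs against the same record, no matter what other data gets added later.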

Again, we can look the users up by name here. But we can also keep documents behind the scenes that explain the personas. For this example, I'm going to use Markdown:

# John Doe #
John Doe is a paying, non-administrative user on our site. He is a 40-year-old dad of 2 boys and 1 girl, and uses our product to manage the children's activities between him and his wife, Heather Doe. He checks his schedule each morning and each evening, but does not check it throughout the day.

## Payment Information ##
John has a monthly subscription, and is up-to-date on payments. 

## Account Setup ##
John has an avatar image and a verified email address. He has not entered his phone number for SMS updates.

This gives us a wealth of information. We know that John is an active member with up-to-date payments, an avatar, and a verified email, and that he has no phone number on file. What if we were just as smart about the test data behind this, and used Flyway to manage it as well?

file: flyway/personas/001-John_Doe.sql

insert into users(name, email) values ('John Doe', '');
set @id = last_insert_id(); /* MySQL syntax; use whatever your database supports */
insert into accounts(user_id, status) values (@id, 'active');
insert into profile(user_id, avatar) values (@id, '/images/avatars/johndoe.png');

Now we can have a 1:1 correspondence between a persona and the data that powers it. Say somebody comes along and wants everything John Doe has, plus a phone number on file. Instead of modifying John, or trying to figure out which other tests would break, we just create a new user and persona:

file: flyway/personas/002-Jenny_Smith.sql

insert into users(name, email) values ('Jenny Smith', '');
set @id = last_insert_id(); /* MySQL syntax; use whatever your database supports */
insert into accounts(user_id, status) values (@id, 'active');
insert into profile(user_id, avatar, phone) values (@id, '/images/avatars/jennysmith.png', '555-867-5309');
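The same "copy, don't modify" rule can be mirrored in test helper code. The sketch below is an assumption about how personas might be represented in Ruby, not part of the Flyway setup: the hash keys loosely follow the columns in the SQL above, and derive_persona is a hypothetical helper.

```ruby
# Hypothetical in-memory version of the John Doe persona. The keys loosely
# mirror the columns used in the SQL above; the structure is an assumption.
JOHN_DOE = {
  name: 'John Doe',
  status: 'active',
  avatar: '/images/avatars/johndoe.png',
  phone: nil
}.freeze

# Builds a new frozen persona from an existing one plus overrides.
# Hash#merge returns a copy, so the base persona is never modified.
def derive_persona(base, overrides)
  base.merge(overrides).freeze
end

JENNY_SMITH = derive_persona(
  JOHN_DOE,
  name: 'Jenny Smith',
  avatar: '/images/avatars/jennysmith.png',
  phone: '555-867-5309'
)

puts JOHN_DOE[:phone].inspect # John still has no phone number
puts JENNY_SMITH[:phone]
```

Freezing the hashes means an accidental write to an existing persona raises an error instead of silently changing data that other tests rely on.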

Moving Forward

So what have we done, really? In an abstract sense, we've created named collections of test data. They should be immutable: any change that developers want to make results in a new persona. If we decide a persona is no longer useful, its name also makes it easy to search your code for every usage. Given this setup, it's a small leap to creating very complex setups:

file: resources/personas/

# Nevill Wadsworth III #
Nevill comes from an old-money family with assets in the tens of millions of dollars. He manages 5 family trusts, and uses our trading system to manage all of their assets. 

## Payment Information ##
Nevill is up-to-date on payments. Payments are deducted automatically via ACH. 

## Account Info ##
The 5 trusts that Nevill manages: 

### Trust 1: Wadsworth Unlimited ###
Wadsworth Unlimited is a small trust with the stocks of 2 companies. This is the dividend account for him. The stocks in this account are: 

  stock |    qty  | purchase price | purchase date
  ------|---------|----------------|--------------
  MSFT  | 150,000 |         $27.45 | 10/23/1997
  BRK.A |   1,000 |      $3,544.18 | 09/16/2001

### Trust 2: Wadsworth International ###
Wadsworth International is the trust that manages all of the family's assets outside of the US. For tax purposes, they have not repatriated the money, so it can only be spent outside of the US. The assets in this account are: 

You see where I'm going. Personas don't have to be limited to humans or clients, either. They could be companies, external agents like regulatory auditors, or even a pet (if you're running a veterinary practice, for example).

The promise of BDD is executable documentation. It's not hard to imagine taking these scenario files, combining them with the persona Markdown and the persona SQL, and generating a fully cross-referenced site that links the test cases, the personas, and the test data generation. That's left as an exercise for the reader.
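As a starting point for that exercise, here is a rough sketch of the cross-referencing step: given the text of some feature files and a list of persona names, it builds an index of which scenarios mention which persona. The persona_index helper and the schedule.feature file name are hypothetical, and the parsing is deliberately naive (it only recognizes lines starting with "Scenario:").

```ruby
# Builds a persona => ["file: scenario title", ...] index from Gherkin text.
# `features` maps a file name to its contents; `personas` is a list of names.
def persona_index(features, personas)
  index = Hash.new { |hash, key| hash[key] = [] }
  features.each do |file, text|
    scenario = nil
    text.each_line do |line|
      title = line[/^\s*Scenario:\s*(.+?)\s*$/, 1]
      if title
        scenario = title # a new scenario starts here
        next
      end
      next unless scenario # skip anything before the first scenario

      personas.each do |name|
        # Array union keeps each scenario listed once per persona.
        index[name] |= ["#{file}: #{scenario}"] if line.include?(name)
      end
    end
  end
  index
end

schedule_feature = <<~GHERKIN
  Scenario: John can change his own schedule
    Given John Doe is using our app
    When John attempts to modify his calendar
    Then the modification is successful

  Scenario: John cannot change Jane's schedule
    Given John Doe is using our app
    When John attempts to modify Jane Smith's schedule
    Then the modification is not successful
GHERKIN

index = persona_index({ 'schedule.feature' => schedule_feature },
                      ['John Doe', 'Jane Smith'])
puts index['Jane Smith']
```

The same index answers the "where is this persona used?" question when you want to retire one, and it could feed whatever generates the cross-referenced site.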