Tuesday, October 23, 2018

Testing Tour Stop #24: Pair Exploring Voice Experiences with Gem

At this stop of my testing tour I had the honor to pair test with Gem Hill. I got to know her as an active podcaster and enjoyed listening to several of the 100+ episodes of Let's Talk About Tests Baby. This year, I had the pleasure of meeting Gem in person for the first time at SwanseaCon. We attended each other's talks, and right after she heard me speak about my testing tour, she scheduled a session with me!
Gem shared that she had also experienced a pair testing session with Maaret Pyhäjärvi, just like I did on my very first tour stop. She even did a full podcast episode about it, which I highly recommend! Ever since meeting Gem, I had been looking forward to our pair testing session. With good reason, as it turned out: I learned a bunch.

Test Setup

Gem proposed to pair test on voice experiences. As we paired remotely, we agreed to use a simulator instead of a real device for testing. My experience with voice apps is very limited; it narrows down to a simple learning project during one of my company's hack days last year. Also, I am not a user of voice apps myself. Therefore, I was eagerly looking forward to our session, and I came across some interesting blog posts while preparing.
Gem kindly agreed to prepare a test setup. As she is currently working on voice apps at the BBC, she suggested tackling the BBC Alexa skill, which is basically a player for all BBC radio stations and podcasts. We used the Alexa Simulator and the latest released skill version, publicly available to everyone.

Challenges of Testing Voice Applications

At first, we checked the happy path: starting the app and playing a radio channel or podcast. Just by doing so, we discovered that the audio player functionality showed issues. Instead of playing the requested source, it triggered a warning that "AudioPlayer is currently an unsupported namespace". Interestingly, the warning message box was not even completely displayed on our screen. Well, we're not testing Amazon's services here. Checking later, I found that this is a known limitation of the simulator and that audio playback would work on a real device.
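
For context: when a skill plays audio, its JSON response includes an AudioPlayer directive telling the device what to stream, which is exactly the namespace the simulator rejects here. A minimal sketch of such a response, with a made-up stream URL and token:

```python
# Sketch of the JSON response a skill returns to start audio playback.
# Stream URL and token are made-up placeholders, not the BBC skill's.
play_response = {
    "version": "1.0",
    "response": {
        "shouldEndSession": True,
        "directives": [
            {
                "type": "AudioPlayer.Play",
                "playBehavior": "REPLACE_ALL",
                "audioItem": {
                    "stream": {
                        "url": "https://example.com/streams/radio-one",  # placeholder
                        "token": "radio-one",
                        "offsetInMilliseconds": 0,
                    }
                },
            }
        ],
    },
}
```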

Then we tried to change from one radio station to another and stumbled again. Asking to "switch" to another station opened a different skill, TuneIn, attempting to play the requested station there. Unexpected and not desired! We thought maybe native speakers would not say "switch" but rather "change"; however, this command was unknown. What about "go to"? Again, we switched skills. Interestingly, the feature to switch stations is indeed advertised in the skill description.

Let's try asking for the news, a use case we deemed quite common when thinking about radio stations. To our surprise, the player started the podcast "Breaking the News". It seems a synonym had been stored for "the news". Hm, what about just "news"? The skill told us that this radio station was unknown. We tried "weather" instead, another common piece of information you might expect from a radio station. To our surprise, the skill answered with "I can't do that". Strange; we thought that if the request was unknown, it would fall back to the "unknown radio station" case. We tried "cheese", which again was interpreted as an unknown radio station, just as expected. To test the simulator a bit, we tried a clearly misspelled input, "ra7dio 0n3", and this got recognized as an unknown podcast. But why? We tried several cases, using written words as well as voice input; however, we could not determine why the skill reacted in three different ways to unknown input values.
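
Synonyms like this usually live on slot values in the skill's interaction model. The BBC skill's actual model isn't public, so the following is only a guess at what such a mapping could look like, with invented names and IDs:

```python
# Sketch of a custom slot type with synonyms, as it could appear in an
# Alexa interaction model. All names, IDs and values are invented; the
# BBC skill's real interaction model is not public.
station_slot_type = {
    "name": "STATION_NAME",
    "values": [
        {
            "id": "breaking_the_news",
            "name": {
                "value": "Breaking the News",
                # a synonym like this would explain what we observed
                "synonyms": ["the news"],
            },
        },
        {
            "id": "radio_one",
            "name": {"value": "Radio 1", "synonyms": ["radio one"]},
        },
    ],
}
```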

We looked at the JSON input and output the simulator showed us. Everything looked good there. So we moved on and tried different languages, like asking for the Scottish Gaelic radio station "nan Gàidheal". The skill understood "nine gale" and asked "Should I play nine gale?" Yet another response we hadn't triggered before! We confirmed, and the skill answered that it couldn't find the station, including a long error message within the response. Interesting finding! However, it was again caused by the simulator and not a real-life example: we had used written input, and the API could not deal with the special character included, which simply would not happen when using voice input.
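
The JSON input the simulator shows is essentially the request envelope Alexa sends to the skill. A trimmed sketch of what that might have looked like for our request, with hypothetical intent and slot names:

```python
# Trimmed sketch of the request JSON the simulator displays. Intent and
# slot names are hypothetical, not taken from the actual skill.
intent_request = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "locale": "en-GB",
        "intent": {
            "name": "PlayStationIntent",  # hypothetical intent name
            "slots": {
                "station": {
                    "name": "station",
                    "value": "nine gale",  # what Alexa understood
                }
            },
        },
    },
}
```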

We tried a few more things, like seeing whether the station "WM 95.6" got recognized when pronouncing the "." as "point" as well as "dot", or when only using "WM". All worked. We tried similarly pronounced words and unclear mumbling to see whether they could be mapped to a station. There were many more options we did not try, like going through the customer reviews of the skill, which in themselves provide lots of useful input for testing.

Throughout our session, Gem shared bits and pieces of wisdom when it comes to testing voice apps.
  • The simulator and real devices behave very differently, so her team normally tests on real devices. Furthermore, the simulator is way slower than a real device; another reason to prefer the latter.
  • They do lots of API testing, checking the JSON output, to ensure the implemented functionality works as expected.
  • They learned how to test efficiently. Why? Well, when you explore without automation, you have to listen to the skill's welcome and introduction text again and again and again and again... This does not only take a lot of time, it also gets annoying very quickly. Very. Quickly. So exploratory testing without automation can indeed be very inefficient here, which is why they try to automate as much as possible at the API level (see the sketch after this list).
  • As the team matured, they started to think more and more about how to increase testability. Even with all automation in place, you still have to test with real devices, as it's the only way to get the real experience.
  • It is really hard to design voice experiences. For example, you have no idea whether you have a very experienced user who might get annoyed by having to listen to too much navigation, or a newbie who simply needs more instructions.
  • Always ask yourself: are you still testing the skill, or already the device? Testing the device does not make sense; we can't change it anyway. Going through the list of things users actually say versus what Alexa understands, however, does make sense, as it allows you to add those phrases as synonyms and make the skill more user-friendly.
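
I can only guess at what their API-level automation looks like, but the principle would be to post a request envelope directly to the skill endpoint and assert on the JSON response, skipping the spoken welcome text entirely. A rough sketch in Python, with a placeholder endpoint and hypothetical intent name (a real setup would also have to deal with Alexa's request signature verification):

```python
import requests

# Placeholder endpoint; a real Alexa skill endpoint also verifies the
# request signature, which a test environment would need to handle.
SKILL_ENDPOINT = "https://example.com/bbc-skill"


def build_intent_request(intent_name: str, slots: dict) -> dict:
    """Build a minimal Alexa-style request envelope for direct API tests."""
    return {
        "version": "1.0",
        "request": {
            "type": "IntentRequest",
            "locale": "en-GB",
            "intent": {
                "name": intent_name,
                "slots": {
                    name: {"name": name, "value": value}
                    for name, value in slots.items()
                },
            },
        },
    }


def test_unknown_station_fallback():
    # Post directly to the skill, skipping the spoken welcome text.
    body = build_intent_request("PlayStationIntent", {"station": "cheese"})
    response = requests.post(SKILL_ENDPOINT, json=body).json()
    # Assuming SSML output; the wording of the fallback is an assumption too.
    speech = response["response"]["outputSpeech"]["ssml"]
    assert "couldn't find" in speech
```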

Retro Time

We quickly found ourselves at the end of our 90-minute time box. Gem felt the session went very well: using Zoom to share screen control was amazing, she had lots of fun, and the time flew by. For another session, she would spend more time on the children's skill the BBC offers, as it has more branching narratives.

For me the session was awesome; Gem provided a perfect test setup for us to practice on. I agreed that a more complex skill would be interesting; however, it was also nice to start with such a "basic" skill as the radio player. It challenged us to come up with good test ideas, and we still found issues. I really appreciated Gem sharing her knowledge with me; I learned a lot in our session, and my respect for people designing and testing voice experiences grew even further. Gem shared that it's a real challenge and that they are learning something new every day. By the way, a fun fact I learned today: both of us normally listen to audiobooks when at home! It was a real pleasure testing with Gem, and I'm already looking forward to seeing her again in real life.

The end of October, and with it the end of my experiment, is getting closer. This was the second-to-last stop on my testing tour 2018. One more to go, so stay tuned!
