How we do automated mobile device testing at Mozilla – Part 1

Video of this presentation from Release Engineering work week in Portland, 29 April 2014

Part 1: Back to basics

What software do we produce for mobile phones?

  • Firefox for Android (Fennec)
  • Firefox OS (B2G)

What environments do we use for building and testing this software?

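|        | Building                                                          | Testing                  |
|--------|-------------------------------------------------------------------|--------------------------|
| Fennec | CentOS 6.2: (bld-linux64-ix-*) in-house, (bld-linux64-ec2-*) AWS  | Tegra / Panda / Emulator |
| B2G    | CentOS 6.2                                                        | Emulator                 |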

So first key point unveiled:

  • We don’t build on tegras and pandas (we only test!)

Second key point:

  • Fennec is the only product we test on tegras and pandas (we don’t test B2G on real devices)

So why do we test Fennec on tegras, pandas and emulators?

To answer this, first remember the wide variety of builds and tests we perform:

Screenshot from tbpl

The answer is:

  • We use tegras to test: Android 2.2 (Froyo)
  • We use pandas to test: Android 4.0 (Ice Cream Sandwich)
  • We use emulators to test: Android 2.3 (Gingerbread) and Android 4.2 (Jelly Bean)

Notice:

  • We don’t test on 3.x (Honeycomb)
  • We don’t test on 4.4 (KitKat)
  • The versions we test on emulators are not sequential (i.e. we test 2.3 and 4.2 on emulators, while 4.0, which sits between these two versions, is tested on pandas)

What are the main differences between our tegras and pandas?

| Tegras | Pandas |
|--------|--------|
| Look like this: [image: Tegra250_plugged_in] | Look like this: [image: 2012-08-06-10.23.28-768x1024] |
| Racked up like this: [image: blog_racks_in_faraday_cage] | Racked up like this: [image: 2012-11-09-08.30.50] |
| Older | Newer |
| Running Android 2.2 | Running Android 4.0 |
| Hanging in shoe racks | Racked professionally in Faraday cages |
| Can only be reimaged by physically connecting them to a laptop, and pressing buttons in a magical sequence | Can be remotely reimaged by mozpool (more to come later) |
| Not very reliable | Quite reliable |
| Is connected to a “PDU” which allows us to programmatically call an API to “pull the power” | Is connected to a “relay host” which allows us to programmatically call an API to “pull the power” |

So as you see, a panda is a more serious piece of kit than a tegra. Think of a tegra as a toy.

So what are tegras and pandas, actually?

Both are mobile device boards, as you see above, like you would get in a phone, but not actually in a phone.

So why don’t we just use real phones?

  1. Real phones use batteries
  2. Real phones have wireless network

Basically, by using the boards directly, we can:

  1. control the power supply (by connecting them to power units – PDUs) which we have API access to (i.e. we have an API to pull the power to a device)
  2. use ethernet, rather than wireless (ethernet is more reliable, there are no wireless signals to interfere with each other, less radiation, …)

OK, so we have phones (or “phone circuit boards”) wired up to our network – but how do we communicate with them?

Fennec historically ran on more platforms than just Android. It also ran on:

  • Windows Mobile
  • the Nokia N900 Maemo device

For this reason, it was decided to create a generic interface, which would be implemented on all supported platforms. The SUT Agent was born.

Please note: nowadays, Fennec is only available for Android 2.2+. It is not available for iOS (iPhone, iPad, iPod Touch), Windows Phone, Windows RT, Bada, Symbian, Blackberry OS, webOS or other operating systems for mobile.

Therefore, the original reason for creating a standard interface to all devices (the SUT Agent) no longer exists. It would also be possible to use a different mechanism (telnet, ssh, adb, …) to communicate with the device. However, this is not what we do.

So what is the SUT Agent, and what can it do?

The SUT Agent is a listener running on the tegra or panda, that can receive calls over its network interface, to tell it to perform tasks. You can think of it as something like an ssh daemon, in the sense that you can connect to it from a different machine, and issue commands.

How do you connect to it?

You simply telnet to the tegra or panda, on port 20700 or 20701.

Why two ports? Are they different?

Only marginally. The original idea was that users would connect on port 20701, and that automated systems would connect on port 20700. For this reason, if you connect on port 20700, you don’t get a prompt. If you connect on port 20701, you do. However, everything else is the same. You can issue commands to both listeners.
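
If you would rather script the connection than type into telnet, the idea can be sketched in a few lines of Python. This is only an illustration, not the code our automation uses. It assumes the agent ends each response on port 20701 with the `$>` prompt, that commands are terminated with `\r\n` as telnet would send them, and that a trailing NUL byte may follow the prompt. Those details, and the function names, are my assumptions.

```python
import socket

PROMPT = b"$>"            # prompt printed by the agent on port 20701
INTERACTIVE_PORT = 20701  # port intended for humans: prints a prompt
AUTOMATION_PORT = 20700   # port intended for automation: no prompt


def read_until_prompt(sock, prompt=PROMPT, bufsize=4096):
    """Read from sock until the data ends with the prompt or the peer closes."""
    data = b""
    # Assumption: a NUL byte may trail the prompt, so ignore it when matching.
    while not data.rstrip(b"\x00").endswith(prompt):
        chunk = sock.recv(bufsize)
        if not chunk:  # connection closed by the device
            break
        data += chunk
    return data


def strip_prompt(data, prompt=PROMPT):
    """Drop the trailing prompt and surrounding whitespace from a response."""
    text = data.rstrip(b"\x00")
    if text.endswith(prompt):
        text = text[:-len(prompt)]
    return text.decode("utf-8", errors="replace").strip()


def run_command(host, command, port=INTERACTIVE_PORT, timeout=30):
    """Connect, wait for the first prompt, send one command, return its output."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        read_until_prompt(sock)  # consume the initial prompt
        sock.sendall(command.encode("utf-8") + b"\r\n")
        return strip_prompt(read_until_prompt(sock))
```

For example, `run_command("panda-0149", "info os")` would ask that panda for its OS version. The sketch relies on the prompt to know when a response is complete, so it only makes sense against port 20701; on port 20700 there is no prompt to wait for.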

What commands does it support?

The most important command is “help”. It displays this output, showing all available commands:

pmoore@fred:~/git/tools/sut_tools master $ telnet panda-0149 20701
Trying 10.12.128.132...
Connected to panda-0149.p1.releng.scl1.mozilla.com.
Escape character is '^]'.
$>help
run [cmdline] - start program no wait
exec [env pairs] [cmdline] - start program no wait optionally pass env
 key=value pairs (comma separated)
execcwd <dir> [env pairs] [cmdline] - start program from specified directory
execsu [env pairs] [cmdline] - start program as privileged user
execcwdsu <dir> [env pairs] [cmdline] - start program from specified directory as privileged user
execext [su] [cwd=<dir>] [t=<timeout>] [env pairs] [cmdline] - start program with extended options
kill [program name] - kill program no path
killall - kill all processes started
ps - list of running processes
info - list of device info
 [os] - os version for device
 [id] - unique identifier for device
 [uptime] - uptime for device
 [uptimemillis] - uptime for device in milliseconds
 [sutuptimemillis] - uptime for SUT in milliseconds
 [systime] - current system time
 [screen] - width, height and bits per pixel for device
 [memory] - physical, free, available, storage memory
 for device
 [processes] - list of running processes see 'ps'
alrt [on/off] - start or stop sysalert behavior
disk [arg] - prints disk space info
cp file1 file2 - copy file1 to file2
time file - timestamp for file
hash file - generate hash for file
cd directory - change cwd
cat file - cat file
cwd - display cwd
mv file1 file2 - move file1 to file2
push filename - push file to device
rm file - delete file
rmdr directory - delete directory even if not empty
mkdr directory - create directory
dirw directory - tests whether the directory is writable
isdir directory - test whether the directory exists
chmod directory|file - change permissions of directory and contents (or file) to 777
stat processid - stat process
dead processid - print whether the process is alive or hung
mems - dump memory stats
ls - print directory
tmpd - print temp directory
ping [hostname/ipaddr] - ping a network device
unzp zipfile destdir - unzip the zipfile into the destination dir
zip zipfile src - zip the source file/dir into zipfile
rebt - reboot device
inst /path/filename.apk - install the referenced apk file
uninst packagename - uninstall the referenced package and reboot
uninstall packagename - uninstall the referenced package without a reboot
updt pkgname pkgfile - unpdate the referenced package
clok - the current device time expressed as the number of millisecs since epoch
settime date time - sets the device date and time
 (YYYY/MM/DD HH:MM:SS)
tzset timezone - sets the device timezone format is
 GMTxhh:mm x = +/- or a recognized Olsen string
tzget - returns the current timezone set on the device
rebt - reboot device
adb ip|usb - set adb to use tcp/ip on port 5555 or usb
activity - print package name of top (foreground) activity
quit - disconnect SUTAgent
exit - close SUTAgent
ver - SUTAgent version
help - you're reading it
$>quit
quit
$>Connection closed by foreign host.

Typically we use the SUT Agent to query the device, push Fennec and tests onto it, run tests, perform file system commands, execute system calls, and retrieve results and data from the device.
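
To make the `exec` syntax from the help output a little more concrete, here is a small Python sketch that assembles such a command line. The help text only says that the env pairs are `key=value` and comma separated; how the agent handles quoting or spaces is not something I have checked, so the helper (whose name is made up) simply refuses values containing commas or spaces. The paths in the example are hypothetical.

```python
def build_exec(cmdline, env=None):
    """Return an `exec` command line with optional comma-separated env pairs."""
    parts = ["exec"]
    if env:
        pairs = []
        for key, value in env.items():
            pair = f"{key}={value}"
            # Assumption: commas and spaces cannot be escaped inside a pair.
            if "," in pair or " " in pair:
                raise ValueError(f"unsupported character in env pair: {pair!r}")
            pairs.append(pair)
        parts.append(",".join(pairs))
    parts.append(cmdline)
    return " ".join(parts)


# Hypothetical paths, purely for illustration.
print(build_exec("/system/bin/ls /mnt/sdcard", {"TMPDIR": "/data/local/tmp"}))
# prints: exec TMPDIR=/data/local/tmp /system/bin/ls /mnt/sdcard
```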

What is the difference between quit and exit commands?

I’m glad you asked. “quit” will terminate the session. “exit” will shut down the SUT Agent. You really don’t want to do this. Be very careful.

Is the SUT Agent a daemon? If it dies, will it respawn?

No, it isn’t, but yes, it will!

The SUT Agent can die, and sometimes does. However, it has a daddy, who watches over it. The Watcher is a daemon, also running on the pandas and tegras, that monitors the SUT Agent. If the SUT Agent dies, the Watcher will spawn a new SUT Agent.
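
The pattern itself is simple: start a process, notice when it exits, start it again. The real Watcher runs on Android and is not written in Python, so the following is only a sketch of the idea; the function name and the `max_restarts` and `poll_interval` parameters are mine.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, poll_interval=1.0):
    """Start cmd and start it again whenever it exits.

    Runs forever when max_restarts is None; otherwise returns the total
    number of launches (the first start plus max_restarts restarts).
    """
    launches = 0
    while max_restarts is None or launches <= max_restarts:
        proc = subprocess.Popen(cmd)
        launches += 1
        # Wait for the child to die, checking at a fixed interval.
        while proc.poll() is None:
            time.sleep(poll_interval)
    return launches


if __name__ == "__main__":
    # A child that exits immediately stands in for a crashing agent.
    print(supervise([sys.executable, "-c", "pass"], max_restarts=2, poll_interval=0.1))
```

With `max_restarts=2` the child is launched three times in total: once initially, then twice more after it exits.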

It would probably be possible to have the SUT Agent as an auto-respawning daemon – I’m not sure why it isn’t this way.

Who created the Watcher?

Legend has it that the Watcher was created by Bob Moss.

Where is the source code for the SUT Agent and the Watcher?

The SUT Agent codebase lives in the Firefox desktop source tree: http://hg.mozilla.org/mozilla-central/file/tip/build/mobile/sutagent

The Watcher code lives there too: http://hg.mozilla.org/mozilla-central/file/tip/build/mobile/sutagent/android/watcher

Do the Watcher and SUT Agent get automatically deployed when there are new changes?

No. If there are changes, they need to be manually built (no continuous integration) and manually deployed to all tegras, and a new image needs to be created for pandas in mozpool (will be explained later).

Fortunately, there are very rarely changes to either component.

Summary part 1

So we’ve learned:

  • Tegras and Pandas are used for testing Fennec for Android
  • They run different versions of the Android OS (2.2 vs 4.0)
  • We don’t build anything on them
  • Tegras are older/inferior/less reliable than pandas
  • We can’t reimage tegras programmatically, but pandas we can
  • There is a SUT Agent that runs on both the tegras and the pandas, and provides a mechanism to interact with them
  • There is a Watcher that keeps the SUT Agent alive
  • Whenever a new version of SUT Agent or Watcher is required, this needs to be manually built and rolled out to devices
