Cross-Posted from Mozilla's Security Blog:
When fixing any bug, there is a risk of introducing new bugs, which we call regressions. Regressions caused by security fixes can be especially problematic because shipping a buggy security update can erode user trust for future updates.
Fortunately, we discover most regressions before we ship, thanks in large part to security researchers whose patience gives us time to review and test each patch well. But sometimes security releases are delayed slightly because we notice a regression as we are getting ready to release. Worse, we sometimes discover regressions after shipping a release.
This post explores patterns among regressions and suggests changes that could help us catch them earlier.Motivation
In six cases since Firefox 2, we decided regressions in minor updates were serious enough to rush a followup regression-fix update. For comparison, we shipped 38 other minor updates, mostly to fix security holes, since releasing Firefox 2 in October 2006.Regression Caused followup release Release date Several bugs Firefox 126.96.36.199 November 1, 2007 Bug 405584 – canvas.drawImage broken Firefox 188.8.131.52 November 30, 2007 Bug 425576 – topsite crash Firefox 184.108.40.206 April 16, 2008 Bug 454708 – non-ascii password issue Firefox 3.0.3 September 26, 2008 Bug 489322 – extension crash Firefox 3.0.10 April 27, 2009 Bug 535193 – proxy auth issue Firefox 3.0.17 and Firefox 3.5.7 January 5, 2010
It’s hard to see trends and patterns in a list of six bugs, so I looked at a larger set of bug reports: all bugs reports marked as regressions caused by security bug fixes. I focused on the 176 such bugs filed between December 2007 and January 2010.
Most of these regressions were fixed before release and many of them were minor. But understanding why they weren’t caught immediately can help prevent major incidents in the future.Overall regression frequency Period Regressions reported that were caused by security fixes 2008 first half 71 2008 second half 28 2009 first half 46 2009 second half 25
Regressions seem to be declining in frequency. This calls for cake!
I believe regressions are declining in frequency mostly because of improvements in automated testing. The number of automated tests increased fourfold since Firefox 3.0: every trunk checkin now results in 220,000 tests running on each platform. Regressions caught by automated tests tend to be fixed immediately, without even being filed in Bugzilla.
Further improvements to automating testing could reduce the number of regressions even more. I noticed four patterns in the bugs that suggest areas where automated testing could be improved: (1) web site breakage; (2) firefox extensions; (3) printing; and (4) document navigation.Web site breakage
About 48 of the 176 bug reports involved specific web sites. The sites ranged from private intranet sites to the most popular sites on the web. When simply loading a site is enough to reproduce the bug, it may be easy to detect it through automated testing.
Crashes, hangs, and leaks are easy to identify in automated testing. Carsten Book and Bob Clary have set up automated loading of top sites to find these bugs.
Visual problems are trickier to identify in automated testing, because no algorithm can reliably determine when a web site “looks wrong”. But determining whether a site’s appearance has changed can be done automatically. Since security fixes do not intentionally change the appearance of web sites, a tool to detect web site appearance changes could catch these regressions.
In addition to improving automated testing, we should remove support for some regression-prone, Mozilla-specific web features: signed scripts, enablePrivilege, XUL, and XBL. (Eight of the regressions affected sites using these features). These features have proven to be not especially useful to web developers, hard to keep stable, and hard to make secure. XUL and XBL, in particular, have together been responsible for over a hundred vulnerabilities.Firefox extensions
About 21 of the regressions involved Firefox extensions. One way we could detect these bugs is to run automated tests with extensions installed. We should run Firefox’s entire test suite with popular extensions, and we should run at least basic tests with every extension.
Some regressions affect a feature of an extension rather than a feature built into Firefox. To catch these, we would need to let extension developers provide tests for their extension’s features.
We can also let Firefox developers search the source code of Firefox extensions, so when developers change APIs, they can tell which extensions the changes are likely to affect. We’ve already started to do this.Printing
Five of the regressions involved printing. Printing bugs frequently make it to release users in part because most nightly users do not print web pages frequently.
Five of the regressions involve document navigation. I suspect document navigation is prone to security bugs, and prone to regressions when those security bugs are fixed, for three reasons:
First, it is a place where many web features interact. Loads can be triggered by users (back, reload, link clicks, bookmarks clicks, submitting a form) or scripts (setting location, location.replace, history.go). Loads trigger many events (unload, beforeunload, pagehide), and scripts running from those events can trigger additional navigation. Additional layers of complexity arise from restoration of form contents, frames, and in-page navigation between hashes.
Second, expectations are complicated. To prevent accidental or deliberate “traps”, the “Back” button has to skip over sequences of purely-script-triggered navigations, which are often difficult to distinguish from user-triggered navigations. At the same time, AJAX web applications have good reasons to want the “Back” button to only change the hash part of the URL rather than leave the page.
Third, document navigation semantics have evolved along with the web, rather than being designed in a holistic way. Browser developers tried to fix security problems and web compatibility problems as they were discovered. This resulted in messy expectations and even messier implementations.
I suggested to our resident exhaustive-testing expert that we try to create a specification and a corresponding set of tests for document navigation and related features. In addition, I wonder whether rewriting document navigation code could simplify it and make it less fragile.Nightly and beta testing
Firefox nightly build testers found more regressions than beta and release users combined. This is a testament to the power of the Mozilla community, which includes over ten thousand Firefox nightly testers and an incredible bug triage team.
But it’s also worrying: it suggests that in cases when a security patch can only have a few days of nightly testing, rather than several weeks, a major regression could easily slip through unnoticed. A patch might get only a few days of nightly testing if a developer feels that landing a fix effectively discloses a vulnerability, or when we’re rushing to fix a security hole that has already been disclosed.
Automated tests will never be able to catch every regression, so we want to help nightly and beta testers identify bugs as effectively as they can, whether a patch undergoes testing for weeks or days. David Boswell has been improving the nightly start page to highlight the best ways to look for bugs. Developers can now post temporary notes on this page to highlight features that need special testing.
We’re also making it easier to be a nightly tester. Automated tests now keep the worst regressions out of nightlies, making nightlies less prone to breaking. Automatic nightly updates now work for all localizations, and we’re planning to make nightly updates smoother.
To expand beta testing, we’re thinking of adding a checkbox to update preferences to let users choose to become beta testers. Beta testers are not always as active at reporting bugs as nightly users, but their numbers allow clear trends to appear in crash statistics.Raw data
Maybe you can spot patterns that I didn’t, or suggest other methods of automated testing that could catch these kinds of bugs earlier?
I focused on the symptoms and testcases of the regressions. Someone reading the patches might notice different patterns. Would more bugs have been caught by writing new custom static analyses, paying attention to new compiler warnings, or getting some testers to report assertion failures from running debug builds? Could some architectural change have prevented a class of regressions (or the security bugs whose fixes caused them)?
These links show the 176 regressions caused by security bug fixes between December 2007 and January 2010. The spreadsheet contains my guesses as to how we found each bug and how we could have found it earlier.
Thanks to Murali Nandigama for helping with Bugzilla queries. Dan Veditz, Melissa Shapiro, and Drew Ruderman provided valuable feedback on drafts of this post.
Security bug hunter