Brno Hat

Jiri Eischmann's Blog

How ABRT helped us make Fedora Workstation more stable

Last week, the official Fedora Project account asked users on social networks why Fedora is their distribution of choice. Probably the most frequent answer was that Fedora is THE GNOME distro, that it has the best supported GNOME, which really made me happy, but what made me even happier was that I found a lot of answers like “You won’t believe it, but I use Fedora for stability”. Indeed, the stability of Fedora has improved a lot since I started using it, especially in the last releases. How did we achieve it?

There are several reasons why Fedora is more stable than ever before. What plays an important role is that the significant changes have settled. GNOME 3 has matured, the wild beginnings of systemd are over, and Anaconda has stabilized a lot, too. Another reason is the Fedora QA team, which now has 10 people who test Fedora full time. This is something no other community can enjoy. If you add volunteers and the fact that the team uses more and more automated testing, you get a lot of test coverage. What I think has also helped is focus. We created three official editions – Workstation, Server, and Cloud – and defined what MUST be good (the three editions) and what CAN be good (everything else – spins, labs,…). We have also changed the strategy. Fedora is supposed to be progressive, but it doesn’t mean we need to force immature features on users. However, we also don’t want to be too conservative and become another Debian. I think we have found a good balance. The strategy is to have stable defaults and experimental features as opt-ins that are just a few clicks away for early adopters who would like to test them (this strategy was used for DNF, and now we’re using it for Wayland). This way, Fedora is stable enough for users who just want to use it, and still fun for those who like living on the edge of future technology.

However, today I’d like to focus on a different factor behind improved stability of Fedora – ABRT, which stands for Automatic Bug Reporting Tool. It’s a tool that helps users report software problems. One of the main problems in software development is to get reports that are detailed enough so that the problem can be identified and fixed. If the report states: “I clicked a button and the window disappeared”, it doesn’t help you find the problem and it most likely won’t get fixed. But if the user attaches a backtrace and a set of relevant logs, the chances go up sharply. That was the first milestone for ABRT – to collect all relevant data in the system and help the user report it.

But the result was a Bugzilla flooded with ABRT reports. Developers simply didn’t have the capacity to go through them and analyze them. They usually ended up filtering ABRT reports out. That was why ABRT went on to another milestone – to create statistics that would help maintainers identify which bugs affect a lot of users (and thus should be fixed) and which are just corner cases. And this finally made ABRT a very interesting aid for developers.

The statistics can be found on Retrace Server. They provide a lot of information. Not only can you find out how many crashes the bug is responsible for, which is the most important information for prioritization, but you can also learn which Fedora release and which architecture it occurs on. What is also very useful is that ABRT can group crashes together based on similarity. Then you can find out that, for instance, crashes in ten different components are caused by a bug in a single library these components are using. The number of reports in Bugzilla has decreased significantly, too, because ABRT started identifying duplicates and creating reports only when enough info is collected.

[Screenshot: stats of a problem.]
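
To make the idea of grouping by similarity more concrete, here is a minimal sketch in Python. It is not how ABRT or Retrace Server actually do it; I'm simply assuming that two crashes count as "similar" when the top few function names of their backtraces match. The field names, the function names in the sample data, and the `depth` parameter are all made up for illustration.

```python
import hashlib
from collections import Counter, defaultdict


def fingerprint(frames, depth=3):
    """Hash the top `depth` function names of a backtrace (assumed similarity rule)."""
    top = "|".join(frames[:depth])
    return hashlib.sha1(top.encode("utf-8")).hexdigest()[:12]


def group_crashes(reports, depth=3):
    """Group reports by fingerprint and return the groups, most frequent first."""
    groups = defaultdict(
        lambda: {"count": 0, "components": set(), "releases": Counter()}
    )
    for report in reports:
        group = groups[fingerprint(report["frames"], depth)]
        group["count"] += 1
        group["components"].add(report["component"])
        group["releases"][report["release"]] += 1
    return sorted(groups.values(), key=lambda g: g["count"], reverse=True)


# Made-up sample data: two components crash inside the same library call.
reports = [
    {"component": "gnome-shell", "release": "22",
     "frames": ["g_free", "lib_helper", "shell_main"]},
    {"component": "nautilus", "release": "22",
     "frames": ["g_free", "lib_helper", "nautilus_open"]},
    {"component": "gedit", "release": "21",
     "frames": ["gedit_save", "gtk_main"]},
]

ranked = group_crashes(reports, depth=2)
for group in ranked:
    print(group["count"], sorted(group["components"]), dict(group["releases"]))
```

With this toy rule, the crashes from the two different components end up in one group because they share the same top frames, which is the "single library behind crashes in several components" situation described above. Sorting by count is what gives maintainers a list to prioritize from.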

The desktop team started using ABRT roughly a year and a half ago. Developers are told to check the stats to see if their components pop up in the chart of most frequent crashes. I regularly check it, too. And if I find something my team is responsible for, I notify the responsible developer about it. But it’s been quite boring lately. If you check stats from stable releases, you won’t find desktop components so easily. And if you do find something from the desktop after all, it’s usually already marked as fixed.

But it was not always like this. Fedora is primarily a desktop distribution, so desktop components are heavily used and they were high on the list of most frequent crashes. But ABRT enabled us to prioritize and focus on the most frequent crashes. And you can see the difference in real-life usage. I rarely experience a crash in GNOME or default Workstation applications.

After a good experience with ABRT in GNOME, I also advised the KDE maintainers in my team to use it to prioritize. When they went through the list, they found Plasma crashes that had an origin lower in the stack (X11 or drivers), so not easily fixable for them, but they also found quite a few trivial one-liners which affected thousands of users. The ABRT stats are also used by some of our partners. I know Intel uses them to monitor problems in their video driver (btw, the kernel is associated with most of the frequent problems, but in this case, the problems are not crashes, but rather kernel oopses which users don’t even notice). CentOS started using ABRT, too. That’s helpful if you want to identify frequent crashes in RHEL because if it crashes in CentOS, it most likely crashes in RHEL as well.

ABRT is also useful for users. Not only can it collect relevant information about a crash for you and make it much easier to report it in Bugzilla, but if you don’t want to deal with any bug reporting, you can at least let it send microreports which build the statistics. By doing so, you let us know that a crash, which someone else may report in full, affects you, too. You can even go for silent microreporting which doesn’t disturb you at all. That’s what I turn on on computers of average users. They will never report a single problem themselves, but by sending microreports they still contribute to the quality of Fedora.
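
The difference between a full report and a microreport can be sketched like this. This is only an illustration under my own assumptions: I'm assuming a microreport carries just a few fields plus a hash of the backtrace, and the field names below are hypothetical, not ABRT's real microreport format.

```python
import hashlib
from collections import Counter

# Hypothetical field names chosen for this example only.
MICRO_FIELDS = ("component", "version", "release", "arch")


def to_microreport(crash):
    """Reduce a full crash record to a small record that is enough for counting."""
    micro = {key: crash[key] for key in MICRO_FIELDS}
    joined = "\n".join(crash["frames"])
    micro["backtrace_hash"] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return micro


def count_affected(microreports):
    """Count how many microreports arrived for each distinct backtrace."""
    return Counter(m["backtrace_hash"] for m in microreports)


# Made-up crash record; a full report would carry the logs along as well.
crash = {
    "component": "example-app",
    "version": "1.0",
    "release": "22",
    "arch": "x86_64",
    "frames": ["do_work", "main"],
    "logs": "lots of log lines...",
}

micro = to_microreport(crash)
counts = count_affected([micro, to_microreport(crash)])
print(sorted(micro))
print(counts[micro["backtrace_hash"]])
```

The point is that two users hitting the same crash produce the same hash, so the counter goes up even though neither of them filed a full report.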

I also use ABRT to report problems in software that is not part of Fedora repositories. ABRT collects info about a crash for me and I can pick what I need from it or send it to developers as a whole package.

ABRT has contributed significantly to the quality of Fedora, at least in the desktop part. Kudos to all who have worked on the project for that!
