You are not logged in.

#1 2017-11-08 19:16:31

cloverskull
Member
Registered: 2015-10-01
Posts: 307

Performance tuning the boot process

Hey guys,

I have a fairly robust computer and noticed that I had to wait around 30 to 40 seconds after pressing the power button until I had a login prompt in lightdm. This thread is me documenting my efforts to tune the startup/bootstrap performance.

I should mention that I'm using helium-dev, from a barebones Debian Stretch, which I built using the Bunsenlabs net-install method.

Here's what I did to try to find some performance bottlenecks.

First, I ran systemd-analyze

ryan@bunsenlabs:~$ sudo systemd-analyze 
Startup finished in 5.928s (kernel) + 32.715s (userspace) = 38.644s

So yeah, about 40 seconds from power button to GUI. Sloooooooow...

Let's see what we can blame this on with systemd-analyze blame

ryan@bunsenlabs:~$ sudo systemd-analyze blame
         22.962s NetworkManager-wait-online.service
          3.853s systemd-udev-settle.service
          3.159s keyboard-setup.service
          2.747s dev-mapper-debian\x2d\x2dvg\x2ddebian.device
          2.415s ModemManager.service
          1.602s NetworkManager.service
          1.555s loadcpufreq.service
          1.414s networking.service
          1.245s polkit.service
           965ms rsyslog.service
           908ms systemd-udevd.service
           881ms systemd-logind.service
           845ms lm-sensors.service
           796ms alsa-restore.service
           736ms systemd-tmpfiles-setup-dev.service
           628ms exim4.service
           579ms systemd-rfkill.service
           421ms systemd-modules-load.service
           384ms systemd-remount-fs.service
           366ms dev-mqueue.mount
           365ms systemd-random-seed.service
           355ms dev-hugepages.mount
           354ms sys-kernel-debug.mount
           347ms wpa_supplicant.service
           315ms systemd-backlight@backlight:acpi_video0.service
           280ms ntp.service
           277ms lightdm.service
           266ms upower.service
           251ms systemd-tmpfiles-setup.service
           242ms systemd-journal-flush.service
           240ms systemd-journald.service
           219ms kmod-static-nodes.service
           216ms lvm2-activation-early.service
           203ms systemd-fsck@dev-sda1.service
           173ms systemd-backlight@leds:dell::kbd_backlight.service
           115ms systemd-sysctl.service
           115ms user@109.service
            93ms dev-sda2.swap
            91ms hddtemp.service
            82ms console-setup.service
            78ms systemd-udev-trigger.service
            74ms lvm2-activation.service
            73ms boot.mount
            63ms cpufrequtils.service
            52ms lvm2-monitor.service
            36ms systemd-update-utmp.service
            27ms user@1000.service
            21ms systemd-user-sessions.service
             5ms systemd-update-utmp-runlevel.service
             2ms sys-fs-fuse-connections.mount

So, it looks like NetworkManager-wait-online.service is taking a whopping ~23 seconds(?!) to start. Uh, I don't actually know if this is necessary considering I manage my network connections once logged in via nm-applet. Let's investigate further. Are things "blocking" on NetworkManager-wait-online.service? We can check what's critical with systemd-analyze critical-chain

ryan@bunsenlabs:~$ sudo systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @32.709s
└─multi-user.target @32.709s
  └─exim4.service @32.080s +628ms
    └─network-online.target @32.077s
      └─NetworkManager-wait-online.service @9.113s +22.962s
        └─NetworkManager.service @7.510s +1.602s
          └─dbus.service @6.821s
            └─basic.target @6.545s
              └─sockets.target @6.465s
                └─dbus.socket @6.465s
                  └─sysinit.target @6.465s
                    └─sys-fs-fuse-connections.mount @28.651s +2ms
                      └─systemd-modules-load.service @1.182s +421ms
                        └─system.slice @1.172s
                          └─-.slice @1.049s

So yeah, we can see that processes are waiting for this to complete. We can further visualize this in a plot with systemd-analyze plot

ryan@bunsenlabs:~$ sudo systemd-analyze plot > plot.svg

And here is mine

I'm curious to see the current status of NetworkManager-wait-online.service

ryan@bunsenlabs:~$ sudo systemctl status NetworkManager-wait-online.service
● NetworkManager-wait-online.service - Network Manager Wait Online
   Loaded: loaded (/lib/systemd/system/NetworkManager-wait-online.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2017-11-08 10:39:10 PST; 9min ago
     Docs: man:nm-online(1)
  Process: 638 ExecStart=/usr/bin/nm-online -s -q --timeout=30 (code=exited, status=0/SUCCESS)
 Main PID: 638 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/NetworkManager-wait-online.service

Nov 08 10:38:48 bunsenlabs systemd[1]: Starting Network Manager Wait Online...
Nov 08 10:39:10 bunsenlabs systemd[1]: Started Network Manager Wait Online.

So it looks like a fire-and-forget application to ask NetworkManager whether the network is connected. This doesn't seem super important right now, during the boot process. I'd prefer to handle this in the GUI, manually, via nm-applet. For reference, here's further reading on nm-online. I'm going to disable this in systemd and see what happens.

ryan@bunsenlabs:~$ sudo systemctl disable NetworkManager-wait-online.service
Removed /etc/systemd/system/network-online.target.wants/NetworkManager-wait-online.service.

Ok, booting felt a lot faster. I'll just paste the entirety of what I did previously in one codeblock, but understand this is for everything now that is post NetworkManager-wait-online.service removal.

ryan@bunsenlabs:~$ sudo systemd-analyze
Startup finished in 5.885s (kernel) + 10.300s (userspace) = 16.186s

ryan@bunsenlabs:~$ sudo systemd-analyze blame
          4.147s systemd-udev-settle.service
          3.468s dev-mapper-debian\x2d\x2dvg\x2ddebian.device
          3.309s keyboard-setup.service
          1.950s ModemManager.service
          1.878s NetworkManager.service
          1.452s loadcpufreq.service
          1.284s exim4.service
          1.222s polkit.service
           994ms networking.service
           874ms rsyslog.service
           833ms systemd-tmpfiles-setup-dev.service
           787ms systemd-udevd.service
           763ms lm-sensors.service
           748ms lightdm.service
           729ms systemd-logind.service
           648ms alsa-restore.service
           559ms wpa_supplicant.service
           538ms upower.service
           479ms systemd-modules-load.service
           415ms ntp.service
           232ms systemd-rfkill.service
           230ms systemd-backlight@leds:dell::kbd_backlight.service
           230ms systemd-backlight@backlight:acpi_video0.service
           226ms systemd-random-seed.service
           217ms lvm2-activation-early.service
           210ms systemd-remount-fs.service
           194ms systemd-fsck@dev-sda1.service
           176ms dev-mqueue.mount
           176ms dev-hugepages.mount
           175ms sys-kernel-debug.mount
           173ms kmod-static-nodes.service
           164ms dev-sda2.swap
           159ms systemd-tmpfiles-setup.service
           151ms hddtemp.service
           128ms systemd-journald.service
           121ms systemd-journal-flush.service
           117ms udisks2.service
           115ms user@109.service
           109ms systemd-update-utmp.service
           102ms console-setup.service
            74ms systemd-sysctl.service
            72ms lvm2-activation.service
            65ms boot.mount
            59ms systemd-udev-trigger.service
            56ms cpufrequtils.service
            46ms lvm2-monitor.service
            24ms user@1000.service
            23ms systemd-user-sessions.service
             8ms systemd-update-utmp-runlevel.service
             3ms sys-fs-fuse-connections.mount

ryan@bunsenlabs:~$ sudo systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @10.291s
└─multi-user.target @10.291s
  └─exim4.service @9.006s +1.284s
    └─network-online.target @9.005s
      └─network.target @9.005s
        └─NetworkManager.service @7.126s +1.878s
          └─dbus.service @6.633s
            └─basic.target @6.594s
              └─sockets.target @6.594s
                └─dbus.socket @6.594s
                  └─sysinit.target @6.553s
                    └─systemd-update-utmp.service @6.443s +109ms
                      └─systemd-tmpfiles-setup.service @6.283s +159ms
                        └─local-fs.target @6.281s
                          └─boot.mount @6.215s +65ms
                            └─systemd-fsck@dev-sda1.service @5.997s +194ms
                              └─local-fs-pre.target @5.996s
                                └─lvm2-monitor.service @5.949s +46ms
                                  └─dm-event.socket @1.192s
                                    └─-.mount @1.016s
                                      └─system.slice @1.148s
                                        └─-.slice @1.016s

ryan@bunsenlabs:~$ sudo systemd-analyze plot > plot-new.svg

ryan@bunsenlabs:~$ sudo systemctl status NetworkManager-wait-online.service
● NetworkManager-wait-online.service - Network Manager Wait Online
   Loaded: loaded (/lib/systemd/system/NetworkManager-wait-online.service; disabled; vendor preset: 
   Active: inactive (dead)
     Docs: man:nm-online(1)

So you can see we've cut our bootup time by 2/3. That's pretty awesome. And here's my new plot.

Removing NetworkManager-wait-online.service didn't adversely impact anything at all. Is there a reason we have it enabled?

Also, any ideas where I can go from here to further performance tune the boot process?

Offline

#2 2017-11-08 19:44:05

cloverskull
Member
Registered: 2015-10-01
Posts: 307

Re: Performance tuning the boot process

Was able to boost things a bit further...by maybe a second or so. smile

Basically, this laptop has two spinny disks configured via LVM to be one huge volume. However, by default (for power saving reasons) linux was only activating one disk at a time during bootup. This is called "staggered spin-up". You can check if you have this by running

$ sudo dmesg | grep SSS

For example, here's what I had

ryan@bunsenlabs:~$ sudo dmesg | grep SSS
[    1.820793] ahci 0000:00:1f.2: SSS flag set, parallel bus scan disabled

So it wasn't scanning the disks in parallel, it was doing them one at a time. Wasting an entire valuable second of my life!

This is super simple to switch off. We just need to modify our kernel line in grub.

Warning: You may want to add this manually as a boot parameter when you boot your computer, via grub, to test that it functions properly.

To modify the kernel line in grub permanently, just

$ sudo nano /etc/default/grub

Look for GRUB_CMDLINE_LINUX_DEFAULT and append libahci.ignore_sss=1

As an example, here's that line (plus a few surrounding lines) in my /etc/default/grub

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet libahci.ignore_sss=1"
GRUB_CMDLINE_LINUX=""

After that, we just need to use it to generate a fresh /boot/grub/grub.cfg, which can be done like so

ryan@bunsenlabs:~$ sudo grub-mkconfig -o /boot/grub/grub.cfg
Generating grub configuration file ...
Found background image: /usr/share/images/desktop-base/desktop-grub.png
Found linux image: /boot/vmlinuz-4.9.0-4-amd64
Found initrd image: /boot/initrd.img-4.9.0-4-amd64
grub background_image is BL default, setting text colors
done

And, of course (for the sake of comparison) here is all of my systemd bootup perf information, same format as previous post

ryan@bunsenlabs:~$ sudo systemd-analyze
[sudo] password for ryan: 
Startup finished in 4.358s (kernel) + 10.433s (userspace) = 14.792s

ryan@bunsenlabs:~$ sudo systemd-analyze blame
          4.206s keyboard-setup.service
          4.032s systemd-udev-settle.service
          2.702s dev-mapper-debian\x2d\x2dvg\x2ddebian.device
          2.496s ModemManager.service
          1.991s NetworkManager.service
          1.145s exim4.service
           992ms loadcpufreq.service
           905ms systemd-rfkill.service
           890ms networking.service
           884ms systemd-logind.service
           830ms systemd-tmpfiles-setup-dev.service
           743ms polkit.service
           731ms lm-sensors.service
           616ms systemd-udevd.service
           583ms upower.service
           566ms systemd-backlight@backlight:acpi_video0.service
           534ms systemd-modules-load.service
           480ms ntp.service
           399ms lightdm.service
           369ms rsyslog.service
           332ms systemd-backlight@leds:dell::kbd_backlight.service
           331ms wpa_supplicant.service
           330ms dev-hugepages.mount
           295ms systemd-tmpfiles-setup.service
           295ms dev-mqueue.mount
           288ms systemd-remount-fs.service
           270ms sys-kernel-debug.mount
           241ms lvm2-activation-early.service
           232ms dev-sda2.swap
           219ms systemd-fsck@dev-sda1.service
           218ms alsa-restore.service
           175ms systemd-udev-trigger.service
           159ms systemd-random-seed.service
           147ms systemd-journal-flush.service
           130ms kmod-static-nodes.service
           113ms systemd-journald.service
            98ms boot.mount
            83ms user@109.service
            73ms hddtemp.service
            72ms lvm2-activation.service
            68ms systemd-update-utmp.service
            65ms cpufrequtils.service
            55ms lvm2-monitor.service
            27ms user@1000.service
            27ms console-setup.service
            25ms systemd-sysctl.service
            17ms systemd-user-sessions.service
             9ms systemd-update-utmp-runlevel.service
             3ms sys-fs-fuse-connections.mount

ryan@bunsenlabs:~$ sudo systemd-analyze critical-chain 
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @10.423s
└─multi-user.target @10.423s
  └─exim4.service @9.277s +1.145s
    └─network-online.target @9.276s
      └─network.target @9.275s
        └─NetworkManager.service @7.282s +1.991s
          └─dbus.service @6.804s
            └─basic.target @6.787s
              └─sockets.target @6.787s
                └─uuidd.socket @6.787s
                  └─sysinit.target @6.786s
                    └─systemd-update-utmp.service @6.715s +68ms
                      └─systemd-tmpfiles-setup.service @6.418s +295ms
                        └─local-fs.target @6.413s
                          └─boot.mount @6.314s +98ms
                            └─systemd-fsck@dev-sda1.service @6.071s +219ms
                              └─local-fs-pre.target @6.070s
                                └─lvm2-monitor.service @6.014s +55ms
                                  └─lvm2-lvmetad.socket @1.191s
                                    └─-.mount @1.000s
                                      └─system.slice @1.131s
                                        └─-.slice @1.000s

ryan@bunsenlabs:~$ sudo systemd-analyze plot > plot-newest.svg 

And me new plot is here.

Offline

Board footer

Powered by FluxBB