Each process has three different priorities stored in struct proc: p_usrpri, p_nativepri, and p_priority.
The p_usrpri member is the user priority of the process, calculated from the process's estimated CPU time
and nice level.
The p_nativepri member is the saved priority used by propagate_priority(). When a process obtains a
mutex, its priority is saved in p_nativepri. While it holds the mutex, the process's priority may be
bumped by another process that blocks on the mutex. When the process releases the mutex, its
priority is restored to the priority saved in p_nativepri.
The p_priority member is the actual priority of the process and is used to determine what runqueue(9) it
runs on, for example.
The curpriority_cmp() function compares the cached priority of the currently running process with process
p. If the currently running process has a higher priority, then it will return a value less than zero.
If the current process has a lower priority, then it will return a value greater than zero. If the
current process has the same priority as p, then curpriority_cmp() will return zero. The cached priority
of the currently running process is updated when a process resumes from tsleep(9) or returns to userland
in userret() and is stored in the private variable curpriority.
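The return convention above can be sketched as follows. This is only an illustration, not the kernel's code: struct cmp_proc is a hypothetical stand-in for struct proc, and the sketch assumes that a numerically lower value means a higher priority.

```c
/* Hypothetical stand-in for struct proc; only the field used here. */
struct cmp_proc {
	int	p_priority;	/* assumed: lower value = higher priority */
};

/* Stand-in for the private cached priority of the running process. */
int curpriority;

/*
 * Returns < 0 if the running process has the higher priority,
 * > 0 if it has the lower priority, and 0 if the two are equal.
 */
int
curpriority_cmp_sketch(const struct cmp_proc *p)
{
	return (curpriority - p->p_priority);
}
```

With curpriority set to 50 and p->p_priority set to 100, the sketch returns a negative value, because under the stated assumption the running process is the more urgent of the two.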
The maybe_resched() function compares the priorities of the current thread and td. If td has a higher
priority than the current thread, then a context switch is needed, and KEF_NEEDRESCHED is set.
The propagate_priority() function looks at the process that owns the mutex p is blocked on. That process's
priority is bumped to the priority of p if needed. If the process is currently running, then the
function returns. If the process is on a runqueue(9), then the process is moved to the appropriate
runqueue(9) for its new priority. If the process is blocked on a mutex, its position in the list of
processes blocked on the mutex in question is updated to reflect its new priority. Then, the function
repeats the procedure using the process that owns the mutex just encountered. Note that a process's
priorities are only bumped to the priority of the original process p, not to the priority of the
previously encountered process.
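The chain walk described above might look like the following sketch. The pp_proc and pp_mtx structures and their fields are hypothetical simplifications, queue manipulation is reduced to a comment, and a numerically lower value is assumed to mean a higher priority.

```c
#include <stddef.h>

enum pp_state { PP_RUNNING, PP_ON_RUNQ, PP_BLOCKED };

struct pp_mtx;

/* Hypothetical, reduced stand-in for struct proc. */
struct pp_proc {
	int		 p_priority;	/* assumed: lower = higher priority */
	enum pp_state	 p_stat;
	struct pp_mtx	*p_blocked;	/* mutex being waited on, or NULL */
};

/* Hypothetical, reduced stand-in for a mutex. */
struct pp_mtx {
	struct pp_proc	*mtx_owner;	/* current holder, or NULL */
};

void
propagate_priority_sketch(struct pp_proc *p)
{
	int pri = p->p_priority;	/* always the original p's priority */
	struct pp_mtx *m = p->p_blocked;

	while (m != NULL && m->mtx_owner != NULL) {
		struct pp_proc *owner = m->mtx_owner;

		if (owner->p_priority <= pri)
			return;		/* owner is already urgent enough */
		owner->p_priority = pri;
		if (owner->p_stat == PP_RUNNING)
			return;		/* nothing to requeue */
		/* A real kernel would requeue the owner here. */
		m = owner->p_blocked;	/* NULL unless the owner is blocked */
	}
}
```

Because pri is captured once, every owner along the chain is raised to the priority of the original process and never to that of an intermediate owner.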
The resetpriority() function recomputes the user priority of the ksegrp kg (stored in kg_user_pri) and
calls maybe_resched() to force a reschedule of each thread in the group if needed.
The roundrobin() function is used as a timeout(9) function to force a reschedule every sched_quantum
ticks.
The roundrobin_interval() function simply returns the number of clock ticks in between reschedules
triggered by roundrobin(). Thus, all it does is return the current value of sched_quantum.
The sched_setup() function is a SYSINIT(9) that is called to start the callout-driven scheduler
functions. It just calls the roundrobin() and schedcpu() functions for the first time. After the
initial call, the two functions will propagate themselves by registering their callout event again at the
completion of the respective function.
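The self-rearming pattern used by roundrobin() can be illustrated with a toy timer. Everything with a _sketch suffix below is hypothetical: the single-slot callout, the clock_tick_sketch() driver, and the quantum of 10 ticks are assumptions made for the example and do not reflect the timeout(9) interface.

```c
#include <stddef.h>

typedef void (*callout_fn_sketch)(void *);

/* Toy one-slot callout; a real kernel keeps a queue of these. */
struct callout_sketch {
	callout_fn_sketch fn;
	void	*arg;
	int	 ticks_left;
	int	 armed;
};

struct callout_sketch rr_callout_sketch;
int sched_quantum_sketch = 10;	/* assumed value, in clock ticks */
int resched_requests_sketch;	/* counts forced reschedules */

void
roundrobin_sketch(void *arg)
{
	(void)arg;
	resched_requests_sketch++;	/* stands in for forcing a reschedule */

	/* Register the callout again so the function keeps firing. */
	rr_callout_sketch.fn = roundrobin_sketch;
	rr_callout_sketch.arg = NULL;
	rr_callout_sketch.ticks_left = sched_quantum_sketch;
	rr_callout_sketch.armed = 1;
}

int
roundrobin_interval_sketch(void)
{
	return (sched_quantum_sketch);
}

/* The first call is made directly; later calls come from the timer. */
void
sched_setup_sketch(void)
{
	roundrobin_sketch(NULL);
}

/* Toy clock interrupt: fire the callout when its countdown expires. */
void
clock_tick_sketch(void)
{
	if (rr_callout_sketch.armed && --rr_callout_sketch.ticks_left == 0) {
		rr_callout_sketch.armed = 0;
		rr_callout_sketch.fn(rr_callout_sketch.arg);
	}
}
```

After sched_setup_sketch() and 25 calls to clock_tick_sketch(), the counter reads 3: one from the initial direct call and one each at ticks 10 and 20.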
The schedclock() function is called by statclock() to adjust the priority of the currently running
thread's ksegrp. It updates the group's estimated CPU time and then adjusts the priority via
resetpriority().
The schedcpu() function updates all process priorities. First, it updates statistics that track how long
processes have been in various process states. Secondly, it updates the estimated CPU time for each
process such that about 90% of the CPU usage is forgotten in 5 * load average seconds. For
example, if the load average is 2.00, then at least 90% of the estimated CPU time for the process should
be based on the amount of CPU time the process has had in the last 10 seconds. It then recomputes the
priority of the process and moves it to the appropriate runqueue(9) if necessary. Thirdly, it updates
the %CPU estimate used by utilities such as ps(1) and top(1) so that 95% of the CPU usage is forgotten in
60 seconds. Once all process priorities have been updated, schedcpu() calls vmmeter() to update various
other statistics including the load average. Finally, it schedules itself to run again in hz clock
ticks.
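The text above gives the decay targets but not the filter itself. The following sketch assumes the classic 4.4BSD form, in which the estimated CPU time is multiplied by (2 * load) / (2 * load + 1) once per second and the %CPU estimate is multiplied by exp(-1/20) once per second. Floating point is used for readability; the function names are hypothetical.

```c
/* Assumed per-second %CPU decay factor, exp(-1/20). */
#define	CCPU_SKETCH	0.95122942450071400909

/* One schedcpu() pass over the estimated CPU time (assumed filter). */
double
decay_estcpu_sketch(double estcpu, double loadavg)
{
	double loadfac = 2.0 * loadavg;

	return (estcpu * loadfac / (loadfac + 1.0));
}

/* One schedcpu() pass over the %CPU estimate (assumed filter). */
double
decay_pctcpu_sketch(double pctcpu)
{
	return (pctcpu * CCPU_SKETCH);
}

/* Fraction of an initial estcpu of 1.0 left after the given passes. */
double
estcpu_remaining_sketch(double loadavg, int seconds)
{
	double v = 1.0;
	int i;

	for (i = 0; i < seconds; i++)
		v = decay_estcpu_sketch(v, loadavg);
	return (v);
}

/* Fraction of an initial %CPU of 1.0 left after the given passes. */
double
pctcpu_remaining_sketch(int seconds)
{
	double v = 1.0;
	int i;

	for (i = 0; i < seconds; i++)
		v = decay_pctcpu_sketch(v);
	return (v);
}
```

Under these assumptions, a load average of 2.00 gives a per-second factor of 0.8, so roughly 11% of the old estimate remains after 10 seconds, which matches the "about 90%" figure. Sixty passes of the %CPU filter leave roughly 5%.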
The setrunnable() function is used to change a process's state to be runnable. The process is placed on
a runqueue(9) if needed, and the swapper process is woken up and told to swap the process in if the
process is swapped out. If the process has been asleep for at least one run of schedcpu(), then
updatepri() is used to adjust the priority of the process.
The updatepri() function is used to adjust the priority of a process that has been asleep. It
retroactively decays the estimated CPU time of the process for each schedcpu() event that the process was
asleep. Finally, it calls resetpriority() to adjust the priority of the process.
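A minimal sketch of the retroactive decay follows. It assumes the same 4.4BSD-style integer filter, estcpu * loadfac / (loadfac + 1) with loadfac equal to twice the load average, and uses a hypothetical up_proc structure whose up_slptime field counts the schedcpu() runs missed while asleep. The resetpriority() step is reduced to a stub that only records that it was called.

```c
/* Hypothetical, reduced stand-in for struct proc. */
struct up_proc {
	unsigned int	up_estcpu;	/* estimated CPU time */
	unsigned int	up_slptime;	/* schedcpu() runs missed while asleep */
	int		up_pri_reset;	/* set when the priority is recomputed */
};

/* Stub for resetpriority(); the real computation is not shown. */
void
resetpriority_stub(struct up_proc *p)
{
	p->up_pri_reset = 1;
}

void
updatepri_sketch(struct up_proc *p, unsigned int loadfac)
{
	unsigned int i;

	/* Apply one decay step for every missed schedcpu() event. */
	for (i = 0; i < p->up_slptime && p->up_estcpu != 0; i++)
		p->up_estcpu = p->up_estcpu * loadfac / (loadfac + 1);

	resetpriority_stub(p);
}
```

For an estimate of 100, three missed runs, and a loadfac of 4, the integer steps give 80, 64, and then 51.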