织梦CMS - 轻松建站从此开始!

罗索

Linux Debugging 1 - Kernel Introduction - 行风 - 博客

落鹤生 发布于 2014-12-12 18:01 点击:次 
kernel introduction, scheduler, CFS
TAG: Debugging  

1. linux kernel version

  1. /home/a/j/nomad2:uname -r 
  2. 2.6.24-22-generic 

Major.Minor.Release[extra]

Minor: Even minor numbers denote stable "production" kernel, whereas odd ones are reserved for development kernels.

API changes in the 2.6 kernel series refer to http://lwn.net/Articles/2.6-kernel-api/

解压kernel代码:

  1. >>gzip -cd linux-2.6.26.tar.gz | tar xvf - 
  2. >>tar zxvf linux-2.6.26.tar.gz 

打补丁命令:

  1. >>gzip -cd patch.gz | patch -p0 
  2. >>bzip2 -cd patch.gz | patch -p0 

2. Linux Kernel的Feature

1) Implement generic UNIX kernel structure

2) Monolithic, thread kernel

3) user threads are LWP

4) kernel is preemptive (in 2.6)

5) small footprint

6) Modular and Extensible

3. 2.6的内核实现了抢占调度,抢占点在于

- On return from Interrupt handlers

- On blocking Kernel calls

- when the Kernel preemption is reenabled (call preempt_enable)

- when a kernel thread voluntarily yields by calling schedule.

4. Schedulers

refer to : http://lwn.net/Articles/230574/

1) in 2.4 version, scheduler is O(n), while in 2.6 it is O(1), which is more efficient and provide natural CPU affinity.  It has one runqueue per processor(每个CPU都有自己的runqueue), and each runqueue contains 2 priority arrays. And it use bitmap to find the highest priority task(进程或者线程) to run.

We can check the codes from LXR.

RT task priority is from 0 to 100, and normal task priority is from 101 to 139 (from sched.h)

  1. 510/* 
  2. 511 * Priority of a process goes from 0..MAX_PRIO-1, valid RT 
  3. 512 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH 
  4. 513 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority 
  5. 514 * values are inverted: lower p->prio value means higher priority. 
  6. 515 * 
  7. 516 * The MAX_USER_RT_PRIO value allows the actual maximum 
  8. 517 * RT priority to be separate from the value exported to 
  9. 518 * user-space.  This allows kernel threads to set their 
  10. 519 * priority to a value higher than any user task. Note: 
  11. 520 * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO. 
  12. 521 */ 
  13. 522 
  14. 523#define MAX_USER_RT_PRIO        100 
  15. 524#define MAX_RT_PRIO             MAX_USER_RT_PRIO 
  16. 525 
  17. 526#define MAX_PRIO                (MAX_RT_PRIO + 40) 

调度算法(from sched.c)

  1. 2198new_array: 
  2. 2199        /* Start searching at priority 0: */ 
  3. 2200        idx = 0; 
  4. 2201skip_bitmap: 
  5. 2202        if (!idx) 
  6. 2203                idx = sched_find_first_bit(array->bitmap); 
  7. 2204        else 
  8. 2205                idx = find_next_bit(array->bitmap, MAX_PRIO, idx); 
  9. 2206        if (idx >= MAX_PRIO) { 
  10. 2207                if (array == busiest->expired && busiest->active->nr_active) { 
  11. 2208                        array = busiest->active; 
  12. 2209                        dst_array = this_rq->active; 
  13. 2210                        goto new_array; 
  14. 2211                } 
  15. 2212                goto out; 
  16. 2213        } 
  17. 2214 
  18. 2215        head = array->queue + idx; 
  19. 2216        curr = head->prev; 
  20. ... 

看一下sched_find_first_bit的实现,常数时间即可选择出最小的不为0的bit,所以为O(1)调度算法。

  1.  7/* 
  2.  8 * Every architecture must define this function. It's the fastest 
  3.  9 * way of searching a 140-bit bitmap where the first 100 bits are 
  4. 10 * unlikely to be set. It's guaranteed that at least one of the 140 
  5. 11 * bits is cleared. 
  6. 12 */ 
  7. 13static inline int sched_find_first_bit(const unsigned long *b) 
  8. 14{ 
  9. 15#if BITS_PER_LONG == 64 
  10. 16        if (unlikely(b[0])) 
  11. 17                return __ffs(b[0]); 
  12. 18        if (likely(b[1])) 
  13. 19                return __ffs(b[1]) + 64; 
  14. 20        return __ffs(b[2]) + 128; 
  15. 21#elif BITS_PER_LONG == 32 
  16. 22        if (unlikely(b[0])) 
  17. 23                return __ffs(b[0]); 
  18. 24        if (unlikely(b[1])) 
  19. 25                return __ffs(b[1]) + 32; 
  20. 26        if (unlikely(b[2])) 
  21. 27                return __ffs(b[2]) + 64; 
  22. 28        if (b[3]) 
  23. 29                return __ffs(b[3]) + 96; 
  24. 30        return __ffs(b[4]) + 128; 
  25. 31#else 
  26. 32#error BITS_PER_LONG not defined 
  27. 33#endif 
  28. 34} 

问题:高优先级的任务总是先被调度,低优先级的任务无法被调度,starvation.

2) In 2.6.23 release, a CFS (Completely Fair Scheduler) is added, see this from wiki (http://en.wikipedia.org/wiki/Completely_Fair_Scheduler), and we can get more info from http://kerneltrap.org/node/8059

In contrast to the previous O(1) scheduler used in older Linux 2.6 kernels, the CFS scheduler implementation is not based on run queues. Instead, a red-black tree implements a "timeline" of future task execution. Additionally, the scheduler uses nanosecond granularity accounting, the atomic units by which an individual process' share of the CPU was allocated (thus making redundant the previous notion of timeslices).
The fair queuing CFS scheduler has a scheduling complexity of O(log N), where N is the number of tasks in the runqueue. Choosing a task can be done in constant time, but reinserting a task after it has run requires O(log N) operations, because the runqueue is implemented as a red-black tree.

High priority tasks still get larger slices, but tasks are ordered by "vruntime", which means schedule cpu-starved threads first, low priority threads get their smaller slice but at least execute.

3) IO scheduler

2.6 kernel has 4 I/O schedulers, No-op I/O scheduler, Anticipatory I/O scheduler(default), Deadline I/O scheduler与CFQ I/O scheduler. In 2.4, use the "Linus Elevator" (No-op I/O scheduler).问题主要是“请求饥饿”。

  1. /home/a/j/nomad2:pwd 
  2. /sys/block/sda/queue 
  3. /home/a/j/nomad2:ls 
  4. iosched  max_hw_sectors_kb  max_sectors_kb  nr_requests  read_ahead_kb  scheduler 
  5. /home/a/j/nomad2:cat scheduler 
  6. noop anticipatory deadline [cfq]  

5. Kernel Architecture

old system call is using "int $0x80", and 2.6 use "sysenter" and the syscall number is in EAX. see syscall_table.S

6. Devices

分类:字符设备(例如:内存,终端,IO ports,键盘,鼠标等), 块设备 (buffered) device(硬盘,闪存等).

例如: mknod /dev/testdev c 254 0创建一个字符设备,major号(the number of the device registered in the kernel)是254,minor号(the number of device registered in the driver)是0.

  1. /home/a/j/nomad2:pwd 
  2. /proc 
  3. /home/a/j/nomad2:cat devices 
  4. Character devices: 
  5.   1 mem 
  6.   2 pty 
  7.   3 ttyp 
  8.   4 /dev/vc/0 
  9.   4 tty 
  10.   4 ttyS 
  11.   5 /dev/tty 
  12.   5 /dev/console 
  13.   5 /dev/ptmx 
  14.   6 lp 
  15.   7 vcs 
  16.  10 misc 
  17.  13 input 
  18.  21 sg 
  19.  29 fb 
  20. 128 ptm 
  21. 136 pts 
  22. 180 usb 
  23. 189 usb_device 
  24. 253 usb_endpoint 
  25. 254 rtc 
  26. Block devices: 
  27.   1 ramdisk 
  28.   8 sd 
  29.  11 sr 
  30.  65 sd 
  31.  66 sd 
  32.  67 sd 
  33.  68 sd 
  34.  69 sd 
  35.  70 sd 
  36.  71 sd 
  37. 128 sd 
  38. 129 sd 
  39. 130 sd 
  40. 131 sd 
  41. 132 sd 
  42. 133 sd 
  43. 134 sd 
  44. 135 sd 

 

 

(nomad2)
本站文章除注明转载外,均为本站原创或编译欢迎任何形式的转载,但请务必注明出处,尊重他人劳动,同学习共成长。转载请注明:文章转载自:罗索实验室 [http://www.rosoo.net/a/201412/17141.html]
本文出处:CSDN博客 作者:nomad2 原文
顶一下
(0)
0%
踩一下
(0)
0%
------分隔线----------------------------
相关文章
发表评论
请自觉遵守互联网相关的政策法规,严禁发布色情、暴力、反动的言论。
评价:
表情:
用户名: 验证码:点击我更换图片
栏目列表
将本文分享到微信
织梦二维码生成器
推荐内容