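I. Test environment
1. Software
a) VMware version: VMware Workstation 12.5.7
b) Ubuntu version: 9.10
c) Kernel version: 2.6.31.14
d) gcc version: 4.4.1
e) gdb version: 7.0
2. Camera hardware
Home-made UVC camera from 百问网
3. Tools used during the investigation
a) printk
b) objdump
c) strace
d) gdb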
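II. Preface
In C, if you define a function with a return value but leave out the return statement at the end of the function body, the program still compiles and runs, and the consequences can be unexpectedly serious. I had only read about this in textbooks. Knowledge from paper always feels shallow, so I never took it seriously. This time, while working through Wei Dongshan's (韦东山) embedded training videos (phase 3 project: USB camera surveillance), I learned the lesson the hard way. This post records how I fell into the pit and how I climbed out, in the hope that it helps both me and others.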
III. Symptoms
Following the video tutorial, I wrote my own simplified UVC camera driver. After insmod my_uvc.ko, running xawtv triggered a kernel Oops. The details are as follows:
[ 657.966482] BUG: unable to handle kernel paging request at fffffff4
[ 657.966486] IP: [
[ 657.966491] *pde = 0081d067 *pte = 00000000
[ 657.966493] Oops: 0002 [#1] SMP
[ 657.966495] last sysfs file: /sys/devices/virtual/video4linux/video0/dev
[ 657.966498] Modules linked in: my_uvc nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc snd_ens1371 gameport
... (several irrelevant lines omitted here)
[ 657.966519]
[ 657.966522] Pid: 5059, comm: xawtv.bin Not tainted (2.6.31-14-generic #48-Ubuntu) VMware Virtual Platform
[ 657.966523] EIP: 0060:[
[ 657.966525] EIP is at __ticket_spin_lock+0x8/0x20
[ 657.966527] EAX: fffffff4 EBX: 00200282 ECX: fffffff4 EDX: 00000100
[ 657.966528] ESI: deb69bf4 EDI: defcba00 EBP: deb69ad0 ESP: deb69ad0
[ 657.966529] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 657.966531] Process xawtv.bin (pid: 5059, ti=deb68000 task=df319920 task.ti=deb68000)
[ 657.966532] Stack:
[ 657.966533] deb69ad8 c0127c38 deb69aec c05707da fffffff4 deb69bf4 defcba00 deb69b00
[ 657.966536] <0> c015c58d 00000000 defcba00 defcba00 deb69b24 c01f64e0 fffffff4 00012f50
[ 657.966539] <0> deb47b80 deb69b24 deb69bd0 ffffffa8 defcba00 deb69b34 e0a8b234 de979a00
[ 657.966543] Call Trace:
[ 657.966545] [
[ 657.966548] [
[ 657.966550] [
[ 657.966553] [
[ 657.966556] [
[ 657.966559] [
[ 657.966562] [
... (several irrelevant lines omitted here)
[ 657.966650] [
[ 657.966652] [
[ 657.966653] Code: ff ff 90 b9 5a 7a 12 c0 b8 5d 7a 12 c0 e9 59 ff ff ff 90 b9 60 7a 12 c0 b8 63 7a 12 c0 e9 49 ff ff ff 90 55 ba 00 01 00 00 89 e5 <3e> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 8d b4 26 00
[ 657.966672] EIP: [
[ 657.966676] CR2: 00000000fffffff4
[ 657.966682] ---[ end trace 672c8069f4e9d743 ]---
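IV. Investigation
1. I recalled that this my_uvc driver had previously worked with xawtv. The only change in the code that now caused the kernel oops was one line added to myuvc_vidioc_try_fmt_vid_cap:
printk(KERN_CRIT "frames[frame_idx].width:%d, frames[frame_idx].height:%d\n", frames[frame_idx].width, frames[frame_idx].height);
But why would a single printk cause a kernel oops? I was baffled.
2. Clue 1: inspect the disassembly with objdump
a) The code that lacks the return statement and causes the kernel oops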
0000036d
36d: 55 push %ebp
36e: 89 e5 mov %esp,%ebp
370: 53 push %ebx
371: 83 ec 0c sub $0xc,%esp
374: 89 cb mov %ecx,%ebx
376: 83 39 01 cmpl $0x1,(%ecx)
379: 75 51 jne 3cc
37b: 81 79 0c 4d 4a 50 47 cmpl $0x47504a4d,0xc(%ecx)
382: 75 48 jne 3cc
384: c7 41 04 40 01 00 00 movl $0x140,0x4(%ecx) ;f->fmt.pix.width = frames[frame_idx].width;
38b: c7 41 08 f0 00 00 00 movl $0xf0,0x8(%ecx) ;f->fmt.pix.height = frames[frame_idx].height;
392: c7 44 24 08 f0 00 00 movl $0xf0,0x8(%esp)
399: 00
39a: c7 44 24 04 40 01 00 movl $0x140,0x4(%esp)
3a1: 00
3a2: c7 04 24 00 02 00 00 movl $0x200,(%esp)
3a9: e8 fc ff ff ff call 3aa
3ae: c7 43 14 00 00 00 00 movl $0x0,0x14(%ebx) ;f->fmt.pix.bytesperline = 0;
3b5: c7 43 18 00 2e 01 00 movl $0x12e00,0x18(%ebx) ;f->fmt.pix.sizeimage = dwMaxVideoFrameSize;
3bc: c7 43 10 01 00 00 00 movl $0x1,0x10(%ebx) ;f->fmt.pix.field = V4L2_FIELD_NONE;
3c3: c7 43 1c 08 00 00 00 movl $0x8,0x1c(%ebx) ;f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
3ca: eb 05 jmp 3d1
3cc: b8 00 00 00 00 mov $0x0,%eax ;return 0;
3d1: 83 c4 0c add $0xc,%esp
3d4: 5b pop %ebx
3d5: 5d pop %ebp
3d6: c3 ret
b) The code with the return statement added, which runs normally
0000036d
36d: 55 push %ebp
36e: 89 e5 mov %esp,%ebp
370: 53 push %ebx
371: 83 ec 0c sub $0xc,%esp
374: 89 cb mov %ecx,%ebx
376: 83 39 01 cmpl $0x1,(%ecx)
379: 75 4f jne 3ca
37b: 81 79 0c 4d 4a 50 47 cmpl $0x47504a4d,0xc(%ecx)
382: 75 46 jne 3ca
384: c7 41 04 40 01 00 00 movl $0x140,0x4(%ecx) ;f->fmt.pix.width = frames[frame_idx].width;
38b: c7 41 08 f0 00 00 00 movl $0xf0,0x8(%ecx) ;f->fmt.pix.height = frames[frame_idx].height;
392: c7 44 24 08 f0 00 00 movl $0xf0,0x8(%esp)
399: 00
39a: c7 44 24 04 40 01 00 movl $0x140,0x4(%esp)
3a1: 00
3a2: c7 04 24 00 02 00 00 movl $0x200,(%esp)
3a9: e8 fc ff ff ff call 3aa
3ae: c7 43 14 00 00 00 00 movl $0x0,0x14(%ebx) ;f->fmt.pix.bytesperline = 0;
3b5: c7 43 18 00 2e 01 00 movl $0x12e00,0x18(%ebx) ;f->fmt.pix.sizeimage = dwMaxVideoFrameSize;
3bc: c7 43 10 01 00 00 00 movl $0x1,0x10(%ebx) ;f->fmt.pix.field = V4L2_FIELD_NONE;
3c3: c7 43 1c 08 00 00 00 movl $0x8,0x1c(%ebx) ;f->fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
3ca: b8 00 00 00 00 mov $0x0,%eax ;with the return statement added in the C code, this mov $0x0,%eax is no longer bypassed
3cf: 83 c4 0c add $0xc,%esp
3d2: 5b pop %ebx
3d3: 5d pop %ebp
3d4: c3 ret
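Comparing the two listings shows the difference. Because the C code lacks the return statement, the path that runs the printk in listing a) jumps over mov $0x0,%eax (jmp 3d1), so eax is never set to 0 before the function returns. The caller of myuvc_vidioc_try_fmt_vid_cap therefore receives a wrong return value.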
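So who calls myuvc_vidioc_try_fmt_vid_cap? I found two callers:
i) myuvc_vidioc_s_fmt_vid_cap inside the my_uvc driver
ii) xawtv, which reaches the driver function indirectly through the ioctl(VIDIOC_TRY_FMT) system call
The log recorded with strace -o /dev/ttyS1 xawtv confirms that xawtv did call myuvc_vidioc_try_fmt_vid_cap through ioctl(VIDIOC_TRY_FMT), and that it did receive a wrong return value. The relevant log line is:
ioctl(4, VIDIOC_TRY_FMT, 0xbfff95b0) = 79 // myuvc_vidioc_try_fmt_vid_cap lacks the final return statement, so it returned a nonzero value (why it is 79 is explained below)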