gcc4.8 linaro fix by torrorosso · Pull Request #1 · M1cha/android_kernel_xiaomi_aries

torrorosso · 2014-02-18T18:10:52Z

gcc4.8 fixes

M1cha · 2014-02-18T18:23:15Z

u learned fast :)

24 core Intel box's first exposure to 3.0.12-rt30-rc3 didn't go well. [ 27.104159] i7300_idle: loaded v1.55 [ 27.104192] BUG: scheduling while atomic: swapper/2/0/0x00000002 [ 27.104309] Pid: 0, comm: swapper/2 Tainted: G N 3.0.12-rt30-rc3-rt #1 [ 27.104317] Call Trace: [ 27.104338] [<ffffffff810046a5>] dump_trace+0x85/0x2e0 [ 27.104372] [<ffffffff8144eb00>] thread_return+0x12b/0x30b [ 27.104381] [<ffffffff8144f1b9>] schedule+0x29/0xb0 [ 27.104389] [<ffffffff814506e5>] rt_spin_lock_slowlock+0xc5/0x240 [ 27.104401] [<ffffffffa01f818f>] i7300_idle_notifier+0x3f/0x360 [i7300_idle] [ 27.104415] [<ffffffff814546c7>] notifier_call_chain+0x37/0x70 [ 27.104426] [<ffffffff81454748>] __atomic_notifier_call_chain+0x48/0x70 [ 27.104439] [<ffffffff81001a39>] cpu_idle+0x89/0xb0 [ 27.104449] bad: scheduling from the idle thread! This lock is taken from interrupt disabled context in the guts of idle. Convert it to a raw_spinlock. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Henroid <andrew.d.henroid@intel.com> Link: http://lkml.kernel.org/r/1323258522.5057.73.camel@marge.simson.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Currently request_irq() is called prior to fec_enet_init() and fec_ptp_init(), which causes the following crash on a mx53qsb: Unable to handle kernel NULL pointer dereference at virtual address 00000002 pgd = 80004000 [00000002] *pgd=00000000 Internal error: Oops: 5 [#1] SMP ARM Modules linked in: CPU: 0 Not tainted (3.8.0-rc7-next-20130215+ #346) PC is at fec_enet_interrupt+0xd0/0x348 LR is at fec_enet_interrupt+0xb8/0x348 pc : [<80372b7c>] lr : [<80372b64>] psr: 60000193 sp : df855c20 ip : df855c20 fp : df855c74 r10: 00000516 r9 : 1c000000 r8 : 00000000 r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : df9b7800 r3 : df9b7df4 r2 : 00000000 r1 : 00000000 r0 : df9b7d34 Ensure that such initialization functions are called prior to requesting the interrupts, so that all necessary the data structures are in place when the irqs occur. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>

bd_mutex and lo_ctl_mutex can be held in different order. Path #1: blkdev_open blkdev_get __blkdev_get (hold bd_mutex) lo_open (hold lo_ctl_mutex) Path #2: blkdev_ioctl lo_ioctl (hold lo_ctl_mutex) lo_set_capacity (hold bd_mutex) Lockdep does not report it, because path #2 actually holds a subclass of lo_ctl_mutex. This subclass seems creep into the code by mistake. The patch author actually just mentioned it in the changelog, see commit f028f3b ("loop: fix circular locking in loop_clr_fd()"), also see: http://marc.info/?l=linux-kernel&m=123806169129727&w=2 Path #2 hold bd_mutex to call bd_set_size(), I've protected it with i_mutex in a previous patch, so drop bd_mutex at this site. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>

usemap could also be allocated as compound pages. Should also consider compound pages when freeing memmap. If we don't fix it, there could be problems when we free vmemmap pagetables which are stored in compound pages. The old pagetables will not be freed properly, and when we add the memory again, no new pagetable will be created. And the old pagetable entry is used, than the kernel will panic. The call trace is like the following: BUG: unable to handle kernel paging request at ffffea0040000000 IP: [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166 PGD 7ff7d4067 PUD 78e035067 PMD 78e11d067 PTE 0 Oops: 0002 [#1] SMP Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg lpc_ich mfd_core i2c_i801 i2c_core i7core_edac edac_core ioatdma e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod CPU 0 Pid: 4, comm: kworker/0:0 Tainted: G W 3.8.0-rc3-phy-hot-remove+ #3 FUJITSU-SV PRIMEQUEST 1800E/SB RIP: 0010:[<ffffffff816a483f>] [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166 RSP: 0018:ffff8807bdcb35d8 EFLAGS: 00010006 RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000200000 RDX: ffff88078df01148 RSI: 0000000000000282 RDI: ffffea0040000000 RBP: ffff8807bdcb3618 R08: 4cf05005b019467a R09: 0cd98fa09631467a R10: 0000000000000000 R11: 0000000000030e20 R12: 0000000000008000 R13: ffffea0040000000 R14: ffff88078df66248 R15: ffff88078ea13b10 FS: 0000000000000000(0000) GS:ffff8807c1a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffea0040000000 CR3: 0000000001c0c000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/0:0 (pid: 4, threadinfo ffff8807bdcb2000, task ffff8807bde18000) Call Trace: __add_pages+0x85/0x120 arch_add_memory+0x71/0xf0 add_memory+0xd6/0x1f0 acpi_memory_device_add+0x170/0x20c acpi_device_probe+0x50/0x18a really_probe+0x6c/0x320 driver_probe_device+0x47/0xa0 __device_attach+0x53/0x60 bus_for_each_drv+0x6c/0xa0 device_attach+0xa8/0xc0 bus_probe_device+0xb0/0xe0 device_add+0x301/0x570 device_register+0x1e/0x30 acpi_device_register+0x1d8/0x27c acpi_add_single_object+0x1df/0x2b9 acpi_bus_check_add+0x112/0x18f acpi_ns_walk_namespace+0x105/0x255 acpi_walk_namespace+0xcf/0x118 acpi_bus_scan+0x5b/0x7c acpi_bus_add+0x2a/0x2c container_notify_cb+0x112/0x1a9 acpi_ev_notify_dispatch+0x46/0x61 acpi_os_execute_deferred+0x27/0x34 process_one_work+0x20e/0x5c0 worker_thread+0x12e/0x370 kthread+0xee/0x100 ret_from_fork+0x7c/0xb0 Code: 00 00 48 89 df 48 89 45 c8 e8 3e 71 b1 ff 48 89 c2 48 8b 75 c8 b8 ef ff ff ff f6 02 01 75 4b 49 63 cc 31 c0 4c 89 ef 48 c1 e1 06 <f3> aa 48 8b 02 48 83 c8 01 48 85 d2 48 89 02 74 29 a8 01 74 25 RIP [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166 RSP <ffff8807bdcb35d8> CR2: ffffea0040000000 ---[ end trace e7f94e3a34c442d4 ]--- Kernel panic - not syncing: Fatal exception Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

When a cpu is hotpluged, we call acpi_map_cpu2node() in _acpi_map_lsapic() to store the cpu's node and apicid's node. But we don't clear the cpu's node in acpi_unmap_lsapic() when this cpu is hotremoved. If the node is also hotremoved, we will get the following messages: kernel BUG at include/linux/gfp.h:329! invalid opcode: 0000 [#1] SMP Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma e1000e i7core_edac edac_core sg acpi_memhotplug igb dca sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod Pid: 3126, comm: init Not tainted 3.6.0-rc3-tangchen-hostbridge+ #13 FUJITSU-SV PRIMEQUEST 1800E/SB RIP: 0010:[<ffffffff811bc3fd>] [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300 RSP: 0018:ffff88078a049cf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000246 RBP: ffff88078a049d38 R08: 00000000000040d0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000b5f R12: 00000000000052d0 R13: ffff8807c1417300 R14: 0000000000030038 R15: 0000000000000003 FS: 00007fa9b1b44700(0000) GS:ffff8807c3800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fa9b09acca0 CR3: 000000078b855000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process init (pid: 3126, threadinfo ffff88078a048000, task ffff8807bb6f2650) Call Trace: new_slab+0x30/0x1b0 __slab_alloc+0x358/0x4c0 kmem_cache_alloc_node_trace+0xb4/0x1e0 alloc_fair_sched_group+0xd0/0x1b0 sched_create_group+0x3e/0x110 sched_autogroup_create_attach+0x4d/0x180 sys_setsid+0xd4/0xf0 system_call_fastpath+0x16/0x1b Code: 89 c4 e9 73 fe ff ff 31 c0 89 de 48 c7 c7 45 de 9e 81 44 89 45 c8 e8 22 05 4b 00 85 db 44 8b 45 c8 0f 89 4f ff ff ff 0f 0b eb fe <0f> 0b 90 eb fd 0f 0b eb fe 89 de 48 c7 c7 45 de 9e 81 31 c0 44 RIP [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300 RSP <ffff88078a049cf8> ---[ end trace adf84c90f3fea3e5 ]--- The reason is that the cpu's node is not NUMA_NO_NODE, we will call alloc_pages_exact_node() to alloc memory on the node, but the node is offlined. If the node is onlined, we still need cpu's node. For example: a task on the cpu is sleeped when the cpu is hotremoved. We will choose another cpu to run this task when it is waked up. If we know the cpu's node, we will choose the cpu on the same node first. So we should clear cpu-to-node mapping when the node is offlined. This patch only clears apicid-to-node mapping when the cpu is hotremoved. [akpm@linux-foundation.org: fix section error] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M option is not specified in the remount request. A new policy can be specified if mpol=M is given. Before this patch remounting an mpol bound tmpfs without specifying mpol= mount option in the remount request would set the filesystem's mempolicy object to a freed mempolicy object. To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run: # mkdir /tmp/x # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0 # mount -o remount,size=200M nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0 # note ? garbage in mpol=... output above # dd if=/dev/zero of=/tmp/x/f count=1 # panic here Panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) [...] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC Call Trace: mpol_shared_policy_init+0xa5/0x160 shmem_get_inode+0x209/0x270 shmem_mknod+0x3e/0xf0 shmem_create+0x18/0x20 vfs_create+0xb5/0x130 do_last+0x9a1/0xea0 path_openat+0xb3/0x4d0 do_filp_open+0x42/0xa0 do_sys_open+0xfe/0x1e0 compat_sys_open+0x1b/0x20 cstar_dispatch+0x7/0x1f Non-debug kernels will not crash immediately because referencing the dangling mpol will not cause a fault. Instead the filesystem will reference a freed mempolicy object, which will cause unpredictable behavior. The problem boils down to a dropped mpol reference below if shmem_parse_options() does not allocate a new mpol: config = *sbinfo shmem_parse_options(data, &config, true) mpol_put(sbinfo->mpol) sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */ This patch avoids the crash by not releasing the mempolicy if shmem_parse_options() doesn't create a new mpol. How far back does this issue go? I see it in both 2.6.36 and 3.3. I did not look back further. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Pass the directio request on pageio_init to clean up the API. Percolate pg_dreq from original nfs_pageio_descriptor to the pnfs_{read,write}_done_resend_to_mds and use it on respective call to nfs_pageio_init_{read,write} on the newly created nfs_pageio_descriptor. Reproduced by command: mount -o vers=4.1 server:/ /mnt dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] PGD 34786067 PUD 34794067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4 CPU 1 Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs RIP: 0010:[<ffffffffa021a3a8>] [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP: 0018:ffff880038f8fa68 EFLAGS: 00010206 RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000 RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028 RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510 R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40 R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878 FS: 0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480) Stack: ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48 Call Trace: [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs] [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs] [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs] [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs] [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs] [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4] [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files] [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files] [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4] [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs] [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs] [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files] [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc] [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc] [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc] [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff8105f8c1>] process_one_work+0x226/0x422 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422 [<ffffffff81094757>] ? lock_acquired+0x210/0x249 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff810600d8>] worker_thread+0x126/0x1c4 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240 [<ffffffff81064ef8>] kthread+0xb1/0xb9 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f RIP [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP <ffff880038f8fa68> CR2: 0000000000000028 Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@kernel.org [>= 3.6] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

…device" This reverts commit eb6b9a8. Above commit limits GSO capability of gre device to just TSO, but software GRE-GSO is capable of handling all GSO capabilities. This patch also fixes following panic which reverted commit introduced:- BUG: unable to handle kernel NULL pointer dereference at 00000000000000a2 IP: [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre] PGD 42bc19067 PUD 42bca9067 PMD 0 Oops: 0000 [#1] SMP Pid: 2636, comm: ip Tainted: GF 3.8.0+ #83 Dell Inc. PowerEdge R620/0KCKR5 RIP: 0010:[<ffffffffa0680fd1>] [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre] RSP: 0018:ffff88042bfcb708 EFLAGS: 00010246 RAX: 00000000000005b6 RBX: ffff88042d2fa000 RCX: 0000000000000044 RDX: 0000000000000018 RSI: 0000000000000078 RDI: 0000000000000060 RBP: ffff88042bfcb748 R08: 0000000000000018 R09: 000000000000000c R10: 0000000000000020 R11: 000000000101010a R12: ffff88042d2fa800 R13: 0000000000000000 R14: ffff88042d2fa800 R15: ffff88042cd7f650 FS: 00007fa784f55700(0000) GS:ffff88043fd20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a2 CR3: 000000042d8b9000 CR4: 00000000000407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ip (pid: 2636, threadinfo ffff88042bfca000, task ffff88042d142a80) Stack: 0000000100000000 002f000000000000 0a01010100000000 000000000b010101 ffff88042d2fa800 ffff88042d2fa000 ffff88042bfcb858 ffff88042f418c00 ffff88042bfcb798 ffffffffa068199a ffff88042bfcb798 ffff88042d2fa830 Call Trace: [<ffffffffa068199a>] ipgre_newlink+0xca/0x160 [ip_gre] [<ffffffff8143b692>] rtnl_newlink+0x532/0x5f0 [<ffffffff8143b2fc>] ? rtnl_newlink+0x19c/0x5f0 [<ffffffff81438978>] rtnetlink_rcv_msg+0x2c8/0x340 [<ffffffff814386b0>] ? rtnetlink_rcv+0x40/0x40 [<ffffffff814560f9>] netlink_rcv_skb+0xa9/0xd0 [<ffffffff81438695>] rtnetlink_rcv+0x25/0x40 [<ffffffff81455ddc>] netlink_unicast+0x1ac/0x230 [<ffffffff81456a45>] netlink_sendmsg+0x265/0x380 [<ffffffff814138c0>] sock_sendmsg+0xb0/0xe0 [<ffffffff8141141e>] ? move_addr_to_kernel+0x4e/0x90 [<ffffffff81420445>] ? verify_iovec+0x85/0xf0 [<ffffffff81414ffd>] __sys_sendmsg+0x3fd/0x420 [<ffffffff8114b701>] ? handle_mm_fault+0x251/0x3b0 [<ffffffff8114f39f>] ? vma_link+0xcf/0xe0 [<ffffffff81415239>] sys_sendmsg+0x49/0x90 [<ffffffff814ffd19>] system_call_fastpath+0x16/0x1b CC: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>

This fixes an oops where a LAYOUTGET is in still in the rpciod queue, but the requesting processes has been killed. Without this, killing the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout to NULL while the LAYOUTGET rpc task still references it. Example oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] PGD 7365b067 PUD 7365d067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4 CPU 0 Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa01bd586>] [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] RSP: 0018:ffff88007b0c1c88 EFLAGS: 00010246 RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958 RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958 R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0) Stack: ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791 Call Trace: [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4] [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc] [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc] Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Weston Andros Adamson <dros@netapp.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Kjell Braden reported this oops: [ 833.211970] BUG: unable to handle kernel NULL pointer dereference at (null) [ 833.212816] IP: [< (null)>] (null) [ 833.213280] PGD 1b9b2067 PUD e9f7067 PMD 0 [ 833.213874] Oops: 0010 [#1] SMP [ 833.214344] CPU 0 [ 833.214458] Modules linked in: des_generic md4 nls_utf8 cifs vboxvideo drm snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq bnep rfcomm snd_timer bluetooth snd_seq_device ppdev snd vboxguest parport_pc joydev mac_hid soundcore snd_page_alloc psmouse i2c_piix4 serio_raw lp parport usbhid hid e1000 [ 833.215629] [ 833.215629] Pid: 1752, comm: mount.cifs Not tainted 3.0.0-rc7-bisectcifs-fec11dd9a0+ #18 innotek GmbH VirtualBox/VirtualBox [ 833.215629] RIP: 0010:[<0000000000000000>] [< (null)>] (null) [ 833.215629] RSP: 0018:ffff8800119c9c50 EFLAGS: 00010282 [ 833.215629] RAX: ffffffffa02186c0 RBX: ffff88000c427780 RCX: 0000000000000000 [ 833.215629] RDX: 0000000000000000 RSI: ffff88000c427780 RDI: ffff88000c4362e8 [ 833.215629] RBP: ffff8800119c9c88 R08: ffff88001fc15e30 R09: 00000000d69515c7 [ 833.215629] R10: ffffffffa0201972 R11: ffff88000e8f6a28 R12: ffff88000c4362e8 [ 833.215629] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88001181aaa6 [ 833.215629] FS: 00007f2986171700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 [ 833.215629] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 833.215629] CR2: 0000000000000000 CR3: 000000001b982000 CR4: 00000000000006f0 [ 833.215629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 833.215629] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 833.215629] Process mount.cifs (pid: 1752, threadinfo ffff8800119c8000, task ffff88001c1c16f0) [ 833.215629] Stack: [ 833.215629] ffffffff8116a9b5 ffff8800119c9c88 ffffffff81178075 0000000000000286 [ 833.215629] 0000000000000000 ffff88000c4276c0 ffff8800119c9ce8 ffff8800119c9cc8 [ 833.215629] ffffffff8116b06e ffff88001bc6fc00 ffff88000c4276c0 ffff88000c4276c0 [ 833.215629] Call Trace: [ 833.215629] [<ffffffff8116a9b5>] ? d_alloc_and_lookup+0x45/0x90 [ 833.215629] [<ffffffff81178075>] ? d_lookup+0x35/0x60 [ 833.215629] [<ffffffff8116b06e>] __lookup_hash.part.14+0x9e/0xc0 [ 833.215629] [<ffffffff8116b1d6>] lookup_one_len+0x146/0x1e0 [ 833.215629] [<ffffffff815e4f7e>] ? _raw_spin_lock+0xe/0x20 [ 833.215629] [<ffffffffa01eef0d>] cifs_do_mount+0x26d/0x500 [cifs] [ 833.215629] [<ffffffff81163bd3>] mount_fs+0x43/0x1b0 [ 833.215629] [<ffffffff8117d41a>] vfs_kern_mount+0x6a/0xd0 [ 833.215629] [<ffffffff8117e584>] do_kern_mount+0x54/0x110 [ 833.215629] [<ffffffff8117fdc2>] do_mount+0x262/0x840 [ 833.215629] [<ffffffff81108a0e>] ? __get_free_pages+0xe/0x50 [ 833.215629] [<ffffffff8117f9ca>] ? copy_mount_options+0x3a/0x180 [ 833.215629] [<ffffffff8118075d>] sys_mount+0x8d/0xe0 [ 833.215629] [<ffffffff815ece82>] system_call_fastpath+0x16/0x1b [ 833.215629] Code: Bad RIP value. [ 833.215629] RIP [< (null)>] (null) [ 833.215629] RSP <ffff8800119c9c50> [ 833.215629] CR2: 0000000000000000 [ 833.238525] ---[ end trace ec00758b8d44f529 ]--- When walking down the path on the server, it's possible to hit a symlink. The path walking code assumes that the caller will handle that situation properly, but cifs_get_root() isn't set up for it. This patch prevents the oops by simply returning an error. A better solution would be to try and chase the symlinks here, but that's fairly complicated to handle. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=53221 Reported-and-tested-by: Kjell Braden <afflux@pentabarf.de> Cc: stable <stable@vger.kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>

Benny Halevy reported the following oops when testing RHEL6: <7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644 <1>BUG: unable to handle kernel NULL pointer dereference at (null) <1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>PGD 81448a067 PUD 831632067 PMD 0 <4>Oops: 0000 [#1] SMP <4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled <4>CPU 6 <4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] <4> <4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6 /ProLiant DL170e G6 <4>RIP: 0010:[<ffffffffa02a52c5>] [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>RSP: 0018:ffff88081458bb98 EFLAGS: 00010292 <4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003 <4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760 <4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000 <4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008 <4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780 <4>FS: 00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000 <4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b <4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0 <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040) <4>Stack: <4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745 <4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea <4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760 <4>Call Trace: <4> [<ffffffff81182745>] __fput+0xf5/0x210 <4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs] <4> [<ffffffff81182885>] fput+0x25/0x30 <4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360 <4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60 <4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90 <4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs] <4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs] <4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160 <4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0 <4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50 <4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0 <4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480 <4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160 <4> [<ffffffff8117de79>] do_sys_open+0x69/0x140 <4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80 <4> [<ffffffff8117df90>] sys_open+0x20/0x30 <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b <4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31 <1>RIP [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4> RSP <ffff88081458bb98> <4>CR2: 0000000000000000 I think this is ultimately due to a bug on the server. The client had previously found a directory dentry. It then later tried to do an atomic open on a new (regular file) dentry. The attributes it got back had the same filehandle as the previously found directory inode. It then tried to put the filp because it failed the aops tests for O_DIRECT opens, and oopsed here because the ctx was still NULL. Obviously the root cause here is a server issue, but we can take steps to mitigate this on the client. When nfs_fhget is called, we always know what type of inode it is. In the event that there's a broken or malicious server on the other end of the wire, the client can end up crashing because the wrong ops are set on it. Have nfs_find_actor check that the inode type is correct after checking the fileid. The fileid check should rarely ever match, so it should only rarely ever get to this check. In the case where we have a broken server, we may see two different inodes with the same i_ino, but the client should be able to cope with them without crashing. This should fix the oops reported here: https://bugzilla.redhat.com/show_bug.cgi?id=913660 Reported-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

While adding and removing a lot of disks disks and partitions this sometimes shows up: WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted) Hardware name: sysfs: cannot create duplicate filename '/dev/block/259:751' Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan] Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1 Call Trace: warn_slowpath_common+0x87/0xc0 warn_slowpath_fmt+0x46/0x50 sysfs_add_one+0xc9/0x130 sysfs_do_create_link+0x12b/0x170 sysfs_create_link+0x13/0x20 device_add+0x317/0x650 idr_get_new+0x13/0x50 add_partition+0x21c/0x390 rescan_partitions+0x32b/0x470 sd_open+0x81/0x1f0 [sd_mod] __blkdev_get+0x1b6/0x3c0 blkdev_get+0x10/0x20 register_disk+0x155/0x170 add_disk+0xa6/0x160 sd_probe_async+0x13b/0x210 [sd_mod] add_wait_queue+0x46/0x60 async_thread+0x102/0x250 default_wake_function+0x0/0x20 async_thread+0x0/0x250 kthread+0x96/0xa0 child_rip+0xa/0x20 kthread+0x0/0xa0 child_rip+0x0/0x20 This most likely happens because dev_t is freed while the number is still used and idr_get_new() is not protected on every use. The fix adds a mutex where it wasn't before and moves the dev_t free function so it is called after device del. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

This patch fixes a regression introduced in v3.8, which causes oops like this when dm-multipath is used: general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff810fe754>] [<ffffffff810fe754>] mempool_free+0x24/0xb0 Call Trace: <IRQ> [<ffffffff81187417>] bio_put+0x97/0xc0 [<ffffffffa02247a5>] end_clone_bio+0x35/0x90 [dm_mod] [<ffffffff81185efd>] bio_endio+0x1d/0x30 [<ffffffff811f03a3>] req_bio_endio.isra.51+0xa3/0xe0 [<ffffffff811f2f68>] blk_update_request+0x118/0x520 [<ffffffff811f3397>] blk_update_bidi_request+0x27/0xa0 [<ffffffff811f343c>] blk_end_bidi_request+0x2c/0x80 [<ffffffff811f34d0>] blk_end_request+0x10/0x20 [<ffffffffa000b32b>] scsi_io_completion+0xfb/0x6c0 [scsi_mod] [<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod] [<ffffffffa000b12f>] scsi_softirq_done+0x13f/0x160 [scsi_mod] [<ffffffff811f9fd0>] blk_done_softirq+0x80/0xa0 [<ffffffff81044551>] __do_softirq+0xf1/0x250 [<ffffffff8142ee8c>] call_softirq+0x1c/0x30 [<ffffffff8100420d>] do_softirq+0x8d/0xc0 [<ffffffff81044885>] irq_exit+0xd5/0xe0 [<ffffffff8142f3e3>] do_IRQ+0x63/0xe0 [<ffffffff814257af>] common_interrupt+0x6f/0x6f <EOI> [<ffffffffa021737c>] srp_queuecommand+0x8c/0xcb0 [ib_srp] [<ffffffffa0002f18>] scsi_dispatch_cmd+0x148/0x310 [scsi_mod] [<ffffffffa000a38e>] scsi_request_fn+0x31e/0x520 [scsi_mod] [<ffffffff811f1e57>] __blk_run_queue+0x37/0x50 [<ffffffff811f1f69>] blk_delay_work+0x29/0x40 [<ffffffff81059003>] process_one_work+0x1c3/0x5c0 [<ffffffff8105b22e>] worker_thread+0x15e/0x440 [<ffffffff8106164b>] kthread+0xdb/0xe0 [<ffffffff8142db9c>] ret_from_fork+0x7c/0xb0 The regression was introduced by the change c0820cf "dm: introduce per_bio_data", where dm started to replace bioset during table replacement. For bio-based dm, it is good because clone bios do not exist during the table replacement. For request-based dm, however, (not-yet-mapped) clone bios may stay in request queue and survive during the table replacement. So freeing the old bioset could cause the oops in bio_put(). Since the size of front_pad may change only with bio-based dm, it is not necessary to replace bioset for request-based dm. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Tim found: WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80() Hardware name: S2600CP sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. smpboot: Booting Node 1, Processors #1 Modules linked in: Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1 Call Trace: set_cpu_sibling_map+0x279/0x449 start_secondary+0x11d/0x1e5 Don Morris reproduced on a HP z620 workstation, and bisected it to commit e8d1955 ("acpi, memory-hotplug: parse SRAT before memblock is ready") It turns out movable_map has some problems, and it breaks several things 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(&numa_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy. and make fall back path working. 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i < MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that.... c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved early before override from INITRD is settled. 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes critical x86 code. It caused x86 guys did not pay attention to find the problem early. Those patches really should be routed via tip/x86/mm. 4. after that commit, following range can not use movable ram: a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed? b. initrd... it will be freed after booting, so it could be on movable... c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G anymore. d. init_mem_mapping: can not put page table high anymore. e. initmem_init: vmemmap can not be high local node anymore. That is not good. If node is hotplugable, the mem related range like page table and vmemmap could be on the that node without problem and should be on that node. We have workaround patch that could fix some problems, but some can not be fixed. So just remove that offending commit and related ones including: f7210e6 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region().") 01a178a ("acpi, memory-hotplug: support getting hotplug info from SRAT") 27168d3 ("acpi, memory-hotplug: extend movablemem_map ranges to the end of node") e8d1955 ("acpi, memory-hotplug: parse SRAT before memblock is ready") fb06bc8 ("page_alloc: bootmem limit with movablecore_map") 42f47e2 ("page_alloc: make movablemem_map have higher priority") 6981ec3 ("page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes") 34b71f1 ("page_alloc: add movable_memmap kernel parameter") 4d59a75 ("x86: get pg_data_t's memory from other node") Later we should have patches that will make sure kernel put page table and vmemmap on local node ram instead of push them down to node0. Also need to find way to put other kernel used ram to local node ram. Reported-by: Tim Gardner <tim.gardner@canonical.com> Reported-by: Don Morris <don.morris@hp.com> Bisected-by: Don Morris <don.morris@hp.com> Tested-by: Don Morris <don.morris@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

If start_this_handle() failed handle will be initialized to ERR_PTR() and can not be dereferenced. paging request at fffffffffffffff6 IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 PGD 200e067 PUD 200f067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod CPU 0 journal commit I/O error Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW RIP: 0010:[<ffffffff813c073f>] [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292 RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48 RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Dave Jones <davej@redhat.com> writes: > Just hit this on Linus' current tree. > > [ 89.621770] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 > [ 89.623111] IP: [<ffffffff810784b0>] commit_creds+0x250/0x2f0 > [ 89.624062] PGD 122bfd067 PUD 122bfe067 PMD 0 > [ 89.624901] Oops: 0000 [#1] PREEMPT SMP > [ 89.625678] Modules linked in: caif_socket caif netrom bridge hidp 8021q garp stp mrp rose llc2 af_rxrpc phonet af_key binfmt_misc bnep l2tp_ppp can_bcm l2tp_core pppoe pppox can_raw scsi_transport_iscsi ppp_generic slhc nfnetlink can ipt_ULOG ax25 decnet irda nfc rds x25 crc_ccitt appletalk atm ipx p8023 psnap p8022 llc lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm vhost_net snd_page_alloc snd_timer tun macvtap usb_debug snd rfkill microcode macvlan edac_core pcspkr serio_raw kvm_amd soundcore kvm r8169 mii > [ 89.637846] CPU 2 > [ 89.638175] Pid: 782, comm: trinity-main Not tainted 3.8.0+ #63 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H > [ 89.639850] RIP: 0010:[<ffffffff810784b0>] [<ffffffff810784b0>] commit_creds+0x250/0x2f0 > [ 89.641161] RSP: 0018:ffff880115657eb8 EFLAGS: 00010207 > [ 89.641984] RAX: 00000000000003e8 RBX: ffff88012688b000 RCX: 0000000000000000 > [ 89.643069] RDX: 0000000000000000 RSI: ffffffff81c32960 RDI: ffff880105839600 > [ 89.644167] RBP: ffff880115657ed8 R08: 0000000000000000 R09: 0000000000000000 > [ 89.645254] R10: 0000000000000001 R11: 0000000000000246 R12: ffff880105839600 > [ 89.646340] R13: ffff88011beea490 R14: ffff88011beea490 R15: 0000000000000000 > [ 89.647431] FS: 00007f3ac063b740(0000) GS:ffff88012b200000(0000) knlGS:0000000000000000 > [ 89.648660] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 89.649548] CR2: 00000000000000c8 CR3: 0000000122bfc000 CR4: 00000000000007e0 > [ 89.650635] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 89.651723] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 89.652812] Process trinity-main (pid: 782, threadinfo ffff880115656000, task ffff88011beea490) > [ 89.654128] Stack: > [ 89.654433] 0000000000000000 ffff8801058396a0 ffff880105839600 ffff88011beeaa78 > [ 89.655769] ffff880115657ef8 ffffffff812c7d9b ffffffff82079be0 0000000000000000 > [ 89.657073] ffff880115657f28 ffffffff8106c665 0000000000000002 ffff880115657f58 > [ 89.658399] Call Trace: > [ 89.658822] [<ffffffff812c7d9b>] key_change_session_keyring+0xfb/0x140 > [ 89.659845] [<ffffffff8106c665>] task_work_run+0xa5/0xd0 > [ 89.660698] [<ffffffff81002911>] do_notify_resume+0x71/0xb0 > [ 89.661581] [<ffffffff816c9a4a>] int_signal+0x12/0x17 > [ 89.662385] Code: 24 90 00 00 00 48 8b b3 90 00 00 00 49 8b 4c 24 40 48 39 f2 75 08 e9 83 00 00 00 48 89 ca 48 81 fa 60 29 c3 81 0f 84 41 fe ff ff <48> 8b 8a c8 00 00 00 48 39 ce 75 e4 3b 82 d0 00 00 00 0f 84 4b > [ 89.667778] RIP [<ffffffff810784b0>] commit_creds+0x250/0x2f0 > [ 89.668733] RSP <ffff880115657eb8> > [ 89.669301] CR2: 00000000000000c8 > > My fastest trinity induced oops yet! > > > Appears to be.. > > if ((set_ns == subset_ns->parent) && > 850: 48 8b 8a c8 00 00 00 mov 0xc8(%rdx),%rcx > > from the inlined cred_cap_issubset By historical accident we have been reading trying to set new->user_ns from new->user_ns. Which is totally silly as new->user_ns is NULL (as is every other field in new except session_keyring at that point). The intent is clearly to copy all of the fields from old to new so copy old->user_ns into into new->user_ns. Cc: stable@vger.kernel.org Reported-by: Dave Jones <davej@redhat.com> Tested-by: Dave Jones <davej@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

While shuting down a HVM guest with pci devices passed through we get this: pciback 0000:04:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100002) ------------[ cut here ]------------ WARNING: at drivers/pci/pci.c:1397 pci_disable_device+0x88/0xa0() Hardware name: MS-7640 Device pciback disabling already-disabled device Modules linked in: Pid: 53, comm: xenwatch Not tainted 3.9.0-rc1-20130304a+ #1 Call Trace: [<ffffffff8106994a>] warn_slowpath_common+0x7a/0xc0 [<ffffffff81069a31>] warn_slowpath_fmt+0x41/0x50 [<ffffffff813cf288>] pci_disable_device+0x88/0xa0 [<ffffffff814554a7>] xen_pcibk_reset_device+0x37/0xd0 [<ffffffff81454b6f>] ? pcistub_put_pci_dev+0x6f/0x120 [<ffffffff81454b8d>] pcistub_put_pci_dev+0x8d/0x120 [<ffffffff814582a9>] __xen_pcibk_release_devices+0x59/0xa0 This fixes the bug. CC: stable@vger.kernel.org Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Dave reported following crash : general protection fault: 0000 [#1] SMP CPU 2 Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch RIP: 0010:[<ffffffffa0399bd5>] [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack] RSP: 0018:ffff880276913d78 EFLAGS: 00010206 RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00 RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002 RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900 R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40 R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002 FS: 00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720) Stack: ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7 ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100 ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470 Call Trace: [<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20 [<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100 [<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0 [<ffffffff8151a666>] kfree_skb+0x36/0xa0 [<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40 [<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun] [<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun] [<ffffffff8119669c>] __fput+0xec/0x240 [<ffffffff811967fe>] ____fput+0xe/0x10 [<ffffffff8107eb27>] task_work_run+0xa7/0xe0 [<ffffffff810149e1>] do_notify_resume+0x71/0xb0 [<ffffffff81640152>] int_signal+0x12/0x17 Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0 RIP [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack] RSP <ffff880276913d78> This is because tun_net_xmit() needs to call nf_reset() before queuing skb into receive_queue Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

The following script will produce a kernel oops: sudo ip netns add v sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo sudo ip netns exec v ip link set lo up sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo sudo ip netns exec v ip link set vxlan0 up sudo ip netns del v where inspect by gdb: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 107] 0xffffffffa0289e33 in ?? () (gdb) bt #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 #1 vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087 #2 0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299 #3 0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335 #4 0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851 #5 0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752 #6 0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170 #7 0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302 #8 0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157 #9 0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276 #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168 #11 <signal handler called> #12 0x0000000000000000 in ?? () #13 0x0000000000000000 in ?? () (gdb) fr 0 #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 533 struct sock *sk = vn->sock->sk; (gdb) l 528 static int vxlan_leave_group(struct net_device *dev) 529 { 530 struct vxlan_dev *vxlan = netdev_priv(dev); 531 struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); 532 int err = 0; 533 struct sock *sk = vn->sock->sk; 534 struct ip_mreqn mreq = { 535 .imr_multiaddr.s_addr = vxlan->gaddr, 536 .imr_ifindex = vxlan->link, 537 }; (gdb) p vn->sock $4 = (struct socket *) 0x0 The kernel calls `vxlan_exit_net` when deleting the netns before shutting down vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock` is already gone causes the oops. so we should manually shutdown all interfaces before deleting `vn->sock` as the patch does. Signed-off-by: Zang MingJie <zealot0630@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

ca22e56 (driver-core: implement 'sysdev' functionality for regular devices and buses) has introduced bus_register macro with a static key to distinguish different subsys mutex classes. This however doesn't work for different subsys which use a common registering function. One example is subsys_system_register (and mce_device and cpu_device). In the end this leads to the following lockdep splat: [ 207.271924] ====================================================== [ 207.271932] [ INFO: possible circular locking dependency detected ] [ 207.271942] 3.9.0-rc1-0.7-default+ #34 Not tainted [ 207.271948] ------------------------------------------------------- [ 207.271957] bash/10493 is trying to acquire lock: [ 207.271963] (subsys mutex){+.+.+.}, at: [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.271987] [ 207.271987] but task is already holding lock: [ 207.271995] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60 [ 207.272012] [ 207.272012] which lock already depends on the new lock. [ 207.272012] [ 207.272023] [ 207.272023] the existing dependency chain (in reverse order) is: [ 207.272033] [ 207.272033] -> #4 (cpu_hotplug.lock){+.+.+.}: [ 207.272044] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272056] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272069] [<ffffffff81046ba9>] get_online_cpus+0x29/0x40 [ 207.272082] [<ffffffff81185210>] drain_all_stock+0x30/0x150 [ 207.272094] [<ffffffff811853da>] mem_cgroup_reclaim+0xaa/0xe0 [ 207.272104] [<ffffffff8118775e>] __mem_cgroup_try_charge+0x51e/0xcf0 [ 207.272114] [<ffffffff81188486>] mem_cgroup_charge_common+0x36/0x60 [ 207.272125] [<ffffffff811884da>] mem_cgroup_newpage_charge+0x2a/0x30 [ 207.272135] [<ffffffff81150531>] do_wp_page+0x231/0x830 [ 207.272147] [<ffffffff8115151e>] handle_pte_fault+0x19e/0x8d0 [ 207.272157] [<ffffffff81151da8>] handle_mm_fault+0x158/0x1e0 [ 207.272166] [<ffffffff814b6153>] do_page_fault+0x2a3/0x4e0 [ 207.272178] [<ffffffff814b2578>] page_fault+0x28/0x30 [ 207.272189] [ 207.272189] -> #3 (&mm->mmap_sem){++++++}: [ 207.272199] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272208] [<ffffffff8114c5ad>] might_fault+0x6d/0x90 [ 207.272218] [<ffffffff811a11e3>] filldir64+0xb3/0x120 [ 207.272229] [<ffffffffa013fc19>] call_filldir+0x89/0x130 [ext3] [ 207.272248] [<ffffffffa0140377>] ext3_readdir+0x6b7/0x7e0 [ext3] [ 207.272263] [<ffffffff811a1519>] vfs_readdir+0xa9/0xc0 [ 207.272273] [<ffffffff811a15cb>] sys_getdents64+0x9b/0x110 [ 207.272284] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272296] [ 207.272296] -> #2 (&type->i_mutex_dir_key#3){+.+.+.}: [ 207.272309] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272319] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272329] [<ffffffff8119c254>] link_path_walk+0x6f4/0x9a0 [ 207.272339] [<ffffffff8119e7fa>] path_openat+0xba/0x470 [ 207.272349] [<ffffffff8119ecf8>] do_filp_open+0x48/0xa0 [ 207.272358] [<ffffffff8118d81c>] file_open_name+0xdc/0x110 [ 207.272369] [<ffffffff8118d885>] filp_open+0x35/0x40 [ 207.272378] [<ffffffff8135c76e>] _request_firmware+0x52e/0xb20 [ 207.272389] [<ffffffff8135cdd6>] request_firmware+0x16/0x20 [ 207.272399] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode] [ 207.272416] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode] [ 207.272431] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode] [ 207.272444] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100 [ 207.272457] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4 [ 207.272472] [<ffffffff81000202>] do_one_initcall+0x42/0x180 [ 207.272485] [<ffffffff810bbeff>] load_module+0x19df/0x1b70 [ 207.272499] [<ffffffff810bc376>] sys_init_module+0xe6/0x130 [ 207.272511] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272523] [ 207.272523] -> #1 (umhelper_sem){++++.+}: [ 207.272537] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272548] [<ffffffff814ae9c4>] down_read+0x34/0x50 [ 207.272559] [<ffffffff81062bff>] usermodehelper_read_trylock+0x4f/0x100 [ 207.272575] [<ffffffff8135c7dd>] _request_firmware+0x59d/0xb20 [ 207.272587] [<ffffffff8135cdd6>] request_firmware+0x16/0x20 [ 207.272599] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode] [ 207.272613] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode] [ 207.272627] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode] [ 207.272641] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100 [ 207.272654] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4 [ 207.272666] [<ffffffff81000202>] do_one_initcall+0x42/0x180 [ 207.272678] [<ffffffff810bbeff>] load_module+0x19df/0x1b70 [ 207.272690] [<ffffffff810bc376>] sys_init_module+0xe6/0x130 [ 207.272702] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272715] [ 207.272715] -> #0 (subsys mutex){+.+.+.}: [ 207.272729] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0 [ 207.272740] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272751] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272763] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.272775] [<ffffffff81349114>] device_del+0x134/0x1f0 [ 207.272786] [<ffffffff813491f2>] device_unregister+0x22/0x60 [ 207.272798] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad [ 207.272812] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130 [ 207.272824] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10 [ 207.272839] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350 [ 207.272851] [<ffffffff81499130>] cpu_down+0x40/0x60 [ 207.272862] [<ffffffff8149cc55>] store_online+0x75/0xe0 [ 207.272874] [<ffffffff813474a0>] dev_attr_store+0x20/0x30 [ 207.272886] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150 [ 207.272900] [<ffffffff8118e10b>] vfs_write+0xcb/0x130 [ 207.272911] [<ffffffff8118e924>] sys_write+0x64/0xa0 [ 207.272923] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272936] [ 207.272936] other info that might help us debug this: [ 207.272936] [ 207.272952] Chain exists of: [ 207.272952] subsys mutex --> &mm->mmap_sem --> cpu_hotplug.lock [ 207.272952] [ 207.272973] Possible unsafe locking scenario: [ 207.272973] [ 207.272984] CPU0 CPU1 [ 207.272992] ---- ---- [ 207.273000] lock(cpu_hotplug.lock); [ 207.273009] lock(&mm->mmap_sem); [ 207.273020] lock(cpu_hotplug.lock); [ 207.273031] lock(subsys mutex); [ 207.273040] [ 207.273040] *** DEADLOCK *** [ 207.273040] [ 207.273055] 5 locks held by bash/10493: [ 207.273062] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81209049>] sysfs_write_file+0x49/0x150 [ 207.273080] #1: (s_active#150){.+.+.+}, at: [<ffffffff812090c2>] sysfs_write_file+0xc2/0x150 [ 207.273099] #2: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81027557>] cpu_hotplug_driver_lock+0x17/0x20 [ 207.273121] #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8149911c>] cpu_down+0x2c/0x60 [ 207.273140] #4: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60 [ 207.273158] [ 207.273158] stack backtrace: [ 207.273170] Pid: 10493, comm: bash Not tainted 3.9.0-rc1-0.7-default+ #34 [ 207.273180] Call Trace: [ 207.273192] [<ffffffff810ab373>] print_circular_bug+0x223/0x310 [ 207.273204] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0 [ 207.273216] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0 [ 207.273227] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.273239] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0 [ 207.273251] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.273263] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0 [ 207.273274] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0 [ 207.273286] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.273298] [<ffffffff81349114>] device_del+0x134/0x1f0 [ 207.273309] [<ffffffff813491f2>] device_unregister+0x22/0x60 [ 207.273321] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad [ 207.273332] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130 [ 207.273344] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10 [ 207.273356] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350 [ 207.273368] [<ffffffff81027557>] ? cpu_hotplug_driver_lock+0x17/0x20 [ 207.273380] [<ffffffff81499130>] cpu_down+0x40/0x60 [ 207.273391] [<ffffffff8149cc55>] store_online+0x75/0xe0 [ 207.273402] [<ffffffff813474a0>] dev_attr_store+0x20/0x30 [ 207.273413] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150 [ 207.273425] [<ffffffff8118e10b>] vfs_write+0xcb/0x130 [ 207.273436] [<ffffffff8118e924>] sys_write+0x64/0xa0 [ 207.273447] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b Which reports a false possitive deadlock because it sees: 1) load_module -> subsys_interface_register -> mc_deveice_add (*) -> subsys->p->mutex -> link_path_walk -> lookup_slow -> i_mutex 2) sys_write -> _cpu_down -> cpu_hotplug_begin -> cpu_hotplug.lock -> mce_cpu_callback -> mce_device_remove(**) -> device_unregister -> bus_remove_device -> subsys mutex 3) vfs_readdir -> i_mutex -> filldir64 -> might_fault -> might_lock_read(mmap_sem) -> page_fault -> mmap_sem -> drain_all_stock -> cpu_hotplug.lock but 1) takes cpu_subsys subsys (*) but 2) takes mce_device subsys (**) so the deadlock is not possible AFAICS. The fix is quite simple. We can pull the key inside bus_type structure because they are defined per device so the pointer will be unique as well. bus_register doesn't need to be a macro anymore so change it to the inline. We could get rid of __bus_register as there is no other caller but maybe somebody will want to use a different key so keep it around for now. Reported-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

With deferred setup for SCO, it is possible that userspace closes the socket when it is in the BT_CONNECT2 state, after the Connect Request is received but before the Accept Synchonous Connection is sent. If this happens the following crash was observed, when the connection is terminated: [ +0.000003] hci_sync_conn_complete_evt: hci0 status 0x10 [ +0.000005] sco_connect_cfm: hcon ffff88003d1bd800 bdaddr 40:98:4e:32:d7:39 status 16 [ +0.000003] sco_conn_del: hcon ffff88003d1bd800 conn ffff88003cc8e300, err 110 [ +0.000015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000199 [ +0.000906] IP: [<ffffffff810620dd>] __lock_acquire+0xed/0xe82 [ +0.000000] PGD 3d21f067 PUD 3d291067 PMD 0 [ +0.000000] Oops: 0002 [#1] SMP [ +0.000000] Modules linked in: rfcomm bnep btusb bluetooth [ +0.000000] CPU 0 [ +0.000000] Pid: 1481, comm: kworker/u:2H Not tainted 3.9.0-rc1-25019-gad82cdd #1 Bochs Bochs [ +0.000000] RIP: 0010:[<ffffffff810620dd>] [<ffffffff810620dd>] __lock_acquire+0xed/0xe82 [ +0.000000] RSP: 0018:ffff88003c3c19d8 EFLAGS: 00010002 [ +0.000000] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000000 [ +0.000000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003d1be868 [ +0.000000] RBP: ffff88003c3c1a98 R08: 0000000000000002 R09: 0000000000000000 [ +0.000000] R10: ffff88003d1be868 R11: ffff88003e20b000 R12: 0000000000000002 [ +0.000000] R13: ffff88003aaa8000 R14: 000000000000006e R15: ffff88003d1be850 [ +0.000000] FS: 0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000 [ +0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ +0.000000] CR2: 0000000000000199 CR3: 000000003c1cb000 CR4: 00000000000006b0 [ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ +0.000000] Process kworker/u:2H (pid: 1481, threadinfo ffff88003c3c0000, task ffff88003aaa8000) [ +0.000000] Stack: [ +0.000000] ffffffff81b16342 0000000000000000 0000000000000000 ffff88003d1be868 [ +0.000000] ffffffff00000000 00018c0c7863e367 000000003c3c1a28 ffffffff8101efbd [ +0.000000] 0000000000000000 ffff88003e3d2400 ffff88003c3c1a38 ffffffff81007c7a [ +0.000000] Call Trace: [ +0.000000] [<ffffffff8101efbd>] ? kvm_clock_read+0x34/0x3b [ +0.000000] [<ffffffff81007c7a>] ? paravirt_sched_clock+0x9/0xd [ +0.000000] [<ffffffff81007fd4>] ? sched_clock+0x9/0xb [ +0.000000] [<ffffffff8104fd7a>] ? sched_clock_local+0x12/0x75 [ +0.000000] [<ffffffff810632d1>] lock_acquire+0x93/0xb1 [ +0.000000] [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth] [ +0.000000] [<ffffffff8105f3d8>] ? lock_release_holdtime.part.22+0x4e/0x55 [ +0.000000] [<ffffffff814f6038>] _raw_spin_lock+0x40/0x74 [ +0.000000] [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth] [ +0.000000] [<ffffffff814f6936>] ? _raw_spin_unlock+0x23/0x36 [ +0.000000] [<ffffffffa0022339>] spin_lock+0x9/0xb [bluetooth] [ +0.000000] [<ffffffffa00230cc>] sco_conn_del+0x76/0xbb [bluetooth] [ +0.000000] [<ffffffffa002391d>] sco_connect_cfm+0x2da/0x2e9 [bluetooth] [ +0.000000] [<ffffffffa000862a>] hci_proto_connect_cfm+0x38/0x65 [bluetooth] [ +0.000000] [<ffffffffa0008d30>] hci_sync_conn_complete_evt.isra.79+0x11a/0x13e [bluetooth] [ +0.000000] [<ffffffffa000cd96>] hci_event_packet+0x153b/0x239d [bluetooth] [ +0.000000] [<ffffffff814f68ff>] ? _raw_spin_unlock_irqrestore+0x48/0x5c [ +0.000000] [<ffffffffa00025f6>] hci_rx_work+0xf3/0x2e3 [bluetooth] [ +0.000000] [<ffffffff8103efed>] process_one_work+0x1dc/0x30b [ +0.000000] [<ffffffff8103ef83>] ? process_one_work+0x172/0x30b [ +0.000000] [<ffffffff8103e07f>] ? spin_lock_irq+0x9/0xb [ +0.000000] [<ffffffff8103fc8d>] worker_thread+0x123/0x1d2 [ +0.000000] [<ffffffff8103fb6a>] ? manage_workers+0x240/0x240 [ +0.000000] [<ffffffff81044211>] kthread+0x9d/0xa5 [ +0.000000] [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60 [ +0.000000] [<ffffffff814f75bc>] ret_from_fork+0x7c/0xb0 [ +0.000000] [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60 [ +0.000000] Code: d7 44 89 8d 50 ff ff ff 4c 89 95 58 ff ff ff e8 44 fc ff ff 44 8b 8d 50 ff ff ff 48 85 c0 4c 8b 95 58 ff ff ff 0f 84 7a 04 00 00 <f0> ff 80 98 01 00 00 83 3d 25 41 a7 00 00 45 8b b5 e8 05 00 00 [ +0.000000] RIP [<ffffffff810620dd>] __lock_acquire+0xed/0xe82 [ +0.000000] RSP <ffff88003c3c19d8> [ +0.000000] CR2: 0000000000000199 [ +0.000000] ---[ end trace e73cd3b52352dd34 ]--- Cc: stable@vger.kernel.org [3.8] Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Tested-by: Frederic Dalleau <frederic.dalleau@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

We need to hand down parallel build options like the internal make --jobserver-fds one so that parallel builds can also happen when building perf from the toplevel directory. Make it so #1! Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1361374353-30385-3-git-send-email-bp@alien8.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The get_node_page_ra tries to: 1. grab or read a target node page for the given nid, 2. then, call ra_node_page to read other adjacent node pages in advance. So, when we try to read a target node page by #1, we should submit bio with READ_SYNC instead of READA. And, in #2, READA should be used. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>

Credit distribution stats is currently implemented only for SDIO. This fixes a crash in debugfs for USB interface. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<f91c2048>] read_file_credit_dist_stats+0x38/0x330 [ath6kl_core] *pde = b62bd067 Oops: 0000 [#1] SMP EIP: 0060:[<f91c2048>] EFLAGS: 00210246 CPU: 0 EIP is at read_file_credit_dist_stats+0x38/0x330 [ath6kl_core] EAX: 00000000 EBX: e6f7a9c0 ECX: e7b148b8 EDX: 00000000 ESI: 000000c8 EDI: e7b14000 EBP: e6e09f64 ESP: e6e09f30 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process cat (pid: 4058, ti=e6e08000 task=e50cf230 task.ti=e6e08000) Stack: 00008000 00000000 e6e09f64 c1132d3c 00004e71 e50cf230 00008000 089e4000 e7b148b8 00000000 e6f7a9c0 00008000 089e4000 e6e09f8c c11331fc e6e09f98 00000001 e6e09f7c f91c2010 e6e09fac e6f7a9c0 089e4877 089e4000 e6e09fac Call Trace: [<c1132d3c>] ? rw_verify_area+0x6c/0x120 [<c11331fc>] vfs_read+0x8c/0x160 [<f91c2010>] ? read_file_war_stats+0x130/0x130 [ath6kl_core] [<c113330d>] sys_read+0x3d/0x70 [<c15755b4>] syscall_call+0x7/0xb [<c1570000>] ? fill_powernow_table_pstate+0x127/0x127 Cc: Ryan Hsu <ryanhsu@qca.qualcomm.com> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

Commit 84c1754 (ext4: move work from io_end to inode) triggered a regression when running xfstest #270 when the file system is mounted with dioread_nolock. The problem is that after ext4_evict_inode() calls ext4_ioend_wait(), this guarantees that last io_end structure has been freed, but it does not guarantee that the workqueue structure, which was moved into the inode by commit 84c1754, is actually finished. Once ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this will allow ext4_ioend_wait() to return on CPU #2, at which point the evict_inode() codepath can race against the workqueue code on CPU #1 accessing EXT4_I(inode)->i_unwritten_work to find the next item of work to do. Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which will be renamed ext4_ioend_shutdown(), since it is only used by ext4_evict_inode(). Also, move the call to ext4_ioend_shutdown() until after truncate_inode_pages() and filemap_write_and_wait() are called, to make sure all dirty pages have been written back and flushed from the page cache first. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e *pdpt = 0000000030bc3001 *pde = 0000000000000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0 EIP is at cwq_activate_delayed_work+0x3b/0x7e EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000 ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000) Stack: f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0 f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000 Call Trace: [<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb [<c01def1d>] process_one_work+0x5d8/0x637 [<c040a10b>] ? ext4_end_bio+0x300/0x300 [<c01e3105>] worker_thread+0x249/0x3ef [<c01ea317>] kthread+0xd8/0xeb [<c01e2ebc>] ? manage_workers+0x4bb/0x4bb [<c023a370>] ? trace_hardirqs_on+0x27/0x37 [<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28 [<c01ea23f>] ? __init_kthread_worker+0x71/0x71 Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1 6c c1 00 31 c9 83 EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84 CR2: 0000000000000000 ---[ end trace a1923229da53d8a4 ]--- Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz>

The sched_clock_remote() implementation has the following inatomicity problem on 32bit systems when accessing the remote scd->clock, which is a 64bit value. CPU0 CPU1 sched_clock_local() sched_clock_remote(CPU0) ... remote_clock = scd[CPU0]->clock read_low32bit(scd[CPU0]->clock) cmpxchg64(scd->clock,...) read_high32bit(scd[CPU0]->clock) While the update of scd->clock is using an atomic64 mechanism, the readout on the remote cpu is not, which can cause completely bogus readouts. It is a quite rare problem, because it requires the update to hit the narrow race window between the low/high readout and the update must go across the 32bit boundary. The resulting misbehaviour is, that CPU1 will see the sched_clock on CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value to this bogus timestamp. This stays that way due to the clamping implementation for about 4 seconds until the synchronization with CLOCK_MONOTONIC undoes the problem. The issue is hard to observe, because it might only result in a less accurate SCHED_OTHER timeslicing behaviour. To create observable damage on realtime scheduling classes, it is necessary that the bogus update of CPU1 sched_clock happens in the context of an realtime thread, which then gets charged 4 seconds of RT runtime, which results in the RT throttler mechanism to trigger and prevent scheduling of RT tasks for a little less than 4 seconds. So this is quite unlikely as well. The issue was quite hard to decode as the reproduction time is between 2 days and 3 weeks and intrusive tracing makes it less likely, but the following trace recorded with trace_clock=global, which uses sched_clock_local(), gave the final hint: <idle>-0 0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80 <idle>-0 0d..30 400269.477151: hrtimer_start: hrtimer=0xf7061e80 ... irq/20-S-587 1d..32 400273.772118: sched_wakeup: comm= ... target_cpu=0 <idle>-0 0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80 What happens is that CPU0 goes idle and invokes sched_clock_idle_sleep_event() which invokes sched_clock_local() and CPU1 runs a remote wakeup for CPU0 at the same time, which invokes sched_remote_clock(). The time jump gets propagated to CPU0 via sched_remote_clock() and stays stale on both cores for ~4 seconds. There are only two other possibilities, which could cause a stale sched clock: 1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic wrong value. 2) sched_clock() which reads the TSC returns a sporadic wrong value. #1 can be excluded because sched_clock would continue to increase for one jiffy and then go stale. #2 can be excluded because it would not make the clock jump forward. It would just result in a stale sched_clock for one jiffy. After quite some brain twisting and finding the same pattern on other traces, sched_clock_remote() remained the only place which could cause such a problem and as explained above it's indeed racy on 32bit systems. So while on 64bit systems the readout is atomic, we need to verify the remote readout on 32bit machines. We need to protect the local->clock readout in sched_clock_remote() on 32bit as well because an NMI could hit between the low and the high readout, call sched_clock_local() and modify local->clock. Thanks to Siegfried Wulsch for bearing with my debug requests and going through the tedious tasks of running a bunch of reproducer systems to generate the debug information which let me decode the issue. Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org

The function tracing control loop used by perf spits out a warning if the called function is not a control function. This is because the control function references a per cpu allocated data structure on struct ftrace_ops that is not allocated for other types of functions. commit 0a01640 "ftrace: Optimize the function tracer list loop" Had an optimization done to all function tracing loops to optimize for a single registered ops. Unfortunately, this allows for a slight race when tracing starts or ends, where the stub function might be called after the current registered ops is removed. In this case we get the following dump: root# perf stat -e ftrace:function sleep 1 [ 74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0() [ 74.349522] Hardware name: PRIMERGY RX200 S6 [ 74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspk r microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm li bahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [ 74.446233] Pid: 1377, comm: perf Tainted: G W 3.9.0-rc1 #1 [ 74.453458] Call Trace: [ 74.456233] [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0 [ 74.462997] [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0 [ 74.470272] [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0 [ 74.478117] [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20 [ 74.484681] [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0 [ 74.491760] [<ffffffff8162f400>] ftrace_call+0x5/0x2f [ 74.497511] [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f [ 74.503486] [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f [ 74.509500] [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50 [ 74.516088] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40 [ 74.522268] [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50 [ 74.528837] [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0 [ 74.536696] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40 [ 74.542878] [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50 [ 74.548869] [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50 [ 74.556243] [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140 [ 74.563709] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40 [ 74.569887] [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50 [ 74.575898] [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50 [ 74.582505] [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10 [ 74.589298] [<ffffffff811295d0>] free_event+0x70/0x1a0 [ 74.595208] [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0 [ 74.602460] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40 [ 74.608667] [<ffffffff8112a640>] put_event+0x90/0xc0 [ 74.614373] [<ffffffff8112a740>] perf_release+0x10/0x20 [ 74.620367] [<ffffffff811a3044>] __fput+0xf4/0x280 [ 74.625894] [<ffffffff811a31de>] ____fput+0xe/0x10 [ 74.631387] [<ffffffff81083697>] task_work_run+0xa7/0xe0 [ 74.637452] [<ffffffff81014981>] do_notify_resume+0x71/0xb0 [ 74.643843] [<ffffffff8162fa92>] int_signal+0x12/0x17 To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end ftrace_ops stub as just that, a stub. This flag is now checked in the control loop and the function is not called if the flag is set. Thanks to Jovi for not just reporting the bug, but also pointing out where the bug was in the code. Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.com Tested-by: WANG Chao <chaowang@redhat.com> Reported-by: WANG Chao <chaowang@redhat.com> Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Since 749fefe in v3.7 ("block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()"), the following warning appears when multipath is used with CONFIG_PREEMPT=y. This patch moves blk_queue_bypass_start() before radix_tree_preload() to avoid the sleeping call while preemption is disabled. BUG: scheduling while atomic: multipath/2460/0x00000002 1 lock held by multipath/2460: #0: (&md->type_lock){......}, at: [<ffffffffa019fb05>] dm_lock_md_type+0x17/0x19 [dm_mod] Modules linked in: ... Pid: 2460, comm: multipath Tainted: G W 3.7.0-rc2 #1 Call Trace: [<ffffffff810723ae>] __schedule_bug+0x6a/0x78 [<ffffffff81428ba2>] __schedule+0xb4/0x5e0 [<ffffffff814291e6>] schedule+0x64/0x66 [<ffffffff8142773a>] schedule_timeout+0x39/0xf8 [<ffffffff8108ad5f>] ? put_lock_stats+0xe/0x29 [<ffffffff8108ae30>] ? lock_release_holdtime+0xb6/0xbb [<ffffffff814289e3>] wait_for_common+0x9d/0xee [<ffffffff8107526c>] ? try_to_wake_up+0x206/0x206 [<ffffffff810c0eb8>] ? kfree_call_rcu+0x1c/0x1c [<ffffffff81428aec>] wait_for_completion+0x1d/0x1f [<ffffffff810611f9>] wait_rcu_gp+0x5d/0x7a [<ffffffff81061216>] ? wait_rcu_gp+0x7a/0x7a [<ffffffff8106fb18>] ? complete+0x21/0x53 [<ffffffff810c0556>] synchronize_rcu+0x1e/0x20 [<ffffffff811dd903>] blk_queue_bypass_start+0x5d/0x62 [<ffffffff811ee109>] blkcg_activate_policy+0x73/0x270 [<ffffffff81130521>] ? kmem_cache_alloc_node_trace+0xc7/0x108 [<ffffffff811f04b3>] cfq_init_queue+0x80/0x28e [<ffffffffa01a1600>] ? dm_blk_ioctl+0xa7/0xa7 [dm_mod] [<ffffffff811d8c41>] elevator_init+0xe1/0x115 [<ffffffff811e229f>] ? blk_queue_make_request+0x54/0x59 [<ffffffff811dd743>] blk_init_allocated_queue+0x8c/0x9e [<ffffffffa019ffcd>] dm_setup_md_queue+0x36/0xaa [dm_mod] [<ffffffffa01a60e6>] table_load+0x1bd/0x2c8 [dm_mod] [<ffffffffa01a7026>] ctl_ioctl+0x1d6/0x236 [dm_mod] [<ffffffffa01a5f29>] ? table_clear+0xaa/0xaa [dm_mod] [<ffffffffa01a7099>] dm_ctl_ioctl+0x13/0x17 [dm_mod] [<ffffffff811479fc>] do_vfs_ioctl+0x3fb/0x441 [<ffffffff811b643c>] ? file_has_perm+0x8a/0x99 [<ffffffff81147aa0>] sys_ioctl+0x5e/0x82 [<ffffffff812010be>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff814310d9>] system_call_fastpath+0x16/0x1b Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>

do_loopback calls lock_mount(path) and forget to unlock_mount if clone_mnt or copy_mnt fails. [ 77.661566] ================================================ [ 77.662939] [ BUG: lock held when returning to user space! ] [ 77.664104] 3.9.0-rc5+ #17 Not tainted [ 77.664982] ------------------------------------------------ [ 77.666488] mount/514 is leaving the kernel with locks still held! [ 77.668027] 2 locks held by mount/514: [ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0 [ 77.671755] #1: (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops when lazy MMU updates are enabled, because set_pgd effects are being deferred. One instance of this problem is during process mm cleanup with memory cgroups enabled. The chain of events is as follows: - zap_pte_range enables lazy MMU updates - zap_pte_range eventually calls mem_cgroup_charge_statistics, which accesses the vmalloc'd mem_cgroup per-cpu stat area - vmalloc_fault is triggered which tries to sync the corresponding PGD entry with set_pgd, but the update is deferred - vmalloc_fault oopses due to a mismatch in the PUD entries The OOPs usually looks as so: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/fault.c:396! invalid opcode: 0000 [#1] SMP .. snip .. CPU 1 Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1 RIP: e030:[<ffffffff816271bf>] [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208 .. snip .. Call Trace: [<ffffffff81627759>] do_page_fault+0x399/0x4b0 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110 [<ffffffff81624065>] page_fault+0x25/0x30 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870 [<ffffffff81154962>] unmap_vmas+0x52/0xa0 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [<ffffffff81059ce3>] mmput+0x83/0xf0 [<ffffffff810624c4>] exit_mm+0x104/0x130 [<ffffffff8106264a>] do_exit+0x15a/0x8c0 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0 [<ffffffff81063177>] sys_exit_group+0x17/0x20 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the changes visible to the consistency checks. Cc: <stable@vger.kernel.org> RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737 Tested-by: Josh Boyer <jwboyer@redhat.com> Reported-and-Tested-by: Krishna Raman <kraman@redhat.com> Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com> Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

e100 uses pci_map_single, but fails to check for a dma mapping error after its use, resulting in a stack trace: [ 46.656594] ------------[ cut here ]------------ [ 46.657004] WARNING: at lib/dma-debug.c:933 check_unmap+0x47b/0x950() [ 46.657004] Hardware name: To Be Filled By O.E.M. [ 46.657004] e100 0000:00:0e.0: DMA-API: device driver failed to check map error[device address=0x000000007a4540fa] [size=90 bytes] [mapped as single] [ 46.657004] Modules linked in: [ 46.657004] w83627hf hwmon_vid snd_via82xx ppdev snd_ac97_codec ac97_bus snd_seq snd_pcm snd_mpu401 snd_mpu401_uart ns558 snd_rawmidi gameport parport_pc e100 snd_seq_device parport snd_page_alloc snd_timer snd soundcore skge shpchp k8temp mii edac_core i2c_viapro edac_mce_amd nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc uinput ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm firewire_ohci drm firewire_core pata_via sata_via i2c_core sata_promise crc_itu_t [ 46.657004] Pid: 792, comm: ip Not tainted 3.8.0-0.rc6.git0.1.fc19.x86_64 #1 [ 46.657004] Call Trace: [ 46.657004] <IRQ> [<ffffffff81065ed0>] warn_slowpath_common+0x70/0xa0 [ 46.657004] [<ffffffff81065f4c>] warn_slowpath_fmt+0x4c/0x50 [ 46.657004] [<ffffffff81364cfb>] check_unmap+0x47b/0x950 [ 46.657004] [<ffffffff8136522f>] debug_dma_unmap_page+0x5f/0x70 [ 46.657004] [<ffffffffa030f0f0>] ? e100_tx_clean+0x30/0x210 [e100] [ 46.657004] [<ffffffffa030f1a8>] e100_tx_clean+0xe8/0x210 [e100] [ 46.657004] [<ffffffffa030fc6f>] e100_poll+0x56f/0x6c0 [e100] [ 46.657004] [<ffffffff8159dce1>] ? net_rx_action+0xa1/0x370 [ 46.657004] [<ffffffff8159ddb2>] net_rx_action+0x172/0x370 [ 46.657004] [<ffffffff810703bf>] __do_softirq+0xef/0x3d0 [ 46.657004] [<ffffffff816e4ebc>] call_softirq+0x1c/0x30 [ 46.657004] [<ffffffff8101c485>] do_softirq+0x85/0xc0 [ 46.657004] [<ffffffff81070885>] irq_exit+0xd5/0xe0 [ 46.657004] [<ffffffff816e5756>] do_IRQ+0x56/0xc0 [ 46.657004] [<ffffffff816dacb2>] common_interrupt+0x72/0x72 [ 46.657004] <EOI> [<ffffffff816da1eb>] ? _raw_spin_unlock_irqrestore+0x3b/0x70 [ 46.657004] [<ffffffff816d124d>] __slab_free+0x58/0x38b [ 46.657004] [<ffffffff81214424>] ? fsnotify_clear_marks_by_inode+0x34/0x120 [ 46.657004] [<ffffffff811b0417>] ? kmem_cache_free+0x97/0x320 [ 46.657004] [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40 [ 46.657004] [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40 [ 46.657004] [<ffffffff811b0692>] kmem_cache_free+0x312/0x320 [ 46.657004] [<ffffffff8157fc14>] sock_destroy_inode+0x34/0x40 [ 46.657004] [<ffffffff811e8c28>] destroy_inode+0x38/0x60 [ 46.657004] [<ffffffff811e8d5e>] evict+0x10e/0x1a0 [ 46.657004] [<ffffffff811e9605>] iput+0xf5/0x180 [ 46.657004] [<ffffffff811e4338>] dput+0x248/0x310 [ 46.657004] [<ffffffff811ce0e1>] __fput+0x171/0x240 [ 46.657004] [<ffffffff811ce26e>] ____fput+0xe/0x10 [ 46.657004] [<ffffffff8108d54c>] task_work_run+0xac/0xe0 [ 46.657004] [<ffffffff8106c6ed>] do_exit+0x26d/0xc30 [ 46.657004] [<ffffffff8109eccc>] ? finish_task_switch+0x7c/0x120 [ 46.657004] [<ffffffff816dad58>] ? retint_swapgs+0x13/0x1b [ 46.657004] [<ffffffff8106d139>] do_group_exit+0x49/0xc0 [ 46.657004] [<ffffffff8106d1c4>] sys_exit_group+0x14/0x20 [ 46.657004] [<ffffffff816e3b19>] system_call_fastpath+0x16/0x1b [ 46.657004] ---[ end trace 4468c44e2156e7d1 ]--- [ 46.657004] Mapped at: [ 46.657004] [<ffffffff813663d1>] debug_dma_map_page+0x91/0x140 [ 46.657004] [<ffffffffa030e8eb>] e100_xmit_prepare+0x12b/0x1c0 [e100] [ 46.657004] [<ffffffffa030c924>] e100_exec_cb+0x84/0x140 [e100] [ 46.657004] [<ffffffffa030e56a>] e100_xmit_frame+0x3a/0x190 [e100] [ 46.657004] [<ffffffff8159ee89>] dev_hard_start_xmit+0x259/0x6c0 Easy fix, modify the cb paramter to e100_exec_cb to return an error, and do the dma_mapping_error check in the obvious place This was reported previously here: http://article.gmane.org/gmane.linux.network/257893 But nobody stepped up and fixed it. CC: Josh Boyer <jwboyer@redhat.com> CC: e1000-devel@lists.sourceforge.net Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Michal Jaegermann <michal@harddata.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Without this, it's possible that LTO will discard the calls to set_except_vector() in the probe for the DADDI overflow bug resulting in a kernel crash like this: [...] Mount-cache hash table entries: 256 Checking for the daddi bug... Integer overflow[#1]: Cpu 0 $ 0 : 0000000000000000 0000000010008ce1 0000000000000001 0000000000000000 $ 4 : 7fffffffffffedcd ffffffff81410000 0000000000000030 000000000000003f [...] There are other similar places in the kernel so we've just been lucky that GCC's been tolerant. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

This patch fixes bug when card is present during boot. Bug was introduced due commit "mmc: mxcmmc: fix bug that may block a data transfer forever". When a card is present "mxcmci_setup_data" function is executed, but the timer is not initialized. ... i.MX SDHC driver mmc0: SD Status: Invalid Allocation Unit size. mmc0: new SD card at address b368 mmcblk0: mmc0:b368 SDC 1.91 GiB ------------[ cut here ]------------ kernel BUG at kernel/timer.c:729! Internal error: Oops - BUG: 0 [#1] PREEMPT ARM CPU: 0 Not tainted (3.9.0-rc5-next-20130404 #2) PC is at mod_timer+0x168/0x198 LR is at mxcmci_request+0x21c/0x328 ... Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Chris Ball <cjb@laptop.org>

This patch attempts to fix: https://bugzilla.kernel.org/show_bug.cgi?id=56461 The symptom is a crash and messages like this: chrome: Corrupted page table at address 34a03000 *pdpt = 0000000000000000 *pde = 0000000000000000 Bad pagetable: 000f [#1] PREEMPT SMP Ingo guesses this got introduced by commit 611ae8e ("x86/tlb: enable tlb flush range support for x86") since that code started to free unused pagetables. On x86-32 PAE kernels, that new code has the potential to free an entire PMD page and will clear one of the four page-directory-pointer-table (aka pgd_t entries). The hardware aggressively "caches" these top-level entries and invlpg does not actually affect the CPU's copy. If we clear one we *HAVE* to do a full TLB flush, otherwise we might continue using a freed pmd page. (note, we do this properly on the population side in pud_populate()). This patch tracks whenever we clear one of these entries in the 'struct mmu_gather', and ensures that we follow up with a full tlb flush. BTW, I disassembled and checked that: if (tlb->fullmm == 0) and if (!tlb->fullmm && !tlb->need_flush_all) generate essentially the same code, so there should be zero impact there to the !PAE case. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Artem S Tashkinov <t.artem@mailcity.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

The patch provides a storage for the test results in the linked list. The gathered data could be used after test is done. The new file 'results' represents gathered data of the in progress test. The messages collected are printed to the kernel log as well. Example of output: % cat /sys/kernel/debug/dmatest/results dma0chan0-copy0: #1: No errors with src_off=0x7bf dst_off=0x8ad len=0x3fea (0) The message format is unified across the different types of errors. A number in the parens represents additional information, e.g. error code, error counter, or status. Note that the buffer comparison is done in the old way, i.e. data is not collected and just printed out. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>

…line. In the PVHVM path when we do CPU online/offline path we would leak the timer%d IRQ line everytime we do a offline event. The online path (xen_hvm_setup_cpu_clockevents via x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt line for the timer%d. But we would still use the old interrupt line leading to: kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261! invalid opcode: 0000 [#1] SMP RIP: 0010:[<ffffffff810b9e21>] [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270 .. snip.. <IRQ> [<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0 [<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0 [<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240 [<ffffffff811175b9>] handle_percpu_irq+0x49/0x70 [<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0 [<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40 [<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80 <EOI> [<ffffffff81666d01>] ? start_secondary+0x193/0x1a8 [<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8 There is also the oddity (timer1) in the /proc/interrupts after offlining CPU1: 64: 1121 0 xen-percpu-virq timer0 78: 0 0 xen-percpu-virq timer1 84: 0 2483 xen-percpu-virq timer2 This patch fixes it. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: stable@vger.kernel.org

…y CPU online/offline While we don't use the spinlock interrupt line (see for details commit f10cd52 - xen: disable PV spinlocks on HVM) - we should still do the proper init / deinit sequence. We did not do that correctly and for the CPU init for PVHVM guest we would allocate an interrupt line - but failed to deallocate the old interrupt line. This resulted in leakage of an irq_desc but more importantly this splat as we online an offlined CPU: genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1) Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1 Call Trace: [<ffffffff811156de>] __setup_irq+0x23e/0x4a0 [<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250 [<ffffffff811161bb>] request_threaded_irq+0xfb/0x160 [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20 [<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160 [<ffffffff81303758>] ? kasprintf+0x38/0x40 [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20 [<ffffffff810cad35>] ? update_max_interval+0x15/0x40 [<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78 [<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33 [<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70 [<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10 [<ffffffff8109402b>] __cpu_notify+0x1b/0x30 [<ffffffff8166834a>] _cpu_up+0xa0/0x14b [<ffffffff816684ce>] cpu_up+0xd9/0xec [<ffffffff8165f754>] store_online+0x94/0xd0 [<ffffffff8141d15b>] dev_attr_store+0x1b/0x20 [<ffffffff81218f44>] sysfs_write_file+0xf4/0x170 [<ffffffff811a2864>] vfs_write+0xb4/0x130 [<ffffffff811a302a>] sys_write+0x5a/0xa0 [<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b cpu 1 spinlock event irq -16 smpboot: Booting Node 0 Processor 1 APIC 0x2 And if one looks at the /proc/interrupts right after offlining (CPU1): 70: 0 0 xen-percpu-ipi spinlock0 71: 0 0 xen-percpu-ipi spinlock1 77: 0 0 xen-percpu-ipi spinlock2 There is the oddity of the 'spinlock1' still being present. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

When we online the CPU, we get this splat: smpboot: Booting Node 0 Processor 1 APIC 0x2 installing Xen timer for CPU 1 BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1 Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1 Call Trace: [<ffffffff810c1fea>] __might_sleep+0xda/0x100 [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0 [<ffffffff81303758>] ? kasprintf+0x38/0x40 [<ffffffff813036eb>] kvasprintf+0x5b/0x90 [<ffffffff81303758>] kasprintf+0x38/0x40 [<ffffffff81044510>] xen_setup_timer+0x30/0xb0 [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30 [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8 The solution to that is use kasprintf in the CPU hotplug path that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify, and remove the call to in xen_hvm_setup_cpu_clockevents. Unfortunatly the later is not a good idea as the bootup path does not use xen_hvm_cpu_notify so we would end up never allocating timer%d interrupt lines when booting. As such add the check for atomic() to continue. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Commit c079c28 ("RDMA/cxgb4: Fix error handling in create_qp()") broke SQ allocation. Instead of falling back to host allocation when on-chip allocation fails, it tries to allocate both. And when it does, and we try to free the address from the genpool using the host address, we hit a BUG and the system crashes as below. We create a new function that has the previous behavior and properly propagate the error, as intended. kernel BUG at /usr/src/packages/BUILD/kernel-ppc64-3.0.68/linux-3.0/lib/genalloc.c:340! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA pSeries Modules linked in: rdma_ucm rdma_cm ib_addr ib_cm iw_cm ib_sa ib_mad ib_uverbs iw_cxgb4 ib_core ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables fuse loop dm_mod ipv6 ipv6_lib sr_mod cdrom ibmveth(X) cxgb4 sg ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh ibmvscsic(X) scsi_transport_srp scsi_tgt scsi_mod Supported: Yes NIP: c00000000037d41c LR: d000000003913824 CTR: c00000000037d3b0 REGS: c0000001f350ae50 TRAP: 0700 Tainted: G X (3.0.68-0.9-ppc64) MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 24042482 XER: 00000001 TASK = c0000001f6f2a840[3616] 'rping' THREAD: c0000001f3508000 CPU: 0 GPR00: c0000001f6e875c8 c0000001f350b0d0 c000000000fc9690 c0000001f6e875c0 GPR04: 00000000000c0000 0000000000010000 0000000000000000 c0000000009d482a GPR08: 000000006a170000 0000000000100000 c0000001f350b140 c0000001f6e875c8 GPR12: d000000003915dd0 c000000003f40000 000000003e3ecfa8 c0000001f350bea0 GPR16: c0000001f350bcd0 00000000003c0000 0000000000040100 c0000001f6e74a80 GPR20: d00000000399a898 c0000001f6e74ac8 c0000001fad91600 c0000001f6e74ab0 GPR24: c0000001f7d23f80 0000000000000000 0000000000000002 000000006a170000 GPR28: 000000000000000c c0000001f584c8d0 d000000003925180 c0000001f6e875c8 NIP [c00000000037d41c] .gen_pool_free+0x6c/0xf8 LR [d000000003913824] .c4iw_ocqp_pool_free+0x8c/0xd8 [iw_cxgb4] Call Trace: [c0000001f350b0d0] [c0000001f350b180] 0xc0000001f350b180 (unreliable) [c0000001f350b170] [d000000003913824] .c4iw_ocqp_pool_free+0x8c/0xd8 [iw_cxgb4] [c0000001f350b210] [d00000000390fd70] .dealloc_sq+0x90/0xb0 [iw_cxgb4] [c0000001f350b280] [d00000000390fe08] .destroy_qp+0x78/0xf8 [iw_cxgb4] [c0000001f350b310] [d000000003912738] .c4iw_destroy_qp+0x208/0x2d0 [iw_cxgb4] [c0000001f350b460] [d000000003861874] .ib_destroy_qp+0x5c/0x130 [ib_core] [c0000001f350b510] [d0000000039911bc] .ib_uverbs_cleanup_ucontext+0x174/0x4f8 [ib_uverbs] [c0000001f350b5f0] [d000000003991568] .ib_uverbs_close+0x28/0x70 [ib_uverbs] [c0000001f350b670] [c0000000001e7b2c] .__fput+0xdc/0x278 [c0000001f350b720] [c0000000001a9590] .remove_vma+0x68/0xd8 [c0000001f350b7b0] [c0000000001a9720] .exit_mmap+0x120/0x160 [c0000001f350b8d0] [c0000000000af330] .mmput+0x80/0x160 [c0000001f350b960] [c0000000000b5d0c] .exit_mm+0x1ac/0x1e8 [c0000001f350ba10] [c0000000000b8154] .do_exit+0x1b4/0x4b8 [c0000001f350bad0] [c0000000000b84b0] .do_group_exit+0x58/0xf8 [c0000001f350bb60] [c0000000000ce9f4] .get_signal_to_deliver+0x2f4/0x5d0 [c0000001f350bc60] [c000000000017ee4] .do_signal_pending+0x6c/0x3e0 [c0000001f350bdb0] [c0000000000182cc] .do_signal+0x74/0x78 [c0000001f350be30] [c000000000009e74] do_work+0x24/0x28 Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: Emil Goode <emilgoode@gmail.com> Cc: <stable@vger.kernel.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>

Currently IOP3XX_PERIPHERAL_VIRT_BASE conflicts with PCI_IO_VIRT_BASE: address size PCI_IO_VIRT_BASE 0xfee00000 0x200000 IOP3XX_PERIPHERAL_VIRT_BASE 0xfeffe000 0x2000 Fix by moving IOP3XX_PERIPHERAL_VIRT_BASE below PCI_IO_VIRT_BASE. The patch fixes the following kernel panic with 3.9-rc1 on iop3xx boards: [ 0.000000] Booting Linux on physical CPU 0x0 [ 0.000000] Initializing cgroup subsys cpu [ 0.000000] Linux version 3.9.0-rc1-iop32x (aaro@blackmetal) (gcc version 4.7.2 (GCC) ) #20 PREEMPT Tue Mar 5 16:44:36 EET 2013 [ 0.000000] bootconsole [earlycon0] enabled [ 0.000000] ------------[ cut here ]------------ [ 0.000000] kernel BUG at mm/vmalloc.c:1145! [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 Not tainted (3.9.0-rc1-iop32x #20) [ 0.000000] PC is at vm_area_add_early+0x4c/0x88 [ 0.000000] LR is at add_static_vm_early+0x14/0x68 [ 0.000000] pc : [<c03e74a8>] lr : [<c03e1c40>] psr: 800000d3 [ 0.000000] sp : c03ffee4 ip : dfffdf88 fp : c03ffef4 [ 0.000000] r10: 00000002 r9 : 000000cf r8 : 00000653 [ 0.000000] r7 : c040eca8 r6 : c03e2408 r5 : dfffdf60 r4 : 00200000 [ 0.000000] r3 : dfffdfd8 r2 : feffe000 r1 : ff000000 r0 : dfffdf60 [ 0.000000] Flags: Nzcv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel [ 0.000000] Control: 0000397f Table: a0004000 DAC: 00000017 [ 0.000000] Process swapper (pid: 0, stack limit = 0xc03fe1b8) [ 0.000000] Stack: (0xc03ffee4 to 0xc0400000) [ 0.000000] fee0: 00200000 c03fff0c c03ffef8 c03e1c40 c03e7468 00200000 fee00000 [ 0.000000] ff00: c03fff2c c03fff10 c03e23e4 c03e1c38 feffe000 c0408ee4 ff000000 c0408f04 [ 0.000000] ff20: c03fff3c c03fff30 c03e2434 c03e23b4 c03fff84 c03fff40 c03e2c94 c03e2414 [ 0.000000] ff40: c03f8878 c03f6410 ffff0000 000bffff 00001000 00000008 c03fff84 c03f6410 [ 0.000000] ff60: c04227e8 c03fffd4 a0008000 c03f8878 69052e30 c02f96eb c03fffbc c03fff88 [ 0.000000] ff80: c03e044c c03e268c 00000000 0000397f c0385130 00000001 ffffffff c03f8874 [ 0.000000] ffa0: dfffffff a0004000 69052e30 a03f61a0 c03ffff4 c03fffc0 c03dd5cc c03e0184 [ 0.000000] ffc0: 00000000 00000000 00000000 00000000 00000000 c03f8878 0000397d c040601c [ 0.000000] ffe0: c03f8874 c0408674 00000000 c03ffff8 a0008040 c03dd558 00000000 00000000 [ 0.000000] Backtrace: [ 0.000000] [<c03e745c>] (vm_area_add_early+0x0/0x88) from [<c03e1c40>] (add_static_vm_early+0x14/0x68) Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

This patch fixes the following oops, which could be trigged by build the kernel with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC. hpte_insert() might return -1, indicating that the bucket (primary here) is full. We are not necessarily reporting a BUG in this case. Instead, we could try repeatedly (try secondary, remove and try again) until we find a slot. [ 543.075675] ------------[ cut here ]------------ [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239! [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1] [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [ 543.075741] Modules linked in: binfmt_misc ehea [ 543.075759] NIP: c000000000036eb0 LR: c000000000036ea4 CTR: c00000000005a594 [ 543.075771] REGS: c0000000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222) [ 543.075781] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 22224482 XER: 00000000 [ 543.075816] SOFTE: 0 [ 543.075823] CFAR: c00000000004c200 [ 543.075830] TASK = c0000000e506b750[23934] 'cc1' THREAD: c0000000a9080000 CPU: 1 GPR00: 0000000000000001 c0000000a9083540 c000000000c600a8 ffffffffffffffff GPR04: 0000000000000050 fffffffffffffffa c0000000a90834e0 00000000004ff594 GPR08: 0000000000000001 0000000000000000 000000009592d4d8 c000000000c86854 GPR12: 0000000000000002 c000000006ead300 0000000000a51000 0000000000000001 GPR16: f000000003354380 ffffffffffffffff ffffffffffffff80 0000000000000000 GPR20: 0000000000000001 c000000000c600a8 0000000000000001 0000000000000001 GPR24: 0000000003354380 c000000000000000 0000000000000000 c000000000b65950 GPR28: 0000002000000000 00000000000cd50e 0000000000bf50d9 c000000000c7c230 [ 543.076005] NIP [c000000000036eb0] .kernel_map_pages+0x1e0/0x3f8 [ 543.076016] LR [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 [ 543.076025] Call Trace: [ 543.076033] [c0000000a9083540] [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable) [ 543.076053] [c0000000a9083640] [c000000000167638] .get_page_from_freelist+0x6cc/0x8dc [ 543.076067] [c0000000a9083800] [c000000000167a48] .__alloc_pages_nodemask+0x200/0x96c [ 543.076082] [c0000000a90839c0] [c0000000001ade44] .alloc_pages_vma+0x160/0x1e4 [ 543.076098] [c0000000a9083a80] [c00000000018ce04] .handle_pte_fault+0x1b0/0x7e8 [ 543.076113] [c0000000a9083b50] [c00000000018d5a8] .handle_mm_fault+0x16c/0x1a0 [ 543.076129] [c0000000a9083c00] [c0000000007bf1dc] .do_page_fault+0x4d0/0x7a4 [ 543.076144] [c0000000a9083e30] [c0000000000090e8] handle_page_fault+0x10/0x30 [ 543.076155] Instruction dump: [ 543.076163] 7c630038 78631d88 e80a0000 f8410028 7c0903a6 e91f01de e96a0010 e84a0008 [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b000000> 7f63db78 48785781 60000000 [ 543.076224] ---[ end trace bd5807e8d6ae186b ]--- Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

During shutdown it's possible for __dev_close() (which holds rtnl_lock) to clear the __LINK_STATE_START bit, and for ixgbe to then read that bit (without holding rtnl_lock), and then not fail to free irqs, etc. The result is a crash like this: ------------[ cut here ]------------ kernel BUG at drivers/pci/msi.c:313! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 1 Pid: 5910, comm: reboot Tainted: P ---------------- 2.6.32 #1 empty RIP: 0010:[<ffffffff81305c2b>] [<ffffffff81305c2b>] free_msi_irqs+0x11b/0x130 RSP: 0018:ffff880185c9bc88 EFLAGS: 00010282 RAX: ffff880219f58bc0 RBX: ffff88021ac53b00 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000246 RDI: 000000000000004a RBP: ffff880185c9bcc8 R08: 0000000000000002 R09: 0000000000000106 R10: 0000000000000000 R11: 0000000000000006 R12: ffff88021e524778 R13: 0000000000000001 R14: ffff88021e524000 R15: 0000000000000000 FS: 00007f90821b7700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f90818bd010 CR3: 0000000132c64000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process reboot (pid: 5910, threadinfo ffff880185c9a000, task ffff88021bf04a80) Stack: ffff880185c9bc98 000000018130529d ffff880185c9bcc8 ffff88021e524000 <0> 0000000000000004 ffff88021948c700 0000000000000000 ffff880185c9bda7 <0> ffff880185c9bce8 ffffffff81305cbd ffff880185c9bce8 ffff88021948c700 Call Trace: [<ffffffff81305cbd>] pci_disable_msix+0x3d/0x50 [<ffffffffa00501d5>] ixgbe_reset_interrupt_capability+0x65/0x90 [ixgbe] [<ffffffffa00512f6>] ixgbe_clear_interrupt_scheme+0xb6/0xd0 [ixgbe] [<ffffffffa005330b>] __ixgbe_shutdown+0x5b/0x200 [ixgbe] [<ffffffffa00534ca>] ixgbe_shutdown+0x1a/0x60 [ixgbe] [<ffffffff812f6c7c>] pci_device_shutdown+0x2c/0x50 [<ffffffff813727fb>] device_shutdown+0x4b/0x160 [<ffffffff8107d98c>] kernel_restart_prepare+0x2c/0x40 ehci timer_action, mod_timer io_watchdog [<ffffffff8107d9e6>] kernel_restart+0x16/0x60 [<ffffffff8107dbfd>] sys_reboot+0x1ad/0x200 [<ffffffff811676cf>] ? __d_free+0x3f/0x60 [<ffffffff81167748>] ? d_free+0x58/0x60 [<ffffffff8116f7c0>] ? mntput_no_expire+0x30/0x100 [<ffffffff81152b11>] ? __fput+0x191/0x200 [<ffffffff816565fe>] ? do_page_fault+0x3e/0xa0 [<ffffffff8100b132>] system_call_fastpath+0x16/0x1b Code: 4c 89 ef e8 98 8c e3 ff 4d 39 f4 48 8b 43 10 75 cf 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f c9 c3 49 8b 7d 20 e8 07 5a d3 ff eb c9 <0f> 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 ehci timer_action, mod_timer io_watchdog RIP [<ffffffff81305c2b>] free_msi_irqs+0x11b/0x130 RSP <ffff880185c9bc88> ---[ end trace 27de882a0fe75593 ]--- (This was seen on a pretty old kernel/driver, but looks like the same bug is still possible.) Signed-off-by: <akepner@riverbed.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

The following RCU splat indicates lack of RCU protection: [ 953.267649] =============================== [ 953.267652] [ INFO: suspicious RCU usage. ] [ 953.267657] 3.9.0-0.rc6.git2.4.fc19.ppc64p7 #1 Not tainted [ 953.267661] ------------------------------- [ 953.267664] include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage! [ 953.267669] [ 953.267669] other info that might help us debug this: [ 953.267669] [ 953.267675] [ 953.267675] rcu_scheduler_active = 1, debug_locks = 0 [ 953.267680] 1 lock held by glxgears/1289: [ 953.267683] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<c00000000027f884>] .prepare_bprm_creds+0x34/0xa0 [ 953.267700] [ 953.267700] stack backtrace: [ 953.267704] Call Trace: [ 953.267709] [c0000001f0d1b6e0] [c000000000016e30] .show_stack+0x130/0x200 (unreliable) [ 953.267717] [c0000001f0d1b7b0] [c0000000001267f8] .lockdep_rcu_suspicious+0x138/0x180 [ 953.267724] [c0000001f0d1b840] [c0000000001d43a4] .perf_event_comm+0x4c4/0x690 [ 953.267731] [c0000001f0d1b950] [c00000000027f6e4] .set_task_comm+0x84/0x1f0 [ 953.267737] [c0000001f0d1b9f0] [c000000000280414] .setup_new_exec+0x94/0x220 [ 953.267744] [c0000001f0d1ba70] [c0000000002f665c] .load_elf_binary+0x58c/0x19b0 ... This commit therefore adds the required RCU read-side critical section to perf_event_comm(). Reported-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: a.p.zijlstra@chello.nl Cc: paulus@samba.org Cc: acme@ghostprotocols.net Link: http://lkml.kernel.org/r/20130419190124.GA8638@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Gustavo Luiz Duarte <gusld@br.ibm.com>

Like all the other drivers, let's initialize the structure a compile time instead of init time. The states #1 and #2 are not enabled by default. The init function will check the features of the board in order to enable the state. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

…failed This bug could be triggered if 1st interface configuration fails: Apr 8 18:20:30 homeserver kernel: usb 5-1: new low-speed USB device number 2 using ohci_hcd Apr 8 18:20:30 homeserver kernel: input: iMON Panel, Knob and Mouse(15c2:0036) as /devices/pci0000:00/0000:00:13.0/usb5/5-1/5-1:1.0/input/input2 Apr 8 18:20:30 homeserver kernel: Registered IR keymap rc-imon-pad Apr 8 18:20:30 homeserver kernel: input: iMON Remote (15c2:0036) as /devices/pci0000:00/0000:00:13.0/usb5/5-1/5-1:1.0/rc/rc0/input3 Apr 8 18:20:30 homeserver kernel: rc0: iMON Remote (15c2:0036) as /devices/pci0000:00/0000:00:13.0/usb5/5-1/5-1:1.0/rc/rc0 Apr 8 18:20:30 homeserver kernel: imon:send_packet: packet tx failed (-32) Apr 8 18:20:30 homeserver kernel: imon 5-1:1.0: remote input dev register failed Apr 8 18:20:30 homeserver kernel: imon 5-1:1.0: imon_init_intf0: rc device setup failed Apr 8 18:20:30 homeserver kernel: imon 5-1:1.0: unable to initialize intf0, err 0 Apr 8 18:20:30 homeserver kernel: imon:imon_probe: failed to initialize context! Apr 8 18:20:30 homeserver kernel: imon 5-1:1.0: unable to register, err -19 Apr 8 18:20:30 homeserver kernel: BUG: unable to handle kernel NULL pointer dereference at 00000014 Apr 8 18:20:30 homeserver kernel: IP: [<c05c4e4c>] mutex_lock+0xc/0x30 Apr 8 18:20:30 homeserver kernel: *pde = 00000000 Apr 8 18:20:30 homeserver kernel: Oops: 0002 [#1] PREEMPT SMP Apr 8 18:20:30 homeserver kernel: Modules linked in: Apr 8 18:20:30 homeserver kernel: Pid: 367, comm: khubd Not tainted 3.8.3-htpc-00002-g79b1403 #23 Unknow Unknow/RS780-SB700 Apr 8 18:20:30 homeserver kernel: EIP: 0060:[<c05c4e4c>] EFLAGS: 00010296 CPU: 1 Apr 8 18:20:30 homeserver kernel: EIP is at mutex_lock+0xc/0x30 Apr 8 18:20:30 homeserver kernel: EAX: 00000014 EBX: 00000014 ECX: 00000000 EDX: f590e480 Apr 8 18:20:30 homeserver kernel: ESI: f5deac00 EDI: f590e480 EBP: f5f3ee00 ESP: f6577c28 Apr 8 18:20:30 homeserver kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Apr 8 18:20:30 homeserver kernel: CR0: 8005003b CR2: 00000014 CR3: 0081b000 CR4: 000007d0 Apr 8 18:20:30 homeserver kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Apr 8 18:20:30 homeserver kernel: DR6: ffff0ff0 DR7: 00000400 Apr 8 18:20:30 homeserver kernel: Process khubd (pid: 367, ti=f6576000 task=f649ea00 task.ti=f6576000) Apr 8 18:20:30 homeserver kernel: Stack: Apr 8 18:20:30 homeserver kernel: 00000000 f5deac00 c0448de4 f59714c0 f5deac64 c03b8ad2 f6577c90 00000004 Apr 8 18:20:30 homeserver kernel: f649ea00 c0205142 f6779820 a1ff7f08 f5deac00 00000001 f5f3ee1c 00000014 Apr 8 18:20:30 homeserver kernel: 00000004 00000202 15c20036 c07a03e8 fffee0ca f6795c00 f5f3ee1c f5deac00 Apr 8 18:20:30 homeserver kernel: Call Trace: Apr 8 18:20:30 homeserver kernel: [<c0448de4>] ? imon_probe+0x494/0xde0 Apr 8 18:20:30 homeserver kernel: [<c03b8ad2>] ? rpm_resume+0xb2/0x4f0 Apr 8 18:20:30 homeserver kernel: [<c0205142>] ? sysfs_addrm_finish+0x12/0x90 Apr 8 18:20:30 homeserver kernel: [<c04170e9>] ? usb_probe_interface+0x169/0x240 Apr 8 18:20:30 homeserver kernel: [<c03b0ca0>] ? __driver_attach+0x80/0x80 Apr 8 18:20:30 homeserver kernel: [<c03b0ca0>] ? __driver_attach+0x80/0x80 Apr 8 18:20:30 homeserver kernel: [<c03b0a94>] ? driver_probe_device+0x54/0x1e0 Apr 8 18:20:30 homeserver kernel: [<c0416abe>] ? usb_device_match+0x4e/0x80 Apr 8 18:20:30 homeserver kernel: [<c03af314>] ? bus_for_each_drv+0x34/0x70 Apr 8 18:20:30 homeserver kernel: [<c03b0a0b>] ? device_attach+0x7b/0x90 Apr 8 18:20:30 homeserver kernel: [<c03b0ca0>] ? __driver_attach+0x80/0x80 Apr 8 18:20:30 homeserver kernel: [<c03b00ff>] ? bus_probe_device+0x5f/0x80 Apr 8 18:20:30 homeserver kernel: [<c03aeab7>] ? device_add+0x567/0x610 Apr 8 18:20:30 homeserver kernel: [<c041a7bc>] ? usb_create_ep_devs+0x7c/0xd0 Apr 8 18:20:30 homeserver kernel: [<c0413837>] ? create_intf_ep_devs+0x47/0x70 Apr 8 18:20:30 homeserver kernel: [<c04156c4>] ? usb_set_configuration+0x454/0x750 Apr 8 18:20:30 homeserver kernel: [<c03b0ca0>] ? __driver_attach+0x80/0x80 Apr 8 18:20:30 homeserver kernel: [<c041de8a>] ? generic_probe+0x2a/0x80 Apr 8 18:20:30 homeserver kernel: [<c03b0ca0>] ? __driver_attach+0x80/0x80 Apr 8 18:20:30 homeserver kernel: [<c0205aff>] ? sysfs_create_link+0xf/0x20 Apr 8 18:20:30 homeserver kernel: [<c04171db>] ? usb_probe_device+0x1b/0x40 Apr 8 18:20:30 homeserver kernel: [<c03b0a94>] ? driver_probe_device+0x54/0x1e0 Apr 8 18:20:30 homeserver kernel: [<c03af314>] ? bus_for_each_drv+0x34/0x70 Apr 8 18:20:30 homeserver kernel: [<c03b0a0b>] ? device_attach+0x7b/0x90 Apr 8 18:20:30 homeserver kernel: [<c03b0ca0>] ? __driver_attach+0x80/0x80 Apr 8 18:20:30 homeserver kernel: [<c03b00ff>] ? bus_probe_device+0x5f/0x80 Apr 8 18:20:30 homeserver kernel: [<c03aeab7>] ? device_add+0x567/0x610 Apr 8 18:20:30 homeserver kernel: [<c040e6df>] ? usb_new_device+0x12f/0x1e0 Apr 8 18:20:30 homeserver kernel: [<c040f4d8>] ? hub_thread+0x458/0x1230 Apr 8 18:20:30 homeserver kernel: [<c015554f>] ? dequeue_task_fair+0x9f/0xc0 Apr 8 18:20:30 homeserver kernel: [<c0131312>] ? release_task+0x1d2/0x330 Apr 8 18:20:30 homeserver kernel: [<c01477b0>] ? abort_exclusive_wait+0x90/0x90 Apr 8 18:20:30 homeserver kernel: [<c040f080>] ? usb_remote_wakeup+0x40/0x40 Apr 8 18:20:30 homeserver kernel: [<c0146ed2>] ? kthread+0x92/0xa0 Apr 8 18:20:30 homeserver kernel: [<c05c7877>] ? ret_from_kernel_thread+0x1b/0x28 Apr 8 18:20:30 homeserver kernel: [<c0146e40>] ? kthread_freezable_should_stop+0x50/0x50 Apr 8 18:20:30 homeserver kernel: Code: 89 04 24 89 f0 e8 05 ff ff ff 8b 5c 24 24 8b 74 24 28 8b 7c 24 2c 8b 6c 24 30 83 c4 34 c3 00 83 ec 08 89 1c 24 89 74 24 04 89 c3 <f0> ff 08 79 05 e8 ca 03 00 00 64 a1 70 d6 80 c0 8b 74 24 04 89 Apr 8 18:20:30 homeserver kernel: EIP: [<c05c4e4c>] mutex_lock+0xc/0x30 SS:ESP 0068:f6577c28 Apr 8 18:20:30 homeserver kernel: CR2: 0000000000000014 Apr 8 18:20:30 homeserver kernel: ---[ end trace df134132c967205c ]--- Signed-off-by: Kevin Baradon <kevin.baradon@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

The job ring init function creates a platform device for each job ring. While the job ring is shutdown, e.g. while caam module removal, its platform device was not being removed. This leads to failure while reinsertion and then removal of caam module second time. The following kernel crash dump appears when caam module is reinserted and then removed again. This patch fixes it. root@p4080ds:~# rmmod caam.ko Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xf94aca18 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=8 P4080 DS Modules linked in: caam(-) qoriq_dbg(O) [last unloaded: caam] NIP: f94aca18 LR: f94aca18 CTR: c029f950 REGS: eac47d60 TRAP: 0300 Tainted: G O (3.8.4-rt2) MSR: 00029002 <CE,EE,ME> CR: 22022484 XER: 20000000 DEAR: 00000008, ESR: 00000000 TASK = e49dfaf0[2110] 'rmmod' THREAD: eac46000 CPU: 1 GPR00: f94ad3f4 eac47e10 e49dfaf0 00000000 00000005 ea2ac210 ffffffff 00000000 GPR08: c286de68 e4977ce0 c029b1c0 00000001 c029f950 10029738 00000000 100e0000 GPR16: 00000000 10023d00 1000cbdc 1000cb8c 1000cbb8 00000000 c07dfecc 00000000 GPR24: c07e0000 00000000 1000cbd8 f94e0000 ffffffff 00000000 ea53cd40 00000000 NIP [f94aca18] caam_reset_hw_jr+0x18/0x1c0 [caam] LR [f94aca18] caam_reset_hw_jr+0x18/0x1c0 [caam] Call Trace: [eac47e10] [eac47e30] 0xeac47e30 (unreliable) [eac47e20] [f94ad3f4] caam_jr_shutdown+0x34/0x220 [caam] [eac47e60] [f94ac0e4] caam_remove+0x54/0xb0 [caam] [eac47e80] [c029fb38] __device_release_driver+0x68/0x120 [eac47e90] [c02a05c8] driver_detach+0xd8/0xe0 [eac47eb0] [c029f8e0] bus_remove_driver+0xa0/0x110 [eac47ed0] [c00768e4] sys_delete_module+0x144/0x270 [eac47f40] [c000e2f0] ret_from_syscall+0x0/0x3c Signed-off-by: Vakul Garg <vakul@freescale.com> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Reviewed-by: Horia Geanta <horia.geanta@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

rebuild_sched_domains() might pass doms with offlined cpu to partition_sched_domains(), which results in an oops: general protection fault: 0000 [#1] SMP ... RIP: 0010:[<ffffffff81077a1e>] [<ffffffff81077a1e>] get_group+0x6e/0x90 ... Call Trace: [<ffffffff8107f07c>] build_sched_domains+0x70c/0xcb0 [<ffffffff8107f2a7>] ? build_sched_domains+0x937/0xcb0 [<ffffffff81173f64>] ? kfree+0xe4/0x1b0 [<ffffffff8107f6e0>] ? partition_sched_domains+0xc0/0x470 [<ffffffff8107f905>] partition_sched_domains+0x2e5/0x470 [<ffffffff8107f6e0>] ? partition_sched_domains+0xc0/0x470 [<ffffffff810c9007>] ? generate_sched_domains+0xc7/0x530 [<ffffffff810c94a8>] rebuild_sched_domains_locked+0x38/0x70 [<ffffffff810cb4a4>] cpuset_write_resmask+0x1a4/0x500 [<ffffffff810c8700>] ? cpuset_mount+0xe0/0xe0 [<ffffffff810c7f50>] ? cpuset_read_u64+0x100/0x100 [<ffffffff810be890>] ? cgroup_iter_next+0x90/0x90 [<ffffffff810cb300>] ? cpuset_css_offline+0x70/0x70 [<ffffffff810c1a73>] cgroup_file_write+0x133/0x2e0 [<ffffffff8118995b>] vfs_write+0xcb/0x130 [<ffffffff8118a174>] sys_write+0x64/0xa0 Reported-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>

em28xx is oopsing with some DVB devices: [10856.061884] general protection fault: 0000 [#1] SMP [10856.067041] Modules linked in: rc_hauppauge em28xx_rc xc5000 drxk em28xx_dvb dvb_core em28xx videobuf2_vmalloc videobuf2_memops videobuf2_core rc_pixelview_new tuner_xc2028 tuner cx8800 cx88xx tveeprom btcx_risc videobuf_dma_sg videobuf_core rc_core v4l2_common videodev ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM be2iscsi iscsi_boot_sysfs iptable_mangle bnx2i cnic uio cxgb4i cxgb4 tun bridge cxgb3i cxgb3 stp ip6t_REJECT mdio libcxgbi nf_conntrack_ipv6 llc nf_defrag_ipv6 ib_iser rdma_cm ib_addr xt_conntrack iw_cm ib_cm ib_sa nf_conntrack ib_mad ib_core bnep bluetooth iscsi_tcp libiscsi_tcp ip6table_filter libiscsi ip6_tables scsi_transport_iscsi xfs libcrc32c snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm tg3 snd_page_alloc snd_timer [10856.139176] snd ptp iTCO_wdt soundcore pps_core iTCO_vendor_support lpc_ich mfd_core coretemp nfsd hp_wmi crc32c_intel microcode serio_raw rfkill sparse_keymap nfs_acl lockd sunrpc kvm_intel kvm uinput binfmt_misc firewire_ohci nouveau mxm_wmi i2c_algo_bit drm_kms_helper firewire_core crc_itu_t ttm drm i2c_core wmi [last unloaded: dib0070] [10856.168969] CPU 1 [10856.170799] Pid: 13606, comm: dvbv5-zap Not tainted 3.9.0-rc5+ #26 Hewlett-Packard HP Z400 Workstation/0AE4h [10856.181187] RIP: 0010:[<ffffffffa0459e47>] [<ffffffffa0459e47>] em28xx_write_regs_req+0x37/0x1c0 [em28xx] [10856.191028] RSP: 0018:ffff880118401a58 EFLAGS: 00010282 [10856.196533] RAX: 00020000012d0000 RBX: ffff88010804aec8 RCX: ffff880118401b14 [10856.203852] RDX: 0000000000000048 RSI: 0000000000000000 RDI: ffff88010804aec8 [10856.211174] RBP: ffff880118401ac8 R08: 0000000000000001 R09: 0000000000000000 [10856.218496] R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000048 [10856.226026] R13: ffff880118401b14 R14: ffff88011752b258 R15: ffff88011752b258 [10856.233352] FS: 00007f26636d2740(0000) GS:ffff88011fc20000(0000) knlGS:0000000000000000 [10856.241626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10856.247565] CR2: 00007f2663716e20 CR3: 00000000c7eb1000 CR4: 00000000000007e0 [10856.254889] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [10856.262215] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [10856.269542] Process dvbv5-zap (pid: 13606, threadinfo ffff880118400000, task ffff8800cd625d40) [10856.278340] Stack: [10856.280564] ffff88011ffe8de8 0000000000000002 0000000000000000 ffff88011ffe9b00 [10856.288191] ffff880118401b14 00ff88011ffe9b08 ffff880100000048 ffffffff8112a52a [10856.295893] 0000000000000001 ffff88010804aec8 0000000000000048 ffff880118401b14 [10856.303521] Call Trace: [10856.306182] [<ffffffff8112a52a>] ? __alloc_pages_nodemask+0x15a/0x960 [10856.312912] [<ffffffffa045a002>] em28xx_write_regs+0x32/0xa0 [em28xx] [10856.319638] [<ffffffffa045a221>] em28xx_write_reg+0x21/0x30 [em28xx] [10856.326279] [<ffffffffa045a2cc>] em28xx_gpio_set+0x9c/0x100 [em28xx] [10856.332919] [<ffffffffa045a3ac>] em28xx_set_mode+0x7c/0x80 [em28xx] [10856.339472] [<ffffffffa03ef032>] em28xx_dvb_bus_ctrl+0x32/0x40 [em28xx_dvb] This is caused by commit c7a45e5, that added support for two I2C buses. A partial fix was applied at 3de09fb, but it doesn't cover all cases, as the DVB core fills fe->dvb->priv with adapter->priv. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Commit 5800043 (cpufreq: convert cpufreq_driver to using RCU) causes the following call trace to be spit on boot: BUG: sleeping function called from invalid context at /scratch/rafael/work/linux-pm/mm/slab.c:3179 in_atomic(): 0, irqs_disabled(): 0, pid: 292, name: systemd-udevd 2 locks held by systemd-udevd/292: #0: (subsys mutex){+.+.+.}, at: [<ffffffff8146851a>] subsys_interface_register+0x4a/0xe0 #1: (rcu_read_lock){.+.+.+}, at: [<ffffffff81538210>] cpufreq_add_dev_interface+0x60/0x5e0 Pid: 292, comm: systemd-udevd Not tainted 3.9.0-rc8+ #323 Call Trace: [<ffffffff81072c90>] __might_sleep+0x140/0x1f0 [<ffffffff811581c2>] kmem_cache_alloc+0x42/0x2b0 [<ffffffff811e7179>] sysfs_new_dirent+0x59/0x130 [<ffffffff811e63cb>] sysfs_add_file_mode+0x6b/0x110 [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0 [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80 [<ffffffff811e647d>] sysfs_add_file+0xd/0x10 [<ffffffff811e6541>] sysfs_create_file+0x21/0x30 [<ffffffff81538280>] cpufreq_add_dev_interface+0xd0/0x5e0 [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0 [<ffffffffa000337f>] ? acpi_processor_get_platform_limit+0x32/0xbb [processor] [<ffffffffa022f540>] ? do_drv_write+0x70/0x70 [acpi_cpufreq] [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80 [<ffffffff8106c97e>] ? up_read+0x1e/0x40 [<ffffffff8106e632>] ? __blocking_notifier_call_chain+0x72/0xc0 [<ffffffff81538dbd>] cpufreq_add_dev+0x62d/0xae0 [<ffffffff815389b8>] ? cpufreq_add_dev+0x228/0xae0 [<ffffffff81468569>] subsys_interface_register+0x99/0xe0 [<ffffffffa014d000>] ? 0xffffffffa014cfff [<ffffffff81535d5d>] cpufreq_register_driver+0x9d/0x200 [<ffffffffa014d000>] ? 0xffffffffa014cfff [<ffffffffa014d0e9>] acpi_cpufreq_init+0xe9/0x1000 [acpi_cpufreq] [<ffffffff810002fa>] do_one_initcall+0x11a/0x170 [<ffffffff810b4b87>] load_module+0x1cf7/0x2920 [<ffffffff81322580>] ? ddebug_proc_open+0xb0/0xb0 [<ffffffff816baee0>] ? retint_restore_args+0xe/0xe [<ffffffff810b5887>] sys_init_module+0xd7/0x120 [<ffffffff816bb6d2>] system_call_fastpath+0x16/0x1b which is quite obvious, because that commit put (multiple instances of) sysfs_create_file() under rcu_read_lock()/rcu_read_unlock(), although sysfs_create_file() may cause memory to be allocated with GFP_KERNEL and that may sleep, which is not permitted in RCU read critical section. Revert the buggy commit altogether along with some changes on top of it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The reason for this patch is crash in kmemdup caused by returning from get_callid with uniialized matchoff and matchlen. Removing Zero check of matchlen since it's done by ct_sip_get_header() BUG: unable to handle kernel paging request at ffff880457b5763f IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35 PGD 27f6067 PUD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core CPU 5 Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5 /S1200KP RIP: 0010:[<ffffffff810df7fc>] [<ffffffff810df7fc>] kmemdup+0x2e/0x35 RSP: 0018:ffff8803fea03648 EFLAGS: 00010282 RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003 RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0 RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011 R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90 FS: 0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480) Stack: ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000 ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000 Call Trace: <IRQ> [<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip] [<ffffffffa007b209>] ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs] [<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf [<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs] ... Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>

…p_pci_atmu In patch 34642bb (powerpc/fsl-pci: Keep PCI SoC controller registers in pci_controller) we choose to keep the map of the PCI SoC controller registers. But we missed to delete the unmap in setup_pci_atmu function. This will cause the following call trace once we access the PCI SoC controller registers later. Unable to handle kernel paging request for data at address 0x8000080080040f14 Faulting instruction address: 0xc00000000002ea58 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=24 T4240 QDS Modules linked in: NIP: c00000000002ea58 LR: c00000000002eaf4 CTR: c00000000002eac0 REGS: c00000017e10b4a0 TRAP: 0300 Not tainted (3.9.0-rc1-00052-gfa3529f-dirty) MSR: 0000000080029000 <CE,EE,ME> CR: 28adbe22 XER: 00000000 SOFTE: 0 DEAR: 8000080080040f14, ESR: 0000000000000000 TASK = c00000017e100000[1] 'swapper/0' THREAD: c00000017e108000 CPU: 2 GPR00: 0000000000000000 c00000017e10b720 c0000000009928d8 c00000017e578e00 GPR04: 0000000000000000 000000000000000c 0000000000000001 c00000017e10bb40 GPR08: 0000000000000000 8000080080040000 0000000000000000 0000000000000016 GPR12: 0000000088adbe22 c00000000fffa800 c000000000001ba0 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000008a5b70 GPR24: c0000000008af938 c0000000009a28d8 c0000000009bb5dc c00000017e10bb40 GPR28: c00000017e32a400 c00000017e10bc00 c00000017e32a400 c00000017e578e00 NIP [c00000000002ea58] .fsl_pcie_check_link+0x88/0xf0 LR [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0 Call Trace: [c00000017e10b720] [c00000017e10b7a0] 0xc00000017e10b7a0 (unreliable) [c00000017e10ba30] [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0 [c00000017e10bad0] [c00000000033aa08] .pci_bus_read_config_byte+0x88/0xd0 [c00000017e10bb90] [c00000000088d708] .pci_apply_final_quirks+0x9c/0x18c [c00000017e10bc40] [c0000000000013dc] .do_one_initcall+0x5c/0x1f0 [c00000017e10bcf0] [c00000000086ebac] .kernel_init_freeable+0x180/0x26c [c00000017e10bdb0] [c000000000001bbc] .kernel_init+0x1c/0x460 [c00000017e10be30] [c000000000000880] .ret_from_kernel_thread+0x64/0xe4 Instruction dump: 38210310 2b800015 4fdde842 7c600026 5463fffe e8010010 7c0803a6 4e800020 60000000 60000000 e92301d0 7c0004ac <80690f14> 0c030000 4c00012c 38210310 ---[ end trace 7a8fe0cbccb7d992 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Roy Zang <tie-fei.zang@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

When hot removing memory presented at boot time, following messages are shown: kernel BUG at mm/slub.c:3409! invalid opcode: 0000 [#1] SMP Modules linked in: ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc ipmi_devintf ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core e1000e ptp pps_core tpm_infineon ioatdma dca sr_mod cdrom sd_mod crc_t10dif usb_storage megaraid_sas lpfc scsi_transport_fc scsi_tgt scsi_mod CPU 0 Pid: 5091, comm: kworker/0:2 Tainted: G W 3.9.0-rc6+ #15 RIP: kfree+0x232/0x240 Process kworker/0:2 (pid: 5091, threadinfo ffff88084678c000, task ffff88083928ca80) Call Trace: __release_region+0xd4/0xe0 __remove_pages+0x52/0x110 arch_remove_memory+0x89/0xd0 remove_memory+0xc4/0x100 acpi_memory_device_remove+0x6d/0xb1 acpi_device_remove+0x89/0xab __device_release_driver+0x7c/0xf0 device_release_driver+0x2f/0x50 acpi_bus_device_detach+0x6c/0x70 acpi_ns_walk_namespace+0x11a/0x250 acpi_walk_namespace+0xee/0x137 acpi_bus_trim+0x33/0x7a acpi_bus_hot_remove_device+0xc4/0x1a1 acpi_os_execute_deferred+0x27/0x34 process_one_work+0x1f7/0x590 worker_thread+0x11a/0x370 kthread+0xee/0x100 ret_from_fork+0x7c/0xb0 RIP [<ffffffff811c41d2>] kfree+0x232/0x240 RSP <ffff88084678d968> The reason why the messages are shown is to release a resource structure, allocated by bootmem, by kfree(). So when we release a resource structure, we should check whether it is allocated by bootmem or not. But even if we know a resource structure is allocated by bootmem, we cannot release it since SLxB cannot treat it. So for reusing a resource structure, this patch remembers it by using bootmem_resource as follows: When releasing a resource structure by free_resource(), free_resource() checks whether the resource structure is allocated by bootmem or not. If it is allocated by bootmem, free_resource() adds it to bootmem_resource. If it is not allocated by bootmem, free_resource() release it by kfree(). And when getting a new resource structure by get_resource(), get_resource() checks whether bootmem_resource has released resource structures or not. If there is a released resource structure, get_resource() returns it. If there is not a releaed resource structure, get_resource() returns new resource structure allocated by kzalloc(). [akpm@linux-foundation.org: s/get_resource/alloc_resource/] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

We use kmem_cache_alloc() to allocate memory to hold the new firmware which will be flashed. kmem_cache_alloc() calls rtas_block_ctor() to set memory to NULL. But these constructor is called only for newly allocated slabs. If we run below command multiple time without rebooting, allocator may allocate memory from the area which was free'd by kmem_cache_free and it will not call constructor. In this situation we may hit kernel oops. dd if=<fw image> of=/proc/ppc64/rtas/firmware_flash bs=4096 oops message: ------------- [ 1602.399755] Oops: Kernel access of bad area, sig: 11 [#1] [ 1602.399772] SMP NR_CPUS=1024 NUMA pSeries [ 1602.399779] Modules linked in: rtas_flash nfsd lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod sg ipv6 ses enclosure ehea ehci_pci ohci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ipr libata scsi_mod [ 1602.399817] NIP: d00000000a170b9c LR: d00000000a170b64 CTR: c00000000079cd58 [ 1602.399823] REGS: c0000003b9937930 TRAP: 0300 Not tainted (3.9.0-rc4-0.27-ppc64) [ 1602.399828] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 22000428 XER: 20000000 [ 1602.399841] SOFTE: 1 [ 1602.399844] CFAR: c000000000005f24 [ 1602.399848] DAR: 8c2625a820631fef, DSISR: 40000000 [ 1602.399852] TASK = c0000003b4520760[3655] 'dd' THREAD: c0000003b9934000 CPU: 3 GPR00: 8c2625a820631fe7 c0000003b9937bb0 d00000000a179f28 d00000000a171f08 GPR04: 0000000010040000 0000000000001000 c0000003b9937df0 c0000003b5fb2080 GPR08: c0000003b58f7200 d00000000a179f28 c0000003b40058d4 c00000000079cd58 GPR12: d00000000a171450 c000000007f40900 0000000000000005 0000000010178d20 GPR16: 00000000100cb9d8 000000000000001d 0000000000000000 000000001003ffff GPR20: 0000000000000001 0000000000000000 00003fffa0b50d30 000000001001f010 GPR24: 0000000010020888 0000000010040000 d00000000a171f08 d00000000a172808 GPR28: 0000000000001000 0000000010040000 c0000003b4005880 8c2625a820631fe7 [ 1602.399924] NIP [d00000000a170b9c] .rtas_flash_write+0x7c/0x1e8 [rtas_flash] [ 1602.399930] LR [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] [ 1602.399934] Call Trace: [ 1602.399939] [c0000003b9937bb0] [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] (unreliable) [ 1602.399948] [c0000003b9937c60] [c000000000282830] .proc_reg_write+0x90/0xe0 [ 1602.399955] [c0000003b9937ce0] [c0000000001ff374] .vfs_write+0x114/0x238 [ 1602.399961] [c0000003b9937d80] [c0000000001ff5d8] .SyS_write+0x70/0xe8 [ 1602.399968] [c0000003b9937e30] [c000000000009cdc] syscall_exit+0x0/0xa0 [ 1602.399973] Instruction dump: [ 1602.399977] eb698010 801b0028 2f80dcd6 419e00a4 2fbc0000 419e009c ebfb0030 2fbf0000 [ 1602.399989] 409e0010 480000d8 60000000 7c1f0378 <e81f0008> 2fa00000 409efff4 e81f0000 [ 1602.400012] ---[ end trace b4136d115dc31dac ]--- [ 1602.402178] [ 1602.402185] Sending IPI to other CPUs [ 1602.403329] IPI complete This patch uses kmem_cache_zalloc() instead of kmem_cache_alloc() to allocate memory, which makes sure memory is set to 0 before using. Also removes rtas_block_ctor(), which is no longer required. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Seiji reports hitting the following crash when erasing pstore dump variables, BUG: unable to handle kernel NULL pointer dereference at 0000000000000fa4 IP: [<ffffffff8142dadf>] __efivar_entry_iter+0x2f/0x120 PGD 18482a067 PUD 190724067 PMD 0 Oops: 0000 [#1] SMP [...] Call Trace: [<ffffffff8143001f>] efi_pstore_erase+0xdf/0x130 [<ffffffff81200038>] ? cap_socket_create+0x8/0x10 [<ffffffff811ea491>] pstore_unlink+0x41/0x60 [<ffffffff811741ff>] vfs_unlink+0x9f/0x110 [<ffffffff8117813b>] do_unlinkat+0x18b/0x280 [<ffffffff81178472>] sys_unlinkat+0x22/0x40 [<ffffffff81542402>] system_call_fastpath+0x16/0x1b 'entry' needs to be initialised in efi_pstore_erase() when iterating with __efivar_entry_iter(), otherwise the garbage pointer will be dereferenced, leading to crashes like the above. Reported-by: Seiji Aguchi <seiji.aguchi@hds.com> Tested-by: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>

We need to remove the entry from the EFI variable list before we erase it from the variable store and free the associated state, otherwise it's possible to hit the following crash, BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8142ea0f>] __efivar_entry_iter+0xcf/0x120 PGD 19483f067 PUD 195426067 PMD 0 Oops: 0000 [#1] SMP [...] Call Trace: [<ffffffff81430ebf>] efi_pstore_erase+0xef/0x140 [<ffffffff81003138>] ? math_error+0x288/0x2d0 [<ffffffff811ea491>] pstore_unlink+0x41/0x60 [<ffffffff811741ff>] vfs_unlink+0x9f/0x110 [<ffffffff8117813b>] do_unlinkat+0x18b/0x280 [<ffffffff8116d7e6>] ? sys_newfstatat+0x36/0x50 [<ffffffff81178472>] sys_unlinkat+0x22/0x40 [<ffffffff81543282>] system_call_fastpath+0x16/0x1b Reported-by: Seiji Aguchi <seiji.aguchi@hds.com> Tested-by: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>

torrorosso added 3 commits February 18, 2014 20:58

GCC4.8 fix

07de4a5

GCC4.8 fix

1d31395

1d31395

e606bb3

07de4a5

4868448

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gcc4.8 linaro fix#1

gcc4.8 linaro fix#1
torrorosso wants to merge 4 commits intoM1cha:kk_2.7from
torrorosso:kk_2.7

torrorosso commented Feb 18, 2014

Uh oh!

M1cha commented Feb 18, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

torrorosso commented Feb 18, 2014

Uh oh!

M1cha commented Feb 18, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants