hung_task_timeout_secs (topic related to Linux kernel hangs)

ABOUT hung_task_timeout_secs

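If a task (process) stays blocked in the uninterruptible "D" state, hung_task_timeout_secs decides how many seconds may pass before the kernel reports it as hung. The timeout only controls this warning. Whether the kernel also panics (and then reboots, if kernel.panic is set to a non-zero number of seconds) is decided separately by kernel.hung_task_panic.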

LINUX KERNEL RELATED PARAMETER
[bash light="true"]
$ cat /proc/sys/kernel/hung_task_timeout_secs
120
$

$ echo 0 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
0
$ cat /proc/sys/kernel/hung_task_timeout_secs
0
$
[/bash]

If a task in the D state has not been scheduled for longer than this value (in seconds), the kernel reports a warning.
This file is present only if the kernel was built with CONFIG_DETECT_HUNG_TASK enabled.

0 means an infinite timeout: no checking is done. Valid values are in the range {0..LONG_MAX/HZ}.
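As a rough illustration of how these values are interpreted, the C helpers below parse the file's contents, treat 0 as "checking disabled", and test the documented upper bound. This is a sketch, not kernel code: the function names are hypothetical, and because the kernel's HZ is a build-time setting (CONFIG_HZ) that user space cannot read from this file, the range check takes it as a caller-supplied assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse text such as "120\n" as read from hung_task_timeout_secs.
 * Returns 0 and stores the value in *out, or -1 if the text is not a
 * plain non-negative decimal number. */
int parse_hung_task_timeout(const char *text, unsigned long *out)
{
    char *end;
    unsigned long v;

    if (strchr(text, '-') != NULL)   /* strtoul() would accept "-1" */
        return -1;
    errno = 0;
    v = strtoul(text, &end, 10);
    if (errno != 0 || end == text)   /* overflow or no digits at all */
        return -1;
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;                       /* allow the trailing newline */
    if (*end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* 0 means "infinite timeout": no hung-task checking is done. */
int hung_task_check_disabled(unsigned long secs)
{
    return secs == 0;
}

/* The documented range is 0..LONG_MAX/HZ. Assumption: the caller knows
 * the kernel's HZ (CONFIG_HZ, e.g. 250 or 1000). */
int hung_task_timeout_in_range(unsigned long secs, unsigned long hz)
{
    return hz != 0 && secs <= (unsigned long)LONG_MAX / hz;
}

/* Read and parse the live value. Returns -1 if the file cannot be read,
 * for example because CONFIG_DETECT_HUNG_TASK is not enabled. */
int read_hung_task_timeout(const char *path, unsigned long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_hung_task_timeout(buf, out);
}
```

For example, on a kernel assumed to be built with HZ=1000 you could call `read_hung_task_timeout("/proc/sys/kernel/hung_task_timeout_secs", &v)` and then `hung_task_timeout_in_range(v, 1000)`. A -1 from the reader may simply mean the file is absent.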

PARAMETER RELATED KERNEL LOG (dmesg)
[bash light="true"]
TEST-MAIL1 ~ # dmesg
[cut]
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm D ffff88107f472c40 0 16705 22512 0x00000000
ffff881014693810 0000000000000086 ffff881000000000 ffff88102013b040
0000000000012c40 ffff880471855fd8 0000000000012c40 ffff880471854010
ffff880471855fd8 0000000000012c40 ffff881017ff8e40 0000000100000000
Call Trace:
[<ffffffff8148d45d>] ? schedule_timeout+0x1ed/0x2d0
[<ffffffffa0b7d1ea>] ? dlmlock+0x8a/0xda0 [ocfs2_dlm]
[<ffffffff8148ce5c>] ? wait_for_common+0x12c/0x1a0
[<ffffffff81052230>] ? try_to_wake_up+0x280/0x280
[<ffffffffa0a3b9c0>] ? __ocfs2_cluster_lock+0x1f0/0x780 [ocfs2]
[<ffffffff8148ce80>] ? wait_for_common+0x150/0x1a0
[<ffffffffa0a9c6bc>] ? ocfs2_buffer_cached+0x8c/0x180 [ocfs2]
[<ffffffffa0a40bc6>] ? ocfs2_inode_lock_full_nested+0x126/0x540 [ocfs2]
[<ffffffffa0a5922e>] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2]
[<ffffffffa0a5922e>] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2]
[<ffffffffa0a5ba1a>] ? ocfs2_prepare_orphan_dir+0x4a/0x290 [ocfs2]
[<ffffffffa0a5e621>] ? ocfs2_unlink+0x6e1/0xbb0 [ocfs2]
[<ffffffff811bcfea>] ? may_link+0xda/0x170
[<ffffffff81141c8e>] ? vfs_unlink+0x9e/0x100
[<ffffffff81145881>] ? do_unlinkat+0x1a1/0x1d0
[<ffffffff81147b00>] ? vfs_readdir+0xa0/0xe0
[<ffffffff8116fedb>] ? fsnotify_find_inode_mark+0x2b/0x40
[<ffffffff81170c24>] ? dnotify_flush+0x54/0x110
[<ffffffff81133eec>] ? filp_close+0x5c/0x90
[<ffffffff81496912>] ? system_call_fastpath+0x16/0x1b
[/bash]
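When scanning logs for these reports, it can help to pull the task name, PID and timeout out of the "INFO: task ... blocked for more than ... seconds." line, whose format comes from the pr_err() call in the source code further down. The excerpt above is cut before that line, so the C helper below is a sketch written against the format string rather than against this particular log; the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one hung-task report line (hypothetical helper type). */
struct hung_task_report {
    char comm[16];      /* task name; the kernel keeps at most 15 chars */
    int pid;
    long seconds;       /* the timeout that was exceeded */
};

/* Returns 1 if `line` contains a hung-task report and fills *r, else 0.
 * Any prefix such as a dmesg timestamp is skipped by searching for the
 * fixed "INFO: task " text. Limitation: a task name containing ':' is
 * not handled, because ':' separates the name from the PID. */
int parse_hung_task_line(const char *line, struct hung_task_report *r)
{
    const char *p = strstr(line, "INFO: task ");

    if (p == NULL)
        return 0;
    return sscanf(p, "INFO: task %15[^:]:%d blocked for more than %ld seconds",
                  r->comm, &r->pid, &r->seconds) == 3;
}
```

Lines that do not carry the report, such as the "Call Trace:" header or the stack entries, make the helper return 0, so it can be fed every line of `dmesg` output in turn.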

CLASSROOM

While waiting for a read() or write() on a file descriptor to return (typically one backed by disk I/O), a
process may be put into a special kind of sleep known as "D" or "Disk Sleep" (uninterruptible sleep,
TASK_UNINTERRUPTIBLE in the kernel). This state is special because the process cannot be killed or
interrupted by signals while it is in it. A process waiting for ioctl() to return may also be put to sleep
in this manner, depending on the driver.
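To check from user space whether a given process is currently in this state, one option is to read the third field of /proc/<pid>/stat, where "D" marks uninterruptible sleep (the same letter ps shows in its STAT column). The sketch below assumes that layout; the helper names are hypothetical, and it scans from the last ')' because the command name in parentheses may itself contain spaces or parentheses.

```c
#include <stdio.h>
#include <string.h>

/* Return the one-letter state from a /proc/<pid>/stat line, or '\0' if
 * the line is malformed. Layout assumed: "pid (comm) state ...". */
char stat_line_state(const char *line)
{
    const char *rp = strrchr(line, ')');   /* end of the comm field */

    if (rp == NULL || rp[1] != ' ' || rp[2] == '\0')
        return '\0';
    return rp[2];
}

/* 1 if the process is in uninterruptible ("D") sleep, 0 if it is in
 * some other state, -1 if its stat file cannot be read or parsed. */
int pid_in_disk_sleep(int pid)
{
    char path[64], buf[1024];
    FILE *f;
    char *ok;
    char state;

    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    state = stat_line_state(buf);
    if (state == '\0')
        return -1;
    return state == 'D';
}
```

The state is only a snapshot: a process doing ordinary disk I/O passes through D briefly, so a monitoring loop would need to see D repeatedly over a period before treating a process as stuck.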

RELATED SOURCE CODE EXPOSURE
[c]
/* Excerpt from check_hung_task() in kernel/hung_task.c */
	/*
	 * Ok, the task did not get scheduled for more than 2 minutes,
	 * complain:
	 */
	if (sysctl_hung_task_warnings) {
		/* A negative value means "unlimited"; otherwise use up one report. */
		if (sysctl_hung_task_warnings > 0)
			sysctl_hung_task_warnings--;
		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
			t->comm, t->pid, timeout);
		pr_err(" %s %s %.*s%s\n",
			print_tainted(), init_utsname()->release,
			(int)strcspn(init_utsname()->version, " "),
			init_utsname()->version,
			LINUX_PACKAGE_ID);
		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
			" disables this message.\n");
		sched_show_task(t);
		hung_task_show_lock = true;
	}
[/c]
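The excerpt only shows the reporting step. The decision before it, roughly, in kernels of this era, is that the watchdog looks at each task in D state and treats it as hung if its context-switch count has not changed since the previous check, one timeout earlier. The C code below is a simplified, hypothetical user-space model of that idea and of how sysctl_hung_task_warnings acts as a report budget (-1, per the kernel sysctl documentation, means unlimited). The struct, variable and function names are invented for illustration and do not match the kernel's task_struct or the exact logic of any specific kernel version.

```c
#include <stdbool.h>

/* Hypothetical model of the fields the check needs for one task. */
struct model_task {
    bool uninterruptible;            /* task is in D (TASK_UNINTERRUPTIBLE) */
    unsigned long switch_count;      /* nvcsw + nivcsw in the real kernel */
    unsigned long last_switch_count; /* count seen on the previous pass */
};

/* Remaining reports; -1 means unlimited (like kernel.hung_task_warnings). */
static long model_warnings = 10;

/* One watchdog pass over one task. Returns true if this pass would print
 * the "blocked for more than ... seconds" report. */
bool model_check_hung_task(struct model_task *t)
{
    if (!t->uninterruptible)
        return false;                /* only D-state tasks are examined */

    if (t->switch_count != t->last_switch_count) {
        /* The task ran since the last pass: remember the count. */
        t->last_switch_count = t->switch_count;
        return false;
    }

    /* No context switch for a whole check interval: the task is hung. */
    if (model_warnings == 0)
        return false;                /* report budget used up */
    if (model_warnings > 0)
        model_warnings--;
    return true;
}
```

Because a stuck task keeps the same switch count, it would be reported again on every later pass until the budget reaches 0, which is why a finite default budget keeps the log from filling up.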

[c light="true"]
/*
 * Process updating of timeout sysctl
 */
int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
				  void __user *buffer,
				  size_t *lenp, loff_t *ppos)
{
	int ret;

	ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);

	if (ret || !write)
		goto out;

	wake_up_process(watchdog_task);

 out:
	return ret;
}
[/c]
SOURCE CODE TAKEN FROM OFFICIAL LINUX KERNEL
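proc_dohung_task_timeout_secs() shows that after a successful write the watchdog thread is woken, which suggests the new value is picked up right away rather than after the watchdog's current sleep ends, and that the value first goes through proc_doulongvec_minmax(), which enforces the min/max bounds registered for this sysctl. A program can therefore change the setting simply by writing the number to the file, much like the tee command earlier. The helper below is a hypothetical sketch of that; it needs root privileges for the real /proc path.

```c
#include <stdio.h>

/* Write `secs` to a sysctl file such as
 * /proc/sys/kernel/hung_task_timeout_secs. Returns 0 on success, -1 on
 * error. A value rejected by the kernel (for example one above the
 * registered maximum) makes the write fail, which surfaces here as an
 * error from fprintf() or, because stdio buffers, from fclose(). */
int write_hung_task_timeout(const char *path, unsigned long secs)
{
    FILE *f = fopen(path, "w");
    int n;

    if (f == NULL)
        return -1;
    n = fprintf(f, "%lu\n", secs);
    if (fclose(f) != 0 || n < 0)     /* fclose() flushes the buffer */
        return -1;
    return 0;
}
```

Calling `write_hung_task_timeout("/proc/sys/kernel/hung_task_timeout_secs", 0)` corresponds to the `echo 0 | sudo tee` step in the first example, and like it, the change is lost on reboot.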

RELATED FROM RESEARCH PAPER

Kernel data collection tools. Several monitoring facilities are provided by the Linux kernel, which have
been exploited in this work. In particular, we use KProbes, which inserts breakpoints in arbitrary binary
code locations in charge of triggering user-defined handler functions. Handlers can be used to collect
information about internal kernel variables; subsequently, kernel execution is restored. Kdump is a tool
for failure data collection based on the execution of a secondary kernel, namely the capture kernel, which
is preliminarily loaded into a reserved memory region. When the primary kernel fails, the capture kernel is
executed; then, it can collect failure data by reading the main memory state.

Built-in hang detection mechanisms. Several hang detection mechanisms are available in the Linux OS,
which can be enabled by recompiling the kernel. In particular, the following facilities can be used for
hang detection: Soft lockup detection, i.e., the kernel detects whether a "canary" task is not scheduled
within a timeout; Hard lockup detection, i.e., whether any CPU in the system does not handle the local
timer interrupt for longer than a timeout; Sleep-inside-spinlock checking, i.e., assertions that verify
whether there are spinlocks that have been acquired before calling a sleeping function (i.e., a function
during which the current thread may block and be preempted by the scheduler); Checks on lock API usage,
that is: missing lock initialization, release of an already freed lock, release of a lock by a thread or
CPU different from the lock holder, lock data structure corruption.

Source: http://tinyurl.com/7pt5j9a

Assessment and Improvement of Hang Detection in the Linux Operating System
2009 28th IEEE International Symposium on Reliable Distributed Systems

LINKS
https://access.redhat.com/solutions/60572
https://www.linuxquestions.org/questions/linux-software-2/kernel-panic-echo-0-proc-sys-kernel-4175629199/
https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
https://stackoverflow.com/questions/84882/sudo-echo-something-etc-privilegedfile-doesnt-work-is-there-an-alterna
https://www.tldp.org/LDP/tlk/kernel/processes.html
https://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/
http://stackoverflow.com/questions/1475683/linux-process-states
