Add support for checking all living bthreads #3096
This PR helps diagnose bthread problems such as bthread deadlock, which are hard or unrealistic to diagnose with gdb alone.
Maybe you can try to get all bthread_id from the |
I agree with the |
Yes, I think listing all bthread_ids is better.
@wwbmmm @chenBright Yes, I have thought about this approach, but without success. Firstly, when we create a bthread, a ResourceId and a TaskMeta instance are acquired from butil::ResourcePool in this way: Secondly, suppose we add an interface to ResourcePool that records the slot in get_resource and removes it in return_resource; for performance we would probably keep the record in the thread-local LocalPool. The problem is that get_resource and return_resource may run in different pthreads (with different TLS), because a bthread can switch to another worker. What's more, if we try to get all living bthread ids by summarizing all thread-local records, the tricky part is that the ResourcePool interface can be called from both worker and non-worker threads, which makes the summary practically impossible. Any ideas?
That's huge. And don't they represent every bthread id that ever existed, not just the living ones?
Maybe you can use TaskStatus to determine whether a bthread is alive. |
I think we can add a function to display all the live bthread ids and names, and click on the link to display the corresponding bthread details. |
No. After a bthread exits, its TaskMeta is returned to the ResourcePool and reused by a new bthread. When you traverse the ResourcePool, you only need to traverse the slots in use; you don't need to traverse those in the free list.
How can I judge whether a slot is in use? @chenBright The TaskStatus is only used when TaskTracer is enabled, but I want this PR to be applicable even when TaskTracer is not enabled. If I can judge whether a slot is in use, then I need to traverse in the same way as |
If TaskTracer is enabled, we can easily see the call traces of all living bthreads and debug the deadlock problem, just as gdb/gstack does. And in normal situations there won't be many living bthreads. Is it necessary then?
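To make the slot-reuse point concrete, here is a minimal sketch of a pool with a free list and a traversal that skips free slots. `ToyPool` and its members are hypothetical and single-threaded; this is not butil::ResourcePool, whose free lists are partly thread-local, which is exactly why the in-use check is harder there.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy pool for illustration only; NOT butil::ResourcePool.
class ToyPool {
public:
    // Hand out a slot id, preferring a previously returned slot.
    size_t get_resource() {
        if (!free_list_.empty()) {
            size_t id = free_list_.back();
            free_list_.pop_back();
            return id;
        }
        return next_unused_++;
    }

    // Give a slot back; it becomes eligible for reuse.
    void return_resource(size_t id) {
        assert(id < next_unused_);
        free_list_.push_back(id);
    }

    // Visit every slot ever handed out, skipping those on the free list.
    std::vector<size_t> slots_in_use() const {
        std::unordered_set<size_t> free_ids(free_list_.begin(), free_list_.end());
        std::vector<size_t> result;
        for (size_t id = 0; id < next_unused_; ++id) {
            if (free_ids.count(id) == 0) {
                result.push_back(id);
            }
        }
        return result;
    }

private:
    size_t next_unused_ = 0;          // first slot id never handed out
    std::vector<size_t> free_list_;   // returned slots waiting for reuse
};
```

In this toy version a returned slot id is handed out again by the next get_resource call, so the traversal reports only slots currently held.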
Yes. Because it allows users to view the call stack of a specified bthread, not all of them. |
Perhaps the default support for TaskStatus can meet your needs. |
Currently the TaskStatus is only set when TaskTracer is enabled. Do you mean I need to change the code to set a status even when TaskTracer is not enabled? For example, set the status to TASK_STATUS_CREATED when a bthread is created and to TASK_STATUS_UNKNOWN when it is destroyed, so that a bthread is judged alive when its status is not TASK_STATUS_UNKNOWN?
I can add links to display the corresponding bthread details, and still display all living bthreads by default. The bthread name will not be shown; I can make another PR to add support for naming bthreads (and even execution_queue). Is that OK?
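The status-based liveness check proposed here could look roughly like the sketch below. `ToyTaskStatus`, `ToyTaskMeta`, and the three helper functions are hypothetical; only the names TASK_STATUS_CREATED and TASK_STATUS_UNKNOWN come from the discussion, and the real enum in brpc has more states.

```cpp
#include <atomic>
#include <cstdint>

// Reduced, hypothetical status enum; only the two states named above.
enum ToyTaskStatus {
    TASK_STATUS_UNKNOWN = 0,  // slot is free or the bthread has been destroyed
    TASK_STATUS_CREATED = 1   // bthread has been created and not yet destroyed
};

// Hypothetical stand-in for the per-bthread metadata slot.
struct ToyTaskMeta {
    uint64_t tid = 0;
    std::atomic<int> status{TASK_STATUS_UNKNOWN};
};

// Mark the slot as alive when a bthread is created in it.
inline void on_bthread_created(ToyTaskMeta& m, uint64_t tid) {
    m.tid = tid;
    m.status.store(TASK_STATUS_CREATED, std::memory_order_release);
}

// Mark the slot as dead when its bthread is destroyed.
inline void on_bthread_destroyed(ToyTaskMeta& m) {
    m.status.store(TASK_STATUS_UNKNOWN, std::memory_order_release);
}

// A bthread is judged alive when its status is not TASK_STATUS_UNKNOWN.
inline bool is_alive(const ToyTaskMeta& m) {
    return m.status.load(std::memory_order_acquire) != TASK_STATUS_UNKNOWN;
}
```

The appeal of this design is that the cost per bthread is two atomic stores, with no shared container to update on creation or destruction.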
Yes. |
No problem. |
@wwbmmm @chenBright PTAL. The PR is updated and now works whether or not BRPC_BTHREAD_TRACER is defined, with no performance side effect. Thanks for the suggestion. As for the idea to |
There's a catch here. Each task group internally creates a TaskMeta object to run its main task, so these TaskMeta objects will also be traversed. I will filter them out, as they are opaque to the user.
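The filtering step can be sketched as follows, assuming the ids of the task groups' main tasks are known up front. `filter_user_bthreads` and both input vectors are hypothetical names for illustration, not part of brpc.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// all_tids:  ids found by traversing the pool (hypothetical input).
// main_tids: ids of each task group's internal main task (hypothetical input).
// Returns the ids that belong to user-visible bthreads, in the original order.
std::vector<uint64_t> filter_user_bthreads(const std::vector<uint64_t>& all_tids,
                                           const std::vector<uint64_t>& main_tids) {
    std::vector<uint64_t> result;
    for (uint64_t tid : all_tids) {
        const bool is_main =
            std::find(main_tids.begin(), main_tids.end(), tid) != main_tids.end();
        if (!is_main) {
            result.push_back(tid);
        }
    }
    return result;
}
```

A linear search is used because the number of task groups is small; a set would be the natural choice if that assumption did not hold.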
4106c6f to 252a2d2
The comments have all been resolved, but one problem is left: when the bthread status is set to TASK_STATUS_READY, the actual status is set to TASK_STATUS_FIRST_READY (brpc/src/bthread/task_tracer.cpp, lines 150 to 154 in 0708333), so will the following check never be satisfied (brpc/src/bthread/task_tracer.cpp, lines 257 to 259 in 0708333)? According to my test, there always seems to be one bthread in TASK_STATUS_FIRST_READY status that is not traceable. This bthread has flag 320, which means "BTHREAD_NEVER_QUIT | BTHREAD_GLOBAL_PRIORITY", so it seems to be the EventDispatcher. Is this expected?
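For readers checking the arithmetic behind flag 320, here is a small decoding sketch. The numeric values 64 and 256 are assumptions chosen to be consistent with the statement above (64 + 256 = 320); verify them against bthread/types.h before relying on them. The `sketch` namespace and `describe_flags` are hypothetical.

```cpp
#include <cstdint>
#include <string>

namespace sketch {

// Assumed flag values, not copied from the brpc headers.
constexpr uint32_t BTHREAD_NEVER_QUIT = 64;
constexpr uint32_t BTHREAD_GLOBAL_PRIORITY = 256;

// Render the known bits of an attribute flag word as "A | B".
inline std::string describe_flags(uint32_t flags) {
    std::string out;
    auto append = [&out](const char* name) {
        if (!out.empty()) {
            out += " | ";
        }
        out += name;
    };
    if (flags & BTHREAD_NEVER_QUIT) {
        append("BTHREAD_NEVER_QUIT");
    }
    if (flags & BTHREAD_GLOBAL_PRIORITY) {
        append("BTHREAD_GLOBAL_PRIORITY");
    }
    return out;
}

}  // namespace sketch
```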
It should be |
Fixed! BTW, I also fixed another bug: _enable_priority_queue was not initialized. 😂
@wwbmmm @chenBright PTAL |
LGTM |
Users can check all living bthreads with `curl ip:port/bthreads/all`, or, when BRPC_BTHREAD_TRACER is enabled, with `curl ip:port/bthreads/all?st=1` to also show bthread stack traces. This enhances the original /bthreads service, which can only check a specified bthread by its bthread id, although users have no idea what the bthread id is. BTW, this also fixes the bug where _enable_priority_queue was not initialized and the bug where the task status was incorrectly set to TASK_STATUS_FIRST_READY.


Users can check all living bthreads with `curl ip:port/bthreads/all`, or with `curl ip:port/bthreads/all?st=1` to show bthread stack traces. This is an enhancement of the original /bthreads service, which can only check a specified bthread by its bthread id, although users have no idea what the bthread id is. Considering the performance cost of recording the bthread id on bthread startup and finish, this function is currently only enabled when BRPC_BTHREAD_TRACER is defined.
What problem does this PR solve?
brpc kindly provides a bthreads_service to check a specified bthread with curl ip:port/bthreads/<bthread_id>. The problem is that we have no idea what the <bthread_id> is, since it is generated by code, which makes this service useless.
Issue Number:
#3088
The sample output with stack traces (note that a bthread in jumping status is not displayed due to an implementation restriction, so not all bthreads may be shown):
Problem Summary:
What is changed and the side effects?
Changed:
Side effects:
No side effect now
Check List: