OpenMPI Source Code Analysis 3:
阿新 • Published: 2018-04-09
Picking up the question left open in the previous post: as we said there, the try_kill_peers function gets executed. It is defined in ompi_mpi_abort.c:
// As the comment below says, this mainly kills the processes that share
// a communicator with the caller (the caller itself is excluded).
/*
 * Local helper function to build an array of all the procs in a
 * communicator, excluding this process.
 *
 * Killing a just the indicated peers must be implemented for
 * MPI_Abort() to work according to the standard language for
 * a 'high-quality' implementation.
 *
 * It would be nifty if we could differentiate between the
 * abort scenarios (but we don't, currently):
 *      - MPI_Abort()
 *      - MPI_ERRORS_ARE_FATAL
 *      - Victim of MPI_Abort()
 */
// The caller passes in the corresponding communicator.
static void try_kill_peers(ompi_communicator_t *comm, int errcode)
{
    // Part 1: get the process count and allocate the ompi_process_name_t array
    int nprocs;
    ompi_process_name_t *procs;

    nprocs = ompi_comm_size(comm);
    /* ompi_comm_remote_size() returns 0 if not an intercomm, so
       this is safe */
    nprocs += ompi_comm_remote_size(comm);

    procs = (ompi_process_name_t*) calloc(nprocs, sizeof(ompi_process_name_t));
    if (NULL == procs) {
        /* quick clean orte and get out */
        ompi_rte_abort(errno, "Abort: unable to alloc memory to kill procs");
    }

    // Part 2: put the local processes into the array
    /* put all the local group procs in the abort list */
    int rank, i, count;
    rank = ompi_comm_rank(comm);  // our own rank within this communicator (open question 1)
    for (count = i = 0; i < ompi_comm_size(comm); ++i) {
        if (rank == i) {
            /* Don't include this process in the array */
            --nprocs;
        } else {
            assert(count <= nprocs);
            procs[count++] =
                *OMPI_CAST_RTE_NAME(&ompi_group_get_proc_ptr(comm->c_remote_group, i, true)->super.proc_name);
        }
    }

    // Part 3: put the remote group's processes into the array as well
    /* if requested, kill off remote group procs too */
    for (i = 0; i < ompi_comm_remote_size(comm); ++i) {
        assert(count <= nprocs);
        procs[count++] =
            *OMPI_CAST_RTE_NAME(&ompi_group_get_proc_ptr(comm->c_remote_group, i, true)->super.proc_name);
    }

    // Part 4: kill the processes
    if (nprocs > 0) {
        ompi_rte_abort_peers(procs, nprocs, errcode);
    }

    /* We could fall through here if ompi_rte_abort_peers() fails, or
       if (nprocs == 0).  Either way, tidy up and let the caller
       handle it. */
    free(procs);
}
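To make the bookkeeping in parts 1 to 3 easier to follow, here is a minimal standalone sketch of the same idea: collect every process of the communicator except the caller, then append the remote group if there is one. None of it is Open MPI code. The types mock_name_t and mock_comm_t and the function collect_peers are hypothetical stand-ins invented for illustration, and the "communicator" is just two plain arrays plus the caller's rank.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ompi_process_name_t: a (job id, process id) pair. */
typedef struct {
    unsigned jobid;
    unsigned vpid;
} mock_name_t;

/* Hypothetical stand-in for a communicator. For an intracommunicator,
 * remote is NULL and remote_size is 0. */
typedef struct {
    const mock_name_t *local;   /* names of the local group */
    int local_size;
    const mock_name_t *remote;  /* names of the remote group, if any */
    int remote_size;
    int my_rank;                /* caller's rank in the local group */
} mock_comm_t;

/* Build a heap-allocated array of every peer except the caller.
 * Stores the number of entries in *out_count and returns the array,
 * or NULL if the allocation fails. The caller must free() the result. */
static mock_name_t *collect_peers(const mock_comm_t *comm, int *out_count)
{
    int capacity = comm->local_size + comm->remote_size;
    int count = 0;
    int i;
    mock_name_t *peers;

    *out_count = 0;
    /* Allocate at least one slot so a zero-sized request is never made. */
    peers = (mock_name_t *) calloc((size_t) (capacity > 0 ? capacity : 1),
                                   sizeof(mock_name_t));
    if (NULL == peers) {
        return NULL;
    }

    /* Local group: copy everyone except ourselves. */
    for (i = 0; i < comm->local_size; ++i) {
        if (i == comm->my_rank) {
            continue;
        }
        assert(count < capacity);
        peers[count++] = comm->local[i];
    }

    /* Remote group: this loop does nothing when remote_size is 0. */
    for (i = 0; i < comm->remote_size; ++i) {
        assert(count < capacity);
        peers[count++] = comm->remote[i];
    }

    *out_count = count;
    return peers;
}
```

With a local group of 3 processes, a remote group of 2, and the caller at rank 1, collect_peers yields 4 entries: local ranks 0 and 2 followed by both remote processes. The sketch counts entries as it copies them, whereas try_kill_peers starts from the total and decrements nprocs when it skips its own rank; both approaches end up with the same number.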
At this point, we need to look at the definition of ompi_rte_abort_peers(procs, nprocs, errcode).