RCore-伤齿龙

#os #proj #rust

背景知识

协程

我对进程和线程的概念有一定了解，但没了解过协程的概念。协程可以看作是用户态的线程，由调用方去控制。

找到较好的参考文章：
协程简介

有栈与无栈协程

协程运行在线程上，本质是单线程。在异步的场景下（比如大量的 IO），协程通过主动让出线程资源，来达到异步。这就是为什么叫做“协程”，有别与线程的抢占式执行。其实函数调用就是一种特殊的协程，只不过这个协程在最后才切换出去。

具体实现

时间不是很够了，不可能深入所有代码，只能大概捋顺思路

进程管理基础设施

我们的目标是可以通过 fork 和 exec 来管理进程，完成了以下基础：

基于应用名的链接和加载

看脚本和汇编代码，还有loader.rs

进程标识符

1	`pub struct PidHandle(pub usize);`

其设计和FrameTracker很像，我们同样有：

struct PidAllocator {
    current: usize,
    recycled: Vec<usize>,
}

用来管理 PID 的分配回收。

之前我们用 app_id 来管理各个应用在内核空间的内核栈，现在可以看成将 app_id 变成 PID。

1
2
3

pub struct KernelStack {
    pid: usize,
}

添加 KernelStack 方便管理对应内核栈，同时该结构提供了获取栈顶和入栈的方法。在创建一个内核栈时，会顺便完成虚实的映射。在内核栈生命周期结束后，会自动使用drop方法，通过移除逻辑段方法 remove_area_with_start_vpn 来释放空间

进程控制块

PCB 用于管理一个进程：

pub struct TaskControlBlock {
    // immutable
    pub pid: PidHandle,
    pub kernel_stack: KernelStack,
    // mutable
    inner: UPSafeCell<TaskControlBlockInner>,
}

pub struct TaskControlBlockInner {
    pub trap_cx_ppn: PhysPageNum,
    pub base_size: usize,
    pub task_cx: TaskContext,
    pub task_status: TaskStatus,
    pub memory_set: MemorySet,
    pub parent: Option<Weak<TaskControlBlock>>,
    pub children: Vec<Arc<TaskControlBlock>>,
    pub exit_code: i32,
}

parent 指向当前进程的父进程（如果存在的话）。注意我们使用 Weak 而非 Arc 来包裹另一个任务控制块，因此这个智能指针将不会影响父进程的引用计数。

children 则将当前进程的所有子进程的任务控制块以 Arc 智能指针的形式保存在一个向量中，这样才能够更方便的找到它们。

trap_cx_ppn 指出了应用地址空间中的 Trap 上下文（详见第四章）被放在的物理页帧的物理页号。

base_size 的含义是：应用数据仅有可能出现在应用地址空间低于 base_size 字节的区域中。借助它我们可以清楚的知道应用有多少数据驻留在内存中。

task_cx 将暂停的任务的任务上下文保存在任务控制块中。

我们还给 PCB 实现以下方法来直接管理进程：

new（创建进程，只用于 initproc）
exec 加载并执行另一个可执行文件
fork 复制一个几乎相同的子进程

任务管理器

1
2
3

pub struct TaskManager {
    ready_queue: VecDeque<Arc<TaskControlBlock>>,
}

维护一个双端队列，负责取出/放入任务（PCB）。使用引用来减少数据移动开销。

处理器管理

为了方便多核运行的环境，抽象出处理器这个结构

pub struct Processor {
    current: Option<Arc<TaskControlBlock>>,
    idle_task_cx: TaskContext,
}

idle_task_cx 表示当前处理器上的 idle 控制流的任务上下文。

其提供了查看/换入/换出任务的方法。

进程切换

pub fn run_tasks() {
    loop {
        let mut processor = PROCESSOR.exclusive_access();
        if let Some(task) = fetch_task() {
            let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
            // access coming task TCB exclusively
            let mut task_inner = task.inner_exclusive_access();
            let next_task_cx_ptr = &task_inner.task_cx as *const TaskContext;
            task_inner.task_status = TaskStatus::Running;
            // stop exclusively accessing coming task TCB manually
            drop(task_inner);
            processor.current = Some(task);
            // stop exclusively accessing processor manually
            drop(processor);
            unsafe {
                __switch(
                    idle_task_cx_ptr,
                    next_task_cx_ptr,
                );
            }
        }
    }
}

一个比较有意思的设计是：这里在切换时，会先切换到 idle 控制流，再从 idle 控制流切换到下一个任务。这样设计的目的是使的进程的切换对每一个进程的内核栈都是透明的：

使得换入/换出进程和调度执行流在内核层各自执行在不同的内核栈上，分别是进程自身的内核栈和内核初始化时使用的启动栈。这样的话，调度相关的数据不会出现在进程内核栈上，也使得调度机制对于换出进程的Trap执行流是不可见的，它在决定换出的时候只需调用schedule而无需操心调度的事情。从而各执行流的分工更加明确了，虽然带来了更大的开销。

pub fn schedule(switched_task_cx_ptr: *mut TaskContext) {
    let mut processor = PROCESSOR.exclusive_access();
    let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
    drop(processor);
    unsafe {
        __switch(
            switched_task_cx_ptr,
            idle_task_cx_ptr,
        );
    }
}

所以进程切换的流程如下：

时间耗尽/yield，陷入内核态
内核调用 schedule，切换到 idle 流
idle 流从上次 switch 后的地方继续执行：即再一次 run_task
idle 流切换到下一个进程，下一个进程会从内核态恢复到用户态继续执行。

进程管理机制

创建初始进程

// impl TaskControlBlock
pub fn new(elf_data: &[u8]) -> Self {
    // memory_set with elf program headers/trampoline/trap context/user stack
    let (memory_set, user_sp, entry_point) = MemorySet::from_elf(elf_data);
    let trap_cx_ppn = memory_set
        .translate(VirtAddr::from(TRAP_CONTEXT).into())
        .unwrap()
        .ppn();
    // alloc a pid and a kernel stack in kernel space
    let pid_handle = pid_alloc();
    let kernel_stack = KernelStack::new(&pid_handle);
    let kernel_stack_top = kernel_stack.get_top();
    // push a task context which goes to trap_return to the top of kernel stack
    let task_control_block = Self {
        pid: pid_handle,
        kernel_stack,
        inner: unsafe { UPSafeCell::new(TaskControlBlockInner {
            trap_cx_ppn,
            base_size: user_sp,
            task_cx: TaskContext::goto_trap_return(kernel_stack_top),
            task_status: TaskStatus::Ready,
            memory_set,
            parent: None,
            children: Vec::new(),
            exit_code: 0,
        })},
    };
    // prepare TrapContext in user space
    let trap_cx = task_control_block.inner_exclusive_access().get_trap_cx();
    *trap_cx = TrapContext::app_init_context(
        entry_point,
        user_sp,
        KERNEL_SPACE.exclusive_access().token(),
        kernel_stack_top,
        trap_handler as usize,
    );
    task_control_block
}

解析 initproc 的初始文件，创建用户地址空间
从地址空间中得到存放 TrapContext 的地方
分配 PID，并根据 PID 获得对应的内核栈、及其栈顶的位置
初始化 PCB
初始化用户空间的 TrapContext（设置好入口等）

进程调度

前一部分已经讲了大概流程，再看看 schedule 前的接口：

pub fn suspend_current_and_run_next() {
    // There must be an application running.
    let task = take_current_task().unwrap();

    // ---- access current TCB exclusively
    let mut task_inner = task.inner_exclusive_access();
    let task_cx_ptr = &mut task_inner.task_cx as *mut TaskContext;
    // Change status to Ready
    task_inner.task_status = TaskStatus::Ready;
    drop(task_inner);
    // ---- stop exclusively accessing current PCB

    // push back to ready queue.
    add_task(task);
    // jump to scheduling cycle
    schedule(task_cx_ptr);
}

注意这里是要取出PCB，放回等待队列中。

进程生成

除了 initproc 是内核手动生成的，其余的进程都是通过 fork or exec 产生的。

先看 fork 的实现：（在此之前，给 MemorySet 新增了根据已有地址空间创建新的地址空间的方法from_existed_user）

impl TaskControlBlock {
    pub fn fork(self: &Arc<TaskControlBlock>) -> Arc<TaskControlBlock> {
        // ---- access parent PCB exclusively
        let mut parent_inner = self.inner_exclusive_access();
        // copy user space(include trap context)
        let memory_set = MemorySet::from_existed_user(
            &parent_inner.memory_set
        );
        let trap_cx_ppn = memory_set
            .translate(VirtAddr::from(TRAP_CONTEXT).into())
            .unwrap()
            .ppn();
        // alloc a pid and a kernel stack in kernel space
        let pid_handle = pid_alloc();
        let kernel_stack = KernelStack::new(&pid_handle);
        let kernel_stack_top = kernel_stack.get_top();
        let task_control_block = Arc::new(TaskControlBlock {
            pid: pid_handle,
            kernel_stack,
            inner: unsafe { UPSafeCell::new(TaskControlBlockInner {
                trap_cx_ppn,
                base_size: parent_inner.base_size,
                task_cx: TaskContext::goto_trap_return(kernel_stack_top),
                task_status: TaskStatus::Ready,
                memory_set,
                parent: Some(Arc::downgrade(self)),
                children: Vec::new(),
                exit_code: 0,
            })},
        });
        // add child
        parent_inner.children.push(task_control_block.clone());
        // modify kernel_sp in trap_cx
        // **** access children PCB exclusively
        let trap_cx = task_control_block.inner_exclusive_access().get_trap_cx();
        trap_cx.kernel_sp = kernel_stack_top;
        // return
        task_control_block
        // ---- stop exclusively accessing parent/children PCB automatically
    }
}

fork 的方法类似 new。注意我们复制地址空间时，顺便会把TrapContext给复制了（记得上章时，我们把TrapContext放在了用户地址空间的固定虚拟地址处），这样我们从内核态返回到这个新的进程的用户态时，会从调用 fork 后开始执行且状态和调用瞬间一样，十分巧妙。

如何让 parent 和 child 的 fork 返回值不一样？

pub fn sys_fork() -> isize {
    let current_task = current_task().unwrap();
    let new_task = current_task.fork();
    let new_pid = new_task.pid.0;
    // modify trap context of new_task, because it returns immediately after switching
    let trap_cx = new_task.inner_exclusive_access().get_trap_cx();
    // we do not have to move to next instruction since we have done it before
    // for child process, fork returns 0
    trap_cx.x[10] = 0;  //x[10] is a0 reg
    // add new task to scheduler
    add_task(new_task);
    new_pid as isize
}

我们需要手动设置系统调用的返回值，通过修改子进程的 TrapContext 。

接着看 exec 的实现：

exec 系统调用使得一个进程能够加载一个新应用的 ELF 可执行文件中的代码和数据替换原有的应用地址空间中的内容，并开始执行。

impl TaskControlBlock {
    pub fn exec(&self, elf_data: &[u8]) {
        // memory_set with elf program headers/trampoline/trap context/user stack
        let (memory_set, user_sp, entry_point) = MemorySet::from_elf(elf_data);
        let trap_cx_ppn = memory_set
            .translate(VirtAddr::from(TRAP_CONTEXT).into())
            .unwrap()
            .ppn();

        // **** access inner exclusively
        let mut inner = self.inner_exclusive_access();
        // substitute memory_set
        inner.memory_set = memory_set;
        // update trap_cx ppn
        inner.trap_cx_ppn = trap_cx_ppn;
        // initialize trap_cx
        let trap_cx = inner.get_trap_cx();
        *trap_cx = TrapContext::app_init_context(
            entry_point,
            user_sp,
            KERNEL_SPACE.exclusive_access().token(),
            self.kernel_stack.get_top(),
            trap_handler as usize,
        );
        // **** stop exclusively accessing inner automatically
    }
}

我们直接从 elf 文件构建出一个新的地址空间。原地址空间的生命周期结束，根据其类似 RAII 的特性，所分配的物理页都会自动释放。同时需要修改 TrapContext，初始化（类似 new）。无需修改 TaskContext，因为本质上还是同一个进程。

// os/src/syscall/process.rs

pub fn sys_exec(path: *const u8) -> isize {
    let token = current_user_token();
    let path = translated_str(token, path);
    if let Some(data) = get_app_data_by_name(path.as_str()) {
        let task = current_task().unwrap();
        task.exec(data);
        0
    } else {
        -1
    }
}

translated_str：我们的目标程序路径的字符串是在用户空间的，要在内核态读取需要经过用户页表的地址转换。

可以看到，我们的系统调用是有可能改变 TrapContext 的位置的，原来的 trap_handler 没有注意这个，我们需要在 syscall 函数后重新获取 TrapContext

进程退出

sys_exit / 程序异常都会调用 exit_current_and_run_next

// os/src/task/mod.rs

pub fn exit_current_and_run_next(exit_code: i32) {
    // take from Processor
    let task = take_current_task().unwrap();
    // **** access current TCB exclusively
    let mut inner = task.inner_exclusive_access();
    // Change status to Zombie
    inner.task_status = TaskStatus::Zombie;
    // Record exit code
    inner.exit_code = exit_code;
    // do not move to its parent but under initproc

    // ++++++ access initproc TCB exclusively
    {
        let mut initproc_inner = INITPROC.inner_exclusive_access();
        for child in inner.children.iter() {
            child.inner_exclusive_access().parent = Some(Arc::downgrade(&INITPROC));
            initproc_inner.children.push(child.clone());
        }
    }
    // ++++++ stop exclusively accessing parent PCB

    inner.children.clear();
    // deallocate user space
    inner.memory_set.recycle_data_pages();
    drop(inner);
    // **** stop exclusively accessing current PCB
    // drop task manually to maintain rc correctly
    drop(task);
    // we do not have to save task context
    let mut _unused = TaskContext::zero_init();
    schedule(&mut _unused as *mut _);
}

取出当前进程并设置为僵尸进程

我们将进程控制块中的状态修改为 TaskStatus::Zombie 即僵尸进程，这样它后续才能被父进程在 waitpid 系统调用的时候回收；
同时会将当前进程的全部子进程挂在 initproc 下
提前回收该应用的地址空间
无需提供当前的 TaskContext 的地址了：因为我们不会再回来

子进程的完全回收是通过父进程实现的：

/// If there is not a child process whose pid is same as given, return -1.
/// Else if there is a child process but it is still running, return -2.
pub fn sys_waitpid(pid: isize, exit_code_ptr: *mut i32) -> isize {
    let task = current_task().unwrap();
    // find a child process

    // ---- access current TCB exclusively
    let mut inner = task.inner_exclusive_access();
    if inner.children
        .iter()
        .find(|p| {pid == -1 || pid as usize == p.getpid()})
        .is_none() {
        return -1;
        // ---- stop exclusively accessing current PCB
    }
    let pair = inner.children
        .iter()
        .enumerate()
        .find(|(_, p)| {
            // ++++ temporarily access child PCB exclusively
            p.inner_exclusive_access().is_zombie() && (pid == -1 || pid as usize == p.getpid())
            // ++++ stop exclusively accessing child PCB
        });
    if let Some((idx, _)) = pair {
        let child = inner.children.remove(idx);
        // confirm that child will be deallocated after removing from children list
        assert_eq!(Arc::strong_count(&child), 1);
        let found_pid = child.getpid();
        // ++++ temporarily access child TCB exclusively
        let exit_code = child.inner_exclusive_access().exit_code;
        // ++++ stop exclusively accessing child PCB
        *translated_refmut(inner.memory_set.token(), exit_code_ptr) = exit_code;
        found_pid as isize
    } else {
        -2
    }
    // ---- stop exclusively accessing current PCB automatically
}

第 29 行确认这是对于该子进程控制块的唯一一次强引用，即它不会出现在某个进程的子进程向量中，更不会出现在处理器监控器或者任务管理器中。当它所在的代码块结束，这次引用变量的生命周期结束，将导致该子进程进程控制块的引用计数变为 0 ，彻底回收掉它占用的所有资源，包括：内核栈和它的 PID 还有它的应用地址空间存放页表的那些物理页帧等等。

第 34 行写入到当前进程的应用地址空间中。由于应用传递给内核的仅仅是一个指向应用地址空间中保存子进程返回值的内存区域的指针，我们还需要在 translated_refmut 中手动查页表找到应该写入到物理内存中的哪个位置，这样才能把子进程的退出码 exit_code 返回给父进程。其实现可以在 os/src/mm/page_table.rs 中找到，比较简单，在这里不再赘述。

#Operating-System

RCore-伤齿龙

https://pactheman123.github.io/2025/02/13/RCore-伤齿龙/

作者

Xiaopac

发布于

2025年2月13日

许可协议

RCore-霸王龙上一篇

如何阅读源码下一篇