
android qemu-kvm: the i8259 interrupt controller virtual device

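When the Android emulator runs on Ubuntu 12.04 with KVM acceleration enabled, the code that emulates the i8259 interrupt controller is fairly old. It corresponds to QEMU 0.14 or earlier, before QOM (QEMU Object Model) existed, so the virtual device code is relatively simple.

The master 8259's IRQ0 to IRQ7 map to INT 0x08 to INT 0x0F, and the slave's IRQ8 to IRQ15 map to INT 0x70 to INT 0x77.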
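
The mapping above can be written as a small helper. This is only a minimal sketch and not code from the emulator: `irq_to_vector` and the two base constants are illustrative names, and the bases are simply the values quoted above (a guest programs the actual base through ICW2, as shown later in pic_ioport_write).

```c
/* Illustrative constants: the vector bases quoted in the text. */
#define MASTER_VECTOR_BASE 0x08
#define SLAVE_VECTOR_BASE  0x70

/* Map a legacy IRQ number (0..15) to its interrupt vector.
 * Returns -1 for an out-of-range IRQ. */
int irq_to_vector(int irq)
{
    if (irq < 0 || irq > 15)
        return -1;
    if (irq < 8)
        return MASTER_VECTOR_BASE + irq;        /* master: IRQ0..7  */
    return SLAVE_VECTOR_BASE + (irq - 8);       /* slave:  IRQ8..15 */
}
```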

Initialization

The 8259 is initialized in pc_init1; cpu_irq is the 8259's parent_irq:

cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
i8259 = i8259_init(cpu_irq[0]);

qemu_allocate_irqs allocates and fills in the qemu_irq structures:

qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n)
{
    qemu_irq *s;
    struct IRQState *p;
    int i;

    s = (qemu_irq *)g_malloc0(sizeof(qemu_irq) * n);
    p = (struct IRQState *)g_malloc0(sizeof(struct IRQState) * n);
    for (i = 0; i < n; i++) {
        p->handler = handler;
        p->opaque = opaque;
        p->n = i;
        s[i] = p;
        p++;
    }
    return s;
}
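i8259_init is the 8259 initialization function. It initializes both the master and the slave 8259, with iobase 0x20 and 0xa0 respectively; each has a length of 2 and a width of 8 bits.

The ELCR addresses are 0x4d0 and 0x4d1, one for each 8259. Each bit corresponds to one irq and selects edge or level triggering; a bit set to 1 means level triggered. elcr_mask exists because some irqs do not support level triggering, so those bits have to be masked out.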

From Wikipedia (https://en.wikipedia.org/wiki/Intel_8259), "Edge and level triggered modes":
Since the ISA bus does not support level triggered interrupts, level triggered mode may not be used for interrupts connected to ISA devices. This means that on PC/XT, PC/AT, and compatible systems the 8259 must be programmed for edge triggered mode. On MCA systems, devices use level triggered interrupts and the interrupt controller is hardwired to always work in level triggered mode. On newer EISA, PCI, and later systems the Edge/Level Control Registers (ELCRs) control the mode per IRQ line, effectively making the mode of the 8259 irrelevant for such systems with ISA buses. The ELCR is programmed by the BIOS at system startup for correct operation.

The ELCRs are located 0x4d0 and 0x4d1 in the x86 I/O address space. They are 8-bits wide, each bit corresponding to an IRQ from the 8259s. When a bit is set, the IRQ is in level triggered mode; otherwise, the IRQ is in edge triggered mode.

Finally, GFD_MAX_IRQ (that is, 16) qemu_irq structures are allocated:

qemu_irq *i8259_init(qemu_irq parent_irq)
{
    PicState2 *s;

    s = g_malloc0(sizeof(PicState2));
    pic_init1(0x20, 0x4d0, &s->pics[0]);
    pic_init1(0xa0, 0x4d1, &s->pics[1]);
    s->pics[0].elcr_mask = 0xf8;
    s->pics[1].elcr_mask = 0xde;
    s->parent_irq = parent_irq;
    s->pics[0].pics_state = s;
    s->pics[1].pics_state = s;
    isa_pic = s;
    return qemu_allocate_irqs(i8259_set_irq, s, GFD_MAX_IRQ);
}

struct PicState2 {
    /* 0 is master pic, 1 is slave pic */
    /* XXX: better separation between the two pics */
    PicState pics[2];
    qemu_irq parent_irq;
    void *irq_request_opaque;
    /* IOAPIC callback support */
    SetIRQFunc *alt_irq_func;
    void *alt_irq_opaque;
};
typedef struct PicState {
    uint8_t last_irr; /* edge detection */
    uint8_t irr; /* interrupt request register */
    uint8_t imr; /* interrupt mask register */
    uint8_t isr; /* interrupt service register */
    uint8_t priority_add; /* highest irq priority */
    uint8_t irq_base;
    uint8_t read_reg_select;
    uint8_t poll;
    uint8_t special_mask;
    uint8_t init_state;
    uint8_t auto_eoi;
    uint8_t rotate_on_auto_eoi;
    uint8_t special_fully_nested_mode;
    uint8_t init4; /* true if 4 byte init */
    uint8_t single_mode; /* true if slave pic is not initialized */
    uint8_t elcr; /* PIIX edge/trigger selection*/
    uint8_t elcr_mask;
    PicState2 *pics_state;
} PicState;


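pic_init1 does the actual initialization of each 8259. It binds the registers to their read and write functions, and qemu_register_reset adds the chip's reset function to a list: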

static void pic_init1(int io_addr, int elcr_addr, PicState *s)
{
    register_ioport_write(io_addr, 2, 1, pic_ioport_write, s);
    register_ioport_read(io_addr, 2, 1, pic_ioport_read, s);
    if (elcr_addr >= 0) {
        register_ioport_write(elcr_addr, 1, 1, elcr_ioport_write, s);
        register_ioport_read(elcr_addr, 1, 1, elcr_ioport_read, s);
    }
    register_savevm(NULL, "i8259", io_addr, 1, pic_save, pic_load, s);
    qemu_register_reset(pic_reset, 0, s);
}


Reading and writing the ELCR

The ELCR read and write functions are very simple; just pay attention to how the mask is used:
static void elcr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
{
    PicState *s = opaque;
    s->elcr = val & s->elcr_mask;
}

static uint32_t elcr_ioport_read(void *opaque, uint32_t addr1)
{
    PicState *s = opaque;
    return s->elcr;
}
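
To make the effect of the mask concrete, here is a minimal stand-alone sketch. `ElcrModel` and its helpers are hypothetical names that only mirror the elcr and elcr_mask fields of PicState; the mask values 0xf8 and 0xde are the ones set in i8259_init. With 0xf8, bits 0 to 2 of the master can never be set, and with 0xde, bits 0 and 5 of the slave (IRQ8 and IRQ13) can never be set, so those lines always stay edge triggered.

```c
#include <stdint.h>

/* Hypothetical stand-in for the elcr/elcr_mask fields of PicState. */
typedef struct {
    uint8_t elcr;
    uint8_t elcr_mask;
} ElcrModel;

/* Same masking as elcr_ioport_write: bits outside the mask are dropped. */
void elcr_model_write(ElcrModel *e, uint8_t val)
{
    e->elcr = val & e->elcr_mask;
}

/* Nonzero if the given line (0..7 on this chip) is level triggered. */
int elcr_model_is_level(const ElcrModel *e, int irq)
{
    return (e->elcr >> irq) & 1;
}
```

For example, writing 0xff to a model with elcr_mask 0xf8 leaves elcr at 0xf8, so line 0 still reads back as edge triggered while line 3 is level triggered.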

Reading and writing the 8259 registers

pic_ioport_write

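The 8259 write function is pic_ioport_write. Each chip has only two register addresses, so addr is either 0 or 1; hence addr &= 1.

ICW1 is at address 0 and ICW2 to ICW4 are at address 1; OCW2 and OCW3 are at address 0 and OCW1 is at address 1.

Note how the shared addresses are handled. The ICW commands are used for initialization: ICW1 is written to address 0 first, then the remaining ICW commands are written to address 1. Which ICW a given write represents is decided by the init_state state machine.

OCW commands can be written only after initialization has finished.

ICW1, OCW2, and OCW3 share address 0 and are told apart by specific bits in val. Once initialization has finished, address 1 corresponds to OCW1 only.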
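
The decoding of a write to address 0 can be sketched on its own. The bit tests (0x10 for ICW1, then 0x08 for OCW3, otherwise OCW2) are the ones used in pic_ioport_write; the enum and the function name `classify_addr0_write` are made up for this illustration.

```c
/* Hypothetical labels for the three commands that share address 0. */
typedef enum { CMD_ICW1, CMD_OCW2, CMD_OCW3 } PicCmd;

/* Classify a byte written to address 0, in the same order of tests
 * that pic_ioport_write uses. */
PicCmd classify_addr0_write(unsigned val)
{
    if (val & 0x10)
        return CMD_ICW1;    /* bit 4 set: start of initialization */
    if (val & 0x08)
        return CMD_OCW3;    /* bit 4 clear, bit 3 set */
    return CMD_OCW2;        /* bits 4 and 3 both clear */
}
```

The order of the tests matters: ICW1 may also have bit 3 set, so bit 4 has to be checked first.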

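OCW2 needs a more detailed explanation:

1. Interrupt priority: each 8259 has eight interrupt inputs, irq0 to irq7. By default irq0 has the highest priority and irq7 the lowest. When several requests arrive at the same time, the one with higher priority is handled first; in nested mode, a higher-priority interrupt can also preempt the service routine of a lower-priority one.

2. Rotating priority: once an irq has been serviced, the highest priority moves to the next line. If 0 was the highest this time, 1 is the highest at the next interrupt, then 2, and so on up to 7, and then back to 0.

3. SL is used to set an offset; priorities are compared after this offset has been added and the result taken modulo 8.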

static void pic_ioport_write(void *opaque, uint32_t addr, uint32_t val)
{
    PicState *s = opaque;
    int priority, cmd, irq;

#ifdef DEBUG_PIC
    printf("pic_write: addr=0x%02x val=0x%02x\n", addr, val);
#endif
    addr &= 1;
    if (addr == 0) {
        if (val & 0x10) {  //ICW1
            /* init */
            pic_reset(s);
            /* deassert a pending interrupt */
            qemu_irq_lower(s->pics_state->parent_irq);
            s->init_state = 1;
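            s->init4 = val & 1; // IC4: whether an ICW4 follows
            s->single_mode = val & 2; // SNGL: single chip or cascaded
            if (val & 0x08) // only edge-triggered mode is supported
                hw_error("level sensitive irq not supported");
        } else if (val & 0x08) { // OCW3
            if (val & 0x04)
                s->poll = 1; // poll command: the next read returns the interrupt status
            if (val & 0x02)
                s->read_reg_select = val & 1; // read IRR or ISR
            if (val & 0x40)
                s->special_mask = (val >> 5) & 1; // special mask mode
        } else { // OCW2: interrupt mode settings, e.g. how the in-service bit is cleared and whether priority rotates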
            cmd = val >> 5;
            switch(cmd) {
            case 0:
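            case 4: // enable (4) or disable (0) rotation in automatic EOI mode
                s->rotate_on_auto_eoi = cmd >> 2;
                break;
            case 1: /* end of interrupt */
            case 5: // the interrupt handler has to clear the in-service bit itself
                priority = get_priority(s, s->isr);
                if (priority != 8) { // an interrupt is in service
                    irq = (priority + s->priority_add) & 7; // compute the irq from the priority
                    s->isr &= ~(1 << irq); // clear the bit for this irq
                    if (cmd == 5) // with rotation, priority_add moves to the irq after this one
                        s->priority_add = (irq + 1) & 7;
                    pic_update_irq(s->pics_state); // update the interrupt state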
                }
                break;
            case 3:
                irq = val & 7;
                s->isr &= ~(1 << irq);
                pic_update_irq(s->pics_state);
                break;
            case 6:
                s->priority_add = (val + 1) & 7; // val names the lowest-priority irq, so the next one becomes the highest
                pic_update_irq(s->pics_state);
                break;
            case 7:
                irq = val & 7;
                s->isr &= ~(1 << irq);
                s->priority_add = (irq + 1) & 7; // rotate the priority
                pic_update_irq(s->pics_state);
                break;
            default:
                /* no operation */
                break;
            }
        }
    } else {
        switch(s->init_state) {
        case 0: // OCW1: interrupt mask bits
            /* normal mode */
            s->imr = val;
            pic_update_irq(s->pics_state);
            break;
        case 1: // ICW2
            s->irq_base = val & 0xf8; // set the vector base, i.e. the mapping from irq to interrupt vector
            s->init_state = s->single_mode ? (s->init4 ? 3 : 0) : 2; // advance the state machine
            break;
        case 2: // ICW3
            if (s->init4) {
                s->init_state = 3;
            } else {
                s->init_state = 0;
            }
            break;
        case 3: // ICW4
            s->special_fully_nested_mode = (val >> 4) & 1;
            s->auto_eoi = (val >> 1) & 1;
            s->init_state = 0; // initialization finished
            break;
        }
    }
}
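
The init_state transitions for writes to address 1 can be followed more easily in a reduced model. The sketch below is an assumption-laden simplification, not the emulator code: `InitModel` and `init_model_write` are hypothetical names, the reset performed on ICW1 and the OCW2/OCW3 handling are left out, and only the fields needed to track the state machine are kept.

```c
#include <stdint.h>

/* Hypothetical reduced model: only what the init state machine needs. */
typedef struct {
    uint8_t init_state;   /* 0 = normal, 1..3 = waiting for ICW2..ICW4 */
    uint8_t init4;        /* ICW1 bit 0: an ICW4 will follow */
    uint8_t single_mode;  /* ICW1 bit 1: no cascade, so no ICW3 */
    uint8_t irq_base;     /* vector base taken from ICW2 */
    uint8_t imr;          /* mask register, written through OCW1 */
} InitModel;

void init_model_write(InitModel *s, int addr, uint8_t val)
{
    if ((addr & 1) == 0) {
        if (val & 0x10) {             /* ICW1 restarts the sequence */
            s->init_state = 1;
            s->init4 = val & 1;
            s->single_mode = val & 2;
        }
        return;                       /* OCW2/OCW3 are not modeled here */
    }
    switch (s->init_state) {
    case 0:                           /* OCW1 */
        s->imr = val;
        break;
    case 1:                           /* ICW2 */
        s->irq_base = val & 0xf8;
        s->init_state = s->single_mode ? (s->init4 ? 3 : 0) : 2;
        break;
    case 2:                           /* ICW3 */
        s->init_state = s->init4 ? 3 : 0;
        break;
    case 3:                           /* ICW4 */
        s->init_state = 0;
        break;
    }
}
```

As an illustration (the byte values are chosen for this example, not taken from the emulator), writing 0x11 to address 0 and then 0x08, 0x04, 0x01 to address 1 walks init_state through 1, 2, 3 and back to 0, after which a further write to address 1 lands in imr.
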
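get_priority finds the highest-priority interrupt in mask on this 8259. The mask is the isr (or the unmasked irr bits, as in pic_get_irq), and the result has to take into account how priority rotation and the SL offset affect the ordering.

It returns 8 when there is no interrupt.

priority_add combines the effects of automatic rotation and of the value set through SL, and is used to adjust the priority.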

/* return the highest priority found in mask (highest = smallest
   number). Return 8 if no irq */
static inline int get_priority(PicState *s, int mask)
{
    int priority;
    if (mask == 0)
        return 8;
    priority = 0;
    while ((mask & (1 << ((priority + s->priority_add) & 7))) == 0)
        priority++;
    return priority;
}
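
A small stand-alone sketch shows how priority_add shifts the ordering. `model_get_priority` repeats the loop of get_priority but takes priority_add as a plain argument, and `model_highest_irq` converts the result back to an irq number the same way pic_get_irq does; both names are invented for this example. Unlike the original, the sketch also ignores bits above bit 7 so that the loop always terminates.

```c
/* Same search as get_priority, with priority_add passed explicitly.
 * Returns 0..7 (0 = highest priority) or 8 if no bit is set. */
int model_get_priority(int priority_add, int mask)
{
    int priority = 0;

    if ((mask & 0xff) == 0)
        return 8;
    while ((mask & (1 << ((priority + priority_add) & 7))) == 0)
        priority++;
    return priority;
}

/* Convert the winning priority back to an irq number, or -1 if none. */
int model_highest_irq(int priority_add, int mask)
{
    int priority = model_get_priority(priority_add, mask);

    return priority == 8 ? -1 : (priority + priority_add) & 7;
}
```

With irq1 and irq5 both set (mask 0x22), priority_add 0 picks irq1, while priority_add 2 makes irq2 the highest line, so the search reaches irq5 before wrapping around to irq1.
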
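pic_update_irq updates the interrupt state. If there is an interrupt request, qemu_irq_raise asserts parent_irq, which is cpu_irq, to raise an interrupt request to the CPU: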
void pic_update_irq(PicState2 *s)
{
    int irq2, irq;

    /* first look at slave pic */
    irq2 = pic_get_irq(&s->pics[1]);
    if (irq2 >= 0) {
        /* if irq request by slave pic, signal master PIC */
        pic_set_irq1(&s->pics[0], 2, 1);  // the slave 8259 is wired to the master's irq2; simulate an edge on it
        pic_set_irq1(&s->pics[0], 2, 0);
    }
    /* look at requested irq */
    irq = pic_get_irq(&s->pics[0]);
    if (irq >= 0) {
        qemu_irq_raise(s->parent_irq);
    }

/* all targets should do this rather than acking the IRQ in the cpu */
#if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
    else {
        qemu_irq_lower(s->parent_irq);
    }
#endif
}

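irr is the interrupt request register and isr is the in-service register. An interrupt passes through irr before it reaches isr: irr marks interrupts that have been requested, and isr marks interrupts that are currently being serviced.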

static int pic_get_irq(PicState *s)
{
    int mask, cur_priority, priority;

    mask = s->irr & ~s->imr;
    priority = get_priority(s, mask); // highest priority among the unmasked irr bits
    if (priority == 8)
        return -1;
    /* compute current priority. If special fully nested mode on the
       master, the IRQ coming from the slave is not taken into account
       for the priority computation. */
    mask = s->isr;
    if (s->special_mask) // in OCW3
        mask &= ~s->imr;
    if (s->special_fully_nested_mode && s == &s->pics_state->pics[0])
        mask &= ~(1 << 2);
    cur_priority = get_priority(s, mask);
    if (priority < cur_priority) { // the best pending request has a smaller number than the best in-service one, i.e. higher priority
        /* higher priority found: an irq should be generated */
        return (priority + s->priority_add) & 7;
    } else {
        return -1;
    }
}
pic_set_irq1 sets the interrupt request register irr:
static inline void pic_set_irq1(PicState *s, int irq, int level)
{
    int mask;
    mask = 1 << irq;
    if (s->elcr & mask) {
        /* level triggered */
        if (level) {
            s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->irr &= ~mask;
            s->last_irr &= ~mask;
        }
    } else {
        /* edge triggered */
        if (level) {
            if ((s->last_irr & mask) == 0)
                s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->last_irr &= ~mask;
        }
    }
}
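
The difference between the two branches shows up when a line is held high after its request has been consumed. The following sketch copies the logic of pic_set_irq1 into a hypothetical `LineModel` with only the three fields involved; the struct and function names are assumptions made for this illustration.

```c
#include <stdint.h>

/* Hypothetical reduced state: just the fields pic_set_irq1 touches. */
typedef struct {
    uint8_t irr;       /* pending requests */
    uint8_t last_irr;  /* last seen line level, for edge detection */
    uint8_t elcr;      /* 1 = level triggered, 0 = edge triggered */
} LineModel;

void line_model_set(LineModel *s, int irq, int level)
{
    uint8_t mask = (uint8_t)(1u << irq);

    if (s->elcr & mask) {
        /* level triggered: irr simply follows the line */
        if (level) {
            s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->irr &= (uint8_t)~mask;
            s->last_irr &= (uint8_t)~mask;
        }
    } else {
        /* edge triggered: only a 0 -> 1 transition sets irr */
        if (level) {
            if ((s->last_irr & mask) == 0)
                s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->last_irr &= (uint8_t)~mask;
        }
    }
}
```

In edge mode, once irr has been cleared by an acknowledge, raising an already-high line again does nothing; the line has to go low and high again to produce a new request. In level mode, irr tracks the line directly.
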

pic_ioport_read

The 8259 read function is pic_ioport_read:
static uint32_t pic_ioport_read(void *opaque, uint32_t addr1)
{
    PicState *s = opaque;
    unsigned int addr;
    int ret;

    addr = addr1;
    addr &= 1;
    if (s->poll) {
        ret = pic_poll_read(s, addr1);
        s->poll = 0;
    } else {
        if (addr == 0) {
            if (s->read_reg_select)
                ret = s->isr;
            else
                ret = s->irr;
        } else {
            ret = s->imr;
        }
    }
#ifdef DEBUG_PIC
    printf("pic_read: addr=0x%02x val=0x%02x\n", addr1, ret);
#endif
    return ret;
}

static uint32_t pic_poll_read (PicState *s, uint32_t addr1)
{
    int ret;

    ret = pic_get_irq(s);
    if (ret >= 0) {
        if (addr1 >> 7) { // slave: bit 7 of the port address is set (0xa0)
            s->pics_state->pics[0].isr &= ~(1 << 2);
            s->pics_state->pics[0].irr &= ~(1 << 2);
        }
        s->irr &= ~(1 << ret);
        s->isr &= ~(1 << ret);
        if (addr1 >> 7 || ret != 2)
            pic_update_irq(s->pics_state);
    } else {
        ret = 0x07;
        pic_update_irq(s->pics_state);
    }

    return ret;
}


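How is the CPU notified that an interrupt has arrived?

qemu_set_irq sets an interrupt request by calling the handler that was registered when the qemu_irq was allocated. For cpu_irq the handler is pic_irq_request; for the 8259 it is i8259_set_irq.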

void qemu_set_irq(qemu_irq irq, int level)
{
    if (!irq)
        return;

    irq->handler(irq->opaque, irq->n, level);
}
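
The allocate-then-dispatch pattern can be tried outside QEMU with a small model. Everything below is a hypothetical re-creation for illustration: `IrqLine`, `alloc_irqs`, `set_irq`, and `record_level` are invented names, standard calloc replaces g_malloc0, and the handler merely records line levels in a bitmap instead of driving a PIC.

```c
#include <stdlib.h>

typedef void (*irq_handler)(void *opaque, int n, int level);

/* Hypothetical equivalent of struct IRQState. */
struct IrqLine {
    irq_handler handler;
    void *opaque;
    int n;
};
typedef struct IrqLine *irq_t;

/* Allocate n lines that share one handler and one opaque pointer.
 * Returns NULL on failure. Release with free(lines[0]); free(lines). */
irq_t *alloc_irqs(irq_handler handler, void *opaque, int n)
{
    irq_t *s;
    struct IrqLine *p;
    int i;

    if (n <= 0)
        return NULL;
    s = calloc((size_t)n, sizeof(*s));
    p = calloc((size_t)n, sizeof(*p));
    if (!s || !p) {
        free(s);
        free(p);
        return NULL;
    }
    for (i = 0; i < n; i++) {
        p[i].handler = handler;
        p[i].opaque = opaque;
        p[i].n = i;               /* each line remembers its own index */
        s[i] = &p[i];
    }
    return s;
}

/* Same dispatch as qemu_set_irq: a NULL line is silently ignored. */
void set_irq(irq_t irq, int level)
{
    if (!irq)
        return;
    irq->handler(irq->opaque, irq->n, level);
}

/* Example handler: keep the current level of each line in a bitmap. */
void record_level(void *opaque, int n, int level)
{
    unsigned *bits = opaque;

    if (level)
        *bits |= 1u << n;
    else
        *bits &= ~(1u << n);
}
```

The point of the design is that the caller of set_irq does not need to know what sits behind the line: the same call reaches i8259_set_irq for a device line and pic_irq_request for the CPU line, depending only on how the line was allocated.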

pic_irq_request sets cpu->interrupt_request |= CPU_INTERRUPT_HARD (through cpu_interrupt).

i8259_set_irq eventually ends up calling pic_irq_request as well.

static void pic_irq_request(void *opaque, int irq, int level)
{
    CPUState *cpu = first_cpu;
    CPUArchState *env = cpu->env_ptr;

    if (env->apic_state) {
        while (cpu) {
            if (apic_accept_pic_intr(env))
                apic_deliver_pic_intr(env, level);
            cpu = QTAILQ_NEXT(cpu, node);
            env = cpu ? cpu->env_ptr : NULL;
        }
    } else {
        if (level)
            cpu_interrupt(cpu, CPU_INTERRUPT_HARD);
        else
            cpu_reset_interrupt(cpu, CPU_INTERRUPT_HARD);
    }
}

void cpu_interrupt(CPUState *cpu, int mask)
{
    CPUArchState *env = cpu->env_ptr;
    int old_mask;

    old_mask = cpu->interrupt_request;
    cpu->interrupt_request |= mask;

    /*
     * If called from iothread context, wake the target cpu in
     * case its halted.
     */
    if (!qemu_cpu_is_self(cpu)) {
        qemu_cpu_kick(cpu);
        return;
    }

    if (use_icount) {
        env->icount_decr.u16.high = 0xffff;
        if (!can_do_io(env)
            && (mask & ~old_mask) != 0) {
            cpu_abort(env, "Raised interrupt while not in I/O function");
        }
    } else {
        cpu->tcg_exit_req = 1;
    }
}

Interrupts raised through cpu_interrupt are handled in kvm_arch_pre_run. Depending on cpu->interrupt_request, it calls kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr):

int kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
{
    CPUX86State *env = cpu->env_ptr;

    /* Try to inject an interrupt if the guest can accept it */
    if (run->ready_for_interrupt_injection &&
        (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
        (env->eflags & IF_MASK)) {
        int irq;

        cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
        irq = cpu_get_pic_interrupt(env);
        if (irq >= 0) {
            struct kvm_interrupt intr;
            intr.irq = irq;
            /* FIXME: errors */
            dprintf("injected interrupt %d\n", irq);
            kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
        }
    }

    /* If we have an interrupt but the guest is not ready to receive an
     * interrupt, request an interrupt window exit.  This will
     * cause a return to userspace as soon as the guest is ready to
     * receive interrupts. */
    if ((cpu->interrupt_request & CPU_INTERRUPT_HARD))
        run->request_interrupt_window = 1;
    else
        run->request_interrupt_window = 0;

    dprintf("setting tpr\n");
    run->cr8 = cpu_get_apic_tpr(env);

#ifdef CONFIG_KVM_GS_RESTORE
    gs_base_pre_run();
#endif

    return 0;
}
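kvm_arch_pre_run runs inside the loop of kvm_cpu_exec; it is called every time KVM has left kernel mode and is about to run the guest again. Interrupt injection is therefore not immediate: the actual injection can only happen after KVM has exited: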
int kvm_cpu_exec(CPUState *cpu)
{
    CPUArchState *env = cpu->env_ptr;
    struct kvm_run *run = cpu->kvm_run;
    int ret;

    dprintf("kvm_cpu_exec()\n");

    do {
        if (cpu->exit_request) {
            dprintf("interrupt exit requested\n");
            ret = 0;
            break;
        }

        kvm_arch_pre_run(cpu, run);
        ret = kvm_arch_vcpu_run(cpu);
        kvm_arch_post_run(cpu, run);

        if (ret == -EINTR || ret == -EAGAIN) {
            dprintf("io window exit\n");
            ret = 0;
            break;
        }

        if (ret < 0) {
            dprintf("kvm run failed %s\n", strerror(-ret));
            abort();
        }

        kvm_run_coalesced_mmio(cpu, run);

        ret = 0; /* exit loop */
        switch (run->exit_reason) {
        case KVM_EXIT_IO:
            dprintf("handle_io\n");
            ret = kvm_handle_io(cpu, run->io.port,
                                (uint8_t *)run + run->io.data_offset,
                                run->io.direction,
                                run->io.size,
                                run->io.count);
            break;



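What are APIC, IOAPIC, and LAPIC?

APIC (Advanced Programmable Interrupt Controller) replaced the 8259 and is now the standard interrupt controller. It consists of two parts, the IOAPIC and the LAPIC: the IOAPIC connects to devices, and every CPU has its own LAPIC. The IOAPIC sends interrupt requests to the LAPICs.

In APIC mode more interrupts are supported, so interrupt sharing is no longer needed.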

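Conclusion

The interrupt controller of the Android goldfish platform bus is not enabled when the guest is x86.

Current QEMU implements the 8259 on top of the QOM model, which is awfully complicated. In addition, hw/i386/kvm/timer/i8259.c provides a KVM version of the 8259 that uses the in-kernel 8259 emulation provided by KVM. Interrupt handling and I/O reads and writes all happen in the kernel without exiting KVM, so it is faster. An in-kernel APIC emulation is provided as well. Similarly, devices such as the 8254 also have in-kernel KVM implementations, so the Android emulator's performance still has room to improve.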