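DPDK rte_mempool: Creation and Usage

1. Introduction

    The rte_mempool library is one of the core libraries of DPDK and one of the ways DPDK achieves its performance; almost every device-facing DPDK application uses it. Understanding it helps when tracking down performance problems and gives a deeper understanding of DPDK. The rte_mempool core library lives in the lib/mempool/ directory of the source tree.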

2. The rte_mempool structure

2.1 struct rte_mempool

    Before describing how rte_mempool works, it helps to look at the definition of struct rte_mempool, which is found in rte_mempool.h.


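Meaning of each field:

name: the name of the memory pool. Pool names must be unique within a process, otherwise creation fails (the check happens when the memzone is reserved). Because the name is unique, the address of a pool can be found in the global rte_mempool_list by name through the public rte_mempool_lookup() interface.
pool_data / pool_id: a union. pool_data points to the start of the rte_ring that stores the objects of this mempool.
pool_config: opaque data passed by the application to the ops functions. The DPDK framework layer does not use it at present; the cnxk and mlx5 drivers do.
mz: the memzone that holds the memory pool itself.
flags: the flags used when the pool was created. Multi-producer and multi-consumer modes are selected through these flags, which in turn determines the type of rte_mempool_ops.
socket_id: the socket on which the pool is allocated.
size: the number of objects (mbufs, for a pktmbuf pool) in the pool.
cache_size: the size of the local cache of each core.
elt_size: the size of one element. For a pktmbuf pool it equals sizeof(struct rte_mbuf) + private data size + mbuf_data_room_size.
header_size and trailer_size: the sizes of the object header and the object trailer.
private_data_size: the size of the private data area appended after the rte_mempool structure. For the pktmbuf pool of a network device it is sizeof(struct rte_pktmbuf_pool_private).
ops_index: an rte_mempool can select its rte_mempool_ops by name. rte_mempool_ops contains callbacks for alloc and free, enqueue and dequeue, getting the number of available objects, populating the pool, getting pool information, and calculating the memory size needed to store a given number of objects. DPDK supports several rte_mempool_ops, for example ops_mp_mc, ops_sp_sc, ops_mp_sc, ops_sp_mc, ops_mt_rts and ops_mt_hts. Users can also define their own ops and register them in the global rte_mempool_ops_table variable, which holds an array of ops. Once registered, an ops occupies one index in that array; ops_index is that array index.
local_cache: points to the per-lcore cache memory of the rte_mempool; details follow below.
populated_size: the number of objects already populated.
elt_list: the list that links all objects of the pool together.
nb_mem_chunks: the number of memory chunks.
mem_list: a list of struct rte_mempool_memhdr entries. Each entry records the iova, va and length of one chunk, and the list links all memory chunks of the mempool together. The memory of the objects comes from these chunks.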

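2.2 Structure of a mempool

    The basic concepts of the rte_mempool library are also introduced elsewhere. A mempool is implemented in three parts:
    mempool object node: identified uniquely by name. At creation time it is attached to the global static struct rte_tailq_elem rte_mempool_tailq list. The node can be found by name, and it stores the address of the rte_mempool.
    mempool memory region: the mz field of rte_mempool records the contiguous memory that was actually reserved. mz->addr is the address of the rte_mempool and holds the mempool entity, which consists of three parts: the rte_mempool structure, the local caches (one per core) and the private data.
    lock-free ring: a struct rte_ring. The memory of the rte_ring contains an array of pointers that point to all objects of the mempool.
    [Figure: relationship between the local caches, the rte_ring and the objects of an rte_mempool]
    The local_cache object buffer introduced by rte_mempool is not a hardware cache. DPDK worker threads are normally pinned to cores, so the purpose of the local cache is to reduce contention on the ring when several cores access it. Like the rte_ring, a local_cache holds an array of pointers to objects. An application running on coreX accesses the objects in its own local_cache first: a put goes to the local_cache first, and a get is served from the local_cache first. When the cache is empty, objects are taken from the rte_ring; when the cache is full, the excess objects are moved to the rte_ring.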
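
The cache-first behaviour described above can be sketched with a small single-threaded model. This is an illustration under stated assumptions, not DPDK code: toy_ring, toy_cache, toy_get and toy_put are hypothetical names, the shared store is a plain array rather than a lock-free rte_ring, and the refill and spill policy is simplified compared with the flush threshold used by the real cache.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP   64                 /* capacity of the shared store (stands in for rte_ring) */
#define CACHE_SIZE 4                  /* nominal per-core cache size (assumed value) */
#define CACHE_MAX  (CACHE_SIZE + 1)   /* one extra slot so a put can overshoot before spilling */

/* Shared store: a plain LIFO array. The real ring is lock-free; this is not. */
struct toy_ring {
    void *objs[POOL_CAP];
    unsigned len;
};

/* Per-core cache: an array of object pointers. */
struct toy_cache {
    void *objs[CACHE_MAX];
    unsigned len;
};

/* Move up to n objects from the ring into dst; returns how many were moved. */
unsigned toy_ring_dequeue(struct toy_ring *r, void **dst, unsigned n)
{
    unsigned i;

    for (i = 0; i < n && r->len > 0; i++)
        dst[i] = r->objs[--r->len];
    return i;
}

/* Move n objects from src into the ring; returns 0, or -1 if there is no room. */
int toy_ring_enqueue(struct toy_ring *r, void *const *src, unsigned n)
{
    if (n > POOL_CAP - r->len)
        return -1;
    for (unsigned i = 0; i < n; i++)
        r->objs[r->len++] = src[i];
    return 0;
}

/* Get one object: served from the cache, refilled from the ring when empty. */
void *toy_get(struct toy_ring *r, struct toy_cache *c)
{
    if (c->len == 0)
        c->len = toy_ring_dequeue(r, c->objs, CACHE_SIZE);
    return c->len > 0 ? c->objs[--c->len] : NULL;
}

/* Put one object: stored in the cache, excess spilled to the ring. */
void toy_put(struct toy_ring *r, struct toy_cache *c, void *obj)
{
    assert(c->len < CACHE_MAX);
    c->objs[c->len++] = obj;
    if (c->len > CACHE_SIZE) {
        int rc = toy_ring_enqueue(r, &c->objs[CACHE_SIZE], c->len - CACHE_SIZE);

        assert(rc == 0);
        (void)rc;
        c->len = CACHE_SIZE;
    }
}
```

In this model only a refill or a spill touches the shared store, which is the point of the per-core cache: most get and put calls stay local to the core.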

3. Creating an rte_mempool

    The creation flow of a pktmbuf pool is used below to illustrate how an rte_mempool is created.

3.1 Computing the pktmbuf pool private data

    /* size of each mbuf element */
    elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
        (unsigned)data_room_size;
    /* data room size of each mbuf */
    mbp_priv.mbuf_data_room_size = data_room_size;
    mbp_priv.mbuf_priv_size = priv_size;

An mbuf object consists of three parts: the rte_mbuf structure header, the private area (priv_size) and the data room (data_room_size).
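
As a worked example of this arithmetic, the sketch below computes the element size and the offset of the packet buffer inside one element. The header size of 128 bytes is an assumption made for illustration (the real sizeof(struct rte_mbuf) depends on the DPDK version and build), and toy_elt_size and toy_buf_offset are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header size, for illustration only. */
#define TOY_MBUF_HDR_SIZE 128u

/* Element size = mbuf header + application private area + data room. */
uint32_t toy_elt_size(uint16_t priv_size, uint16_t data_room_size)
{
    return TOY_MBUF_HDR_SIZE + (uint32_t)priv_size + (uint32_t)data_room_size;
}

/* The packet buffer starts right after the header and the private area. */
uint32_t toy_buf_offset(uint16_t priv_size)
{
    return TOY_MBUF_HDR_SIZE + (uint32_t)priv_size;
}
```

With a 64-byte private area and a 2048-byte data room, toy_elt_size(64, 2048) gives 128 + 64 + 2048 = 2240 bytes per element under the assumed header size.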

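3.2 Creating an empty mempool

    The interface that creates an empty mempool is rte_mempool_create_empty(). It does the following:

    1. Compute the size of a mempool object with rte_mempool_calc_obj_size(). The memory layout of an object is header + element_size + trailer. The header is a struct rte_mempool_objhdr, which records the mempool the object belongs to and the iova address of the object.
    2. Allocate a struct rte_tailq_entry and insert it into the global static struct rte_tailq_elem rte_mempool_tailq.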

    mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);
    struct rte_tailq_entry *te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
    te->data = mp;
    TAILQ_INSERT_TAIL(mempool_list, te, next);
    3. Compute the size of the mempool: size of the rte_mempool structure + sizeof(struct rte_mempool_cache) * RTE_MAX_LCORE + private_data_size

    mempool_size = RTE_MEMPOOL_HEADER_SIZE(mp, cache_size);
    mempool_size += private_data_size;
    mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
    4. Once the mempool size is known, reserve the memory for the mempool

    mz = rte_memzone_reserve(mz_name, mempool_size, socket_id, mz_flags);
    if (mz == NULL)
        goto exit_unlock;

    /* init the mempool structure */
    mp = mz->addr;
    memset(mp, 0, RTE_MEMPOOL_HEADER_SIZE(mp, cache_size));
    ret = strlcpy(mp->name, name, sizeof(mp->name));
    if (ret < 0 || ret >= (int)sizeof(mp->name)) {
        rte_errno = ENAMETOOLONG;
        goto exit_unlock;
    }
    mp->mz = mz;
    mp->size = n;
    mp->flags = flags;
    mp->socket_id = socket_id;
    mp->elt_size = objsz.elt_size;
    mp->header_size = objsz.header_size;
    mp->trailer_size = objsz.trailer_size;
    /* Size of default caches, zero means disabled. */
    mp->cache_size = cache_size;
    mp->private_data_size = private_data_size;
    STAILQ_INIT(&mp->elt_list);
    STAILQ_INIT(&mp->mem_list);

    /*
     * local_cache pointer is set even if cache_size is zero.
     * The local_cache points to just past the elt_pa[] array.
     */
    mp->local_cache = (struct rte_mempool_cache *)
        RTE_PTR_ADD(mp, RTE_MEMPOOL_HEADER_SIZE(mp, 0));

    /* Init all default caches. */
    if (cache_size != 0) {
        for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
            mempool_cache_init(&mp->local_cache[lcore_id],
                     cache_size);
    }
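    5. Initialize the mempool structure, including every per-lcore local_cache (the loop at the end of the listing above that calls mempool_cache_init() for each lcore).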
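
The size calculation of step 3 can be sketched as follows. This is a simplified model, not the DPDK macros: the alignment of 64 bytes and the lcore count of 4 are assumed values, the toy_ names are hypothetical, and the choice to count the cache array only when cache_size is non-zero is an assumption about what the header-size macro does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ALIGN     64u   /* assumed alignment (one cache line) */
#define TOY_MAX_LCORE 4u    /* assumed; the real RTE_MAX_LCORE is a build-time value */

/* Round v up to the next multiple of align (align must be a power of two). */
size_t toy_align_ceil(size_t v, size_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/* Total bytes for the pool header, the per-lcore caches and the private data. */
size_t toy_mempool_size(size_t hdr_size, size_t cache_struct_size,
                        unsigned cache_size, size_t private_data_size)
{
    size_t sz = hdr_size;

    /* Assumption: no cache array is counted when caching is disabled. */
    if (cache_size != 0)
        sz += cache_struct_size * TOY_MAX_LCORE;
    sz += private_data_size;
    return toy_align_ceil(sz, TOY_ALIGN);
}
```

For example, a 192-byte header, four 100-byte cache structures and 8 bytes of private data add up to 600 bytes, which rounds up to 640 with 64-byte alignment.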

3.3 Setting the mempool ops

rte_mempool_set_ops_byname() is called to set the mempool ops by name.

    int
    rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
        void *pool_config)
    {
        struct rte_mempool_ops *ops = NULL;
        unsigned i;

        /* too late, the mempool is already populated. */
        if (mp->flags & RTE_MEMPOOL_F_POOL_CREATED)
            return -EEXIST;

        for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
            if (!strcmp(name,
                    rte_mempool_ops_table.ops[i].name)) {
                ops = &rte_mempool_ops_table.ops[i];
                break;
            }
        }

        if (ops == NULL)
            return -EINVAL;

        mp->ops_index = i;
        mp->pool_config = pool_config;
        rte_mempool_trace_set_ops_byname(mp, name, pool_config);
        return 0;
    }
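
The register-then-look-up-by-name pattern used here can be modelled in a few lines. The sketch below is a hypothetical stand-in for the ops table (toy_ops_table, toy_ops_register and toy_ops_lookup are invented names, and an entry holds only a name instead of callbacks); it only shows how a registered entry ends up with an index that a pool can store.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_OPS 8

/* One table entry; a real ops entry would also carry function pointers. */
struct toy_ops {
    const char *name;
};

struct toy_ops_table {
    unsigned num_ops;
    struct toy_ops ops[TOY_MAX_OPS];
};

/* Register an entry; returns its index, or -1 if the table is full
 * or the name is already taken. */
int toy_ops_register(struct toy_ops_table *t, const char *name)
{
    if (t->num_ops >= TOY_MAX_OPS)
        return -1;
    for (unsigned i = 0; i < t->num_ops; i++)
        if (strcmp(t->ops[i].name, name) == 0)
            return -1;
    t->ops[t->num_ops].name = name;
    return (int)t->num_ops++;
}

/* Linear search by name; returns the index, or -1 when not found. */
int toy_ops_lookup(const struct toy_ops_table *t, const char *name)
{
    for (unsigned i = 0; i < t->num_ops; i++)
        if (strcmp(t->ops[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

Storing the index rather than a pointer keeps the pool structure small; every later operation then reaches its callbacks through that index.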

3.4 Initializing the pool private data

rte_pktmbuf_pool_init() is called to initialize the private data structure of the pool.

    void
    rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
    {
        struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
        struct rte_pktmbuf_pool_private default_mbp_priv;
        uint16_t roomsz;

        RTE_ASSERT(mp->private_data_size >=
             sizeof(struct rte_pktmbuf_pool_private));
        RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf));

        /* if no structure is provided, assume no mbuf private area */
        user_mbp_priv = opaque_arg;
        if (user_mbp_priv == NULL) {
            memset(&default_mbp_priv, 0, sizeof(default_mbp_priv));
            if (mp->elt_size > sizeof(struct rte_mbuf))
                roomsz = mp->elt_size - sizeof(struct rte_mbuf);
            else
                roomsz = 0;
            default_mbp_priv.mbuf_data_room_size = roomsz;
            user_mbp_priv = &default_mbp_priv;
        }

        RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf) +
            ((user_mbp_priv->flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
                sizeof(struct rte_mbuf_ext_shared_info) :
                user_mbp_priv->mbuf_data_room_size) +
            user_mbp_priv->mbuf_priv_size);
        RTE_ASSERT((user_mbp_priv->flags &
             ~RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) == 0);

        mbp_priv = rte_mempool_get_priv(mp);
        memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
    }

3.5 Populating the mempool

The mempool is populated as follows:

    int
    rte_mempool_populate_default(struct rte_mempool *mp)
    {
        unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
        char mz_name[RTE_MEMZONE_NAMESIZE];
        const struct rte_memzone *mz;
        ssize_t mem_size;
        size_t align, pg_sz, pg_shift = 0;
        rte_iova_t iova;
        unsigned mz_id, n;
        int ret;
        bool need_iova_contig_obj;
        size_t max_alloc_size = SIZE_MAX;

        ret = mempool_ops_alloc_once(mp);
        if (ret != 0)
            return ret;

        /* mempool must not be populated */
        if (mp->nb_mem_chunks != 0)
            return -EEXIST;

        /*
         * the following section calculates page shift and page size values.
         *
         * these values impact the result of calc_mem_size operation, which
         * returns the amount of memory that should be allocated to store the
         * desired number of objects. when not zero, it allocates more memory
         * for the padding between objects, to ensure that an object does not
         * cross a page boundary. in other words, page size/shift are to be set
         * to zero if mempool elements won't care about page boundaries.
         * there are several considerations for page size and page shift here.
         *
         * if we don't need our mempools to have physically contiguous objects,
         * then just set page shift and page size to 0, because the user has
         * indicated that there's no need to care about anything.
         *
         * if we do need contiguous objects (if a mempool driver has its
         * own calc_size() method returning min_chunk_size = mem_size),
         * there is also an option to reserve the entire mempool memory
         * as one contiguous block of memory.
         *
         * if we require contiguous objects, but not necessarily the entire
         * mempool reserved space to be contiguous, pg_sz will be != 0,
         * and the default ops->populate() will take care of not placing
         * objects across pages.
         *
         * if our IO addresses are physical, we may get memory from bigger
         * pages, or we might get memory from smaller pages, and how much of it
         * we require depends on whether we want bigger or smaller pages.
         * However, requesting each and every memory size is too much work, so
         * what we'll do instead is walk through the page sizes available, pick
         * the smallest one and set up page shift to match that one. We will be
         * wasting some space this way, but it's much nicer than looping around
         * trying to reserve each and every page size.
         *
         * If we fail to get enough contiguous memory, then we'll go and
         * reserve space in smaller chunks.
         */

        need_iova_contig_obj = !(mp->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG);
        ret = rte_mempool_get_page_size(mp, &pg_sz);
        if (ret < 0)
            return ret;

        if (pg_sz != 0)
            pg_shift = rte_bsf32(pg_sz);

        for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
            size_t min_chunk_size;

            mem_size = rte_mempool_ops_calc_mem_size(
                mp, n, pg_shift, &min_chunk_size, &align);
            if (mem_size < 0) {
                ret = mem_size;
                goto fail;
            }

            ret = snprintf(mz_name, sizeof(mz_name),
                RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
            if (ret < 0 || ret >= (int)sizeof(mz_name)) {
                ret = -ENAMETOOLONG;
                goto fail;
            }

            /* if we're trying to reserve contiguous memory, add appropriate
             * memzone flag.
             */
            if (min_chunk_size == (size_t)mem_size)
                mz_flags |= RTE_MEMZONE_IOVA_CONTIG;

            /* Allocate a memzone, retrying with a smaller area on ENOMEM */
            do {
                mz = rte_memzone_reserve_aligned(mz_name,
                    RTE_MIN((size_t)mem_size, max_alloc_size),
                    mp->socket_id, mz_flags, align);

                if (mz != NULL || rte_errno != ENOMEM)
                    break;

                max_alloc_size = RTE_MIN(max_alloc_size,
                            (size_t)mem_size) / 2;
            } while (mz == NULL && max_alloc_size >= min_chunk_size);

            if (mz == NULL) {
                ret = -rte_errno;
                goto fail;
            }

            if (need_iova_contig_obj)
                iova = mz->iova;
            else
                iova = RTE_BAD_IOVA;

            if (pg_sz == 0 || (mz_flags & RTE_MEMZONE_IOVA_CONTIG))
                ret = rte_mempool_populate_iova(mp, mz->addr,
                    iova, mz->len,
                    rte_mempool_memchunk_mz_free,
                    (void *)(uintptr_t)mz);
            else
                ret = rte_mempool_populate_virt(mp, mz->addr,
                    mz->len, pg_sz,
                    rte_mempool_memchunk_mz_free,
                    (void *)(uintptr_t)mz);
            if (ret == 0) /* should not happen */
                ret = -ENOBUFS;
            if (ret < 0) {
                rte_memzone_free(mz);
                goto fail;
            }
        }

        rte_mempool_trace_populate_default(mp);
        return mp->size;

     fail:
        rte_mempool_free_memchunks(mp);
        return ret;
    }
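
Creating the rte_ring

In the populate function above, the rte_ring of the memory pool is created through rte_mempool_ops (by way of mempool_ops_alloc_once()), and its address is assigned to mp->pool_data. The flow is shown in the following code: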

    static int
    mempool_ops_alloc_once(struct rte_mempool *mp)
    {
        int ret;

        /* create the internal ring if not already done */
        if ((mp->flags & RTE_MEMPOOL_F_POOL_CREATED) == 0) {
            ret = rte_mempool_ops_alloc(mp);
            if (ret != 0)
                return ret;
            mp->flags |= RTE_MEMPOOL_F_POOL_CREATED;
        }
        return 0;
    }

    int
    rte_mempool_ops_alloc(struct rte_mempool *mp)
    {
        struct rte_mempool_ops *ops;

        rte_mempool_trace_ops_alloc(mp);
        ops = rte_mempool_get_ops(mp->ops_index);
        return ops->alloc(mp);
    }

    static int
    ring_alloc(struct rte_mempool *mp, uint32_t rg_flags)
    {
        int ret;
        char rg_name[RTE_RING_NAMESIZE];
        struct rte_ring *r;

        ret = snprintf(rg_name, sizeof(rg_name),
            RTE_MEMPOOL_MZ_FORMAT, mp->name);
        if (ret < 0 || ret >= (int)sizeof(rg_name)) {
            rte_errno = ENAMETOOLONG;
            return -rte_errno;
        }

        /*
         * Allocate the ring that will be used to store objects.
         * Ring functions will return appropriate errors if we are
         * running as a secondary process etc., so no checks made
         * in this function for that condition.
         */
        r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
            mp->socket_id, rg_flags);
        if (r == NULL)
            return -rte_errno;

        mp->pool_data = r;

        return 0;
    }
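
One more note: the rte_ring of the memory pool is ultimately created through rte_ring_create_elem(). That function reserves the memory of the rte_ring from an rte_memzone (layout: the rte_ring structure followed by an array of void * slots, where the slot count requested above is rte_align32pow2(mp->size + 1)), stores the address of the rte_ring and of the corresponding memzone in a struct rte_tailq_entry, and inserts that entry into the global rte_ring_tailq. See the implementation of rte_ring_create_elem() for details.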
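
The ring size passed to rte_ring_create() in ring_alloc() is rte_align32pow2(mp->size + 1). The sketch below reproduces that rounding with a hypothetical helper, toy_align32pow2, written with the usual bit-smearing technique; it is a stand-in for illustration, not the DPDK function itself. The extra slot is requested, as far as I understand it, because a default ring of size N can hold only N - 1 objects.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next power of two (x itself if it already is one).
 * Returns 0 for x == 0 or when the result does not fit in 32 bits. */
uint32_t toy_align32pow2(uint32_t x)
{
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Number of ring slots requested for a pool holding pool_size objects. */
uint32_t toy_ring_slots(uint32_t pool_size)
{
    return toy_align32pow2(pool_size + 1);
}
```

A pool of 8191 objects needs 8192 slots, while a pool of 8192 objects needs 16384, which is why pool sizes of the form 2^q - 1 use ring memory most efficiently.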
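
The rest of the populate step works as follows. page_size and page_shift are obtained and used to compute the memory needed to hold all mbufs. The minimum usable chunk size is computed and the chunk memory is reserved. The information about each memory chunk is saved as a struct rte_mempool_memhdr and inserted into mp->mem_list, and the number of chunks is kept in mp->nb_mem_chunks. Inside the virtual memory of a chunk, the objects are carved out one after another: the rte_mempool_ops populate interface rte_mempool_ops_populate() ultimately calls mempool_add_elem() to insert each object into the mp->elt_list list. The key call is: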

    i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
            (char *)vaddr + off,
            (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
            len - off, mempool_add_elem, NULL);

The return value i is the number of objects populated in this memory chunk. mempool_add_elem() is implemented as follows:

    static void
    mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
             void *obj, rte_iova_t iova)
    {
        struct rte_mempool_objhdr *hdr;
        struct rte_mempool_objtlr *tlr __rte_unused;

        /* set mempool ptr in header */
        hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
        hdr->mp = mp;
        hdr->iova = iova;
        STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);
        mp->populated_size++;

    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
        tlr = rte_mempool_get_trailer(obj);
        tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
    #endif
    }
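
The idea of carving a chunk into header-plus-body slots and linking the headers into a list can be sketched as below. This is a simplified model with several assumptions: all toy_ names are hypothetical, the chunk comes from malloc() instead of a memzone, objects have no trailer and no IOVA, and page boundaries are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_pool;

/* Per-object header placed immediately before the object body. */
struct toy_objhdr {
    struct toy_objhdr *next;   /* singly linked list of all objects */
    struct toy_pool *pool;     /* owning pool */
};

struct toy_pool {
    struct toy_objhdr *head, *tail;
    unsigned populated;
    size_t elt_size;           /* body size; must keep headers pointer-aligned */
};

/* Recover the header from an object pointer by stepping back one header. */
struct toy_objhdr *toy_hdr_of(void *obj)
{
    return (struct toy_objhdr *)((char *)obj - sizeof(struct toy_objhdr));
}

/* Append one object to the list of the pool and count it. */
void toy_add_elem(struct toy_pool *p, void *obj)
{
    struct toy_objhdr *h = toy_hdr_of(obj);

    h->pool = p;
    h->next = NULL;
    if (p->tail != NULL)
        p->tail->next = h;
    else
        p->head = h;
    p->tail = h;
    p->populated++;
}

/* Carve as many slots as fit in the chunk; returns the number added. */
unsigned toy_populate(struct toy_pool *p, void *chunk, size_t len,
                      unsigned max_objs)
{
    size_t total = sizeof(struct toy_objhdr) + p->elt_size;
    size_t off = 0;
    unsigned n = 0;

    assert(p->elt_size % sizeof(void *) == 0);
    while (n < max_objs && off + total <= len) {
        toy_add_elem(p, (char *)chunk + off + sizeof(struct toy_objhdr));
        off += total;
        n++;
    }
    return n;
}

/* Demo: fill a 256-byte chunk with 48-byte objects, then walk the list
 * and count the objects whose header points back at the pool. */
unsigned toy_demo(void)
{
    struct toy_pool p = { .elt_size = 48 };
    void *chunk = malloc(256);
    unsigned walked = 0;

    if (chunk == NULL)
        return 0;
    toy_populate(&p, chunk, 256, 100);
    for (struct toy_objhdr *h = p.head; h != NULL; h = h->next)
        if (h->pool == &p)
            walked++;
    free(chunk);
    return walked;
}
```

Because the header sits at a fixed negative offset from the object, any object pointer is enough to find its owning pool, which is what allows a free call to work without being told the pool.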

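3.6 Initializing the pkt mbufs

rte_mempool_obj_iter() is called to walk all objects of the rte_mempool, and rte_pktmbuf_init() is called on each of them to initialize it.
The interface that walks all objects: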

    uint32_t
    rte_mempool_obj_iter(struct rte_mempool *mp,
        rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg)
    {
        struct rte_mempool_objhdr *hdr;
        void *obj;
        unsigned n = 0;

        STAILQ_FOREACH(hdr, &mp->elt_list, next) {
            obj = (char *)hdr + sizeof(*hdr);
            obj_cb(mp, obj_cb_arg, obj, n);
            n++;
        }

        return n;
    }

The interface that initializes each object:

    void
    rte_pktmbuf_init(struct rte_mempool *mp,
             __rte_unused void *opaque_arg,
             void *_m,
             __rte_unused unsigned i)
    {
        struct rte_mbuf *m = _m;
        uint32_t mbuf_size, buf_len, priv_size;

        RTE_ASSERT(mp->private_data_size >=
             sizeof(struct rte_pktmbuf_pool_private));

        priv_size = rte_pktmbuf_priv_size(mp);
        mbuf_size = sizeof(struct rte_mbuf) + priv_size;
        buf_len = rte_pktmbuf_data_room_size(mp);

        RTE_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
        RTE_ASSERT(mp->elt_size >= mbuf_size);
        RTE_ASSERT(buf_len <= UINT16_MAX);

        memset(m, 0, mbuf_size);
        /* start of buffer is after mbuf structure and priv data */
        m->priv_size = priv_size;
        m->buf_addr = (char *)m + mbuf_size;
        m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
        m->buf_len = (uint16_t)buf_len;

        /* keep some headroom between start of buffer and data */
        m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);

        /* init some constant fields */
        m->pool = mp;
        m->nb_segs = 1;
        m->port = RTE_MBUF_PORT_INVALID;
        rte_mbuf_refcnt_set(m, 1);
        m->next = NULL;
    }
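
At this point the rte_mempool is fully set up.

4. Using an rte_mempool

The mbufs of a pktmbuf pool are used by network ports to receive packets and by applications to send packets.
Allocating a raw mbuf from the memory pool: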

    static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
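
Internally the allocation interface calls rte_mempool_get_bulk() to fetch n mbufs from mp (n is 1 here; the interface supports bulk allocation and is listed below). Objects are taken from the cache of the local core first; if the cache does not hold enough, mbufs are first pulled from the rte_ring into the local cache.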

    static __rte_always_inline int
    rte_mempool_get_bulk(struct rte_mempool *mp, void **obj_table, unsigned int n)
    {
        struct rte_mempool_cache *cache;
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
        rte_mempool_trace_get_bulk(mp, obj_table, n, cache);
        return rte_mempool_generic_get(mp, obj_table, n, cache);
    }

    static __rte_always_inline int
    rte_mempool_generic_get(struct rte_mempool *mp, void **obj_table,
                unsigned int n, struct rte_mempool_cache *cache)
    {
        int ret;
        ret = rte_mempool_do_generic_get(mp, obj_table, n, cache);
        if (ret == 0)
            RTE_MEMPOOL_CHECK_COOKIES(mp, obj_table, n, 1);
        rte_mempool_trace_generic_get(mp, obj_table, n, cache);
        return ret;
    }

    static __rte_always_inline int
    rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
                 unsigned int n, struct rte_mempool_cache *cache)
    {
        int ret;
        uint32_t index, len;
        void **cache_objs;

        /* No cache provided or cannot be satisfied from cache */
        if (unlikely(cache == NULL || n >= cache->size))
            goto ring_dequeue;

        cache_objs = cache->objs;

        /* Can this be satisfied from the cache? */
        if (cache->len < n) {
            /* No. Backfill the cache first, and then fill from it */
            uint32_t req = n + (cache->size - cache->len);

            /* How many do we require i.e. number to fill the cache + the request */
            ret = rte_mempool_ops_dequeue_bulk(mp,
                &cache->objs[cache->len], req);
            if (unlikely(ret < 0)) {
                /*
                 * In the off chance that we are buffer constrained,
                 * where we are not able to allocate cache + n, go to
                 * the ring directly. If that fails, we are truly out of
                 * buffers.
                 */
                goto ring_dequeue;
            }

            cache->len += req;
        }

        /* Now fill in the response ... */
        for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
            *obj_table = cache_objs[len];

        cache->len -= n;

        RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
        RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);

        return 0;

    ring_dequeue:

        /* get remaining objects from ring */
        ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);

        if (ret < 0) {
            RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
            RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
        } else {
            RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
            RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
        }

        return ret;
    }

Returning an mbuf to the memory pool:

    void rte_mbuf_raw_free(struct rte_mbuf *m)
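
Internally the free interface calls rte_mempool_put_bulk() to release n mbufs to the memory pool (n is 1 here; the interface supports bulk release and is listed below). Objects are released to the cache of the local core first; once the cache length reaches the flush threshold (cache->flushthresh), the objects above cache->size are released to the rte_ring.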

    static __rte_always_inline void
    rte_mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
             unsigned int n)
    {
        struct rte_mempool_cache *cache;
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
        rte_mempool_trace_put_bulk(mp, obj_table, n, cache);
        rte_mempool_generic_put(mp, obj_table, n, cache);
    }

    static __rte_always_inline void
    rte_mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
                unsigned int n, struct rte_mempool_cache *cache)
    {
        rte_mempool_trace_generic_put(mp, obj_table, n, cache);
        RTE_MEMPOOL_CHECK_COOKIES(mp, obj_table, n, 0);
        rte_mempool_do_generic_put(mp, obj_table, n, cache);
    }

    static __rte_always_inline void
    rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
                 unsigned int n, struct rte_mempool_cache *cache)
    {
        void **cache_objs;

        /* increment stat now, adding in mempool always success */
        RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
        RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);

        /* No cache provided or if put would overflow mem allocated for cache */
        if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
            goto ring_enqueue;

        cache_objs = &cache->objs[cache->len];

        /*
         * The cache follows the following algorithm
         * 1. Add the objects to the cache
         * 2. Anything greater than the cache min value (if it crosses the
         * cache flush threshold) is flushed to the ring.
         */

        /* Add elements back into the cache */
        rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);

        cache->len += n;

        if (cache->len >= cache->flushthresh) {
            rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
                    cache->len - cache->size);
            cache->len = cache->size;
        }

        return;

    ring_enqueue:

        /* push remaining objects in ring */
    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        if (rte_mempool_ops_enqueue_bulk(mp, obj_table, n) < 0)
            rte_panic("cannot put objects in mempool\n");
    #else
        rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
    #endif
    }

5. Querying rte_mempool information

The status of an rte_mempool can be queried with rte_mempool_dump(FILE *f, struct rte_mempool *mp), which dumps the following information:

    void
    rte_mempool_dump(FILE *f, struct rte_mempool *mp)
    {
    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        struct rte_mempool_info info;
        struct rte_mempool_debug_stats sum;
        unsigned lcore_id;
    #endif
        struct rte_mempool_memhdr *memhdr;
        struct rte_mempool_ops *ops;
        unsigned common_count;
        unsigned cache_count;
        size_t mem_len = 0;

        RTE_ASSERT(f != NULL);
        RTE_ASSERT(mp != NULL);

        fprintf(f, "mempool <%s>@%p\n", mp->name, mp);
        fprintf(f, " flags=%x\n", mp->flags);
        fprintf(f, " socket_id=%d\n", mp->socket_id);
        fprintf(f, " pool=%p\n", mp->pool_data);
        fprintf(f, " iova=0x%" PRIx64 "\n", mp->mz->iova);
        fprintf(f, " nb_mem_chunks=%u\n", mp->nb_mem_chunks);
        fprintf(f, " size=%"PRIu32"\n", mp->size);
        fprintf(f, " populated_size=%"PRIu32"\n", mp->populated_size);
        fprintf(f, " header_size=%"PRIu32"\n", mp->header_size);
        fprintf(f, " elt_size=%"PRIu32"\n", mp->elt_size);
        fprintf(f, " trailer_size=%"PRIu32"\n", mp->trailer_size);
        fprintf(f, " total_obj_size=%"PRIu32"\n",
         mp->header_size + mp->elt_size + mp->trailer_size);

        fprintf(f, " private_data_size=%"PRIu32"\n", mp->private_data_size);

        fprintf(f, " ops_index=%d\n", mp->ops_index);
        ops = rte_mempool_get_ops(mp->ops_index);
        fprintf(f, " ops_name: <%s>\n", (ops != NULL) ? ops->name : "NA");

        STAILQ_FOREACH(memhdr, &mp->mem_list, next)
            mem_len += memhdr->len;
        if (mem_len != 0) {
            fprintf(f, " avg bytes/object=%#Lf\n",
                (long double)mem_len / mp->size);
        }

        cache_count = rte_mempool_dump_cache(f, mp);
        common_count = rte_mempool_ops_get_count(mp);
        if ((cache_count + common_count) > mp->size)
            common_count = mp->size - cache_count;
        fprintf(f, " common_pool_count=%u\n", common_count);

        /* sum and dump statistics */
    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        rte_mempool_ops_get_info(mp, &info);
        memset(&sum, 0, sizeof(sum));
        for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
            sum.put_bulk += mp->stats[lcore_id].put_bulk;
            sum.put_objs += mp->stats[lcore_id].put_objs;
            sum.put_common_pool_bulk += mp->stats[lcore_id].put_common_pool_bulk;
            sum.put_common_pool_objs += mp->stats[lcore_id].put_common_pool_objs;
            sum.get_common_pool_bulk += mp->stats[lcore_id].get_common_pool_bulk;
            sum.get_common_pool_objs += mp->stats[lcore_id].get_common_pool_objs;
            sum.get_success_bulk += mp->stats[lcore_id].get_success_bulk;
            sum.get_success_objs += mp->stats[lcore_id].get_success_objs;
            sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk;
            sum.get_fail_objs += mp->stats[lcore_id].get_fail_objs;
            sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
            sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
        }
        fprintf(f, " stats:\n");
        fprintf(f, " put_bulk=%"PRIu64"\n", sum.put_bulk);
        fprintf(f, " put_objs=%"PRIu64"\n", sum.put_objs);
        fprintf(f, " put_common_pool_bulk=%"PRIu64"\n", sum.put_common_pool_bulk);
        fprintf(f, " put_common_pool_objs=%"PRIu64"\n", sum.put_common_pool_objs);
        fprintf(f, " get_common_pool_bulk=%"PRIu64"\n", sum.get_common_pool_bulk);
        fprintf(f, " get_common_pool_objs=%"PRIu64"\n", sum.get_common_pool_objs);
        fprintf(f, " get_success_bulk=%"PRIu64"\n", sum.get_success_bulk);
        fprintf(f, " get_success_objs=%"PRIu64"\n", sum.get_success_objs);
        fprintf(f, " get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk);
        fprintf(f, " get_fail_objs=%"PRIu64"\n", sum.get_fail_objs);
        if (info.contig_block_size > 0) {
            fprintf(f, " get_success_blks=%"PRIu64"\n",
                sum.get_success_blks);
            fprintf(f, " get_fail_blks=%"PRIu64"\n", sum.get_fail_blks);
        }
    #else
        fprintf(f, " no statistics available\n");
    #endif

        rte_mempool_audit(mp);
    }
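
rte_mempool keeps some statistics in mp->stats that are worth attention. They are controlled by RTE_LIBRTE_MEMPOOL_DEBUG, which is normally not enabled.
The rte_mempool library has two more very useful interfaces. rte_mempool_dump_cache(FILE *f, const struct rte_mempool *mp) reports the number of available objects in the cache of each local core of the given pool,
and rte_mempool_ops_get_count(const struct rte_mempool *mp) reports the number of available objects in the rte_ring.
From mp->size and these two values, the number of mbufs currently in use by the application can be computed, a metric that some products watch closely.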
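
The in-use calculation can be written down as a small helper. This is a sketch with a hypothetical function name, toy_in_use; it takes the three numbers as plain arguments instead of calling DPDK, and it clamps the counts the same way rte_mempool_dump() does above, since the cache and ring counts are sampled at different moments and their sum can briefly exceed the pool size.

```c
#include <assert.h>

/* Objects held by the application = pool size - cached objects - ring objects. */
unsigned toy_in_use(unsigned pool_size, unsigned cache_count,
                    unsigned common_count)
{
    /* Guard against inconsistent samples so the subtraction cannot wrap. */
    if (cache_count > pool_size)
        cache_count = pool_size;
    if (cache_count + common_count > pool_size)
        common_count = pool_size - cache_count;
    return pool_size - cache_count - common_count;
}
```

For a pool of 8191 objects with 100 objects in the caches and 8000 in the ring, the helper reports 91 objects in use. DPDK also provides rte_mempool_avail_count() and rte_mempool_in_use_count(), which may be more convenient than combining the counts by hand.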


