阿福的技术BLOG: Measuring the Performance Impact of False Sharing

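If you need background on false sharing, look it up on Google.
Below is the test code: two threads each increment one of two adjacent variables 500 million times, and each thread reports how long its loop took.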
//test_false_sharing_1.cpp

#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <pthread.h>

#define PACK __attribute__ ((packed))
// An int aligned to the L1 data cache line size; the macro is supplied with -D at compile time.
typedef int cache_line_int __attribute__((aligned(LEVEL1_DCACHE_LINESIZE)));

// a and b are adjacent in memory, so they normally land in the same cache line.
struct data
{
    int a;
    int b;
};

#define MAX_NUM 500000000

// Thread 1: increments d->a MAX_NUM times and prints the elapsed time in microseconds.
void* thread_func_1(void* param)
{
    timeval start, end;
    gettimeofday(&start, NULL);
    data* d = (data*)param;
    for (int i=0; i<MAX_NUM; ++i)
    {
        ++d->a;
    }
    gettimeofday(&end, NULL);
    printf("thread 1, time=%d\n", (int)(end.tv_sec-start.tv_sec)*1000000+(int)(end.tv_usec-start.tv_usec));
    return NULL;
}

// Thread 2: same loop, but on d->b.
void* thread_func_2(void* param)
{
    timeval start, end;
    gettimeofday(&start, NULL);
    data* d = (data*)param;
    for (int i=0; i<MAX_NUM; ++i)
    {
        ++d->b;
    }
    gettimeofday(&end, NULL);
    printf("thread 2, time=%d\n", (int)(end.tv_sec-start.tv_sec)*1000000+(int)(end.tv_usec-start.tv_usec));
    return NULL;
}

int main()
{
    data d = {0, 0};
    pthread_t t1, t2;
    // Both threads receive the same struct; each one touches only its own member.
    pthread_create(&t1, NULL, thread_func_1, &d);
    pthread_create(&t2, NULL, thread_func_2, &d);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("end, a=%d,b=%d\n", d.a, d.b);
    return 0;
}

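/*
Build without -O2 (see the notes at the end); the typedef above needs LEVEL1_DCACHE_LINESIZE to be defined:
g++ -o test_false_sharing_1 test_false_sharing_1.cpp -g -Wall -lpthread -DLEVEL1_DCACHE_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE`
*/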

----------------------------------------------

Output after running (times are in microseconds):

thread 1, time=4121562
thread 2, time=4329193

Now modify the program slightly:

struct data
{
    cache_line_int a;
    cache_line_int b;
};

// The ints in the struct are changed to ints aligned to the cache line.

Then compile like this:

g++ -o test_false_sharing_2 test_false_sharing_2.cpp -g -Wall -lpthread -DLEVEL1_DCACHE_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE`

Output after running:

thread 1, time=1607430
thread 2, time=1629508

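Performance improved by roughly 2.5 to 2.7 times: each thread's time dropped from about 4.1-4.3 seconds to about 1.6 seconds.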
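The speedup comes from a and b no longer sharing a cache line. The layout can be checked at compile time. The following is a minimal sketch, not the code used for the measurements above: it uses C++11 `alignas` instead of the GCC attribute, and the names `kLineSize`, `PackedPair`, `PaddedPair`, `same_line` and `print_layout` are hypothetical. The value 64 is an assumed line size; the real one should come from `getconf LEVEL1_DCACHE_LINESIZE`.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdio>   // std::printf

// Assumed cache line size. 64 bytes is common but not universal.
constexpr std::size_t kLineSize = 64;

// Same shape as the first version of struct data: a and b are adjacent.
struct PackedPair {
    int a;
    int b;
};

// Same idea as cache_line_int: each member starts on its own line boundary.
struct PaddedPair {
    alignas(kLineSize) int a;
    alignas(kLineSize) int b;
};

// True when two byte offsets fall into the same cache line,
// assuming the enclosing struct itself starts on a line boundary.
constexpr bool same_line(std::size_t off_a, std::size_t off_b) {
    return off_a / kLineSize == off_b / kLineSize;
}

// Adjacent ints share a line; aligned ints do not.
static_assert(same_line(offsetof(PackedPair, a), offsetof(PackedPair, b)),
              "adjacent members are expected to share a line");
static_assert(!same_line(offsetof(PaddedPair, a), offsetof(PaddedPair, b)),
              "aligned members are expected to sit on different lines");

// Prints the size and member offset of both layouts.
void print_layout() {
    std::printf("PackedPair: size=%zu, offset of b=%zu\n",
                sizeof(PackedPair), offsetof(PackedPair, b));
    std::printf("PaddedPair: size=%zu, offset of b=%zu\n",
                sizeof(PaddedPair), offsetof(PaddedPair, b));
}
```

The cost of the padded layout is memory: the struct grows from two ints to two full cache lines. Note that `PackedPair` is only int-aligned, so an instance could in principle straddle a line boundary, in which case the two members would not share a line either.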

-------------------------------------------------

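Three points to note about the test:

1. Once the int type has been redefined with an alignment attribute, do not specify an alignment attribute again inside the struct; otherwise the earlier alignment attribute stops taking effect.

2. Use a command such as getconf LEVEL1_DCACHE_LINESIZE to obtain the cache line size.

3. Do not compile with -O2; otherwise the compiler works out the result of the loop itself and the program finishes instantly. (This optimization really is powerful.)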
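On the third point, one way to keep the loop alive even with optimization enabled is to make the counters `volatile`, which forces a real load and store on every iteration. The following is a minimal sketch under that assumption, not the program measured above: it uses `std::thread` and `std::chrono` instead of pthreads and `gettimeofday`, hard-codes an assumed 64-byte line size, and the names `AdjacentCounters`, `PaddedCounters`, `bump` and `run_microseconds` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Two counters next to each other (likely in one cache line).
struct AdjacentCounters {
    volatile int a;
    volatile int b;
};

// Two counters, each aligned to an assumed 64-byte cache line.
struct PaddedCounters {
    alignas(64) volatile int a;
    alignas(64) volatile int b;
};

// Increments *counter n times. Because the target is volatile, the compiler
// must perform every read and write and cannot fold the loop into one add.
void bump(volatile int* counter, int n) {
    for (int i = 0; i < n; ++i) {
        *counter = *counter + 1;
    }
}

// Runs bump on c.a and c.b in two threads and returns the wall-clock time
// for both to finish, in microseconds.
template <typename Counters>
long long run_microseconds(Counters& c, int n) {
    c.a = 0;
    c.b = 0;
    auto start = std::chrono::steady_clock::now();
    std::thread t1(bump, &c.a, n);
    std::thread t2(bump, &c.b, n);
    t1.join();
    t2.join();
    auto end = std::chrono::steady_clock::now();
    // Each thread wrote only its own counter, so both must equal n.
    assert(c.a == n && c.b == n);
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Calling `run_microseconds` once with an `AdjacentCounters` object and once with a `PaddedCounters` object, using the same `n`, gives two timings that can be compared directly. Thread support usually has to be enabled when building, for example with `-pthread` on g++. Absolute numbers will differ from the ones above, since the volatile accesses and the different timing method change what is measured.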

Monday, December 10, 2012
