Hello, this is Mitsunari from Cybozu Labs.
In this post I look at how the x86/x64 instructions for rounding a fractional number to an integer have changed over time.
Rules for converting floating-point values to integers in C++
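First, let's review the rule C++ uses when converting a floating-point type (float, double) to an integer type (int, int64_t, and so on). According to the floating-integral conversions rule, the conversion discards the fractional part. So casting 1.5, 2.3, and -2.9 to int gives 1, 2, and -2, respectively.
Note that the behavior is undefined when the truncated value cannot be represented in the integer type.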
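As a quick check of the rule above, here is a minimal sketch. The helper names `cast_to_int` and `check_cast_examples` are made up for this illustration; the three values are the ones quoted in the text.

```cpp
#include <cassert>

// Floating-integral conversion: the fractional part is discarded.
// Only defined when the truncated value is representable in int.
int cast_to_int(double x) {
    return static_cast<int>(x);
}

// Checks the three values quoted in the text.
void check_cast_examples() {
    assert(cast_to_int(1.5) == 1);
    assert(cast_to_int(2.3) == 2);
    assert(cast_to_int(-2.9) == -2);  // toward zero, not toward -infinity
}
```

Calling `check_cast_examples()` from `main` should pass silently on a conforming compiler.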
The four rounding rules
The x86 FPU, which handles floating-point numbers, has four rounding modes. They follow the rounding modes defined by IEEE Standard 754.
- Round to nearest (even): RN
- Round down (toward -∞): RD
- Round up (toward +∞): RU
- Round toward zero (truncate): RZ
I drew a figure for each mode. Values in the interval marked by an arrow are rounded to the integer at the arrowhead. A filled circle ● means the endpoint is included; an open circle ○ means it is not.
Round down truncates toward negative infinity, and round up rounds toward positive infinity. They correspond to the floor() and ceil() functions, respectively. Round toward zero truncates in the direction of 0; this is the rule used when casting a double to int.
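The three directed modes can be compared side by side using the standard library. This is only a sketch; `Rounded` and `round_three_ways` are names invented for the example.

```cpp
#include <cassert>
#include <cmath>

// Results of the three directed roundings for one input.
struct Rounded {
    double down;  // RD: toward -infinity, same as floor()
    double up;    // RU: toward +infinity, same as ceil()
    double zero;  // RZ: toward zero, same as a cast to int
};

Rounded round_three_ways(double x) {
    return { std::floor(x), std::ceil(x), std::trunc(x) };
}
```

For a negative input such as -2.5, RD gives -3 while RU and RZ both give -2; for 2.5, RD and RZ both give 2 while RU gives 3.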
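Round to nearest may be a little less familiar. The usual schoolbook rounding (round half up) takes 1.5, 2.5, and 3.5 up to 2, 3, and 4. With that method, however, processing a large number of values can bias the total in the positive direction. Round to nearest therefore sends a fraction of exactly 0.5 toward the even neighbor: 0.5 becomes 0, 1.5 becomes 2, and 2.5 becomes 2. For a large set of random values, the ties then split evenly between rounding up and rounding down.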
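The difference between the two tie-breaking rules shows up only at exact halves. A small sketch using `std::nearbyint` (which follows the current rounding mode) and `std::round` (which always sends ties away from zero); the wrapper names are hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Round to nearest, ties to even (RN). The mode is set explicitly so the
// result does not depend on what the caller left in the FP environment.
double round_half_even(double x) {
    const int saved = std::fegetround();
    std::fesetround(FE_TONEAREST);
    const double r = std::nearbyint(x);
    std::fesetround(saved);
    return r;
}

// Schoolbook rounding: ties go away from zero.
double round_half_away(double x) {
    return std::round(x);
}
```

With these, 0.5, 1.5, and 2.5 become 0, 2, and 2 under RN, but 1, 2, and 3 under schoolbook rounding.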
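Schoolbook rounding is not among the rounding modes. That is because it can be built from round toward zero: add 0.5 when x is 0 or greater, subtract 0.5 when x is negative, then truncate.
In code, this can be written as
double round(double x)
{
    // Offset by half a unit away from zero, then let the int cast truncate.
    double c = (x >= 0) ? 0.5 : -0.5;
    return (double)(int)(x + c); // valid only while x + c fits in int
}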
How older x86 code converted to integer types
The FPU rounding mode is normally set to round to nearest, so casting a double to int requires changing the mode. Moreover, the mode may already have been changed somewhere else during program execution, so the following steps are needed:
1. Read the current rounding mode and save it
2. Change the rounding mode to RZ
3. Perform the double→int conversion with the fist instruction
4. Restore the rounding mode saved in step 1
Changing the FPU mode is fairly expensive, and it often became a bottleneck when converting many values to integers. This situation did not change with SSE and SSE2, which added vector instructions (cvtsd2si, until 2004).
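The four steps above can be mimicked in portable C++ with `<cfenv>`. This is a sketch of the idea rather than what any compiler actually emitted: `std::lrint` stands in for the fist instruction, and the function name is made up. The `volatile` variables are an assumption-driven precaution so that an optimizing compiler neither constant-folds the conversion nor moves it outside the two mode changes.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Legacy-style conversion: save mode, switch to RZ, convert, restore.
long truncate_via_mode_switch(double x) {
    const int saved = std::fegetround();  // 1. remember the current mode
    std::fesetround(FE_TOWARDZERO);       // 2. switch to round toward zero
    volatile double in = x;               // force a run-time conversion
    volatile long out = std::lrint(in);   // 3. convert using the current mode
    std::fesetround(saved);               // 4. restore the caller's mode
    return out;
}
```

Every call pays for two mode changes around a single conversion, which is the overhead described above.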
Added 2017/8/21: It was pointed out that techniques for truncating toward zero without changing, or depending on, the rounding mode existed by 1997 at the latest, so I added the section "Improvements in Pentium 4" at the end.
Adding round-toward-zero instructions
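Intel therefore added dedicated round-toward-zero instructions such as fisttp and cvttsd2si. (fisttp arrived with SSE3 in 2004; according to Intel's instruction set reference, cvttsd2si itself belongs to SSE2.) These instructions always round toward zero, regardless of the FPU/SSE control registers. As a result, a cast from double to int no longer needs a rounding-mode change, and performance improved.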
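From C++ the two SSE2 conversions are reachable through the intrinsics `_mm_cvttsd_si32` (cvttsd2si, always truncates) and `_mm_cvtsd_si32` (cvtsd2si, follows the MXCSR rounding mode). The sketch below is illustrative only: the wrapper names and the `HAVE_SSE2_INTRINSICS` macro are invented here, and a plain C++ fallback is included for non-x86 builds.

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_INTRINSICS 1  // hypothetical helper macro for this sketch
#endif

// Truncating conversion: the result does not depend on the rounding mode.
int truncate_sse2(double x) {
#ifdef HAVE_SSE2_INTRINSICS
    return _mm_cvttsd_si32(_mm_set_sd(x));  // cvttsd2si
#else
    return static_cast<int>(x);             // portable equivalent
#endif
}

// Mode-dependent conversion: round to nearest even under the default mode.
int convert_current_mode(double x) {
#ifdef HAVE_SSE2_INTRINSICS
    return _mm_cvtsd_si32(_mm_set_sd(x));   // cvtsd2si
#else
    return static_cast<int>(std::nearbyint(x));
#endif
}
```

Under the default rounding mode, 2.7 becomes 2 with the truncating form and 3 with the mode-dependent form.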
Adding instructions for rounding down and up
Casting to int had become fast, but implementations of floor() and ceil() still had to either change the rounding mode, as the old int cast did, or use cvttsd2si carefully while taking the sign into account. To address this, SSE4.1 (2007) added instructions such as roundsd, which let the rounding mode be chosen freely.
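One way to read "use cvttsd2si carefully while taking the sign into account" is to truncate first and then correct by one when truncation moved the value in the wrong direction. The following is a sketch under that assumption, not the implementation of any particular library; it is valid only while x fits in int, and the function names are hypothetical.

```cpp
#include <cassert>

// floor() to int built on truncation: truncating a negative non-integer
// moves it up, so step back down by one in that case.
int floor_to_int(double x) {
    const int t = static_cast<int>(x);  // truncation, as cvttsd2si does
    return t - (static_cast<double>(t) > x ? 1 : 0);
}

// ceil() to int built on truncation: truncating a positive non-integer
// moves it down, so step up by one in that case.
int ceil_to_int(double x) {
    const int t = static_cast<int>(x);
    return t + (static_cast<double>(t) < x ? 1 : 0);
}
```

The comparison replaces an explicit sign test: for exact integers neither correction fires.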
roundsd xmm1, xmm2/mem, imm8
vroundsd xmm1, xmm2, xmm3/mem, imm8
vroundsd is the AVX form of the instruction (2011). The imm8 operand selects the rounding mode.
For example, the round function above can be implemented as follows. The sign of a floating-point number is in its most significant bit, so the implementation needs no conditional branch.
// assume xmm0 holds x
vandpd xmm1, xmm0, ptr [const1] // extract the sign bit of x
vorpd xmm1, xmm1, ptr [const2] // xmm1 = (x >= 0) ? 0.5 : -0.5
vaddsd xmm0, xmm0, xmm1 // xmm0 += xmm1
vroundsd xmm0, xmm0, xmm0, 3 // truncate (imm8 = 3 selects RZ)
const1: // only the most significant bit (the sign bit) is set
dd 0x00000000
dd 0x80000000
dd 0, 0
const2: // bit pattern of double(0.5)
dd 0x00000000
dd 0x3fe00000
dd 0, 0
However, floor and ceil are mostly used when an integer is wanted, yet roundsd produces a floating-point result, so a further cvtsd2si is needed to move the value into an integer register. That feels redundant.
AVX-512 instructions
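In 2016 Intel announced CPUs with AVX-512 support. Doubling the width of the existing 256-bit SIMD instruction set, AVX2, made 512-bit SIMD instructions available. At the same time, most arithmetic instructions gained the ability to specify a rounding mode per instruction.
The existing vcvtsd2si was extended as well. As a result, rounding up, rounding down, and rounding toward zero from double to int can each be done in a single instruction.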
In assembly language this is written as
vcvtsd2si eax, xmm0, {rn-sae}
The comma and curly braces in ", {rn-sae}" look odd (or rather, are awkward to parse), but that is the syntax. sae stands for Suppress All Exceptions and suppresses floating-point exceptions.
The corresponding intrinsic is _mm_cvt_roundsd_i64(__m128d, int r).
In my experiments, however, assembling this with NASM 2.14rc0 appears to generate code that ignores {rn-sae} (a bug? I have reported it). As for the intrinsic, Visual Studio 2017 does not support it, and gcc-7.1 and clang-4 gave compile errors (it may be that I was using it incorrectly → see "Using the intrinsic", added 2017/8/16).
My own Xbyak supports AVX-512, so the instruction can be written as, for example, vcvtsd2si(eax, xmm0 | T_rn_sae); (sample code: round.cpp). Even without an AVX-512 CPU, the Intel Software Development Emulator can be used to confirm that it works correctly.
sde -- round.exe
Using the intrinsic
The rounding mode passed as r to
_mm_cvt_roundsd_i64(__m128d, int r)
is not simply one of the integers 0 to 3 corresponding to RN, RD, RU, and RZ; it has to be ORed with 8 (SAE). _mm_round_pd accepts 0 to 3 directly, so I had assumed this intrinsic worked the same way.
int x = _mm_cvt_roundsd_i64(m1, 0); // error: incorrect rounding operand
int y = _mm_cvt_roundsd_i64(m2, 0 | 8); // ok
The values 0, 1, 2, 3, and 8 are defined as
#define _MM_FROUND_TO_NEAREST_INT 0
#define _MM_FROUND_TO_NEG_INF 1
#define _MM_FROUND_TO_POS_INF 2
#define _MM_FROUND_TO_ZERO 3
#define _MM_FROUND_NO_EXC 8
so these macros can be used instead.
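Since the real intrinsic needs AVX-512 hardware (or an emulator), the operand encoding can also be illustrated with a portable stand-in. Everything below is hypothetical: the constants merely mirror the macro values listed above, `emulate_cvt_round` is not a real API, and rejecting an operand without the SAE bit at run time only imitates the compile error shown earlier. It covers only the operand values discussed in this post.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Mirrors of the macro values listed above (names are made up for this sketch).
constexpr int kToNearest = 0;  // _MM_FROUND_TO_NEAREST_INT
constexpr int kToNegInf  = 1;  // _MM_FROUND_TO_NEG_INF
constexpr int kToPosInf  = 2;  // _MM_FROUND_TO_POS_INF
constexpr int kToZero    = 3;  // _MM_FROUND_TO_ZERO
constexpr int kNoExc     = 8;  // _MM_FROUND_NO_EXC (SAE)

// Portable stand-in: requires the SAE bit, then rounds per the low two bits.
long long emulate_cvt_round(double x, int r) {
    if ((r & kNoExc) == 0) {
        throw std::invalid_argument("incorrect rounding operand");
    }
    switch (r & 3) {
    case kToNearest:
        // Assumes the default FP environment (round to nearest even).
        return static_cast<long long>(std::nearbyint(x));
    case kToNegInf:
        return static_cast<long long>(std::floor(x));
    case kToPosInf:
        return static_cast<long long>(std::ceil(x));
    default:  // kToZero
        return static_cast<long long>(std::trunc(x));
    }
}
```

For example, `emulate_cvt_round(-2.5, kToNegInf | kNoExc)` yields -3, while passing a bare `kToNegInf` is rejected.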
Improvements in Pentium 4
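Looking through old documents, I confirmed that Intel's 1997 optimization manual describes an "algorithm that avoids changing the rounding mode." Thank you for the information. Although I left it out of the article, on the Pentium 4, introduced in 2000, fldcw is fast in the specific case where only the rounding mode (plus a little more) is changed. So, as the manual also notes, when merely switching between two rounding modes, changing the rounding mode is probably faster (unverified).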
Summary
This post covered how floating-point values are rounded to integers on x86/x64, from the traditional methods to those for the latest processors (as of 2017). It is a single, unassuming instruction, but tool support for it should grow from here.