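Introduction
I am Mitsunari of Cybozu Labs.
I implemented a vectorized exponential function for single-precision arrays in C++ using AVX-512.
Compared with the standard function std::exp(float), the relative error is about 2e-6 and the speed is roughly 10 times higher.
This article explains how an exponential function is computed in general, and then covers the parts that are specific to AVX-512.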
Intended readers
Some knowledge of C++ and x64 (x86-64) assembly language is assumed, although I try to keep the prerequisites to a minimum. Readers who already have some background may start from the "Approximation" section.
- Introduction
- Intended readers
- Execution environment
- Benchmark
- Properties of exp(x)
- Range of computation
- Approximation
- Algorithm
- Implementation with AVX-512
- The float format
- Converting float to int
- Handling the remainder
- Choosing the coefficients
- Summary
Execution environment
You need an environment where AVX-512 is available and a C++ compiler for x64.
The code is in herumi/fmath. Compiling it requires xbyak, so download it and set the include path accordingly.
The fmath2 namespace defines

    void expf_v(float *dst, const float *src, size_t n);

which stores the exp of each element of the n-element float array src into dst. Apart from the error, it is equivalent to

    for (size_t i = 0; i < n; i++) { dst[i] = std::exp(src[i]); }
Benchmark
Speed
The benchmark is in exp_v.cpp. It computes exp for every element of

    float x[3000];

and compares std::exp with expf_v.
The environment is OS: Ubuntu 19.10, CPU: Xeon Platinum 8280 2.7GHz, compiler: gcc-9.2.1 -Ofast.
Function | std::exp | expf_v |
---|---|---|
Time (clk) | 22.6K | 1.8K |
clk is the CPU clock count measured with rdtsc. expf_v is more than 10 times faster (22.6K / 1.8K is about 12.6).
誤差
ç¸å¯¾èª¤å·®ã(çã®å¤ - å®è£
å¤) / çã®å¤ã¨ãã¾ãã
std::exp(float x)
ãçã®å¤ã¨ãã¦x = -30ãã30ã¾ã§1e-5ãã¤å¢ããã¦è¨ç®ããå¤ã®ç¸å¯¾èª¤å·®ã®å¹³åãåºãã¨2e-6ã¨ãªãã¾ããã
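The measurement described above can be sketched as follows. This is a minimal illustration, not the code in exp_v.cpp: the name `meanRelativeError` is hypothetical, and taking the absolute value of each relative error before averaging is an assumption, since the text does not say whether signed or absolute errors were averaged.

```cpp
#include <cmath>
#include <cstddef>

// Mean of |(truth - approx) / truth| for x = begin, begin + step, ... up to end.
// `approx` is any callable float -> float. std::exp(float) plays the role of the
// true value, as in the text. Averaging absolute values is an assumption.
template <class F>
double meanRelativeError(F approx, double begin, double end, double step) {
    double sum = 0;
    std::size_t count = 0;
    for (std::size_t i = 0;; i++) {
        const double xd = begin + step * i;  // avoids accumulating rounding error in x
        if (xd > end) break;
        const float x = static_cast<float>(xd);
        const double truth = std::exp(x);    // float overload of std::exp
        const double value = approx(x);
        sum += std::fabs((truth - value) / truth);
        count++;
    }
    return count ? sum / count : 0.0;
}
```

Passing a candidate implementation as `approx` with the arguments (-30, 30, 1e-5) would mirror the sweep described in the text.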
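Properties of exp(x)
The exponential function exp(x) is the power of e = 2.71828..., that is, $\exp(x) = e^x$.
Mathematically it is defined as
$\exp(x) = 1 + x + x^2/2! + x^3/3! + \cdots + x^n/n! + \cdots$
where x can take any value from minus infinity to plus infinity.
Here ! denotes the factorial, for example 4! = 4 * 3 * 2 * 1.
The definition is an infinite sum, but on a computer we of course truncate it and approximate with a finite sum.
When the absolute value of x is smaller than 1, $x^n$ is very small, and dividing by n! makes it smaller still, so truncating after a few terms leaves only a small error. When x is larger than 1, however, $x^n$ becomes very large and the truncation error grows. The key question is how to keep that error small.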
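To make the effect of truncation concrete, here is a small sketch that evaluates the partial sum and its error against std::exp. The function names `taylorExp` and `truncationError` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cmath>

// Partial sum 1 + x + x^2/2! + ... + x^degree/degree! of the series for exp(x).
double taylorExp(double x, int degree) {
    double term = 1.0;  // holds x^k / k!, starting at k = 0
    double sum = 1.0;
    for (int k = 1; k <= degree; k++) {
        term *= x / k;
        sum += term;
    }
    return sum;
}

// Absolute truncation error of the partial sum against std::exp.
double truncationError(double x, int degree) {
    return std::fabs(std::exp(x) - taylorExp(x, degree));
}
```

With degree 5, the error at x = 0.3 is on the order of 1e-6, while at x = 5 the partial sum misses the true value by more than 50, which is why the input has to be brought close to 0 before the series is applied.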
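The inverse function of $y = a^x$ is written $x = \log_a(y)$. In particular, when a = e we abbreviate $\log_e(y)$ as $\log(y)$.
- $a^{x+y} = a^x a^y$
- $\log(x^y) = y \log(x)$
- $x^y = z^{y \log_z(x)}$ ; change-of-base formula
These identities, among others, hold.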
Range of computation
First, determine the range of x. The type of x is float (the target is x64, so float is a 32-bit floating-point number). If x is smaller than -87.3, exp(x) falls below FLT_MIN = 1.17e-38, the smallest value float can represent properly. Conversely, if x is larger than 88.72, exp(x) exceeds FLT_MAX = 3.4e38, the largest float, and becomes inf. It is therefore enough to consider -87.3 <= x <= 88.72.
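The two bounds can be checked directly from the constants in `<cfloat>`, since exp(x) stays within [FLT_MIN, FLT_MAX] exactly when x stays within [log(FLT_MIN), log(FLT_MAX)]. The function names below are hypothetical and exist only for this check.

```cpp
#include <cfloat>
#include <cmath>

// Smallest x for which exp(x) is still at least FLT_MIN: log(FLT_MIN).
double expInputLowerBound() { return std::log(static_cast<double>(FLT_MIN)); }

// Largest x for which exp(x) does not exceed FLT_MAX: log(FLT_MAX).
double expInputUpperBound() { return std::log(static_cast<double>(FLT_MAX)); }
```

On a platform with IEEE 754 single-precision float these evaluate to about -87.34 and 88.72, matching the range used in the text.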
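Approximation
An integer power of two, $2^n$, can be computed quickly with a bit shift. So the idea is to extract an integer power of two from $\exp(x) = e^x$.
Using the change-of-base formula, rewrite
$e^x = 2^{x \log_2(e)}$
and split $x' = x \log_2(e)$ into an integer part n and a fractional part a:
$x' = n + a \quad (|a| \le 0.5)$
Then $e^x = 2^n \times 2^a$.
Since $2^n$ is a bit shift, what remains is the computation of $2^a$. Apply the change of base once more:
$2^a = e^{a \log(2)} = e^b$, where $b = a \log(2)$.
Since $|a| \le 0.5$ and $\log(2) = 0.693$, we have $|b| = |a \log(2)| \le 0.346$.
Because b is close to 0, $e^b$ can be approximated with the series expansion given at the beginning.
The degree-6 term is at most $0.346^6/6! = 2.4 \times 10^{-6}$, which is close to the resolution of float, so we stop at degree 5:
$e^b \approx 1 + b + b^2/2! + b^3/3! + b^4/4! + b^5/5!$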
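The range reduction above can be checked numerically. The sketch below works in double so that the identity $e^x = 2^n e^b$ can be verified without float rounding getting in the way. `Reduced` and `reduce` are hypothetical names, and std::nearbyint stands in for the rounding step (it rounds to nearest under the default rounding mode).

```cpp
#include <cmath>

struct Reduced {
    int n;     // integer part: exp(x) = 2^n * e^b
    double b;  // remainder, |b| <= log(2)/2
};

// Splits x so that exp(x) = 2^n * exp(b) with |b| <= log(2)/2.
Reduced reduce(double x) {
    const double log2e = 1.0 / std::log(2.0);  // log_2(e)
    const double xp = x * log2e;               // x' = x log_2(e)
    const double n = std::nearbyint(xp);       // nearest integer, so |a| <= 0.5
    const double a = xp - n;
    return {static_cast<int>(n), a * std::log(2.0)};
}
```

For example, `reduce(3.0)` gives n = 4, and `std::ldexp(std::exp(r.b), r.n)` reproduces exp(3) up to rounding.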
Algorithm
In the end we adopt the following algorithm.

    input : x
    output : e^x
    1. x = max(min(x, expMax), expMin)
    2. x = x * log_2(e)
    3. n = round(x) ; round to nearest
    4. a = x - n
    5. b = a * log(2)
    6. z = 1 + b(1 + b(1/2! + b(1/3! + b(1/4! + b/5!))))
    7. w = 1 << n
    8. return z * w

It is surprisingly short.
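The eight steps can be written as a scalar float function for reference. This is a sketch under stated assumptions, not the fmath implementation: `expApprox` is a hypothetical name, the clamp values are the bounds from the range discussion, and std::ldexp stands in for steps 7 and 8 (it multiplies by $2^n$ without the bit-level trick).

```cpp
#include <algorithm>
#include <cmath>

// Scalar float version of steps 1-8 (illustrative only).
float expApprox(float x) {
    const float expMin = -87.3f;       // assumed from the range discussion
    const float expMax = 88.72f;       // assumed from the range discussion
    const float log2_e = 1.44269504f;  // log_2(e)
    const float log2 = 0.693147181f;   // log(2)
    x = std::max(std::min(x, expMax), expMin);  // 1. clamp
    x *= log2_e;                                // 2. x' = x log_2(e)
    const float nf = std::nearbyint(x);         // 3. round to nearest
    const float a = x - nf;                     // 4.
    const float b = a * log2;                   // 5.
    // 6. Horner form of the degree-5 polynomial
    const float z = 1 + b * (1 + b * (1.0f / 2 + b * (1.0f / 6 + b * (1.0f / 24 + b / 120))));
    // 7-8. std::ldexp(z, n) returns z * 2^n (stand-in for the bit shift)
    return std::ldexp(z, static_cast<int>(nf));
}
```

For moderate inputs such as x = 1 or x = 3 the result agrees with std::exp to within a relative error of about 1e-5 or better, which is in line with the accuracy quoted for the vectorized version.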
Implementation with AVX-512
AVX-512 is the name of an instruction set that processes several values of types such as int32_t, int8_t, double, and float at once. Each register is 512 bits wide, so for float it handles 512/32 = 16 values together. There are 32 registers, zmm0 through zmm31.
Overview of AVX-512 instructions
The AVX-512 instructions that appear in this article are summarized here. See the Intel 64 and IA-32 Architectures Software Developer's Manuals for details.
Assembly is written in Intel syntax, with operands in the order dst, src1, src2. The form dst, src is shorthand for dst, dst, src.
Instruction | Meaning | Notes |
---|---|---|
vmovaps [mem], zmm0 / vmovaps zmm0, [mem] | [mem] = zmm0 / zmm0 = [mem] | mem must be a 64-byte-aligned address for zmm operands |
vmovups [mem], zmm0 / vmovups zmm0, [mem] | [mem] = zmm0 / zmm0 = [mem] | no alignment constraint on mem |
vaddps zmm0, zmm1, zmm2 | zmm0 = zmm1 + zmm2 | as float; vsubps, vmulps, etc. work the same way |
vminps zmm0, zmm1, zmm2 | zmm0 = min(zmm1, zmm2) | as float |
vpaddd zmm0, zmm1, zmm2 | zmm0 = zmm1 + zmm2 | as uint32_t |
vpslld zmm0, zmm1, imm | zmm0 = zmm1 << imm | as uint32_t |
vfmadd213ps zmm0, zmm1, zmm2 | zmm0 = zmm0 * zmm1 + zmm2 | as float |
vpbroadcastd zmm0, eax | copy eax into all 16 lanes of zmm0 | |
vcvtps2dq zmm0, zmm1 | zmm0 = round(zmm1) | result is int |
vcvtdq2ps zmm0, zmm1 | zmm0 = float(zmm1) | result is float |
vrndscaleps zmm0, zmm1, 0 | zmm0 = round(zmm1) | result is float |
Initialization

    // exp_v(float *dst, const float *src, size_t n);
    void genExp(const Xbyak::Label& expDataL)
    {
        const int keepRegN = 7;
        using namespace Xbyak;
        util::StackFrame sf(this, 3, util::UseRCX, 64 * keepRegN);

StackFrame is a class that generates the function prologue. The 3 means three arguments, UseRCX says that the rcx register is used explicitly, and 64 * keepRegN bytes of stack are reserved for saving zmm registers.
    const Reg64& dst = sf.p[0];
    const Reg64& src = sf.p[1];
    const Reg64& n = sf.p[2];

sf.p[i] of the StackFrame class is the register that holds the i-th function argument. Windows and Linux pass arguments in different registers, and StackFrame absorbs that difference.
    // prolog
    #ifdef XBYAK64_WIN
        vmovups(ptr[rsp + 64 * 0], zm6);
        vmovups(ptr[rsp + 64 * 1], zm7);
    #endif
        for (int i = 2; i < keepRegN; i++) {
            vmovups(ptr[rsp + 64 * i], Zmm(i + 6));
        }

This saves the AVX-512 Zmm registers. On Windows, a function that uses zmm6 or higher must save them.
    // setup constant
    const Zmm& i127 = zmm3;
    const Zmm& expMin = zmm4;
    const Zmm& expMax = zmm5;
    const Zmm& log2 = zmm6;
    const Zmm& log2_e = zmm7;
    const Zmm expCoeff[] = { zmm8, zmm9, zmm10, zmm11, zmm12 };
    mov(eax, 127);
    vpbroadcastd(i127, eax);
    vpbroadcastd(expMin, ptr[rip + expDataL + (int)offsetof(ConstVar, expMin)]);
    ...

The constants are loaded into registers. vpbroadcastd copies one 32-bit value (here a float) into all 16 lanes of a Zmm register.

    vpbroadcastd(expMin, ptr[rip + expDataL + (int)offsetof(ConstVar, expMin)]);

is notation specific to Xbyak. The Label object expDataL points to the start address where the constants (the ConstVar class) are placed. The operand uses rip-relative addressing and adds the offset of the class member obtained with C's offsetof macro.
Main loop

    vminps(zm0, expMax);   // x = min(x, expMax)
    vmaxps(zm0, expMin);   // x = max(x, expMin)
    vmulps(zm0, log2_e);   // x *= log_2(e)
    vcvtps2dq(zm1, zm0);   // zm1 = n = round(zm0)
    vcvtdq2ps(zm2, zm1);   // zm2 = float(zm1)
    vsubps(zm0, zm2);      // a = x - n
    vmulps(zm0, log2);     // b = a * log(2)

This corresponds to lines 1 through 5 of the algorithm. vminps and vmaxps clip the input to the range [expMin, expMax]. vmulps multiplies by log_2(e), and vcvtps2dq rounds to the nearest integer. The result has int type, so vcvtdq2ps converts it back to float.
    vmovaps(zm2, expCoeff[4]);           // 1/5!
    vfmadd213ps(zm2, zm0, expCoeff[3]);  // b * (1/5!) + 1/4!
    vfmadd213ps(zm2, zm0, expCoeff[2]);  // b(b/5! + 1/4!) + 1/3!
    vfmadd213ps(zm2, zm0, expCoeff[1]);  // b(b(b/5! + 1/4!) + 1/3!) + 1/2!
    vfmadd213ps(zm2, zm0, expCoeff[0]);  // b(b(b(b/5! + 1/4!) + 1/3!) + 1/2!) + 1
    vfmadd213ps(zm2, zm0, expCoeff[0]);  // b(b(b(b(b/5! + 1/4!) + 1/3!) + 1/2!) + 1) + 1

This corresponds to line 6 of the algorithm. vfmadd213ps(x, y, z) is a fused multiply-add instruction that computes x = x * y + z. As the comments indicate, expCoeff[0] holds 1 and is used twice, because both the coefficient of b and the constant term are 1.
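The same chain of multiply-adds can be written in scalar C++ with std::fma, which also computes x * y + z with a single rounding. This is only a scalar analogue for illustration. The name `poly5` and the local array `coeff` are hypothetical, with the coefficient layout taken from the comments in the code above.

```cpp
#include <cmath>

// Degree-5 polynomial 1 + b + b^2/2! + ... + b^5/5! in Horner form.
// Each std::fma(z, b, c) computes z * b + c, like vfmadd213ps(z, b, c).
float poly5(float b) {
    const float coeff[] = {1.0f, 1.0f / 2, 1.0f / 6, 1.0f / 24, 1.0f / 120};  // 1, 1/2!, ..., 1/5!
    float z = coeff[4];
    z = std::fma(z, b, coeff[3]);
    z = std::fma(z, b, coeff[2]);
    z = std::fma(z, b, coeff[1]);
    z = std::fma(z, b, coeff[0]);  // the coefficient of b is 1
    z = std::fma(z, b, coeff[0]);  // the constant term is also 1
    return z;
}
```

Horner form needs five multiply-adds for a degree-5 polynomial and never forms the powers of b explicitly.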
    // zm1 = n
    vpaddd(zm1, zm1, i127);
    vpslld(zm1, zm1, 23); // 2^n

This corresponds to line 7 of the algorithm. It is a little hard to follow, so it is explained in the next section.

    vmulps(zm0, zm2, zm1);

This corresponds to line 8 of the algorithm. The computation of exp(x) is now complete.
The float format
To implement line 7 of the algorithm, this section explains the float format. A float consists of a sign bit s, an exponent e, and a mantissa f. s has 1 bit, e has 8 bits, and f has 23 bits, for a total of 32 bits.
Role | Sign | Exponent | Mantissa |
---|---|---|---|
Symbol | s | e | f |
Bits | 1 | 8 | 23 |
A float whose bit pattern is [s:e:f] represents the value $(-1)^s \times 2^{e-127} \times (1 + f/2^{23})$.
For example, x = 1 can be written as $1 = (-1)^0 \times 2^0 \times (1 + 0)$, so the sign is 0 (zero or positive), the exponent is e = 127, and the mantissa is f = 0. Conversely, the bit pattern [s:e:f] = 0xc1123456, given by s = 1, e = 130, f = 0x123456, is $-2^{130-127} \times (1 + f/2^{23}) \approx -9.138$ as a float.
In the previous section we wanted float($2^n$) for an integer n. The corresponding bit pattern is s = 0, e = n + 127, f = 0. In other words, the 32-bit integer ((n + 127) << 23) represents $2^n$ as a float.

    // zm1 = n
    vpaddd(zm1, zm1, i127);
    vpslld(zm1, zm1, 23); // 2^n

So adding the int 127 with vpaddd and shifting left by 23 bits with vpslld yields the required value.
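The bit-pattern construction can be tried in scalar C++. The helper names `bitsToFloat` and `pow2` are hypothetical. std::memcpy is used for the reinterpretation because casting pointers between unrelated types is undefined behavior in C++, and the sketch assumes IEEE 754 single-precision float.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern [s:e:f] as a float.
float bitsToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// float(2^n) for -126 <= n <= 127: s = 0, e = n + 127, f = 0.
float pow2(int n) {
    return bitsToFloat(static_cast<std::uint32_t>(n + 127) << 23);
}
```

For instance, `pow2(3)` is 8.0f, and `bitsToFloat(0xc1123456)` gives the value of about -9.138 from the example in the text.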
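Converting float to int
There are several ways to round a float to an int. Here I used vcvtps2dq, a conversion instruction that has existed since the SSE era. Its rounding behavior depends on a global setting (the rounding control in MXCSR). The mode is normally left unchanged, but this approach cannot be used if a different setting may be in effect.
There is also vcvttps2dq, an instruction dedicated to truncating conversion to int. With it, adding 0.5 before truncating rounds half up. For negative values, however, 0.5 has to be subtracted instead, which makes things more complicated.
Next, there is vroundps, which lets you specify the rounding mode. For some reason, though, it exists only up to AVX2 and has not been extended to AVX-512.
vrndscaleps, added in its place, lets you specify the rounding mode yourself. Its result is a float, so vcvtps2dq is still needed to obtain an int. This algorithm needs both the float and the int value after rounding, so I used vcvtps2dq, which has the shorter latency.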
端æ°å¦ç
floatã16åãã¤å¦çããã¨å ã®é åã®åæ°nã16ã®åæ°ã§ãªãã¨ã端æ°ãåºã¾ãã ãã®å¦çæ¹æ³ã«ã¤ãã¦è§£èª¬ãã¾ãã
AVX2ã¾ã§ã®SIMDå½ä»¤ã§ã¯ç«¯æ°å¦çãè¦æã§ããã å½ä»¤ã16åä½ãªã®ã§æ®ã5åãã¬ã¸ã¹ã¿ã«èªã¿è¾¼ãã¨ãã£ãå¦çãããã«ããã®ã§ãã ãã®ããSIMDå½ä»¤ã使ããªãé常ã®æ¹æ³ã§ã«ã¼ããåãæ¹æ³ãã¨ããã¨ãå¤ãã§ãã
AVX-512ã§ã¯ããã解決ããããã®ãã¹ã¯ã¬ã¸ã¹ã¿k1, ..., k7ãç»å ´ãã¦ãã¾ãã ãã¹ã¯ã¬ã¸ã¹ã¿ã¯åãããããã¼ã¿å¦çãã(1)ãããªã(0)ããæå®ããã¬ã¸ã¹ã¿ã§ãã ãã¼ã¿å¦çããªãå ´åã¯æ´ã«ã¼ãã§åãã(T_z)ãå¤ãå¤æ´ããªãããé¸æã§ãã¾ãã
ãã¨ãã°
vmovups(zmm0|k1|T_z, ptr[src]); // zmm0 = *src;
ã§k1 = 0b11111;ã®å ´åãä¸ä½5ããããç«ã£ã¦ããã®ã§float *srcã®src[0], ..., src[4]ã ããzmm0ã«ã³ãã¼ãããT_zãæå®ãã¦ããã®ã§æ®ãã¯ã¼ãã§åãããã¾ãã
Back to the implementation.

    and_(ecx, 15);   // ecx = n % 16
    mov(eax, 1);     // eax = 1
    shl(eax, cl);    // eax = 1 << n
    sub(eax, 1);     // eax = (1 << n) - 1 ; n-bit mask
    kmovd(k1, eax);  // set the mask

When ecx holds the loop count n, and-ing it with 15 gives the remainder (the n in the later comments is this remainder). ((1 << n) - 1) is a mask that is 0 for the positions that hold no data (n = 3 gives 0b111).
    vmovups(zm0|k1|T_z, ptr[src]);

vmovups(zm0, ptr[src]) reads 16 floats from src into zm0, but when it is masked with k1, as in vmovups(zm0|k1|T_z, ptr[src]), only the elements whose mask bit is set are accessed in memory.
The important point is that the instruction does not read 512 bits internally and then zero some of them: no exception is raised even if the region that is masked off lacks read/write permission.
You can safely access data right up to a page boundary.
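The mask computation and the zeroing masked load can be modeled in scalar C++ to see what ends up in each lane. This is a behavioral model only, not AVX-512 code, and `tailMask` and `maskedLoadZero` are hypothetical names. Note that the model never touches `src[i]` for a lane whose mask bit is 0, mirroring the property that masked-off elements are not accessed.

```cpp
#include <cstddef>
#include <cstdint>

// Mask with the low (n % 16) bits set, as computed by the and/shl/sub sequence.
std::uint32_t tailMask(std::size_t n) {
    return (1u << (n & 15)) - 1;
}

// Scalar model of vmovups(zmm|k|T_z, ptr[src]): lanes whose mask bit is 0
// are not read from memory and are set to zero.
void maskedLoadZero(float (&lanes)[16], const float* src, std::uint32_t mask) {
    for (int i = 0; i < 16; i++) {
        lanes[i] = ((mask >> i) & 1) ? src[i] : 0.0f;
    }
}
```

With a 5-element source and `tailMask(5)`, lanes 0 to 4 receive the data and lanes 5 to 15 are zero, even though the source array has no elements beyond index 4.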
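Choosing the coefficients
Finally, here is a way to improve the value computed in line 6 of the algorithm,

    6. z = 1 + b(1 + b(1/2! + b(1/3! + b(1/4! + b/5!))))

This expression is an infinite sum truncated partway. It is therefore always smaller than the correct value on this range, because the leading omitted term $b^6/6!$ is nonnegative and dominates the remaining terms.
Fine-tuning the values 1/k! makes the error smaller.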
The range of b is [-L, L] with L = log(2)/2. Let $f(x) = A + Bx + Cx^2 + Dx^3 + Ex^4 + Fx^5$ and find the (A, B, C, D, E, F) that minimizes the mean squared difference between f(x) and exp(x) on the interval [-L, L].
Mathematically, define $I(A, B, C, D, E, F) := \int_{-L}^{L} (\exp(x) - f(x))^2 \, dx$ and solve for the point where the partial derivatives of I with respect to A, B, C, D, E, F are all 0.
In Maple I computed this with

    f := x->A+B*x+C*x^2+D*x^3+E*x^4+F*x^5;
    g:=int((f(x)-exp(x))^2,x=-L..L);
    sols:=solve({diff(g,A)=0,diff(g,B)=0,diff(g,C)=0,diff(g,D)=0,diff(g,E)=0,diff(g,F)=0},{A,B,C,D,E,F});
    Digits:=1000;
    s:=eval(sols,L=log(2)/2);
    evalf(s,20);

It is a rough comparison, but the error was about half of that obtained with the plain truncation.
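For readers without Maple, the same least-squares criterion can be explored numerically. The sketch below is not the method used above: instead of solving the normal equations symbolically for A to F, it projects exp onto Legendre polynomials, which yields the same least-squares polynomial expressed in a different basis, and it integrates with Simpson's rule. All names are hypothetical, and the comparison is done in double precision, so it does not reproduce the float measurement quoted in the text.

```cpp
#include <array>
#include <cmath>

constexpr int kDegree = 5;
const double kL = std::log(2.0) / 2;  // the interval is [-L, L]

// Legendre polynomials P_0..P_5 at t in [-1, 1] (three-term recurrence).
std::array<double, kDegree + 1> legendre(double t) {
    std::array<double, kDegree + 1> p{};
    p[0] = 1.0;
    p[1] = t;
    for (int k = 1; k < kDegree; k++) {
        p[k + 1] = ((2 * k + 1) * t * p[k] - k * p[k - 1]) / (k + 1);
    }
    return p;
}

// Least-squares coefficients c_k = (2k+1)/2 * integral over [-1, 1] of
// exp(L t) P_k(t) dt, integrated with Simpson's rule (steps must be even).
std::array<double, kDegree + 1> fitLegendre(int steps = 2000) {
    std::array<double, kDegree + 1> c{};
    const double h = 2.0 / steps;
    for (int i = 0; i <= steps; i++) {
        const double t = -1.0 + h * i;
        const double w = (i == 0 || i == steps) ? 1.0 : (i % 2 ? 4.0 : 2.0);
        const auto p = legendre(t);
        for (int k = 0; k <= kDegree; k++) c[k] += w * std::exp(kL * t) * p[k];
    }
    for (int k = 0; k <= kDegree; k++) c[k] *= (h / 3) * (2 * k + 1) / 2.0;
    return c;
}

// Value of the fitted polynomial at x in [-L, L].
double evalFit(const std::array<double, kDegree + 1>& c, double x) {
    const auto p = legendre(x / kL);
    double y = 0;
    for (int k = 0; k <= kDegree; k++) y += c[k] * p[k];
    return y;
}

// Degree-5 Taylor polynomial, i.e. the plain truncation.
double evalTaylor(double x) {
    return 1 + x * (1 + x * (1.0 / 2 + x * (1.0 / 6 + x * (1.0 / 24 + x / 120))));
}

// Root-mean-square error against std::exp over equally spaced samples in [-L, L].
template <class F>
double rmsError(F f, int samples = 1001) {
    double sum = 0;
    for (int i = 0; i < samples; i++) {
        const double x = -kL + 2 * kL * i / (samples - 1);
        const double d = f(x) - std::exp(x);
        sum += d * d;
    }
    return std::sqrt(sum / samples);
}
```

Comparing `rmsError` of the fitted polynomial with `rmsError(evalTaylor)` shows the fitted one having the smaller root-mean-square error on [-L, L], which is what the least-squares criterion guarantees.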
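Another approach is to use Sollya:

    remez(exp(x),5,[-log(2)/2,log(2)/2]);

(Personally I have the impression that the former, which minimizes the squared error, works better. I would welcome advice from specialists in numerical computation.)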
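Summary
This article presented a method for approximating exp(x) and introduced instructions specific to AVX-512. Mask registers, which handle remainders neatly, are convenient. SIMD registers are wider than they used to be, so computing a value now often seems to be faster than looking it up in a table.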