gracefullight.dev Blog

2026-01-03T05:54:47.000Z

Vocabulary & Expressions

Term/Expression	Definition	Simpler Paraphrase	Meaning
Calm down	To become less agitated or upset	Relax	진정되다, 가라앉다
Cool off	To become less hot or angry	Chill out	진정해지다, 차분해지다
Chill out	To relax completely	Take it easy	화를 누그러뜨리다, 열을 식히다, 긴장을 풀다
Cool down	To lower temperature or become less angry	Calm down	화를 누그러뜨리다
Settle down	To become calm or to establish a stable life	Get comfortable	진정하다
Simmer down	To calm down gradually	Relax slowly	흥분을 가라앉히다
Take it easy	To relax and not stress	Chill out	진정하다

]]>

2026-01-01T05:43:35.574Z

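Before I joined Hyundai Motor, the HR team had talked as if a promotion was all but guaranteed. Once I was in, the reality turned out to be quite different. I felt I had worked hard, but in last year's first performance review I was excluded from evaluation because I had joined in January, and I suspect that is why I never even made the promotion list.

Rules are rules, but I was disappointed that what HR had told me differed from how things worked internally. Colleagues with similar years of experience seemed to have started out with grievances of their own. Deciding to focus on what I could change rather than what I could not, I began spending more of my time on other things.

The first quarter drifted by with almost no motivation, because my goal had disappeared. Until then, my biggest goal had been to do well on the graduate school exam, and the repeated delays in announcing the result made things even harder. After several postponements the result finally came out in early April: I was rejected. I asked whether the weak point was my GPA, the exam result, the SOP, or my English score, but I did not get a specific answer. Instead, I received the reply below.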

We regret to inform you that your application in this admission cycle was not successful. Please understand that admission into the Master of Science in Machine Learning is very competitive and takes into account a large number of criteria. Due to restrictions on the number of places, we unfortunately have to decline a large number of strong applications. Although this final decision may be disappointing, we are confident that, given your credentials, many other opportunities will open up for you.

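I think I took some comfort in the phrase "strong applications". But my IELTS score was about to expire, so chasing the same dream again did not look realistic. While I was wondering what to do with the limited time I had, I remembered an expo my wife had been recommending for a while.

I learned about the study abroad, overseas employment, and immigration expo held regularly at COEX, and went there on a whim without much expectation. A UK master's was a 1.5-year course, but a UK-specific version of IELTS had been introduced and my existing score could not be used. Australia caught my eye as the alternative. Another graduate school in the United Arab Emirates was also a possibility, but my wife had long said she wanted to try living in Sydney, so an Australian graduate school became the goal.

The study-abroad agency I met at the expo was less organized than I had expected, and the process was not especially satisfying. Even so, I was certain of two things: I needed to prepare for a future in which AI is rapidly redefining what a software engineer is, and I needed to move to a relevant company or pivot toward research before the era of physical AI arrives. So I continued the consultations with a master's in artificial intelligence as the goal. I already had an IELTS score, the funds, the work experience, and the will, so things moved relatively quickly. Around that time I gave notice at work, and I am truly grateful that my wonderful colleagues sent me off with warm wishes for the uncertain road ahead.

I sold the house, sorted out my belongings, and arrived in Australia on July 13 with just two suitcases. Adapting to an Australian winter that was far chillier than I had expected was not easy. In Korea I thought my English was pretty good, but I soon realized that what I was used to was polished English. The English of locals and the English of friends who speak it as a second language were completely different.

On top of that, I was not used to academic reports even in Korean, so writing IMRD-format reports with English references on short deadlines was truly overwhelming. I felt ashamed of my past self, who had thought, "If I am spending this much money, can't I basically just buy a master's degree?" I gained a new respect for the seniors who finished master's and doctoral degrees abroad back when there was not even ChatGPT. Watching a few outstanding global talents in their twenties, I also found myself curious about where they will be in ten years.

As I adjusted to classes, assignments, and exams, I found some breathing room each time my wife made a short trip to the other side of the globe. We also completed our marriage registration in the state of New South Wales, and I finished the first semester well with a High Distinction. In the final month I got the chance to work on interesting research under a great supervisor. That same month I was finally able to try turning code into templates and prompts, something I had wanted to do whenever I had time.

What kind of retrospective will I write next year? I am already looking forward to it.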

]]>

2025-12-29T10:19:46.800Z

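While building a mobile web app with shadcn/ui (Vaul), I fixed a scroll conflict bug that occurs on iOS Safari when a Dialog is opened from inside a Drawer.

The Problem

In the mobile view, the feature opens a 'More' Drawer, and tapping a menu item inside it brings up a 'Share' Dialog.

Drawer Open: Open the Drawer on a page that has scroll.
Interaction: Tapping the menu item closes the Drawer and opens the Dialog.
Bug: During this transition the background scroll resets to the top (Jump to Top), or the screen locks up and stops responding.

Root Cause: Scroll Lock Race Condition

Vaul (Drawer) is also built on Radix Dialog, so when the Dialog opens in the same frame/tick right after the Drawer closes, the scroll lock and body style changes of the stacked overlays overlap, which can cause scroll restoration or jumps on iOS Safari.

Solution

1. Timing (Separate the Execution)

Instead of opening the Dialog immediately in onClose or the click handler, use the onAnimationEnd event to open the Dialog only after the Drawer has completely finished cleaning up.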

<Drawer
  onAnimationEnd={(open) => {
    // Runs after the Drawer has fully closed (!open)
    if (!open) {
      setDialogOpen(true);
    }
  }}
>

2. Configuration (iOS Compatibility Options)

Add the following options to the Vaul component to prevent Safari's incomplete style recalculation.

<Drawer
  disablePreventScroll={true}
  repositionInputs={false}
  shouldScaleBackground={false}
>

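disablePreventScroll={true}
- Stops Vaul from forcing scroll-lock styles onto body
- Mitigates the scroll jump on iOS
- Note that background scrolling may be allowed while the Drawer is open
repositionInputs={false}
- Disables the viewport repositioning logic that handles the virtual keyboard
- Prevents layout conflicts when inputs and modals are combined on iOS Safari
shouldScaleBackground={false}
- Does not scale the background when the Drawer opens
- Reduces unnecessary body style changes for stability

These options are less a complete fix than defensive settings that reduce risk on iOS Safari.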

Ref

GitHub Issue: Vaul Drawer iOS Scroll Bug

]]>

2025-12-27T06:46:08.652Z

OpenQASM 2.0

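OPENQASM 2.0; // language version declaration
include "qelib1.inc"; // standard gate library that defines h, x, and others

qreg q[1]; // declare qubit register q (1 qubit, initial state |0⟩)
creg c[1]; // declare classical bit register c (stores the measurement result)

h q[0]; // apply the Hadamard gate: |0⟩ → (|0⟩ + |1⟩) / √2

measure q[0] -> c[0]; // measure q[0] and store the result (0 or 1) in c[0]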

OPENQASM 2.0;
include "qelib1.inc"; // standard gate library that defines x

qreg q[10]; // ∣0000000000⟩
creg c[10];

// use the x gate to flip the first three qubits to the ∣1⟩ state
x q[0]; // ∣1000000000⟩
x q[1]; // ∣1100000000⟩
x q[2]; // ∣1110000000⟩

measure q[0] -> c[0]; // 1
measure q[1] -> c[1]; // 1
measure q[2] -> c[2]; // 1

Linear Algebra

Vectors

In quantum computing, vectors represent quantum states.
A 2-dimensional vector can be written as:
$|\psi\rangle = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}$

Computational Basis

$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$

$|\psi\rangle$ is a linear combination of basis states as follows:
$|\psi\rangle = \psi_0|0\rangle + \psi_1|1\rangle = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}$

Identity Matrix

$\mathbb I |\psi\rangle = \begin{bmatrix} \psi_0 \\ \psi_1 \end{bmatrix}$
$\mathbb I^2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Conjugate Transpose

Bra: $|\psi\rangle^\dagger = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}^\dagger = \begin{pmatrix} \overline{\psi_0} & \overline{\psi_1} \end{pmatrix} = \langle \psi|$
- $\langle \psi| := |\psi\rangle^\dagger$
- ket: $|\psi\rangle$
- bra: $\langle \psi|$
- bra + ket: dot product $\langle \phi|\psi\rangle$
Dagger: $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \implies A^\dagger = \begin{bmatrix} \overline{a} & \overline{c} \\ \overline{b} & \overline{d} \end{bmatrix}$

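from sympy import Matrix, symbols
psi = Matrix(symbols("psi_0 psi_1"))  # column vector (psi_0, psi_1)
psi_dagger = psi.conjugate().T
psi_dagger = psi.conjugate().transpose()
psi_dagger = psi.H  # all three forms are equivalent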

$(\alpha A)^\dagger = \overline{\alpha} A^\dagger$
$(A^\dagger)^\dagger = A$
$(A + B)^\dagger = A^\dagger + B^\dagger$
$(AB)^\dagger = B^\dagger A^\dagger$
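
The identities above can be spot-checked numerically. The following is a minimal sketch in plain Python; representing matrices as nested lists and the helper names `dagger` and `matmul` are assumptions made for illustration, not part of the original notes.

```python
# Matrices are nested lists of Python numbers; this representation is an
# illustrative assumption.

def dagger(m):
    """Conjugate transpose: entry (c, r) of the result is conj(m[r][c])."""
    return [[m[r][c].conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1 + 2j, 3j], [4, 5 - 1j]]
B = [[2, 1j], [-1j, 1 + 1j]]

lhs = dagger(matmul(A, B))           # (AB)†
rhs = matmul(dagger(B), dagger(A))   # B† A†
print(lhs == rhs)                    # the order of the factors is reversed
print(dagger(dagger(A)) == A)        # applying the dagger twice is a no-op
```

The entries are small integers, so the comparison is exact; with general floating-point entries a tolerance would be the safer check.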

Hermitian

A Hermitian matrix is equal to its own conjugate transpose.
$H = H^\dagger$

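from sympy import I, Matrix
A = Matrix([[1, I], [2, 3]])  # arbitrary example matrix
A_dagger = A.H
AA_dagger = A * A_dagger
# (AA†)† = AA†, so AA† is Hermitian for any A
is_hermitian = AA_dagger.is_hermitian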

Unitary

A matrix whose conjugate transpose is also its inverse.
$U^\dagger U = U U^\dagger = \mathbb I$
$U = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$

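from sympy import Matrix, cos, eye, sin, symbols, trigsimp

theta = symbols("theta", real=True)  # real angle, so conjugation leaves the entries unchanged
U = Matrix([
  [cos(theta), sin(theta)],
  [-sin(theta), cos(theta)]
])

U_dagger = U.H
U_dagger_U = trigsimp(U_dagger * U)

# Compare against the 2x2 identity; sympy's `I` is the imaginary unit, not a matrix
is_unitary = U_dagger_U == eye(2)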

Inner Product

$\langle \psi|\phi\rangle = \begin{pmatrix} \overline{\psi_0} & \overline{\psi_1} \end{pmatrix} \begin{pmatrix} \phi_0 \\ \phi_1 \end{pmatrix} = \overline{\psi_0}\phi_0 + \overline{\psi_1}\phi_1$
$|\langle \psi|\phi\rangle|^2 = \langle \psi|\phi\rangle \langle \phi|\psi\rangle$
$|\langle\psi|\phi\rangle| = |\langle\phi|\psi\rangle|$
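
A small numeric check of the inner-product identities, as a sketch: the vectors `psi` and `phi` below are arbitrary example values and the helper `inner` is a name introduced here for illustration.

```python
def inner(psi, phi):
    """<psi|phi>: conjugate the entries of psi, then take the dot product with phi."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))


psi = [1 + 1j, 2]
phi = [3j, 1 - 1j]

braket = inner(psi, phi)   # <psi|phi>
ketbra = inner(phi, psi)   # <phi|psi>

print(braket, ketbra)                       # complex conjugates of each other
print(abs(braket) ** 2, braket * ketbra)    # |<psi|phi>|^2 two ways
print(inner(psi, psi))                      # squared magnitude of psi
```

Swapping the arguments conjugates the result, which is why the two absolute values agree.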

Orthogonality

$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
- orthonormal basis
$\langle 0 | 1 \rangle = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0$
- $\langle 1 | 0 \rangle = 0$
$\langle 0 | 0 \rangle = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1$
- $\langle 1 | 1 \rangle = 1$

$\langle \psi|\phi\rangle = \overline{\psi_0}\phi_0 + \overline{\psi_1}\phi_1$

$|\psi\rangle = \psi_0|0\rangle + \psi_1|1\rangle, \quad |\phi\rangle = \phi_0|0\rangle + \phi_1|1\rangle$
$\langle \psi| = \overline{\psi_0}\langle 0| + \overline{\psi_1}\langle 1|$
$\langle \psi|\phi\rangle = \overline{\psi_0}\phi_0\langle 0|0\rangle + \overline{\psi_0}\phi_1\langle 0|1\rangle + \overline{\psi_1}\phi_0\langle 1|0\rangle + \overline{\psi_1}\phi_1\langle 1|1\rangle$
$\langle 0|1\rangle = 0, \quad \langle 1|0\rangle = 0$

Magnitude

$\| |\psi\rangle \|^2 = |\psi_0|^2 + |\psi_1|^2$

$\| |\psi\rangle \| = \sqrt{\langle \psi|\psi\rangle} = \sqrt{|\psi_0|^2 + |\psi_1|^2}$
$\| |\psi\rangle \|^2 = \langle \psi|\psi\rangle$
- $\langle \psi|\psi\rangle = \overline{\psi_0}\psi_0 + \overline{\psi_1}\psi_1 = |\psi_0|^2 + |\psi_1|^2$

Outer product

$|\psi\rangle\langle\phi| = (\psi_0 | 0\rangle + \psi_1 |1\rangle)(\overline{\phi_0}\langle 0| + \overline{\phi_1}\langle 1|) \\ \quad =\psi_0\overline{\phi_0}|0\rangle\langle 0| + \psi_0\overline{\phi_1}|0\rangle\langle 1| + \psi_1\overline{\phi_0}|1\rangle\langle 0| + \psi_1\overline{\phi_1}|1\rangle\langle 1|$
$|\psi\rangle\langle\phi| = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \begin{pmatrix} \overline{\phi_0} & \overline{\phi_1} \end{pmatrix} = \begin{pmatrix} \psi_0\overline{\phi_0} & \psi_0\overline{\phi_1} \\ \psi_1\overline{\phi_0} & \psi_1\overline{\phi_1} \end{pmatrix}$
$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \langle 1| = \begin{pmatrix} 0 & 1 \end{pmatrix}$
$|0\rangle\langle 1| = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$
$A = \begin{pmatrix} a_{00} & a_{0 1} \\ a_{1 0} & a_{1 1} \end{pmatrix} \\ \quad = a_{00}|0\rangle\langle 0| + a_{01}|0\rangle\langle 1| + a_{10}|1\rangle\langle 0| + a_{11}|1\rangle\langle 1|$
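
The decomposition of a matrix into basis outer products can be verified directly. This is a minimal sketch; the example matrix `A` and the helper `outer` are assumptions introduced for illustration.

```python
def outer(ket, bra_of):
    """|ket><bra_of|: column vector times the conjugated row vector."""
    return [[k * b.conjugate() for b in bra_of] for k in ket]


zero, one = [1, 0], [0, 1]
basis = [zero, one]

A = [[1, 2j], [3, 4]]  # arbitrary example matrix

# Rebuild A as the sum over i, j of a_ij |i><j|
rebuilt = [[0, 0], [0, 0]]
for i in range(2):
    for j in range(2):
        term = outer(basis[i], basis[j])
        for r in range(2):
            for c in range(2):
                rebuilt[r][c] += A[i][j] * term[r][c]

print(outer(zero, one))   # the matrix with a single 1 in row 0, column 1
print(rebuilt == A)
```

Each $|i\rangle\langle j|$ has a single 1 at row $i$, column $j$, so the coefficient $a_{ij}$ lands exactly in that position.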

Qubit

$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$

where $\alpha, \beta$ are complex numbers satisfying $|\alpha|^2 + |\beta|^2 = 1$ .
phase factor: $e^{i\phi}$ rotates the state by angle $\phi$ in the complex plane, but does not affect measurement probabilities.
- $|e^{i\phi}| = 1$
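
To see why a phase factor leaves measurement probabilities unchanged, here is a short sketch; the amplitudes and the angle $\pi/3$ are arbitrary example values.

```python
import cmath
import math


def probabilities(alpha, beta):
    """Measurement probabilities of outcomes 0 and 1 for alpha|0> + beta|1>."""
    return abs(alpha) ** 2, abs(beta) ** 2


alpha, beta = 1 / math.sqrt(2), 1j / math.sqrt(2)
phase = cmath.exp(1j * math.pi / 3)   # e^{i*phi} with phi = pi/3

before = probabilities(alpha, beta)
after = probabilities(phase * alpha, phase * beta)

print(before, after)   # the two pairs agree up to floating-point rounding
print(abs(phase))      # the phase factor has modulus 1
```

Because $|e^{i\phi}\alpha| = |e^{i\phi}||\alpha| = |\alpha|$, multiplying both amplitudes by the same phase cannot change either probability.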

One-Qubit Gates

Identity Gate

$\mathbb{I} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Pauli-X Gate

$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

Acts as the quantum NOT gate: $X|0\rangle = |1\rangle$ and $X|1\rangle = |0\rangle$.

Pauli-Y Gate

$Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$

Pauli-Z Gate

$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$

Hadamard Gate

$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$

Rotation Gate

$R(\theta) = \begin{pmatrix} \cos{\theta} & \sin{\theta} \\ -\sin{\theta} & \cos{\theta} \end{pmatrix}$
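
The gate matrices above act on a state by ordinary matrix-vector multiplication. The sketch below applies a few of them to $|0\rangle$; the helper `apply` and the list-based state representation are assumptions for illustration.

```python
import math


def apply(gate, state):
    """Multiply a 2x2 gate matrix by a 2-entry state vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]


s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]

zero = [1, 0]
one = apply(X, zero)    # X|0> = |1>
plus = apply(H, zero)   # H|0> = (|0> + |1>) / sqrt(2)
back = apply(H, plus)   # applying H twice returns to |0>

print(one)
print([abs(amp) ** 2 for amp in plus])   # measurement probabilities after H
print(back)
```

The probabilities after `H` correspond to the single-qubit OpenQASM program at the top of this post, where measuring after `h q[0]` yields 0 or 1 with equal probability.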

The Bloch Sphere

$|\psi\rangle = \cos\left(\frac{\theta}{2}\right) |0\rangle + e^{i\phi}\sin\left(\frac{\theta}{2}\right)|1\rangle$

where $0 \leq \theta \leq \pi$ and $0 \leq \phi < 2\pi$ .
$\theta$ : the polar (or colatitude) angle, measured from the "north pole" of the sphere.
- polar angle: 편각
$\phi$ : the azimuthal (or longitude) angle around the equator.
- azimuthal angle: 방위각
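
As a sketch of how the two angles map to a state and back to a point on the sphere, the code below assumes the standard half-angle parametrization, $\cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$. The function names are introduced here for illustration.

```python
import cmath
import math


def bloch_state(theta, phi):
    """Amplitudes (alpha, beta) for polar angle theta and azimuthal angle phi."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]


def bloch_vector(state):
    """Cartesian point on the unit sphere for a normalized single-qubit state."""
    a, b = state
    cross = a.conjugate() * b
    return 2 * cross.real, 2 * cross.imag, abs(a) ** 2 - abs(b) ** 2


north = bloch_vector(bloch_state(0.0, 0.0))            # |0> sits at the north pole
equator = bloch_vector(bloch_state(math.pi / 2, 0.0))  # (|0> + |1>)/sqrt(2) on the +x axis
south = bloch_vector(bloch_state(math.pi, 0.0))        # |1> sits at the south pole

print(north, equator, south)
```

With the half angle, $\theta = \pi$ reaches $|1\rangle$, so the range $0 \leq \theta \leq \pi$ covers the whole sphere exactly once.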

LaTeX

\texttip{}: Displays a tooltip when hovering over the equation.
\toggle{}: Toggles between two expressions.
\begin{align}: Aligns equations, where the ampersand (&) marks the alignment points.
\bbox[color, padding]{}: Puts a bounding box with the specified color and padding around an expression.
\boldsymbol{}: Renders a bold version of symbols like variables.
\cancel{}: Strikes through an expression.
\cancelto{value}{}: Strikes through and labels with the specified value.
\begin{cases}: Creates a piecewise function with conditions.
\color{}: Applies a color to text or math expressions. You can use predefined colors or hex values.
\enclose{}: Encloses an expression with various effects like circles or strikes.
[mathcolor="color", mathbackground="color"]: Adds custom colors to the enclosing effect.
\xmapsto{}: Creates an arrow with a label for mapping.
\xlongequal{}: Creates a long equals sign with a label.
\ce{}: Renders chemical equations or formulas.
\newcommand{\ket}[1]{\left|#1\right\rangle}: defines a custom command for ket notation. \ket{\psi}
\tag{}: Assigns a custom tag to an equation.
\unicode{}: Inserts a Unicode character using its code.

Ref

OpenQASM 2.0

]]>

2025-12-10T13:40:52.653Z

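How should developers survive in the age of AI? The code quality of LLMs is already better than mine, and they churn out code far faster. What should I do in this situation? Amid the fear that most developers will be replaced within five years, what should I be learning now?

What I have felt while using the models that came after ChatGPT is that the quality of the output is ultimately determined by how I decompose the problem and what data I feed the model. For example, a senior developer can get better output from AI because they have seen good code and architecture countless times, have felt the concepts firsthand, and have know-how accumulated through collaboration and hands-on work. In other words, the range of keywords and context they can put into the model is different.

Where can you learn good code and architecture? The essence of software engineering lives in open source, so that is where the answer can be found.

I have a routine of skimming GitHub's popular repositories for 30 minutes every week. I star the projects that interest me as bookmarks, then dig into their folder structure, the packages they use, and their implementation. If the source feels novel, I fork it, get an overview through an LLM, find the core feature, and sometimes build it from scratch myself.

The point is to make my own the methods of open source contributors, who have spent more time than I have and thought more deeply about the problem. That is probably one of the fast routes to becoming a developer who is not replaced in the AI-native era.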

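So what, in a word, is the developer's role in the age of AI? I think it is the role of the Discriminator in a generative adversarial network (GAN).

It is the ability to judge whether the code and architecture an AI model pours out are logically correct, whether they are efficient, and whether there is a better way. It is also the ability to adjust the prompt according to that judgment and give the AI feedback to improve the result. These are the core competencies a developer needs.

Gartner defines this way of developing as AI-native Software Engineering (Khandabattu & Tamersoy, 2025). The idea is to delegate simple coding to AI while developers focus on more meaningful tasks: moving away from mechanical implementation toward critical thinking, human ingenuity, and empathy for the user. In the end, the reason we must become discriminators is to protect and extend this value that only humans can provide.

On the other hand, as Linus Torvalds put it, today's AI hype is probably 90% marketing and only 10% reality (TFiR, 2024, 37:59). To pick out that 10%, you have to dig deeper into theory. Then, once you have experienced how theory is implemented through engineering, you will develop an eye for telling the two apart.

Frameworks and libraries change quickly. But the concepts underneath them do not, because all programming ultimately comes down to data structures and divide and conquer. AI merely raises the abstraction layer one level higher.

Many say degrees will become unnecessary in the age of AI, but I think differently. As it matters more whether you truly understand the concepts, means of proving foundational knowledge, such as degrees and certifications, will become more important, not less.

Engineering matters. Everything else is just an implementation of its concepts.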

Ref

Khandabattu, H., & Tamersoy, B. (2025, June 11). Hype Cycle for artificial intelligence, 2025 (ID G00828523). Gartner. https://www.gartner.com/interactive/hc/6579402
TFiR. (2024, October 17). Linus Torvalds on the kernel, GenAI, EVs, programming languages and more… [Video]. YouTube. https://youtu.be/s4wlrxFf2lM

]]>

2025-12-10T10:43:46.884Z

1. Abstract

Core Concept: $\pi_{0.5}$ is a model designed for broad generalization by utilizing co-training on heterogeneous tasks.
Method: It combines hybrid multi-modal examples including image observations, language commands, object detection, semantic subtask prediction, and low-level actions.
Impact: This knowledge transfer is essential for effective generalization, enabling the execution of long-horizon and dexterous manipulation skills in the wild.

2. Introduction

Goal: Design training recipes that provide the breadth of knowledge required for robots to generalize at multiple levels of abstraction, from physical behaviors to scene semantics.
Unified Framework: By casting different modalities into a single sequence modeling framework, VLAs can be trained on diverse sources: robot data, language data, computer vision tasks, and combinations thereof.
Capabilities: The model can control mobile manipulators to perform varied household tasks even in homes never seen during training.
Hierarchical Architecture:
- Training: Pre-trains on a heterogeneous mixture of tasks, then fine-tunes specifically for mobile manipulation using both low-level action examples and high-level semantic actions (e.g., predicting "pick up the cutting board").
- Inference: At runtime, the model first predicts a semantic subtask (inferring appropriate next behavior based on scene semantics) and then predicts the robot action chunk based on this subtask.

3. Model Structure

Unified Transformer Architecture

The model corresponds to a transformer taking in $N$ multimodal input tokens $x_{1:N}$ (images, text, and actions) and producing multimodal outputs.
Input Processing: Different token types are processed by specific encoders (e.g., Vision Encoder for images, Embedding Matrix for text).
Output Split: The output is split into two streams:
- Text Logits ( $y^{l}_{1:M}$ ): Used for QA, reasoning, and dividing the task (predicting subtasks $\hat{l}$ ).
- Action Tokens ( $y^{a}_{1:H}$ ): Produced by a separate Action Expert to create continuous outputs for robot control.

Probabilistic Decomposition

The distribution captured by the model is decomposed using the chain rule and a conditional independence assumption:

$\pi_{\theta}(a_{t:t+H}, \hat{l} | o_{t}, l) = \pi_{\theta}(a_{t:t+H} | o_{t}, \hat{l}) \cdot \pi_{\theta}(\hat{l} | o_{t}, l)$

Assumption: The action distribution ( $a_{t:t+H}$ ) does not depend on the overall task prompt ( $l$ ), but only on the predicted subtask ( $\hat{l}$ ).
High-Level Inference: $\pi_{\theta}(\hat{l} | o_{t}, l)$ (Predicting "what to do next").
Low-Level Inference: $\pi_{\theta}(a_{t:t+H} | o_{t}, \hat{l})$ (Predicting "how to move").
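
The factorization can be illustrated with toy probability tables. This is only a sketch: every number, subtask, and action name below is made up, the tables stand in for the two model heads at a single fixed observation, and greedy selection is an assumption rather than the paper's decoding procedure.

```python
# pi(l_hat | o_t, l): distribution over subtasks for a fixed observation and prompt
p_subtask = {"pick up the plate": 0.7, "open the drawer": 0.3}

# pi(a | o_t, l_hat): distribution over actions given only the subtask
p_action = {
    "pick up the plate": {"reach": 0.8, "grasp": 0.2},
    "open the drawer": {"reach": 0.4, "pull": 0.6},
}


def joint(action, subtask):
    """pi(a, l_hat | o_t, l) as the product of the two conditionals."""
    return p_action[subtask].get(action, 0.0) * p_subtask[subtask]


def infer():
    """Two-step inference: pick the subtask first, then the action for it."""
    subtask = max(p_subtask, key=p_subtask.get)
    action = max(p_action[subtask], key=p_action[subtask].get)
    return subtask, action


total = sum(joint(a, l) for l in p_subtask for a in p_action[l])
print(infer())
print(joint("reach", "pick up the plate"), total)
```

Note that `p_action` is indexed by the subtask alone, which is the conditional independence assumption: once $\hat{l}$ is known, the overall prompt $l$ no longer enters the action distribution.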

4. Combining Discrete & Continuous Actions

The model employs a hybrid approach to balance training efficiency with inference speed and quality.

The Dilemma:
- Discrete Tokens (FAST): Fast training, but requires slow autoregressive decoding during inference.
- Continuous (Flow Matching): High quality and smooth control, but computationally expensive to train from scratch on massive datasets.
The Solution: Train on discretized actions (FAST) but use Flow Matching for inference.
- Attention Masking: Ensures discrete and continuous action representations do not attend to each other during joint training.

Hybrid Loss Function

The model minimizes a combined objective:

$\mathbb{E} \left[ \underbrace{H(x, f^l_\theta)}_{\text{Cross Entropy}} + \alpha \underbrace{\| \omega - a - f^a_\theta \|^2}_{\text{MSE for Flow}} \right]$

Cross Entropy: For text and discrete action tokens.
MSE: For the Flow Matching vector field (Action Expert).
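
A toy evaluation of the combined objective for a single sample, as a sketch: the token probabilities, noise, action, and predicted vector field are made-up numbers, and the function names are introduced here for illustration.

```python
import math


def cross_entropy(token_probs, target_index):
    """H(x, f^l): negative log-probability the model assigns to the target token."""
    return -math.log(token_probs[target_index])


def flow_error(noise, action, predicted_field):
    """Squared norm of (omega - a - f^a), summed over action dimensions."""
    return sum((w - a - f) ** 2 for w, a, f in zip(noise, action, predicted_field))


def hybrid_loss(token_probs, target_index, noise, action, predicted_field, alpha):
    """Cross entropy plus alpha times the flow-matching error."""
    return (cross_entropy(token_probs, target_index)
            + alpha * flow_error(noise, action, predicted_field))


sample = dict(
    token_probs=[0.1, 0.7, 0.2], target_index=1,
    noise=[0.5, -0.2], action=[0.1, 0.3], predicted_field=[0.3, -0.4],
)

stage1 = hybrid_loss(alpha=0.0, **sample)    # only the token term contributes
stage2 = hybrid_loss(alpha=10.0, **sample)   # the flow term is weighted in
print(stage1, stage2)
```

Setting `alpha` to zero removes the flow term entirely, so the weight $\alpha$ controls whether the Action Expert receives any training signal.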

5. Training Recipe

The training is split into two distinct stages based on the $\alpha$ parameter and the inclusion of the Action Expert.

Stage 1: Pre-training ( $\alpha = 0$ )

Goal: Efficient large-scale learning.
Method: Action Expert is OFF. Trains as a standard auto-regressive transformer using next-token prediction for text and discrete FAST action tokens.
Datasets:
- MM: Mobile Manipulator data (100+ homes).
- ME: Multi-Environment non-mobile robots.
- CE: Cross-Embodiment laboratory data (diverse tasks like folding).
- HL: High-Level subtask prediction data.
- WD: Multimodal Web Data (VQA, captioning).

Stage 2: Post-training ( $\alpha = 10.0$ )

Goal: Specialization for mobile manipulation and enabling continuous control.
Method: Action Expert is ON.
- Initialized with random weights.
- Jointly trains next-token prediction (to preserve text capabilities) and Flow Matching for continuous actions.
Key Addition (Verbal Instructions - VI):
- Data collected by "teleoperating" the robot using language commands (e.g., expert users selecting sub-tasks step-by-step).
- Crucial for training the model to predict high-quality subtasks ( $\hat{l}$ ).

6. Evaluation

Methodology

Settings: Tested in entirely new kitchens and bedrooms not seen during training.
Tasks: Long-horizon tasks like cleaning kitchens, putting laundry away, and making beds.
Metrics: Task progress (percentage of steps completed) and Language Following Rate.

Key Findings

Generalization: $\pi_{0.5}$ successfully performs multi-stage tasks in real, unseen homes.
Scaling: Performance improves consistently as the number of training environments increases.
Ablation Studies:
- Cross-Embodiment (CE/ME): Excluding data from other robots significantly degrades performance, indicating strong transfer learning.
- Web Data (WD): While less critical for general task progress, it is essential for Out-of-Distribution (OOD) object generalization and language following.
Comparison: Significantly outperforms $\pi_0$ and the $\pi_0$ -FAST+Flow baseline.

7. Conclusions & Future Work

Current Status: $\pi_{0.5}$ demonstrates that co-training with heterogeneous data enables end-to-end robotic systems to perform long-horizon, dexterous skills in open-world settings.
Limitations:
- Struggles with physical constraints (hard-to-open cabinets) or partial observability.
- Limited to relatively simple prompts based on training data.
Future Directions:
- Incorporating richer context and memory for better handling of partial observability.
- Expanding data sources, particularly exploring verbal instructions as a powerful new supervision modality.

Ref

Intelligence, P., Black, K., Brown, N., Darpinian, J., Dhabalia, K., Driess, D., Esmail, A., Equi, M., Finn, C., & Fusai, N. (2025). π0.5: a Vision-Language-Action Model with Open-World Generalization. arXiv preprint arXiv:2504.16054.

]]>

2025-12-07T04:12:58.610Z

1. Problem Statement

Modeling tabular data poses unique challenges for GANs, which existing statistical and deep neural network models fail to address properly:

Mixed Data Types: Tabular data contains a mix of discrete and continuous columns.
Non-Gaussian & Multimodal Distributions: Continuous columns often have multiple modes (peaks) and do not follow a simple Gaussian distribution.
Imbalanced Discrete Columns: Categorical columns are often heavily imbalanced (e.g., 90% 'Normal', 10% 'Fraud'), leading to mode collapse where minor categories are ignored.

2. Methodology

To address these challenges, the authors propose CTGAN, which introduces Mode-specific Normalization, a Conditional Generator, and a Training-by-Sampling strategy.

A. Mode-Specific Normalization

Challenge: Representing continuous values with arbitrary, non-Gaussian distributions is non-trivial. Simple Min-Max normalization to [-1, 1] fails on multimodal data.
Solution: Treat each continuous column $C_i$ independently using a Variational Gaussian Mixture Model (VGM).
1. Estimate the number of modes $m_i$ and fit a Gaussian mixture.
2. Represent each value as a concatenation of:
  - One-hot vector ( $\beta$ ): Indicates which mode the value belongs to.
  - Scalar ( $\alpha$ ): Represents the normalized value within that mode.
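
The encoding step can be sketched for a mixture whose parameters are already known. The weights, means, and standard deviations below are made-up values standing in for a fitted VGM. The scaling by four standard deviations follows the CTGAN paper's formulation, while choosing the most likely mode deterministically is a simplification (the paper samples the mode from the mode probabilities).

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def encode(value, weights, means, stds):
    """Return (alpha, beta): scalar within the chosen mode and one-hot mode indicator."""
    densities = [w * gaussian_pdf(value, m, s) for w, m, s in zip(weights, means, stds)]
    k = max(range(len(densities)), key=densities.__getitem__)  # most likely mode
    alpha = (value - means[k]) / (4 * stds[k])
    beta = [1 if i == k else 0 for i in range(len(means))]
    return alpha, beta


def decode(alpha, beta, means, stds):
    """Invert the encoding: recover the original value from (alpha, beta)."""
    k = beta.index(1)
    return alpha * 4 * stds[k] + means[k]


weights, means, stds = [0.5, 0.5], [0.0, 10.0], [1.0, 2.0]
alpha, beta = encode(11.0, weights, means, stds)
print(alpha, beta)
print(decode(alpha, beta, means, stds))
```

A value near the second peak is expressed relative to that peak's mean and spread, which keeps $\alpha$ in a small range regardless of how far apart the modes are.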

B. Conditional Generator and Training-by-Sampling

Challenge: Random sampling during training neglects minor categories in imbalanced columns, causing the generator to fail in learning them.
Solution: Condition the generator to produce specific discrete values.
- Conditional Vector: defined as $cond = m_1 \oplus ... \oplus m_{N_d}$ .
  - Example: For columns $D_1=\{1,2,3\}$ and $D_2=\{1,2\}$ , the condition $(D_2=1)$ is represented as mask vectors $m_1=[0,0,0]$ (ignored) and $m_2=[1,0]$ (selected).
- Generator Loss: Penalize the generator if it fails to produce the requested condition. This is done by adding the cross-entropy between the input mask $m_{i^*}$ and the generated output $\hat{d}_{i^*}$ to the loss.
- Training-by-Sampling (Curriculum):
  1. Create zero-filled mask vectors.
  2. Randomly select a discrete column $D_i$ .
  3. Construct a PMF based on the log-frequency of values in that column (giving minor classes a higher chance).
  4. Sample a value $k^*$ based on this PMF and set the mask bit to 1.
  5. This ensures the model evenly explores all possible discrete values, not just the majority classes.
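
The sampling steps above can be sketched as follows. The column names mirror the $D_1$, $D_2$ example, the counts are hypothetical, and using $\log(\text{count} + 1)$ is an assumption made here so that a value seen only once does not receive zero mass. For brevity the column is fixed to `D2` instead of being drawn at random.

```python
import math
import random


def log_frequency_pmf(counts):
    """PMF proportional to log(count + 1) for each value of one discrete column."""
    logs = {value: math.log(count + 1) for value, count in counts.items()}
    total = sum(logs.values())
    return {value: weight / total for value, weight in logs.items()}


def build_cond(columns, chosen_column, chosen_value):
    """Concatenate one mask per discrete column; only the chosen value is set to 1."""
    cond = []
    for name, values in columns.items():
        cond.extend(int(name == chosen_column and v == chosen_value) for v in values)
    return cond


columns = {"D1": [1, 2, 3], "D2": [1, 2]}
counts = {1: 900, 2: 100}   # hypothetical, heavily imbalanced counts for D2
rng = random.Random(0)

pmf = log_frequency_pmf(counts)
sampled_value = rng.choices(list(pmf), weights=list(pmf.values()))[0]
cond = build_cond(columns, "D2", sampled_value)
example = build_cond(columns, "D2", 1)   # the (D2 = 1) case from the text

print(pmf)       # the rare value gets far more than its raw 10% share
print(example)
```

Taking the logarithm compresses the gap between frequent and rare values, so the minority category is requested from the generator much more often than under plain frequency sampling.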

C. Network Structure (CTGAN)

Architecture: Two fully-connected hidden layers for both Generator and Critic.
- Generator: Batch Normalization + ReLU.
- Critic: Dropout + Leaky ReLU.
Optimization: WGAN loss with gradient penalty + Adam optimizer ( $lr=2 \cdot 10^{-4}$ ).

Generator Flow:

h0 = z ⊕ cond
h1 = h0 ⊕ ReLU(BN(FC_256(h0)))
h2 = h1 ⊕ ReLU(BN(FC_256(h1)))
α_hat = tanh(FC(h2))              # Continuous scalar
β_hat = gumbel_0.2(FC(h2))        # Continuous mode (one-hot)
d_hat = gumbel_0.2(FC(h2))        # Discrete value (one-hot)

Critic Flow:

h0 = r1 ⊕ ... ⊕ r10 ⊕ cond1 ⊕ ... ⊕ cond10
h1 = drop(leaky_0.2(FC_256(h0)))
h2 = drop(leaky_0.2(FC_256(h1)))
Score = FC_1(h2)

D. TVAE (Tabular Variational AutoEncoder)

The authors also propose TVAE as a robust baseline for comparison.

Uses two networks to model $p_\theta(r_j|z_j)$ and $q_\phi(z_j|r_j)$ .
Optimized using Evidence Lower-Bound (ELBO) loss.
Treats continuous variables ( $\alpha$ ) as Gaussian and discrete variables ( $\beta, d$ ) using softmax.

3. Evaluation & Benchmarks

Evaluation Metrics

Likelihood Fitness (Simulated Data):
- Uses a known Oracle $S$ (Gaussian Mixture or Bayesian Network).
- $\mathcal{L}_{syn}$ : Likelihood of synthetic data on original Oracle $S$ . (Prone to overfitting).
- $\mathcal{L}_{test}$ : Train a new Oracle $S'$ using synthetic data $T_{syn}$ , then compute likelihood of real test data $T_{test}$ on $S'$ . (Detects mode collapse).
Machine Learning Efficacy (Real Data):
- Train classifiers/regressors on Synthetic Data ( $T_{syn}$ ).
- Test them on Real Test Data ( $T_{test}$ ).
- Metrics: Accuracy, F1-Score (Classification), $R^2$ (Regression).

Benchmarks

Baselines: 2 Bayesian Networks (CLBN, PrivBN) + 3 Deep Learning methods (MedGAN, VeeGAN, TableGAN).
Simulated Datasets: Grid, GridR (Grid + Offset), Ring (GMM Oracles), and Bayesian Networks (Alarm, Child, Asia, Insurance).
Real Datasets: 6 UCI datasets (Adult, Census, etc.), Credit (Kaggle), MNIST28.

4. Outcomes & Conclusion

Performance: CTGAN outperforms all deep learning methods and surpasses Bayesian networks on 87.5% of datasets.
TVAE vs CTGAN: TVAE is highly competitive and outperforms CTGAN in several cases. However, CTGAN is preferred for privacy applications (easier to implement Differential Privacy) since the generator doesn't access real data during inference.
Key Contributions:
- Mode-specific normalization solves the non-Gaussian/multimodal distribution issue.
- Conditional Generator & Training-by-sampling effectively solve the imbalanced data issue.

Ref

Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling tabular data using conditional GAN. Advances in Neural Information Processing Systems, 32.

]]>

2025-12-05T03:12:18.080Z

Vocabulary for AI 014

Term/Expression	Definition	Simpler Paraphrase	Meaning
proliferation	the rapid increase or spread of something	rapid increase	확산, 급증
necessitate	to make something necessary or unavoidable	to require	필요하게 하다
efficacy	the ability to produce a desired or intended result	effectiveness	효능, 효과
densify	to make something denser or more concentrated	to compact	밀도 높이다
reminiscent	tending to remind one of something	suggestive	연상시키는
discriminatory	showing prejudice or bias against certain groups or individuals	biased	차별적인
exacerbated	made worse or more severe	worsened	악화된
devise	to plan or invent something by careful thought	to invent	고안하다
analogously	in a way that is similar or comparable to something else	similarly	유사하게
firsthand	obtained directly from personal experience	direct experience	직접적인 경험
plausible	seeming reasonable or probable	believable	그럴듯한
off-the-shelf	readily available for use without modification	ready-made	기성품의
granularity	the quality of being detailed or specific	detail level	세분성
holonomic base	a system in which all constraints can be expressed as functions of the coordinates and time	fully constrained system	전체 구속 시스템
denoising	the process of removing noise from a signal or data	noise reduction	잡음 제거
audacious	showing a willingness to take bold risks	bold	대담한
dimensionality	the number of independent parameters or coordinates needed to specify a point in a space	number of dimensions	차원 수
lower-dimensional	relating to or denoting a space of low dimensions	low-dimensional	저차원의
jointly	together; in combination	together	함께, 공동으로
in-the-wild	occurring in natural, uncontrolled environments	natural setting	자연 환경에서
novelty	the quality of being new, original, or unusual	newness	새로움, 참신함
palpitate	to beat rapidly or strongly	to throb	두근거리다
prohibitively	in a way that is too expensive or too much	excessively	엄두를 못낼 만큼, 엄청나게
ablate	to remove or destroy something by melting, vaporizing, or eroding	to remove	제거하다
occlude	to block or obstruct something	to block	가리다, 막다

]]>

2025-11-29T04:05:49.118Z

Vocabulary & Expressions

Term/Expression	Definition	Simpler Paraphrase	Meaning
contamination	the process of making something impure or unsuitable by contact with something unclean	impurity	오염
insulation	the process of protecting something by surrounding it with a material that reduces or prevents the transmission of heat, sound, or electricity	protective covering	절연
compound	a substance formed from two or more elements chemically bonded together	mixture	화합물
uniformity	the quality of being uniform or consistent	consistency	균일성
high-fidelity	the accurate reproduction of sound or images	accurate reproduction	고충실도
electromagnetic	relating to the interrelation of electric currents or fields and magnetic fields	electric and magnetic	전자기
anisotropic	having properties that vary depending on the direction of measurement	direction-dependent	이방성
discontinuity	a point or area where something is not continuous or uniform	interruption	불연속
defy	to openly resist or refuse to obey	to resist	반항하다
In light of	considering or taking into account	taking into account	~을 고려하여
relevance	the quality of being closely connected or appropriate to the matter at hand	pertinence	관련성
circumvent	to find a way around an obstacle or difficulty	to bypass	우회하다
pairwise	relating to or involving pairs of things	in pairs	쌍으로 된
enrich	to improve or enhance the quality or value of something	to enhance	풍부하게 하다
nuanced	characterized by subtle distinctions or variations	subtle	미묘한
affinity	a natural liking or attraction to something	liking	친밀감, 유사성
asymmetric	not identical on both sides of a central line; lacking symmetry	uneven	비대칭
trump	to surpass or outdo someone or something	to surpass	능가하다
diffuse	to spread out over a large area; not concentrated	to spread	확산시키다
inclusion	the act of including or being included within a group or structure	incorporation	포함
task-agnostic	not specific to any particular task or function	task-independent	작업에 구애받지 않는
bifurcate	to divide into two branches or parts	to split	두 갈래로 나누다, 분기하다
governed	controlled or regulated by a set of rules or principles	controlled	지배되는
In this sense	in the way just described; in this context	in this context	이런 의미에서
tractable	able to be easily managed or controlled	manageable	다루기 쉬운
suppress	to put an end to the activities of something	to restrain	억제하다
deliberately	in a careful and intentional manner	intentionally	고의로, 의도적으로
coherent	logical and consistent	logical	일관된, 논리적인
monotonic	consistently increasing or decreasing without any reversals	always rising or always falling	단조로운
atop	on the top of; above	on top of	~의 꼭대기에
saliency	the quality of being particularly noticeable or important	prominence	두드러짐, 현저함
reception	the act of receiving or being received	receiving	수신
jointly	together with one or more other people or things	together	공동으로
extrusion	the process of shaping material by forcing it through a die	shaping process	압출
blockage	an obstruction that prevents movement or flow	obstruction	막힘, 장애
waypoint	a reference point in physical space used for navigation	navigation point	웨이포인트, 경유지
redistribute	to distribute something again or differently	to reallocate	재분배하다
planar	relating to or existing in a flat, two-dimensional surface	flat	평면의
perimeter	the outer boundary or edge of an area or object	boundary	둘레, 주변
ablation	the removal of material from the surface of an object by vaporization, chipping, or other erosive processes	removal process	제거
indiscriminate	not showing careful judgment or distinction	random	무차별적인
sparsification	the process of making something sparse or less dense	thinning	희소화
standpoint	a particular perspective or position from which something is considered	perspective	관점
implication	a possible effect or result of an action or decision	consequence	시사점
cardinal sides	the four main directions: north, south, east, and west	main directions	모든 방향 (동서남북)
undergo	to experience or be subjected to something	to experience	겪다
discernible	able to be perceived or recognized	noticeable	인지할 수 있는
sliceable	able to be cut into thin, flat pieces	cuttable	얇게 자를 수 있는
substance	a particular kind of matter with uniform properties	material	물질

]]>

2025-11-21T11:18:26.198Z

AI Application Categories

LLM Adoption by Use Case

Text Summarization: 62%
Internal Knowledge Management: 60%
Customer Service: 59%
Marketing Copy: 53%
Software Development: 53%
Contract Review: 45%
External Chatbots: 39%
Recommendation Algorithms: 39%

]]>
