
Comparative Analysis of ML Algorithms for 5G Coverage Prediction Review


Summary

  • Model performance in 5G coverage prediction is primarily determined by the alignment between data characteristics, feature design, and model inductive bias, rather than by model complexity alone.
  • Using real-world 5G NR drive-test data with physics-informed numerical features, this study demonstrates that Random Forest can achieve SOTA performance, outperforming more complex models such as XGBoost and deep neural networks.
  • The results highlight the continued importance of domain-informed feature engineering and show that deep learning becomes advantageous only when the data representation and scale justify its use.

Introduction

  • Coverage prediction in 5G networks is a core component of network planning, optimization, and resource allocation.
  • Conventional propagation and path loss models are limited in their ability to accurately capture the complexity of dense urban environments and the unique characteristics of 5G systems.
  • Machine Learning and Deep Learning have emerged as promising alternatives, as they can model complex non-linear relationships across multiple parameters.
  • However, prior studies typically suffer from several limitations:
    • Most focus on 4G networks or rely on a limited set of input features.
    • Comparisons across a wide range of algorithms are often insufficient.
    • Systematic analyses of feature importance are largely lacking.

The objectives of this study are to:

  • Conduct a comprehensive comparison of multiple ML and DL algorithms using a unified dataset.
  • Identify dominant feature parameters that significantly influence 5G coverage prediction.
  • Demonstrate performance improvements over previously reported methods.

Methods

Data Collection

  • Real-world 5G NR drive test measurements conducted in Bandung, Indonesia (Batununggal area).
  • Approximately 1,500 SS-RSRP samples collected.
  • Deployment includes 10 gNodeBs, each configured with three sectors.
  • Measurement vehicle speed maintained below 30 km/h to minimize fast fading effects.

Input Features (10 Total)

  • 2D Distance between Transmitter and Receiver
  • Frequency
  • Transmitter Tilt Angle
  • Transmitter Azimuth Angle
  • Altitude
  • Elevation Angle
  • Azimuth Offset Angle
  • Tilting Offset Angle
  • Horizontal Distance of Receiver from Transmitter Antenna Boresight
  • Vertical Distance of Receiver from Transmitter Antenna Boresight
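
Several of these are derived geometric quantities. A minimal sketch of how they could be computed from site geometry, assuming east/north/up coordinates in meters and degree-based angle conventions (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def geometric_features(tx, rx, azimuth_deg, downtilt_deg):
    """Derive distance/angle features from TX/RX positions (x_east, y_north, z_up)."""
    dx, dy, dz = rx[0] - tx[0], rx[1] - tx[1], rx[2] - tx[2]
    d2d = np.hypot(dx, dy)                                 # 2D TX-RX distance
    bearing = np.degrees(np.arctan2(dx, dy)) % 360         # compass bearing TX -> RX
    az_offset = (bearing - azimuth_deg + 180) % 360 - 180  # signed offset from boresight azimuth
    elevation = np.degrees(np.arctan2(dz, d2d))            # elevation of RX as seen from TX
    tilt_offset = elevation + downtilt_deg                 # offset from the downtilted boresight
    h_boresight = d2d * np.sin(np.radians(az_offset))      # horizontal distance from boresight
    return d2d, az_offset, elevation, tilt_offset, h_boresight
```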

Algorithms

Machine Learning (Classification-based):

  • Logistic Regression
  • K-Nearest Neighbors (KNN)
  • Naive Bayes
  • Random Forest
  • Support Vector Machine (SVM)
  • XGBoost
  • LightGBM
  • AdaBoost
  • Bayesian Network Classifier

Deep Learning:

  • Multi-Layer Perceptron (MLP)
  • Long Short-Term Memory (LSTM)
  • Convolutional Neural Network (CNN)

Training and Validation

  • Experiments conducted using Google Colab.
  • 10-fold cross-validation applied for all models.
  • Hyperparameter optimization performed only on the best-performing models.
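
A minimal sketch of this setup with scikit-learn, assuming a feature matrix X (the 10 features above) and an SS-RSRP target y; the hyperparameters are illustrative, not the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Stand-ins for the ~1,500 drive-test samples with 10 features each.
X, y = np.random.rand(1500, 10), np.random.rand(1500)

model = RandomForestRegressor(n_estimators=100, random_state=42)
cv = KFold(n_splits=10, shuffle=True, random_state=42)  # 10-fold cross-validation

rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"RMSE: {rmse.mean():.3f} ± {rmse.std():.3f}, R²: {r2.mean():.3f}")
```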

Evaluation Metrics

  • Regression Metrics: RMSE, MAE, R²
  • Classification Metrics: Accuracy, Precision, Recall, F1-score

Results

Machine Learning

Random Forest:

  • RMSE = 1.14 dB
  • MAE = 0.12
  • R² = 0.97
  • Accuracy / Precision / Recall / F1-score ≈ 98.4%

Deep Learning

Convolutional Neural Network (CNN):

  • RMSE = 0.289
  • MAE = 0.289
  • R² = 0.78
  • Accuracy = 75%
  • Precision = 85.6%
  • Recall = 87.8%
  • F1-score = 89.9%
  • MLP and LSTM exhibit inferior performance compared to CNN.

Feature Importance

  • The 2D Transmitter–Receiver Distance is identified as the most dominant feature across all algorithms.
  • Incorporating horizontal and vertical distances from the antenna boresight significantly improves prediction accuracy.
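
Tree ensembles expose this ranking directly via impurity-based importances; a sketch assuming the fitted model and data from the sketch above (feature names abbreviated):

```python
feature_names = ["2D distance", "frequency", "tilt", "azimuth", "altitude",
                 "elevation angle", "azimuth offset", "tilt offset",
                 "horizontal boresight distance", "vertical boresight distance"]

model.fit(X, y)
for name, score in sorted(zip(feature_names, model.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name:32s} {score:.3f}")
```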

Comparison with Previous Studies

  • Both Random Forest and CNN achieve lower RMSE values compared to prior studies.
  • Random Forest, in particular, demonstrates state-of-the-art performance relative to existing 4G and 5G coverage prediction research.

Discussion

  • Random Forest
    • Highly effective for small-to-medium-sized datasets with numerical features.
    • Offers strong interpretability and robust performance stability.
  • Convolutional Neural Network
    • Well-suited for grid-based or spatial data representations.
    • Shows greater potential when image-based or satellite-derived features are incorporated.
    • In this study, CNN was applied by transforming numerical features into a matrix-like structure.
  • The results empirically demonstrate that feature design and selection can be more critical than the choice of learning algorithm itself.

Phrasal Verbs 003


Vocabulary & Expressions

| Term/Expression | Definition | Simpler Paraphrase | Meaning |
| --- | --- | --- | --- |
| right off the bat | Immediately, without delay | Immediately | 즉시, 지체 없이 |
| from the get-go | From the very beginning | From the start | 처음부터 |
| What are friends for | Used to express that friends are there to help each other | That's why we have friends | 친구 좋다는 게 뭐니 |
| Turn someone down | To reject or refuse someone | Reject someone | 거절하다 |
| Go out (with ~) | To date ~ | Date someone | ~와 사귀다 |
| Cheat on ~ | To have a sexual relationship with someone other than your partner | Be unfaithful to ~ | 다른 사람과 성적 관계를 갖다, 바람 피다 |
| Settle down | To start living a steady life | Start a stable life | 정착해서 안정된 삶을 살기 시작하다 |
| Break up (with ~) | To end a romantic relationship with ~ | End a relationship with ~ | ~와 헤어지다 |
| Fall in love (with ~) | To have deep romantic feelings (for ~) | Develop romantic feelings for ~ | ~와 사랑에 빠지다 |
| Fall for ~ | To fall in love with ~ | Fall in love with ~ | ~와 사랑에 빠지다 |
| Hit it off (with ~) | To get along with ~ | Become good friends quickly (with ~) | (~와) 사이좋게 지내다 |
| Drift apart | To become less close | Grow apart | 서서히 사이가 멀어지다 |
| Talk ~ out | To talk about ~ in order to settle a disagreement or misunderstanding | Discuss ~ to resolve a problem | ~에 대해 대화로 해결하려고 하다 |
| green-eyed | Jealous or envious | Jealous | 질투하는 |
| black market | An illegal market where goods are bought and sold | Illegal market | 암시장 |
| gray area | An unclear situation or area where the rules are not clear | Unclear situation | 불분명한 상황 |
| in the red | Losing money or in debt | Losing money | 적자 상태인 |
| in the black | Making a profit or not in debt | Making money | 흑자 |
| in the pink | In very good health | Very healthy | 매우 건강한 |

Phrasal Verbs 002


Vocabulary & Expressions

| Term/Expression | Definition | Simpler Paraphrase | Meaning |
| --- | --- | --- | --- |
| Get together (with ~) | To meet and spend time with each other | Meet up (with ~) | 만나서 함께 시간을 보내다 |
| Find out ~ / Find ~ out | To discover some information | Learn (about) ~ | ~을 알아내다, 발견하다 |
| Take after ~ | To resemble or behave like an older family member | Resemble ~ | (외모, 성격이) ~을 닮다 |
| Look like ~ | To physically resemble an older family member | Resemble ~ | (외모가) ~을 닮다 |
| Get along (with ~) | To have a positive relationship with ~ | Be friendly (with ~) | ~와 잘 지내다 |
| Run away | To leave a place, usually one's home, because of negative circumstances | Escape | 부정적 환경 때문에 집을 떠나다, 가출하다 |
| Go against ~ | To disagree or be opposed to ~ | Disagree with ~ | ~에 반대하다 |
| End up ~ | For something to eventually happen | Eventually become ~ | 결국 ~하게 되다 |
| Cut off ~ | To separate or block someone from something that they previously had access to | Isolate ~ | ~을 끊어내다, 잘라 버리다 |

Phrasal Verbs 001


Vocabulary & Expressions

| Term/Expression | Definition | Simpler Paraphrase | Meaning |
| --- | --- | --- | --- |
| Calm down | To become less agitated or upset | Relax | 진정되다, 가라앉다 |
| Cool off | To become less hot or angry | Chill out | 진정해지다, 차분해지다 |
| Chill out | To relax completely | Take it easy | 화를 누그러뜨리다, 열을 식히다, 긴장을 풀다 |
| Cool down | To lower temperature or become less angry | Calm down | 화를 누그러뜨리다 |
| Settle down | To become calm or to establish a stable life | Get comfortable | 진정하다 |
| Simmer down | To calm down gradually | Relax slowly | 흥분을 가라앉히다 |
| Take it easy | To relax and not stress | Chill out | 진정하다 |

End of Year Retro 2025


Before joining Hyundai Motor Company, HR had made promotion sound all but guaranteed. Once I was actually inside, reality turned out to be quite different. I thought I had worked hard, but in last year's first performance review I was excluded from evaluation because I had joined in January, and I suspect that is why I never even made the list.

Rules are rules, but the gap between what HR had said and how things actually worked was disappointing. Colleagues at a similar tenure seemed to have started out with grievances of their own. Deciding to focus on what I could change rather than what I could not, I began putting more time into other things.

I let the first quarter drift by with little motivation and no goal. Until then, my biggest goal had been doing well on the graduate school exam, but as the results kept being postponed, it became harder to bear. The repeatedly delayed decision finally arrived in early April: a rejection. I asked what had fallen short, whether it was my GPA, the exam results, my SOP, or my English scores, but got no specific answer. Instead, I received the following reply.

We regret to inform you that your application in this admission cycle was not successful. Please understand that admission into the Master of Science in Machine Learning is very competitive and takes into account a large number of criteria. Due to restrictions on the number of places, we unfortunately have to decline a large number of strong applications. Although this final decision may be disappointing, we are confident that, given your credentials, many other opportunities will open up for you.

"Strong applications"라는 표현에 그나마 위안을 얻었던 것 같다. 하지만 다시 같은 꿈을 꾸기에는 아이엘츠 성적 만료가 코앞이었고, 재도전은 현실적으로 어려워 보였다. 한정된 시간 안에서 무엇을 해야 할지 고민하던 중, 예전부터 와이프가 추천해주던 박람회가 떠올랐다.

코엑스에서 주기적으로 열리는 유학·해외취업·이민 박람회를 알게 되었고, 별다른 기대 없이 무작정 찾아갔다. 영국 석사는 1.5년 코스였지만, 아이엘츠의 영국 전용 버전이 신설되면서 기존 성적을 사용할 수 없었다. 그 대안으로 호주가 눈에 들어왔다. 아랍에미리트의 다른 대학원도 가능성은 있었지만, 예전부터 시드니에서 살아보고 싶다고 말하던 와이프의 영향으로 호주 대학원을 목표로 삼게 되었다.

박람회에서 만난 유학원은 생각보다 체계적이지 않았고, 진행 과정도 만족스럽지는 않았다. 그럼에도 불구하고 인공지능으로 급변하는 소프트웨어 엔지니어 정의와 미래를 대비해야 한다는 생각, 그리고 물리 AI 시대가 오기 전에 관련 기업으로 이직하거나 연구로 방향을 틀어야 한다는 판단은 분명했다. 그렇게 인공지능 석사를 목표로 상담을 이어갔다. 아이엘츠 성적도 있었고, 자금도, 경력도, 의지도 갖추고 있었기에 과정은 비교적 빠르게 진행되었다. 그 무렵 퇴사 의사를 밝혔고, 멋진 동료들로부터 불확실한 앞길에 대한 따뜻한 덕담을 받을 수 있었다는 점이 참 감사했다.

집을 팔고, 가진 것들을 정리한 뒤, 캐리어 두 개만 들고 7월 13일 호주에 도착했다. 예상보다 훨씬 쌀쌀했던 호주의 겨울에 적응하는 일은 쉽지 않았다. 한국에서는 영어를 꽤 한다고 생각했지만, 내가 익숙했던 것은 정제된 영어였다는 걸 곧 깨달았다. 현지인들의 영어와 영어를 세컨드 랭기지로 사용하는 친구들의 영어는 완전히 달랐다.

게다가 한국어로도 아카데믹 레포트에 익숙하지 않은 상태에서, 짧은 시간 안에 영어 레퍼런스를 포함한 IMRD 포맷의 리포트를 작성하는 일은 정말 버거웠다. "이만큼 돈을 쓰는데 석사 학위쯤은 그냥 살 수 있는 거 아니야?"라고 생각했던 과거의 내가 부끄러워졌다. ChatGPT조차 없던 시절에 해외 석박을 마친 선배님들이 새삼 대단하게 느껴졌다. 몇몇 뛰어난 20대 글로벌 인재들을 보며, 그들의 10년 뒤 모습이 궁금해지기도 했다.

수업과 과제, 시험에 적응해 가는 시간 속에서, 와이프가 잠깐씩 지구 반대편으로 와줄 때마다 마음의 여유를 얻을 수 있었다. 그렇게 New South Wales 주에서의 혼인신고까지 마무리하며, 1학기를 High Distinction으로 잘 끝낼 수 있었다. 마지막 달에는 멋진 슈퍼바이저 밑에서 재미있는 연구를 같이할 기회도 얻었다. 시간이 날 때마다 해보고 싶었던 코드 템플릿화와 프롬프트화 역시 그 마지막 달에 시도해볼 수 있었다.

내년에는 어떤 회고를 쓰게 될까. 벌써부터 기대가 된다.

Fixing iOS Safari Scroll Issues with shadcn/ui Drawer & Dialog


While building a mobile web app with shadcn/ui (Vaul), I fixed a scroll-conflict bug on iOS Safari that occurs when opening a Dialog from inside a Drawer.

The Problem

The feature: open the 'More' Drawer in the mobile view and click a menu item to show a 'Share' Dialog.

  1. Drawer Open: Open the Drawer on a page with scrollable content.
  2. Interaction: Clicking a menu item closes the Drawer and opens the Dialog.
  3. Bug: During this transition, the background scroll resets to the top (jump to top), or the screen freezes and stops responding.

Root Cause Analysis: Scroll Lock Race Condition

  • Vaul (the Drawer) is also built on Radix Dialog, so when the Dialog opens in the same frame/tick right after the Drawer closes, the nested overlays' scroll locks and body style mutations overlap, which can cause scroll restoration glitches or jumps on iOS Safari.

The Fix

1. Timing (Separate Execution Points)

Instead of opening the Dialog immediately in onClose or a click handler, use the onAnimationEnd event so the Dialog is opened only after the Drawer has completely finished its cleanup.

```tsx
<Drawer
  onAnimationEnd={(open) => {
    // Runs after the Drawer has fully closed (!open)
    if (!open) {
      setDialogOpen(true);
    }
  }}
>
```

2. Configuration (iOS Compatibility Options)

Add the following options to the Vaul component to guard against Safari's incomplete style recalculation.

```tsx
<Drawer
  disablePreventScroll={true}
  repositionInputs={false}
  shouldScaleBackground={false}
>
```
  • disablePreventScroll={true}
    • Stops Vaul from force-applying scroll-lock styles to the body
    • Mitigates the scroll-jump issue on iOS
    • Caveat: background scrolling may remain possible while the Drawer is open
  • repositionInputs={false}
    • Disables the viewport repositioning logic used for the virtual keyboard
    • Prevents the layout conflicts that the input + modal combination triggers on iOS Safari
  • shouldScaleBackground={false}
    • Skips scaling the background when the Drawer opens
    • Fewer body style mutations, which improves stability

These options are less a complete fix than defensive settings that reduce risk in the iOS Safari environment.


OpenQASM


OpenQASM 2.0

```qasm
OPENQASM 2.0; // declare the language version

qreg q[1]; // declare qubit register q (1 qubit, initial state |0⟩)
creg c[1]; // declare classical register c (stores the measurement result)

h q[0]; // apply a Hadamard gate: |0⟩ → (|0⟩ + |1⟩) / √2

measure q[0] -> c[0]; // measure q[0] and store the result (0 or 1) in c[0]
```

```qasm
OPENQASM 2.0;

qreg q[10]; // |0000000000⟩
creg c[10];

// use X gates to flip the first three qubits to |1⟩
x q[0]; // |1000000000⟩
x q[1]; // |1100000000⟩
x q[2]; // |1110000000⟩

measure q[0] -> c[0]; // 1
measure q[1] -> c[1]; // 1
measure q[2] -> c[2]; // 1
```
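
The single-qubit example can be reproduced by hand with NumPy; a minimal sketch of what the runtime computes (not an official simulator API):

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)         # |0⟩
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate

psi = H @ ket0                                 # h q[0];
probs = np.abs(psi) ** 2                       # Born rule: P(0), P(1)
counts = np.random.default_rng(0).choice([0, 1], size=1024, p=probs)
print(probs, np.bincount(counts))              # ≈ [0.5, 0.5]
```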

Linear Algebra

Vectors

  • In quantum computing, vectors represent quantum states.
  • A 2-dimensional vector can be written as:
  • $|\psi\rangle = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}$

Computational Basis

$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$

  • $|\psi\rangle$ is a linear combination of the basis states:
  • $|\psi\rangle = \psi_0|0\rangle + \psi_1|1\rangle = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}$

Identity Matrix

  • $\mathbb{I}|\psi\rangle = \begin{bmatrix} \psi_0 \\ \psi_1 \end{bmatrix}$
  • $\mathbb{I}_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Conjugate Transpose

  • Bra: $|\psi\rangle^\dagger = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}^\dagger = \begin{pmatrix} \overline{\psi_0} & \overline{\psi_1} \end{pmatrix} = \langle \psi|$
    • $\langle \psi| := |\psi\rangle^\dagger$
    • ket: $|\psi\rangle$
    • bra: $\langle \psi|$
    • bra + ket: dot product $\langle \phi|\psi\rangle$
  • Dagger: $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \implies A^\dagger = \begin{bmatrix} \overline{a} & \overline{c} \\ \overline{b} & \overline{d} \end{bmatrix}$

```python
from sympy import I, Matrix

psi = Matrix([1, I])  # example ket with a complex amplitude

# Three equivalent ways to take the conjugate transpose (dagger) in SymPy:
psi_dagger = psi.conjugate().T
psi_dagger = psi.conjugate().transpose()
psi_dagger = psi.H  # Hermitian-conjugate shorthand
```

  • $(\alpha A)^\dagger = \overline{\alpha} A^\dagger$
  • $(A^\dagger)^\dagger = A$
  • $(A + B)^\dagger = A^\dagger + B^\dagger$
  • $(AB)^\dagger = B^\dagger A^\dagger$

Hermitian

  • A Hermitian matrix is equal to its own conjugate transpose.
  • $H = H^\dagger$

```python
from sympy import I, Matrix

A = Matrix([[2, 1 - I], [1 + I, 3]])  # example complex matrix

A_dagger = A.H
AA_dagger = A * A_dagger
# A·A† is always Hermitian: (AA†)† = AA†
is_hermitian = AA_dagger.is_hermitian  # True
```

Unitary

  • A matrix whose conjugate transpose is also its inverse.
  • $U^\dagger U = U U^\dagger = \mathbb{I}$
  • $U = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$

```python
from sympy import Matrix, cos, sin, eye, symbols, trigsimp

theta = symbols("theta", real=True)
U = Matrix([
    [cos(theta), sin(theta)],
    [-sin(theta), cos(theta)],
])

U_dagger = U.H
U_dagger_U = trigsimp(U_dagger * U)  # cos²θ + sin²θ simplifies to 1

is_unitary = U_dagger_U == eye(2)  # compare with the 2×2 identity (SymPy's I is the imaginary unit)
```

Inner Product

  • $\langle \psi|\phi\rangle = \begin{pmatrix} \overline{\psi_0} & \overline{\psi_1} \end{pmatrix} \begin{pmatrix} \phi_0 \\ \phi_1 \end{pmatrix} = \overline{\psi_0}\phi_0 + \overline{\psi_1}\phi_1$
  • $|\langle \psi|\phi\rangle|^2 = \langle \psi|\phi\rangle \langle \phi|\psi\rangle$
  • $|\langle\psi|\phi\rangle| = |\langle\phi|\psi\rangle|$

Orthogonality

  • $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
    • orthonormal basis
  • $\langle 0 | 1 \rangle = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0$
    • $\langle 1 | 0 \rangle = 0$
  • $\langle 0 | 0 \rangle = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1$
    • $\langle 1 | 1 \rangle = 1$

$\langle \psi|\phi\rangle = \overline{\psi_0}\phi_0 + \overline{\psi_1}\phi_1$

  • $|\psi\rangle = \psi_0|0\rangle + \psi_1|1\rangle, \quad |\phi\rangle = \phi_0|0\rangle + \phi_1|1\rangle$
  • $\langle \psi| = \overline{\psi_0}\langle 0| + \overline{\psi_1}\langle 1|$
  • $\langle \psi|\phi\rangle = \overline{\psi_0}\phi_0\langle 0|0\rangle + \overline{\psi_0}\phi_1\langle 0|1\rangle + \overline{\psi_1}\phi_0\langle 1|0\rangle + \overline{\psi_1}\phi_1\langle 1|1\rangle$
  • $\langle 0|1\rangle = 0, \quad \langle 1|0\rangle = 0$

Magnitude

$\||\psi\rangle\|^2 = |\psi_0|^2 + |\psi_1|^2$

  • $\||\psi\rangle\| = \sqrt{\langle \psi|\psi\rangle} = \sqrt{|\psi_0|^2 + |\psi_1|^2}$
  • $\||\psi\rangle\|^2 = \langle \psi|\psi\rangle$
    • $\langle \psi|\psi\rangle = \overline{\psi_0}\psi_0 + \overline{\psi_1}\psi_1 = |\psi_0|^2 + |\psi_1|^2$

Outer product

  • $|\psi\rangle\langle\phi| = (\psi_0 |0\rangle + \psi_1 |1\rangle)(\overline{\phi_0}\langle 0| + \overline{\phi_1}\langle 1|) = \psi_0\overline{\phi_0}|0\rangle\langle 0| + \psi_0\overline{\phi_1}|0\rangle\langle 1| + \psi_1\overline{\phi_0}|1\rangle\langle 0| + \psi_1\overline{\phi_1}|1\rangle\langle 1|$
  • $|\psi\rangle\langle\phi| = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \begin{pmatrix} \overline{\phi_0} & \overline{\phi_1} \end{pmatrix} = \begin{pmatrix} \psi_0\overline{\phi_0} & \psi_0\overline{\phi_1} \\ \psi_1\overline{\phi_0} & \psi_1\overline{\phi_1} \end{pmatrix}$
  • $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \langle 1| = \begin{pmatrix} 0 & 1 \end{pmatrix}$
  • $|0\rangle\langle 1| = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$
  • $A = \begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} = a_{00}|0\rangle\langle 0| + a_{01}|0\rangle\langle 1| + a_{10}|1\rangle\langle 0| + a_{11}|1\rangle\langle 1|$

Tensor Product

  • $|\psi\rangle \otimes |\phi\rangle = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \otimes \begin{pmatrix} \phi_0 \\ \phi_1 \end{pmatrix} = \begin{pmatrix} \psi_0\phi_0 \\ \psi_0\phi_1 \\ \psi_1\phi_0 \\ \psi_1\phi_1 \end{pmatrix}$
  • $|\psi\rangle \otimes |\phi\rangle \equiv |\psi\rangle|\phi\rangle \equiv |\psi\phi\rangle$
  • $A \otimes B = \begin{pmatrix} a_{00}B & a_{01}B \\ a_{10}B & a_{11}B \end{pmatrix}$
  • $|0\rangle \langle 1| \otimes |1\rangle \langle 0| = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$
  • $|0\rangle \langle 1| \otimes |1\rangle \langle 0| = \begin{pmatrix} 0 \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} & 1 \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \\ 0 \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} & 0 \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
  • $|0\rangle \langle 1| \otimes |1\rangle \langle 0| \equiv (|0\rangle \otimes |1\rangle)(\langle 1| \otimes \langle 0|) \equiv |0\rangle|1\rangle \langle 1|\langle 0| \equiv |01\rangle \langle 10|$
    • $\text{ket} \otimes \text{ket}, \quad \text{bra} \otimes \text{bra}$
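
NumPy's kron implements this operation; a quick check of the $|0\rangle\langle 1| \otimes |1\rangle\langle 0|$ example above:

```python
import numpy as np

ket0, ket1 = np.array([[1], [0]]), np.array([[0], [1]])
op1 = ket0 @ ket1.T        # |0⟩⟨1|
op2 = ket1 @ ket0.T        # |1⟩⟨0|
print(np.kron(op1, op2))   # 4×4 matrix with a single 1 at row 1, column 2 (0-indexed)
```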

Qubit

$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$

  • where $\alpha, \beta$ are complex numbers satisfying $|\alpha|^2 + |\beta|^2 = 1$.
  • phase factor: $e^{i\phi}$ rotates the state by the angle $\phi$ in the complex plane, but does not affect measurement probabilities.
    • $|e^{i\phi}| = 1$
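
A quick numerical check that a global phase factor leaves the measurement probabilities unchanged:

```python
import numpy as np

alpha, beta = 1 / np.sqrt(2), 1j / np.sqrt(2)   # |α|² + |β|² = 1
psi = np.array([alpha, beta])
phased = np.exp(1j * 0.7) * psi                 # apply a global phase e^{iφ}
print(np.abs(psi) ** 2, np.abs(phased) ** 2)    # identical probabilities
```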

One-Qubit Gates

Identity Gate

$\mathbb{I} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Pauli-X Gate

$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

  • NOT gate

Pauli-Y Gate

$Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$

Pauli-Z Gate

$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$

Hadamard Gate

$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$

Rotation Gate

$R(\theta) = \begin{pmatrix} \cos{\theta} & \sin{\theta} \\ -\sin{\theta} & \cos{\theta} \end{pmatrix}$

The Bloch Sphere

$|\psi\rangle = \cos\left(\tfrac{\theta}{2}\right) |0\rangle + e^{i\phi}\sin\left(\tfrac{\theta}{2}\right)|1\rangle$

  • where $0 \leq \theta \leq \pi$ and $0 \leq \phi < 2\pi$.
  • $\theta$: the polar (or colatitude) angle, measured from the "north pole" of the sphere.
    • polar angle: 편각
  • $\phi$: the azimuthal (or longitude) angle around the equator.
    • azimuthal angle: 방위각

[Figure: Bloch sphere]

Two-Qubit Gates

CNOT Gate

Controlled-NOT or CX gate

$CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$

  • The CNOT gate flips the second qubit (target) if the first qubit (control) is $|1\rangle$.

SWAP Gate

$SWAP = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

  • The SWAP gate exchanges the states of the two qubits.

Controlled-Z Gate

$CZ = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$

  • The CZ gate applies a Z gate to the second qubit if the first qubit is in state $|1\rangle$.
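
A small sketch verifying the CNOT truth table on the computational basis:

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
basis = {"00": 0, "01": 1, "10": 2, "11": 3}
for label, i in basis.items():
    v = np.zeros(4)
    v[i] = 1
    out = list(basis)[int(np.argmax(CNOT @ v))]
    print(f"|{label}⟩ → |{out}⟩")
# |10⟩ → |11⟩ and |11⟩ → |10⟩; states with control |0⟩ are unchanged
```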

Bases

  • Computational Basis: $\{|0\rangle, |1\rangle\}$
  • two qubits: $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$
  • three qubits: $\{|000\rangle, |001\rangle, |010\rangle, |011\rangle, |100\rangle, |101\rangle, |110\rangle, |111\rangle\}$

Rule of Thumb

What starts on the left of the tensor product stays on the left.

  • $|\psi\rangle \otimes |\phi\rangle \equiv |\psi\rangle|\phi\rangle \equiv |\psi\phi\rangle$
  • $(|\psi\rangle \otimes |\phi\rangle)^\dagger = \langle\psi| \otimes \langle\phi|$
  • $(\alpha |\psi\rangle + \beta |\phi\rangle) \otimes |\omega\rangle = \alpha|\psi\rangle \otimes |\omega\rangle + \beta |\phi\rangle \otimes |\omega\rangle$
  • $(\langle\psi| \otimes \langle\phi|)(|\omega\rangle \otimes |\eta\rangle) = \langle\psi|\omega\rangle \cdot \langle\phi|\eta\rangle$
  • $(A + B) \otimes C = A \otimes C + B \otimes C$
  • $A \otimes (B + C) = A \otimes B + A \otimes C$
  • $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$
  • $(A \otimes B)^* = A^* \otimes B^*$

Entanglement

  • $|\Psi\rangle = \alpha_{00} |00\rangle + \alpha_{01} |01\rangle + \alpha_{10} |10\rangle + \alpha_{11} |11\rangle$
    • where $\||\Psi\rangle\|^2 = 1$
  • If the state is not separable, it is entangled.
  • A state is separable if it can be written as a tensor product of two individual qubit states (see the numerical check after this list).
  • $|\Psi\rangle = \frac{1}{\sqrt{2}} (|00\rangle + |01\rangle) = \frac{1}{\sqrt{2}} |0\rangle \otimes (|0\rangle + |1\rangle) = |0\rangle \otimes \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle)$
    • separable
  • $|\Phi\rangle = \frac{1}{\sqrt{2}} (|00\rangle + |11\rangle)$
    • $|\Phi\rangle = (a|0\rangle + b|1\rangle) \otimes (c|0\rangle + d|1\rangle) = ac|00\rangle + ad|01\rangle + bc|10\rangle + bd|11\rangle$
    • Matching coefficients requires $ad = 0$ and $bc = 0$ while $ac, bd \neq 0$, which is impossible.
    • entangled
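
The separability argument can also be checked numerically: arrange the four amplitudes into a 2×2 matrix; the state is separable exactly when that matrix has Schmidt rank 1 (a single nonzero singular value). A sketch:

```python
import numpy as np

def schmidt_rank(state):  # state = (α00, α01, α10, α11)
    s = np.linalg.svd(state.reshape(2, 2), compute_uv=False)
    return int(np.sum(s > 1e-12))

sep = np.array([1, 1, 0, 0]) / np.sqrt(2)      # (|00⟩+|01⟩)/√2 → rank 1, separable
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)     # (|00⟩+|11⟩)/√2 → rank 2, entangled
print(schmidt_rank(sep), schmidt_rank(bell))   # 1 2
```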

Latex

  • \texttip{}: Displays a tooltip when hovering over the equation.
  • \toggle{}: Toggles between two expressions.
  • \begin{align}: Aligns equations, where the ampersand (&) marks the alignment points.
  • \bbox[color, padding]{}: Puts a bounding box with the specified color and padding around an expression.
  • \boldsymbol{}: Renders a bold version of symbols like variables.
  • \cancel{}: Strikes through an expression.
  • \cancelto{value}{}: Strikes through and labels with the specified value.
  • \begin{cases}: Creates a piecewise function with conditions.
  • \color{}: Applies a color to text or math expressions. You can use predefined colors or hex values.
  • \enclose{}: Encloses an expression with various effects like circles or strikes.
  • [mathcolor="color", mathbackground="color"]: Adds custom colors to the enclosing effect.
  • \xmapsto{}: Creates an arrow with a label for mapping.
  • \xlongequal{}: Creates a long equals sign with a label.
  • \ce{}: Renders chemical equations or formulas.
  • \newcommand{\ket}[1]{\left|#1\right\rangle}: defines a custom command for ket notation. \ket{\psi}
  • \tag{}: Assigns a custom tag to an equation.
  • \unicode{}: Inserts a Unicode character using its code.
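
A few of these macros in combination, as a small MathJax-flavored sketch (assumes the relevant extensions are loaded):

```latex
\newcommand{\ket}[1]{\left|#1\right\rangle}

\begin{align}
  \ket{\psi} &= \alpha\ket{0} + \beta\ket{1} \tag{qubit} \\
  |\alpha|^2 + |\beta|^2 &= \bbox[yellow, 2px]{1}
\end{align}

f(x) =
\begin{cases}
  \color{red}{+1} & x \geq 0 \\
  -1 & x < 0
\end{cases}
```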


Developers in the AI Era


How should a developer survive in the AI era? LLMs already write code of higher quality than mine, and they churn it out faster. What should I be doing in this situation? Amid the fear that most developers will be replaced within five years, what should I be learning right now?

What using the post-ChatGPT models has taught me is that, in the end, the quality of the output is determined by how I decompose the problem and what data I feed the model. A senior developer can produce better output with AI because they have seen countless examples of good code and architecture, absorbed the concepts firsthand, and accumulated know-how through collaboration and hands-on practice. In other words, the vocabulary and the amount of context they can feed into the model are on a different scale.

Where can you learn good code and architecture? The essence of software engineering lives in open source, so that is where the answer can be found.

Every week I have a routine of skim-reading trending GitHub repositories for 30 minutes. I star interesting projects as bookmarks, then pick apart their folder structure, the packages they use, and the implementations themselves. If a codebase feels genuinely novel, I fork it, have an LLM walk me through an overview, find the core feature, and try rebuilding it from scratch myself.

Making my own the methodologies of open-source contributors, people who have spent more time on a problem and thought about it more deeply than I have: that is one of the fastest paths to becoming a developer who won't be replaced in the AI-native era.

Then what, in a word, is the developer's role in the AI era? I believe it is the role of the Discriminator in a generative adversarial network (GAN).

The ability to judge whether the code and architecture an AI model pours out are logically correct, efficient, or improvable, and the ability to adjust prompts and give the AI feedback so the output keeps improving: these are the core competencies a developer must have.

Gartner defines this way of working as AI-native Software Engineering (Khandabattu & Tamersoy, 2025). Delegate routine coding to AI, and let developers focus on more essential, meaningful tasks: critical thinking, uniquely human ingenuity, and empathy for users, rather than mechanical implementation. In the end, the reason we must become discriminators is to protect and expand this value that only humans can provide.

On the other hand, as Linus Torvalds put it, 90% of today's AI hype is marketing and only 10% is real (TFiR, 2024, 37:59). To sift out that 10%, we need to dig deeper into theory. Then, by experiencing how theory is realized through engineering, we develop the eye to tell the difference.

Frameworks and libraries change quickly. The concepts underneath them do not, because all programming ultimately comes down to data structures and divide and conquer. AI merely raises the abstraction layer one level higher.

Many say that degrees will become unnecessary in the AI era; I think the opposite. Precisely because truly understanding the concepts will matter more, credentials that certify foundational knowledge, such as degrees and certifications, will become more important.

Engineering is what matters. Everything else is just an implementation of its concepts.


π0.5 Review


1. Abstract

  • Core Concept: $\pi_{0.5}$ is a model designed for broad generalization by utilizing co-training on heterogeneous tasks.
  • Method: It combines hybrid multi-modal examples including image observations, language commands, object detection, semantic subtask prediction, and low-level actions.
  • Impact: This knowledge transfer is essential for effective generalization, enabling the execution of long-horizon and dexterous manipulation skills in the wild.

2. Introduction

  • Goal: Design training recipes that provide the breadth of knowledge required for robots to generalize at multiple levels of abstraction, from physical behaviors to scene semantics.
  • Unified Framework: By casting different modalities into a single sequence modeling framework, VLAs can be trained on diverse sources: robot data, language data, computer vision tasks, and combinations thereof.
  • Capabilities: The model can control mobile manipulators to perform varied household tasks even in homes never seen during training.
  • Hierarchical Architecture:
    • Training: Pre-trains on a heterogeneous mixture of tasks, then fine-tunes specifically for mobile manipulation using both low-level action examples and high-level semantic actions (e.g., predicting "pick up the cutting board").
    • Inference: At runtime, the model first predicts a semantic subtask (inferring appropriate next behavior based on scene semantics) and then predicts the robot action chunk based on this subtask.

3. Model Structure

[Figure: π0.5 model architecture]

Unified Transformer Architecture

  • The model is a transformer that takes in $N$ multimodal input tokens $x_{1:N}$ (images, text, and actions) and produces multimodal outputs.
  • Input Processing: Different token types are processed by specific encoders (e.g., a vision encoder for images, an embedding matrix for text).
  • Output Split: The output is split into two streams:
    • Text Logits ($y^{l}_{1:M}$): Used for QA, reasoning, and decomposing the task (predicting subtasks $\hat{l}$).
    • Action Tokens ($y^{a}_{1:H}$): Produced by a separate Action Expert to create continuous outputs for robot control.

Probabilistic Decomposition

The distribution captured by the model is decomposed using the chain rule and a conditional independence assumption:

$\pi_{\theta}(a_{t:t+H}, \hat{l} \mid o_{t}, l) = \pi_{\theta}(a_{t:t+H} \mid o_{t}, \hat{l}) \cdot \pi_{\theta}(\hat{l} \mid o_{t}, l)$

  • Assumption: The action distribution ($a_{t:t+H}$) does not depend on the overall task prompt ($l$), but only on the predicted subtask ($\hat{l}$).
  • High-Level Inference: $\pi_{\theta}(\hat{l} \mid o_{t}, l)$ (predicting "what to do next").
  • Low-Level Inference: $\pi_{\theta}(a_{t:t+H} \mid o_{t}, \hat{l})$ (predicting "how to move").
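
In code, this decomposition is a two-stage loop. A conceptual sketch with stand-in calls (predict_subtask and predict_action_chunk are hypothetical names, not the released API):

```python
def act(model, observation, prompt, horizon=50):
    """Two-stage π0.5-style inference: pick a subtask, then an action chunk."""
    subtask = model.predict_subtask(observation, prompt)   # π(l̂ | o_t, l), e.g. "pick up the cutting board"
    actions = model.predict_action_chunk(observation,      # π(a_{t:t+H} | o_t, l̂);
                                         subtask,          # the prompt l is not used here,
                                         horizon=horizon)  # per the conditional independence assumption
    return subtask, actions
```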

4. Combining Discrete & Continuous Actions

The model employs a hybrid approach to balance training efficiency with inference speed and quality.

  • The Dilemma:
    • Discrete Tokens (FAST): Fast training, but requires slow autoregressive decoding during inference.
    • Continuous (Flow Matching): High quality and smooth control, but computationally expensive to train from scratch on massive datasets.
  • The Solution: Train on discretized actions (FAST) but use Flow Matching for inference.
    • Attention Masking: Ensures discrete and continuous action representations do not attend to each other during joint training.

Hybrid Loss Function

The model minimizes a combined objective:

$\mathbb{E} \left[ \underbrace{H(x, f^l_\theta)}_{\text{Cross Entropy}} + \alpha \underbrace{\| \omega - a - f^a_\theta \|^2}_{\text{MSE for Flow}} \right]$

  • Cross Entropy: For text and discrete action tokens.
  • MSE: For the Flow Matching vector field (Action Expert).
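
A sketch of this objective in PyTorch, under the stated split (cross-entropy on text/FAST tokens, MSE on the flow-matching target); shapes and names are illustrative:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(text_logits, text_targets, flow_pred, flow_target, alpha=10.0):
    """Combined objective: CE over tokens + α · MSE over the action expert's vector field."""
    ce = F.cross_entropy(text_logits.flatten(0, 1), text_targets.flatten())
    mse = F.mse_loss(flow_pred, flow_target)  # flow_target stands in for (ω − a)
    return ce + alpha * mse  # α = 0 in pre-training, α = 10.0 in post-training
```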

5. Training Recipe

The training is split into two distinct stages based on the $\alpha$ parameter and the inclusion of the Action Expert.

Stage 1: Pre-training ($\alpha = 0$)

  • Goal: Efficient large-scale learning.
  • Method: Action Expert is OFF. Trains as a standard auto-regressive transformer using next-token prediction for text and discrete FAST action tokens.
  • Datasets:
    • MM: Mobile Manipulator data (100+ homes).
    • ME: Multi-Environment non-mobile robots.
    • CE: Cross-Embodiment laboratory data (diverse tasks like folding).
    • HL: High-Level subtask prediction data.
    • WD: Multimodal Web Data (VQA, captioning).

Stage 2: Post-training ($\alpha = 10.0$)

  • Goal: Specialization for mobile manipulation and enabling continuous control.
  • Method: Action Expert is ON.
    • Initialized with random weights.
    • Jointly trains next-token prediction (to preserve text capabilities) and Flow Matching for continuous actions.
  • Key Addition (Verbal Instructions - VI):
    • Data collected by "teleoperating" the robot using language commands (e.g., expert users selecting sub-tasks step-by-step).
    • Crucial for training the model to predict high-quality subtasks ($\hat{l}$).

6. Evaluation

Methodology

  • Settings: Tested in entirely new kitchens and bedrooms not seen during training.
  • Tasks: Long-horizon tasks like cleaning kitchens, putting laundry away, and making beds.
  • Metrics: Task progress (percentage of steps completed) and Language Following Rate.

Key Findings

  • Generalization: $\pi_{0.5}$ successfully performs multi-stage tasks in real, unseen homes.
  • Scaling: Performance improves consistently as the number of training environments increases.
  • Ablation Studies:
    • Cross-Embodiment (CE/ME): Excluding data from other robots significantly degrades performance, indicating strong transfer learning.
    • Web Data (WD): While less critical for general task progress, it is essential for Out-of-Distribution (OOD) object generalization and language following.
  • Comparison: Significantly outperforms $\pi_0$ and the $\pi_0$-FAST+Flow baseline.

7. Conclusions & Future Work

  • Current Status: $\pi_{0.5}$ demonstrates that co-training with heterogeneous data enables end-to-end robotic systems to perform long-horizon, dexterous skills in open-world settings.
  • Limitations:
    • Struggles with physical constraints (hard-to-open cabinets) or partial observability.
    • Limited to relatively simple prompts based on training data.
  • Future Directions:
    • Incorporating richer context and memory for better handling of partial observability.
    • Expanding data sources, particularly exploring verbal instructions as a powerful new supervision modality.

Ref

  • Intelligence, P., Black, K., Brown, N., Darpinian, J., Dhabalia, K., Driess, D., Esmail, A., Equi, M., Finn, C., & Fusai, N. (2025). π0.5: a Vision-Language-Action Model with Open-World Generalization. arXiv preprint arXiv:2504.16054.

Conditional GAN Review


1. Problem Statement

Modeling tabular data poses unique challenges for GANs, ones that existing statistical and deep neural network models also fail to address properly:

  • Mixed Data Types: Tabular data contains a mix of discrete and continuous columns.
  • Non-Gaussian & Multimodal Distributions: Continuous columns often have multiple modes (peaks) and do not follow a simple Gaussian distribution.
  • Imbalanced Discrete Columns: Categorical columns are often heavily imbalanced (e.g., 90% 'Normal', 10% 'Fraud'), leading to mode collapse where minor categories are ignored.

2. Methodology

To address these challenges, the authors propose CTGAN, which introduces Mode-specific Normalization, a Conditional Generator, and a Training-by-Sampling strategy.

A. Mode-Specific Normalization

  • Challenge: Representing continuous values with arbitrary, non-Gaussian distributions is non-trivial. Simple Min-Max normalization to [-1, 1] fails on multimodal data.

  • Solution: Treat each continuous column $C_i$ independently using a Variational Gaussian Mixture Model (VGM).

    1. Estimate the number of modes $m_i$ and fit a Gaussian mixture.

    2. Represent each value as a concatenation of:

      • One-hot vector ($\beta$): Indicates which mode the value belongs to.
      • Scalar ($\alpha$): Represents the normalized value within that mode.
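
sklearn's BayesianGaussianMixture can play the role of the VGM here; a sketch of encoding a single continuous column (the 4σ scaling follows the paper, the rest is illustrative):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def encode_column(values, max_modes=10):
    """Mode-specific normalization: (alpha scalar, beta one-hot) per value."""
    vgm = BayesianGaussianMixture(n_components=max_modes, random_state=0)
    modes = vgm.fit_predict(values.reshape(-1, 1))    # mode assignment per value
    mu = vgm.means_.ravel()[modes]
    sigma = np.sqrt(vgm.covariances_).ravel()[modes]
    alpha = (values - mu) / (4 * sigma)               # value normalized within its mode
    beta = np.eye(max_modes)[modes]                   # one-hot mode indicator
    return alpha, beta
```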

B. Conditional Generator and Training-by-Sampling

  • Challenge: Random sampling during training neglects minor categories in imbalanced columns, causing the generator to fail in learning them.
  • Solution: Condition the generator to produce specific discrete values.
    • Conditional Vector: defined as $cond = m_1 \oplus \dots \oplus m_{N_d}$.

      • Example: For columns $D_1 = \{1,2,3\}$ and $D_2 = \{1,2\}$, the condition $(D_2 = 1)$ is represented as mask vectors $m_1 = [0,0,0]$ (ignored) and $m_2 = [1,0]$ (selected).
    • Generator Loss: Penalize the generator if it fails to produce the requested condition. This is done by adding the cross-entropy between the input mask $m_{i^*}$ and the generated output $\hat{d}_{i^*}$ to the loss.

    • Training-by-Sampling (Curriculum):

      1. Create zero-filled mask vectors.
      2. Randomly select a discrete column $D_i$.
      3. Construct a PMF based on the log-frequency of values in that column (giving minor classes a higher chance).
      4. Sample a value $k^*$ based on this PMF and set the corresponding mask bit to 1.
      5. This ensures the model evenly explores all possible discrete values, not just the majority classes (see the sampling sketch below).
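
A sketch of the log-frequency sampling step for one discrete column (the helper name is hypothetical, not the reference implementation):

```python
import numpy as np

def sample_condition(column_values, rng=None):
    """Pick a category with probability proportional to the log of its frequency."""
    rng = rng or np.random.default_rng()
    cats, counts = np.unique(column_values, return_counts=True)
    pmf = np.log(counts + 1)          # log-frequency flattens the imbalance
    pmf = pmf / pmf.sum()
    k = rng.choice(len(cats), p=pmf)  # minor classes get a boosted chance
    mask = np.zeros(len(cats))
    mask[k] = 1.0
    return cats[k], mask
```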

[Figure: CTGAN model overview]

C. Network Structure (CTGAN)

  • Architecture: Two fully-connected hidden layers for both Generator and Critic.
    • Generator: Batch Normalization + ReLU.
    • Critic: Dropout + Leaky ReLU.
  • Optimization: WGAN loss with gradient penalty + Adam optimizer ($lr = 2 \cdot 10^{-4}$).

Generator Flow:

h0 = z ⊕ cond
h1 = h0 ⊕ ReLU(BN(FC_256(h0)))
h2 = h1 ⊕ ReLU(BN(FC_256(h1)))
α_hat = tanh(FC(h2)) # Continuous scalar
β_hat = gumbel_0.2(FC(h2)) # Continuous mode (one-hot)
d_hat = gumbel_0.2(FC(h2)) # Discrete value (one-hot)

Critic Flow:

h0 = r1 ⊕ ... ⊕ r10 ⊕ cond1 ⊕ ... ⊕ cond10
h1 = drop(leaky_0.2(FC_256(h0)))
h2 = drop(leaky_0.2(FC_256(h1)))
Score = FC_1(h2)
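
For concreteness, a PyTorch sketch of the generator flow above (dimensions illustrative; F.gumbel_softmax with tau=0.2 stands in for gumbel_0.2):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, z_dim, cond_dim, alpha_dim, beta_dim, d_dim, h=256):
        super().__init__()
        in0 = z_dim + cond_dim
        self.fc1 = nn.Sequential(nn.Linear(in0, h), nn.BatchNorm1d(h), nn.ReLU())
        self.fc2 = nn.Sequential(nn.Linear(in0 + h, h), nn.BatchNorm1d(h), nn.ReLU())
        out_in = in0 + 2 * h
        self.alpha = nn.Linear(out_in, alpha_dim)
        self.beta = nn.Linear(out_in, beta_dim)
        self.d = nn.Linear(out_in, d_dim)

    def forward(self, z, cond):
        h0 = torch.cat([z, cond], dim=1)            # h0 = z ⊕ cond
        h1 = torch.cat([h0, self.fc1(h0)], dim=1)   # h1 = h0 ⊕ ReLU(BN(FC(h0)))
        h2 = torch.cat([h1, self.fc2(h1)], dim=1)   # h2 = h1 ⊕ ReLU(BN(FC(h1)))
        alpha_hat = torch.tanh(self.alpha(h2))              # continuous scalar
        beta_hat = F.gumbel_softmax(self.beta(h2), tau=0.2)  # mode one-hot
        d_hat = F.gumbel_softmax(self.d(h2), tau=0.2)        # discrete one-hot
        return alpha_hat, beta_hat, d_hat
```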

D. TVAE (Tabular Variational AutoEncoder)

The authors also propose TVAE as a robust baseline for comparison.

  • Uses two networks to model $p_\theta(r_j|z_j)$ and $q_\phi(z_j|r_j)$.
  • Optimized using the Evidence Lower-Bound (ELBO) loss.
  • Treats continuous variables ($\alpha$) as Gaussian and discrete variables ($\beta, d$) using softmax.

3. Evaluation & Benchmarks

Evaluation Metrics

  1. Likelihood Fitness (Simulated Data):

    • Uses a known oracle $S$ (Gaussian Mixture or Bayesian Network).
    • $\mathcal{L}_{syn}$: Likelihood of synthetic data under the original oracle $S$. (Prone to overfitting.)
    • $\mathcal{L}_{test}$: Train a new oracle $S'$ on the synthetic data $T_{syn}$, then compute the likelihood of the real test data $T_{test}$ under $S'$. (Detects mode collapse.)
  2. Machine Learning Efficacy (Real Data):

    • Train classifiers/regressors on synthetic data ($T_{syn}$).
    • Test them on real test data ($T_{test}$).
    • Metrics: Accuracy, F1-score (classification), $R^2$ (regression).
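
A sketch of the machine-learning-efficacy protocol (train on synthetic, evaluate on real test data), with stand-in data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

# Stand-ins: T_syn would be sampled from the generator, T_test is held-out real data.
X_syn, y_syn = np.random.rand(1000, 8), np.random.randint(0, 2, 1000)
X_test, y_test = np.random.rand(300, 8), np.random.randint(0, 2, 300)

clf = GradientBoostingClassifier().fit(X_syn, y_syn)
print(f1_score(y_test, clf.predict(X_test)))  # how well synthetic data substitutes for real
```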

Benchmarks

  • Baselines: 2 Bayesian Networks (CLBN, PrivBN) + 3 Deep Learning methods (MedGAN, VeeGAN, TableGAN).
  • Simulated Datasets: Grid, GridR (Grid + Offset), Ring (GMM Oracles), and Bayesian Networks (Alarm, Child, Asia, Insurance).
  • Real Datasets: 6 UCI datasets (Adult, Census, etc.), Credit (Kaggle), MNIST28.

4. Outcomes & Conclusion

  • Performance: CTGAN outperforms all deep learning methods and surpasses Bayesian networks on 87.5% of datasets.
  • TVAE vs CTGAN: TVAE is highly competitive and outperforms CTGAN in several cases. However, CTGAN is preferred for privacy applications (easier to implement Differential Privacy) since the generator doesn't access real data during inference.
  • Key Contributions:
    • Mode-specific normalization solves the non-Gaussian/multimodal distribution issue.
    • Conditional Generator & Training-by-sampling effectively solve the imbalanced data issue.

Ref

  • Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling tabular data using conditional GAN. Advances in Neural Information Processing Systems, 32.