My personal graduation project, on federated learning: an implementation of a secret sharing algorithm from secure multi-party computation, used to achieve secure, privacy-preserving transmission without a third party.
Over nearly 70 years of development in artificial intelligence, machine learning's demand for data in a suitable form and in sufficient volume has only grown. In the past decade or so in particular, the introduction of deep neural networks, together with recent large gains in algorithms and computing power and the emergence of big data, has pushed the demand for training data to an unprecedented level.
Reality, however, falls short: in many fields the amount of data is very limited and its quality is uneven. Much of this data also exists as isolated islands, and various unavoidable social factors make it hard to integrate data scattered across different places. In addition, attention to data privacy and security has become a worldwide trend, and neither industries nor individuals want to disclose their data. Federated learning emerged to break down these data barriers while protecting user privacy.
This work designs and implements a federated logistic regression model based on a secret sharing protocol. The model is implemented in Python, is trained by vertical federated learning, and is then used for prediction. The secret sharing protocol treats the model parameters and feature values supplied by users as secrets and splits them into shares, which are distributed to the participants by additive secret sharing through the Syft-Torch framework, so that the complete secret can be recovered only when all participants cooperate to combine their shares. An offline-online architecture is then used: a series of random numbers generated in the offline phase is applied to the computation in the online phase, where it is used together with the secret shares to train the logistic regression and obtain the updated model parameters. Finally, the resulting parameters are used for prediction, to decide whether a user is a target user.
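The sharing and offline-online steps described above can be illustrated with a minimal sketch in plain Python. It is not the project's Syft-Torch code. The modulus, the function names, and the use of Beaver multiplication triples as the "random numbers" from the offline phase are all assumptions made for illustration. The triple is also generated in one place here for brevity, whereas a protocol without a third party would have to produce it jointly.

```python
import secrets

# Hypothetical modulus chosen for this sketch; shares live in Z_MODULUS.
MODULUS = 2**61 - 1

def share(secret, n_parties):
    """Split an integer into n additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recover the secret; this needs every party's share."""
    return sum(shares) % MODULUS

def beaver_triple(n_parties):
    """Offline phase (assumed): shares of random a, b and of c = a * b.
    Generated locally here as a stand-in for a joint protocol."""
    a = secrets.randbelow(MODULUS)
    b = secrets.randbelow(MODULUS)
    c = (a * b) % MODULUS
    return share(a, n_parties), share(b, n_parties), share(c, n_parties)

def multiply(x_shares, y_shares, triple):
    """Online phase: multiply two shared values using one precomputed triple."""
    a_sh, b_sh, c_sh = triple
    # Parties open only the masked values d = x - a and e = y - b.
    d = reconstruct([(x - a) % MODULUS for x, a in zip(x_shares, a_sh)])
    e = reconstruct([(y - b) % MODULUS for y, b in zip(y_shares, b_sh)])
    # x*y = c + d*b + e*a + d*e; each party computes its part locally.
    z = [(c + d * b + e * a) % MODULUS for a, b, c in zip(a_sh, b_sh, c_sh)]
    z[0] = (z[0] + d * e) % MODULUS  # the public term d*e is added once
    return z

# Usage: a feature value times a weight, computed on shares only.
feature_shares = share(6, 3)
weight_shares = share(7, 3)
product_shares = multiply(feature_shares, weight_shares, beaver_triple(3))
product = reconstruct(product_shares)
```

The sketch works on non-negative integers only. Real-valued model parameters and features would first need a fixed-point encoding, which is omitted here.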
In the prediction stage, the MNIST dataset is imported and the algorithm as a whole is evaluated for correctness, running efficiency, and scalability by computing the model accuracy measure AUC (Area Under the Curve) and the sample discrimination measure KS (Kolmogorov-Smirnov). The Receiver Operating Characteristic (ROC) curve is also plotted to give a more intuitive, visual impression of the accuracy of the resulting federated model. Finally, the predictions are compared with the label values in the MNIST dataset.
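As a rough illustration of the two metrics named above, the following sketch computes ROC points, AUC, and KS from binary labels and predicted scores using only the standard library. The function names and the toy data are hypothetical and are not taken from the project. It also assumes binary labels (1 for a target user, 0 otherwise) with at least one sample of each class.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold step by step."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with equal scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )

def ks_statistic(fpr, tpr):
    """KS: the largest gap between TPR and FPR over all thresholds."""
    return max(t - f for f, t in zip(fpr, tpr))

# Toy data (hypothetical): three positive and three negative samples.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_fpr, example_tpr = roc_points(example_labels, example_scores)
example_auc = auc(example_fpr, example_tpr)
example_ks = ks_statistic(example_fpr, example_tpr)
```

Plotting `example_tpr` against `example_fpr` gives the ROC curve. On the toy data, AUC equals the fraction of positive-negative pairs that are ranked correctly, 8 out of 9.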
Keywords: federated learning; secret sharing; privacy protection; Syft-Torch; artificial intelligence