Hello, I'm Ishikawa, a solutions architect at Red Hat.
Today Meta announced Llama 3.1, its new open LLM, and the LLM community is buzzing about it.
Meta has released the Llama series as its own LLMs before, but in terms of performance they fell a step short of closed LLMs such as OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet. The new Llama 3.1, however, holds its own against those models and even surpasses them on several benchmarks, so it is now attracting a great deal of attention.
That alone is a major step forward for open LLMs, but on top of that, the new license applied to Llama 3.1 permits data generated with Llama 3.1 to be used for training other LLMs. Many closed LLMs prohibit this kind of use in their terms of service, so with Llama 3.1 taking this stance we can expect more fine-tuning of small and mid-sized LLMs on training data generated this way, and more high-performance models as a result.
Japanese is not among the officially supported languages, but derived models are likely to add support going forward.
In this post, I'll try running Llama 3.1 on OpenShift AI.
Note that this article describes a personal experiment; please refrain from contacting support about its contents.
Preparing the environment
The OpenShift environment
This time I'm using ROSA HCP as the OpenShift environment. Since it is outside the scope of this post I'll skip the setup steps, but you can build one easily by following blog posts from my colleagues.
For the worker nodes, I prepared the following AWS instance types for this verification.
They are somewhat over-spec, so slightly smaller instance types should be sufficient in practice.
・m5.4xlarge (16 vCPU, 64 GiB memory) × 3
・g5.4xlarge (16 vCPU, 64 GiB memory, A10G GPU with 24 GiB VRAM)
What matters most when running an LLM is the amount of GPU VRAM.
In this post we run the 8B-parameter Llama 3.1, so at roughly 2 bytes per parameter in FP16 we can expect it to need about 8 × 2 = 16 GiB of VRAM. The A10G GPU in AWS G5 instances provides enough capacity.
Installing the Operators
Once the OpenShift cluster is built, install the required Operators from OperatorHub.
The Operators needed this time are as follows.
・Node Feature Discovery (NFD) Operator
・NVIDIA GPU Operator
・OpenShift Serverless Operator
・OpenShift ServiceMesh Operator
・Authorino Operator
・OpenShift AI Operator
The first two, the NFD Operator and the GPU Operator, are needed in order to use GPUs on an OpenShift cluster.
After installing them, create their custom resources: a NodeFeatureDiscovery for NFD and a ClusterPolicy for the GPU Operator. For this verification the default values are fine for both.
The NFD Operator labels the nodes that carry GPU devices, and based on those labels the GPU Operator deploys components such as the GPU driver to the target nodes.
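If you want to confirm that NFD has recognized the GPU node, you can look for the PCI vendor label it applies (0x10de is NVIDIA's PCI vendor ID). This is just a quick check; the exact label set can vary between NFD versions.

# List nodes that NFD has labeled as carrying an NVIDIA PCI device (vendor ID 10de)
oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true

# Inspect NFD/NVIDIA-related labels on a specific node (replace the placeholder with your node name)
oc get node <gpu-node-name> --show-labels | tr ',' '\n' | grep -Ei 'nvidia|10de'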
After creating the ClusterPolicy CR, you're good if the nvidia-driver pod is running in the nvidia-gpu-operator namespace.
oc get pod -n nvidia-gpu-operator | grep nvidia-driver
nvidia-driver-daemonset-416.94.202407030122-0-skdd2   2/2   Running   0   3h34m
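Once the driver and device plugin are up, the GPU node should also advertise nvidia.com/gpu as an allocatable resource. A quick way to verify (the node name will of course differ in your cluster):

# The GPU node should report one allocatable nvidia.com/gpu
oc get node <gpu-node-name> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
# Expected output: 1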
Next, install the remaining Operators.
OpenShift Serverless, OpenShift ServiceMesh, and the Authorino Operator must be installed as dependencies of KServe, the open source project used when running LLMs on OpenShift AI.
KServe in OpenShift AI offers two deployment patterns. When running a relatively large model such as an LLM, as we do here, you deploy with the method called Single model serving, which loads one model per runtime.
docs.redhat.com
The other deployment pattern, Multi-model serving (ModelMesh), runs multiple models on a single runtime, but it targets smaller models and is not used this time.
Search OperatorHub for Serverless, ServiceMesh, and Authorino and install them. Unlike the NFD/GPU Operators, you don't need to create any custom resources yourself.
Finally, install OpenShift AI. As before, search for it in OperatorHub and run the install. When the Operator installation completes you'll be asked to create a DataScienceCluster custom resource; create it with the default settings as well.
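You can also follow the rollout from the CLI. Assuming the custom resource keeps the default name suggested by the operator form (default-dsc in my environment), something like the following shows when it is ready:

# Check the DataScienceCluster phase; it should eventually report Ready
oc get datasciencecluster default-dsc -o jsonpath='{.status.phase}{"\n"}'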
When its status becomes Ready, a link to the OpenShift AI console is added to the menu at the top right of the OpenShift console.
Open the link and authenticate, and you can access the OpenShift AI console.
Configuration in OpenShift AI
From here on we'll go through the settings needed to serve an LLM on OpenShift AI.
When running an LLM with KServe on OpenShift AI, as shown in the figure below, the model is first placed in object storage and then loaded from there into a model-serving runtime (such as vLLM) for execution.
So the first things to do are to configure the object storage that will hold the model and to register the runtime.
Setting up a Data Science Project and a Data Connection
In OpenShift AI, resources are managed per AI project in units called Data Science Projects. From the OpenShift AI console, select Data Science Projects on the left and create a new project. Under the hood this is simply a labeled namespace, inside which the various resources such as pods are created and run.
Creating the Data Science Project shown below results in a new namespace called llm-serving being created in OpenShift.
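If you're curious, you can see the namespace and the dashboard label that marks it as a Data Science Project (the exact label set may differ slightly between versions):

# The project namespace carries labels used by the OpenShift AI dashboard
oc get namespace llm-serving --show-labels
# Look for a label such as opendatahub.io/dashboard=true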
Once the project is created, next create a Data connection.
A Data connection holds the access information for the object storage you will use.
This time I created a bucket named my-trial-llm in AWS S3 in advance and configured the access information for it.
For the S3 endpoint, check the documentation below and set the endpoint for the region you are using.
docs.aws.amazon.com
Any S3-compatible object storage can be used for a Data Connection, so MinIO, OpenShift Data Foundation, and the like can be configured in the same way.
The values configured here are stored as a Secret.
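For reference, the Data connection appears in the project namespace as an Opaque Secret holding the S3 credentials. In my environment it followed the aws-connection-<name> naming convention, but check yours with something like:

# Data connections are stored as Secrets in the project namespace
oc get secrets -n llm-serving | grep aws-connection

# List the keys (endpoint, bucket, region, credentials) without printing their values
oc describe secret aws-connection-<data-connection-name> -n llm-serving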
Adding a handy tool (ODH Tools)
To host an LLM with KServe, the model files published on Hugging Face need to be stored in the object storage bucket. You could do this by downloading all of the required files to your own machine and then uploading them again, but there is a handy community tool called Open Data Hub Tools & Extensions Companion. github.com Using it, you can save models from Hugging Face directly into the bucket.
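For reference, the manual route mentioned above would look roughly like this, assuming you have the Hugging Face CLI and the AWS CLI installed and your model access already approved:

# Download the model files from Hugging Face to a local directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct \
  --local-dir ./Meta-Llama-3.1-8B-Instruct --token "$HF_TOKEN"

# Upload them to the bucket referenced by the Data connection
aws s3 sync ./Meta-Llama-3.1-8B-Instruct s3://my-trial-llm/Meta-Llama-3.1-8B-Instruct/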
To use it, register the ODH Tools container image as a custom notebook image from OpenShift AI's Settings. Note that this requires the cluster-admin role (kubeadmin will not work).
Register quay.io/rh-aiservices-bu/odh-tec:latest as the Image location.
Once registered, it can be used just like the notebook images provided by default.
Adding a custom Serving Runtime
To host Llama 3.1 we use vLLM as the model runtime. OpenShift AI ships a supported vLLM container image by default, but according to this issue in the vLLM project, hosting Llama 3.1 requires the latest vLLM release (v0.5.3.post1) and transformers library (4.43.1). The current OpenShift AI release (v2.11) does not include those versions, so as a workaround we register a custom runtime image. From the next OpenShift AI release (v2.12) onward the bundled vLLM should be updated and this step will no longer be necessary.
From the Settings menu, select Serving runtimes and duplicate the pre-installed vLLM runtime.
We make two changes here.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
    openshift.io/display-name: Latest vLLM ServingRuntime for KServe
  labels:
    opendatahub.io/dashboard: "true"
  name: vllm-runtime-copy
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  containers:
  - args:
    - --port=8080
    - --model=/mnt/models
    - --served-model-name={{.Name}}
    - --distributed-executor-backend=mp
    - --max-model-len=8192  # runtime option added
    command:
    - python
    - -m
    - vllm.entrypoints.openai.api_server
    env:
    - name: HF_HOME
      value: /tmp/hf_home
    image: quay.io/jishikaw/vllm-openai-ubi9:0.5.3.post1  # image changed
    name: kserve-container
    ports:
    - containerPort: 8080
      protocol: TCP
  multiModel: false
  supportedModelFormats:
  - autoSelect: true
    name: vLLM
The first is changing the container image that is used.
I built a container image that includes the up-to-date libraries and published it on Quay, so we use that one.
quay.io/jishikaw/vllm-openai-ubi9:0.5.3.post1
The second is adding --max-model-len=8192 as an option when the vLLM container starts.
Llama 3.1 supports a context length of up to 128k, but left as-is that would not fit in the available VRAM, so we cap it with this option.
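The reason the full 128k context does not fit is the KV cache. As a back-of-the-envelope estimate, using the 8B model's published architecture (32 layers, 8 KV heads with GQA, head dimension 128) and FP16 cache entries:

KV cache per token = 2 (K and V) × 32 layers × 8 KV heads × 128 head dim × 2 bytes = 128 KiB
At a 128k (131,072-token) context: 128 KiB × 131,072 ≈ 16 GiB, on top of roughly 16 GiB of weights — far beyond the A10G's 24 GiB
At 8,192 tokens: 128 KiB × 8,192 = 1 GiB, which fits comfortably alongside the weights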
Once the custom Serving Runtime is registered, it can be selected when serving a model just like the pre-installed runtimes.
Downloading the model
Now let's use the ODH Tools image registered earlier to download the Llama 3.1 model files from Hugging Face.
The Meta-Llama-3.1-8B-Instruct model used in this post is published at the following Hugging Face URL.
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
It is a gated model, and the following are required in order to use it.
・Create a Hugging Face account
・Issue an HF access token
・Apply for access to Llama 3.1
If you already have a Hugging Face account and an access token, skip the first two.
The third step, applying for access to Llama 3.1, can be done from the URL above. In my case it took a little while from applying until approval was granted (around 10-20 minutes).
Once approved, you'll receive an email like the one below.
Now return to the OpenShift AI console and launch a new Workbench from the Data Science Project created earlier.
On the Workbench creation screen, configure the following.
・Notebook image / image selection: select the ODH Tools image registered earlier
・Environment variables: set the key "HF_TOKEN" with the token you obtained as its value
・Data connections: choose Use existing data connection and select the Data Connection created earlier
The other items can be left at their defaults.
When the Workbench starts, an object storage browsing screen like the one below is displayed.
Select Import HF model here and choose the Llama 3.1 repository as shown below. If your access has been approved and the token is configured correctly, the file download will begin.
When the download finishes, you should be able to confirm that the Llama 3.1 safetensors files and other files have been saved to the object storage.
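If you prefer the CLI, a quick listing of the bucket should show the model shards (adjust the prefix to wherever the import placed the files):

# List the uploaded model files in the bucket
aws s3 ls s3://my-trial-llm/Meta-Llama-3.1-8B-Instruct/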
Serving the model
Now let's run Llama 3.1. Go back to the Data Science Project and open the Models tab. Select Single model serving and configure the deployment.
・Model name: llama31
・Serving runtime: the custom runtime configured earlier
・Model framework (name - version): vLLM
・Model server replicas: 1
・Compute resources per replica / Model server size: Custom
・CPU requests / limits: 1 - 8
・Memory requests / limits: 4 - 28 GiB
・Accelerator: NVIDIA GPU
・Number of accelerators: 1
・Model location: Existing data connection
・Name: the Data connection created earlier
・Path: the object storage path containing the downloaded safetensors files
After filling this in, press Deploy and model serving begins.
The model files are fetched from object storage and the large runtime container image is pulled, so the first deployment in particular takes a while.
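Under the hood the dashboard creates a KServe InferenceService, so you can also follow progress from the CLI (the resource and pod names are derived from the model name set above):

# The deployment appears as a KServe InferenceService in the project namespace
oc get inferenceservice -n llm-serving

# Watch the predictor pod come up (model download and image pull happen here)
oc get pods -n llm-serving -w | grep llama31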
When the deployment succeeds, the inference API endpoint is displayed as in the image below.
With that, the Llama 3.1 deployment is complete.
Running inference
Finally, let's run inference. Japanese is not among Llama 3.1's supported languages, but here I'll deliberately ask a question in Japanese. Run the following command. (The system prompt says "You are a polite assistant; please answer the user's questions," and the user asks "What kind of company is Red Hat?")
export LLM_ROUTE={displayed inference API endpoint}
curl ${LLM_ROUTE}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama31",
    "messages": [
      {"role": "system", "content": "あなたは丁寧なアシスタントです.ユーザーからの質問に回答して下さい"},
      {"role": "user", "content": "Red Hatとはどんな会社ですか？"}
    ]
  }' | jq .
The result is as follows.
{ "id": "chat-ff392dae4bc24e6e803cb29325896953", "object": "chat.completion", "created": 1721818057, "model": "llama31", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Red Hatã¯ããªã¼ãã³ã½ã¼ã¹ã½ããã¦ã§ã¢ã®éçºã販売ããµãã¼ããææããç±³å½ã®ä¼æ¥ã§ãã主ã«Linuxãªãã¬ã¼ãã£ã³ã°ã·ã¹ãã ã¨ãã®å¨è¾ºã½ããã¦ã§ã¢ã®éçºã¨è²©å£²ãä¸å¿ã«äºæ¥ãè¡ã£ã¦ãã¾ãã", "tool_calls": [] }, "logprobs": null, "finish_reason": "stop", "stop_reason": null } ], "usage": { "prompt_tokens": 47, "total_tokens": 106, "completion_tokens": 59 } }
We got an answer without any problems. The reply translates roughly to: "Red Hat is a US company engaged in the development, sale, and support of open source software, with its business centered mainly on the Linux operating system and surrounding software." After a few more exchanges, though, its Japanese did show some unnatural spots; for that, I'm looking forward to the Japanese-tuned derivative models that will likely appear.
For vLLM running on OpenShift AI, the related metrics can be collected with Prometheus and visualized as shown below.
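If you just want a quick look at the raw metrics without building dashboards, you can port-forward to the predictor pod and scrape the endpoint configured in the ServingRuntime (port 8080, path /metrics); metric names such as vllm:num_requests_running are what vLLM itself exposes. Pod names will differ in your environment.

# Port-forward to the predictor pod's metrics port (8080, as set in the ServingRuntime)
oc port-forward -n llm-serving \
  "$(oc get pods -n llm-serving -o name | grep llama31-predictor | head -1)" 8080:8080 &

# vLLM exposes Prometheus metrics with the vllm: prefix, e.g. vllm:num_requests_running
curl -s localhost:8080/metrics | grep '^vllm:'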
Summary
In this post we ran Llama 3.1, one of today's hottest LLMs, on OpenShift AI. Open LLMs that can run anywhere, from on-premises to edge environments, are expected to keep developing. Red Hat is bringing the Open Hybrid Cloud to AI as well, expanding the possibilities of putting AI to work.