Welcome to your guide for enhancing your development workflow with AI assistance! Whether you're new to AI-powered coding or looking to optimize your setup, this tutorial will walk you through seamless IDE integration.
- Introduction
- Cursor: the dedicated IDE for seamless AI integration
- Implementing AI Assistance
- Hands-on implementation guide using Ollama and Continue
Modern development environments now leverage AI capabilities to enhance code synthesis, automate routine operations, and optimize development workflows. IDE integration of language models enables contextual code assistance, automated testing, and intelligent refactoring while maintaining established development practices.
The Pros: Boosting Productivity and Innovation
- Increased productivity : AI tools can automate repetitive coding tasks, generate code snippets, and provide intelligent code suggestions, allowing developers to focus on more complex problems and accelerate development cycles.
- Improved code quality : AI-powered code analysis can detect bugs, security vulnerabilities, and code smells, helping developers write cleaner and more maintainable code.
- Enhanced learning experience : AI assistants can provide real-time guidance, explanations, and best practice recommendations, facilitating the learning process for novice developers.
- Accessibility : AI tools can make coding more accessible to non-experts by bridging the gap between novice and experienced developers, enabling a broader range of individuals to participate in software development.
The Cons: Challenges and Considerations
- Potential bias and errors : AI models can inherit biases from their training data or make mistakes, leading to incorrect code suggestions or flawed analysis.
- Over-reliance and loss of skills : Excessive dependence on AI tools may hinder developers' ability to write code independently and erode their problem-solving skills over time.
- Security and privacy concerns : AI tools may inadvertently expose sensitive data or introduce security vulnerabilities if not properly secured and maintained.
- Integration challenges : Integrating AI tools into existing development workflows and IDEs can be complex, requiring additional configuration and compatibility considerations.
- Cost and accessibility : Advanced AI tools may come with significant licensing costs or require specialized hardware, making them less accessible to individual developers or smaller organizations.
Caution
Code assistance models complement developer expertise by augmenting development workflows through contextual suggestions and documentation integration. These systems enhance implementation speed and technical learning while relying on established programming principles and human oversight.
Always validate AI-generated code against official documentation and established development standards. Implement thorough code review practices when using automated suggestions.
Cursor integrates language models into a VS Code-based editor, providing contextual code completion, codebase analysis, and natural language editing capabilities. The platform supports both cloud-based and local AI models.
Currently Supported Models :
- Claude 3.5 Sonnet by Anthropic
- o1-preview / o1-mini and GPT-4o by OpenAI
Some key features of Cursor include:
- Code understanding and context : Cursor.sh can understand the context of your entire codebase, allowing it to provide tailored suggestions, generate unit tests, and even implement new features by modifying the relevant files.
- Natural language editing : You can edit code using natural language prompts, such as changing an entire method or class with a single instruction.
- Code generation : Cursor.sh can generate code from scratch based on simple instructions provided by the user.
- Copilot++ : An enhanced version of GitHub Copilot that can suggest mid-line completions and entire diffs, trained to autocomplete based on sequences of edits.
- Integration with VSCode : Cursor.sh is a fork of VSCode, allowing users to import all their extensions, themes, and keybindings with a single click.
- Privacy options : Cursor.sh offers a privacy mode where none of your code is stored on their servers or logs, catering to security-critical work.
First, let's pick the right AI provider for your coding needs! Consider:
- What features you'll use most
- Where your code will live (local or cloud)
- Your privacy preferences
- How it fits with your favorite IDE
Tip
Take a moment to explore different options - finding the right fit makes all the difference!
Reference the Artificial Analysis Leaderboard for comparative analysis of LLM providers across key performance metrics: pricing, token generation speed, response latency, and context window capabilities.
Caution
Security Consideration : Cloud-based code assistance services process codebase data through remote infrastructure, potentially exposing sensitive information to third-party systems. Consider data privacy implications and compliance requirements when implementing proprietary code processing services.
Proprietary code assistance services may utilize submitted code for model training and refinement. Review provider terms of service regarding data collection and model training practices.
Cloud-based Proprietary Providers
| Tool | Description |
|---|---|
| AskCodi | An AI-powered coding assistant that offers code suggestions, debugging help, and explanations for code snippets. |
| Blackbox | An AI platform that helps businesses automate processes, make predictions, and optimize decision-making. |
| Boxy | An AI coding assistant by CodeSandbox providing real-time code suggestions and completions. |
| Codeium | An AI-powered code completion tool that helps developers write code faster and more accurately. |
| CodeWhisperer | Developed by Amazon, it provides real-time code suggestions and completions. |
| Codium | An AI-powered tool that analyzes your code, docstrings, and comments and suggests tests as you code. |
| Copilot | Developed by GitHub and OpenAI, it provides real-time code suggestions and completions. |
| JetBrains AI | JetBrains is integrating AI capabilities into their development tools. |
| Replit AI | A coding assistant and tutorial platform developed by Replit, offering code suggestions and explanations. |
| Tabnine | An AI-powered code completion tool that helps developers write code faster and more accurately. |
Open-Source Providers
| Tool | Description |
|---|---|
| Aider | An AI-powered pair-programming tool designed to assist developers in writing and editing code directly from the command line. |
| Continue | An open-source autopilot for software development that enables developers to create their own AI code assistant within their integrated development environment (IDE), such as VS Code or JetBrains IDEs. |
| Llamacoder | An open-source take on Claude Artifacts: generate small apps with one prompt. |
| Open Interpreter | An innovative open-source project that allows language models to execute code on a user's computer to complete various tasks. |
Cloud-based Providers
| Tool | Description |
|---|---|
| AI21 Labs | Known for their language models like Jurassic-1 Jumbo, focused on quality, safety, and controllability. |
| Amazon Web Services | Offers models like Amazon CodeWhisperer for code generation and understanding through their SageMaker platform. |
| Anthropic | Known for their constitutional AI model Claude, focused on being helpful, harmless, and honest. |
| Cerebras | An AI company that has developed innovative hardware and software solutions for AI computing. |
| Cohere | Provides an enterprise AI platform with models like Cohere Generate for custom content creation. |
| Databricks | A unified, open analytics platform that provides tools and services for data processing, analytics, and artificial intelligence at scale. |
| DeepInfra | A platform that provides scalable and cost-effective infrastructure for deploying machine learning models. |
| DeepSeek | An AI company that has developed several notable AI models and technologies. |
| Fireworks AI | A comprehensive solution for companies looking to deploy AI into production, focusing on performance, cost-effectiveness, and developer experience. |
| Google | Provides models like LaMDA, PaLM, and Bard for language understanding, generation, and multimodal AI tasks. |
| Groq | Specializes in high-performance AI inference with custom LPU (Language Processing Unit) hardware, offering models like Meta's Llama 3. |
| Hugging Face | Often described as the GitHub of AI; offers a platform hosting most open-source models, like BERT, GPT-Neo, and Llama, for various AI tasks. |
| LeptonAI | A platform that provides cloud-based infrastructure and tools for deploying and running AI applications efficiently. |
| Microsoft Azure AI | A comprehensive suite of AI services and tools designed to help developers and organizations build, deploy, and manage AI applications at scale. |
| Mistral AI | A French artificial intelligence company that specializes in developing large language models (LLMs) and AI products. |
| OctoAI | A full-stack inference platform designed specifically for generative AI applications. |
| OpenAI | Offers models like GPT-4, DALL-E, and Whisper for natural language processing, image generation, and speech recognition. |
| OpenRouter | A versatile platform designed to provide access to a wide range of large language models (LLMs) from both proprietary and open-source sources. |
| Perplexity Labs | An online platform that provides free access to various powerful open-source large language models (LLMs) for experimentation and use in a wide range of applications. |
| Poe | An AI chatbot aggregator platform developed by Quora that provides users access to multiple advanced language models and chatbots within a single interface. |
| Reka AI | An AI company that develops advanced multimodal AI models and technologies. |
| Replicate | A cloud platform that allows developers to easily run and deploy open-source machine learning models. |
| SambaNova | An artificial intelligence company that provides a comprehensive AI platform for enterprises. |
| Together | A cloud platform designed for building and running generative AI applications. |
| Vercel | A powerful tool for developers looking to explore and integrate various AI models into their applications efficiently. |
Local Providers
Important
Deploy LLMs locally with our implementation guide for privacy-focused language processing and model experimentation on your hardware.
| Tool | Description | OS | Models |
|---|---|---|---|
| AnythingLLM | An open-source, full-stack application that allows users to chat with their documents in a private and enterprise-friendly environment. | All | Open-source models and proprietary models via API |
| Enchanted | iOS and macOS app for chatting with private, self-hosted language models. | macOS/iOS | Open-source models |
| GPT4ALL | An open-source software ecosystem developed by Nomic AI that enables users to run powerful large language models (LLMs) locally on their personal computers. | All | Open-source models |
| Jan | Clean UI with useful features like system monitoring and an LLM library. | All | Open-source models |
| LM Studio | Elegant UI with the ability to run every Hugging Face repository. | All | Open-source models |
| Msty | An AI chat application that offers a user-friendly interface for interacting with both local and online AI language models. | All | Open-source models and proprietary models via API |
| Ollama | Fastest when used on the terminal, and any model can be downloaded with a single command. | All | Open-source models |
Now that you've chosen your provider, let's find the right model for your coding needs.
We designed two tables to help you select the right model for coding.
The first table highlights open-source models, which you can typically run locally on your machine (probably not the massive models, though). To ensure a smooth experience running these models locally, I will provide the necessary hardware requirements for optimal performance. Please note that while you may be able to run certain models with lower hardware specifications, you can expect slower output and performance as a result.
The second table features Proprietary models that usually operate on cloud-based providers.
The models are ranked according to LiveCodeBench Pass@1 Code Generation scores (with higher scores indicating better performance). Pass@1 is the probability of passing a given problem in one attempt. LiveCodeBench offers a more comprehensive, up-to-date, and contamination-aware evaluation of code-related capabilities compared to HumanEval.
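For intuition, here is the standard unbiased pass@k estimator introduced with HumanEval, which LiveCodeBench follows: for each problem, generate n samples, count the c that pass, and average over problems. A minimal sketch, assuming the usual definition:

$$\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right], \qquad \text{pass@}1 \;=\; \mathbb{E}_{\text{problems}}\left[\frac{c}{n}\right]$$

Setting k = 1 reduces the general formula to the simple fraction of passing samples per problem.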
Caution
Please note that the hardware requirements provided are only indicative, and the specified VRAM requirements in the tables below apply specifically to the default Ollama Q4_0 quantization version of the models. Learn more about LLM quantization.
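As a rough back-of-envelope check (an assumption on my part, not an official sizing rule): Q4_0 stores weights at roughly 4.5 bits each once per-block scales are included, so

$$\text{VRAM} \;\approx\; N_{\text{params}} \times \frac{4.5}{8}\ \text{bytes} \;+\; \text{context and runtime overhead}$$

For Codestral-22B, that gives $22 \times 10^9 \times 0.5625 \approx 12.4$ GB of weights, which lines up with the 14GB+ recommendation in the mid-sized table below once the KV cache and runtime overhead are accounted for.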
Tip
You can always try different models to find your perfect match. Many developers mix and match based on different tasks!
Badges provide direct links to model downloads, provider services, and official documentation.
Massive models : Challenging for local deployment due to computational requirements. Use them via any of the listed Cloud-based Providers (e.g. Groq / Together / Mistral or OpenRouter).
| Organization | Model | Model size | Hardware requirement | Pass@1 score (/100) | Ollama library | Cloud-based providers |
|---|---|---|---|---|---|---|
| Alibaba | Qwen2.5-72B-Instruct | 72.2B | 47GB+ VRAM GPU (2x RTX 4090 or better) | 48.7 | | |
| DeepSeek | DeepSeek-V2.5 | 236B | 130GB+ VRAM GPU (2x H100 or better) | 41.8 | | |
| DeepSeek | DeepSeek-Coder-V2-Instruct | 236B | 130GB+ VRAM GPU (2x H100 or better) | 41.6 | | |
| Meta | Llama-3.1-405B-Instruct | 405B | 230GB+ VRAM GPU (3x H100 or better) | 39.9 | | |
| Mistral AI | Mistral-Large-2-Instruct | 123B | 70GB+ VRAM GPU (1x H100 or better) | 39.1 | | |
| Alibaba | Qwen2-72B-Instruct | 72.7B | 47GB+ VRAM GPU (2x RTX 4090 or better) | 30.5 | | |
| Meta | Llama-3.1-70B-Instruct | 70B | 40GB+ VRAM GPU (2x RTX 4090 or better) | 30 | | |
| Meta | Llama3-70B-Instruct | 70B | 40GB+ VRAM GPU (2x RTX 4090 or better) | 25.8 | | |
| Mistral AI | Mixtral-8x22B-Instruct-v0.1 | 141B | 80GB+ VRAM GPU (1x H100 or better) | 24.9 | | |
| Cohere | Command R+ | 104B | 60GB+ VRAM GPU (3x RTX 4090 or better) | 19.2 | | |
| Meta | CodeLlama-70B-Instruct | 70B | 40GB+ VRAM GPU (2x RTX 4090 or better) | 12.6 | | None |
Mid-sized models : Suitable for deployment on a high-performance local workstation.
| Organization | Model | Model size | Hardware requirement | Pass@1 score (/100) | Ollama library | Cloud-based providers |
|---|---|---|---|---|---|---|
| Alibaba | Qwen2.5-32B-Instruct | 32.8B | 20GB+ VRAM GPU (RX 7900 XT or RTX 4090 or better) | 46.6 | | |
| Mistral AI | Codestral-22B | 22B | 14GB+ VRAM GPU (RX 7800 or RTX 4070 Ti or better) | 31.8 | | |
| DeepSeek | DeepSeek-Coder-V2-Lite-Instruct | 16B | 10GB+ VRAM GPU (RX 7800 or RTX 4070 or better) | 27.9 | | |
| DeepSeek | DeepSeek-Coder-33B-Instruct | 33B | 20GB+ VRAM GPU (RX 7900 XT or RTX 4090 or better) | 26.7 | | |
| | OpenCodeInterpreter-DS-33B | 33B | 20GB+ VRAM GPU (RX 7900 XT or RTX 4090 or better) | 19 | None | |
Small models : Lightweight and deployable on most local machines.
| Organization | Model | Model size | Hardware requirement | Pass@1 score (/100) | Ollama library | Cloud-based providers |
|---|---|---|---|---|---|---|
| Alibaba | Qwen2.5-Coder-7B-Instruct | 7B | 5GB+ VRAM GPU (RX 6500 or RTX 3050 or better) | 34 | | |
| 01.AI | Yi-Coder-9B-Chat | 9B | 5GB+ VRAM GPU (RX 6500 or RTX 3050 or better) | 31.2 | | |
| Alibaba | Qwen2.5-7B-Instruct | 7.62B | 6GB+ VRAM GPU (RX 7600 or RTX 4060 or better) | 29.9 | | |
| Mistral AI | Mamba-codestral-7B-v0.1 | 7B | 5GB+ VRAM GPU (RX 6500 or RTX 3050 or better) | 21.1 | No | |
| Alibaba | CodeQwen1.5-7B-Chat | 7B | 5GB+ VRAM GPU (RX 6500 or RTX 3050 or better) | 19.6 | | None |
| Meta | Llama3.1-8B-Instruct | 8B | 5GB+ VRAM GPU (RX 6500 or RTX 3050 or better) | 19.2 | | |
| DeepSeek | DeepSeek-Coder-6.7B-Instruct | 7B | 5GB+ VRAM GPU (RX 6500 or RTX 3050 or better) | 18.9 | | |
| | OpenCodeInterpreter-DS-6.7B | 6.7B | 5GB+ VRAM GPU (RX 6500 or RTX 3050 or better) | 18.7 | No | |
| Zhipu AI | CodeGeeX4-All-9B | 9B | 8GB+ VRAM GPU (RX 7600 or RTX 4060 or better) | 17.8 | | |
Proprietary models : Typically accessible through a paid API or web interface.
| Provider | Model | Pass@1 score (/100) |
|---|---|---|
| OpenAI | o1-preview | 56.7 |
| Anthropic | Claude-3.5 Sonnet | 50.9 |
| OpenAI | GPT-4o | 47.7 |
| Google | Gemini Pro 1.5 | 43 |
| OpenAI | GPT-4 Turbo | 41.8 |
| OpenAI | GPT-4o-mini | 38.7 |
| Google | Gemini Flash 1.5 | 37.2 |
| Anthropic | Claude-3 Opus | 35.3 |
Step-by-step guide for privacy-preserving code assistance using open-source solutions.
For those who prioritize keeping everything local and private, I'll be using Ollama as the provider in this tutorial. If you haven't already, be sure to check out our tutorial, A Practical Tutorial to Run a Local Model, to learn how to install Ollama and get started.
Note
In this tutorial, I am using the Codestral:22b model. This model is specialized for coding and was trained by Mistral AI on vast datasets.
I selected it specifically for the hardware in my PC; it stands out for its size and capabilities. As with any model, it has its strengths and weaknesses, so to get the most out of this tool, be sure to choose wisely among the available options.
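If you want to follow along with the same model, you can fetch it up front. This assumes Ollama is already installed per the tutorial linked above; the download is roughly 13 GB at the default Q4_0 quantization:

```bash
# Fetch the Codestral 22B model from the Ollama library
ollama pull codestral:22b

# Confirm the model now appears in your local model list
ollama list
```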
- Click on the Extensions icon in the left sidebar or use the keyboard shortcut: Ctrl + Shift + X (Windows/Linux) or Cmd + Shift + X (macOS).
- In the Extensions marketplace, search for "Continue" to find the extension.
- Click on the "Continue" extension to open its page.
- Click the "Install" button to begin the installation process.
- Wait for the extension to download and install. This might take a few seconds.
- Once the installation is complete, click on the "Reload Required" button to reload VS Code.
- Open the Continue.dev extension by clicking on its icon in the left sidebar.
- The extension will prompt you for a Model Provider; select the Local Provider option.
- If you have set up Ollama correctly, it should automatically detect your models.
- At the bottom of the window, you will find a list of all your installed models. For this tutorial, I chose Codestral:22b.
- By default, the extension is set to use starcoder2:3b for autocomplete. If you want to keep using this model, ensure it is installed in your Ollama setup. Otherwise, if you wish to switch to a more capable model, click on the settings icon at the bottom right of the extension window.
- This will open the config.json file, where you can customize the extension's behavior.
- Navigate to the following snippet:
"tabAutocompleteModel": {
"title": "starcoder-2-3b",
"provider": "ollama",
"model": "starcoder-2-3b"
},
- Replace starcoder2:3b with the model you prefer. For example, if you want to use Codestral:22b, update the config to:
"tabAutocompleteModel": {
"title": "codestral:22b",
"provider": "ollama",
"model": "codestral:22b"
},
or use Qwen2.5-Coder-7B-Instruct for less powerful computers:
"tabAutocompleteModel": {
"title": "qwen2.5-coder:7b-instruct",
"provider": "ollama",
"model": "qwen2.5-coder:7b-instruct"
},
- Save your modifications and close the config file.
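For reference, the chat model you selected from the model list lives in the models array of the same config.json. Here is a minimal sketch of both entries together, using Codestral:22b for chat and autocomplete; the field names mirror the snippets above, but treat this as a starting point rather than the definitive schema for your Continue version:

```json
{
  "models": [
    {
      "title": "Codestral 22B",
      "provider": "ollama",
      "model": "codestral:22b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "codestral:22b",
    "provider": "ollama",
    "model": "codestral:22b"
  }
}
```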
If you don't have the necessary hardware to run models locally or want to tap into more powerful capabilities, this tutorial is for you. I'll be using Groq as the provider, leveraging their infrastructure's fast inference capabilities.
Note
In this tutorial, I'll be leveraging Groq, but feel free to substitute it with your preferred provider - whether that's OpenAI, Anthropic, or another that suits your needs.
When choosing your model, keep in mind that each model has its strengths and weaknesses.
Before we get started, make sure you have a Groq account set up - if you haven't already, take a moment to create one.
- Click on the Continue icon in the left sidebar to open the Continue panel.
- If it is the first time you open it, the extension will prompt you for a Model Provider; select the Cloud Provider option.
- Select Groq as your preferred AI provider.
- Head to the GroqCloud section of the Groq website (look for the icon at the bottom).
- Create a new API key and copy it to your clipboard.
- Return to the Continue window and paste your API key into the apiKey input field (the equivalent config.json entry is sketched after this list).
- Select the AI model you want to use from the available options. Here on Groq, I'll choose the "Llama-3-70b Model".
- You now have access to a capable model, ready to be used as a programming assistant within Visual Studio Code.
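If you prefer editing config.json directly instead of the UI flow above, the resulting entry looks roughly like the sketch below. The "groq" provider name and the llama3-70b-8192 model identifier are assumptions based on Groq's API naming at the time of writing; check Continue's documentation for your version before relying on them:

```json
{
  "models": [
    {
      "title": "Llama 3 70B (Groq)",
      "provider": "groq",
      "model": "llama3-70b-8192",
      "apiKey": "YOUR_GROQ_API_KEY"
    }
  ]
}
```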