📚
Learning Data Visualisation and Data Science

Ishan Marikar ishan-marikar

ishan-marikar / test.md
Created December 27, 2024 18:47 — forked from ityonemo/test.md
Zig in 30 minutes

A half-hour to learn Zig

This is inspired by https://fasterthanli.me/blog/2020/a-half-hour-to-learn-rust/

Basics

The command zig run my_code.zig compiles and immediately runs your Zig program. Each of these cells contains a Zig program that you can try to run (some of them contain compile-time errors that you can comment out to play with).

ishan-marikar / Dockerfile
Created June 21, 2024 16:00 — forked from ItsWendell/Dockerfile
Postgres Dockerfile with Custom Extensions using pgxn
## Alternatives: postgres:15-alpine
ARG BASE_IMAGE=postgis/postgis:15-3.4-alpine
## Custom Alpine Postgres docker file with custom extensions
FROM ${BASE_IMAGE} as builder
# Install required dependencies
RUN apk --no-cache add \
python3 \
ishan-marikar / reproducibility.md
Created June 21, 2024 04:41 — forked from Guitaricet/reproducibility.md
Notes on reproducibility in PyTorch

Reproducibility

ML experiments can be very hard to reproduce. You have a lot of hyperparameters, different dataset splits, different ways to preprocess your data, bugs, etc. Ideally, you should log the data split (already preprocessed), all hyperparameters (including learning rate scheduling), the initial state of your model and optimizer, the random seeds used for initialization and dataset shuffling, and all of your code. Your GPU should also be in deterministic mode (which is not the default mode). And you should do this for every single model run. This is a very hard task. A different random seed can significantly change your metrics, and even GPU-induced randomness can matter. We're not solving all of these problems, but we need to address at least what we can handle.
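
As a rough illustration of the seed and hyperparameter logging described above, here is a minimal sketch. The helper names `seed_everything` and `log_run` and the JSON layout are hypothetical, not taken from the original notes. NumPy and PyTorch are treated as optional: the code seeds them only if they are installed, and the cuDNN flags shown are one common way to request deterministic behaviour, not a guarantee of bit-exact results.

```python
import json
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)

    # NumPy is optional here; skip it if it is not installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch is optional too; without it there is nothing more to seed.
    try:
        import torch
    except ImportError:
        return

    torch.manual_seed(seed)
    # Ask cuDNN for deterministic kernels and disable the autotuner,
    # which can otherwise pick different algorithms between runs.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def log_run(path: str, seed: int, hyperparameters: dict) -> dict:
    """Write the seed and hyperparameters of one run to a JSON file."""
    record = {"seed": seed, "hyperparameters": hyperparameters}
    with open(path, "w", encoding="utf-8") as f:
        # sort_keys keeps the file stable so runs are easy to diff.
        json.dump(record, f, indent=2, sort_keys=True)
    return record
```

A run would then call `seed_everything(seed)` before building the model and data loaders, and `log_run("run.json", seed, {...})` with whatever hyperparameters it used, so each reported number can be traced back to a seed and a configuration.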

For every result you report in the paper you need (at least) to:

  1. Track your model and optimizer hyperparameters (including learning rate schedule)
  2. Save final model parameters
  3. Report all of the parameters in the pap
ishan-marikar / normcore-llm.md
Created May 25, 2024 03:24 — forked from veekaybee/normcore-llm.md
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models

ishan-marikar / script.js
Created May 17, 2024 11:46 — forked from gd3kr/script.js
Download a JSON List of twitter bookmarks
/*
the twitter api is stupid. it is stupid and bad and expensive. hence, this.
Literally just paste this in the JS console on the bookmarks tab and the script will automatically scroll to the bottom of your bookmarks and keep track of them as it goes.
When finished, it downloads a JSON file containing the raw text content of every bookmark.
for now it stores just the text inside the tweet itself, but if you're reading this why don't you go ahead and try to also store other information (author, tweetLink, pictures, everything). come on. do it. please?
*/
ishan-marikar / ollama_dspy.py
Created April 4, 2024 11:07 — forked from jrknox1977/ollama_dspy.py
ollama+DSPy using OpenAI APIs.
# install DSPy: pip install dspy
import dspy
# Ollama is now compatible with OpenAI APIs
#
# To get this to work you must include `model_type='chat'` in the `dspy.OpenAI` call.
# If you do not include this you will get an error.
#
# I have also found that `stop='\n\n'` is required to get the model to stop generating text after the answer is complete.
# At least with mistral.
ishan-marikar / genRegexFromJSON.ts
Created March 25, 2024 06:26 — forked from hrishioa/genRegexFromJSON.ts
Generate Regexes for ReLLM from Typescript types
// Meant to work with the code in https://github.com/hrishioa/socrate, specifically the functions in the `gpt/base.ts`.
// Created by Hrishi Olickel (@hrishioa) Sun 7 May
import { Messages, askChatGPT } from '../base';
async function generateRegexFromTypeSpec(typeSpec: string): Promise<string | null> {
//prettier-ignore
const prompts = {
systemPrompt: () => `You are a Typescript type to Regex converter that can only output valid Regexes.`,
specPrompt: (spec: string) => `Convert the following TYPESCRIPT_TYPE into a valid Regex that matches the type.
ishan-marikar / load_and_process_open_source_licenses.ts
Created March 25, 2024 06:25 — forked from hrishioa/load_and_process_open_source_licenses.ts
Simple Typescript file demonstrating chunked, chained LLM calls to process large amounts of text.
// Requires the gpt library from https://github.com/hrishioa/socrate and the progress bar library.
// Created by Hrishi Olickel (@hrishioa). Reach out if you have trouble running this.
import { ThunkQueue } from '../../utils/simplethrottler';
import {
AcceptedModels,
Messages,
askChatGPT,
getMessagesTokenCount,
getProperJSONFromGPT,
ishan-marikar / PostgreSQL-EXTENSIONs.md
Created March 22, 2024 20:44 — forked from joelonsql/PostgreSQL-EXTENSIONs.md
1000+ PostgreSQL EXTENSIONs

🗺🐘 1000+ PostgreSQL EXTENSIONs

This is a list of URLs to PostgreSQL EXTENSION repos, listed in alphabetical order of parent repo, with active forks listed under each parent.

⭐️ >= 10 stars
⭐️⭐️ >= 100 stars
⭐️⭐️⭐️ >= 1000 stars
Numbers of stars might not be up-to-date.

# Configuration file for jupyterhub.
#------------------------------------------------------------------------------
# Application(SingletonConfigurable) configuration
#------------------------------------------------------------------------------
## This is an application.
## The date format used by logging formatters for %(asctime)s
#c.Application.log_datefmt = '%Y-%m-%d %H:%M:%S'