It's a Python-based n-gram language model that computes bigrams, the probability and Laplace-smoothed probability of a sentence under a bigram model, and the perplexity of the model.

burhanharoon/N-Gram-Language-Model

N-Gram Model Description

The corpus for this task should be prepared by yourself. It should consist of 10 different domains, and each domain should have 50 distinct files. You are supposed to implement the following Python functions.

The text files are not tokenized. You need to implement a function named tokenize() that takes a file path as its argument and returns the tokenized sentences.
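A minimal sketch of what tokenize() could look like. The sentence-splitting and word-tokenization rules here (split on `.`, `!`, `?`, keep lowercase word tokens) are assumptions; the task leaves the exact tokenization scheme open:

```python
import re

def tokenize(file_path):
    """Read a raw text file and return a list of tokenized sentences.

    Each sentence is returned as a list of lowercase word tokens.
    Sentence splitting uses a simple punctuation-based heuristic.
    """
    with open(file_path, encoding="utf-8") as f:
        text = f.read()
    # Split into sentences on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    tokenized = []
    for sentence in sentences:
        # Keep runs of letters (and apostrophes) as word tokens.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if tokens:
            tokenized.append(tokens)
    return tokenized
```

For example, a file containing "Hello world. This is a test!" would tokenize to two sentences: `["hello", "world"]` and `["this", "is", "a", "test"]`.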

Write a function Ngram() that accepts two required arguments, n (the order of the n-gram model) and sentences, and returns the n-grams.
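One possible sketch of Ngram(). Padding each sentence with `<s>` and `</s>` boundary markers is an assumption (it lets the bigram model capture sentence starts and ends), not something the task mandates:

```python
def Ngram(n, sentences):
    """Return all n-grams (as tuples) from a list of tokenized sentences.

    Each sentence is padded with n-1 <s> markers at the start and
    n-1 </s> markers at the end before extracting n-grams.
    """
    ngrams = []
    for sentence in sentences:
        padded = ["<s>"] * (n - 1) + sentence + ["</s>"] * (n - 1)
        for i in range(len(padded) - n + 1):
            ngrams.append(tuple(padded[i:i + n]))
    return ngrams
```

With this padding convention, `Ngram(2, [["i", "am"]])` yields `[("<s>", "i"), ("i", "am"), ("am", "</s>")]`, while `Ngram(1, ...)` reduces to the plain token sequence.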

Write a function SentenceProb() that accepts a sentence and returns the probability of the given sentence under a bigram model.
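A sketch of the unsmoothed bigram probability. The original function takes only the sentence, so it presumably reads counts from module-level state; here the training sentences are passed explicitly to keep the example self-contained, and the `<s>`/`</s>` boundary markers are the same assumption as above:

```python
from collections import Counter

def SentenceProb(sentence, train_sentences):
    """Probability of a tokenized sentence under an unsmoothed bigram model.

    P(sentence) = product over i of count(w_{i-1}, w_i) / count(w_{i-1}),
    with <s> and </s> as sentence boundary markers.
    """
    bigram_counts = Counter()
    context_counts = Counter()
    for s in train_sentences:
        padded = ["<s>"] + s + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            bigram_counts[(w1, w2)] += 1
            context_counts[w1] += 1
    prob = 1.0
    padded = ["<s>"] + sentence + ["</s>"]
    for w1, w2 in zip(padded, padded[1:]):
        if context_counts[w1] == 0:
            return 0.0  # unseen context: probability is zero without smoothing
        prob *= bigram_counts[(w1, w2)] / context_counts[w1]
    return prob
```

Note that any unseen bigram drives the whole product to zero, which is exactly the problem Laplace smoothing addresses in the next function.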

Write a function SmoothSentenceProb() that accepts a sentence and returns the probability of the given sentence under a bigram model with Laplace smoothing.
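A sketch of the Laplace-smoothed variant, under the same assumptions as the unsmoothed sketch (explicit training sentences, `<s>`/`</s>` markers). Counting `</s>` as part of the vocabulary for the smoothing denominator is a further assumption:

```python
from collections import Counter

def SmoothSentenceProb(sentence, train_sentences):
    """Laplace-smoothed bigram probability of a tokenized sentence.

    P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V),
    where V is the vocabulary size (including the </s> marker).
    """
    bigram_counts = Counter()
    context_counts = Counter()
    vocab = set()
    for s in train_sentences:
        padded = ["<s>"] + s + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            bigram_counts[(w1, w2)] += 1
            context_counts[w1] += 1
        vocab.update(s)
    V = len(vocab | {"</s>"})
    prob = 1.0
    padded = ["<s>"] + sentence + ["</s>"]
    for w1, w2 in zip(padded, padded[1:]):
        prob *= (bigram_counts[(w1, w2)] + 1) / (context_counts[w1] + V)
    return prob
```

Because every count is shifted by one, unseen bigrams now get a small non-zero probability instead of zeroing out the whole sentence.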

Write a method Perplexity() that calculates the perplexity score for a given sequence of sentences.
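A sketch of the perplexity computation. To stay self-contained it takes any sentence-probability function as a parameter (a hypothetical interface; the original presumably calls its own smoothed bigram model directly), and it counts one `</s>` prediction per sentence when normalizing:

```python
import math

def Perplexity(sentences, sentence_prob):
    """Perplexity of a model over a sequence of tokenized sentences.

    PP = exp(-(1/N) * sum_i log P(sentence_i)), where N is the total
    number of predicted tokens (each sentence's words plus one </s>).
    sentence_prob(sentence) must return the sentence's probability.
    """
    total_log_prob = 0.0
    total_tokens = 0
    for sentence in sentences:
        total_log_prob += math.log(sentence_prob(sentence))
        total_tokens += len(sentence) + 1  # +1 for the </s> prediction
    return math.exp(-total_log_prob / total_tokens)
```

As a sanity check, a model that assigns each token (including `</s>`) a uniform probability of 1/4 should have a perplexity of exactly 4.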
