N-Gram Model Description

The Corpus for this task should be prepared by yourself. The corpus should consist of 10 different domains and each domain should have 50 distinct files. You are supposed to implement following Python functions.

The text files are not tokenized. You need to implement a function with name tokenize () that takes the file path as its argument and returns the tokenized sentences.

Write a function Ngram () that should accept two required argument, n the order of the n-gram model & sentences and returns the n-grams.

Write a function SentenceProb () that should accept a sentence and returns the probability of the given sentence using Bigram model.

Write a function SmoothSentenceProb () that should accept a sentence and returns the probability of the given sentence using Bigram model and with Laplace smoothing.

Write a method Perplexity (), that calculates the perplexity score for a given sequence of sentences

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
corpus		corpus
.gitignore		.gitignore
n-gram.py		n-gram.py
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

N-Gram Model Description

About

Releases

Packages

Languages

burhanharoon/N-Gram-Language-Model

Folders and files

Latest commit

History

Repository files navigation

N-Gram Model Description

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages