Papers by MUHAMMAD ZILANY

Acoustic information is detected and processed along the auditory pathway rapidly and in a highly complex manner. A large number of studies have investigated sound encoding at different levels of the auditory system by recording direct neural responses to different types of stimuli. However, the processing of more complex stimuli at higher auditory centres is not yet well understood. Computational modeling has emerged as an approach for gaining insight into the mechanisms underlying the processing of complex sounds such as speech, animal vocalizations, and music. The main goal of this study is to develop a phenomenological, computer-based model of octopus neurons in the posterior ventral cochlear nucleus that simulates their physiological responses to simple and complex stimuli. Octopus cells receive synaptic inputs from a number of auditory nerve (AN) fibers; accordingly, an AN model developed by Zilany and colleagues was used to provide the input to the proposed model. The summation of weighted outputs from the AN model was subjected to a power-law adaptation function to simulate octopus cell responses. Model responses are compared to physiological data recorded from octopus neurons. The output of the proposed model can be applied as an excitatory input when modeling the responses of superior paraolivary nucleus neurons in the superior olivary complex, as well as in models of sound localization.
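A minimal sketch of the architecture described above, assuming a generic AN front end: per-fiber rates are summed with synaptic weights and passed through a power-law adaptation stage. The function name, kernel form, and all constants are illustrative assumptions, not the published implementation.

```python
import numpy as np

def octopus_cell_response(an_rates, weights, fs, alpha=5e-4, beta=1e-1):
    """Weighted AN input followed by power-law adaptation (sketch).

    an_rates -- (n_fibers, n_samples) instantaneous rates from an AN model
    weights  -- (n_fibers,) synaptic weights (hypothetical values)
    The kernel k(t) = alpha / (t + beta) is one common power-law form;
    the published model's exact kernel and constants may differ.
    """
    drive = weights @ an_rates                      # summed synaptic input
    t = np.arange(1, drive.size + 1) / fs
    kernel = alpha / (t + beta)                     # slowly decaying power-law kernel
    adapt = np.convolve(drive, kernel)[:drive.size] / fs
    return np.maximum(drive - adapt, 0.0)           # rectified, adapted rate

# toy usage with random stand-in AN rates
fs = 100e3
an = np.random.rand(10, int(0.05 * fs)) * 200       # 10 fibers, 50 ms
rate = octopus_cell_response(an, np.full(10, 0.1), fs)
```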
It is hereby declared that this thesis or any part of it has not been submitted elsewhere for the award of any degree or diploma.
2002 11th European Signal Processing Conference, 2002
This paper presents an efficient thresholding technique for wavelet-based speech enhancement. The signal-bias-compensated noise level is used as the threshold parameter. Both the noise and signal levels are estimated from the first-scale detail wavelet packet (WP) coefficients. Hard and soft thresholding are applied successively: regions for hard thresholding are identified by estimating their signal-to-noise ratio (SNR) in the wavelet domain, and soft thresholding is applied to the remaining regions. The performance of the proposed scheme is evaluated on speech recorded in real conditions with artificial noise added to it.
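A rough sketch of the scheme under stated assumptions (PyWavelets for the WP decomposition; the direction of the hard/soft rule and all parameter values are guesses, not the paper's settings):

```python
import numpy as np
import pywt

def wp_threshold_denoise(x, wavelet="db8", level=4, snr_db_cutoff=0.0):
    """Estimate the noise level from the first-scale detail WP coefficients,
    then threshold each terminal node hard or soft depending on its SNR
    (which regions get which rule is an assumption here)."""
    wp = pywt.WaveletPacket(x, wavelet, maxlevel=level)
    # robust noise estimate from first-scale detail coefficients
    sigma = np.median(np.abs(wp["d"].data)) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    for node in wp.get_level(level, "natural"):
        c = node.data
        snr_db = 10 * np.log10(np.mean(c**2) / (sigma**2 + 1e-12))
        mode = "hard" if snr_db >= snr_db_cutoff else "soft"
        node.data = pywt.threshold(c, thr, mode=mode)
    return wp.reconstruct(update=False)[: len(x)]
```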

Journal of the Acoustical Society of America, Mar 1, 2019
Computational models of auditory perception offer a time-efficient method of assessing the effects of distortion on speech perception. Several objective metrics have been proposed to predict speech intelligibility, especially when speech is obscured by background noise. Novel full-reference and reference-free speech intelligibility metrics have emerged in recent years, but determining the best metric for predicting speech intelligibility still requires investigation. This study assessed the predictive accuracy of several established full-reference and reference-free speech intelligibility metrics. Speech perception scores were measured in listeners with normal hearing and with hearing loss, in quiet and in noise. Acoustic recordings were made of the presented speech stimuli and combined with a computational model of the auditory nerve to simulate behavioral scores using several established metrics, including the STOI, NSIM, SRMR, SII, SNRloss, and BSIM. The estimated speech scores were correlated with behavioral speech recognition scores to assess the predictive accuracy of the model simulations. Several of the predicted scores correlated well with behavioral scores. Evaluation of individual phonemes revealed differential sensitivity of the metrics across phonemic classifications.
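As one concrete example of the evaluation procedure, a sketch that scores degraded recordings with the STOI metric (via the third-party pystoi package) and correlates the predictions with behavioral scores; STOI stands in for any of the compared metrics, and the function and variable names are illustrative:

```python
# pip install pystoi  (a third-party implementation of the STOI metric)
from pystoi import stoi
from scipy.stats import pearsonr

def evaluate_metric(clean_list, degraded_list, behavioral_scores, fs=16000):
    """Correlate a metric's predictions with behavioral recognition scores."""
    predicted = [stoi(c, d, fs, extended=False)
                 for c, d in zip(clean_list, degraded_list)]
    r, p = pearsonr(predicted, behavioral_scores)
    return r, p
```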

Experimental Brain Research, 2019
Various studies on medial olivocochlear (MOC) efferents have implicated them in multiple roles in the auditory system (e.g., dynamic range adaptation, masking reduction, and selective attention). This study presents a systematic simulation of inferior colliculus (IC) responses with and without electrical stimulation of the MOC. Phenomenological models of the responses of auditory nerve (AN) fibers and IC neurons were used to this end. The simulated responses were highly consistent with physiological data, replicating three of the four known rate-level response types and all MOC effects (shifts, reduction at high stimulus levels, and enhancement). Complex MOC efferent effects, previously thought to require integration across neurons with different characteristic frequencies (CFs), were simulated using the same-frequency inhibition-excitation circuitry. MOC-induced enhancing effects were found only in neurons with CFs ranging from 750 Hz to 2 kHz. This limited effect is indicative of the role of MOC activation in shaping AN responses at stimulus offset.
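The same-frequency inhibition-excitation idea can be sketched as excitation minus a delayed, low-passed copy of the same input; all constants here are illustrative, not the fitted parameters of the published IC model:

```python
import numpy as np

def sfie_ic_rate(cn_rate, fs, inh_delay_ms=2.0, inh_gain=1.5, tau_ms=1.0):
    """Same-frequency inhibition-excitation sketch: the IC rate is the
    rectified difference between excitation and a delayed, smoothed copy
    of the same cochlear nucleus input."""
    n_delay = int(inh_delay_ms * 1e-3 * fs)
    inh = np.zeros(len(cn_rate))
    if n_delay < len(cn_rate):
        inh[n_delay:] = cn_rate[:len(cn_rate) - n_delay]
    alpha = 1.0 - np.exp(-1.0 / (tau_ms * 1e-3 * fs))   # one-pole smoothing
    for i in range(1, len(inh)):
        inh[i] = inh[i - 1] + alpha * (inh[i] - inh[i - 1])
    return np.maximum(cn_rate - inh_gain * inh, 0.0)
```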

IFMBE Proceedings, Dec 18, 2015
This randomized cross-over pilot study aimed to evaluate the effect of hearing augmentation on cognitive assessment scores and the time taken to complete cognitive assessment among elderly in-patients in a teaching hospital. A hearing amplifier was used for hearing augmentation, and the Montreal Cognitive Assessment (MoCA) test was used to assess cognition. Seventy-one patients were allocated to Group A (n=33) or Group B (n=38) using block randomization. There was no significant difference in total MoCA scores with and without hearing augmentation (p = 0.622). There was a significant improvement in total scores on the second test, suggesting a learning effect (p < 0.05). There was also no significant difference in the time taken to complete cognitive assessment with and without hearing augmentation (p = 0.879). The same statistical tests performed on the subgroup of patients with hearing impairment did not reveal significant results. The results of this study will inform a larger randomized controlled study evaluating hearing amplifiers as a cost-effective solution to hearing impairment in the older population.

This study proposes a novel feature for speaker identification based on the simulated neural response of the auditory periphery. A well-known computational model of the auditory-nerve (AN) fiber by Zilany and colleagues, which incorporates most of the stages and relevant nonlinearities observed in the peripheral auditory system, was employed to simulate neural responses to speech signals from different speakers. Neurograms were constructed from the responses of inner hair cell (IHC)-AN synapses with characteristic frequencies spanning the dynamic range of hearing. The synapse responses were passed through an analytical function to incorporate the effects of absolute and relative refractory periods. The proposed IHC-AN neurogram feature was then used to train and test a text-dependent speaker identification system using standard classifiers. The performance of the proposed method was compared to existing baseline methods in both quiet and noisy conditions. While the performance using the proposed feature was comparable to that of existing methods in quiet environments, the neural feature exhibited substantially better classification accuracy in noisy conditions, especially with white Gaussian and street noise. The performance of the proposed system was also relatively independent of the type of distortion in the acoustic signal and of the classifier used. The proposed feature can be employed to design a robust speech recognition system.
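A sketch of neurogram construction, assuming an AN/synapse model supplies one rate waveform per characteristic frequency; window and hop sizes are illustrative:

```python
import numpy as np

def build_neurogram(synapse_rates, fs, win_ms=8.0, hop_ms=4.0):
    """Bin IHC-AN synapse rate functions (one row per CF) into a
    time-frequency neurogram by averaging within overlapping windows."""
    win = int(win_ms * 1e-3 * fs)
    hop = int(hop_ms * 1e-3 * fs)
    n_cf, n_samp = synapse_rates.shape
    frames = range(0, n_samp - win + 1, hop)
    gram = np.stack([synapse_rates[:, s:s + win].mean(axis=1) for s in frames],
                    axis=1)
    return gram   # shape: (n_cf, n_frames)
```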

2014 IEEE 19th International Functional Electrical Stimulation Society Annual Conference (IFESS), 2014
Higher-order spectral (HOS) techniques can be used to detect deviations from linearity, stationarity, or Gaussianity in a signal. Most biomedical signals are nonlinear, non-stationary, and non-Gaussian in nature, so analyzing them with HOS is more informative than using second-order statistics (the power spectrum). Some features of the bispectrum can differentiate between a clean signal and a noisy one. This study uses an HOS technique to investigate the effect of noise on third-order statistical features (the bispectrum). The results show that the magnitudes of the bispectrum change consistently as a function of the amount of noise. These features can therefore be extracted from a speech signal and compared with the corresponding behavioral responses, from which a new metric for assessing speech intelligibility and quality can be developed.
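For reference, a direct FFT-based bispectrum estimate of the form B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over windowed segments; the segment length, hop, and window are illustrative choices:

```python
import numpy as np

def bispectrum(x, nfft=256, hop=128):
    """Direct bispectrum estimate averaged over Hann-windowed segments."""
    segs = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, hop)]
    w = np.hanning(nfft)
    B = np.zeros((nfft, nfft), dtype=complex)
    idx = np.arange(nfft)
    for s in segs:
        X = np.fft.fft(w * (s - np.mean(s)))
        # X*(f1 + f2) with frequency indices wrapped modulo nfft
        B += X[:, None] * X[None, :] * np.conj(X[(idx[:, None] + idx[None, :]) % nfft])
    return B / max(len(segs), 1)
```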
Modelling the effects of cochlear impairment on the neural

2019 5th International Conference on Advances in Electrical Engineering (ICAEE)
Speaker identification (SID) systems need to be robust to extrinsic variations in the speech signal, such as background noise, to be applicable in many real-life scenarios. Mel-frequency cepstral coefficient (MFCC)-based i-vector systems have been regarded as the state of the art in speaker identification, but it is well known that the performance of traditional methods, in which features are mostly extracted from properties of the acoustic signal, degrades substantially under noisy conditions. This study proposes a robust SID system using the neural responses of a physiologically based computational model of the auditory periphery. Two-dimensional neurograms were constructed from the simulated responses of the auditory-nerve fibers to speech signals from the TIMIT database. The neurogram coefficients were trained using i-vector-based systems to generate an identity model for each speaker, and performance was evaluated and compared, in quiet and under noisy conditions, with results from existing methods such as the MFCC, frequency-domain linear prediction (FDLP), and Gammatone frequency cepstral coefficient (GFCC). Results showed that the proposed system outperformed all existing acoustic-signal-based methods both in quiet and under noisy conditions.
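Noisy test conditions in studies like this are typically created by mixing noise into the clean signal at a target SNR; a minimal sketch (the paper's exact mixing procedure is not specified here):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture speech + noise has the requested SNR."""
    noise = noise[:len(speech)]
    p_s = np.mean(speech**2)
    p_n = np.mean(noise**2) + 1e-12
    gain = np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return speech + gain * noise
```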

2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC)
This paper presents the effect of speech-shaped noise on consonant recognition in Malay. Scores were measured using a 22-alternative forced-choice task. Based on the responses, consonants were grouped into low-, medium-, and high-scoring sets, and the results were compared with previous reports on English consonants. Our results showed that fricatives were less affected by speech-shaped noise, whereas laterals and nasals were severely affected. Large differences in consonant recognition scores at unfavorable signal-to-noise ratios (e.g., −10 dB) suggest that speech-shaped noise masked the Malay consonants non-uniformly. This key finding differs from reports of uniform consonant masking in white noise. Masking patterns showed some similarities and notable differences between English and Malay. The noted differences may have clinical implications for the design of signal processing strategies for hearing devices intended to improve speech understanding in noise for non-English speakers such as Malaysians.

The Journal of the Acoustical Society of America
An objective metric that predicts speech intelligibility under different types of noise and distortion would be desirable in voice communication. To date, the majority of studies of speech intelligibility metrics have focused on predicting the effects of individual noise or distortion mechanisms. This study proposes an objective metric, the spectrogram orthogonal polynomial measure (SOPM), that attempts to predict speech intelligibility for people with normal hearing under adverse conditions. The SOPM is computed by extracting features from the spectrogram using Krawtchouk moments. The metric's performance is evaluated for several types of noise (steady-state and fluctuating), distortions (peak clipping, center clipping, and phase jitter), ideal time-frequency segregation, and reverberation, both in quiet and in noisy environments. High correlations (0.97-0.996) are achieved when the proposed metric is evaluated against subjective scores from normal-hearing subjects under various conditions.
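A sketch of the moment computation, assuming the standard Krawtchouk three-term recurrence; the weighted, normalized form usually used for numerically stable image moments is omitted for brevity, so this is illustrative rather than the paper's exact formulation:

```python
import numpy as np

def krawtchouk_basis(N, n_orders, p=0.5):
    """Krawtchouk polynomials K_0..K_{n_orders-1} evaluated on x = 0..N-1,
    built with the three-term recurrence (assumes n_orders << N)."""
    x = np.arange(N, dtype=float)
    K = np.zeros((n_orders, N))
    K[0] = 1.0
    if n_orders > 1:
        K[1] = 1.0 - x / (p * (N - 1))
    for n in range(1, n_orders - 1):
        a = p * (N - 1 - n)
        K[n + 1] = ((a + n * (1 - p) - x) * K[n] - n * (1 - p) * K[n - 1]) / a
    return K

def krawtchouk_moments(img, n_orders=8, p=0.5):
    """Project a spectrogram patch onto the 2-D Krawtchouk basis:
    Q[m, n] = sum_xy K_m(y) img[y, x] K_n(x)."""
    Ky = krawtchouk_basis(img.shape[0], n_orders, p)
    Kx = krawtchouk_basis(img.shape[1], n_orders, p)
    return Ky @ img @ Kx.T
```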

Computer Speech & Language
This study proposes a new non-intrusive measure of speech quality, the neurogram speech quality measure (NSQM), based on the responses of a biologically inspired computational model of the auditory system for listeners with normal hearing. The model simulates the response of an auditory-nerve fiber at a given characteristic frequency to a speech signal, and the population response of the model is represented by a neurogram (a 2-D time-frequency representation). The responses at each characteristic frequency in the neurogram were decomposed into sub-bands using the 1-D discrete wavelet transform. The normalized energy of each sub-band was used as an input to a support vector regression model to predict the quality score of the processed speech. The performance of the proposed non-intrusive measure was compared to results from a range of intrusive and non-intrusive measures using three standard databases: EXP1 and EXP3 of Supplement 23 to the P series (P.Supp23) of the ITU-T Recommendations, and the NOIZEUS database. The proposed NSQM achieved overall better results than most existing metrics for the effects of compression codecs and additive and channel noise.
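A sketch of the feature pipeline under stated assumptions (PyWavelets for the 1-D DWT and scikit-learn's SVR as the regressor; the wavelet, decomposition level, and cross-CF averaging are illustrative choices):

```python
import numpy as np
import pywt
from sklearn.svm import SVR

def nsqm_features(neurogram, wavelet="db4", level=3):
    """Per-CF 1-D DWT of the neurogram rows; the normalized energy of each
    sub-band, averaged across CFs, forms the feature vector."""
    feats = []
    for row in neurogram:
        coeffs = pywt.wavedec(row, wavelet, level=level)
        e = np.array([np.sum(c**2) for c in coeffs])
        feats.append(e / (e.sum() + 1e-12))
    return np.mean(feats, axis=0)

# training sketch: X rows are feature vectors, y are subjective quality scores
# model = SVR(kernel="rbf").fit(X, y)
# predicted_quality = model.predict(X_new)
```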

Computer Speech & Language
The presence of background noise or nonlinear distortions encountered in real-world situations often reduces the intelligibility of speech signals. Several objective measurements and prediction procedures have been developed to assess speech intelligibility in noise. Most existing measures are, however, suitable for only a subset of specified forms of distortion. This study developed a reliable, reference-free speech intelligibility metric that uses the properties of an acoustic signal to predict the effects of a wide range of distortions that influence speech intelligibility in quiet and noisy conditions. The bispectral speech intelligibility metric (BSIM) was developed by extracting features from the spectrogram of speech signals using third-order statistics, collectively known as the bispectrum. Speech intelligibility scores predicted by the BSIM were compared to behavioral speech intelligibility scores in quiet and in noise. The performance of the BSIM was also compared with that of several widely used speech intelligibility metrics. Results showed that the BSIM successfully predicts the effects of nonlinear distortions, such as peak clipping and center clipping, as well as time-domain distortions, such as phase jitter and reverberation. Unlike existing metrics such as the articulation index and the speech transmission index, the BSIM captured the effect of fluctuating noise on speech intelligibility and predicted the effects of degradation of noisy speech processed by the ideal time-frequency segregation method. The BSIM is a reliable, reference-free, objective measure of speech intelligibility that can provide real-time predictions of the effect of signal processing and acoustic distortion on speech intelligibility in quiet and in noise. In addition, the BSIM could be used to analyze algorithms that process noisy speech.
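The distortions named above have common textbook definitions; hedged sketches of the three nonlinear and time-domain cases follow (thresholds and jitter strength are illustrative, not the paper's test settings):

```python
import numpy as np

def peak_clip(x, frac=0.5):
    """Clip samples above a fraction of the peak amplitude."""
    t = frac * np.max(np.abs(x))
    return np.clip(x, -t, t)

def center_clip(x, frac=0.1):
    """Zero out samples below a fraction of the peak amplitude."""
    t = frac * np.max(np.abs(x))
    return np.where(np.abs(x) > t, x - np.sign(x) * t, 0.0)

def phase_jitter(x, strength=0.5, seed=0):
    """Multiply by the cosine of a random phase; `strength` scales its range."""
    rng = np.random.default_rng(seed)
    theta = strength * rng.uniform(-np.pi, np.pi, size=x.shape)
    return x * np.cos(theta)
```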

IET Signal Processing
Classification of speech phonemes is challenging, especially in noisy environments, and hence traditional speech recognition systems do not perform well in the presence of noise. Unlike traditional methods, in which features are mostly extracted from properties of the acoustic signal, this study proposes a new feature for phoneme classification based on neural responses from a physiologically based computational model of the auditory periphery. A two-dimensional neurogram was constructed from the simulated responses of auditory-nerve fibres to speech phonemes. Features of the neurogram images were extracted using the discrete Radon transform, and the dimensionality of the features was reduced using an efficient feature selection technique. A standard classifier, the support vector machine, was employed to model and test the phoneme classes. Classification performance was evaluated in quiet and under noisy conditions in which the test data were corrupted with various environmental distortions such as additive noise, room reverberation, and telephone-channel noise. Performance was also compared with results from existing methods such as Mel-frequency cepstral coefficient, Gammatone frequency cepstral coefficient, and frequency-domain linear prediction-based phoneme classification. In general, the proposed neural feature exhibited better classification accuracy in quiet and under noisy conditions than most existing acoustic-signal-based methods.
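A sketch of the feature extraction and classification steps, assuming scikit-image's Radon transform and scikit-learn's SVC; the angle set is an illustrative choice, and the paper's feature-selection step is not reproduced:

```python
import numpy as np
from skimage.transform import radon
from sklearn.svm import SVC

def radon_features(neurogram, n_angles=18):
    """Project the neurogram at a small set of angles with the Radon
    transform and flatten the sinogram into a feature vector."""
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(neurogram, theta=theta, circle=False)
    return sinogram.ravel()

# classification sketch (features X and phoneme labels y assumed available)
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# accuracy = clf.score(X_test, y_test)
```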

Journal of the Association for Research in Otolaryngology (JARO), 2017
This study presents a phenomenological model of the responses of neurons in the superior paraolivary nucleus (SPON) of the rodent. Pure tones at the characteristic frequency (CF) and broadband noise stimuli evoke offset-type responses in these neurons, and SPON neurons also phase-lock to the envelope of sinusoidally amplitude-modulated (SAM) stimuli over a range of modulation frequencies. The model SPON neuron received inhibitory input, relayed by the ipsilateral medial nucleus of the trapezoid body, from the contralateral model ventral cochlear nucleus neuron. The SPON model response was simulated by detecting the slope of its inhibitory postsynaptic potential. Responses of the proposed model to pure tones at CF and to broadband noise were offset-type, independent of the duration of the input stimulus. SPON model responses were also synchronized to the envelope of SAM stimuli with precise timing over a range of modulation frequencies. Modulation transfer functions (MTFs) obtained f...
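A minimal sketch of the slope-detection idea: a pseudo-IPSP tracks the inverted inhibitory input, and the model responds when the potential recovers steeply at input offset. The time constant and threshold are illustrative assumptions, not the published parameters:

```python
import numpy as np

def spon_offset_response(inh_rate, fs, tau=5e-3, slope_thresh=50.0):
    """Respond to the steep recovery of a pseudo-IPSP at input offset."""
    alpha = 1.0 - np.exp(-1.0 / (tau * fs))
    ipsp = np.zeros(len(inh_rate))
    for i in range(1, len(inh_rate)):
        # one-pole filter pulling the potential toward -inh_rate
        ipsp[i] = ipsp[i - 1] + alpha * (-inh_rate[i] - ipsp[i - 1])
    slope = np.gradient(ipsp) * fs                 # dV/dt in units per second
    return np.maximum(slope - slope_thresh, 0.0)   # firing only on steep recovery
```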

IEEE Access, 2017
To mimic the ability of human listeners to identify speech in noisy environments, this paper proposes a phoneme classification technique that uses simulated neural responses from a physiologically based computational model of the auditory periphery instead of features derived directly from the acoustic signal. Two-dimensional neurograms were constructed from the simulated responses of the auditory-nerve fibers to speech phonemes. Features of the neurograms were extracted using the Radon transform and used to train the classification system with a deep neural network classifier. Classification performance was evaluated in quiet and under noisy conditions for different types of phonemes extracted from the TIMIT database. Based on the simulation results, the proposed method outperformed most traditional acoustic-property-based phoneme classification methods both in quiet and under noisy conditions. The proposed method could easily be extended to develop an automatic speech recognition system.
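As a stand-in for the paper's deep neural network classifier, a small fully connected network from scikit-learn over the same Radon-derived features; the architecture is illustrative, not the published one:

```python
from sklearn.neural_network import MLPClassifier

# Small fully connected network over Radon-transform neurogram features (X)
# and phoneme labels (y); layer sizes are illustrative assumptions.
clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=300)
# clf.fit(X_train, y_train)
# print("accuracy:", clf.score(X_test, y_test))
```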