1 Abstract
This paper considers the problem of neural decoding from parallel neural measurement systems such as micro-electrocorticography (ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples is limited. To address this challenge, this work presents a novel neural network decoder with a low-rank structure in the first hidden layer. The low-rank constraints dramatically reduce the number of parameters in the decoder while still enabling a rich class of nonlinear decoder maps. The low-rank decoder is illustrated on ECoG data from the primary auditory cortex (A1) of awake rats. This decoding problem is particularly challenging due to the complexity of neural responses in the auditory cortex and the presence of confounding signals in awake animals. It is shown that the proposed low-rank decoder significantly outperforms models using standard dimensionality reduction techniques such as principal component analysis (PCA).
Keywords: auditory decoding; neural networks; low-rank filter; dimensionality reduction
2 Introduction
Advancements in neural recording technologies, particularly calcium imaging and high-dimensional micro-electrocorticography (ECoG), now enable measurements of tremendous numbers of neurons or brain regions in parallel (Chang, 2015; Fukushima et al., 2015; Stosiek et al., 2003). While these recordings offer the potential to observe neural activity at an unprecedented level of detail, the high dimensionality presents a fundamental challenge for learning neural decoding systems from data. This dimensionality problem is particularly acute for the focus of this work, namely neural decoding of signals in the primary auditory cortex from state-of-the-art ECoG. Most importantly, in modern ECoG systems, the dimensionality of the measured responses often exceeds the number of training examples. For example, in the application we discuss below, the response from the ECoG array (Insanally et al., 2016) to each stimulus consists of approximately 160 time samples across 61 electrodes, resulting in a raw feature dimension of nearly 10,000. However, due to experimental limits on the duration of the experiments, there are fewer than 400 training examples. Moreover, responses in the primary auditory cortex are known to be complex (Zatorre et al., 2002; Młynarski & McDermott, 2018). Also, for awake animals, the responses may have confounding components from movements. Consequently, neural decoding systems must be sufficiently rich to enable nonlinear decoding and confounding signal rejection.
This work presents a novel approach for neural decoding from parallel neural measurements with a small number of parameters while being able to capture complex nonlinear relationships between the measurements and the stimulus. The approach is based on a traditional neural network structure, but with two key novel properties: (1) a discrete cosine transform (DCT) preprocessing stage used to reduce the sampling rate; and (2) an initial linear layer of the neural network with a low-rank structure. We argue that both structures are well-justified based on the physical processes and can dramatically reduce the number of parameters. The method is demonstrated on neural decoding in the rat primary auditory cortex (A1) from a new high-dimensional ECoG array (Insanally et al., 2016).
2.1 Previous work
Despite advancements in machine learning tools, traditional methods are still common in auditory decoding (Glaser et al., 2017). Some of these methods consider both linear and nonlinear mappings of the neural responses to the auditory spectrogram (Pasley et al., 2012). Linear neural decoders such as support vector machines (SVMs) have also been widely discussed for classifying behavioral responses using population activity (Francis et al., 2018). As in de Cheveigné et al. (2018), other methods like canonical correlation analysis (CCA) have also been used as linear models to measure the correlation between the stimulus and the response, as a goodness of fit, after transforming them. Multilayer neural networks have shown remarkable success in feature extraction and classification in machine vision and speech processing (Yamins & DiCarlo, 2016). Since auditory signals arrive at the cortex after having been passed through a number of sensory processing areas, these networks are appealing for modeling the responses in auditory cortex (Hackett, 2011). There is also a large body of literature on dimensionality reduction methods for high-dimensional neural recordings (Cunningham & Byron, 2014; Mazzucato et al., 2016; Williamson et al., 2016; Sadtler et al., 2014). These methods are largely based on the unlabeled data and attempt to find a low-dimensional latent representation that can capture the bulk of the signal variance. Neural decoders can then be trained on the low-dimensional representation to reduce the number of parameters. As we will see in the results section below, our method can outperform these dimensionality-reduction-based techniques since the proposed method operates on the labeled data and, in essence, finds the directions of variance that are best tuned for the neural decoding task.
3 Model Description
We consider the problem of decoding stimuli from high-dimensional neural responses recorded from some area of the brain. Such responses can arise from any parallel measurement system, including responses measured by an ECoG microelectrode array with N channels, calcium traces from N neurons, or signals recorded by the recently developed Neuropixels probes (Callaway & Garg, 2017). Let X ∈ ℝ^{N×T} be the response to some stimulus u, recorded in a time window of T samples after the stimulus is applied. Given input–output sample pairs (u_i, X_i), i = 1, …, I, the neural decoding problem is to learn a decoder that can estimate the stimulus û from a new response X. Depending on whether the stimulus is discrete- or continuous-valued, the decoding problem can be viewed either as classification or regression. The key challenge in this decoding problem is the potential high dimensionality of the input X to the decoder. Since the response has NT features, even linear classification or regression would require O(NT) parameters. This number of parameters may easily exceed the number of trials on which the decoder can be trained. Thus, some form of dimensionality reduction or structure on the decoder is required.
To address this challenge, we propose a novel low-rank neural network structure to reduce the number of parameters while still enabling rich nonlinear maps from the response to the stimulus estimate. Here, we present the model for a regression problem with a scalar target u. However, the same model can be used for classification or multi-target regression with minor modifications. Figure 1 shows the structure of the model proposed for decoding multidimensional neural processes. The first stage preprocesses the data by passing each of the N channels of T time samples through a discrete cosine transform (DCT). To lowpass filter the signal, only the first K coefficients in the frequency domain are retained, hence reducing the dimension from N×T to N×K. The lowpass filtering is well-justified since the neural responses to the stimuli are typically bandlimited. After the lowpass filtering, the resulting frequency-domain matrix X̃ ∈ ℝ^{N×K} is passed through a neural network with two hidden layers and one output layer,
    z^1_j = σ(u_j^T X̃ v_j + b^1_j),  j = 1, …, N_1,   (1)
    z^2 = σ(W^2 z^1 + b^2),   (2)
    û = σ(w^3 · z^2 + b^3),   (3)

where σ(x) = 1/(1 + e^{−x}) is the sigmoid function. The key novel feature of this network is in the first layer (1), where each hidden unit z^1_j is computed from the inner product of the input X̃ with a rank-one matrix u_j v_j^T. The second hidden layer (2) and output layer (3) are mostly standard. The only slightly nonstandard component is that, in the output, we have assumed that the stimulus is bounded and scaled to the range [0, 1] so that we can use a sigmoid output. The main motivation of the rank-one structure (1) is to reduce the number of parameters. A standard fully connected layer would require NK parameters for each hidden unit, requiring a total of N_1 NK parameters. In contrast, the rank-one layer (1) uses only N_1(N + K + 1) parameters. We will see in the results section that this savings can be considerable.
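To make the preprocessing and the rank-one first layer concrete, the following is a minimal NumPy sketch (not the authors' implementation; the dimensions N = 61, T = 160, K = 55, N_1 = 10 match the experiments below, and the weights are small random placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dct_lowpass(X, K):
    """DCT-II along the time axis of X (N channels x T samples),
    keeping only the first K frequency coefficients."""
    N, T = X.shape
    n = np.arange(T)
    # DCT-II basis: C[k, n] = cos(pi * (n + 0.5) * k / T)
    C = np.cos(np.pi * (n[None, :] + 0.5) * np.arange(K)[:, None] / T)
    return X @ C.T                      # N x K frequency-domain matrix

def rank_one_layer(Xf, U, V, b):
    """First hidden layer (1): z_j = sigmoid(u_j^T Xf v_j + b_j)."""
    # U: N1 x N, V: N1 x K, b: N1
    return sigmoid(np.einsum('jn,nk,jk->j', U, Xf, V) + b)

rng = np.random.default_rng(0)
N, T, K, N1 = 61, 160, 55, 10
X = rng.standard_normal((N, T))         # one simulated response
Xf = dct_lowpass(X, K)
U = 0.1 * rng.standard_normal((N1, N))  # small random placeholder weights
V = 0.1 * rng.standard_normal((N1, K))
b = np.zeros(N1)
z1 = rank_one_layer(Xf, U, V, b)

# Parameter savings of the rank-one layer vs. a fully connected layer:
full_params = N1 * N * K                # N1 * N * K = 33,550
low_rank_params = N1 * (N + K + 1)      # N1 * (N + K + 1) = 1,170
```

With these dimensions, the rank-one layer uses roughly 30 times fewer parameters than a fully connected first layer of the same width.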
The low-rank structure can be justified, at least heuristically, under the assumption of a low-rank structure of the neural responses. Specifically, suppose that the frequency-domain neural responses X̃ are approximately given by

    X̃ ≈ Σ_{k=1}^{r} ξ_k a_k b_k^T,   (4)

where ξ_k are latent variables caused by the stimulus u, and a_k ∈ ℝ^N and b_k ∈ ℝ^K are, respectively, the responses of the k-th latent variable over the measurement channel index and frequency index. Under this assumption, a natural way to estimate the stimulus u is to first estimate the vector ξ of latent variables from X̃ and then estimate u from ξ. Now, we can write (4) as vec(X̃) ≈ Aξ, where A is a linear map. The (regularized) least squares estimate for ξ given X̃ is then given by ξ̂ = (A^T A + λI)^{−1} A^T vec(X̃) for some regularization level λ ≥ 0. Due to the separability structure (4), it is easily verified that each estimate ξ̂_k will be of the form

    ξ̂_k = Σ_{m=1}^{r} c_{km} u_m^T X̃ v_m

for some weights c_{km} and vectors u_m, v_m. Hence, the first layers (1) and (2) of the proposed neural network can be interpreted as recovering the latent variables under a linear low-rank output model.
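This argument can be checked numerically. The sketch below (the rank r = 3, noise level, and λ are illustrative choices, not values from the paper) draws responses from the separable model (4) and verifies that the regularized least-squares estimate of the latent variables is exactly a linear combination of bilinear forms of the kind the first layer computes:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, r, lam = 61, 55, 3, 0.1

# Separable low-rank model (4): Xf = sum_k xi_k a_k b_k^T + small noise
A = rng.standard_normal((N, r))
B = rng.standard_normal((K, r))
xi = rng.standard_normal(r)            # latent variables driven by the stimulus
Xf = (A * xi) @ B.T + 0.01 * rng.standard_normal((N, K))

# Regularized least squares on the vectorized model vec(Xf) ~ M xi
M = np.column_stack([np.outer(A[:, k], B[:, k]).ravel() for k in range(r)])
xi_hat = np.linalg.solve(M.T @ M + lam * np.eye(r), M.T @ Xf.ravel())

# Equivalent form: a linear combination of the bilinear forms a_m^T Xf b_m,
# using <a_k b_k^T, a_m b_m^T> = (a_k . a_m)(b_k . b_m) for the Gram matrix.
h = np.array([A[:, m] @ Xf @ B[:, m] for m in range(r)])
G = (A.T @ A) * (B.T @ B) + lam * np.eye(r)
xi_hat2 = np.linalg.solve(G, h)

assert np.allclose(xi_hat, xi_hat2)    # the two forms agree
```

With the small noise level used here, the estimate also recovers the true latent variables ξ to high accuracy.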
4 Results
4.1 ECoG data from auditory cortex
We evaluate the performance of our model using in vivo ECoG recordings of the A1 area of auditory cortex in moving rodents. Signals are recorded from a high-resolution ECoG array with 61 electrodes at 420 µm spacing. The electrodes were arranged in an 8 × 8 grid where three corner electrodes were omitted (Insanally et al., 2016). In each experiment, single-frequency tones with different frequencies are played, one every second, and the responses are recorded. Figure 2 shows the experiment setup and the electrode array. Recorded signals are then downsampled for further processing. There are a total of 390 tones played in each experiment.
4.2 Decoder performance
To train our model and test its performance, we generate a dataset {(u_i, X_i)}. Each sample consists of the frequency of the stimulus as the input u_i and a window extracted from the signals after the stimulus is applied as the response X_i. Each X_i is a matrix with N = 61 channels and T ≈ 160 time samples. The input frequencies are shifted and rescaled to fall inside the interval [0, 1]. Taking the T-point DCT of the signal along the time axis, we choose the first K = 55 frequencies to reduce the dimensionality. We then pass the signal through a low-rank layer with 10 rank-one units. This layer is followed by a dense layer with 4 hidden units and sigmoid activation. The output layer is a single unit with a sigmoid nonlinearity which gives us the predicted frequency index. We have used ℓ2 regularization in learning the weights of both the separable and fully connected layers. The model is trained on a subset of the whole dataset and evaluated on the remaining held-out samples as the test set. The goal is to estimate the index of the frequency as a regression problem, and the R-squared score is used as a measure of closeness of the data to the fitted regression model.
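For reference, the evaluation metrics used in our experiments can be computed as follows (a generic sketch; y_true and y_pred stand for the rescaled true and predicted frequency indices):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# A perfect decoder gives R^2 = 1 and RMSE = 0.
y = np.linspace(0.0, 1.0, 11)
assert r_squared(y, y) == 1.0 and rmse(y, y) == 0.0
```

Note that R-squared penalizes both bias and variance of the estimator, while RMSE reports the typical error on the [0, 1] frequency scale.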
We compare the performance of the proposed low-rank neural network with three commonly used models:

- PCA + linear: the top principal components of the input are used for linear regression. We use both ℓ1 and ℓ2 regularizers.
- PCA + SVM: the top principal components of the input are taken, followed by a support vector regressor.
- PCA + NN: the top principal components of the input are taken, followed by a neural network with one hidden layer. We use ℓ2 regularization for the weights.

For all three models, the same number of top principal components is used, and cross-validation is used to tune the hyperparameters. For the SVM, both linear and radial basis function (RBF) kernels were tried, and it was found that the RBF kernel gives better results.
Figure 3 shows the performance of all four models in estimating the stimulus frequency on the test dataset. The estimated frequency from each model, with a one-standard-deviation error band, is plotted against the true frequency. The dashed line shows the identity line, corresponding to a perfect model. Therefore, the distance of the prediction curve of each model from this line corresponds to the bias of the estimator, and the error shades correspond to the variance of the estimator. The low-rank neural network is closest to the reference line, showing that it performs better than the other models. Table 1 summarizes the performance of each model in estimating the log-frequency of the stimulus in terms of the R-squared metric along with the root-mean-square error (RMSE).

Table 1:

Method        R-squared score   RMSE
PCA + Linear  0.484             0.179
PCA + SVM     0.476             0.181
PCA + NN      0.510             0.174
Low-rank NN   0.761             0.121
5 Conclusion
The problem of decoding multidimensional neural responses can be challenging due to the high dimensionality of the data. In this work, we presented a neural network model with low-rank structured weights in the first hidden layer, which significantly reduces the number of parameters compared to a fully connected network. We tested the model by decoding ECoG data recorded from the A1 area of auditory cortex of awake rats. We compared the proposed model with some of the most widely used models for decoding neural signals and showed that our model performs much better in predicting the frequency of the stimulus.
References
Callaway, E. M., & Garg, A. K. (2017). Brain technology: Neurons recorded en masse. Nature, 551(7679), 172.
Chang, E. F. (2015). Towards large-scale, human-based, mesoscopic neurotechnologies. Neuron, 86(1), 68–78.
Cunningham, J. P., & Byron, M. Y. (2014). Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11), 1500.
de Cheveigné, A., Wong, D. D., Di Liberto, G. M., Hjortkjær, J., Slaney, M., & Lalor, E. (2018). Decoding the auditory brain with canonical component analysis. NeuroImage, 172, 206–216.
Francis, N. A., Winkowski, D. E., Sheikhattar, A., Armengol, K., Babadi, B., & Kanold, P. O. (2018). Small networks encode decision-making in primary auditory cortex. Neuron, 97(4), 885–897.
Fukushima, M., Chao, Z. C., & Fujii, N. (2015). Studying brain functions with mesoscopic measurements: Advances in electrocorticography for non-human primates. Current Opinion in Neurobiology, 32, 124–131.
Glaser, J. I., Chowdhury, R. H., Perich, M. G., Miller, L. E., & Kording, K. P. (2017). Machine learning for neural decoding. arXiv preprint arXiv:1708.00909.
Hackett, T. A. (2011). Information flow in the auditory cortical network. Hearing Research, 271(1–2), 133–146.
Insanally, M., Trumpis, M., Wang, C., Chiang, C. H., Woods, V., Palopoli-Trojani, K., … Viventi, J. (2016). A low-cost, multiplexed ECoG system for high-density recordings in freely moving rodents. Journal of Neural Engineering, 13(2), 026030.
Mazzucato, L., Fontanini, A., & La Camera, G. (2016). Stimuli reduce the dimensionality of cortical activity. Frontiers in Systems Neuroscience, 10, 11.
Młynarski, W., & McDermott, J. H. (2018). Learning midlevel auditory codes from natural sound statistics. Neural Computation, 30(3), 631–669.
Pasley, B. N., David, S. V., Mesgarani, N., Flinker, A., Shamma, S. A., Crone, N. E., … Chang, E. F. (2012). Reconstructing speech from human auditory cortex. PLoS Biology, 10(1), e1001251.
Sadtler, P. T., Quick, K. M., Golub, M. D., Chase, S. M., Ryu, S. I., Tyler-Kabara, E. C., … Batista, A. P. (2014). Neural constraints on learning. Nature, 512(7515), 423.
Stosiek, C., Garaschuk, O., Holthoff, K., & Konnerth, A. (2003). In vivo two-photon calcium imaging of neuronal networks. Proceedings of the National Academy of Sciences, 100(12), 7319–7324.
Williamson, R. C., Cowley, B. R., Litwin-Kumar, A., Doiron, B., Kohn, A., Smith, M. A., & Byron, M. Y. (2016). Scaling properties of dimensionality reduction for neural populations and network models. PLoS Computational Biology, 12(12), e1005141.
Yamins, D. L., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: music and speech. Trends in Cognitive Sciences, 6(1), 37–46.