N^2 Statistic

The script below generates the N^2 statistic using the maximum correlation coefficient classifier as a function of the number of neurons used in the analysis. The script iterates from the minimum to the maximum number of neurons present in the given file and, using the Parallel Computing Toolbox in MATLAB, runs the analysis with 25 resample runs. The algorithm for computing the N^2 statistic is similar to Cell Population Analysis, with minor differences.

For more information about the N^2 statistic and its applications, please refer to the paper by Meyers et al., 'Incorporation of new information into prefrontal cortical activity after learning working memory tasks' (2012).

The analysis is divided into three sections; please use the table of contents to browse through this page.

  1. Generate decoding result files to be used as null distribution files.
  2. Use the p_value object to generate the p-values using a real decoding result file and the files generated in Step 1.
  3. Run the ANOVA and plot the p-value of each neuron at the latency when it has its maximal selectivity.

Note

This section assumes that the reader is familiar with the basic workings of the Neural Decoding Toolbox (NDT). For further information, please see the official website. Additionally, please refer to the NDT Components.

Generate Decoding Result File

Adding to Path

Before the script can be run, the toolbox must be added to the MATLAB path.

% add the path to the NDT so add_ndt_paths_and_init_rand_generator can be called
toolbox_basedir_name = '/Users/UserName/Matlab/NDT_Toolbox/';
addpath(toolbox_basedir_name);
 
% add the NDT paths using add_ndt_paths_and_init_rand_generator
add_ndt_paths_and_init_rand_generator

Initialization

Some directory paths are set to tell the script where to locate and store files. The 'iFILENAME' variable is passed in from a master script that retrieves the names of the files in a directory.

homeDir = pwd;
dataToClassifyDir = '/Volumes/DATA_21/DEC_MAT_RES_TOCLASSIFY_FINAL';
CLASSIFIED = '/Volumes/DATA_21/DEC_MAT_RES_BASIC_DS_CLASSIFIED_PopSize_MaxN';
 
% --- LOAD FILES
cd(CLASSIFIED)
mkdir(iFILENAME)
cd(dataToClassifyDir);
load(iFILENAME)

Some Preprocessing

This script finds the number of times each stimulus condition was repeated, as this is useful for determining the number of cross-validation splits that may be used. Ideally, the number of cross-validation (CV) splits should be large, but a smaller CV size may be used if computation time is critical.

num_sites_with_k_repeats = zeros(1, 65); % preallocate for speed
for k = 1:65
    inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.value_highlow, k);
    num_sites_with_k_repeats(k) = length(inds_of_sites_with_at_least_k_repeats);
end
clear inds_of_sites_with_at_least_k_repeats k
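As an illustration of how these counts can guide the choice of CV split size, one could pick the largest number of repetitions for which enough sites remain. This is a sketch only; the minimum-site threshold of 20 is an assumption, not part of the original script.

```matlab
% Illustrative sketch: choose the largest k such that at least min_sites
% sites have k repetitions of every label. min_sites is an assumed threshold.
min_sites = 20;
candidate_ks = find(num_sites_with_k_repeats >= min_sites);
possible_num_cv_splits = max(candidate_ks); % largest feasible split count
```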

In case the program terminates prematurely, a function 'findRemaining' was written to resume the analysis on the remaining cell population.

cd(homeDir)
toDecode = findRemaining(CLASSIFIED,iFILENAME,length(binned_data));
cd(homeDir)

Specify the label name to look for in our data set file.

specific_label_name_to_use = 'value_highlow';

Choose a suitable CV split size based on the 'k' value obtained earlier. A larger split size may give better results at the cost of computation time.

num_cv_splits = 20;

Parallel Computing

We now run the script in MATLAB using the parfor command. The maximum number of workers that may be used simultaneously depends on the MATLAB license as well as the number of processors available.

parfor AA = 1:length(toDecode) %Cycle through the different population Sizes each time!

Create a Basic Datasource

The datasource created below is used to generate k leave-one-fold-out cross-validation splits of the data; we specified k = 20 above. The 'sites_to_use' parameter limits which sites are randomly selected for the decoding analysis; in this case, we select one site at a time to run the N^2 analysis. In addition, the 'randomly_shuffle_labels_before_running' argument is set to 1 so that we can generate the null distribution files needed for the p_value object in the next section. The inner loop 'for I' is used to generate at least 10 shuffled runs; in our example, we use 9 of these as null distribution files and the remaining one as the 'real decoding' file for our analysis.

cd(dataToClassifyDir)
ds = basic_DS(iFILENAME, specific_label_name_to_use, num_cv_splits);
ds.sites_to_use = AA; % creates a random pseudopopulation with the given number of sites
 
for I = 1:10
    ds.randomly_shuffle_labels_before_running = 1; % needed to generate the null distributions for the p-values

Feature Pre-processor

Preprocessing is applied using one of the available feature preprocessors. In the following example script, zscore_normalize_FP is applied to account for neurons having different ranges of firing rates. See Feature Preprocessors on the readout.info website for more information.

fp = {}; 
fp{1}= zscore_normalize_FP;
The z-score normalization applied to each neuron is z = (x − μ) / σ, where μ and σ are the mean and standard deviation of that neuron's firing rates estimated from the training data, so that all neurons are put on a comparable scale.

Choosing a Classifier

The max correlation coefficient classifier is used in this script. For a detailed description of the classifiers see Classifiers.

the_classifier = max_correlation_coefficient_CL;

Cross Validator

The cross-validator runs a cross-validation decoding analysis by training and testing the classifier on data obtained from the datasource, after it has optionally been passed through the feature preprocessors. In our case, we use the zscore_normalize feature preprocessor. The scheme is run for a total of 25 resample runs.

the_cross_validator = standard_resample_CV(ds, the_classifier, fp);
the_cross_validator.num_resample_runs = 25;
the_cross_validator.confusion_matrix_params.create_confusion_matrix = 1;

Run the Cross Validator

DECODING_RESULTS = the_cross_validator.run_cv_decoding;

Save Results to File

The results are saved by calling a function 'iSaveX', which is necessary for handling file writes in a parallel computing environment.

save_file_name = strcat('#N','_',int2str(AA),'_',int2str(I),'_RESULTS');
 
cd(homeDir)
 
iSaveX(save_file_name, DECODING_RESULTS, homeDir, CLASSIFIED, iFILENAME);
end

P Value Object

In order to generate the p-values, the Neural Decoding Toolbox requires that the null distribution files be in one directory and that the real decoding file be located elsewhere. Thus we begin by sorting our files into directories, where each directory name corresponds to one of the sites used. The real decoding result files are labeled according to the number of the site they come from and are stored in the root directory, alongside the null distribution folders.

Please refer to the file 'ExtractAndPlot.m' in the N2_Statistic directory for the code below.

The generated folder structure is as follows:

//Root Folder

   //1
      //Null distribution files for site 1
   //2
      //Null distribution files for site 2
   .
   .
   .
   Real_Decoding_Result_File_Site#.mat
   .
   .
   .

Program Structure

The program layout may seem awkward to someone who has not seen this implementation before. Future revisions may be needed to remedy the structure shown below.

Extraction Script.m –> ExtractandPlot.m –> Run_Anova.m
The main script is the one called by the user. It sets up the correct path names of the input and output files; the 'rootDir' variable refers to the folder containing the decoding results generated in the previous section. The files are then sorted into real decoding result files and null distribution file folders, and the Run_Anova function is called on the original decoding results. Finally, the data is written to a file once the latency time and 'ehta_value' have been extracted. The analysis of variance that was run will be explained further in a future revision of this wiki.

Sorting Files and Folders

The code block above shows the file structure generated by the code below. As stated earlier, this structure is necessary to run the p_value object script.

% Go into the directory ex(250_colloc..etc)
cd(dirName)
list = dir('*.mat');
 
%% Sort the files
sourcePath = pwd; % use absolute paths so no mistakes are made
for i = 1:length(list)
    if list(i).isdir == 0 % ensure it is a file that we are dealing with
        name = list(i).name;
        k = strfind(name, '_');
        % Now that we know the neuron number, we can start moving these files
        sdirName = name(k(1)+1:k(2)-1); % first token indicates the neuron number
        %% Create the directory
        destPath = [sourcePath '/' sdirName];
        if ~exist(destPath, 'dir')
            mkdir(destPath)
        end
        %% Move the files except for the first one
        temp = str2num(name(k(2)+1:k(3)-1));
        if temp ~= 1
            movefile(name, destPath);
            fprintf('Moving %s\n', name);
        end
    end
end

Preindexing of files

Once we have sorted the files into their correct folders, we simply re-read the folder structure so we can find the array index corresponding to a given file. The variable 'indexFiles' maps each neuronal site number to its array index in the file list.

%% Preindex all the files so that we know where each neuron file is
list = dir('*.mat');
indexFiles = [];
for i = 1:length(list)
    if list(i).isdir == 0 % files only
        name = list(i).name;
        k = strfind(name, '_');
        temp = str2num(name(k(1)+1:k(2)-1));
        indexFiles = [indexFiles; i temp];
    end
end
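As a usage sketch of this index (the site number 5 is illustrative, not from the original script), the real decoding result file for a given site can be recovered as follows. Note that the first column of 'indexFiles' indexes into the dir('*.mat') listing.

```matlab
% Illustrative lookup: find the real decoding result file for one site.
siteNum = 5;                            % assumed example site number
row = find(indexFiles(:,2) == siteNum); % row of this site in the index
matFiles = dir('*.mat');
realDecodingFile = matFiles(indexFiles(row,1)).name;
```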

Run the P_ValueObject

Before we run this algorithm, we first set some parameters: the initial directory number and the structure of the output decoding results file.

dirNum = 0;
DECODING_RESULTS_STRUCT = struct('p_values',{[]}, 'null_distributions',{[]}, 'PVALUE_PARAMS',{[]}, 'latency_time',{[]}, 'latency_ind',{[]});
 
list = dir;
totalFiles = length(dir('*.mat'));
global count;
count = 0;
 

Now we can continue and run the actual script for generating the p-values.

for i = 1:length(list)
    if list(i).isdir == 1 % only look inside directories
        [dirNum, status] = str2num(list(i).name); % get the directory number
        % Ensures only the relevant directories are loaded (skips '.', '..', etc.)
        if status == 0
            continue
        end
 
        % Find the corresponding file that will be used for the
        % 'null distribution' permutation test.
 
        %% Find the row of this site in the index
        fileNameNum = find(indexFiles(:,2) == dirNum);
 
        count = count + 1;
        fprintf('Currently working on file %d of %d\n', count, totalFiles);
 
        %% Find the corresponding filename
        % Look the file up in the *.mat listing; indexFiles(:,1) stores the
        % index into dir('*.mat'), not into the full directory listing.
        matFiles = dir('*.mat');
        fileName = matFiles(indexFiles(fileNameNum,1)).name;
        fprintf('File: %s. dirNum: %d\n', fileName, dirNum);
        null_distribution_directory_name = [int2str(dirNum) '/'];
 
        % Start the pvalue_object
        pval_obj = pvalue_object(fileName, null_distribution_directory_name);
 
        %% Create the p-values
        [p_values, null_distributions, PVALUE_PARAMS] = pval_obj.create_pvalues_from_nulldist_files;
 
        %% Save this information to the results struct
        DECODING_RESULTS_STRUCT(fileNameNum).p_values = p_values;
        DECODING_RESULTS_STRUCT(fileNameNum).null_distributions = null_distributions;
        DECODING_RESULTS_STRUCT(fileNameNum).PVALUE_PARAMS = PVALUE_PARAMS;
        [~, B] = min(p_values);
        DECODING_RESULTS_STRUCT(fileNameNum).pTime = B; % time bin with the smallest p-value
    end
end

Note that after running this script, nothing has been saved to disk; everything is still in memory. This is not recommended given the long computation time required: if the program terminates prematurely, data may be lost. For future implementations it is recommended that the user place a 'save point' here.
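A minimal sketch of such a save point (the checkpoint file name is an assumption for illustration) would simply write the results struct to disk after the loop:

```matlab
% Illustrative save point: persist the intermediate p-value results so a
% premature termination does not lose them. The file name is an assumption.
save('pvalue_checkpoint.mat', 'DECODING_RESULTS_STRUCT', '-v7.3')
```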

Running the ANOVA

The analysis of variance itself is run by a separate function. More information about it will be added in a future revision of this wiki.

anovaDir = '/Volumes/DATA_21/DEC_MAT_RES_TOCLASSIFY_FINAL/';
cd(rootDir)
DECODING_RESULTS = Run_Anova(outputName);

Finally, the data is stored to file.

latency_time = [];
ehta_value = [];
for i = 1:length(DECODING_RESULTS_STRUCT)
    % 1) Find the smallest p-value for the particular neuron and its
    %    corresponding time
    latency_time(i) = DECODING_RESULTS_STRUCT(i).pTime;
    % 2) Extract the N^2 statistic
    ehta_value(i) = DECODING_RESULTS.explainedVariance(i);
end
 
cd(outputDir)
save(outputName, 'latency_time', 'ehta_value')
fprintf('Successfully completed: %s\n', outputName);