Effects of Data Distributions and Distance Measures in Representational Similarity Analysis
Name
Savelii Vorontcov
Abstract
Representational Similarity Analysis (RSA) is an analysis technique often used in Computational Neuroscience.
Applied to measured brain data, it yields representations of various stimuli in the brain and allows these representations to be compared across brain regions, species, and measurement modalities.
Comparing data gathered with different modalities is particularly challenging in neuroscience because it requires a mapping between the modalities in question, which in some cases is ill-defined.
Comparing brain-activity data with computational or behavioral models can be even more challenging. RSA addresses all of these issues.
This study addresses one question that arises: how much are linear correlations distorted by applying RSA?
We consider in detail how correlations between two arrays of underlying data influence correlations between corresponding representations after applying RSA.
Results show that in all cases rank correlations in the processed data are lower than or equal to linear correlations in the initial data. This effect is particularly noticeable for intermediate values of linear correlation (0.3-0.6).
The implication is that RSA underestimates the linear correlations present in the underlying data. In other words, correlations in the initial data tend to be higher than or equal to the ones calculated through RSA.
Since some brain studies involving RSA draw conclusions about the dependence structure in data from correlations between calculated representations, it is useful to know how the real correlation structure gets distorted.
More broadly, this might influence what we consider a "high" or "low" correlation in the context of RSA, and when a correlation is significant enough to conclude that two arrays of data are interdependent.
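The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes Gaussian data, a correlation-distance dissimilarity (1 minus Pearson's r between condition patterns), and Spearman rank correlation between the upper triangles of the two representational dissimilarity matrices. All function names, array sizes and the random seed are hypothetical choices made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """0-based ranks; ties are not averaged (assumed absent for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def rdm_upper(patterns):
    """Upper triangle of a dissimilarity matrix using correlation distance.

    `patterns` is a list of condition patterns (one list of features each).
    Correlation distance is an assumption; other distance measures are possible.
    """
    n = len(patterns)
    return [
        1.0 - pearson(patterns[i], patterns[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def correlated_pair(rho, n_conditions, n_features, rng):
    """Two Gaussian arrays whose entries have population correlation `rho`."""
    x = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_conditions)]
    noise_scale = math.sqrt(1.0 - rho ** 2)
    y = [[rho * v + noise_scale * rng.gauss(0, 1) for v in row] for row in x]
    return x, y


rng = random.Random(0)  # fixed seed, chosen arbitrarily
for rho in (0.0, 0.3, 0.6, 0.9, 1.0):
    x, y = correlated_pair(rho, n_conditions=20, n_features=50, rng=rng)
    linear = pearson([v for row in x for v in row], [v for row in y for v in row])
    rsa = spearman(rdm_upper(x), rdm_upper(y))
    print(f"target rho={rho:.1f}  linear r={linear:.3f}  RSA rank r={rsa:.3f}")
```

Each printed line pairs the linear correlation of the raw arrays with the rank correlation of their dissimilarity matrices, so the two quantities discussed above can be compared directly for one random draw. A single draw is noisy; averaging over many draws would be needed before reading anything into the gap.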
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Raul Vicente Zafra
Defence year
2024