[519993 Fund] Spearman's rank correlation coefficient (斯皮尔曼秩相关系数)

Spearman's rank correlation coefficient

From Wikipedia, the free encyclopedia


In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a non-parametric measure of correlation: it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables.

Contents

1 Calculation
2 Example
3 Determining significance
4 Correspondence analysis based on Spearman's rho
5 See also
6 Notes
7 References
8 External links

Calculation

In principle, ρ is simply a special case of the Pearson product-moment coefficient in which two sets of data X_i and Y_i are converted to rankings x_i and y_i before calculating the coefficient.[1] In practice, however, a simpler procedure is normally used to calculate ρ. The raw scores are converted to ranks, and the differences d_i between the ranks of each observation on the two variables are calculated.

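If there are no tied ranks, i.e.

$$X_i \neq X_j \ \text{and}\ Y_i \neq Y_j \quad \text{for all } i \neq j,$$

then ρ is given by:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where:

d_i = x_i − y_i = the difference between the ranks of corresponding values X_i and Y_i, and

n = the number of values in each data set (same for both sets).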
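The shortcut calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `spearman_rho_no_ties` and the toy data are hypothetical, ranks are assigned in ascending order (rank 1 = smallest value), and the function refuses input that contains ties because the formula does not apply there.

```python
def spearman_rho_no_ties(xs, ys):
    """Spearman's rho via the shortcut formula; valid only without ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("tied values present; the shortcut formula does not apply")

    def ranks(values):
        # Rank 1 goes to the smallest value, rank n to the largest.
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for position, i in enumerate(order, start=1):
            result[i] = position
        return result

    rank_x, rank_y = ranks(xs), ranks(ys)
    # d_i is the difference between the two ranks of observation i.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Toy data (made up for illustration): a mostly increasing relationship.
print(spearman_rho_no_ties([10, 20, 30, 40], [1, 3, 2, 4]))
```

A perfectly increasing relationship gives ρ = 1 and a perfectly decreasing one gives ρ = −1, which is a quick sanity check for any implementation.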

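If tied ranks exist, classic Pearson's correlation coefficient between ranks has to be used instead of this formula:[1]

$$\rho = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$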

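One has to assign the same rank to each of the equal values. It is the average of their positions in the sorted order of the values (ascending or descending, as long as the same direction is used for both variables):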

An example of averaging ranks

In the table below, notice how the rank of values that are the same is the mean of what their ranks would otherwise be.

Variable X_i   Position in the descending order   Rank x_i
0.8            5                                  5
1.2            4                                  (4 + 3) / 2 = 3.5
1.2            3                                  (4 + 3) / 2 = 3.5
2.3            2                                  2
18             1                                  1

In this case we cannot use the shortcut formula (because of the tied ranks in the data) and must use the second, product-moment form.
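A minimal Python sketch of the tie-aware calculation follows. The helper names `average_ranks`, `pearson` and `spearman_rho` are hypothetical. The sketch ranks in ascending order (rank 1 = smallest), which is the opposite direction from the table above; the resulting ρ does not depend on the direction as long as both variables are ranked the same way. It also assumes that neither variable is constant, since the Pearson denominator would then be zero.

```python
from math import sqrt


def average_ranks(values):
    """Ascending ranks; tied values share the mean of their positions."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend the group [start, end] over all values equal to the first one.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_position = (start + end) / 2 + 1  # positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# The tied values 1.2 and 1.2 share the mean of positions 2 and 3.
print(average_ranks([0.8, 1.2, 1.2, 2.3, 18]))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```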

Example

The raw data used in this example is shown below, where we want to calculate the correlation between a person's IQ and the number of hours that person spends in front of the TV per week.

IQ, X_i   Hours of TV per week, Y_i
106       7
86        0
100       27
101       50
99        28
103       29
97        20
113       12
112       6
110       17

The first step is to sort this data by the second column (Y_i). Next, two more columns are created (x_i and y_i). The last of these columns (y_i) is assigned 1, 2, 3, ..., n, and then the data is sorted by the first original column (X_i). The first of the newly created columns (x_i) is assigned 1, 2, 3, ..., n. Then a column d_i is created to hold the differences between the two rank columns (x_i and y_i). Finally, another column should be created. This is just column d_i squared.

After doing this process with the example data you should end up with something like:

IQ, X_i   Hours of TV per week, Y_i   rank x_i   rank y_i   d_i   d_i^2
86        0                           1          1          0     0
97        20                          2          6          -4    16
99        28                          3          8          -5    25
100       27                          4          7          -3    9
101       50                          5          10         -5    25
103       29                          6          9          -3    9
106       7                           7          3          4     16
110       17                          8          5          3     9
112       6                           9          2          7     49
113       12                          10         4          6     36

The values in the d_i^2 column can now be added to find $\sum d_i^2 = 194$. The value of n is 10. So these values can now be substituted back into the equation,

$$\rho = 1 - \frac{6 \times 194}{10(10^2 - 1)}$$

which evaluates to ρ = −0.175758, which shows that the correlation between IQ and hours spent in front of the TV is very low (barely any correlation). In the case of ties in the original values, this formula should not be used. Instead, the Pearson correlation coefficient should be calculated on the ranks (where ties are given ranks, as described above).
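The arithmetic of this example can be checked with a short Python script. It is only a sketch: the variable names are made up here, and the simple `ranks` helper relies on the fact that this particular data set has no ties.

```python
# Raw data from the example table, in the original (unsorted) order.
iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv_hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]


def ranks(values):
    # Position in the ascending order, 1-based. Valid only without ties.
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_x = ranks(iq)
rank_y = ranks(tv_hours)

# Squared rank differences, one per person.
d_squared = [(x - y) ** 2 for x, y in zip(rank_x, rank_y)]
sum_d2 = sum(d_squared)

n = len(iq)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2)         # 194
print(round(rho, 6))  # -0.175758
```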

Determining significance

The modern approach to testing whether an observed value of ρ is significantly different from zero (we will always have 1 ≥ ρ ≥ −1) is to calculate the probability that it would be greater than or equal to the observed ρ, given the null hypothesis, by using a permutation test. This approach is almost always superior to traditional methods, unless the data set is so large that computing power is not sufficient to generate permutations, or unless an algorithm for creating permutations that are logical under the null hypothesis is difficult to devise for the particular case (but usually these algorithms are straightforward).
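One way such a permutation test could look in Python is sketched below. The function name `exact_permutation_p_value` is hypothetical, and the sketch makes several simplifying assumptions: there are no ties, the test is one-sided (it counts permutations whose ρ is greater than or equal to the observed one), and every permutation is enumerated, which is only feasible for small n because the number of permutations is n!. For larger samples a random subset of permutations would have to be drawn instead. Because ρ decreases as the sum of squared rank differences grows, the code compares those integer sums directly rather than floating-point values of ρ.

```python
from itertools import permutations


def ranks(values):
    """Ascending ranks, 1-based; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def exact_permutation_p_value(xs, ys):
    """One-sided p-value: share of permutations with rho >= observed rho."""
    rank_x, rank_y = ranks(xs), ranks(ys)
    observed = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    hits = 0
    total = 0
    # Under the null hypothesis every pairing of the y ranks with the
    # x ranks is equally likely, so each permutation counts once.
    for shuffled in permutations(rank_y):
        total += 1
        sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, shuffled))
        if sum_d2 <= observed:  # smaller sum of d^2 means larger rho
            hits += 1
    return hits / total


# Toy data (made up): a perfect increasing relationship among 4 points.
print(exact_permutation_p_value([1, 2, 3, 4], [10, 20, 30, 40]))
```

With four perfectly ordered points only one of the 24 permutations reaches the observed ρ, so the p-value is 1/24.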

Although the permutation test is often trivial to perform for anyone with computing resources and programming experience, traditional methods for determining significance are still widely used. The most basic approach is to compare the observed ρ with published tables for various levels of significance. This is a simple solution if the significance only needs to be known within a certain range or less than a certain value, as long as tables are available that specify the desired ranges. A reference to such a table is given below. However, generating these tables is computationally intensive, and complicated mathematical tricks have been used over the years to generate tables for larger and larger sample sizes, so it is not practical for most people to extend existing tables.

An alternative approach available for sufficiently large sample sizes is an approximation to the Student's t-distribution with n − 2 degrees of freedom. For sample sizes above about 20, the variable

$$t = \frac{\rho}{\sqrt{(1 - \rho^2)/(n - 2)}}$$

has a Student's t-distribution in the null case (zero correlation). In the non-null case (i.e. to test whether an observed ρ is significantly different from a theoretical value, or whether two observed ρs differ significantly) tests are much less powerful, though the t-distribution can again be used.
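The t variable is straightforward to compute. The following Python sketch uses a hypothetical function name, `spearman_t_statistic`, and stops at the statistic itself: turning it into a p-value needs the Student's t-distribution with n − 2 degrees of freedom, which is not in the Python standard library, so the result would be compared against a t table or passed to a statistics package.

```python
from math import sqrt


def spearman_t_statistic(rho, n):
    """t = rho / sqrt((1 - rho^2) / (n - 2)), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need n > 2 for n - 2 degrees of freedom")
    if abs(rho) >= 1:
        raise ValueError("the statistic is undefined for |rho| = 1")
    return rho / sqrt((1 - rho ** 2) / (n - 2))


# Illustrative values only (not taken from real data): rho = 0.5 with n = 27.
print(spearman_t_statistic(0.5, 27))
```

The sign of t follows the sign of ρ, and for a fixed ρ the magnitude of t grows with the sample size.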

A generalization of the Spearman coefficient is useful in the situation where there are three or more conditions, a number of subjects are all observed in each of them, and we predict that the observations will have a particular order. For example, a number of subjects might each be given three trials at the same task, and we predict that performance will improve from trial to trial. A test of the significance of the trend between conditions in this situation was developed by E. B. Page and is usually referred to as Page's trend test for ordered alternatives.
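As a rough illustration of the idea, Page's L statistic can be computed as follows: each subject's scores are ranked across the conditions, the ranks are summed per condition, and each rank sum is weighted by the condition's predicted position. The Python sketch below uses a hypothetical function name, `page_l_statistic`, assumes the columns are already arranged in the predicted order (lowest expected score first), and assumes no ties within a subject. It computes only the statistic; judging significance needs critical-value tables or an approximation, which are not shown.

```python
def page_l_statistic(data):
    """Page's L for rows = subjects, columns = conditions in predicted order."""
    k = len(data[0])
    rank_sums = [0] * k
    for row in data:
        if len(row) != k:
            raise ValueError("every subject needs one score per condition")
        if len(set(row)) != k:
            raise ValueError("ties within a subject are not handled here")
        # Rank this subject's scores across conditions (1 = lowest score).
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    # Weight each condition's rank sum by its predicted position 1..k.
    return sum((j + 1) * rank_sums[j] for j in range(k))


# Made-up scores: three subjects, three trials, each improving every trial.
scores = [
    [1, 2, 3],
    [2, 5, 9],
    [0, 1, 4],
]
print(page_l_statistic(scores))  # 42
```

Larger values of L indicate closer agreement with the predicted order; 42 is the largest value possible for three subjects and three conditions.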

Correspondence analysis based on Spearman's rho

Classic correspondence analysis is a statistical method which gives a score to every value of two nominal variables, in such a way that Pearson's correlation coefficient between them is maximized.

There exists an equivalent of this method, called grade correspondence analysis, which maximizes Spearman's rho or Kendall's tau.[2]

See also


Kendall tau rank correlation coefficient

Rank correlation

Chebyshev's sum inequality, rearrangement inequality (These two articles may shed light on the mathematical properties of Spearman's ρ.)

Pearson product-moment correlation coefficient, a similar correlation method that instead relies on the data being linearly correlated.

Notes

[1] Myers, Jerome L.; Arnold D. Well (2003). Research Design and Statistical Analysis, second edition, Lawrence Erlbaum, p. 508. ISBN 0805840370.

[2] Kowalczyk, T.; Pleszczyńska E., Ruland F. (eds.) (2004). Grade Models and Methods for Data Analysis with Applications for the Analysis of Data Populations, Studies in Fuzziness and Soft Computing vol. 151. Berlin Heidelberg New York: Springer Verlag. ISBN 9783540211204.

References

Originally published on the Quant Forum (宽客论坛).
Published 2024-01-30 22:01:24