Accepted for presentation at DAC 2000

Communication Architecture Tuners: A Methodology for the Design of High-Performance Communication Architectures for System-on-Chips

Kanishka Lahiri, Anand Raghunathan, Ganesh Lakshminarayana, and Sujit Dey

Dept. of Electrical Engg., Univ. of California, San Diego, CA

C&C Research Labs, NEC USA, Princeton, NJ

Abstract

In this paper, we present a general methodology for the design of custom system-on-chip communication architectures. Our technique is based on the addition of a layer of circuitry, called the Communication Architecture Tuner (CAT), around any existing communication architecture topology. The added layer renders the system capable of adapting to the changing communication needs of its constituent components. For example, more critical data may be handled differently, leading to lower communication latencies. The CAT monitors the internal state of, and communication transactions generated by, each system component, and “predicts” the relative importance of communication transactions in terms of their impact on different system-level performance metrics. It then configures the protocol parameters of the underlying communication architecture (e.g., priorities, DMA modes, etc.) to best suit the system’s changing communication needs.

We illustrate issues and tradeoffs involved in the design of CAT-based communication architectures, and present algorithms to automate the key steps. Experimental results indicate that performance metrics (e.g., number of missed deadlines, average processing time) for systems with CAT-based communication architectures are significantly (sometimes, over an order of magnitude) better than those with conventional communication architectures.

1 Introduction

The evolution of the System-on-Chip (SOC) paradigm in electronic system design has the potential to offer the designer several benefits, including improvements in system cost, size, performance, power dissipation, and design turn-around-time. The ability to realize this potential depends on how well the designer exploits the customizability offered by the system-on-chip approach. While one dimension of this customizability is manifested in the diversity and configurability of the components that are used to compose the system (e.g., processor and domain-specific cores, peripherals, etc.), another, equally important, aspect is the customizability of the system communication architecture. In order to support the increasing diversity and volume of on-chip communication requirements, while meeting stringent performance constraints and power budgets, communication architectures need to be customized to the target system or application domain in which they are used.

1.1 Paper Overview and Contributions

In this paper, we present a general methodology for the design of custom system-on-chip communication architectures, which are flexible and capable of adapting to varying communication needs of the system components. Our technique can be used to optimize any underlying communication architecture topology by rendering it capable of adapting to the changing communication needs of the components connected to it. For example, more critical data may be handled differently, leading to lower communication latencies. This results in significant improvements in various quality of service (QoS) metrics, including the overall system performance, observed communication bandwidth and bus utilization, and the system’s ability to meet critical deadlines. Our technique is based on the addition of a layer of circuitry, called the Communication Architecture Tuner (CAT), to each component. The CAT monitors and analyzes the internal state of, and communication transactions generated by, a system component and “predicts” the relative importance of communication transactions in terms of their impact on different system-level performance metrics. The results of the analysis are used by the CAT to configure the parameters of the underlying communication architecture to best suit the component’s changing communication needs.

We motivate the need for CAT-based communication architectures by analyzing example systems and scenarios in which no static customization of the protocols can completely satisfy the system’s time-varying communication requirements. We illustrate the issues and tradeoffs involved in the design of CAT-based communication architectures, and demonstrate that the hardware implementation complexity of the CAT needs to be considered in order to maximally exploit the potential for performance improvements. We present a general methodology and algorithms for the design of CAT-based SOC communication architectures. Given a system with a defined communication architecture topology, typical input traces, and target performance metrics, our algorithms determine optimized communication protocols for the various channels/buses in the system, and an efficient hardware implementation in the form of CATs which are connected in between each component and the communication architecture. Experimental results for several example systems, including an ATM switch port scheduler and a TCP/IP Network Interface Card subsystem, indicate that performance metrics (e.g., number of missed deadlines, average or aggregate processing time, etc.) for systems with CAT-based communication architectures are significantly (sometimes over an order of magnitude) better than systems with well-optimized conventional communication architectures. In summary:

CAT-based communication architectures can extend the power of any underlying communication architecture. The timing behavior presented by a CAT-based communication architecture to each component connected to it (such as communication latency and bandwidth) is better customized to, and varies according to, the component’s needs. This results in significantly improved system performance.

The presented CAT design methodology trades off sophistication of the communication architecture protocols against the complexity of (and hence, the overhead incurred by) the added hardware.

In several cases, the use of CAT-based communication architectures can result in systems that significantly outperform those based on any static customization of the protocol parameters.

1.2 Related Work

We examine related work in the fields of system-level design, HW/SW co-design, and networking protocols, and place our work in the context of previous work in those fields. There has been a large body of work on system-level synthesis of application-specific architectures through HW/SW partitioning and mapping of the application tasks onto pre-designed cores and application-specific hardware [1, 2, 3, 4, 5, 6, 7, 8, 9]. While some of these techniques attempt to consider the impact of communication effects during HW/SW partitioning and mapping, they either assume a fixed communication protocol (e.g., PCI-based buses), or select from a “communication library” of a few alternative protocols. Research on system-level synthesis of communication architectures [10, 11, 12, 13] mostly deals with synthesis of the communication architecture topology, which refers to the manner in which components are structurally connected through dedicated links or shared communication channels (buses). While topology selection is a critical step in communication architecture design, equally important is the design of the protocols used by the channels/buses in the selected topology. For example, the nature of communication traffic generated by the system components may favor the use of a time-slice based bus protocol [14] in some cases, and a static priority based protocol [15] in others. The VSI Alliance on-chip bus working group [15] has recognized that a multitude of bus protocols will be needed in order to serve the wide range of SOC communication requirements. Further, most protocols offer the designer avenues for customization in the form of parameters such as arbitration priorities, transfer block sizes, etc. Choosing appropriate values for these parameters can significantly impact the latency and transfer bandwidth associated with inter-component communication. Finally, there is a body of work on interface synthesis [16, 17, 18, 19, 20, 21, 22, 23], which deals with automatically generating efficient hardware implementations for component-to-bus or component-to-component interfaces. These techniques address issues in the implementation of specified protocols, and not in the customization of the protocols themselves.

In summary, we believe that previous work in the field of system-level design and HW/SW co-design does not adequately address the problem of customizing the protocols used in SOC communication architectures to the needs of the application. Another characteristic of previous research is that the design of the communication architecture is performed statically using information about the application and its environment (e.g., typical input traces). In several applications, the communication bandwidth required by each component, the amount of data it needs to communicate, and the relative “importance” of each communication request, may be subject to significant dynamic variations. As shown later in this paper, in such situations, protocols used in conventional communication architectures may not be capable of adapting the underlying communication topology to meet the application’s varying needs.

In the field of telecommunications and networking protocol design, a significant body of research has been devoted to the design of protocols to meet diverse QoS parameters such as connection establishment delay and failure probability, throughput, residual error ratio, etc. [24]. Sophisticated techniques such as flow and traffic control algorithms have been proposed in that context for adapting the protocol to improve the above mentioned metrics. With increasing complexity, system-on-chip communication architectures will need to evolve by drawing upon some of the techniques that have been developed in the context of telecom networks. However, there are significant differences such as the latency requirements, error tolerance and resilience requirements, which differentiate the problem we are addressing from the problems encountered in telecom network protocol design.

2 Communication Architecture Tuners: Introduction and Design Issues

In this section, we first motivate the need for CAT-based communication architectures by showing how the limited flexibility of conventional communication architectures, and their inability to adapt to the varying communication needs of the system components, can lead to significant deterioration in the system’s performance. We then introduce CAT-based communication architectures and show how they address the above mentioned drawbacks. Finally, we discuss the key issues and tradeoffs involved in a CAT-based communication architecture design methodology.

Example 1: Consider the example system shown in Figure 1 that represents part of the TCP/IP communications protocol used in a network interface card (we henceforth refer to this system as the TCP system). The system shown in Figure 1 performs checksum-based encoding (for outgoing packets) and error detection (for incoming packets), and interfaces with the Ethernet controller peripheral (which implements the physical and link layer network protocols). Since packets in the TCP protocol do not contain any notion of quality of service (QoS) [24], we have enhanced the packet data structure to contain a field in the header that indicates a deadline for the packet to be processed. We assume that the objective during the implementation of the system is to minimize the number of packets with missed deadlines.

[Figure 1: TCP system from a network interface card: (a) Specification and (b) Implementation utilizing a conventional bus-based communication architecture. Panel (a) shows the processes ETHER_DRIVER, PKT_QUEUE, IP_CHECK and CHECKSUM between the network and the application layer; panel (b) shows a MIPS R3000, IP_CHECK and CHECKSUM hardware, shared memory, bus interfaces and an arbiter, with protocol parameters Priority, DMA_mode, DMA_size, ...]

Figure 1(a) shows the behavior of the TCP system as a set of concurrent communicating tasks or processes. We explain the tasks performed by the TCP system for a packet received by the system from the network. The process ether_driver […] pkt_queue maintains a queue containing selected information from the packet headers. Process ip_check […] checksum retrieves the packet from the shared memory and computes the checksum value for each packet and returns the value to the ip_check […] ether_driver and pkt_queue […] ip_check and checksum processes are implemented using dedicated hardware. All communications between the system components are implemented using a shared bus. The protocol used in the shared bus supports static priority based arbitration and DMA-mode transfer. The bus arbiter and the bus interfaces of the components together implement the bus protocol. The bus protocol allows the system designer to specify values for various parameters such as the bus priorities and DMA block size for each component, etc.

We analyzed the performance of the TCP system of Figure 1 for several distinct values of the bus protocol parameters. For this experiment, for ease of explanation, we varied only the bus priority values for each component, with fixed values for the remaining protocol parameters. The system simulation was performed using traces of packets with varying laxities of deadlines. An abstract view of the execution of the TCP system processing four packets (numbered […]) is shown in Figure 2. The figure indicates the times at which each packet arrives from the network, and the deadline by which it needs to be processed. Note that while the arrival times of the packets are in the order […] and […], the deadlines are in a different order […]. For the sake of our illustration, we focus on two different bus priority assignments ([…]). While we do not explicitly consider other priority assignments here, it can be shown that the arguments we present for one of the above two cases will hold for every other priority assignment.

The first waveform in Figure 2 represents the execution of the system when the bus priority assignment […] is used. After the completion of the […] requests bus access to process packet […]. This effectively […], while […] delays the processing of packet […] until […] is able to process it without waiting for packet […] to complete. This results in the deadlines for both packets […] and […] being met. However, let us consider packets […] and […] whose deadlines are in the same order as their arrival times. After process […] for packet […], and process […]. This delays the execution of process […]

The deficiency of the communication architecture that leads to missed deadlines in the TCP example can be summarized as follows. The relative importance of the communication transactions generated by the various system components ([…], and […]) varies depending on the deadlines of the packets they are processing. In general, the importance or criticality of each communication transaction may depend on several factors which together determine whether the communication will be on the system’s critical path. The communication architecture needs to be able to discern between more critical and less critical communication requests and serve them accordingly. As shown in the previous example, conventional communication architectures suffer from the following drawbacks: (i) the degree of customizability offered may be insufficient in systems with stringent performance requirements, and (ii) they are typically not capable of sensing and adapting to the varying communication needs of the system and the varying nature of the data being communicated.

CAT-based communication architectures address the above problems through the use of a hardware layer that adapts the underlying communication architecture according to the changing needs of the various components connected to it. We next show how a CAT-based communication architecture can be used to improve the performance of the TCP system.

Example 2: A CAT-based communication architecture for the TCP system is shown in Figure 3(a). CATs are added to the components that implement the […], and […] processes. Further, the bus control logic (arbiter and component bus interfaces) is enhanced to facilitate the operation of the CATs. A more detailed view of a component with a CAT is shown in Figure 3(b). The component notifies the CAT when it generates communication requests. The CAT also observes selected details about the data being communicated and the component’s internal state.

In this example, the CAT observes the packet size and deadline fields from the header of the packet currently being processed by the component. The CAT performs the following functions: (i) it groups communication events based on the size and deadline of the packet currently being processed, and (ii) for events from each group, it determines an appropriate assignment of values to the various protocol parameters. As a result, the characteristics of the communication architecture (including the time required to perform a communication) are adapted according to the different needs and relative importance of the communication requests. The rationale behind using the deadline is that packets with closer deadlines need to be given higher importance. The rationale behind using the size of the packet is more complex. In cases when all the packets in the system have roughly equal deadlines, it is advantageous to favor the completion of packets which are smaller, since they have a better chance of meeting the deadline.

We used the techniques presented later in this paper to implement the CAT-based TCP system architecture shown in Figure 3. For ease of illustration, the CATs were used to vary only the bus priorities. All other parameters were set to the same values as those used in the architecture of Figure 1. The CAT groups the communication requests generated from a component based on the packet they belong to, and the priority of all communication requests associated with a packet is computed using the formula […] where […], […] and […] represent the packet size, deadline, and arrival time, respectively.

The execution of the optimized system is shown in Figure 4. The same packet sequence that was used to illustrate the inadequacy of the conventional communication architecture in Figure 2 was used for this experiment. The system meets the deadlines for all the packets (recall that the original system architecture presented in Figure 1 missed deadlines for all priority assignments). When packet […] (which has a tight deadline) arrives, the CAT assigns […] to the communication requests generated by […] and […], which are still processing packet […]. This leads to packet […] meeting its deadline. When packet […] arrives, however, the communication requests generated by […] and […] to process packet […] to completion in order to meet its tight deadline.

[Figure 4: Execution of the CAT-based architecture for the TCP system. The timeline marks the arrivals and deadlines of packets i, i+1, j and j+1 and the activity of ether_driver, ip_check and checksum; all packets meet their deadlines.]
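The contrast between Examples 1 and 2 can be illustrated with a small simulation. The sketch below is not the priority formula used in the paper: it assumes a single non-preemptive shared bus, two hypothetical components "A" and "B", one bus transaction per packet, and a made-up four-request trace. It compares the two possible static priority assignments with a deadline-aware arbiter that grants the pending request with the earliest deadline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    name: str        # hypothetical request label
    component: str   # hypothetical component issuing the request
    arrival: int     # time at which the request is raised
    duration: int    # bus cycles needed once granted
    deadline: int    # absolute time by which the transfer must finish

def simulate(requests, priority_key):
    """Non-preemptive bus: when free, grant the waiting request with the smallest key.

    Returns the names of the requests that finish after their deadline.
    """
    pending = sorted(requests, key=lambda r: r.arrival)
    waiting, missed, time = [], [], 0
    while pending or waiting:
        while pending and pending[0].arrival <= time:
            waiting.append(pending.pop(0))
        if not waiting:                      # bus idle until the next arrival
            time = pending[0].arrival
            continue
        grant = min(waiting, key=priority_key)
        waiting.remove(grant)
        time += grant.duration
        if time > grant.deadline:
            missed.append(grant.name)
    return missed

# Made-up trace: first A is the urgent one, later B is.
TRACE = [
    Request("a1", "A", arrival=0,  duration=4, deadline=4),
    Request("b1", "B", arrival=0,  duration=4, deadline=10),
    Request("a2", "A", arrival=20, duration=4, deadline=30),
    Request("b2", "B", arrival=20, duration=4, deadline=24),
]

a_over_b = simulate(TRACE, lambda r: {"A": 0, "B": 1}[r.component])
b_over_a = simulate(TRACE, lambda r: {"A": 1, "B": 0}[r.component])
deadline_aware = simulate(TRACE, lambda r: r.deadline)

print("static A > B misses:", a_over_b)
print("static B > A misses:", b_over_a)
print("deadline-aware misses:", deadline_aware)
```

On this trace each static assignment misses one deadline, while the deadline-aware arbiter misses none, which mirrors the qualitative argument of the two examples under the stated simplifications.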

Assigning appropriate values for communication protocol parameters (such as priorities and DMA sizes) to the critical events, and translating these results into a high-performance implementation.

While several techniques have been proposed for system-level performance analysis [1, 2] and can be used for the first step, we use an analysis of the system execution traces as a basis for identifying critical communication events. A significant advantage of using execution traces generated through system simulation is that they can be derived for any system for which a system-level simulation model exists. The generated traces can be analyzed to examine the impact of individual (or groups of) communication events on the system’s performance. Communication events which are on the system “critical paths”, and whose delays significantly impact the specified performance metrics, can be classified as critical. The details of the technique we use to identify critical communication events are provided in Section 3.

Since the system execution trace is specific to the input traces or stimuli used, there is no simple way to correlate the critical communication events in the simulation trace to critical communication events that occur while the system executes (possibly under different stimuli). For example, consider a communication trace where the twentieth, twenty-first, and twenty-second data transfers after the start of system execution are shown to have a strong impact on system performance. Speeding up these data transfers would significantly improve system performance for the given input trace. Suppose that we would like to translate these insights into an improved communication protocol. Clearly, a naive system, where the twentieth, twenty-first, and twenty-second data transfers have a high priority, might not realize any performance gains, because the sequence of events that occurs during the system’s execution could differ significantly from that of the trace. In addition to identifying critical communication events, we need to correlate their occurrence to other easily detectable properties of the system state and data it is processing. For example, if an analysis of the simulation trace reveals that the occurrence of a critical data transfer is highly correlated to a specific branch being encountered in the behavior of the component executing the transfer, the occurrence of the branch might be used as a predictor for the criticality of the data transfers generated by the component. The following example examines some tradeoffs in designing these predictors.

Example 3: Consider the system shown in Figure 5, which is used to encrypt data for security before transmission onto a communications network. Component 1 processes the data, determines the coding and encryption scheme to be used, and sends the data to Component 2, which encodes and encrypts the data before sending it through the shared bus to the peripheral that transmits it onto the network. Figure 6 shows the data transfers occurring on the system bus. The shaded ellipses, marked […] ([…]), represent data transfers from Component 2 to the network peripheral. Let us suppose that Component 2 should transfer data at a fixed rate, and each data transfer should occur before a deadline (indicated in Figure 6 by a dotted line). A key performance metric for the system is the number of data transfers completed by Component 2 in a timely manner. The communication trace indicates that deadlines are frequently not met. Analysis of the system execution trace also identifies communication events that did not meet their deadlines, e.g., […] and […]. In addition, it also identifies critical communication events, i.e., those which, when sped up, could potentially improve system performance. Since […] can occur only after […], speeding up […] is one of the ways of improving system performance. Let […] denote the set of all […]’s such that […] does not meet the deadline. The performance of the system can improve if the communication times of the events in […] improve.

Having isolated the critical communication events from the simulation trace, we need to develop schemes to identify these elements during the execution of the system. As mentioned before, this is done by correlating the occurrence of critical communication events with information about the system state and data it is processing. In this example, let us suppose that we choose to correlate critical communication events with the control-flow history of the component that generated them. We define a control-flow event as a Boolean variable which assumes a value of […] when a component executes a specific operation. For example, the behavior of […] shown in Figure 5 is annotated with control-flow events e1, e2, e3, and e4. In general, if […], […], […], […] are the control-flow events which are used to determine whether or not a communication request is critical, we can define a function […] whose on-set denotes the set of communication events classified as critical.

The number of control-flow variables used for this classification has a profound impact on the classification of communication events. A good classification should have the properties of a one-to-one map, i.e., every event classified as critical should indeed be critical, and every critical event should be detected by the classification. Suppose, in this example, we are allowed to use only one variable for classification. Let us choose […]. We find that, in all the cases where deadlines are missed, event […] occurs. Based on this insight, we may choose […] as a classifier. However, […] often occurs along with non-critical communication events as well. If […] is used as a classifier, only 16% of the communication events classified to be critical are indeed critical. Therefore, […] could mis-classify several communication events, and incorrectly increase their priorities, causing system performance to suffer.

    if (u) {                              // e1
        packet = concat(packet, data);
    }
    if (…) {                              // e2
        …
    }
    if (security == high) {               // e3
        packet->encryption_level = 0;
    } else if (security == medium) {
        packet->encryption_level = 1;
    } else {
        packet->encryption_level = low;
    }
    switch (channel_char.)                // e4
        case 1: packet->code = 1; break;
        case 2: packet->code = 2; break;
        default: packet->code = 3; break;
    …
    send_packet(x, encryption_unit);

[Blocks shown: Component 1; Component 2 (encoding and encryption)]

Figure 5: A data encryption system that illustrates tradeoffs in the identification of critical communication events


[Figure 6: A trace of bus activity for the system shown in Figure 5. The time axis shows the transfers x1, y1, x2, y2, x3, y3, x4, y4, x5, x6, y5, with deadlines and a missed deadline marked.]
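The two classification metrics discussed in this example (how many critical events a classifier captures, and how many of the events it flags are actually critical) can be computed directly from a labelled trace. The sketch below is only illustrative: the trace is invented, the event names e1 to e3 are borrowed from the annotations of Figure 5, and the classifier is assumed to be a simple conjunction of the chosen control-flow events, whereas the paper allows a general Boolean function.

```python
def evaluate(trace, variables):
    """Score a conjunctive classifier over the chosen control-flow events.

    trace: list of (flags, is_critical) pairs, where flags maps an event name
           to True/False for one communication event.
    Returns (coverage, precision):
      coverage  = fraction of critical events that the classifier flags
      precision = fraction of flagged events that are actually critical
    """
    flagged = [crit for flags, crit in trace if all(flags[v] for v in variables)]
    total_critical = sum(crit for _, crit in trace)
    hits = sum(flagged)
    coverage = hits / total_critical if total_critical else 0.0
    precision = hits / len(flagged) if flagged else 0.0
    return coverage, precision

# Invented trace: six communication events with their control-flow history.
TRACE = [
    ({"e1": True,  "e2": True,  "e3": True},  True),
    ({"e1": True,  "e2": True,  "e3": False}, False),
    ({"e1": True,  "e2": False, "e3": True},  False),
    ({"e1": True,  "e2": False, "e3": False}, False),
    ({"e1": True,  "e2": True,  "e3": True},  True),
    ({"e1": False, "e2": True,  "e3": True},  False),
]

for chosen in [("e1",), ("e1", "e2"), ("e1", "e2", "e3")]:
    coverage, precision = evaluate(TRACE, chosen)
    print(chosen, f"coverage={coverage:.2f}", f"precision={precision:.2f}")
```

In this toy trace, adding variables raises precision without losing coverage; on other traces a more stringent conjunction can also miss critical events, which is the tradeoff the example describes.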

Figures 7(a) and (b) plot the percentage of critical communication events in […], and the percentage of […] covered by […], versus the number of variables that perform the classification, respectively. The […] axis shows the number of variables used to perform the classification. For example, the best classifier that uses two variables captures 100% of critical communication events, while only 50% of the communication events classified as “critical” by it are actually critical. Note that, in this example, as the number of variables increases, the percentage of critical communication events in […] increases. This is because, as the number of variables increases, the classification criterion becomes more stringent, and non-critical events are less likely to pass the test. However, simultaneously, critical events could be missed, as shown in Figure 7(b) (note the decrease in the percentage of […] covered as the number of variables used increases). Therefore, one needs to judiciously choose the right number of variables, and the right classification functions in order to maximally improve system performance. In this example, optimal results are obtained by using three variables ([…], […], and […]) and a classification function […]

3 Methodology and Algorithms for the Design of Communication Architecture Tuners

In this section, we present a structured methodology and automation algorithms for the design of CAT-based communication architectures. Section 3.1 explains the overall methodology and outlines the different steps involved. Section 3.2 presents the algorithms used to perform the critical steps in more detail.

3.1 Algorithm and Methodology: Overview

In this section, we describe our techniques in the context of a design flow where the system is first partitioned and mapped onto various pre-designed cores and application-specific logic. Based on the communication and connectivity requirements of the system, a communication architecture topology is selected. The selected topology can then be optimized using the proposed techniques. Our algorithm takes as inputs a simulatable partitioned/mapped system description, the selected communication architecture topology, typical environment stimulus or input traces, and objectives and/or constraints on performance metrics. The performance metrics could be specified in terms of the amount of time taken to complete a specific amount of work (e.g., a weighted or uniform average of processing times) or in terms of the number of output deadlines met or missed for applications with real-time constraints. The output of the algorithm is a set of optimized communication protocols for the target system. From a hardware point-of-view, the system is enhanced through the addition of Communication Architecture Tuners wherever necessary, and through the modification of the controllers/arbiters for the various channels in the communication architecture.

A typical system with a CAT-based communication architecture generated using our techniques is shown in Figure 8(a). The system contains several components, including a processor core, memories, and peripherals. The selected communication architecture topology is enclosed in the dotted boundary. The topology selected consists of dedicated channels between components (e.g., between the processor and co-processor), as well as two shared buses that are connected by a bridge. The portions of the system that are added or modified as a result of our technique are shown shaded in Figure 8(a). Our technique can be applied to general communication architecture topologies that can be expressed as an arbitrary interconnected network of dedicated and shared channels.

[Figure 7: A plot of different classification metrics with respect to the number of variables used for the classification. Panels (a) and (b); horizontal axes: length of history used to detect partitions; labelled points include (e1,e2,e3,e4), (e1,e2,e3), (e2,e3), (e1,e2), (e1,e3), e2, e3 and e1.]

[Figure 8: (a) An example system with a CAT-based communication architecture, and (b) detailed view of a component with a CAT]

A more detailed view of a component with a CAT is shown in Figure 8(b). The CAT consists of a “partition detector” circuit, which is shown as a finite-state automaton in the figure, and parameter generation circuits that generate values for the various communication architecture protocol parameters during system execution. We next describe the role of these circuits briefly.

Partition detector: We loosely define a communication partition as a subset of the communication transactions generated by the component during system execution. For each component, our algorithm identifies a number of partitions, and the conditions that must be satisfied by a communication transaction for it to be classified under each partition. These conditions are incorporated into the partition detector circuit. The partition detector circuit monitors and analyzes the following information generated by the component:

Tracer tokens generated by the component to indicate that it is executing specific operations. The component is enhanced to generate these tokens purely for the purpose of the CAT.

The communication transaction initiation requests that are generated by the component.

Any other application-specific properties of the communication data being generated by the component (e.g., fields in the data which indicate its relative importance).

[Figure 9: Symbolic illustration of CAT-optimized communication architecture execution. Waveforms: tokens, communication transactions, partition recognizer state (S0, S1, S2, S3, S0), parameter (priority: 2, 4, 2), communication delay.]

The partition detector uses specific sequences of tracer tokens and communication requests to identify the beginning and end of a sequence of consecutive communication transactions that belong to a partition. For example, the regular expressions […] and […] may be used to delineate communication events that belong to partition […]. In Section 3.2, we present general techniques to automatically compute the start and end conditions for each partition.

Parameter generation circuits: These circuits compute values for communication protocol parameters (e.g., priorities, DMA/block sizes, etc.) based on the partition ID generated by the partition detector circuit, and other application-specific data properties specified by the system designer. The values of these parameters are sent to the arbiters and controllers in the communication architecture, resulting in a change in the characteristics of the communication architecture. Automatic techniques to design the parameter generation circuits are presented in Section 3.2.

The functioning of a CAT-based communication architecture is illustrated using symbolic waveforms in Figure 9. The first two waveforms represent tracer tokens generated by the component. The next two waveforms represent the communication transactions generated by the component, and the state of the partition detector circuit, respectively. The state of the partition detector circuit changes first from […] to […], and later from […] to […], in reaction to the tracer tokens generated by the component. The fourth communication transaction generated by the component after the partition detector reaches state […] causes it to transition into state […]. All communication transactions that occur when the partition detector FSM is in state […] are classified as belonging to partition […]. The fifth waveform shows the output of the priority generation circuit. The priority generation circuit assigns a priority level of […] to all communication transactions that belong to partition […]. This increase in priority leads to a decrease in the delay associated with the communication transactions that belong to partition […], as shown in the last waveform of Figure 9.

[Figure 10: Algorithm for designing CAT-based communication architectures. Inputs: partitioned/mapped system, communication architecture topology, input traces, performance metrics. Steps: (1) analyze system, create communication analysis graph (CAG); (2) partition communication instances; (3) evaluate cluster statistics; (4) assign parameter values to clusters; (5) re-analyze system, re-compute performance metrics, and repeat while performance improves; (6) synthesize CATs to realize optimized protocols. Output: system with optimized CAT-based communication architecture.]
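The behavior of a partition detector and a priority generator, as walked through for Figure 9, can be mimicked in software. The sketch below is a behavioral model under several assumptions that are not taken from the paper: the state names, the token names t1 and t2, the transition table, the count threshold of four and the priority values are all hypothetical, and the transaction that triggers a count-based transition is classified under the state it was issued in.

```python
class PartitionDetector:
    """Behavioral model of a partition detector finite-state automaton."""

    def __init__(self, token_transitions, count_transitions, start="S0"):
        self.token_transitions = token_transitions  # (state, token) -> next state
        self.count_transitions = count_transitions  # state -> (n, next state)
        self.state = start
        self.count = 0                              # transactions seen in this state

    def _goto(self, state):
        self.state, self.count = state, 0

    def on_token(self, token):
        """React to a tracer token emitted by the component."""
        nxt = self.token_transitions.get((self.state, token))
        if nxt is not None:
            self._goto(nxt)

    def on_transaction(self):
        """Classify one communication transaction; return its partition (state)."""
        current = self.state
        self.count += 1
        rule = self.count_transitions.get(current)
        if rule is not None and self.count >= rule[0]:
            self._goto(rule[1])
        return current

# Hypothetical configuration, loosely shaped like the S0..S3 sequence of Figure 9.
PRIORITY = {"S0": 2, "S1": 2, "S2": 2, "S3": 4}   # priority generator lookup
detector = PartitionDetector(
    token_transitions={("S0", "t1"): "S1", ("S1", "t2"): "S2", ("S3", "t1"): "S0"},
    count_transitions={"S2": (4, "S3")},           # 4th transaction in S2 moves to S3
)

# "x" stands for a communication transaction; anything else is a tracer token.
stream = ["t1", "x", "t2", "x", "x", "x", "x", "x", "x", "t1", "x"]
priorities = []
for item in stream:
    if item == "x":
        priorities.append(PRIORITY[detector.on_transaction()])
    else:
        detector.on_token(item)

print(priorities)
```

A hardware CAT would realize the same table as a small FSM with a counter; the dictionary-based model is only meant to make the token-driven and count-driven transitions concrete.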

The overall algorithm for designing CAT-based communication architectures is shown in Figure 10. In step 1, performance analysis is performed on the partitioned/mapped system description in order to derive the information and statistics used in the later steps. In our work, we use the performance analysis technique presented in [25], which is comparable in accuracy to complete system simulation, while being much more efficient to employ in an iterative manner. The output of this analysis is a communication analysis graph (CAG), which is a highly compact representation of the system’s execution under the given input traces. The vertices in the graph represent clusters of computations and abstract communications performed by the various components during the system execution. The edges in the graph represent the inter-dependencies between the various computations and communications. Note that since the communication analysis graph is effectively unrolled in time, it is acyclic, and may contain several distinct instances of a single computation operation or communication from the system specification. The communication analysis graph is constructed by extracting necessary and sufficient information from a detailed system execution trace [25]. The CAG can be easily analyzed to determine various performance statistics such as system critical path, average processing time, number of missed deadlines, etc.
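Because the communication analysis graph is acyclic, statistics such as the critical path can be obtained with a single pass in topological order. The sketch below is a generic longest-path computation on a small invented graph, not the analysis technique of [25]; the vertex names and delays are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def critical_path(delay, preds):
    """Longest path through a DAG whose vertices carry delays.

    delay: vertex -> execution or transfer time
    preds: vertex -> list of vertices it depends on
    Returns (length, path) for the latest-finishing chain of vertices.
    """
    finish, best_pred = {}, {}
    for v in TopologicalSorter(preds).static_order():
        # Latest-finishing predecessor determines when v can start.
        p = max(preds.get(v, ()), key=lambda u: finish[u], default=None)
        best_pred[v] = p
        finish[v] = delay[v] + (finish[p] if p is not None else 0)
    end = max(finish, key=finish.get)
    length, path = finish[end], []
    while end is not None:
        path.append(end)
        end = best_pred[end]
    return length, path[::-1]

# Invented graph: computations (comp_*) and communications (xfer_*).
DELAY = {"comp_a": 3, "xfer_1": 2, "comp_b": 4, "xfer_2": 1, "comp_c": 2}
PREDS = {
    "xfer_1": ["comp_a"],
    "comp_b": ["xfer_1"],
    "xfer_2": ["comp_a"],
    "comp_c": ["xfer_2", "comp_b"],
}

length, path = critical_path(DELAY, PREDS)
print(length, path)
```

Communication vertices that appear on the returned path are natural candidates for the "critical" label used in the later partitioning steps.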

In step 2, we group the communication vertices in the communication analysis graph into a number of partitions. The main rationale behind this partitioning is that each of the partitions may have distinct communication requirements, and hence may potentially require a different set of values to be assigned to the parameters of the communication protocol (e.g., priorities, DMA sizes, etc.) in order to optimize system performance. Note that in the extreme case, each communication vertex in the communication analysis graph can be assigned to a distinct partition. However, this has two disadvantages: (i) the area and delay overhead incurred in the CAT may become prohibitive, and (ii) as illustrated in Section 2, the use of very small partitions can lead to CAT hardware that is highly sensitive to variations in input traces. We propose a novel metric, called sensitivity, which is used to group communication instances (vertices) into partitions in Section 3.2.1. We also present techniques that enable the designer to select an optimal granularity for the partitions.

Step 3 evaluates various statistics for each communication partition, based on which step 4 determines an assignment of communication architecture parameter values for each partition. The details of these steps are presented in Section 3.2.2. The output of step 4 is a set of candidate protocols for the system communication architecture. Step 5 re-evaluates the system performance for the optimized protocols derived in step 4. If a performance improvement results, steps 1 to 5 are repeated until no further performance improvement is obtained.
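The iteration over steps 1 to 5 can be sketched as a simple improvement loop. This is only an illustration of the control flow: `evaluate` stands in for performance analysis (steps 1 and 5) and `propose` for partitioning and parameter assignment (steps 2 to 4); both names, and the toy cost function in the usage lines, are hypothetical.

```python
def tune_protocol(protocol, evaluate, propose):
    """Repeat analysis and optimization while the cost keeps improving.

    evaluate(protocol) -> cost to minimize (e.g. number of missed deadlines)
    propose(protocol)  -> candidate protocol derived from the current one
    """
    best_cost = evaluate(protocol)
    while True:
        candidate = propose(protocol)
        cost = evaluate(candidate)
        if cost >= best_cost:
            # No further performance improvement: keep the last good protocol.
            return protocol, best_cost
        protocol, best_cost = candidate, cost

# Toy usage: the "protocol" is a single integer parameter.
result = tune_protocol(0, lambda p: (p - 3) ** 2, lambda p: p + 1)
print(result)
```

The loop stops at the first candidate that fails to improve the metric, which mirrors the stopping rule in the text but, like any greedy scheme, can settle in a local optimum.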

Step 6 deals with synthesis of hardware (CATs) to implement the optimized protocols that were determined in step 4. As illustrated in Section 2, it is critical to consider the hardware implementation complexity and overheads in order to fully exploit the potential of CAT-based communication architectures. In Section 3.2.3, we formulate the problem of generating the partition detector and parameter generation circuits as a problem of generating a minimum-complexity function to fit a set of data points, and outline how it can be efficiently solved using well-known techniques from regression theory [26].

3.2 Algorithm and Methodology: Details

In this section we describe the steps outlined above in more detail. We present techniques to obtain partitions of the communication event instances, discuss how to select an optimal set of protocol parameter values, and show how to synthesize CAT hardware for classifying communication event instances into partitions.

3.2.1 Profiling and partitioning communication event instances

In this section, we describe the partitioning step of our methodology (step 2 of Figure 10). The objective of the partitioning step is to identify and cluster into a single partition a set of communication event instances that can be treated by the communication protocol in a uniform manner. For instance, the protocol could define all members of a given partition to have the same priority for accessing a shared bus.

The communication analysis graph generated by step 1 of our algorithm contains sufficient information to measure the performance of the system as a function of the delays of its communication events. In step 2, we perform an analysis of the CAG to measure the impact of individual communication instance delays on the system performance. Instances which have a similar impact on the system performance are grouped into the same partition. The performance impact of an instance is measured by a parameter called sensitivity that captures the change in system performance when the communication delay of the instance changes. The following example illustrates our partitioning procedure.

Figure 11 shows a section of a CAG generated from a representative execution of an example system. Shaded vertices $c_1$ through $c_4$ represent instances of communication events. Vertices $z_1$ and $z_2$ represent the final outputs of the system. The objective function to be minimized is the quantity $O = t(z_1) + t(z_2)$, where $t(v)$ is the finish time of a vertex $v$ in the CAG.

Figure 11: Sensitivity calculation and partitioning instances in the CAG

To measure the sensitivity of the system performance to communication instance $c_1$, the existing delay of $c_1$ is perturbed by a value $\Delta$, and a traversal of the transitive fanout of $c_1$ in the CAG is used to re-compute the start and finish times of the affected vertices. The updated finish times of the vertices are used to calculate the change in the system performance metric. In this example, perturbing the delay of $c_1$ by $\Delta = 10$ units delays the finish of both $z_1$ and $z_2$ by 10 units each, while perturbing the delay of $c_2$ by 10 units delays $z_1$ alone. Similarly, delaying $c_3$ delays the finish time of $z_2$. Since $c_4$ doesn't lie on a critical path, perturbing it has no effect on system performance.

Using the procedure described above, we calculate a sensitivity $s(c_i)$ for each communication instance $c_i$. The sensitivity measures the change in the value of the objective function $O$ after perturbing the delay of $c_i$ by $\Delta$. Next, we assign communication instances that have similar sensitivity values to the same partition. In this example, based on the sensitivity values shown in Figure 11 ($s(c_1) = 20$, $s(c_2) = 10$, $s(c_3) = 10$, $s(c_4) = 0$), $c_1$ is assigned to $CP_1$, $c_2$ and $c_3$ are assigned to $CP_2$, and $c_4$ is assigned to $CP_3$.
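The sensitivity calculation and grouping can be sketched in Python as follows. The vertex names follow the labels of Figure 11, but the graph topology and delays below are invented so that the sketch reproduces the sensitivity values quoted in the figure; they are not the figure's actual data. For brevity the sketch recomputes the whole graph after each perturbation instead of traversing only the transitive fanout, and the gap-based grouping rule with tolerance `tol` is an assumption, not the paper's stated procedure.

```python
from collections import defaultdict, deque

def finish_times(delays, edges):
    """Finish time of every vertex of an acyclic graph (topological pass)."""
    succs, indeg = defaultdict(list), {v: 0 for v in delays}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    start, finish = {v: 0 for v in delays}, {}
    ready = deque(v for v in delays if indeg[v] == 0)
    while ready:
        u = ready.popleft()
        finish[u] = start[u] + delays[u]
        for v in succs[u]:
            start[v] = max(start[v], finish[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return finish

def objective(delays, edges, outputs):
    """O = sum of the finish times of the system outputs."""
    f = finish_times(delays, edges)
    return sum(f[z] for z in outputs)

def sensitivities(delays, edges, outputs, comm, delta=10):
    """s(c) = change in O when the delay of instance c grows by delta."""
    base = objective(delays, edges, outputs)
    result = {}
    for c in comm:
        perturbed = dict(delays)
        perturbed[c] += delta
        result[c] = objective(perturbed, edges, outputs) - base
    return result

def partition_by_sensitivity(sens, tol):
    """Group instances whose sensitivities differ by at most tol."""
    ordered = sorted(sens, key=lambda c: (-sens[c], c))
    parts = []
    for c in ordered:
        if parts and abs(sens[parts[-1][-1]] - sens[c]) <= tol:
            parts[-1].append(c)
        else:
            parts.append([c])
    return parts

# Hypothetical CAG: c1 feeds c2 and c3; c4 has slack before z2.
delays = {"c1": 16, "c2": 12, "c3": 12, "c4": 5, "z1": 5, "z2": 5}
edges = [("c1", "c2"), ("c1", "c3"), ("c2", "z1"), ("c3", "z2"), ("c4", "z2")]
sens = sensitivities(delays, edges, ["z1", "z2"], ["c1", "c2", "c3", "c4"])
parts = partition_by_sensitivity(sens, tol=5)
print(sens, parts)
```

With these invented delays the three resulting groups correspond to the partitions CP1, CP2 and CP3 of the example.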

As mentioned before, events in the same partition are treated similarly by the CAT.

3.2.2 Modifying Protocol Parameters

In this section we describe steps 3 and 4 of the overall flow, i.e., how to examine each partition and then assign optimized protocol parameter values to it. While our discussion is confined to determining the priority that should be assigned to each partition, it could be extended to include other protocol parameters, such as whether burst mode should be supported or not, and if so, what the correct DMA size should be. The sensitivity of a partition indicates the impact its events have on the performance of the system. However, assigning priorities based on the sensitivity of a partition alone may not lead to the best assignment. This is because sensitivity does not capture the indirect effects of a communication event or set of events on the delays of other concurrent communication events (such effects occur due to the presence of shared channels/buses in the communication architecture). We account for this by deriving a metric that penalizes partitions which are likely to negatively impact the delays of communication events in other partitions. In order to obtain this information, we analyze the CAG and evaluate, for each pair of partitions $CP_i$ and $CP_j$, the amount of time $w_{ij}$ for which communication events that belong to partition $CP_j$ are delayed due to events from $CP_i$. Table 1 shows example data for a system with three partitions.

Column 2 gives the sensitivity of each partition. Columns 3, 4 and 5 give the total time ($w_{i1}$, $w_{i2}$, $w_{i3}$) that instances in $CP_1$, $CP_2$ and $CP_3$ wait for instances in each of the other partitions to finish. For example, instances in $CP_1$ induce a total wait of 100 cycles for instances of $CP_2$. Column 6 gives the sum of columns 3, 4 and 5 to indicate the total waiting time ($W_i$) that events in partition $CP_i$ have introduced in other partitions, e.g., $CP_3$ induces a total wait of only 7 cycles on $CP_1$ and $CP_2$.

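Finding the ideal way to combine these statistical parameters into a formula that produces the optimum priority assignment is a hard optimization problem to solve. Instead, we use a heuristic calculation that boosts a partition's priority in a way proportional to its sensitivity, but penalizes it for the waiting times it introduces in other partitions.

Table 1: Statistics of the partitions

| Partition | Sensitivity $s(c_i)$ | $w_{i1}$ (clock cycles) | $w_{i2}$ (clock cycles) | $w_{i3}$ (clock cycles) | $W_i$ (clock cycles) | Priority mapping |
|---|---|---|---|---|---|---|
| CP1 | 100 | 0 | 100 | 3 | 103 | 17.18 => 2 |
| CP2 | 85 | 4 | 0 | 3 | 7 | 23.57 => 1 |
| CP3 | 10 | 0 | 7 | 0 | 7 | -75.0 => 3 |

Using the notation of Table 1, the priority of a partition is computed from its sensitivity and the waiting times it induces; the resulting values and their mapping to priority levels are listed in the last column of Table 1.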
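The priority values listed in Table 1 can be reproduced numerically. The expression used below, a partition's sensitivity minus the wait-weighted average sensitivity of the partitions it delays, i.e. $s_i - \sum_j s_j w_{ij} / W_i$, is inferred from the numbers in the table and is an assumption rather than a formula quoted from the text; it matches all three rows to two decimal places. The function name and data layout are illustrative.

```python
def priorities(sens, wait):
    """Score partitions and rank them (1 = highest priority).

    sens[i]:    sensitivity of partition i
    wait[i][j]: cycles that instances of partition j wait on partition i
    """
    scores = {}
    for i, row in wait.items():
        total = sum(row.values())  # W_i in Table 1
        # Inferred penalty: wait-weighted mean sensitivity of delayed partitions.
        penalty = sum(sens[j] * w for j, w in row.items()) / total if total else 0.0
        scores[i] = sens[i] - penalty
    ranked = sorted(scores, key=scores.get, reverse=True)
    levels = {p: rank for rank, p in enumerate(ranked, start=1)}
    return scores, levels

# Data transcribed from Table 1.
sens = {"CP1": 100, "CP2": 85, "CP3": 10}
wait = {
    "CP1": {"CP1": 0, "CP2": 100, "CP3": 3},
    "CP2": {"CP1": 4, "CP2": 0, "CP3": 3},
    "CP3": {"CP1": 0, "CP2": 7, "CP3": 0},
}
scores, levels = priorities(sens, wait)
print({p: round(v, 2) for p, v in scores.items()}, levels)
```

Note how CP1, despite having the highest sensitivity, ends up below CP2: most of the waiting it causes falls on the nearly as sensitive CP2, so its penalty is large.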

example,for

,

isthe

,

,and

Figure 12 shows the actual classification of communication instances that results from each of the three formulae. Figure 13 shows the prediction accuracy of each of the formulae under test. It turns out that one of the formulae performs the best at predicting whether or not a given instance belongs to the partition.

Figure 14: FSM implementation of a partition detection formula (legend: c = communication instance, t1 = tracer instance, cnt1 = counter for x, cnt2 = counter for p, S0 = initial state, S3 = accepting state)

Each formula involves a tracer token as a starting point and a count on the number of occurrences of communication events, and hence can be expressed as a regular expression. Consequently, it can be directly translated to a hardware implementation as a finite state machine (FSM). Figure 14 shows the FSM that implements one such formula.

In general, choosing the appropriate tracer tokens and appropriate values for the count parameters may not be a trivial task. We formulate the problem in terms of a well-known problem from regression theory, and use known statistical techniques to solve it.

A data set is constructed from the CAG for each examined tracer token, consisting of distances from the tracer token and a value for each distance (derived from the partitioned CAG) indicating whether or not the communication instance at that distance from the tracer token belongs to a partition. The regression function predicts this value: when it is 1, it indicates that the instance at the given distance belongs to the partition. An assignment of the function's parameters is required that causes the least square error between the value from the data set and the prediction. Since the regression function is non-linear in its parameters, no universal technique is known to compute an explicit solution. However, several heuristics and iterative procedures exist which may be used [26]. Note that the regression function could in general be constructed to utilize additional designer-specified parameters, such as partial internal state from the component, and properties of the data being processed by the system (e.g., a QoS stamp or a deadline value).
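As a rough illustration of the fitting problem, the sketch below assumes a hypothetical detector of the form "the instances at distances skip+1 through skip+span after a tracer token belong to the partition" and picks `skip` and `span` by exhaustive search for the least square error. Both the detector form and the brute-force search are assumptions chosen for brevity: the paper's actual formulae are not reproduced in this text, and exhaustive search is swapped in for the regression heuristics of [26]. The `classify` function mimics the counter-based FSM by scanning a token stream in which `t` marks a tracer token and `c` a communication instance.

```python
from itertools import product

def predict(distance, skip, span):
    """1 if the instance at this distance after the tracer is in the partition."""
    return 1 if skip < distance <= skip + span else 0

def fit(data, max_skip=8, max_span=8):
    """Return (error, skip, span) with the least square error over a small grid.

    data: list of (distance from tracer token, observed 0/1 membership)
    """
    best = None
    for skip, span in product(range(max_skip + 1), range(1, max_span + 1)):
        err = sum((y - predict(d, skip, span)) ** 2 for d, y in data)
        if best is None or err < best[0]:
            best = (err, skip, span)
    return best

def classify(tokens, skip, span):
    """Counter-based scan: reset on a tracer, count communication instances."""
    count = None  # None until the first tracer token is seen
    out = []
    for tok in tokens:
        if tok == "t":
            count = 0
        elif tok == "c":
            if count is None:
                out.append(0)
            else:
                count += 1
                out.append(predict(count, skip, span))
    return out

# Hypothetical data set: instances 2 and 3 after the tracer are in the partition.
data = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 0)]
err, skip, span = fit(data)
print(err, skip, span, classify("tcccctcc", skip, span))
```

A detector of this shape needs only one counter and a comparison per parameter, which is what keeps the hardware overhead of the resulting FSM small.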

4 Experimental Results

In this section we present results of the application of our techniques to several example systems, including a TCP/IP network interface card system, and the packet forwarding unit of an output-queued ATM switch. We present performance results based on system-level co-simulation for each example.

The first example is the TCP system described in Section 2. The second example is a packet forwarding unit of an output-queued ATM switch (shown in Figure 15). The system consists of a number of output ports, each with a dedicated small local memory that stores queued packet addresses. The arriving packet bits are written to a dual-ported shared memory. The starting address of each packet is written to an appropriate output queue by a scheduler. Each port polls its queue to detect the presence of a packet. If the queue is non-empty, the port issues a signal to its local memory, extracts the relevant packet from the dual-ported shared memory and sends it onto its output link.

The next example, SYS, is a four-component system (shown in Figure 16) where each component issues independent concurrent requests for access to a shared memory. Figure 17 shows BRDG, another system consisting of four components, two memories and two buses connected by a bridge. The components themselves are each connected to one of the buses, but can make requests to the local bus arbiter for access to the remote memory via the bridge. Also, the components synchronize with each other via dedicated links.

Table 2 demonstrates the performance benefits of using CAT-based communication architectures over a static priority based communication protocol [15]. Each row in the table represents one of the example systems described earlier. For each system, column 2 defines a performance metric. In the case of TCP, SYS and ATM, these are derived from a set of deadlines that are associated with each piece of data that passes through the system. The objective in each case is to minimize the number of missed deadlines for these examples. In the case of BRDG, each data transaction is assigned a weight. The performance of the system is expressed as a weighted mean of the processing time of each transaction. The objective in this case is to minimize this weighted average processing time. The static communication protocol consists of a fixed DMA size for each communication request and a static priority based bus arbitration scheme. For these examples, the CAT scheme for identifying partitions and assigning priorities and DMA sizes makes use of user-specified information, such as the weights on each request and deadlines, as described in Section 3, to provide for a more flexible communication protocol.
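The two kinds of metric used in Table 2 are simple to state in code. The sketch below is illustrative only: the function names and the sample numbers are hypothetical, and a piece of data is assumed to miss its deadline when it finishes strictly after it.

```python
def missed_deadlines(finish_times, deadlines):
    """Count data items that finish strictly after their deadline."""
    return sum(1 for f, d in zip(finish_times, deadlines) if f > d)

def weighted_mean_processing_time(times, weights):
    """Weighted mean of per-transaction processing times (BRDG-style metric)."""
    return sum(t * w for t, w in zip(times, weights)) / sum(weights)

# Hypothetical samples.
print(missed_deadlines([10, 25, 40], [12, 20, 40]))
print(weighted_mean_processing_time([100, 300], [3, 1]))
```

Both metrics are to be minimized, so a lower value under the CAT-based architecture indicates an improvement.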

For each system, column 4 reports performance results obtained using a static communication protocol, while column 5 reports results generated by simulating a CAT-based architecture. Speed-ups are reported in column 6. The results indicate that significant benefits in performance can be obtained by using a CAT-based architecture over a protocol using fixed parameter values. In the case of TCP/IP, the number of missed deadlines was reduced to zero, while in the case of SYS, a 24.3-fold performance improvement (reduction in the number of missed deadlines) was observed.

Table 2: Performance of systems using CAT based architectures

| Example System | Performance metric | Input Trace Information | Static Protocol | CATs based architecture | Performance improvement |
|---|---|---|---|---|---|
| TCP/IP | missed deadlines | 20 packets | 10 | 0 | - |
| SYS | missed deadlines | 573 transactions | 413 | 17 | 24.3 |
| ATM | missed deadlines | 169 packets | 40 | 16 | 2.5 |
| BRDG | avg. execution time (cycles) | 10,000 clock cycles | 304.72 | 254.1 | 1.2 |

Figure 15: Output queued ATM switch

Figure 16: Example system SYS with concurrent bus access

Figure 17: Example system BRDG with multiple buses

The design of an efficient CAT-based communication architecture depends on the selection of a good representative trace when performing the various steps of the algorithm of Figure 10. However, our algorithms attempt to generate communication architectures that are not specific to the input traces used to design them, but display improved performance over a wide range of communication traces. In order to analyze the input trace sensitivity of the performance improvements obtained through CAT-based communication architectures, we performed the following additional experiment. For the SYS example, we simulated the system with CAT-based and conventional communication architectures for three different input traces that had widely varying characteristics. Table 3 presents the results of our experiments. The parameters of the input traces were chosen at random to simulate run-time unpredictability. In all the cases, the system with a CAT-based communication architecture demonstrated a consistent and significant improvement over the system based on a conventional communication architecture. This demonstrates that the performance of CAT-based architectures is not overly sensitive to variations in the input stimuli, since they are capable of adapting to the changing needs of the system.

Table 3: Immunity of CAT based architectures to variation in inputs

| Inputs to the SYS example | Input Trace Information | Static Protocol | CATs based architecture | Performance improvement |
|---|---|---|---|---|
| Trace 1 | 848 transactions | 318 | 161 | 1.98 |
| Trace 2 | 573 transactions | 413 | 17 | 24.3 |
| Trace 3 | 1070 transactions | 316 | 38 | 8.37 |

5 Conclusions and Future Work

This paper presented a general methodology for the design of custom system-on-chip communication architectures, based on the addition of a layer of circuitry, called the Communication Architecture Tuner (CAT), around any existing communication architecture topology. The added layer renders the system capable of adapting to the changing communication needs of its constituent components. We illustrated issues and tradeoffs involved in the design of CAT-based communication architectures, and presented algorithms to automate the key steps. Experimental results indicate that performance metrics (e.g. number of missed deadlines, average processing time) for systems with CAT-based communication architectures are significantly (sometimes, over an order of magnitude) better than those with conventional communication architectures.

References

[1] D. D. Gajski, F. Vahid, S. Narayan, and J. Gong, Specification and Design of Embedded Systems. Prentice Hall, 1994.

[2] G. De Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill, New York, NY, 1994.

[3] R. Ernst, J. Henkel, and T. Benner, “Hardware-software cosynthesis for microcontrollers,” IEEE Design & Test Magazine, pp. –75, Dec. 1993.

[4] T. B. Ismail, M. Abid, and M. Jerraya, “COSMOS: A codesign approach for a communicating system,” in Proc. IEEE International Workshop on Hardware/Software Codesign, pp. 17–24, 1994.

[5] A. Kalavade and E. Lee, “A globally critical/locally phase driven algorithm for the constrained hardware software partitioning problem,” in Proc. IEEE International Workshop on Hardware/Software Codesign, pp. 42–48, 1994.

[6] P. H. Chou, R. B. Ortega, and G. B. Borriello, “The CHINOOK hardware/software cosynthesis system,” in Proc. Int. Symp. System Level Synthesis, pp. 22–27, 1995.

[7] B. Lin, “A system design methodology for software/hardware codevelopment of telecommunication network applications,” in Proc. Design Automation Conf., pp. 672–677, 1996.

[8] B. P. Dave, G. Lakshminarayana, and N. K. Jha, “COSYN: hardware-software cosynthesis of embedded systems,” in Proc. Design Automation Conf., pp. 703–708, 1997.

[9] P. Knudsen and J. Madsen, “Integrating communication protocol selection with partitioning in hardware/software codesign,” in Proc. Int. Symp. System Level Synthesis, pp. 111–116, Dec. 1998.

[10] T. Yen and W. Wolf, “Communication synthesis for distributed embedded systems,” in Proc. Int. Conf. Computer-Aided Design, pp. 288–294, Nov. 1995.

[11] J. Daveau, T. B. Ismail, and A. A. Jerraya, “Synthesis of system-level communication by an allocation based approach,” in Proc. Int. Symp. System Level Synthesis, pp. 150–155, Sept. 1995.

[12] M. Gasteier and M. Glesner, “Bus-based communication synthesis on system level,” in ACM Trans. Design Automation Electronic Systems, pp. 1–11, Jan. 1999.

[13] R. B. Ortega and G. Borriello, “Communication synthesis for distributed embedded systems,” in Proc. Int. Conf. Computer-Aided Design, pp. 437–444, 1998.

[14] “Sonics Integration Architecture, Sonics Inc. (http://www.sonicsinc.com/).”

[15] On-Chip Bus Development Working Group Specification 1 Version 1.1.0. VSI Alliance, Aug. 1998.

[16] G. Borriello and R. H. Katz, “Synthesis and optimization of interface transducer logic,” in Proc. Int. Conf. Computer Design, Nov. 1987.

[17] J. S. Sun and R. W. Brodersen, “Design of system interface modules,” in Proc. Int. Conf. Computer-Aided Design, pp. 478–481, Nov. 1992.

[18] P. Gutberlet and W. Rosenstiel, “Specification of interface components for synchronous data paths,” in Proc. Int. Symp. System Level Synthesis, pp. 134–139, 1994.

[19] S. Narayanan and D. D. Gajski, “Interfacing incompatible protocols using interface process generation,” in Proc. Design Automation Conf., pp. 468–473, June 1995.

[20] P. Chou, R. B. Ortega, and G. Borriello, “Interface co-synthesis techniques for embedded systems,” in Proc. Int. Conf. Computer-Aided Design, pp. 280–287, Nov. 1995.

[21] J. Oberg, A. Kumar, and A. Hemani, “Grammar-based hardware synthesis of data communication protocols,” in Proc. Int. Symp. System Level Synthesis, pp. 14–19, 1996.

[22] R. Passerone, J. A. Rowson, and A. Sangiovanni-Vincentelli, “Automatic synthesis of interfaces between incompatible protocols,” in Proc. Design Automation Conf., pp. 8–13, June 1998.

[23] J. Smith and G. De Micheli, “Automated composition of hardware components,” in Proc. Design Automation Conf., pp. 14–19, June 1998.

[24] A. S. Tanenbaum, Computer Networks. Englewood Cliffs, N.J.: Prentice Hall, 19.

[25] K. Lahiri, A. Raghunathan, and S. Dey, “Fast Performance Analysis of Bus Based System-on-Chip Communication Architectures,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1999.

[26] G. A. F. Seber and C. J. Wild, Non-linear Regression. Wiley, New York, 19.
