Communication Architecture Tuners: A Methodology for the Design of High-Performance Communication Architectures for System-on-Chips
Kanishka Lahiri, Anand Raghunathan, Ganesh Lakshminarayana, and Sujit Dey
Dept. of Electrical Engg., Univ. of California, San Diego, CA
C&C Research Labs, NEC USA, Princeton, NJ
Abstract
In this paper, we present a general methodology for the design of custom system-on-chip communication architectures. Our technique is based on the addition of a layer of circuitry, called the Communication Architecture Tuner (CAT), around any existing communication architecture topology. The added layer renders the system capable of adapting to the changing communication needs of its constituent components. For example, more critical data may be handled differently, leading to lower communication latencies. The CAT monitors the internal state of, and communication transactions generated by, each system component, and “predicts” the relative importance of communication transactions in terms of their impact on different system-level performance metrics. It then configures the protocol parameters of the underlying communication architecture (e.g., priorities, DMA modes, etc.) to best suit the system's changing communication needs.
We illustrate issues and tradeoffs involved in the design of CAT-based communication architectures, and present algorithms to automate the key steps. Experimental results indicate that performance metrics (e.g., number of missed deadlines, average processing time) for systems with CAT-based communication architectures are significantly (sometimes, over an order of magnitude) better than those with conventional communication architectures.
1 Introduction
The evolution of the System-on-Chip (SOC) paradigm in electronic system design has the potential to offer the designer several benefits, including improvements in system cost, size, performance, power dissipation, and design turn-around-time. The ability to realize this potential depends on how well the designer exploits the customizability offered by the system-on-chip approach. While one dimension of this customizability is manifested in the diversity and configurability of the components that are used to compose the system (e.g., processor and domain-specific cores, peripherals, etc.), another, equally important, aspect is the customizability of the system communication architecture. In order to support the increasing diversity and volume of on-chip communication requirements, while meeting stringent performance constraints and power budgets, communication architectures need to be customized to the target system or application domain in which they are used.

1.1 Paper Overview and Contributions
In this paper, we present a general methodology for the design of custom system-on-chip communication architectures, which are flexible and capable of adapting to varying communication needs of the system components. Our technique can be used to optimize any underlying communication architecture topology by rendering it capable of adapting to the changing communication needs of the components connected to it. For example, more critical data may be handled differently, leading to lower communication latencies. This results in significant improvements in various quality of service (QoS) metrics, including the overall system performance, observed communication bandwidth and bus utilization, and the system's ability to meet critical deadlines. Our technique is based on the addition of a layer of circuitry, called the Communication Architecture Tuner (CAT), to each component. The CAT monitors and analyzes the internal state of, and communication transactions generated by, a system component and “predicts” the relative importance of communication transactions in terms of their impact on different system-level performance metrics. The results of the analysis are used by the CAT to configure the parameters of the underlying communication architecture to best suit the component's changing communication needs.
We motivate the need for CAT-based communication architectures by analyzing example systems and scenarios in which no static customization of the protocols can completely satisfy the system's time-varying communication requirements. We illustrate the issues and tradeoffs involved in the design of CAT-based communication architectures, and demonstrate that the hardware implementation complexity of the CAT needs to be considered in order to maximally exploit the potential for performance improvements. We present a general methodology and algorithms for the design of CAT-based SOC communication architectures. Given a system with a defined communication architecture topology, typical input traces, and target performance metrics, our algorithms determine optimized communication protocols for the various channels/buses in the system, and an efficient hardware implementation in the form of CATs, which are connected between each component and the communication architecture. Experimental results for several example systems, including an ATM switch port scheduler and a TCP/IP Network Interface Card subsystem, indicate that performance metrics (e.g., number of missed deadlines, average or aggregate processing time, etc.) for systems with CAT-based communication architectures are significantly (sometimes over an order of magnitude) better than systems with well-optimized conventional communication architectures. In summary:
• CAT-based communication architectures can extend the power of any underlying communication architecture. The timing behavior presented by a CAT-based communication architecture to each component connected to it (such as communication latency and bandwidth) is better customized to, and varies according to, the component's needs. This results in significantly improved system performance.
• The presented CAT design methodology trades off the sophistication of the communication architecture protocols against the complexity of (and hence, the overhead incurred by) the added hardware.
• In several cases, the use of CAT-based communication architectures can result in systems that significantly outperform those based on any static customization of the protocol parameters.

1.2 Related Work
We examine related work in the fields of system-level design, HW/SW co-design, and networking protocols, and place our work in the context of previous work in those fields. There has been a large body of work on system-level synthesis of application-specific architectures through HW/SW partitioning and mapping of the application tasks onto pre-designed cores and application-specific hardware [1,2,3,4,5,6,7,8,9]. While some of these techniques attempt to consider the impact of communication effects during HW/SW partitioning and mapping, they either assume a fixed communication protocol (e.g., PCI-based buses), or select from a “communication library” of a few alternative protocols. Research on system-level synthesis of communication architectures [10,11,12,13] mostly deals with synthesis of the communication architecture topology, which refers to the manner in which components are structurally connected through dedicated links or shared communication channels (buses). While topology selection is a critical step in communication architecture design, equally important is the design of the protocols used by the channels/buses in the selected topology. For example, the nature of the communication traffic generated by the system components may favor the use of a time-slice based bus protocol [14] in some cases, and a static priority based protocol [15] in others. The VSI Alliance on-chip bus working group [15] has recognized that a multitude of bus protocols will be needed in order to serve the wide range of SOC communication requirements. Further, most protocols offer the designer avenues for customization in the form of parameters such as arbitration priorities, transfer block sizes, etc. Choosing appropriate values for these parameters can significantly impact the latency and transfer bandwidth associated with inter-component communication. Finally, there is a body of work on interface synthesis [16,17,18,19,20,21,22,23], which deals with automatically generating efficient hardware implementations for component-to-bus or component-to-component interfaces. These techniques address issues in the implementation of specified protocols, and not in the customization of the protocols themselves.
In summary, we believe that previous work in the field of system-level design and HW/SW co-design does not adequately address the problem of customizing the protocols used in SOC communication architectures to the needs of the application. Another characteristic of previous research is that the design of the communication architecture is performed statically using information about the application and its environment (e.g., typical input traces). In several applications, the communication bandwidth required by each component, the amount of data it needs to communicate, and the relative “importance” of each communication request may be subject to significant dynamic variations. As shown later in this paper, in such situations, protocols used in conventional communication architectures may not be capable of adapting the underlying communication topology to meet the application's varying needs.
In the field of telecommunications and networking protocol design, a significant body of research has been devoted to the design of protocols to meet diverse QoS parameters such as connection establishment delay and failure probability, throughput, residual error ratio, etc. [24]. Sophisticated techniques such as flow and traffic control algorithms have been proposed in that context for adapting the protocol to improve the above-mentioned metrics. With increasing complexity, system-on-chip communication architectures will need to evolve by drawing upon some of the techniques that have been developed in the context of telecom networks. However, there are significant differences, such as the latency requirements and error tolerance and resilience requirements, which differentiate the problem we are addressing from the problems encountered in telecom network protocol design.
2 Communication Architecture Tuners: Introduction and Design Issues
In this section, we first motivate the need for CAT-based communication architectures by showing how the limited flexibility of conventional communication architectures, and their inability to adapt to the varying communication needs of the system components, can lead to significant deterioration in the system's performance. We then introduce CAT-based communication architectures and show how they address the above-mentioned drawbacks. Finally, we discuss the key issues and tradeoffs involved in a CAT-based communication architecture design methodology.

Example 1: Consider the example system shown in Figure 1, which represents part of the TCP/IP communications protocol used in a network interface card (we henceforth refer to this system as the TCP system). The system shown in Figure 1 performs checksum-based encoding (for outgoing packets) and error detection (for incoming packets), and interfaces with the Ethernet controller peripheral (which implements the physical and link layer network protocols). Since packets in the TCP protocol do not contain any notion of quality of service (QoS) [24], we have enhanced the packet data structure to contain a field in the header that indicates a deadline for the packet to be processed. We assume that the objective during the implementation of the system is to minimize the number of packets with missed deadlines.
[Figure 1: TCP system from a network interface card: (a) Specification, with processes ETHER_DRIVER, PKT_QUEUE, IP_CHECK, and CHECKSUM communicating through shared memory, to/from the network and the application layer; and (b) Implementation utilizing a conventional bus-based communication architecture (MIPS R3000, IP_CHECK, CHECKSUM, memory, bus interfaces, and arbiter; protocol parameters: Priority, DMA_mode, DMA_size, …)]
Figure 1(a) shows the behavior of the TCP system as a set of concurrent communicating tasks or processes. We explain the tasks performed by the TCP system for a packet received from the network. The process ether_driver […]. Process pkt_queue maintains a queue containing selected information from the packet headers. Process ip_check […]. Process checksum retrieves the packet from the shared memory, computes the checksum value for each packet, and returns the value to ip_check. The ether_driver and pkt_queue processes are mapped to the MIPS R3000 processor, while the ip_check and checksum processes are implemented using dedicated hardware. All communications between the system components are implemented using a shared bus. The protocol used in the shared bus supports static priority based arbitration and DMA-mode transfer. The bus arbiter and the bus interfaces of the components together implement the bus protocol. The bus protocol allows the system designer to specify values for various parameters, such as the bus priority and DMA block size for each component.
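To make the conventional protocol concrete, the sketch below models a static-priority arbiter that re-arbitrates after every DMA block. It is a minimal illustration under assumed conventions: a larger number means a higher priority, all requests are pending at time zero, and block sizes are counted in words. The `Request` type and the function names are hypothetical and are not part of the TCP system description.

```python
from dataclasses import dataclass

@dataclass
class Request:
    master: str   # requesting component, e.g. "ip_check" (illustrative)
    words: int    # words still to be transferred

def arbitrate(pending, priority):
    """Static-priority arbitration: grant the pending request whose master
    has the largest fixed priority value (ties go to the earlier request)."""
    if not pending:
        return None
    return max(pending, key=lambda r: priority[r.master])

def simulate(requests, priority, dma_size):
    """Serve requests one DMA block at a time, re-arbitrating after each block.
    All requests are assumed pending at time 0. Returns completion order."""
    pending = [Request(r.master, r.words) for r in requests]  # copy inputs
    finished = []
    while pending:
        winner = arbitrate(pending, priority)
        winner.words -= min(dma_size, winner.words)   # transfer one block
        if winner.words == 0:
            pending = [r for r in pending if r is not winner]
            finished.append(winner.master)
    return finished
```

Because the priorities are fixed at design time, the grant order is the same regardless of which packet (or deadline) each request serves; this is the limitation examined in the experiment that follows.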
We analyzed the performance of the TCP system of Figure 1 for several distinct values of the bus protocol parameters. For ease of explanation, in this experiment we varied only the bus priority of each component and kept the remaining protocol parameters fixed. The system simulation was performed using traces of packets with varying deadline laxities. An abstract view of the execution of the TCP system processing four packets (numbered i, i+1, j, and j+1) is shown in Figure 2. The figure indicates the time at which each packet arrives from the network, and the deadline by which it needs to be processed. Note that while the packets arrive in the order i, i+1, j, j+1, the deadlines are in a different order: packet i+1 has an earlier deadline than packet i. For the sake of our illustration, we focus on two different bus priority assignments ([…]). While we do not explicitly consider other priority assignments here, it can be shown that the arguments we present for one of the above two cases will hold for every other priority assignment.

The first waveform in Figure 2 represents the execution of the system when the bus priority assignment […] is used. After the completion of […], […] requests bus access to process packet […], while […]. This effectively delays the processing of packet […] until […] is able to process it without waiting for packet […] to complete. This results in the deadlines for both packets i and i+1 being met. However, let us consider packets j and j+1, whose deadlines are in the same order as their arrival times. After process […] for packet […], and process […]. This delays the execution of process […]
The deficiency of the communication architecture that leads to missed deadlines in the TCP example can be summarized as follows. The relative importance of the communication transactions generated by the various system components (ether_driver, ip_check, and checksum) varies depending on the deadlines of the packets they are processing. In general, the importance or criticality of each communication transaction may depend on several factors, which together determine whether the communication will be on the system's critical path. The communication architecture needs to be able to discern between more critical and less critical communication requests and serve them accordingly. As shown in the previous example, conventional communication architectures suffer from the following drawbacks: (i) the degree of customizability they offer may be insufficient in systems with stringent performance requirements, and (ii) they are typically not capable of sensing and adapting to the varying communication needs of the system and the varying nature of the data being communicated.
CAT-based communication architectures address the above problems through the use of a hardware layer that adapts the underlying communication architecture according to the changing needs of the various components connected to it. We next show how a CAT-based communication architecture can be used to improve the performance of the TCP system.
Example 2: A CAT-based communication architecture for the TCP system is shown in Figure 3(a). CATs are added to the components that implement the ether_driver, ip_check, and checksum processes. Further, the bus control logic (arbiter and component bus interfaces) is enhanced to facilitate the operation of the CATs. A more detailed view of a component with a CAT is shown in Figure 3(b). The component notifies the CAT when it generates communication requests. The CAT also observes selected details about the data being communicated and the component's internal state.
In this example, the CAT observes the packet size and deadline fields from the header of the packet currently being processed by the component. The CAT performs the following functions: (i) it groups communication events based on the size and deadline of the packet currently being processed, and (ii) for events from each group, it determines an appropriate assignment of values to the various protocol parameters. As a result, the characteristics of the communication architecture (including the time required to perform a communication) are adapted according to the different needs and relative importance of the communication requests. The rationale behind using the deadline is that packets with closer deadlines need to be given higher importance. The rationale behind using the size of the packet is more complex. In cases when all the packets in the system have roughly equal deadlines, it is advantageous to favor the completion of packets that are smaller, since they have a better chance of meeting the deadline. We used the techniques presented later in this paper to implement the CAT-based TCP system architecture shown in Figure 3. For ease of illustration, the CATs were used to vary only the bus priorities. All other parameters were set to the same values as those used in the architecture of Figure 1. The CAT groups the communication requests generated by a component based on the packet they belong to, and the priority of all communication requests associated with a packet is computed using the formula […], where […], […], and […] represent the packet size, deadline, and arrival time, respectively.

The execution of the optimized system is shown in Figure 4. The same packet sequence that was used to illustrate the inadequacy of the conventional communication architecture in Figure 2 was used for this experiment. The system meets the deadlines for all the packets (recall that the original system architecture presented in Figure 1 missed deadlines for all priority assignments). When packet i+1 (which has a tight deadline) arrives, the CAT assigns […] to the communication requests generated by […] and […], which are still processing packet i. This leads to packet i+1 meeting its deadline. When packet j+1 arrives, however, the communication requests generated by […] […] and […] to process packet j to completion in order to meet its tight deadline.
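The CAT's grouping-and-prioritizing behavior can be sketched as follows. This is a hypothetical rule, not the formula used in the paper: it ranks the packets currently in flight by deadline, breaks ties in favor of smaller packets and then earlier arrivals (following the rationale given above), and maps the rank to a bus priority level. `PacketInfo`, `cat_priorities`, `request_priority`, and the number of levels are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PacketInfo:
    packet_id: str
    size: int         # packet size (e.g., bytes); the unit is an assumption
    deadline: float   # absolute deadline read from the packet header
    arrival: float    # arrival time from the network

def cat_priorities(packets, levels=4):
    """Hypothetical CAT rule: earlier deadline first, then smaller size,
    then earlier arrival. Returns packet_id -> bus priority level
    (higher number = higher priority, never below 1)."""
    ranked = sorted(packets, key=lambda p: (p.deadline, p.size, p.arrival))
    return {p.packet_id: max(levels - rank, 1) for rank, p in enumerate(ranked)}

def request_priority(packet_id, table):
    """Every bus request issued on behalf of a packet inherits its level."""
    return table[packet_id]
```

For packets i and i+1, where i+1 arrives later but has the earlier deadline, `cat_priorities` gives i+1 the higher level, so requests made on its behalf win arbitration over requests still serving packet i. A real CAT would recompute the table as packets arrive and complete.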
[Figure 4: Execution of the CAT-based architecture for the TCP system: waveforms for ether_driver, ip_check, and checksum processing packets i, i+1, j, and j+1, with arrival times and deadlines marked; all packets meet their deadlines]
Assigning appropriate values for communication protocol parameters (such as priorities and DMA sizes) to the critical events, and translating these results into a high-performance implementation.
While several techniques have been proposed for system-level performance analysis [1,2] and can be used for the first step, we use an analysis of the system execution traces as a basis for identifying critical communication events. A significant advantage of using execution traces generated through system simulation is that they can be derived for any system for which a system-level simulation model exists. The generated traces can be analyzed to examine the impact of individual (or groups of) communication events on the system's performance. Communication events which are on the system “critical paths”, and whose delays significantly impact the specified performance metrics, can be classified as critical. The details of the technique we use to identify critical communication events are provided in Section 3.
Since the system execution trace is specific to the input traces or stimuli used, there is no simple way to correlate the critical communication events in the simulation trace to critical communication events that occur while the system executes (possibly under different stimuli). For example, consider a communication trace where the twentieth, twenty-first, and twenty-second data transfers after the start of system execution are shown to have a strong impact on system performance. Speeding up these data transfers would significantly improve system performance for the given input trace. Suppose that we would like to translate these insights into an improved communication protocol. Clearly, a naive system, where the twentieth, twenty-first, and twenty-second data transfers have a high priority, might not realize any performance gains, because the sequence of events that occurs during the system's execution could differ significantly from that of the trace. In addition to identifying critical communication events, we need to correlate their occurrence to other easily detectable properties of the system state and the data it is processing. For example, if an analysis of the simulation trace reveals that the occurrence of a critical data transfer is highly correlated to a specific branch being encountered in the behavior of the component executing the transfer, the occurrence of the branch might be used as a predictor for the criticality of the data transfers generated by the component. The following example examines some tradeoffs in designing these predictors.
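One simple way to quantify how well a control-flow event (such as a branch) predicts criticality is to compare how often transfers are critical when the event was observed versus when it was not. The sketch below assumes the trace has already been reduced to one record per data transfer: the set of control-flow events observed before the transfer, and whether trace analysis marked it critical. The record format, the function name, and the event names are hypothetical.

```python
def predictor_strength(trace, event):
    """Return (P(critical | event seen), P(critical | event not seen)).

    trace: iterable of (events_seen, is_critical) pairs, one per data transfer,
    where events_seen is a set of control-flow event names.
    """
    records = list(trace)  # allow two passes even if given an iterator
    with_event = [crit for seen, crit in records if event in seen]
    without_event = [crit for seen, crit in records if event not in seen]
    p_with = sum(with_event) / len(with_event) if with_event else 0.0
    p_without = sum(without_event) / len(without_event) if without_event else 0.0
    return p_with, p_without
```

A large gap between the two probabilities suggests a useful predictor; a `p_with` well below 1 signals the false positives discussed in Example 3.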
Example 3: Consider the system shown in Figure 5, which is used to encrypt data for security before transmission onto a communications network. Component 1 processes the data, determines the coding and encryption scheme to be used, and sends the data to Component 2, which encodes and encrypts the data before sending it through the shared bus to the peripheral that transmits it onto the network. Figure 6 shows the data transfers occurring on the system bus. The shaded ellipses, marked y_i, represent data transfers from Component 2 to the network peripheral. Let us suppose that Component 2 should transfer data at a fixed rate, and each data transfer should occur before a deadline (indicated in Figure 6 by a dotted line). A key performance metric for the system is the number of data transfers completed by Component 2 in a timely manner. The communication trace indicates that deadlines are frequently not met. Analysis of the system execution trace also identifies communication events that did not meet their deadlines, e.g., […] and […]. In addition, it identifies critical communication events, i.e., those which, when sped up, could potentially improve system performance. Since y_i can occur only after x_i, speeding up x_i is one of the ways of improving system performance. Let S denote the set of all x_i's such that y_i does not meet the deadline. The performance of the system can improve if the communication times of the events in S improve.
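The construction of S can be sketched directly from a trace. The sketch assumes that, for each i, the completion time of y_i and its deadline are known; the function names are hypothetical, and the numbers in the test are made up rather than taken from Figure 6.

```python
def critical_set(y_end_times, deadlines):
    """Return S = {x_i : y_i completes after its deadline}, using 1-based
    indices to match the x_i / y_i naming. Because y_i can start only after
    x_i, each x_i in S is a candidate for speed-up."""
    return {
        f"x{i}"
        for i, (end, deadline) in enumerate(zip(y_end_times, deadlines), start=1)
        if end > deadline
    }

def missed_deadlines(y_end_times, deadlines):
    """The performance metric: number of y_i transfers that missed."""
    return sum(end > d for end, d in zip(y_end_times, deadlines))
```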
Having isolated the critical communication events from the simulation trace, we need to develop schemes to identify these elements during the execution of the system. As mentioned before, this is done by correlating the occurrence of critical communication events with information about the system state and the data it is processing. In this example, let us suppose that we choose to correlate critical communication events with the control-flow history of the component that generated them. We define a control-flow event as a Boolean variable which assumes a value of 1 when a component executes a specific operation. For example, the behavior of Component 1 shown in Figure 5 is annotated with control-flow events e1, e2, e3, and e4. In general, if e1, e2, …, en are the control-flow events which are used to determine whether or not a communication request is critical, we can define a function f(e1, e2, …, en) whose on-set denotes the set of communication events classified as critical.
The number of control-flow variables used for this classification has a profound impact on the classification of communication events. A good classification should have the properties of a one-to-one map, i.e., every event classified as critical should indeed be critical, and every critical event should be detected by the classification. Suppose, in this example, we are allowed to use only one variable for classification. We find that, in all the cases where deadlines are missed, event […] occurs. Based on this insight, we may choose […] as a classifier. However, […] often occurs along with non-critical communication events as well. If […] is used as a classifier, only 16% of the communication events classified as critical are indeed critical. Therefore, […] could misclassify several communication events and incorrectly increase their priorities, causing system performance to suffer.
[Figure 5: A data encryption system that illustrates tradeoffs in the identification of critical communication events. Component 1 executes the annotated behavior below and sends each packet to Component 2 (encoding and encryption).]

    if (u) {                                  // e1
        packet = concat(packet, data);
    }
    if (…) {                                  // e2
        …
    }
    if (security == high) {                   // e3
        packet->encryption_level = 0;
    } else if (security == medium) {
        packet->encryption_level = 1;
    } else {
        packet->encryption_level = low;
    }
    switch (channel_char.) {                  // e4
        case 1: packet->code = 1; break;
        case 2: packet->code = 2; break;
        default: packet->code = 3; break;
    }
    …
    send_packet(x, encryption_unit);
[Figure 6: A trace of bus activity for the system shown in Figure 5: transfers x1 to x6 and y1 to y5 along a time axis, with deadlines marked and missed deadlines indicated]
Figures 7(a) and (b) plot, respectively, the percentage of communication events classified as critical by f that are actually in S, and the percentage of the critical communication events in S that are covered by f, versus the number of variables that perform the classification. The x-axis shows the number of variables used to perform the classification. For example, the best classifier that uses two variables captures 100% of critical communication events, while only 50% of the communication events classified as “critical” by it are actually critical. Note that, in this example, as the number of variables increases, the percentage of events classified as critical that are actually in S increases. This is because, as the number of variables increases, the classification criterion becomes more stringent, and non-critical events are less likely to pass the test. However, simultaneously, critical events could be missed, as shown in Figure 7(b) (note the decrease in the percentage of S covered as the number of variables used increases). Therefore, one needs to judiciously choose the right number of variables and the right classification functions in order to maximally improve system performance. In this example, optimal results are obtained by using three variables (e1, e2, and e3) and a classification function […].
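The tradeoff behind Figure 7 can be reproduced on a toy trace. The sketch below assumes that the classifier f is a conjunction of the chosen control-flow events (an assumption consistent with the observation that adding variables makes the criterion more stringent) and reports two numbers per candidate: the fraction of flagged events that are in S (the quantity of Figure 7(a), here called precision) and the fraction of S that is flagged (Figure 7(b), here called coverage). The trace, the function names, and the tie-breaking rule in `best_per_size` are hypothetical.

```python
from itertools import combinations

def evaluate(trace, variables):
    """Precision and coverage of f = AND(variables) on a trace of
    (events_seen, in_S) records, one per communication event."""
    needed = set(variables)
    flagged = [in_s for seen, in_s in trace if needed <= seen]
    total_critical = sum(in_s for _, in_s in trace)
    hits = sum(flagged)
    precision = hits / len(flagged) if flagged else 0.0
    coverage = hits / total_critical if total_critical else 0.0
    return precision, coverage

def best_per_size(trace, all_vars):
    """For each classifier size k, keep the conjunction with the highest
    coverage, breaking ties by precision."""
    best = {}
    for k in range(1, len(all_vars) + 1):
        scored = [(evaluate(trace, c), c) for c in combinations(all_vars, k)]
        (precision, coverage), combo = max(scored, key=lambda s: (s[0][1], s[0][0]))
        best[k] = (combo, precision, coverage)
    return best
```

On the five-event trace used in the test, e1 alone yields precision 0.5 and coverage 1.0, (e1, e2) yields 1.0 and 1.0, and (e1, e2, e3) yields 1.0 and 0.5: the same qualitative rise in precision and eventual loss of coverage discussed above.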
3 Methodology and Algorithms for the Design of Communication Architecture Tuners
In this section, we present a structured methodology and automation algorithms for the design of CAT-based communication architectures. Section 3.1 explains the overall methodology and outlines the different steps involved. Section 3.2 presents the algorithms used to perform the critical steps in more detail.
3.1 Algorithm and Methodology: Overview
In this section, we describe our techniques in the context of a design flow where the system is first partitioned and mapped onto various pre-designed cores and application-specific logic. Based on the communication and connectivity requirements of the system, a communication architecture topology is selected. The selected topology can then be optimized using the proposed techniques. Our algorithm takes as inputs a simulatable partitioned/mapped system description, the selected communication architecture topology, typical environment stimuli or input traces, and objectives and/or constraints on performance metrics. The performance metrics could be specified in terms of the amount of time taken to complete a specific amount of work (e.g., a weighted or uniform average of processing times), or in terms of the number of output deadlines met or missed for applications with real-time constraints. The output of the algorithm is a set of optimized communication protocols for the target system. From a hardware point of view, the system is enhanced through the addition of Communication Architecture Tuners wherever necessary, and through the modification of the controllers/arbiters for the various channels in the communication architecture.

A typical system with a CAT-based communication architecture generated using our techniques is shown in Figure 8(a). The system contains several components, including a processor core, memories, and peripherals. The selected communication architecture topology is enclosed in the dotted boundary. The selected topology consists of dedicated channels between components (e.g., between the processor and co-processor), as well as two shared buses that are connected by a bridge. The portions of the system that are added or modified as a result of our technique are shown shaded in Figure 8(a). Our technique can be applied to general communication architecture topologies that can be expressed as an arbitrary interconnected network of dedicated and shared channels.

[Figure 7: A plot of different classification metrics with respect to the number of variables used for the classification: (a) percentage of events classified as critical by f that are in S; (b) percentage of S captured by f. x-axis: length of history used to detect partitions (1 to 6).]

[Figure 8: (a) An example system with a CAT-based communication architecture (CPU1, co-processor, MPEG decoder, video encoder interface, application-specific logic, Memory1, Memory2, Arbiter1, Arbiter2, bridge, and CATs), and (b) detailed view of a component with a CAT (a partition detector driven by tracer tokens, communication requests, and data properties, feeding a priority generator, a DMA size generator, and further protocol-parameter generators that connect through the bus interface to the communication architecture)]
A more detailed view of a component with a CAT is shown in Figure 8(b). The CAT consists of a “partition detector” circuit, which is shown as a finite-state automaton in the figure, and parameter generation circuits that generate values for the various communication architecture protocol parameters during system execution. We next briefly describe the role of these circuits.
[Figure 9: Symbolic illustration of CAT-optimized communication architecture execution: waveforms for tracer tokens, communication transactions, partition recognizer state (S0 to S3), the priority parameter, and communication delay]

Partition detector: We loosely define a communication partition as a subset of the communication transactions generated by the component during system execution. For each component, our algorithm identifies a number of partitions, and the conditions that must be satisfied by a communication transaction for it to be classified under each partition. These conditions are incorporated into the partition detector circuit. The partition detector circuit monitors and analyzes the following information generated by the component:
• Tracer tokens generated by the component to indicate that it is executing specific operations. The component is enhanced to generate these tokens purely for the purpose of the CAT.
• The communication transaction initiation requests that are generated by the component.
• Any other application-specific properties of the communication data being generated by the component (e.g., fields in the data which indicate its relative importance).
The partition detector uses specific sequences of tracer tokens and communication requests to identify the beginning and end of a sequence of consecutive communication transactions that belong to a partition. For example, the regular expressions […] and […] may be used to delineate communication events that belong to partition […]. In Section 3.2, we present general techniques to automatically compute the start and end conditions for each partition.
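A partition detector can be sketched as a small state machine. In the sketch below, the start condition is a fixed sequence of tracer tokens (a special case of the regular expressions mentioned above) and the end condition is a count of communication transactions; both choices, the class name, and the token names are assumptions for illustration, not the circuits generated by the methodology.

```python
class PartitionDetector:
    """Hypothetical partition detector: enters `partition_id` after seeing the
    tracer tokens `start_tokens` in order, and leaves it after
    `max_transactions` communication transactions."""

    def __init__(self, start_tokens, max_transactions, partition_id, default_id=0):
        self.start_tokens = list(start_tokens)
        self.max_transactions = max_transactions
        self.partition_id = partition_id
        self.default_id = default_id
        self.matched = 0      # start tokens matched so far
        self.count = 0        # transactions classified in the active partition
        self.active = False

    def on_token(self, token):
        """Advance the start-sequence matcher on each tracer token."""
        if self.active:
            return
        if token == self.start_tokens[self.matched]:
            self.matched += 1
        else:
            # simplified restart (not full KMP): the token may begin a new match
            self.matched = 1 if token == self.start_tokens[0] else 0
        if self.matched == len(self.start_tokens):
            self.active, self.count, self.matched = True, 0, 0

    def on_transaction(self):
        """Classify one communication transaction; return its partition ID."""
        if not self.active:
            return self.default_id
        self.count += 1
        if self.count >= self.max_transactions:
            self.active = False  # end condition reached
        return self.partition_id
```

The partition ID returned by `on_transaction` is what the parameter generation circuits consume.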
Parameter generation circuits: These circuits compute values for communication protocol parameters (e.g., priorities, DMA/block sizes, etc.) based on the partition ID generated by the partition detector circuit, and other application-specific data properties specified by the system designer. The values of these parameters are sent to the arbiters and controllers in the communication architecture, resulting in a change in the characteristics of the communication architecture. Automatic techniques to design the parameter generation circuits are presented in Section 3.2.

The functioning of a CAT-based communication architecture is illustrated using symbolic waveforms in Figure 9. The first two waveforms represent tracer tokens generated by the component. The next two waveforms represent the communication transactions generated by the component and the state of the partition detector circuit, respectively. The state of the partition detector circuit changes first from […] to […], and later from […] to […], in reaction to the […] tracer tokens generated by the component. The fourth communication transaction generated by the component after the partition detector reaches state […] causes it to transition into state […]. All communication transactions that occur when the partition detector FSM is in state […] are classified as belonging to partition […]. The fifth waveform shows the output of the priority generation circuit. The priority generation circuit assigns a priority level of […] to all communication transactions that belong to partition […]. This increase in priority leads to a decrease in the delay associated with the communication transactions that belong to partition […], as shown in the last waveform of Figure 9.

[Figure 10: Algorithm for designing CAT-based communication architectures.
Inputs: partitioned/mapped system, communication architecture topology, input traces, performance metrics.
1. Analyze system, create communication analysis graph (CAG)
2. Partition communication instances into clusters
3. Evaluate cluster statistics
4. Assign parameter values to clusters
5. Re-analyze system and re-compute performance metrics with the new communication architecture protocols; repeat while performance improves
6. Synthesize CATs to realize the optimized protocols
Outputs: system with optimized CAT-based communication architecture]
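Functionally, a parameter generation circuit is a lookup from partition ID (plus optional data properties) to protocol parameter values. The sketch below assumes two partitions, a bus priority and a DMA block size per partition, and a single Boolean data property `urgent`; all values and names are hypothetical.

```python
# Hypothetical table: partition ID -> (bus priority, DMA block size in words).
PARAM_TABLE = {
    0: (2, 4),    # default partition
    1: (4, 16),   # partition whose transactions should be sped up
}

def generate_parameters(partition_id, urgent=False, table=PARAM_TABLE):
    """Return the protocol parameters sent to the arbiter/bus interface.
    Unknown partition IDs fall back to partition 0; `urgent` models an
    application-specific data property that raises the priority by one."""
    priority, dma_size = table.get(partition_id, table[0])
    if urgent:
        priority += 1
    return {"priority": priority, "dma_size": dma_size}
```

In hardware this would be a small ROM or combinational block; the dictionary only models its input/output behavior.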
The overall algorithm for designing CAT-based communication architectures is shown in Figure 10. In step 1, performance analysis is performed on the partitioned/mapped system description in order to derive the information and statistics used in the later steps. In our work, we use the performance analysis technique presented in [25], which is comparable in accuracy to complete system simulation, while being much more efficient to employ in an iterative manner. The output of this analysis is a communication analysis graph (CAG), which is a highly compact representation of the system's execution under the given input traces. The vertices in the graph represent clusters of computations and abstract communications performed by the various components during the system execution. The edges in the graph represent the inter-dependencies between the various computations and communications. Note that since the communication analysis graph is effectively unrolled in time, it is acyclic, and may contain several distinct instances of a single computation operation or communication from the system specification. The communication analysis graph is constructed by extracting necessary and sufficient information from a detailed system execution trace [25]. The CAG can be easily analyzed to determine various performance statistics such as the system critical path, average processing time, number of missed deadlines, etc.
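Once a CAG is available, statistics such as finish times, missed deadlines, and the critical path follow from a single topological traversal. The sketch below assumes each vertex already carries a fixed delay and that edges are given as a predecessor map; the vertex names and delays are hypothetical, and the construction of the CAG from a trace ([25]) is not modeled.

```python
from graphlib import TopologicalSorter

def finish_times(delays, preds):
    """Earliest finish time of every vertex of an acyclic CAG.
    delays: vertex -> delay; preds: vertex -> list of predecessor vertices."""
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(v, ())), default=0)
        finish[v] = start + delays[v]
    return finish

def missed_deadlines(delays, preds, deadlines):
    """Output vertices whose finish time exceeds their deadline."""
    finish = finish_times(delays, preds)
    return sorted(v for v, d in deadlines.items() if finish[v] > d)

def critical_path(delays, preds, sink):
    """Walk back from `sink`, always following the latest-finishing predecessor."""
    finish = finish_times(delays, preds)
    path = [sink]
    while preds.get(path[-1]):
        path.append(max(preds[path[-1]], key=finish.__getitem__))
    return path[::-1]
```

Because the CAG is acyclic, one topological pass suffices, which is what makes repeated re-evaluation inside the optimization loop cheap.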
In step 2, we group the communication vertices in the communication analysis graph into a number of partitions. The main rationale behind this partitioning is that each of the partitions may have distinct communication requirements, and hence may potentially require a different set of values to be assigned to the parameters of the communication protocol (e.g., priorities, DMA sizes, etc.) in order to optimize system performance. Note that in the extreme case, each communication vertex in the communication analysis graph can be assigned to a distinct partition. However, this has two disadvantages: (i) the area and delay overhead incurred in the CAT may become prohibitive, and (ii) as illustrated in Section 2, the use of very small partitions can lead to CAT hardware that is highly sensitive to variations in input traces. In Section 3.2.1, we propose a novel metric, called sensitivity, which is used to group communication instances (vertices) into partitions. We also present techniques that enable the designer to select an optimal granularity for the partitions.
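A finite-difference estimate conveys the idea behind a sensitivity-based grouping: perturb one communication vertex's delay, recompute the objective on the CAG, and group vertices with similar impact. The paper's sensitivity metric is defined in Section 3.2.1; the sketch below is an assumed approximation with a hypothetical objective (sum of output finish times), perturbation size, and greedy grouping rule.

```python
from graphlib import TopologicalSorter

def finish_times(delays, preds):
    """Earliest finish time of every vertex in an acyclic CAG."""
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(v, ())), default=0)
        finish[v] = start + delays[v]
    return finish

def objective(delays, preds, outputs):
    """Assumed objective: sum of the finish times of the system outputs."""
    finish = finish_times(delays, preds)
    return sum(finish[o] for o in outputs)

def sensitivities(delays, preds, outputs, comm_vertices, delta=1.0):
    """Change in the objective per unit increase of each vertex's delay."""
    base = objective(delays, preds, outputs)
    result = {}
    for v in comm_vertices:
        perturbed = dict(delays)
        perturbed[v] += delta
        result[v] = (objective(perturbed, preds, outputs) - base) / delta
    return result

def group_by_sensitivity(sens, tolerance):
    """Greedy 1-D clustering: sort by sensitivity and open a new partition
    whenever the gap to the previous vertex exceeds `tolerance`."""
    partitions = []
    for v in sorted(sens, key=sens.get):
        if partitions and sens[v] - sens[partitions[-1][-1]] <= tolerance:
            partitions[-1].append(v)
        else:
            partitions.append([v])
    return partitions
```

A larger `tolerance` yields fewer, coarser partitions (less CAT hardware, less sensitivity to trace variations); a smaller one approaches the one-vertex-per-partition extreme described above.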
Step 3 evaluates various statistics for each communication partition, based on which step 4 determines an assignment of communication architecture parameter values for each partition. The details of these steps are presented in Section 3.2.2. The output of step 4 is a set of candidate protocols for the system communication architecture. Step 5 re-evaluates the system performance for the optimized protocols derived in step 4. If a performance improvement results, steps 1 to 5 are repeated until no further performance improvement is obtained.
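The control flow of steps 1 to 5 is an iterate-until-no-improvement loop. The sketch below passes each step in as a function so that only the loop structure is shown; the callable names, the convention that a lower metric value is better, and the iteration cap are hypothetical, and step 6 (CAT synthesis) is not modeled.

```python
def optimize_protocols(system, analyze, partition, assign, evaluate, max_iters=20):
    """Repeat analysis (step 1), partitioning (step 2), statistics and parameter
    assignment (steps 3-4), and re-evaluation (step 5) while the metric improves.
    Returns (best_parameters, best_metric); lower metric values are better."""
    best_params, best_cost = None, float("inf")
    for _ in range(max_iters):
        cag = analyze(system, best_params)   # step 1, under current protocols
        parts = partition(cag)               # step 2
        candidate = assign(cag, parts)       # steps 3-4
        cost = evaluate(system, candidate)   # step 5
        if cost >= best_cost:                # no further improvement: stop
            break
        best_params, best_cost = candidate, cost
    return best_params, best_cost
```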
Step 6 deals with synthesis of hardware (CATs) to implement the optimized protocols that were determined in step 4. As illustrated in Section 2, it is critical to consider the hardware implementation complexity and overheads in order to fully exploit the potential of CAT-based communication architectures. In Section 3.2.3, we formulate the problem of generating the partition detector and parameter generation circuits as a problem of generating a minimum-complexity function to fit a set of data points, and outline how it can be efficiently solved using well-known techniques from regression theory [26].
3.2 Algorithm and Methodology: Details
In this section we describe the steps outlined above in more detail. We present techniques to obtain partitions of the communication event instances, discuss how to select an optimal set of protocol parameter values, and show how to synthesize CAT hardware for classifying communication event instances into partitions.
3.2.1 Profiling and partitioning communication event instances
In this section, we describe the partitioning step of our methodology (step 2 of Figure 10). The objective of the partitioning step is to identify, and cluster into a single partition, a set of communication event instances that can be treated by the communication protocol in a uniform manner. For instance, the protocol could define all members of a given partition to have the same priority for accessing a shared bus.
The communication analysis graph generated by step 1 of our algorithm contains sufficient information to measure the performance of the system as a function of the delays of its communication events. In step 2, we perform an analysis of the CAG to measure the impact of individual communication instance delays on the system performance. Instances which have a similar impact on the system performance are grouped into the same partition. The performance impact of an instance is measured by a parameter called sensitivity, which captures the change in system performance when the communication delay of the instance changes. The following example illustrates our partitioning procedure.

Figure 11 shows a section of a CAG generated from a representative execution of an example system. Shaded vertices $c_1$ through $c_4$ represent instances of communication events. Vertices $z_1$ and $z_2$ represent the final outputs of the system. The objective function to be minimized is the quantity $O = t(z_1) + t(z_2)$, where $t(v)$ is the finish time of a vertex $v$ in the CAG.

To measure the sensitivity of the system performance to communication instance $c_i$, the existing delay of $c_i$ is perturbed by a value $\Delta$, and a traversal of the transitive fanout of $c_i$ in the CAG is used to re-compute the start and finish times of the affected vertices. The updated finish times of the vertices are used to calculate the change in the system performance metric. In this example, perturbing the delay of $c_1$ by 10 units delays the finish of both $z_1$ and $z_2$ by 10 units each, while perturbing the delay of $c_2$ by 10 units delays $z_1$ alone. Similarly, delaying $c_4$ delays the finish time of $z_2$. Since $c_3$ does not lie on a critical path, perturbing it has no effect on system performance.

Figure 11: Sensitivity calculation and partitioning instances in the CAG ($\Delta = 10$; $s(c_1) = 20$, $s(c_2) = 10$, $s(c_3) = 0$, $s(c_4) = 10$; partitions $CP_1$, $CP_2$, $CP_3$)
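The perturbation procedure can be sketched in a few lines of Python. The graph below is a hypothetical CAG with invented delays, built only to mimic the situation described for Figure 11: one instance feeds both outputs, one feeds each output, and one has enough slack to be off the critical path. For simplicity the sketch recomputes all finish times instead of traversing only the transitive fanout of the perturbed vertex; the resulting change in $O$ is the same, only the cost differs.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def finish_times(delays, preds):
    """Earliest finish times in an acyclic graph (vertex -> clock cycles)."""
    graph = {v: preds.get(v, []) for v in delays}
    finish = {}
    for v in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in preds.get(v, [])), default=0)
        finish[v] = start + delays[v]
    return finish

def objective(delays, preds, outputs):
    """O = sum of the finish times of the system outputs."""
    f = finish_times(delays, preds)
    return sum(f[z] for z in outputs)

def sensitivity(delays, preds, outputs, comm_vertices, delta=10):
    """s(c) = change in O when the delay of instance c grows by delta."""
    base = objective(delays, preds, outputs)
    s = {}
    for c in comm_vertices:
        perturbed = dict(delays)
        perturbed[c] += delta
        s[c] = objective(perturbed, preds, outputs) - base
    return s

# Hypothetical CAG: c1 feeds c2 -> z1 and c4 -> z2; c3 also feeds z2 but has slack.
delays = {"c1": 16, "c2": 11, "c3": 5, "c4": 2, "z1": 5, "z2": 5}
preds = {"c2": ["c1"], "c4": ["c1"], "z1": ["c2"], "z2": ["c3", "c4"]}
print(sensitivity(delays, preds, ["z1", "z2"], ["c1", "c2", "c3", "c4"]))
# {'c1': 20, 'c2': 10, 'c3': 0, 'c4': 10}
```

Note that $s(c_3) = 0$ here only because its slack (13 cycles) exceeds $\Delta$; a larger perturbation would eventually expose it.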
Using the procedure described above, we calculate a sensitivity $s(c_i)$ for each communication instance $c_i$, which measures the change in the value of the objective function $O$ after perturbing the delay of $c_i$ by $\Delta$. Next, we assign communication instances that have similar sensitivity values to the same partition. In this example, based on the sensitivity values shown in Figure 11, $c_1$ is assigned to $CP_1$, $c_2$ and $c_4$ are assigned to $CP_2$, and $c_3$ is assigned to $CP_3$.
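The text does not spell out how "similar" sensitivities are grouped. One simple possibility, assumed here purely for illustration, is a greedy one-dimensional grouping with a designer-chosen tolerance `tol` (a hypothetical parameter): a larger `tol` yields fewer, coarser partitions, which is one knob for the granularity trade-off discussed in step 2.

```python
def group_by_sensitivity(s, tol):
    """Greedy 1-D grouping (an illustrative assumption, not the paper's method).

    Sort instances by sensitivity, highest first, and start a new partition
    whenever an instance differs from the first member of the current
    partition by more than tol. Ties are broken by instance name.
    """
    partitions = []
    for c in sorted(s, key=lambda c: (-s[c], c)):
        if partitions and s[partitions[-1][0]] - s[c] <= tol:
            partitions[-1].append(c)
        else:
            partitions.append([c])
    return partitions

# Sensitivities as read from Figure 11.
s = {"c1": 20, "c2": 10, "c3": 0, "c4": 10}
print(group_by_sensitivity(s, tol=0))   # [['c1'], ['c2', 'c4'], ['c3']]
print(group_by_sensitivity(s, tol=10))  # [['c1', 'c2', 'c4'], ['c3']]
```

With `tol=0` the grouping coincides with the $CP_1$, $CP_2$, $CP_3$ assignment of the example.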
As mentioned before, events in the same partition are treated similarly by the CAT.
3.2.2 Modifying Protocol Parameters
In this section we describe steps 3 and 4 of the overall flow, i.e., how to examine each partition and then assign optimized protocol parameter values to it. While our discussion is confined to determining the priority that should be assigned to each partition, it could be extended to include other protocol parameters, such as whether burst mode should be supported and, if so, what the correct DMA size should be. The sensitivity of a partition indicates the impact its events have on the performance of the system. However, assigning priorities based on the sensitivity of a partition alone may not lead to the best assignment. This is because sensitivity does not capture the indirect effects of a communication event (or set of events) on the delays of other concurrent communication events; such effects occur due to the presence of shared channels/buses in the communication architecture. We account for this by deriving a metric that penalizes partitions which are likely to negatively impact the delays of communication events in other partitions. In order to obtain this information, we analyze the CAG and evaluate, for each pair of partitions $CP_i$ and $CP_j$, the amount of time for which communication events that belong to partition $CP_j$ are delayed due to events from $CP_i$. Table 1 shows example data for a system with three partitions.

Column 2 gives the sensitivity of each partition. Columns 3, 4 and 5 give the total time ($w_{i1}$, $w_{i2}$, $w_{i3}$) for which instances in $CP_1$, $CP_2$ and $CP_3$, respectively, wait for instances in $CP_i$. For example, instances in $CP_1$ induce a total wait of 100 cycles on instances of $CP_2$. Column 6 gives the sum of columns 3, 4 and 5 to indicate the total waiting time ($W_i$) that events in partition $CP_i$ have introduced in other partitions; e.g., $CP_3$ induces a total wait of only 7 cycles on $CP_1$ and $CP_2$.
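Finding the ideal way to combine these statistical parameters into a formula that produces the optimum priority assignment is a hard optimization problem. Instead, we use a heuristic calculation that boosts a partition's priority in proportion to its sensitivity, but penalizes it for the waiting times it introduces in other partitions. Using the notation of Table 1, the priority of a partition $CP_i$ is defined as:

$$P_i = s(c_i) - \sum_{j \neq i} \frac{w_{ij}}{W_i}\, s(c_j), \qquad W_i = \sum_{j \neq i} w_{ij}$$

The partition with the highest value of $P_i$ receives priority level 1, as shown in the last column of Table 1.

Table 1: Statistics of the partitions
Partition | Sensitivity s(ci) | wi1 (clock cycles) | wi2 (clock cycles) | wi3 (clock cycles) | Wi (clock cycles) | Priority mapping
CP1 | 100 | 0 | 100 | 3 | 103 | 17.18 => 2
CP2 | 85 | 4 | 0 | 3 | 7 | 23.57 => 1
CP3 | 10 | 0 | 7 | 0 | 7 | -75.0 => 3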
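The following Python sketch recomputes the priority column of Table 1. It evaluates the score $P_i = s(c_i) - \sum_{j \neq i} (w_{ij}/W_i)\, s(c_j)$, where $w_{ij}$ is the waiting time that partition $CP_i$ induces on $CP_j$ and $W_i$ is its row total; with the Table 1 data this expression reproduces all three printed scores, and ranking them in descending order reproduces the levels 2, 1 and 3. The function names are hypothetical.

```python
def priority_scores(s, w):
    """Sensitivity minus the W_i-weighted average sensitivity of the
    partitions that partition i makes wait.

    s: partition -> sensitivity
    w: w[i][j] = cycles that instances of partition j wait for partition i
    """
    scores = {}
    for i in s:
        others = [j for j in s if j != i]
        W_i = sum(w[i][j] for j in others)
        penalty = sum(w[i][j] / W_i * s[j] for j in others) if W_i else 0.0
        scores[i] = s[i] - penalty
    return scores

def priority_levels(scores):
    """Map scores to priority levels: the highest score gets level 1."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {cp: level for level, cp in enumerate(ranked, start=1)}

# Data from Table 1.
s = {"CP1": 100, "CP2": 85, "CP3": 10}
w = {
    "CP1": {"CP1": 0, "CP2": 100, "CP3": 3},
    "CP2": {"CP1": 4, "CP2": 0, "CP3": 3},
    "CP3": {"CP1": 0, "CP2": 7, "CP3": 0},
}
scores = priority_scores(s, w)
print({cp: round(v, 2) for cp, v in scores.items()})
# {'CP1': 17.18, 'CP2': 23.57, 'CP3': -75.0}
print(priority_levels(scores))  # {'CP2': 1, 'CP1': 2, 'CP3': 3}
```

Note how $CP_1$, despite having the highest sensitivity, drops to level 2 because almost all of the waiting it causes falls on the highly sensitive $CP_2$.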
For example, for …, … is the …, …, and …. Figure 12 shows the actual classification of communication instances that results from each of the three formulae. Figure 13 shows the prediction accuracy of each of the formulae under test. It turns out that … performs the best, predicting whether or not a given instance belongs to … with a probability of ….
Figure 14: FSM implementation of … (S0: initial state; S3: accepting state; t1: tracer instance; c: communication instance; cnt1: counter for x; cnt2: counter for p; output: Partition1 = "yes")
Each formula involves a tracer token as a starting point and a count on the number of occurrences of communication events, and hence can be expressed as a regular expression. Consequently, it can be directly translated into a hardware implementation as a finite state machine (FSM). Figure 14 shows the FSM that implements …. In general, choosing the appropriate tracer tokens and appropriate values for … and … may not be a trivial task. We formulate the problem in terms of a well-known problem from regression theory, and use known statistical techniques to solve it.

A data set is constructed from the CAG for each examined … or …, consisting of …, and a … value for each … (derived from the partitioned CAG) indicating whether or not the communication instance at distance … from the tracer token belongs to a partition …. The regression function … is defined as follows: … When … and … is 1, it indicates that the instance at a distance … belongs to …. An assignment … (where …) is required that yields the least squared error …, where … is the value from the data set, and … is the prediction. Since the regression function is non-linear in …, no universal technique is known to compute an explicit solution. However, several heuristics and iterative procedures exist which may be used [26]. Note that the regression function could in general be constructed to utilize additional designer-specified parameters, such as partial internal state from the component, and properties of the data being processed by the system (e.g., a QoS stamp or a deadline value).
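The exact regression function did not survive extraction, so the sketch below substitutes a hypothetical window-shaped detector: after a tracer token, skip `x` communication instances and flag the next `p` as members of the partition (`x`, `p` and all function names are assumptions). The two parameters are fitted by exhaustive search for the least squared error, a brute-force stand-in for the iterative procedures of [26] that is feasible only because the search space is tiny. The fitted values then drive a small counter-based FSM of the kind described above.

```python
from itertools import product

def window_predict(d, x, p):
    """Hypothetical detector: 1 if the instance at distance d (1-based, in
    communication instances after the tracer token) lies in x < d <= x + p."""
    return 1 if x < d <= x + p else 0

def fit_window(data, max_x=16, max_p=16):
    """Least-squares fit of (x, p) by exhaustive search.

    data: (distance, label) pairs; label 1 if the instance at that distance
    belongs to the partition (taken from the partitioned CAG).
    Returns (x, p, squared_error); on ties the smallest x, then p, wins.
    """
    best = None
    for x, p in product(range(max_x + 1), range(1, max_p + 1)):
        err = sum((y - window_predict(d, x, p)) ** 2 for d, y in data)
        if best is None or err < best[2]:
            best = (x, p, err)
    return best

def detect(tokens, x, p):
    """Counter-based FSM for the same window.

    tokens: event labels; "t" is the tracer token, "c" a communication
    instance, anything else is ignored. Yields one bool per "c".
    """
    state, count = "IDLE", 0
    for tok in tokens:
        if tok == "t":  # tracer seen: restart counting
            state, count = ("SKIP", 0) if x > 0 else ("FLAG", 0)
        elif tok == "c":
            if state == "SKIP":
                count += 1
                if count == x:
                    state, count = "FLAG", 0
                yield False
            elif state == "FLAG":
                count += 1
                if count == p:
                    state = "IDLE"
                yield True
            else:
                yield False

# Toy data set: instances 3, 4 and 5 after the tracer belong to the partition.
data = [(d, 1 if 3 <= d <= 5 else 0) for d in range(1, 9)]
x, p, err = fit_window(data)
print(x, p, err)                      # 2 3 0
print(list(detect("tcccccc", x, p)))  # [False, False, True, True, True, False]
```

In hardware, the two counters of `detect` correspond to small registers compared against the fitted constants, which is what keeps such a detector cheap.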
4 Experimental Results
In this section we present results of the application of our techniques to several example systems, including a TCP/IP network interface card system and the packet forwarding unit of an output-queued ATM switch. We present performance results based on system-level co-simulation for each example.
The first example is the TCP system described in Section 2. The second example is a packet forwarding unit of an output-queued ATM switch (shown in Figure 15). The system consists of a number of output ports (four in Figure 15), each with a dedicated small local memory that stores queued packet addresses. The arriving packet bits are written to a dual-ported shared memory. The starting address of each packet is written to an appropriate output queue by a scheduler. Each port polls its queue to detect the presence of a packet. If the queue is non-empty, the port issues a … signal to its local memory, extracts the relevant packet from the dual-ported shared memory, and sends it on to its output link.
The next example, SYS, is a four-component system (shown in Figure 16) where each component issues independent concurrent requests for access to a shared memory. Figure 17 shows BRDG, another system consisting of four components, two memories and two buses connected by a bridge. The components themselves are each connected to one of the buses, but can make requests to the local bus arbiter for access to the remote memory via the bridge. Also, the components synchronize with each other via dedicated links.
Table 2 demonstrates the performance benefits of using CAT-based communication architectures over a static priority-based communication protocol [15]. Each row in the table represents one of the example systems described earlier. For each system, column 2 defines a performance metric. In the case of TCP, SYS and ATM, the metrics are derived from a set of deadlines associated with each piece of data that passes through the system, and the objective is to minimize the number of missed deadlines. In the case of BRDG, each data transaction is assigned a weight, and the performance of the system is expressed as a weighted mean of the processing times of the transactions; the objective in this case is to minimize this weighted average processing time. The static communication protocol consists of a fixed DMA size for each communication request and a static priority-based bus arbitration scheme. For these examples, the CAT scheme for identifying partitions and assigning priorities and DMA sizes makes use of user-specified information, such as the weight of each request and the deadlines described in Section 3, to provide a more flexible communication protocol.
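The two kinds of metric used in Table 2 can be written down directly. The helper names and the two-transaction inputs below are hypothetical; the last lines check that reading the "performance improvement" column as the ratio of the static-protocol value to the CAT-based value reproduces the SYS, ATM and BRDG entries of Table 2.

```python
def missed_deadlines(finish, deadline):
    """Count transactions that complete after their deadline."""
    return sum(1 for k in deadline if finish[k] > deadline[k])

def weighted_mean_processing_time(start, finish, weight):
    """Weighted mean of per-transaction processing times (finish - start)."""
    total = sum(weight.values())
    return sum(weight[k] * (finish[k] - start[k]) for k in weight) / total

def improvement(static_value, cat_value):
    """Ratio static / CAT-based (higher is better); undefined when the
    CAT-based result is zero, as for TCP/IP in Table 2."""
    return None if cat_value == 0 else static_value / cat_value

# Hypothetical transactions a and b (times in clock cycles).
start = {"a": 0, "b": 10}
finish = {"a": 20, "b": 15}
print(missed_deadlines(finish, {"a": 25, "b": 12}))                    # 1
print(weighted_mean_processing_time(start, finish, {"a": 1, "b": 3}))  # 8.75

# Ratios for the Table 2 rows.
for name, static_v, cat_v in [("SYS", 413, 17), ("ATM", 40, 16), ("BRDG", 304.72, 254.1)]:
    print(name, round(improvement(static_v, cat_v), 1))  # 24.3, 2.5, 1.2
```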
For each system, column 4 reports performance results obtained using a static communication protocol, while column 5 reports results generated by simulating a CAT-based architecture. Speed-ups are reported in column 6. The results indicate that significant performance benefits can be obtained by using a CAT-based architecture over a protocol using fixed parameter values. In the case of TCP/IP, the number of missed deadlines was reduced to zero, while in the case of SYS, a 24.3-fold performance improvement (reduction in the number of missed deadlines) was observed.

Figure 15: Output queued ATM switch
Figure 16: Example system SYS with concurrent bus access
Figure 17: Example system BRDG with multiple buses

Table 2: Performance of systems using CAT-based architectures
Example system | Performance metric | Input trace information | Static protocol | CAT-based architecture | Performance improvement
TCP/IP | missed deadlines | 20 packets | 10 | 0 | -
SYS | missed deadlines | 573 transactions | 413 | 17 | 24.3
ATM | missed deadlines | 169 packets | 40 | 16 | 2.5
BRDG | avg. execution time (cycles) | 10,000 clock cycles | 304.72 | 254.1 | 1.2

The design of an efficient CAT-based communication architecture depends on the selection of a good representative trace when performing the various steps of the algorithm of Figure 10. However, our algorithms attempt to generate communication architectures that are not specific to the input traces used to design them, but display improved performance over a wide range of communication traces. In order to analyze the input trace sensitivity of the performance improvements obtained through CAT-based communication architectures, we performed the following additional experiment. For the SYS example, we simulated the system with CAT-based and conventional communication architectures for three different input traces that had widely varying characteristics. Table 3 presents the results of our experiments. The parameters of the input traces were chosen at random to simulate run-time unpredictability. In all the cases, the system with a CAT-based communication architecture demonstrated a consistent and significant improvement over the system based on a conventional communication architecture. This demonstrates that the performance of CAT-based architectures is not overly sensitive to variations in the input stimuli, since they are capable of adapting to the changing needs of the system.
Table 3: Immunity of CAT-based architectures to variation in inputs
Inputs to the SYS example | Trace 1 | Trace 2 | Trace 3
Input trace information | 848 transactions | 573 transactions | 1070 transactions
Static protocol | 318 | 413 | 316
CAT-based architecture | 161 | 17 | 38
Performance improvement | 1.98 | 24.3 | 8.37
5 Conclusions and Future Work
This paper presented a general methodology for the design of custom system-on-chip communication architectures, based on the addition of a layer of circuitry, called the Communication Architecture Tuner (CAT), around any existing communication architecture topology. The added layer renders the system capable of adapting to the changing communication needs of its constituent components. We illustrated issues and tradeoffs involved in the design of CAT-based communication architectures, and presented algorithms to automate the key steps. Experimental results indicate that performance metrics (e.g., number of missed deadlines, average processing time) for systems with CAT-based communication architectures are significantly (sometimes by over an order of magnitude) better than those with conventional communication architectures.
References
[1] D. D. Gajski, F. Vahid, S. Narayan, and J. Gong, Specification and Design of Embedded Systems. Prentice Hall, 1994.
[2] G. De Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill, New York, NY, 1994.
[3] R. Ernst, J. Henkel, and T. Benner, "Hardware-software cosynthesis for microcontrollers," IEEE Design & Test Magazine, pp. –75, Dec. 1993.
[4] T. B. Ismail, M. Abid, and M. Jerraya, "COSMOS: A codesign approach for a communicating system," in Proc. IEEE International Workshop on Hardware/Software Codesign, pp. 17–24, 1994.
[5] A. Kalavade and E. Lee, "A globally critical/locally phase driven algorithm for the constrained hardware software partitioning problem," in Proc. IEEE International Workshop on Hardware/Software Codesign, pp. 42–48, 1994.
[6] P. H. Chou, R. B. Ortega, and G. B. Borriello, "The CHINOOK hardware/software cosynthesis system," in Proc. Int. Symp. System Level Synthesis, pp. 22–27, 1995.
[7] B. Lin, "A system design methodology for software/hardware codevelopment of telecommunication network applications," in Proc. Design Automation Conf., pp. 672–677, 1996.
[8] B. P. Dave, G. Lakshminarayana, and N. K. Jha, "COSYN: hardware-software cosynthesis of embedded systems," in Proc. Design Automation Conf., pp. 703–708, 1997.
[9] P. Knudsen and J. Madsen, "Integrating communication protocol selection with partitioning in hardware/software codesign," in Proc. Int. Symp. System Level Synthesis, pp. 111–116, Dec. 1998.
[10] T. Yen and W. Wolf, "Communication synthesis for distributed embedded systems," in Proc. Int. Conf. Computer-Aided Design, pp. 288–294, Nov. 1995.
[11] J. Daveau, T. B. Ismail, and A. A. Jerraya, "Synthesis of system-level communication by an allocation based approach," in Proc. Int. Symp. System Level Synthesis, pp. 150–155, Sept. 1995.
[12] M. Gasteier and M. Glesner, "Bus-based communication synthesis on system level," in ACM Trans. Design Automation Electronic Systems, pp. 1–11, Jan. 1999.
[13] R. B. Ortega and G. Borriello, "Communication synthesis for distributed embedded systems," in Proc. Int. Conf. Computer-Aided Design, pp. 437–444, 1998.
[14] "Sonics Integration Architecture," Sonics Inc. (http://www.sonicsinc.com/).
[15] On-Chip Bus Development Working Group Specification 1 Version 1.1.0. VSI Alliance, Aug. 1998.
[16] G. Borriello and R. H. Katz, "Synthesis and optimization of interface transducer logic," in Proc. Int. Conf. Computer Design, Nov. 1987.
[17] J. S. Sun and R. W. Brodersen, "Design of system interface modules," in Proc. Int. Conf. Computer-Aided Design, pp. 478–481, Nov. 1992.
[18] P. Gutberlet and W. Rosenstiel, "Specification of interface components for synchronous data paths," in Proc. Int. Symp. System Level Synthesis, pp. 134–139, 1994.
[19] S. Narayanan and D. D. Gajski, "Interfacing incompatible protocols using interface process generation," in Proc. Design Automation Conf., pp. 468–473, June 1995.
[20] P. Chou, R. B. Ortega, and G. Borriello, "Interface co-synthesis techniques for embedded systems," in Proc. Int. Conf. Computer-Aided Design, pp. 280–287, Nov. 1995.
[21] J. Oberg, A. Kumar, and A. Hemani, "Grammar-based hardware synthesis of data communication protocols," in Proc. Int. Symp. System Level Synthesis, pp. 14–19, 1996.
[22] R. Passerone, J. A. Rowson, and A. Sangiovanni-Vincentelli, "Automatic synthesis of interfaces between incompatible protocols," in Proc. Design Automation Conf., pp. 8–13, June 1998.
[23] J. Smith and G. De Micheli, "Automated composition of hardware components," in Proc. Design Automation Conf., pp. 14–19, June 1998.
[24] A. S. Tanenbaum, Computer Networks. Englewood Cliffs, N.J.: Prentice Hall, 19.
[25] K. Lahiri, A. Raghunathan, and S. Dey, "Fast Performance Analysis of Bus-Based System-on-Chip Communication Architectures," in Proc. Int. Conf. Computer-Aided Design, Nov. 1999.
[26] G. A. F. Seber and C. J. Wild, Non-linear Regression. Wiley, New York, 19.