Publications

(2024). PruneGNN: An Optimized Algorithm-Hardware Framework for Graph Neural Network Pruning. IEEE International Symposium on High-Performance Computer Architecture (HPCA).

(2024). MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training. ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

(2024). Masked Memory Primitive for Key Insulated Schemes. IEEE International Symposium on Hardware Oriented Security and Trust (HOST).

(2024). Exploiting Intrinsic Redundancies in Dynamic Graph Neural Networks for Processing Efficiency. IEEE Computer Architecture Letters.

(2023). MergePath-SpMM: Parallel Sparse Matrix-Matrix Algorithm for Graph Neural Network Acceleration. 2023 IEEE International Symposium on Performance Analysis of Systems and Software.

(2023). Characterization of Timing-Based Software Side-Channel Attacks and Mitigations on Network-on-Chip Hardware. ACM Journal on Emerging Technologies in Computing Systems.

(2023). ASM: An Adaptive Secure Multicore for Co-Located Mutually Distrusting Processes. ACM Trans. Archit. Code Optim.

(2023). Secure Remote Attestation with Strong Key Insulation Guarantees. IEEE Transactions on Computers.

(2023). Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks. 2023 IEEE/ACM International Conference on Computer-Aided Design.

(2022). On the Design of Quantum Graph Convolutional Neural Network in the NISQ-Era and Beyond. 2022 IEEE International Conference on Computer Design.

(2022). CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM. 2022 IEEE International Conference on Computer Design.

(2022). Towards Sparsification of Graph Neural Networks. 2022 IEEE International Conference on Computer Design.

(2022). Towards Real-time Temporal Graph Learning. 2022 IEEE International Conference on Computer Design.

(2022). SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors. IEEE Computer Architecture Letters.

(2022). Secure Remote Attestation with Strong Key Insulation Guarantees. CoRR.

(2022). Protecting On-Chip Data Access Against Timing-Based Side-Channel Attacks on Multicores. 2022 IEEE International Symposium on Secure and Private Execution Environment Design.

(2022). HD-CPS: Hardware-assisted Drift-aware Concurrent Priority Scheduler for Shared Memory Multicores. IEEE International Symposium on High-Performance Computer Architecture (HPCA).

(2022). Characterization of mitigation schemes against timing-based side-channel attacks on PCIe hardware. IEEE International Symposium on Quality Electronic Design (ISQED).

(2021). Seeds of SEED: Characterizing Enclave-level Parallelism in Secure Multicore Processors. IEEE International Symposium on Secure and Private Execution Environment Design.

(2021). ConNOC: A practical timing channel attack on network-on-chip hardware in a multicore processor. IEEE International Symposium on Hardware Oriented Security and Trust.

(2021). Bilinear Map Based One-Time Signature Scheme with Secret Key Exposure.

(2021). Autonomous Secure Remote Attestation even when all Used and to be Used Digital Keys Leak.

(2021). An Efficient Algorithm for the Construction of Dynamically Updating Trajectory Networks. IEEE Conference on High Performance Extreme Computing.

(2021). A performance predictor for implementation selection of parallelized static and temporal graph algorithms. Concurrency and Computation: Practice and Experience.

(2020). IRONHIDE: A Secure Multicore that Efficiently Mitigates Microarchitecture State Attacks for Interactive Applications. IEEE International Symposium on High Performance Computer Architecture, HPCA 2020, San Diego, CA, USA, February 22-26, 2020.

(2020). In-Hardware Moving Compute to Data Model to Accelerate Thread Synchronization on Large Multicores. IEEE Micro.

(2020). Exploring accelerator and parallel graph algorithmic choices for temporal graphs. PMAM@PPoPP ’20: Eleventh International Workshop on Programming Models and Applications for Multicores and Manycores colocated with the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, USA, February 22, 2020.

(2020). Accelerating Relax-Ordered Task-Parallel Workloads Using Multi-Level Dependency Checking. Proceedings of the 34th ACM International Conference on Supercomputing.

(2019). POSTER: Exploiting Multi-Level Task Dependencies to Prune Redundant Work in Relax-Ordered Task-Parallel Algorithms. 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019, Seattle, WA, USA, September 23-26, 2019.

(2019). HeteroMap: A Runtime Performance Predictor for Efficient Processing of Graph Analytics on Heterogeneous Multi-Accelerators. IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2019, Madison, WI, USA, March 24-26, 2019.

(2019). Guest Editors' Introduction: Special Section on Emerging Technologies in Computer Design. IEEE Trans. Emerg. Top. Comput.

(2019). Advancing the State-of-the-Art in Hardware Trojans Detection. IEEE Trans. Dependable Secur. Comput.

(2018). Software-Hardware Managed Last-level Cache Allocation Scheme for Large-Scale NVRAM-Based Multicores Executing Parallel Data Analytics Applications. 2018 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, Vancouver, BC, Canada, May 21-25, 2018.

(2018). Multicore Resource Isolation for Deterministic, Resilient and Secure Concurrent Execution of Safety-Critical Applications. IEEE Comput. Archit. Lett.

(2018). Guest Editorial: Special Section on Defect and Fault Tolerance in VLSI and Nanotechnology. IEEE Trans. Emerg. Top. Comput.

(2018). Declarative Resilience: A Holistic Soft-Error Resilient Multicore Architecture that Trades off Program Accuracy for Efficiency. ACM Trans. Embedded Comput. Syst.

(2018). Breaking the Oblivious-RAM Bandwidth Wall. 36th IEEE International Conference on Computer Design, ICCD 2018, Orlando, FL, USA, October 7-10, 2018.

(2018). Accelerating Synchronization in Graph Analytics Using Moving Compute to Data Model on Tilera TILE-Gx72. 36th IEEE International Conference on Computer Design, ICCD 2018, Orlando, FL, USA, October 7-10, 2018.

(2017). Towards Resilient yet Efficient Parallel Execution of Convolutional Neural Networks. 2017 Boston area ARChitecture Annual Workshop (BARC).

(2017). Situationally Adaptive Scheduling of Graph Algorithms on Single-Chip Parallel Machines. 2017 Boston area ARChitecture Annual Workshop (BARC).

(2017). Revisiting Definitional Foundations of Oblivious RAM for Secure Processor Implementations. CoRR.

(2017). QUARQ: A Novel General Purpose Multicore Architecture for Cognitive Computing. 2017 SRC TECHCON.

(2017). GraphTuner: An Input Dependence Aware Loop Perforation Scheme for Efficient Execution of Approximated Graph Algorithms. 2017 IEEE International Conference on Computer Design, ICCD 2017, Boston, MA, USA, November 5-8, 2017.

(2017). Exploiting the Tradeoff between Program Accuracy and Soft-error Resiliency Overhead for Machine Learning Workloads. CoRR (appeared in IEEE Workshop on Silicon Errors in Logic - System Effects).

(2017). Exploiting Heterogeneous Parallel Accelerators to Improve Performance in Graph Analytics. 2017 SRC TECHCON. Best in Session Award.

(2017). Efficient Situational Scheduling of Graph Workloads on Single-Chip Multicores and GPUs. IEEE Micro.

(2017). Accelerating Graph and Machine Learning Workloads Using a Shared Memory Multicore Architecture with Auxiliary Support for In-hardware Explicit Messaging. 2017 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017, Orlando, FL, USA, May 29 - June 2, 2017.

(2016). A Lightweight Spatio-Temporally Partitioned Multicore Architecture for Concurrent Execution of Safety Critical Workloads. SAE Technical Paper.

(2016). Tradeoffs in Secure Accelerator Designs. 2016 Boston area ARChitecture Annual Workshop (BARC).

(2016). OGAPI: Oblivious Graph Processing in Multicores. 2016 Boston area ARChitecture Annual Workshop (BARC).

(2016). Locality-aware data replication in the last-level cache for large scale multicores. The Journal of Supercomputing.

(2016). LDAC: Locality-Aware Data Access Control for Large-Scale Multicore Cache Hierarchies. ACM Transactions on Architecture and Code Optimization (TACO). Presented at the 2017 International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC).

(2016). GPU concurrency choices in graph analytics. 2016 IEEE International Symposium on Workload Characterization, IISWC 2016, Providence, RI, USA, September 25-27, 2016.

(2016). Foreword. 2016 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2016, Storrs, CT, USA, September 19-20, 2016.

(2016). Efficient Error-Detection and Recovery Mechanisms for Reliability and Resiliency of Multicores. 29th International Conference on VLSI Design and 15th International Conference on Embedded Systems, VLSID 2016, Kolkata, India, January 4-8, 2016.

(2016). A Case for Deploying Multicores in Cyberphysical Embedded Systems. 2016 Boston area ARChitecture Annual Workshop (BARC).

(2016). A Case for a Situationally Adaptive Many-core Execution Model for Cognitive Computing Workloads. ASPLOS 2016 International Workshop on Cognitive Architectures (CogArch).

(2015). The Execution Migration Machine: Directoryless Shared-Memory Architecture. IEEE Computer.

(2015). OSPREY: Implementation of Memory Consistency Models for Cache Coherence Protocols involving Invalidation-Free Data Access. 2015 International Conference on Parallel Architectures and Compilation, PACT 2015, San Francisco, CA, USA, October 18-21, 2015.

(2015). Many-core Architecture Characterization of the Path-Planning Workload. ASPLOS 2015 International Workshop on Cognitive Architectures (CogArch).

(2015). M-MAP: Multi-factor memory authentication for secure embedded processors. 33rd IEEE International Conference on Computer Design, ICCD 2015, New York City, NY, USA, October 18-21, 2015.

(2015). M-MAP: Multi-Factor Memory Authentication for Secure Embedded Processors. IACR Cryptol. ePrint Arch.

(2015). Exploring the performance implications of memory safety primitives in many-core processors executing multi-threaded workloads. Proceedings of the Fourth Workshop on Hardware and Architectural Support for Security and Privacy, HASP@ISCA 2015, Portland, OR, USA, June 14, 2015.

(2015). Efficient parallelization of path planning workload on single-chip shared-memory multicores. 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015, Waltham, MA, USA, September 15-17, 2015.

(2015). Efficient parallel packet processing using a shared memory many-core processor with hardware support to accelerate communication. 10th IEEE International Conference on Networking, Architecture and Storage, NAS 2015, Boston, MA, USA, August 6-7, 2015.

(2015). CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores. 2015 IEEE International Symposium on Workload Characterization, IISWC 2015, Atlanta, GA, USA, October 4-6, 2015. Best Paper Nominee.

(2015). Accelerating Communication in Single-chip Shared Memory Many-core Processors. 2015 Boston area ARChitecture Annual Workshop (BARC).

(2015). A Cross-Layer Multicore Architecture to Tradeoff Program Accuracy and Resilience Overheads. IEEE Comput. Archit. Lett.

(2014). Thread Migration Prediction for Distributed Shared Caches. IEEE Comput. Archit. Lett.

(2014). Suppressing the Oblivious RAM timing channel while making information leakage and program efficiency trade-offs. 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014, Orlando, FL, USA, February 15-19, 2014.

(2014). Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Voltages. ISCA 2014 International Workshop on Near-threshold Computing (WNTC).

(2014). Locality-aware data replication in the Last-Level Cache. 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014, Orlando, FL, USA, February 15-19, 2014.

(2014). HaTCh: Hardware Trojan Catcher. IACR Cryptol. ePrint Arch.

(2014). Execution Migration. United States Patent No. 8904154.

(2014). EM2: A Scalable Shared Memory Architecture for Large-Scale Multicores. Multicore Technology: Architecture, Reconfiguration and Modeling.

(2013). Towards efficient dynamic data placement in NoC-based multicores. 2013 IEEE 31st International Conference on Computer Design, ICCD 2013, Asheville, NC, USA, October 6-9, 2013.

(2013). Toward Holistic Soft-Error-Resilient Shared-Memory Multicores. IEEE Computer.

(2013). The locality-aware adaptive cache coherence protocol. The 40th Annual International Symposium on Computer Architecture, ISCA'13, Tel-Aviv, Israel, June 23-27, 2013.

(2013). MARTHA: architecture for control and emulation of power electronics and smart grid systems. Design, Automation and Test in Europe, DATE 2013, Grenoble, France, March 18-22, 2013.

(2013). A private level-1 cache architecture to exploit the latency and capacity tradeoffs in multicores operating at near-threshold voltages. 2013 IEEE 31st International Conference on Computer Design, ICCD 2013, Asheville, NC, USA, October 6-9, 2013.

(2013). A framework to accelerate sequential programs on homogeneous multicores. 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, VLSI-SoC 2013, Istanbul, Turkey, October 7-9, 2013.

(2012). Low-Latency Mechanisms for Near-Threshold Operation of Private Caches in Shared Memory Multicores. 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2012, Workshops Proceedings, Vancouver, BC, Canada, December 1-5, 2012.

(2012). Judicious Thread Migration When Accessing Distributed Shared Caches. HiPEAC Workshop on Computer Architecture and Operating System Co-design (CAOS).

(2012). HORNET: A Cycle-Level Multicore Simulator. IEEE Trans. on CAD of Integrated Circuits and Systems.

(2012). A low-overhead dynamic optimization framework for multicores. International Conference on Parallel Architectures and Compilation Techniques, PACT ’12, Minneapolis, MN, USA, September 19-23, 2012.

(2012). A Case for Fine-Grain Adaptive Cache Coherence. MIT CSAIL Technical Report (MIT-CSAIL-TR-2012-012).

(2011). Time-Predictable Computer Architecture for Cyber-Physical Systems: Digital Emulation of Power Electronics Systems. Proceedings of the 32nd IEEE Real-Time Systems Symposium, RTSS 2011, Vienna, Austria, November 29 - December 2, 2011.

(2011). System-level Optimizations for Memory Access in the Execution Migration Machine (EM2). HiPEAC Workshop on Computer Architecture and Operating System Co-design (CAOS).

(2011). Shared Memory via Execution Migration. Ideas and Perspectives Session at International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

(2011). Scalable, accurate multicore simulation in the 1000-core era. IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2011, 10-12 April, 2011, Austin, TX, USA.

(2011). Performance Per Watt Benefits of Dynamic Core Morphing in Asymmetric Multicores. 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT 2011, Galveston, TX, USA, October 10-14, 2011.

(2011). Library Cache Coherence. MIT CSAIL Technical Report (MIT-CSAIL-TR-2011-027).

(2011). Hardware/Software Codesign Architecture for Online Testing in Chip Multiprocessors. IEEE Trans. Dependable Secur. Comput.

(2011). Directoryless Shared Memory Coherence Using Execution Migration. Parallel and Distributed Computing and Systems (PDCS). Best Paper Award at PDCS 2011.

(2011). Deadlock-free fine-grained thread migration. NOCS 2011, Fifth ACM/IEEE International Symposium on Networks-on-Chip, Pittsburgh, Pennsylvania, USA, May 1-4, 2011. Best Paper Award.

(2011). DCC: A Dependable Cache Coherence Multicore Architecture. IEEE Computer Architecture Letters. Presented at the Best Papers from CAL session at HPCA 2012.

(2011). Brief announcement: distributed shared memory based on computation migration. SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, USA, June 4-6, 2011 (Co-located with FCRC 2011).

(2011). ARCc: A case for an architecturally redundant cache-coherence architecture for large multicores. IEEE 29th International Conference on Computer Design, ICCD 2011, Amherst, MA, USA, October 9-12, 2011.

(2010). Shadow checker (SC): A low-cost hardware scheme for online detection of faults in small memory structures of a microprocessor. IEEE International Test Conference, ITC 2010, Austin, TX, USA, November 2-4, 2010.

(2010). Scalable directoryless shared memory coherence using execution migration. MIT CSAIL Technical Report (MIT-CSAIL-TR-2010-053).

(2010). Multithreaded Simulation to Increase Performance Modeling Throughput on Large Compute Grids. ASPLOS Exascale Evaluation and Research Techniques Workshop (EXERT).

(2010). Instruction-Level Execution Migration. MIT CSAIL Technical Report (MIT-CSAIL-TR-2010-019).

(2010). EM2: A Scalable Shared-Memory Multicore Architecture. MIT CSAIL Technical Report (MIT-CSAIL-TR-2010-030).

(2010). DARSIM: A parallel cycle-level NoC Simulator. ISCA International Workshop on Modeling, Benchmarking and Simulation (MoBS).

(2010). A self-adaptive scheduler for asymmetric multi-cores. Proceedings of the 20th ACM Great Lakes Symposium on VLSI 2010, Providence, Rhode Island, USA, May 16-18, 2010.

(2010). A model to exploit power-performance efficiency in superscalar processors via structure resizing. Proceedings of the 20th ACM Great Lakes Symposium on VLSI 2010, Providence, Rhode Island, USA, May 16-18, 2010.

(2009). Run-Time Reconfiguration for Performance and Power Optimizations in Asymmetric Chip Multiprocessors. HiPEAC Workshop on Reconfigurable Computing (WRC).

(2009). Predictive Thermal Management for Chip Multiprocessors Using Co-designed Virtual Machines. High Performance Embedded Architectures and Compilers, Fourth International Conference, HiPEAC 2009, Paphos, Cyprus, January 25-28, 2009. Proceedings.

(2009). Improving yield and reliability of chip multiprocessors. Design, Automation and Test in Europe, DATE 2009, Nice, France, April 20-24, 2009.

(2009). Hardware/software co-design architecture for thermal management of chip multiprocessors. Design, Automation and Test in Europe, DATE 2009, Nice, France, April 20-24, 2009.

(2009). A self-adaptive system architecture to address transistor aging. Design, Automation and Test in Europe, DATE 2009, Nice, France, April 20-24, 2009.

(2008). Automatic Adjustment of System Performance to Mitigate Device Aging via a Co-designed Virtual Machine. MICRO-41 Workshop on Dependable Architectures (WDA).

(2008). A framework for predictive dynamic temperature management of microprocessor systems. 2008 International Conference on Computer-Aided Design, ICCAD 2008, San Jose, CA, USA, November 10-13, 2008.
