References

Advanced Micro Devices. 2021. “Processor Programming Reference (PPR) for AMD Family 19h Model 01h (55898).” B1 Rev 0.50. https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/55898_B1_pub_0_50.zip.

———. 2022. “AMD64 Technology Platform Quality of Service Extensions.” Pub. 56375, rev 1.01. https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/other/56375_1_03_PUB.pdf.

Akinshin, Andrey. 2019. Pro .NET Benchmarking. 1st ed. Apress. https://doi.org/10.1007/978-1-4842-4941-3.

Alam, Mejbah, Justin Gottschlich, Nesime Tatbul, Javier S Turek, Tim Mattson, and Abdullah Muzahid. 2019. “A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions.” In Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, 11627–39. Curran Associates, Inc. http://papers.nips.cc/paper/9337-a-zero-positive-learning-approach-for-diagnosing-software-performance-regressions.pdf.

AMD. 2023. AMD64 Architecture Programmer’s Manual. Advanced Micro Devices, Inc. https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf.

———. 2024. AMD uProf User Guide, Revision 4.2. Advanced Micro Devices, Inc. https://www.amd.com/content/dam/amd/en/documents/developer/version-4-2-documents/uprof/uprof-user-guide-v4.2.pdf.

Arm. 2022a. Arm Architecture Reference Manual Supplement Armv9. Arm Limited. https://documentation-service.arm.com/static/632dbdace68c6809a6b41710?token=.

———. 2022b. Arm Neoverse™ V1 PMU Guide, Revision: R1p2. Arm Limited. https://developer.arm.com/documentation/PJDOC-1063724031-605393/2-0/?lang=en.

———. 2023a. Arm Neoverse V1 Core: Performance Analysis Methodology. Arm Limited. https://armkeil.blob.core.windows.net/developer/Files/pdf/white-paper/neoverse-v1-core-performance-analysis.pdf.

———. 2023b. Arm Statistical Profiling Extension: Performance Analysis Methodology. Arm Limited. https://developer.arm.com/documentation/109429/latest/.

Chen, Dehao, David Xinliang Li, and Tipp Moseley. 2016. “AutoFDO: Automatic Feedback-Directed Optimization for Warehouse-Scale Applications.” In CGO 2016 Proceedings of the 2016 International Symposium on Code Generation and Optimization, 12–23. New York, NY, USA. https://ieeexplore.ieee.org/document/7559528.

Chen, Jiahao, and Jarrett Revels. 2016a. “Robust Benchmarking in Noisy Environments.” http://arxiv.org/abs/1608.04295.

———. 2016b. “Robust Benchmarking in Noisy Environments,” August. http://arxiv.org/abs/1608.04295v1.

Cooper, K. D., and L. Torczon. 2012. Engineering a Compiler. Morgan Kaufmann. https://books.google.co.in/books?id=CGTOlAEACAAJ.

Curtsinger, Charlie, and Emery D. Berger. 2013. “STABILIZER: Statistically Sound Performance Evaluation.” In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, 219–28. ASPLOS ’13. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2451116.2451141.

———. 2018. “Coz: Finding Code That Counts with Causal Profiling.” Commun. ACM 61 (6): 91–99. https://doi.org/10.1145/3205911.

Dagenais, Michel. 2016. “Hardware-Assisted Instruction Profiling and Latency Detection.” The Journal of Engineering 2016 (10): 367–76. https://digital-library.theiet.org/content/journals/10.1049/joe.2016.0127.

Daly, David, William Brown, Henrik Ingo, Jim O’Leary, and David Bradford. 2020. “The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System.” In Proceedings of the ACM/SPEC International Conference on Performance Engineering, 67–75. ICPE ’20. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3358960.3375791.

domo.com. 2017. Data Never Sleeps 5.0. Domo, Inc. https://www.domo.com/learn/data-never-sleeps-5?aid=ogsm072517_1&sf100871281=1.

Du, Jiaqing, Nipun Sehrawat, and Willy Zwaenepoel. 2010. “Performance Profiling in a Virtualized Environment.” In Proceedings of the 2nd Usenix Conference on Hot Topics in Cloud Computing, 2. HotCloud’10. USA: USENIX Association. https://www.usenix.org/legacy/event/hotcloud10/tech/full_papers/Du.pdf.

Fog, Agner. 2004. “Optimizing Software in C++: An Optimization Guide for Windows, Linux and Mac Platforms.” https://www.agner.org/optimize/optimizing_cpp.pdf.

———. 2012. “The Microarchitecture of Intel, AMD and VIA CPUs: An Optimization Guide for Assembly Programmers and Compiler Makers.” Copenhagen University College of Engineering. https://www.agner.org/optimize/microarchitecture.pdf.

Fog, Agner, and others. 2011. “Instruction Tables: Lists of Instruction Latencies, Throughputs and Micro-Operation Breakdowns for Intel, AMD and VIA CPUs.” Copenhagen University College of Engineering. https://www.agner.org/optimize/instruction_tables.pdf.

Gregg, Brendan. 2013. Systems Performance: Enterprise and the Cloud. 1st ed. USA: Prentice Hall Press.

Grosser, Tobias, Armin Größlinger, and C. Lengauer. 2012. “Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation.” Parallel Process. Lett. 22.

Hennessy, John L. 2018. “The Future of Computing.” YouTube. 2018. https://youtu.be/Azt8Nc-mtKM?t=329.

Hennessy, John L., and David A. Patterson. 2017. Computer Architecture, Sixth Edition: A Quantitative Approach. 6th ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Herdrich, A., and others. 2016. “Cache QoS: From Concept to Reality in the Intel® Xeon® processor E5-2600 V3 Product Family.” In HPCA, 657–68. https://ieeexplore.ieee.org/document/7446102.

Ingo, Henrik, and David Daly. 2020. “Automated System Performance Testing at MongoDB.” In Proceedings of the Workshop on Testing Database Systems. DBTest ’20. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3395032.3395323.

Intel. 2023a. CPU Metrics Reference. Intel® Corporation. https://software.intel.com/en-us/vtune-help-cpu-metrics-reference.

———. 2023b. Intel® 64 and IA-32 Architectures Optimization Reference Manual. Intel® Corporation. https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-optimization-reference-manual.html.

Jimenez, D. A., and C. Lin. 2001. “Dynamic Branch Prediction with Perceptrons.” In Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture, 197–206. https://doi.org/10.1109/HPCA.2001.903263.

Jin, Guoliang, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. 2012. “Understanding and Detecting Real-World Performance Bugs.” In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, 77–88. PLDI ’12. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2254064.2254075.

Kanev, Svilen, Juan Pablo Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, and David Brooks. 2015. “Profiling a Warehouse-Scale Computer.” SIGARCH Comput. Archit. News 43 (3S): 158–69. https://doi.org/10.1145/2872887.2750392.

Kapoor, Rajiv. 2009. “Avoiding the Cost of Branch Misprediction.” https://software.intel.com/en-us/articles/avoiding-the-cost-of-branch-misprediction.

Karamatas, Chris. 2022. “AMD EPYC 7003 Series Microarchitecture Overview.” Pub. 57075, rev 3.0. https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/white-papers/overview-amd-epyc7003-series-processors-microarchitecture.pdf.

Khuong, Paul-Virak, and Pat Morin. 2015. “Array Layouts for Comparison-Based Searching.” https://arxiv.org/ftp/arxiv/papers/1509/1509.05053.pdf.

Leiserson, Charles E., Neil C. Thompson, Joel S. Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, and Tao B. Schardl. 2020. “There’s Plenty of Room at the Top: What Will Drive Computer Performance After Moore’s Law?” Science 368 (6495). https://doi.org/10.1126/science.aam9744.

Lemire, Daniel. 2020. “Making Your Code Faster by Taming Branches.” https://www.infoq.com/articles/making-code-faster-taming-branches/.

Liu, Min, Xiaohui Sun, Maneesh Varshney, and Ya Xu. 2019. “Large-Scale Online Experimentation with Quantile Metrics.” https://arxiv.org/abs/1903.08762.

Lopes, Nuno P., and John Regehr. 2018. “Future Directions for Optimizing Compilers,” September. http://arxiv.org/abs/1809.02161v1.

Luo, Taowei, Xiaolin Wang, Jingyuan Hu, Yingwei Luo, and Zhenlin Wang. 2015. “Improving TLB Performance by Increasing Hugepage Ratio.” In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 1139–42. https://doi.org/10.1109/CCGrid.2015.36.

Matteson, David S., and Nicholas A. James. 2014. “A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.” Journal of the American Statistical Association 109 (505): 334–45. https://doi.org/10.1080/01621459.2013.849605.

Mittal, Sparsh. 2016. “A Survey of Techniques for Cache Locking.” ACM Transactions on Design Automation of Electronic Systems 21 (May). https://doi.org/10.1145/2858792.

Muła, Wojciech, and Daniel Lemire. 2019. “Base64 Encoding and Decoding at Almost the Speed of a Memory Copy.” Software: Practice and Experience 50 (2): 89–97. https://doi.org/10.1002/spe.2777.

Mytkowicz, Todd, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. 2009a. “Producing Wrong Data Without Doing Anything Obviously Wrong!” In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 265–76. ASPLOS XIV. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/1508244.1508275.

———. 2009b. “Producing Wrong Data Without Doing Anything Obviously Wrong!” SIGPLAN Not. 44 (3): 265–76. https://doi.org/10.1145/1508284.1508275.

Nair, Reena, and Tony Field. 2020. “GAPP: A Fast Profiler for Detecting Serialization Bottlenecks in Parallel Linux Applications.” Proceedings of the ACM/SPEC International Conference on Performance Engineering, April. https://doi.org/10.1145/3358960.3379136.

Navarro-Torres, Agustín, Jesús Alastruey-Benedé, Pablo Ibáñez, and Víctor Viñals-Yúfera. 2023. “BALANCER: Bandwidth Allocation and Cache Partitioning for Multicore Processors.” The Journal of Supercomputing 79 (9): 10252–76. https://doi.org/10.1007/s11227-023-05070-0.

Navarro-Torres, Agustín, and others. 2019. “Memory Hierarchy Characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP.” PLOS ONE, 1–24. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220135.

Newell, Andy, and Sergey Pupyrev. 2018. “Improved Basic Block Reordering.” CoRR abs/1809.04676. http://arxiv.org/abs/1809.04676.

Nowak, Andrzej, and Georgios Bitzes. 2014. “The Overhead of Profiling Using PMU Hardware Counters.” https://zenodo.org/record/10800/files/TheOverheadOfProfilingUsingPMUhardwareCounters.pdf.

Ottoni, Guilherme, and Bertrand Maher. 2017. “Optimizing Function Placement for Large-Scale Data-Center Applications.” In Proceedings of the 2017 International Symposium on Code Generation and Optimization, 233–44. CGO ’17. Austin, USA: IEEE Press. https://ieeexplore.ieee.org/document/7863743.

Panchenko, Maksim, Rafael Auler, Bill Nell, and Guilherme Ottoni. 2018. “BOLT: A Practical Binary Optimizer for Data Centers and Beyond.” CoRR abs/1807.06735. http://arxiv.org/abs/1807.06735.

Paoloni, Gabriele. 2010. How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures. Intel® Corporation. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf.

Pharr, Matt, and William R. Mark. 2012. “ispc: A SPMD Compiler for High-Performance CPU Programming.” In 2012 Innovative Parallel Computing (InPar), 1–13. https://doi.org/10.1109/InPar.2012.6339601.

Ren, Gang, Eric Tune, Tipp Moseley, Yixin Shi, Silvius Rus, and Robert Hundt. 2010. “Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers.” IEEE Micro, 65–79. http://www.computer.org/portal/web/csdl/doi/10.1109/MM.2010.68.

Sasongko, Muhammad Aditya, Milind Chabbi, Paul H J Kelly, and Didem Unat. 2023. “Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison.” IEEE Transactions on Parallel and Distributed Systems 34 (5): 1594–1608. https://doi.org/10.1109/TPDS.2023.3257105.

Seznec, André, and Pierre Michaud. 2006. “A Case for (Partially) Tagged Geometric History Length Branch Prediction.” J. Instr. Level Parallelism 8. https://inria.hal.science/hal-03408381/document.

statista.com. 2018. Volume of Data/Information Created Worldwide from 2010 to 2025. Statista, Inc. https://www.statista.com/statistics/871513/worldwide-data-created/.

Srinivas, Suresh, and others. 2019. “Runtime Performance Optimization Blueprint: Intel® Architecture Optimization with Large Code Pages.” Intel® Corporation. https://www.intel.com/content/www/us/en/develop/articles/runtime-performance-optimization-blueprint-intel-architecture-optimization-with-large-code.html.

Wang, Xiaodong, and others. 2017. “SWAP: Effective Fine-Grain Management of Shared Last-Level Caches with Minimum Hardware Support.” In HPCA, 121–32. https://ieeexplore.ieee.org/document/7920819.

Yasin, Ahmad. 2014. “A Top-down Method for Performance Analysis and Counters Architecture.” In ISPASS, 35–44. https://doi.org/10.1109/ISPASS.2014.6844459.
