apache spark - not disk or network bound but CPU bound
Laeeth Isharc via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri May 1 04:15:00 PDT 2015
http://radar.oreilly.com/2015/04/investigating-sparks-performance.html
"For many who use and deploy Apache Spark, knowing how to find
critical bottlenecks is extremely important. In a recent O’Reilly
webcast, Making Sense of Spark Performance, Spark committer and
PMC member Kay Ousterhout gave a brief overview of how Spark
works, and dove into how she measured performance bottlenecks
using new metrics, including block-time analysis. Ousterhout
walked through high-level takeaways from her in-depth analysis of
several workloads, and offered a live demo of a new performance
analysis tool and explained how you can use it to improve your
Spark performance.

Her research uncovered surprising insights into Spark’s
performance on two benchmarks (TPC-DS and the Big Data
Benchmark), and one production workload. As part of our overall
series of webcasts on big data, data science, and engineering,
this webcast debunked commonly held ideas surrounding network
performance, showing that CPU — not I/O — is often a critical
bottleneck, and demonstrated how to identify and fix stragglers."