Course Outline

Performance Concepts and Metrics

  • Latency, throughput, power consumption, and resource utilization.
  • Distinguishing between system-level and model-level bottlenecks.
  • Profiling strategies for inference versus training.
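The core metrics above can be measured with nothing more than a wall clock. A minimal sketch in Python — the `profile_inference` helper and the lambda standing in for a model are illustrative, not part of any vendor SDK:

```python
import time
import statistics

def profile_inference(fn, inputs, warmup=3, runs=20):
    """Measure per-call latency and derived throughput for a callable."""
    for _ in range(warmup):      # warm-up runs exclude one-time setup costs
        fn(inputs)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(inputs)
        latencies.append(time.perf_counter() - start)
    p50 = statistics.median(latencies)
    return {
        "p50_latency_s": p50,
        "mean_latency_s": statistics.mean(latencies),
        "throughput_per_s": len(inputs) / p50,   # items processed per second
    }

# Stand-in "model": doubles every element of a batch of 1000 numbers
stats = profile_inference(lambda batch: [x * 2 for x in batch], list(range(1000)))
```

The warm-up loop matters in practice: on real accelerators the first calls pay for graph compilation and memory allocation, which would otherwise distort the median.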

Profiling on Huawei Ascend

  • Leveraging the CANN Profiler and MindInsight.
  • Kernel and operator diagnostics.
  • Understanding offload patterns and memory mapping.

Profiling on Biren GPU

  • Utilizing the Biren SDK for performance monitoring.
  • Analyzing kernel fusion, memory alignment, and execution queues.
  • Conducting power- and temperature-aware profiling.

Profiling on Cambricon MLU

  • Using BANGPy and Neuware performance tools.
  • Gaining kernel-level visibility and interpreting logs.
  • Integrating the MLU profiler with deployment frameworks.

Graph and Model-Level Optimization

  • Strategies for graph pruning and quantization.
  • Operator fusion and computational graph restructuring.
  • Standardizing input sizes and tuning batch parameters.
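To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization in plain Python. The function names are hypothetical; real toolchains add calibration data, per-channel scales, and operator-aware rounding:

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.635, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The round-trip error is bounded by half the scale, which is why outliers in the weight distribution (which inflate the scale) are the usual obstacle to naive per-tensor quantization.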

Memory and Kernel Optimization

  • Optimizing memory layout and reuse strategies.
  • Managing buffers efficiently across different chipsets.
  • Applying platform-specific kernel-level tuning techniques.
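One way to see why memory layout matters: the flat offset of a tensor element, and hence the stride between logically adjacent values, depends on whether storage is NCHW or NHWC. A small sketch with illustrative helper names:

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in contiguous NCHW storage."""
    return ((n * C + c) * H + h) * W + w

def offset_nhwc(n, h, w, c, C, H, W):
    """Flat index of element (n, h, w, c) in contiguous NHWC storage."""
    return ((n * H + h) * W + w) * C + c

C, H, W = 3, 4, 4
# Distance in memory between channel c and channel c+1 of the same pixel:
stride_c_nchw = offset_nchw(0, 1, 0, 0, C, H, W) - offset_nchw(0, 0, 0, 0, C, H, W)
stride_c_nhwc = offset_nhwc(0, 0, 0, 1, C, H, W) - offset_nhwc(0, 0, 0, 0, C, H, W)
print(stride_c_nchw, stride_c_nhwc)  # 16 1
```

A kernel that walks channels of one pixel touches contiguous bytes in NHWC but jumps H×W elements per step in NCHW; which layout wins depends on the chipset's preferred access pattern, which is exactly what platform-specific tuning exploits.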

Cross-Platform Best Practices

  • Achieving performance portability through abstraction strategies.
  • Developing shared tuning pipelines for multi-chip environments.
  • Case study: Tuning an object detection model across Ascend, Biren, and MLU.
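A common abstraction strategy is a thin profiler interface with one adapter per chipset, so the tuning pipeline stays backend-agnostic. The sketch below shows only the shape of such an interface; the class names are hypothetical, and real adapters would wrap the vendor SDKs rather than the wall-clock fallback shown here:

```python
from abc import ABC, abstractmethod
import time

class BackendProfiler(ABC):
    """Common interface; Ascend, Biren, and MLU would each get an adapter."""

    @abstractmethod
    def profile(self, fn):
        """Run fn under this backend's profiler and return a metrics dict."""

class CpuTimerProfiler(BackendProfiler):
    """Fallback adapter: plain wall-clock timing, no vendor tooling."""
    def profile(self, fn):
        start = time.perf_counter()
        fn()
        return {"backend": "cpu", "elapsed_s": time.perf_counter() - start}

def tune_across_backends(fn, profilers):
    """Shared pipeline: run the same workload on every registered backend."""
    return [p.profile(fn) for p in profilers]

results = tune_across_backends(lambda: sum(range(10000)), [CpuTimerProfiler()])
```

Because every adapter returns the same metrics shape, comparison and regression tracking logic is written once, which is the essence of performance portability.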

Summary and Next Steps

Requirements

  • Experience with AI model training or deployment pipelines.
  • Understanding of GPU/MLU compute principles and model optimization techniques.
  • Basic familiarity with performance profiling tools and metrics.

Target Audience

  • Performance engineers.
  • Machine learning infrastructure teams.
  • AI system architects.

Duration: 21 hours
