CUDA programming: watch the latest updates for today.
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Learn how to write, compile, and run a simple C program on your GPU using Microsoft Visual Studio with the Nsight plug-in. Find code used in the video at: 🤍 Learn more at the blog: 🤍
GPU programming using nVidia CUDA
In this video, we talk about why GPUs are better suited to parallelized tasks and how a GPU outperforms a CPU at certain workloads. Finally, we set up the NVIDIA CUDA programming packages to use the CUDA API in Visual Studio. GPUs are a great platform for executing code that can take advantage of massive parallelization. For example, in this video we show the difference between adding vectors on a CPU and adding vectors on a GPU. By taking advantage of the CUDA parallelization framework, we can perform many additions in parallel. Join me on Discord!: 🤍 Support me on Patreon!: 🤍
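The CPU-versus-GPU vector addition described above can be sketched without a GPU. The following is a minimal pure-Python emulation, not the code from the video: the names vector_add_cpu, vector_add_kernel and launch are hypothetical, and the loops in launch stand in for work that a real GPU would run in parallel.

```python
# Pure-Python emulation of a CUDA-style vector add (no GPU required).
# All function names here are illustrative assumptions, not a real CUDA API.

def vector_add_cpu(a, b):
    """Sequential CPU version: one loop walks over every element."""
    return [x + y for x, y in zip(a, b)]


def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Work done by a single emulated GPU thread: add one pair of elements."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):                        # threads past the end do nothing
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a kernel launch by visiting every (block, thread) pair in turn.
    On a real GPU these iterations would execute in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 1000
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
out = [0.0] * n

threads_per_block = 256
# Round up so that every element is covered by some thread.
blocks = (n + threads_per_block - 1) // threads_per_block
launch(vector_add_kernel, blocks, threads_per_block, a, b, out)
```

The bounds check inside the kernel matters because the launch rounds up to whole blocks, so the last block usually contains threads with no element to process.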
In this tutorial, we will talk about CUDA and how it helps us accelerate the speed of our programs. Additionally, we will discuss the difference between processors (CPUs) and graphic cards (GPUs) and how come we can use both to process code. By the end of this video - we will install CUDA and perform a quick speed test comparing the speed of our GPU with the speed of our CPU. We will create 2 extremely large data structures with PyTorch and we will multiply one by the other to test the performance. Specifically, I'll be comparing Nvidia's GeForce RTX 3090 GPU with Intel's i9-12900K 12th-Gen Alder Lake Processor (with DDR5 memory). I'll be posting some more advanced benchmarks in the next few tutorials, as the code I'm demonstrating in this video is 100% beginner-friendly! ⏲️ Time Stamps ⏲️ * 00:00 - what is CUDA? 00:47 - how processors (CPU) operate? 01:42 - CPU multitasking 03:16 - how graphic cards (GPU) operate? 04:02 - how come GPUs can run code faster than CPUs? 04:59 - benefits of using CUDA 06:03 - verify our GPU is capable of CUDA 06:48 - install CUDA with Anaconda and PyTorch 09:22 - verify if CUDA installation was successful 10:32 - CPU vs GPU speed test with PyTorch 14:20 - freeze CPU with torch.cuda.synchronize() 15:51 - speed test results 17:55 - CUDA for systems with multiple GPUs 18:28 - next tutorials and thanks for watching! 🔗 Important Links 🔗 * ⭐ My Anaconda Tutorial for Beginners: 🤍 ⭐ My CUDA vs. 
TensorRT Tutorial for Beginners: 🤍 ⭐ CUDA Enabled GPUS: 🤍 ⭐ Complete Notebook Code: 🤍 💻 Install with VENV instead of Anaconda (LINUX) 💻 * ❗install venv: $ sudo apt-get install -y python3-venv 🥇create working environment: $ python3 -m venv my_env 🥈activate working environment: $ source my_env/bin/activate 🥉install PIP3 and PyTorch+CUDA: (my_env) $ sudo apt install python3-pip (my_env) $ pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f 🤍 🏆more information about VENV: 🤍 🏆more information about installing PyTorch: 🤍 🙏SPECIAL THANK YOU 🙏 * Thank you so much to Robert from Nvidia for helping me with the speed test code! Thank you to SFX Buzz for the scratched record sound: 🤍 Thank you to Flat Icon for the beautiful icon graphics: 🤍
Part of the Nvidia HPC SDK Training, Jan 12-13, 2022. Slides and more details are available at 🤍
Introduction to NVIDIA's CUDA parallel architecture and programming model. Learn more by following @gpucomputing on Twitter.
GPU programming with CUDA
In this tutorial, I'll show you everything you need to know about CUDA programming so that you can make use of GPU parallelization through simple modifications of your existing code running on a boring CPU. The following tutorial was recorded on NVIDIA's Jetson Orin supercomputer. CUDA stands for Compute Unified Device Architecture, and is a parallel computing platform and application programming interface that enables software to use certain types of graphics processing units for general-purpose processing, an approach called general-purpose computing on GPUs. First, I will start by writing a simple function that does a vector multiplication, which is going to run on a CPU. Then we get the same job done using CUDA parallelization on a GPU. Keep in mind that GPUs have more cores than CPUs, so when it comes to parallel computing on data, GPUs perform far better than CPUs even though they have lower clock speeds and lack several of the core management features found in CPUs. An example reveals that running 64 million multiplications on a GPU takes about 0.64 seconds, as opposed to 31.4 seconds on a CPU. This translates to a gain of about 50x in speed, thanks to parallelization across such a huge number of cores. Amazing! This means that a complex program taking about a month on a CPU could be executed in about 14 hours. It could be faster still given more cores. Then, I'll show you the gains in filling arrays in Python on a CPU vs. on a GPU. Another example reveals that filling the array takes about 2.58 seconds on a CPU, as opposed to 0.39 seconds on a GPU, a gain of about 6.6x. The last fundamental section of this video shows the gains in rendering images (or videos) in Python. We will demonstrate why you see some film producers or movie makers rendering and editing their content on a GPU. 
GPU rendering uses a graphics card rather than a CPU, which may substantially speed up the rendering process because GPUs are built primarily for fast image rendering. GPUs were developed in response to graphically intensive applications that taxed CPUs and slowed processing speed. I will use the Mandelbrot set to compare CPU and GPU power. This example reveals that only 1.4 seconds of execution is needed on a GPU as opposed to 110 seconds on a CPU, which is a 78x gain. This simply means that instead of rendering a 4K resolution video over a week on a CPU, you could get the same video in 8K resolution rendered in 2 hours on a GPU, if you are using 32 threads. So imagine if you doubled the threads and blocks involved in GPU optimization. ⏲Outline⏲ 00:00 Introduction 00:33 Multiplication gains on GPUs vs CPUs 08:31 Filling an array on GPUs vs CPUs 11:55 Rendering gains on GPU vs CPU 12:35 What is a Mandelbrot set? 13:39 Mandelbrot set rendering on CPU 17:01 Mandelbrot set rendering on GPU 20:54 Outro 📚Related Lectures Jetson Orin Supercomputer - 🤍 Quick Deploy: Object Detection via NGC on Vertex AI Workbench Google Cloud - 🤍 Voice Swap using NVIDIA's NeMo - 🤍 🔴 Subscribe for more videos on CUDA programming 👍 Smash that like button, in case you find this tutorial useful. 👁🗨 Speak up and comment, I am all ears. 
💰 Donate to help the channel Patreon - 🤍 BTC wallet - 3KnwXkMZB4v5iMWjhf1c9B9LMTKeUQ5viP ETH wallet - 0x44F561fE3830321833dFC93FC1B29916005bC23f DOGE wallet - DEvDM7Pgxg6PaStTtueuzNSfpw556vXSEW API3 wallet - 0xe447602C3073b77550C65D2372386809ff19515b DOT wallet - 15tz1fgucf8t1hAdKpUEVy8oSR8QorAkTkDhojhACD3A4ECr ARPA wallet - 0xf54bEe325b3653Bd5931cEc13b23D58d1dee8Dfd QNT wallet - 0xDbfe00E5cddb72158069DFaDE8Efe2A4d737BBAC AAVE wallet - 0xD9Db74ac7feFA7c83479E585d999E356487667c1 AGLD wallet - 0xF203e39cB3EadDfaF3d11fba6dD8597B4B3972Be AERGO wallet - 0xd847D9a2EE4a25Ff7836eDCd77E5005cc2E76060 AST wallet - 0x296321FB0FE1A4dE9F33c5e4734a13fe437E55Cd DASH wallet - XtzYFYDPCNfGzJ1z3kG3eudCwdP9fj3fyE #cuda #cudaprogramming #gpu
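The speedup figures quoted in this description follow from simple divisions of the CPU time by the GPU time. A minimal sketch of that arithmetic in plain Python is given here; the helper name speedup is illustrative, and the 30-day month and 7-day week are assumptions used only to reproduce the "month" and "week" estimates.

```python
# Reproduce the speedup arithmetic from the timings quoted in the description.

def speedup(cpu_seconds, gpu_seconds):
    """Speedup factor: how many times faster the GPU run was."""
    return cpu_seconds / gpu_seconds


multiplication = speedup(31.4, 0.64)  # about 49x, rounded to 50x in the text
array_fill = speedup(2.58, 0.39)      # about 6.6x
mandelbrot = speedup(110.0, 1.4)      # about 78.6x

# A job taking one month (assumed 30 days) on the CPU, sped up 50x.
month_hours = 30 * 24
hours_at_50x = month_hours / 50       # 14.4 hours

# A one-week render sped up by the Mandelbrot factor, at equal resolution.
week_hours = 7 * 24
render_hours = week_hours / mandelbrot

print(f"multiplication: {multiplication:.1f}x")
print(f"array fill:     {array_fill:.1f}x")
print(f"mandelbrot:     {mandelbrot:.1f}x")
print(f"month at 50x:   {hours_at_50x:.1f} h")
print(f"week at 78.6x:  {render_hours:.1f} h")
```

These ratios describe the specific runs in the video; other hardware, problem sizes, and data-transfer costs would give different factors.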
Speaker: Mr. Oren Tropp (Sagivtech) "Prace Conference 2014", Partnership for Advanced Computing in Europe, Tel Aviv University, 13.2.14
Learn to use a CUDA GPU to dramatically speed up code in Python. 00:00 Start of Video 00:16 End of Moore's Law 01:15 What is a TPU and ASIC 02:25 How a GPU works 03:05 Enabling GPU in Colab Notebook 04:16 Using Python Numba 05:40 Building Mandelbrots with and without GPU and Numba 07:49 CUDA Vectorize Functions 08:27 Copy Data to GPU Memory Tutorial: 🤍 Book: 🤍 If you enjoyed this video, here are additional resources to look at: Coursera + Duke Specialization: Building Cloud Computing Solutions at Scale Specialization: 🤍 O'Reilly Book: Practical MLOps: 🤍 O'Reilly Book: Python for DevOps: 🤍 Pragmatic AI: An Introduction to Cloud-based Machine Learning: 🤍 Pragmatic AI Labs Book: Python Command-Line Tools: 🤍 Pragmatic AI Labs Book: Cloud Computing for Data Analysis : 🤍 Pragmatic AI Book: Minimal Python: 🤍 Pragmatic AI Book: Testing in Python: 🤍 Subscribe to Pragmatic AI Labs YouTube Channel: 🤍 View content on noahgift.com: 🤍 View content on Pragmatic AI Labs Website: 🤍
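For readers who want to see what the Mandelbrot benchmark computes, here is a minimal escape-time sketch in plain Python. It is not the video's notebook code: the function names mandel and render are assumptions, and the version in the video is compiled with Numba, whereas this one runs on the CPU interpreter so that it works anywhere. Each pixel is independent of the others, which is what makes the problem a good fit for a GPU.

```python
# Escape-time Mandelbrot in plain Python (CPU only, illustrative names).

def mandel(x, y, max_iters):
    """Return the iteration at which c = x + yi escapes (|z| >= 2),
    or max_iters if it never escapes within the iteration budget."""
    c = complex(x, y)
    z = 0j
    for i in range(max_iters):
        z = z * z + c
        if z.real * z.real + z.imag * z.imag >= 4.0:
            return i
    return max_iters


def render(width, height, x_min, x_max, y_min, y_max, max_iters):
    """Compute the iteration count for every pixel of a width x height image.
    Every pixel is independent, so a GPU could assign one thread per pixel."""
    pixel_w = (x_max - x_min) / width
    pixel_h = (y_max - y_min) / height
    image = []
    for row in range(height):
        y = y_min + row * pixel_h
        image.append([mandel(x_min + col * pixel_w, y, max_iters)
                      for col in range(width)])
    return image


image = render(60, 30, -2.0, 1.0, -1.0, 1.0, 50)
```

Points that reach max_iters are treated as inside the set; the per-pixel counts are what a plotting library would map to colors.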
In this video we look at writing a simple matrix multiplication kernel from scratch in CUDA! For code samples: 🤍 For live content: 🤍
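The idea behind a simple matrix multiplication kernel is that each GPU thread computes one element of the output matrix. The sketch below emulates that mapping in plain Python so it runs without a GPU; it is not the code from the video, and matmul_kernel and launch_2d are hypothetical names. Matrices are stored as flat row-major lists, which is an assumption made to mirror typical CUDA C code.

```python
# Emulated "one thread per output element" matrix multiplication.

def matmul_kernel(row, col, a, b, c, n):
    """Work of one emulated thread: the dot product for c[row][col]."""
    if row < n and col < n:  # guard threads outside the matrix
        acc = 0.0
        for k in range(n):
            acc += a[row * n + k] * b[k * n + col]
        c[row * n + col] = acc


def launch_2d(kernel, grid, block, *args):
    """Visit every thread of a 2D grid of 2D blocks sequentially.
    grid and block are (x, y) tuples; a GPU would run these in parallel."""
    for by in range(grid[1]):
        for bx in range(grid[0]):
            for ty in range(block[1]):
                for tx in range(block[0]):
                    row = by * block[1] + ty  # global row of this thread
                    col = bx * block[0] + tx  # global column of this thread
                    kernel(row, col, *args)


n = 5
a = [float(i) for i in range(n * n)]
identity = [1.0 if i // n == i % n else 0.0 for i in range(n * n)]
c = [0.0] * (n * n)

block = (4, 4)
# Round up in both dimensions so the 5x5 matrix is fully covered.
grid = ((n + block[0] - 1) // block[0], (n + block[1] - 1) // block[1])
launch_2d(matmul_kernel, grid, block, a, identity, c, n)
```

Multiplying by the identity matrix is used here only as an easy correctness check, since the result must equal the input.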
If you can parallelize your code by harnessing the power of the GPU, I bow to you. GPU code is usually abstracted away by the popular deep learning frameworks, but knowing how it works is really useful. CUDA is the most popular of the GPU frameworks so we're going to add two arrays together, then optimize that process using it. I love CUDA! Code for this video: 🤍 Alberto's Winning Code: 🤍 Hutauf's runner-up code: 🤍 Please Subscribe! And like. And comment. That's what keeps me going. Follow me: Twitter: 🤍 Facebook: 🤍 More learning resources: 🤍 🤍 🤍 🤍 🤍 🤍 🤍 🤍 🤍 Join us in the Wizards Slack channel: 🤍 No, Nvidia did not pay me to make this video lol. I just love CUDA. And please support me on Patreon: 🤍 Follow me: Twitter: 🤍 Facebook: 🤍 Instagram: 🤍 Signup for my newsletter for exciting updates in the field of AI: 🤍 Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
In this CUDACast video, we'll see how to write and run your first CUDA Python program using the Numba Compiler from Continuum Analytics.
It’s 2019, and Moore’s Law is dead. CPU performance is plateauing, but GPUs provide a chance for continued hardware performance gains, if you can structure your programs to make good use of them. In this talk you will learn how to speed up your Python programs using Nvidia’s CUDA platform. EVENT: PyTexas2019 SPEAKER: William Horton PUBLICATION PERMISSIONS: Original video was published with the Creative Commons Attribution license (reuse allowed). ATTRIBUTION CREDITS: Original video source: 🤍
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Using the GPU can substantially speed up all kinds of numerical problems. Conventional wisdom dictates that for fast numerics you need to be a C/C++ whiz. It turns out that you can get quite far with only Python. In this video, I explain how you can use CuPy together with Numba to perform calculations on NVIDIA GPUs. Production quality is not the best, but I hope you may find it useful. 00:00 Introduction: GPU programming in python, why? 06:52 Cupy intro 08:39 Cupy demonstration in Google colab 19:54 Cupy summary 20:21 Numba.cuda and kernels intro 25:07 Grids, blocks and threads 27:12 Matrix multiplication kernel 29:20 Tiled matrix multiplication kernel and shared memory 34:31 Numba.cuda demonstration in Google colab 44:25 Final remarks Edit 3/9/2021: the notebook used for the demonstration can be found here 🤍 Edit 9/9/2021: at 23:56 one of the grid elements should be labeled 1,3 instead of 1,2. Thanks to _ for pointing this out.
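The tiled matrix multiplication listed in the timestamps can be illustrated without a GPU. The sketch below is a pure-Python emulation under stated assumptions: the name tiled_matmul and the tile size of 2 are hypothetical, the small lists sa and sb stand in for a block's shared memory, and the point where a real kernel would synchronize its threads is only marked by a comment.

```python
# Emulated tiled matrix multiplication (row-major flat lists, CPU only).

TILE = 2  # illustrative tile edge; real kernels often use 16 or 32


def tiled_matmul(a, b, n):
    """Multiply two n x n matrices tile by tile, as one CUDA block would."""
    c = [0.0] * (n * n)
    tiles = (n + TILE - 1) // TILE  # number of tiles along each dimension
    for by in range(tiles):
        for bx in range(tiles):
            # One accumulator per emulated thread in this block.
            acc = [[0.0] * TILE for _ in range(TILE)]
            for t in range(tiles):
                # Load phase: each thread copies one element of A and one of B
                # into the block's "shared memory" (zero outside the matrix).
                sa = [[0.0] * TILE for _ in range(TILE)]
                sb = [[0.0] * TILE for _ in range(TILE)]
                for ty in range(TILE):
                    for tx in range(TILE):
                        row, col = by * TILE + ty, bx * TILE + tx
                        a_col, b_row = t * TILE + tx, t * TILE + ty
                        if row < n and a_col < n:
                            sa[ty][tx] = a[row * n + a_col]
                        if b_row < n and col < n:
                            sb[ty][tx] = b[b_row * n + col]
                # A real kernel would synchronize the block's threads here.
                # Compute phase: partial dot products from the cached tiles.
                for ty in range(TILE):
                    for tx in range(TILE):
                        for k in range(TILE):
                            acc[ty][tx] += sa[ty][k] * sb[k][tx]
            # Write phase: each in-range thread stores its result.
            for ty in range(TILE):
                for tx in range(TILE):
                    row, col = by * TILE + ty, bx * TILE + tx
                    if row < n and col < n:
                        c[row * n + col] = acc[ty][tx]
    return c


n = 3
a = [float(v) for v in range(1, 10)]     # 1..9
b = [float(v) for v in range(9, 0, -1)]  # 9..1
c = tiled_matmul(a, b, n)
```

The benefit on real hardware comes from each loaded element being reused TILE times from fast shared memory instead of being re-read from global memory; this emulation shows only the data flow, not the speed.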
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
Basics of CUDA Programming | CUDA Terminologies | Host, Device, Kernel, Stream Multiprocessor, Stream Processor, Thread, Block, Grid, Warp, gpu vs cpu,what is cuda,what is cuda cores,what is cuda cores in graphics cards,what is cuda gpu,what is cuda programming,what is cuda and opencl,what is cuda toolkit,what is cuda nvidia,what is cuda cores in gpu Blog link for What is CUDA? / Basics of CUDA (Necessity of GPU, Host, Device, Kernel, Stream Multiprocessor, Stream Processor, Thread, Block, Grid, Warp, Memory architecture of GPU): 🤍 Find Amazon links to purchase Computer / Electronics appliances online: Tripod 🤍 Tripod Mobile Holder 🤍 USB 3.0 128GB OTG Pen Drive 🤍 USB 3.1 32GB OTG Pen Drive 🤍 SanDisk 128GB Memory Card with Adapter 🤍 HP 15.6-inch Laptop 🤍 ASUS 14.0-inch Thin and Light Laptop 🤍 HP 245 14.1 inch Laptop 🤍 Seagate 2 TB External Hard Drive 🤍 Seagate 1 TB External Hard Drive 🤍 Power Bank 20000mAH 🤍 Power Bank 10000mAH 🤍 Mi Band (Black) 🤍 Headphones with Mic 🤍 Earphones with Mic 🤍 Wireless Mouse 🤍 Wired Mouse 🤍 USB Keyboard 🤍 Wireless Keyboard and Mouse Combo 🤍 OTG Cable 🤍 HDMI to VGA Converter Adapter Cable 🤍 Wireless USB WiFi Receiver Adapter for PC, Desktop and Laptops 🤍 Wireless Presenter for Powerpoint Presentation 🤍 USB 3.0 Port Hub (for 4 USB) 🤍 Ethernet to USB 3.0 Converter Cable 🤍 Redmi Note 8 Mobile 🤍 Amazon Kindle 🤍 , 🤍 Also visit our website 🤍 #cuda #cudaprogramming #cudaterminologies #basicsofcuda #cpuvsgpu
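The terms grid, block, thread and warp listed above are related by simple integer arithmetic. Here is a minimal sketch in plain Python; the function names launch_shape and locate are hypothetical, and the warp size of 32 reflects NVIDIA GPUs to date rather than a guarantee for future hardware.

```python
# Relating grid, block, thread and warp with integer arithmetic.

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs to date


def launch_shape(n_elements, threads_per_block):
    """Return (blocks in the grid, warps per block) for a 1D launch that
    assigns one thread to each element."""
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    warps_per_block = (threads_per_block + WARP_SIZE - 1) // WARP_SIZE
    return blocks, warps_per_block


def locate(global_index, threads_per_block):
    """Split a global thread index into (block, thread in block,
    warp in block, lane in warp)."""
    block = global_index // threads_per_block
    thread = global_index % threads_per_block
    warp = thread // WARP_SIZE
    lane = thread % WARP_SIZE
    return block, thread, warp, lane


blocks, warps_per_block = launch_shape(1_000_000, 256)
position = locate(1000, 256)
print(blocks, warps_per_block)  # blocks needed and warps in each block
print(position)                 # where global thread 1000 sits
```

With 256 threads per block, a million elements need 3907 blocks, and global thread 1000 is thread 232 of block 3, in lane 8 of that block's warp 7.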
🤍 - Discussion & Comments: 🤍 - Presentation Slides, PDFs, Source Code and other presenter materials are available at: 🤍 - Computer system architecture trends are constantly evolving to provide higher performance and computing power, to support the increasing demand from high-performance computing domains including AI, machine learning, image processing and automotive driving aids. The most recent trend is the move towards heterogeneity, where a system has one or more co-processors, often a GPU, working with it in parallel. These kinds of systems are everywhere, from desktop machines and high-performance computing supercomputers to mobile and embedded devices. The many-core GPU has been shaped by the fast-growing video game industry, which expects a massive number of floating-point calculations per video frame. The motive was to look for ways to maximize the chip area and power budget dedicated to floating-point calculations. The solution is to optimize for the execution throughput of a massive number of threads. The design saves chip area and power by allowing pipelined memory channels and arithmetic operations to have long latency. The reduced area and power spent on memory and arithmetic allow designers to have more cores on a chip to increase the execution throughput. In CPPCON 2018, we presented "A Modern C++ Programming Model for CPUs using Khronos SYCL", which provided an introduction to GPU programming using SYCL. This talk will take this further. It will present the GPU architecture and the GPU programming model, covering the execution and memory model. It will describe parallel programming patterns and common parallel algorithms and how they map to the GPU programming model. Finally, through this lens, it will look at how to construct the control-flow of your programs and how to structure and move your data to achieve efficient utilisation of GPU architectures. 
This talk will use SYCL as a programming model for demonstrating the concepts being presented; however, the concepts can be applied to any other heterogeneous programming model such as OpenCL or CUDA. SYCL allows users to write standard C++ code which is then executed on a range of heterogeneous architectures including CPUs, GPUs, DSPs, FPGAs and other accelerators. On top of this, SYCL also provides a high-level abstraction which allows users to describe their computations as a task graph with data dependencies, while the SYCL runtime performs data dependency analysis and scheduling. SYCL also supports a host device which will execute on the host CPU with the same execution and memory model guarantees as OpenCL for debugging purposes, and a fallback mechanism which allows an application to recover from failure. - Gordon Brown Codeplay Software Principal Software Engineer, SYCL & C++ Edinburgh, United Kingdom Gordon Brown is a principal software engineer at Codeplay Software specializing in heterogeneous programming models for C++. He has been involved in the standardization of the Khronos standard SYCL and the development of Codeplay's implementation of the standard from its inception. More recently he has been involved in the efforts within SG1/SG14 to standardize execution and to bring heterogeneous computing to C++. - Videos Filmed & Edited by Bash Films: 🤍 *-* Register Now For CppCon 2022: 🤍 *-*
See how to install CUDA Python followed by a tutorial on how to run a Python example on a GPU. Find code used in the video at: 🤍 Learn more at the blog: 🤍
What does a GPU do differently to a CPU and why don't we use them for everything? First of a series from Jem Davies, VP of Technology at ARM. Floating Point Numbers: 🤍 Why Computers Use Binary: 🤍 How Bitcoin Works: 🤍 Triangles & Pixels (Graphics Playlist): 🤍 🤍 🤍 This video was filmed and edited by Sean Riley. Computer Science at the University of Nottingham: 🤍 Computerphile is a sister project to Brady Haran's Numberphile. More at 🤍
Overview of each generation of CUDA hardware from Tesla through Ampere
Nvidia CUDA C Lessons. Getting started. Introduction. Parallel GPU programming. 🤍 Become a channel sponsor 🤍 Yandex wallet - 4100 1163 2706 8392 🤍 🤍 list of videos (🤍
I am Shridhar Mankar, an Engineer | YouTuber | Educational Blogger | Educator | Podcaster. My aim: to make engineering students' lives EASY. Website - 🤍 5 Minutes Engineering English YouTube Channel - 🤍 Instagram - 🤍 A small donation would mean the world to me and will help me to make AWESOME videos for you. • UPI ID : 5minutesengineering@apl Playlists : • 5 Minutes Engineering Podcast : 🤍 • Aptitude : 🤍 • Machine Learning : 🤍 • Computer Graphics : 🤍 • C Language Tutorial for Beginners : 🤍 • R Tutorial for Beginners : 🤍 • Python Tutorial for Beginners : 🤍 • Embedded and Real Time Operating Systems (ERTOS) : 🤍 • Shridhar Live Talks : 🤍 • Welcome to 5 Minutes Engineering : 🤍 • Human Computer Interaction (HCI) : 🤍 • Computer Organization and Architecture : 🤍 • Deep Learning : 🤍 • Genetic Algorithm : 🤍 • Cloud Computing : 🤍 • Information and Cyber Security : 🤍 • Soft Computing and Optimization Algorithms : 🤍 • Compiler Design : 🤍 • Operating System : 🤍 • Hadoop : 🤍 • CUDA : 🤍 • Discrete Mathematics : 🤍 • Theory of Computation (TOC) : 🤍 • Data Analytics : 🤍 • Software Modeling and Design : 🤍 • Internet Of Things (IOT) : 🤍 • Database Management Systems (DBMS) : 🤍 • Computer Network (CN) : 🤍 • Software Engineering and Project Management : 🤍 • Design and Analysis of Algorithm : 🤍 • Data Mining and Warehouse : 🤍 • Mobile Communication : 🤍 • High Performance Computing : 🤍 • Artificial Intelligence and Robotics : 🤍
This is my second talk for San Diego State University's CS490 course where we will be discussing how to write our first CUDA Program. Viewers should be comfortable programming and have some experience with the C programming language.
CUDA program structure Host, Device. cudaMalloc, cudaMemcpy
💡 Giveaway steps: ✅ 1. Register to NVIDIA GTC via 🤍 ✅ 2. Wait for #GTC23 to start and join the Keynote livestream. ✅ 3. Attend GTC sessions (there are really a lot of sessions going on - just pick one you're interested in) 😄 ✅ 4. Email me a screenshot as proof that you attended the keynote and a session of your choice: bazziapps@gmail.com ✅ 5. Subscribe to my YouTube channel here - 🤍 😅 ✉️ Email: bazziapps@gmail.com ⏱Outline: 00:00 Intro 01:45 4080 RTX Giveaway steps 02:45 Importing numba 03:00 Importing numpy 03:21 Importing exponential from math 03:30 The CUDA JIT decorator @cuda.jit 04:21 Gaussian kernel filter for CUDA 06:02 CUDA to device 06:33 Convolution for CUDA 09:55 Python Imaging Library 10:36 grayscale imaging 11:08 CUDA to device 11:17 CUDA device array like 11:29 Computing the Gaussian kernel 12:31 The CUDA convolution 12:41 Plotting via matplotlib 12:57 Testing for different sigma values 13:06 Outro 📚Related Lectures 🤍 🤍 🤍 🤍 🤍 📚 Image processing: Image processing is a field of computer science and engineering that deals with the analysis, manipulation, and interpretation of digital images. It involves the use of algorithms and techniques to extract useful information from digital images or to enhance their visual quality for human perception. Image processing has applications in a wide range of fields, including medical imaging, remote sensing, security and surveillance, robotics, and entertainment. In medical imaging, for example, image processing techniques can be used to detect and diagnose diseases such as cancer by analyzing medical images such as X-rays and CT scans. 📚 CUDA things to know: CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) developed by NVIDIA that enables developers to harness the power of GPUs (graphics processing units) for general-purpose computing. 
In image processing, CUDA can be used to accelerate various operations such as filtering, segmentation, feature extraction, and registration. Here's how CUDA works in image processing: 1. GPU Architecture: Modern GPUs have hundreds or thousands of cores, each capable of performing simple arithmetic and logical operations. These cores are organized into streaming multiprocessors (SMs), which manage the execution of multiple threads in parallel. In contrast to CPUs (central processing units), which are optimized for serial processing of complex instructions, GPUs are optimized for parallel processing of many simple instructions. 2. CUDA Programming: To program GPUs for image processing tasks, developers can use the CUDA C/C++ programming language, which extends the standard C/C++ language with special keywords and functions for parallel programming. CUDA programs consist of a host program running on the CPU and a device program running on the GPU. The host program prepares data for processing and launches kernel functions on the GPU to perform computations in parallel. 3. Image Processing Tasks: CUDA can accelerate various image processing tasks, such as: ✅ Filtering: Filtering operations such as blurring, sharpening, and edge detection can be accelerated using CUDA by applying convolution kernels to the image data in parallel. ✅ Segmentation: Segmentation tasks such as thresholding and region growing can be accelerated using CUDA by applying segmentation algorithms to image data in parallel. ✅ Feature Extraction: Feature extraction tasks such as feature detection, description, and matching can be accelerated using CUDA by applying feature extraction algorithms to image data in parallel. 4. CUDA Libraries: NVIDIA provides several CUDA libraries that can be used for image processing, such as: ✅ cuFFT: a library for fast Fourier transforms on the GPU, which can be used for filtering and other operations. 
✅ cuBLAS: a library for basic linear algebra operations on the GPU, which can be used for matrix operations in image processing. ✅ cuDNN: a library for deep neural networks on the GPU, which can be used for tasks such as image classification and object detection. ✅ OpenCV GPU: a GPU-accelerated version of the popular OpenCV library for computer vision, which includes functions for image processing and computer vision tasks. In summary, CUDA can significantly accelerate image processing tasks by leveraging the parallel processing power of GPUs. Developers can use CUDA programming techniques and libraries to accelerate various image processing tasks such as filtering, segmentation, feature extraction, and registration. 🙏🏻 Credits: Dan G. for directing Moe D. for editing Samer S. for brainstorming Bensound for audio This video is created under a Creative Commons license. #gtc23 #cuda #imageprocessing
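The Gaussian filtering described above comes down to two steps: building a normalized Gaussian kernel and applying it around every pixel. The following is a minimal CPU-only sketch in plain Python, not the video's Numba code: the names gaussian_kernel, convolve_pixel and convolve are assumptions, and treating pixels outside the image as zero is a border policy chosen here for simplicity.

```python
# Gaussian blur by 2D convolution in plain Python (illustrative names).
from math import exp


def gaussian_kernel(size, sigma):
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    raw = [[exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma * sigma))
            for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def convolve_pixel(image, kernel, row, col):
    """Weighted sum around one output pixel; on a GPU this would be the work
    of a single thread. Pixels outside the image count as zero (assumption).
    The Gaussian is symmetric, so flipping the kernel is unnecessary."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    acc = 0.0
    for ky, kernel_row in enumerate(kernel):
        for kx, weight in enumerate(kernel_row):
            r, c = row + ky - half, col + kx - half
            if 0 <= r < height and 0 <= c < width:
                acc += image[r][c] * weight
    return acc


def convolve(image, kernel):
    """Apply the kernel at every pixel; each pixel is independent."""
    return [[convolve_pixel(image, kernel, r, c)
             for c in range(len(image[0]))] for r in range(len(image))]


kernel = gaussian_kernel(3, 1.0)
flat = [[10.0] * 5 for _ in range(5)]  # uniform grayscale test image
blurred = convolve(flat, kernel)
```

On a uniform image the interior stays unchanged because the weights sum to 1, while the border pixels darken under the zero-padding assumption; a larger sigma spreads the weights and blurs more strongly.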
In this video we look at a step-by-step performance optimization of matrix multiplication in CUDA! Spreadsheet: 🤍 For code samples: 🤍 For live content: 🤍
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
High-performance computing is now dominated by general-purpose graphics processing unit (GPGPU) oriented computations. How can we leverage our knowledge of C++ to program the GPU? NVIDIA's answer to general-purpose computing on the GPU is CUDA. CUDA programs are essentially C++ programs, but have some differences. CUDA comes as a Toolkit SDK containing a number of libraries that exploit the resources of the GPU: fast Fourier transforms, machine learning training and inference, etc. Thrust is a C++ template library for CUDA. In this month's meeting, Richard Thomson will present a brief introduction to CUDA with the Thrust library to program the GPU. Programming the GPU with CUDA is a huge topic covered by lots of libraries, tutorials, videos, and so on, so we will only be able to present an introduction to the topic. You are encouraged to explore more on your own! Utah C++ Programmers meetup: 🤍 Utah C++ Programmers blog: 🤍 CUDA: 🤍 Thrust: 🤍
Paid course: 🤍 Let's learn about GPGPU computing
This video is part of an online course, Intro to Parallel Programming. Check out the course here: 🤍
This video tutorial has been taken from Learning CUDA 10 Programming. You can learn more and buy the full video course here 🤍 Find us on Facebook 🤍 Follow us on Twitter - 🤍
This talk is part of the Iowa State University Statistics Department lecture series on GPU computing. More information on this talk is available at 🤍 The website for all the talks is 🤍 In a nutshell, GPU computing makes use of graphics cards to parallelize algorithms, speeding up computations by several orders of magnitude.
High-performance computing is now dominated by general-purpose graphics processing unit (GPGPU) oriented computations. How can we leverage our knowledge of C++ to program the GPU? NVIDIA's answer to general-purpose computing on the GPU is CUDA. CUDA programs are essentially C++ programs, but have some differences. CUDA comes as a Toolkit SDK containing a number of libraries that exploit the resources of the GPU: fast Fourier transforms, machine learning training and inference, etc. Thrust is a C++ template library for CUDA. In this month's meeting, Richard Thomson will present a brief introduction to CUDA with the Thrust library to program the GPU. Programming the GPU with CUDA is a huge topic covered by lots of libraries, tutorials, videos, and so on, so we will only be able to present an introduction to the topic. You are encouraged to explore more on your own! PUBLICATION PERMISSIONS: Original video was published with the Creative Commons Attribution license (reuse allowed). Link: 🤍
Julia has several packages for programming GPUs, each of which supports various programming models. In this workshop, we will demonstrate the use of three major GPU programming packages: CUDA.jl for NVIDIA GPUs, AMDGPU.jl for AMD GPUs, and oneAPI.jl for Intel GPUs. We will explain the various approaches for programming GPUs with these packages, ranging from generic array operations that focus on ease of use, to hardware-specific kernels for when performance matters. Most of the workshop will be vendor-neutral, and the content will be available for all supported GPU back-ends. There will also be a part on vendor-specific tools and APIs. Attendees will be able to follow along, but are recommended to have access to a suitable GPU for doing so. Materials 🤍 🤍 Enjoyed the workshop? Consider sponsoring us on GitHub: 🤍 00:00 Welcome! 00:24 Welcome 01:20 Outline 02:44 JuliaGPU packages 04:08 JuliaGPU back-ends 05:34 GPU Architecture 07:25 Parallel programming models 08:55 Follow along and links to notebooks, JuliaHub 12:37 Start of tutorial with notebook 16:00 Array programming 28:20 Kernel programming 34:32 Parallel programming + questions 58:40 Profiling 1:01:50 Profiling: NVIDIA Nsight Systems: live example 1:11:00 Profiling: NVIDIA Nsight Compute: live example → optimize single kernel invocation 1:19:05 Common issues: unsupported array operations 1:21:50 Common issues: unsupported kernel operations 1:27:40 Parallel programming issues 1:31:55 Tour of accompanying GitHub repo 1:32:40 Case Study I: Image processing using AMDGPU 1:57:00 Break 2:01:30 Case Study II: Fun with arrays, Machine Learning 2:10:47 Case Study III: Random number generators 2:22:10 Kernel abstractions 2:42:10 Example: Solving heat equation with GPU 2:56:30 Sneak peek of Enzyme (automatic differentiation framework) 2:59:18 Questions and Future plans Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: 🤍