When an operation involves PyTorch tensors of different ranks, broadcasting is used.
PyTorch expands the lower-rank tensor, acting as if its values were copied along the missing dimensions so that the shapes match.
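For example, adding a rank-1 vector to a rank-2 matrix behaves as if the vector were repeated for every row. A minimal sketch, with made-up values:

```python
import torch

# Rank-2 matrix plus rank-1 vector: the vector is broadcast across the rows,
# as if it were copied once per row to match the matrix's shape.
m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
v = torch.tensor([10., 20., 30.])

print(m + v)
# tensor([[11., 22., 33.],
#         [14., 25., 36.],
#         [17., 28., 39.]])
```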
From Deep Learning for Coders, Chapter 4:
- PyTorch doesn’t actually copy `mean3` 1,010 times. It pretends it were a tensor of that shape, but doesn’t actually allocate any additional memory.
- It does the whole calculation in C (or, if you’re using a GPU, in CUDA, the equivalent of C on the GPU), tens of thousands of times faster than pure Python (up to millions of times faster on a GPU!).
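A minimal sketch of that behaviour, assuming stand-in shapes (in the book, `mean3` is a mean "3" image and there are 1,010 validation images; the tensors below are random placeholders, not the book's code):

```python
import torch

# Stand-ins for the book's example: 1,010 validation images and one mean image.
valid_imgs = torch.rand(1010, 28, 28)
mean3 = torch.rand(28, 28)

# Broadcasting: mean3 behaves as if it had shape (1010, 28, 28), so the
# subtraction applies to every image, but 1,010 copies are never made.
dist = (valid_imgs - mean3).abs().mean((-1, -2))
print(dist.shape)  # torch.Size([1010])

# expand() makes the "pretend" expansion visible: stride 0 along the new
# leading dimension means the same 28x28 data is reused, not copied.
expanded = mean3.expand(1010, 28, 28)
print(expanded.stride())  # (0, 28, 1)
```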