A collision of block hashes would likely break most Bitcoin implementations in horrible and unexpected ways, but it’s simply not going to happen unless you harness all the energy in the observable universe, and then some, for the sole purpose of finding the collision, or you discover a cryptographic weakness in SHA-256. Yes, rising difficulty theoretically makes block hash collisions more likely, because valid hashes are confined to an ever smaller range below the target, but not to a degree that would ever matter.
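To put a rough number on that, here is a back-of-the-envelope birthday-bound estimate. The inputs are illustrative assumptions, not measurements: one million blocks in the chain, a target that forces about 80 leading zero bits, and hashes spread uniformly below that target. The function name is made up for this sketch.

```python
import math

def log2_collision_probability(num_blocks: int, leading_zero_bits: int) -> float:
    """Approximate log2 of the chance that any two block hashes collide."""
    # Valid hashes must fall below the target, so only about
    # 2**(256 - leading_zero_bits) distinct values are reachable.
    log2_space = 256 - leading_zero_bits
    # Birthday bound: P ~= n * (n - 1) / 2 / space (fine while P is tiny).
    log2_pairs = math.log2(num_blocks * (num_blocks - 1) // 2)
    return log2_pairs - log2_space

# Assumed figures: 1,000,000 blocks, 80 leading zero bits.
print(log2_collision_probability(1_000_000, 80))  # roughly -137
```

Under these assumptions the chance of an accidental collision is on the order of 2^-137, and each extra zero bit of difficulty only doubles it.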
> I was wondering why block ids aren’t defined as the hash of the whole block rather than their headers
The header includes the transaction merkle tree root, so it indirectly commits to the entire contents of the block anyway. Hashing only the 80-byte header is also more elegant and efficient:
- When a node hears about a new block, it first downloads the header and validates its proof of work. This prevents other nodes from cheaply spamming it with entire fake blocks that it would waste time validating; only blocks with valid proof of work are even downloaded.
- Similarly, during initial block download, nodes use a “headers-first” synchronization strategy, which lets them learn the block hash of every block in the most-work chain just by downloading and validating the headers. This prevents some types of DoS attacks, makes parallel block downloads easier, and probably has other benefits as well.
- SPV wallets and other light clients can use the headers to validate that a transaction is included in the chain. (Their limitation is that they can only validate the proof of work, not the other consensus rules; the assumption is that miners wouldn’t waste hashpower on mining invalid blocks.)
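The points above can be sketched in a few lines of Python. This is a minimal illustration and not how any particular node or wallet is implemented: the helper names are made up for this example, the compact-target decoding ignores edge cases (it assumes an exponent of at least 3), and a real client also checks that each header links to the previous one and that the target follows the difficulty rules. It uses the genesis block header as sample data.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def serialize_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce) -> bytes:
    """Build the 80-byte header; hashes are displayed big-endian but stored reversed."""
    return (
        struct.pack("<i", version)
        + bytes.fromhex(prev_hash_hex)[::-1]
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )

def block_hash(header: bytes) -> str:
    """The block id is the double SHA-256 of the header alone."""
    return double_sha256(header)[::-1].hex()

def bits_to_target(bits: int) -> int:
    """Decode the compact 'bits' field (simplified: assumes exponent >= 3)."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))

def has_valid_pow(header: bytes, bits: int) -> bool:
    """Proof of work holds if the hash, read as a number, is at most the target."""
    return int.from_bytes(double_sha256(header), "little") <= bits_to_target(bits)

def verify_merkle_branch(txid_hex, branch, index, merkle_root_hex) -> bool:
    """Light-client check: hash a txid up through its siblings to the header's root."""
    node = bytes.fromhex(txid_hex)[::-1]
    for sibling_hex in branch:
        sibling = bytes.fromhex(sibling_hex)[::-1]
        # An odd index means this node is the right child at this level.
        node = double_sha256(sibling + node) if index & 1 else double_sha256(node + sibling)
        index >>= 1
    return node[::-1].hex() == merkle_root_hex

# Genesis block header fields.
GENESIS_MERKLE_ROOT = "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
GENESIS_BITS = 0x1D00FFFF
genesis_header = serialize_header(
    1, "00" * 32, GENESIS_MERKLE_ROOT, 1231006505, GENESIS_BITS, 2083236893
)

print(len(genesis_header))                          # 80
print(block_hash(genesis_header))
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
print(has_valid_pow(genesis_header, GENESIS_BITS))  # True
# The genesis block has a single transaction, so its txid is the merkle root
# and the branch is empty.
print(verify_merkle_branch(GENESIS_MERKLE_ROOT, [], 0, GENESIS_MERKLE_ROOT))  # True
```

Note that nothing here touches the block body: 80 bytes are enough to compute the block hash and check the proof of work, and a light client needs only the header plus a short merkle branch to check inclusion.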

