Ok, the other problem: it seems to me that on the first loop you're doing a (5,784) dotted with a (784,1). Dot should broadcast on its own if the training data is (N,784,1), I think, though perhaps you might need tensordot.

But then on the second iteration, you try to multiply a (10,5) by the (784,1)...

Instead you need to store the current result in a separate variable.
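To see the mismatch concretely, here's a minimal sketch (assuming a two-layer net with weight shapes (5,784) and (10,5), which is what your shapes suggest):

```python
import numpy as np

w1 = np.random.randn(5, 784)   # first layer weights
w2 = np.random.randn(10, 5)    # second layer weights
x = np.random.randn(784, 1)    # one training example

h = np.dot(w1, x)              # (5,784) @ (784,1) -> (5,1), fine
try:
    np.dot(w2, x)              # (10,5) @ (784,1) -> shape mismatch
except ValueError as e:
    print("shape error:", e)

y = np.dot(w2, h)              # (10,5) @ (5,1) -> (10,1), using the stored result
```

The second dot only works once you feed it the stored first-layer output instead of the original input.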

Maybe

    def predict(self, a):
        layer_output = a
        for w, b in zip(self.weights, self.biases):
            layer_output = self.activation(np.dot(w, layer_output) + b)
        return layer_output

Which I think should give an output of shape (N,10) if a is sized (N,784).

Or perhaps

    def predict(self, a):
        output = []  # a plain list; np.array([]) has no append
        for data in a:
            layer_output = data
            for w, b in zip(self.weights, self.biases):
                layer_output = self.activation(np.dot(w, layer_output) + b)
            output.append(layer_output)
        return np.array(output)

in case broadcasting doesn't work (but it should, and it will be loads faster than a loop).
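As a quick sanity check of the looped version, here's a self-contained sketch with made-up layer sizes and a sigmoid activation (your actual layer shapes and activation may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical 784 -> 5 -> 10 network
weights = [np.random.randn(5, 784), np.random.randn(10, 5)]
biases = [np.random.randn(5, 1), np.random.randn(10, 1)]

def predict(a):
    # a: iterable of (784, 1) inputs; returns stacked outputs
    output = []
    for data in a:
        layer_output = data
        for w, b in zip(weights, biases):
            layer_output = sigmoid(np.dot(w, layer_output) + b)
        output.append(layer_output)
    return np.array(output)

batch = np.random.randn(3, 784, 1)   # three examples
print(predict(batch).shape)          # (3, 10, 1)
```

Note the output comes back as (N, 10, 1) here, since each per-example result keeps its trailing column axis; squeeze it if you want (N, 10).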