OK, the other problem: it looks like you're doing a (5,784) dotted with a (784,1) on the first loop iteration, which is fine. (If the training data is batched as (N,784,1), note that np.dot won't broadcast over the batch dimension the way you might expect — np.matmul / the @ operator does, or you could use tensordot.)
But then on the second iteration you try to multiply the (10,5) by the original (784,1) input again...
Instead you need to store the intermediate result in a separate variable and feed that into the next layer.
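To make the shape mismatch concrete — a minimal sketch, assuming layer sizes 784→5→10 (all the names and sizes here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.standard_normal((5, 784))   # first layer weights
w2 = rng.standard_normal((10, 5))    # second layer weights
x = rng.standard_normal((784, 1))    # one input sample

h = np.dot(w1, x)                    # (5,784) . (784,1) -> (5,1): fine
try:
    np.dot(w2, x)                    # (10,5) . (784,1): shapes don't align
except ValueError as e:
    print("shape error:", e)

y = np.dot(w2, h)                    # (10,5) . (5,1) -> (10,1): feed the
print(y.shape)                       # previous layer's output instead
```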
Maybe
def predict(self, a):
    layer_output = a
    for w, b in zip(self.weights, self.biases):
        layer_output = self.activation(
            np.dot(w, layer_output) + b)
    return layer_output
Which I think should give an output sized (N,10,1) if a is sized (N,784,1) — though for that batched case you'd want np.matmul (or @) in place of np.dot, since np.dot treats stacked arrays differently.
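Quick check on the broadcasting point — np.dot and np.matmul disagree on stacked inputs (the shapes here are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((5, 784))
batch = rng.standard_normal((3, 784, 1))  # N=3 samples

# matmul broadcasts over the leading N axis: per-sample w @ x
print(np.matmul(w, batch).shape)  # (3, 5, 1)

# dot also contracts the 784 axes, but stacks results differently,
# leaving N in the middle rather than out front
print(np.dot(w, batch).shape)     # (5, 3, 1)
```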
Or perhaps
def predict(self, a):
    output = []
    for data in a:
        layer_output = data
        for w, b in zip(self.weights, self.biases):
            layer_output = self.activation(
                np.dot(w, layer_output) + b)
        output.append(layer_output)
    return np.array(output)
in case broadcasting doesn't work (note output has to start as a plain Python list — np.array([]) has no append). But broadcasting should work, and it will be loads faster than a per-sample loop.
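For the record, here's the broadcasting version spelled out end to end — a sketch assuming a tiny 784→5→10 network with a sigmoid activation (the class name, sizes, and activation are made up for the demo):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Net:
    def __init__(self, sizes, rng):
        # weights[i] has shape (sizes[i+1], sizes[i]); biases are column vectors
        self.weights = [rng.standard_normal((m, n))
                        for n, m in zip(sizes[:-1], sizes[1:])]
        self.biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]
        self.activation = sigmoid

    def predict(self, a):
        # a: (N, 784, 1); matmul broadcasts (m, n) @ (N, n, 1) -> (N, m, 1),
        # and b of shape (m, 1) broadcasts across the batch axis too
        layer_output = a
        for w, b in zip(self.weights, self.biases):
            layer_output = self.activation(w @ layer_output + b)
        return layer_output

rng = np.random.default_rng(0)
net = Net([784, 5, 10], rng)
batch = rng.standard_normal((32, 784, 1))
print(net.predict(batch).shape)  # (32, 10, 1)
```

No per-sample Python loop at all: the whole batch goes through each layer in one vectorized call.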