Have .csv file's URL

DaveGadgeteer

(Noob) I have a URL for an internet of things data log of my rainfall. If I open it in a browser, it displays like a 3-column spreadsheet.

I'd like to get the data into an array so I can process a summary into another array, which I would then plot.

The URL examples I've found so far don't seem to relate to reading data.

Can someone point me in a useful direction? Perhaps there's an example that does something similar?

Dave

ccc

import csv
with open(filename, newline='') as in_file:
    for row in csv.reader(in_file):
        print(', '.join(row))

Like in https://github.com/cclauss/Ten-lines-or-less/blob/master/world_bank_data.py

DaveGadgeteer

Thanks!
I'm still not getting the URL connected to the file properly. I don't have a zip file to deal with, so I can't follow the linked example directly. The following gives the error "urlopen not defined", though request.py illustrates it explicitly, and changing requests to request doesn't help:

import csv, io, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'


with urlopen(url, data=None) as in_file:
    for row in csv.reader(in_file):
        print(', '.join(row))

DaveGadgeteer

Trying a different variation

import csv, io, requests, urllib.request

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

in_file=urllib.request.urlopen(url, data=None)
for row in csv.reader(in_file):
        print(', '.join(row))

I got the error:
line 9, in <module>
for row in csv.reader(in_file):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

JonB

http://stackoverflow.com/questions/18897029/read-csv-file-from-url-into-python-3-x-csv-error-iterator-should-return-str
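
The error above arises because urlopen returns a stream of bytes, while csv.reader expects strings. Here is a minimal sketch of one way to decode the stream, using io.BytesIO as a stand-in for the urlopen response so it runs without a network connection. The sample data layout and the UTF-8 encoding are assumptions.

```python
import csv
import io

# Stand-in for urllib.request.urlopen(url): a binary stream of CSV data.
# (Assumed sample data; the real feed may differ.)
response = io.BytesIO(
    b"time,tips,timestamp\r\n"
    b"1952,773,2017-04-30T20:42:59.017Z\r\n"
)

# Wrap the byte stream so csv.reader receives decoded text lines.
text_stream = io.TextIOWrapper(response, encoding="utf-8", newline="")
rows = list(csv.reader(text_stream))

for row in rows:
    print(", ".join(row))
```

With a real download, the same wrapper should work on the object returned by urllib.request.urlopen(url), since that is also a binary stream.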

ccc

Always easier with requests...

import csv, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
filename = url.split('/')[-1]
with open(filename, 'wb') as out_file:
    out_file.write(requests.get(url).content)

# Open the file in text mode ('r'); with 'rb', csv.reader raises
# _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
with open(filename, 'r', newline='') as in_file:
    for row in csv.reader(in_file):
        print(', '.join(row))

ccc

Or if you want to do it all in RAM...

#!/usr/bin/env python3

import csv, io, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
with io.StringIO(requests.get(url).text) as mem_file:
    for row in csv.reader(mem_file):
        print(', '.join(row))

omz

@ccc I just learned that csv.reader accepts any iterable of strings, and not just file-like objects, so you could make this slightly shorter:

import csv, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
for row in csv.reader(requests.get(url).text.splitlines()):
    print(', '.join(row))

DaveGadgeteer

Thank you all so very much!
I would never have found a solution without your help.
The following does what I wanted, almost. Instead of printing the rows, I need the data in an array. But maybe reader is an array of strings that I can parse.

A harder problem: the SparkFun website is heavily overloaded and often fails with a 503 error. It would be nice to have the software display a message, wait a little, then try again. But maybe the easiest fix is to just run my own server that isn't so busy.

# Download and display csv data from rain gauge
import csv, requests, codecs
from contextlib import closing

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

with closing(requests.get(url, stream=True)) as r:
    # iter_lines() yields bytes, so decode each line before csv.reader sees it
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'utf-8'))
    for row in reader:
        print(row)
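
On getting the data into an array: csv.reader yields each row as a list of strings, so passing it to list() collects every row, and a column can then be converted to numbers. Here is a rough sketch with inline sample text in place of the download. The header names and the sample rows are assumptions about the feed's layout.

```python
import csv

# Sample text standing in for requests.get(url).text (assumed layout).
text = (
    "time,tips,timestamp\n"
    "1952,773,2017-04-30T20:42:59.017Z\n"
    "1953,774,2017-04-30T20:52:59.017Z\n"
)

rows = list(csv.reader(text.splitlines()))  # a list of lists of strings
header, records = rows[0], rows[1:]

# Pull one column out as integers, ready for summarizing or plotting.
tips_column = header.index("tips")
tips = [int(record[tips_column]) for record in records]

print(header)
print(tips)
```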

omz

Here's an idea how you could deal with the server errors, and also a starting point for parsing and plotting your data.

import csv
import requests
import time
import dateutil.parser  # the parser submodule must be imported explicitly
import matplotlib.pyplot as plt

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

r = requests.get(url)
retry_count = 0
while r.status_code != 200 and retry_count < 10:
	print('status code %i, retrying...' % r.status_code)
	retry_count += 1
	time.sleep(2)
	r = requests.get(url)

if r.status_code == 200:
	dates = []
	tips = []
	lines = r.text.splitlines()[1:] # Strip header line
	for row in csv.reader(lines):
		dates.append(dateutil.parser.parse(row[2]))
		tips.append(int(row[1]))

	plt.plot_date(dates, tips, fmt='-')
	plt.show()
else:
	print('Failed to load data')

ccc

@DaveGadgeteer Perhaps the best approach would be to separate the data download from the data parsing as done in https://forum.omz-software.com/topic/4011/have-csv-file-s-url/6. That way, you do not have to hit the URL resource/server so often.
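
One way to avoid repeated downloads is to fetch the file only when no local copy exists. This is a minimal sketch: load_cached and fake_download are hypothetical names made up for illustration, and the fake downloader stands in for something like lambda: requests.get(url).text so the example runs offline.

```python
import os
import tempfile


def load_cached(filename, download):
    """Return the text of filename, calling download() only if the file is missing."""
    if not os.path.exists(filename):
        with open(filename, "w", newline="") as out_file:
            out_file.write(download())
    with open(filename, "r", newline="") as in_file:
        return in_file.read()


# Fake downloader for the demonstration; it records how often it is called.
calls = []


def fake_download():
    calls.append(1)
    return "time,tips,timestamp\n1952,773,2017-04-30T20:42:59.017Z\n"


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "rain.csv")
    first = load_cached(path, fake_download)   # downloads and saves
    second = load_cached(path, fake_download)  # reads the saved copy

print(len(calls))  # number of times the downloader actually ran
```

Deleting the local file forces a fresh download on the next call.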

You can then read the local file and convert it into a list of weather_readings ...

import collections, csv

filename = 'YGa69ObX6WFj9mYa4EmW.csv'

with open(filename, 'r') as in_file:
    data = []  # a list of weather_readings
    weather_reading = None
    for row in csv.reader(in_file):
        if weather_reading:
            data.append(weather_reading(*row))
        else:  # create a custom datatype from the header record
            weather_reading = collections.namedtuple('weather_reading', row)

print('\n'.join(str(x) for x in data))
# weather_reading(time='1952', tips='773', timestamp='2017-04-30T20:42:59.017Z')
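
The fields above are all strings. If numbers and dates are needed for plotting, each row can be converted as it is read. This is a sketch under assumptions: parse_reading and WeatherReading are hypothetical names, the time and tips columns are assumed to be integers, and the timestamp is assumed to always have the format shown in the sample output.

```python
import collections
import datetime

WeatherReading = collections.namedtuple("WeatherReading", "time tips timestamp")


def parse_reading(row):
    """Convert one row of strings into a WeatherReading with typed fields."""
    time_str, tips_str, timestamp_str = row
    # The trailing 'Z' is matched literally; the result is a naive datetime.
    stamp = datetime.datetime.strptime(timestamp_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    return WeatherReading(int(time_str), int(tips_str), stamp)


reading = parse_reading(["1952", "773", "2017-04-30T20:42:59.017Z"])
print(reading)
```

The header row would need to be skipped before calling parse_reading, since its values are not numeric.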

JonB

Another possibility is numpy.genfromtxt and numpy.recfromcsv, both of which either guess field types and autoconvert, or else let you specify column types. If you are doing any sort of analysis on the data, you want it in a format that numpy can use.

ccc

import numpy
filename = 'YGa69ObX6WFj9mYa4EmW.csv'
a = numpy.recfromcsv(filename)
print(a[0])
print(a.dtype)

Works nicely but the third field is bytes instead of datetime64[ms].