Have .csv file's URL
(Noob) I have a URL for an internet of things data log of my rainfall. If I open it in a browser, it displays like a 3-column spreadsheet.
I'd like to get the data into an array so I can process a summary into another array, which I would then plot.
The URL examples I've found so far don't seem to relate to reading data.
Can someone point me in a useful direction? Perhaps there's an example that does something similar?
```
import csv

with open(filename, newline='') as in_file:
    for row in csv.reader(in_file):
        print(', '.join(row))
```
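(One simple way to turn the URL into a local file for the snippet above would be `urllib.request.urlretrieve`; the local filename below is just an arbitrary example.)

```
import urllib.request

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
filename = 'rain_log.csv'  # arbitrary local name

# Fetch the CSV once and save it locally, then read it with csv.reader as above.
urllib.request.urlretrieve(url, filename)
```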
I still am not getting the URL connected to the file properly. In the following, I get the error `urlopen not defined`, even though request.py shows it explicitly, and changing `requests` to `request` doesn't help. (I don't have a zip file to deal with, so I can't follow the linked example directly.)
```
import csv, io, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
with urlopen(url, data=None) as in_file:
    for row in csv.reader(in_file):
        print(', '.join(row))
```
Trying a different variation:
```
import csv, io, requests, urllib.request

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
in_file = urllib.request.urlopen(url, data=None)
for row in csv.reader(in_file):
    print(', '.join(row))
```
I got the error:

```
line 9, in <module>
    for row in csv.reader(in_file):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
```
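The error means `csv.reader` is being handed raw bytes. One way to keep the urllib approach would be to wrap the response so it yields decoded text, e.g. with `io.TextIOWrapper` (a sketch, assuming the feed is UTF-8):

```
import csv, io, urllib.request

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

# urlopen() returns a binary stream; TextIOWrapper decodes it on the fly
# so csv.reader sees strings instead of bytes.
with urllib.request.urlopen(url) as raw:
    for row in csv.reader(io.TextIOWrapper(raw, encoding='utf-8')):
        print(', '.join(row))
```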
Always easier with requests...
```
import csv, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
filename = url.split('/')[-1]

# Download the file to local storage...
with open(filename, 'wb') as out_file:
    out_file.write(requests.get(url).content)

# ...then read it back as text.  Opening in 'rb' mode raises:
# _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
# so open with 'r' instead.
with open(filename, 'r') as in_file:
    for row in csv.reader(in_file):
        print(', '.join(row))
```
Or if you want to do it all in RAM...
```
#!/usr/bin/env python3

import csv, io, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
with io.StringIO(requests.get(url).text) as mem_file:
    for row in csv.reader(mem_file):
        print(', '.join(row))
```
@ccc I just learned that `csv.reader` accepts any iterator, not just file-like objects, so you could make this slightly shorter:
```
import csv, requests

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
for row in csv.reader(requests.get(url).text.splitlines()):
    print(', '.join(row))
```
Thank you all so very much!
I would never have found a solution without your help.
The following does what I wanted, almost. Instead of printing the rows, I need the data in an array. But maybe reader is an array of strings that I can parse.
More difficult to solve, the sparkfun website is very overloaded and often returns a 503 error and fails. It would be nice to have the software display a message, wait a little, then try again. But maybe the easiest fix is to just run my own server that isn't so busy.
```
# Download and display csv data from rain gauge
import csv, codecs, requests
from contextlib import closing

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'utf-8'))
    for row in reader:
        print(row)
```
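On the array question: `csv.reader` just yields each row as a list of strings, so you can collect them all in one go. A small sketch along the lines of the snippet above:

```
import csv, codecs, requests
from contextlib import closing

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
with closing(requests.get(url, stream=True)) as r:
    # list() drains the reader into a list of rows; each row is a list of strings.
    rows = list(csv.reader(codecs.iterdecode(r.iter_lines(), 'utf-8')))

header, records = rows[0], rows[1:]
print(header)
print(records[:3])  # first few data rows, still as strings to be parsed
```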
Here's an idea how you could deal with the server errors, and also a starting point for parsing and plotting your data.
```
import csv
import requests
import time
import dateutil.parser
import matplotlib.pyplot as plt

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

r = requests.get(url)
retry_count = 0
while r.status_code != 200 and retry_count < 10:
    print('status code %i, retrying...' % r.status_code)
    retry_count += 1
    time.sleep(2)
    r = requests.get(url)

if r.status_code == 200:
    dates = []
    tips = []
    lines = r.text.splitlines()[1:]  # Strip header line
    for row in csv.reader(lines):
        # Columns are time, tips, timestamp; plot tips against the timestamp.
        dates.append(dateutil.parser.parse(row[-1]))
        tips.append(int(row[1]))
    plt.plot_date(dates, tips, fmt='-')
    plt.show()
else:
    print('Failed to load data')
```
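Alternatively, requests can do the retrying for you via urllib3's `Retry` helper; a rough sketch (the retry count and backoff are just example values):

```
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

# Retry up to 5 times on 503 responses, backing off between attempts.
session = requests.Session()
session.mount('http://', HTTPAdapter(
    max_retries=Retry(total=5, backoff_factor=1, status_forcelist=[503])))

r = session.get(url)
print(r.status_code)
```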
@DaveGadgeteer Perhaps the best approach would be to separate the data download from the data parsing as done in https://forum.omz-software.com/topic/4011/have-csv-file-s-url/6. That way, you do not have to hit the URL resource/server so often.
You can then read the local file and convert it into a list of weather_readings...
```
import collections, csv

filename = 'YGa69ObX6WFj9mYa4EmW.csv'
with open(filename, 'r') as in_file:
    data = []  # a list of weather_readings
    weather_reading = None
    for row in csv.reader(in_file):
        if weather_reading:
            data.append(weather_reading(*row))
        else:
            # create a custom datatype from the header record
            weather_reading = collections.namedtuple('weather_reading', row)
print('\n'.join(str(x) for x in data))
# weather_reading(time='1952', tips='773', timestamp='2017-04-30T20:42:59.017Z')
```
Another possibility is numpy.genfromtxt and numpy.recfromcsv, both of which either guess field types and auto-convert, or let you specify column types. If you are doing any sort of analysis on the data, you want it in a format that numpy can use.
```
import numpy

filename = 'YGa69ObX6WFj9mYa4EmW.csv'
a = numpy.recfromcsv(filename)
print(a)
print(a.dtype)
```
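For comparison, a numpy.genfromtxt version; `dtype=None` lets it guess the column types and `names=True` takes the field names from the header row (the encoding argument is an assumption here):

```
import numpy

filename = 'YGa69ObX6WFj9mYa4EmW.csv'
# Guess field types per column and keep text fields as strings.
a = numpy.genfromtxt(filename, delimiter=',', names=True, dtype=None, encoding='utf-8')
print(a)
print(a.dtype)
```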
Works nicely but the third field is