Getting data from X-Sources to Google’s Colab
We will explore various options for reading data into a Colab notebook
What is Google Colab?
Google Colab is a free, cloud-based programming environment built around Jupyter-style notebooks. It has recently gained much popularity among developers, especially data enthusiasts.
Firing up a Colab notebook
In today’s blog, we will see how to access and load your personal data into Colab for some interesting research projects.
Let’s fill in the X in the heading: we will see how one can load data from the local file system, Google Drive and Amazon S3.
You can find all the code in this Notebook
File System
# colab provides `files` helper for uploading data from local file system to google colab
from google.colab import files
uploaded = files.upload()
all_data = ''
# `uploaded` is a dict that holds file names as keys and the file contents (bytes) as values
for data_file in uploaded.keys():
    print('Reading file {}'.format(data_file))
    # decode the raw bytes so they can be appended to the `all_data` string
    all_data += uploaded[data_file].decode('utf-8')
    all_data += '\n'
    print('Total length read so far is {}'.format(len(all_data)))
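Since the values in `uploaded` are raw bytes, a tabular file can also be parsed directly in memory without saving it to disk first. The sketch below uses only the standard library (`io`, `csv`); the helper `rows_from_upload` and the file name `data.csv` are hypothetical, and the hand-built dict merely stands in for what `files.upload()` would return so the example runs outside Colab.

```python
import csv
import io


def rows_from_upload(uploaded, file_name, encoding='utf-8'):
    """Parse one uploaded CSV file into a list of dicts (one per data row).

    `uploaded` is assumed to have the same shape as the dict returned by
    `files.upload()`: file names as keys, raw file contents (bytes) as values.
    """
    text = uploaded[file_name].decode(encoding)
    # csv.DictReader treats the first row as the header
    return list(csv.DictReader(io.StringIO(text)))


# hypothetical stand-in for `uploaded = files.upload()`
uploaded = {'data.csv': b'name,score\nada,90\nlin,85\n'}

rows = rows_from_upload(uploaded, 'data.csv')
for row in rows:
    print(row['name'], row['score'])
```

On Colab, where pandas is preinstalled, `pd.read_csv(io.BytesIO(uploaded['data.csv']))` gives the same data as a DataFrame in one line.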
Google Drive
# colab provides the `drive` helper for mounting Google Drive inside Google Colab
import glob
from google.colab import drive

# mounting drive
# this will require authentication: follow the steps as guided
drive.mount('/content/drive')

# collect every .txt file in the "Colab Notebooks" folder of your Drive
# (on newer Colab versions this folder is exposed as /content/drive/MyDrive, without the space)
data_files = glob.glob("/content/drive/My Drive/Colab Notebooks/*.txt")

all_data = ''
for data_file in data_files:
    print('Reading file {}'.format(data_file))
    with open(data_file, 'r') as f:
        all_data += f.read()
    print('Total length read so far is {}'.format(len(all_data)))
    all_data += '\n'
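Once the drive is mounted, the loop above is plain file I/O, so it can be factored into a reusable function. A minimal sketch, assuming a hypothetical helper name `read_text_files`: it sorts the matches because `glob.glob` returns files in arbitrary order, and it is demonstrated on a temporary directory instead of the mounted `/content/drive` path so it runs anywhere.

```python
import glob
import os
import tempfile


def read_text_files(pattern, encoding='utf-8'):
    """Concatenate the contents of every file matching `pattern`, adding a newline after each.

    Files are read in sorted order because glob.glob() makes no ordering guarantee.
    """
    all_data = ''
    for data_file in sorted(glob.glob(pattern)):
        print('Reading file {}'.format(data_file))
        with open(data_file, 'r', encoding=encoding) as f:
            all_data += f.read()
        all_data += '\n'
    print('Total length read is {}'.format(len(all_data)))
    return all_data


# demo on a throwaway directory; in Colab the pattern would be
# "/content/drive/My Drive/Colab Notebooks/*.txt" after drive.mount()
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, text in [('a.txt', 'first'), ('b.txt', 'second'), ('skip.csv', 'x,y')]:
        with open(os.path.join(tmp_dir, name), 'w', encoding='utf-8') as f:
            f.write(text)
    combined = read_text_files(os.path.join(tmp_dir, '*.txt'))
```

The `.csv` file is skipped because it does not match the `*.txt` pattern.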
S3
import boto3
import botocore

BUCKET_NAME = 'my-bucket'  # replace with your bucket name
KEY = 'image_in_s3.jpg'  # replace with your object key

# boto3 reads AWS credentials from the usual places, e.g. the
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
s3 = boto3.resource('s3')

try:
    # download the JPEG object `image_in_s3.jpg` from S3 into the Colab working directory as `image_in_colab.jpg`
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'image_in_colab.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
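If you have an object's S3 URI (for example `s3://my-bucket/images/image_in_s3.jpg`) rather than separate bucket and key strings, it can be split into the two values the snippet above expects. A minimal sketch using plain string handling; the helper name `split_s3_uri` and the example URI are hypothetical.

```python
def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into (bucket_name, key)."""
    prefix = 's3://'
    if not uri.startswith(prefix):
        raise ValueError('not an S3 URI: {}'.format(uri))
    # everything before the first '/' is the bucket; the rest (slashes included) is the key
    bucket, _, key = uri[len(prefix):].partition('/')
    if not bucket or not key:
        raise ValueError('S3 URI needs both a bucket and a key: {}'.format(uri))
    return bucket, key


BUCKET_NAME, KEY = split_s3_uri('s3://my-bucket/images/image_in_s3.jpg')
print(BUCKET_NAME, KEY)
```

Splitting on the first `/` rather than using `urllib.parse.urlparse` keeps keys that contain `?` or `#` intact, since a URL parser would treat those as a query or fragment.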
Read “Handling large files in Colab” for some other methods of uploading and downloading your data files to and from Colab.
Why use Google Colab?
It never hurts to use free stuff with so much goodness packaged into it. It also makes it a breeze to pair-program and review code in the same place. Last but not least, the processing power it provides (a free K80 GPU) is one to die for.