This tutorial explains how you can use Amazon S3 storage for easy access to datasets.
We'll load data from Amazon S3 directly into a pandas DataFrame.
For this tutorial we'll use a public dataset.
See the AWS Open Data Registry for more information on public datasets.
All you'll need is the name of the bucket:
publicBucket = "covid19-lake" # the bucket reference
Boto3 is AWS's own SDK for programmatic access to S3.
We'll first use Boto3 to connect to S3.
import boto3
from botocore import UNSIGNED # You'll need this to connect as anonymous. You could also pass your access key and secret
from botocore.client import Config
import pandas as pd
We need to instantiate a boto3 client and either pass it our credentials or indicate that we want anonymous access.
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED))
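If you'd rather use your own AWS credentials instead of anonymous access, you can pass them to the client directly. A minimal sketch, where the key values are placeholders (in practice, prefer environment variables or a shared credentials file over hard-coding secrets):
# Sketch: connect with explicit credentials instead of anonymous access
# (placeholder values; never hard-code real secrets)
authenticated_client = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
)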
Let's verify the access control list for this bucket
acl = s3_client.get_bucket_acl(Bucket=publicBucket)
owner = acl["Owner"]
grants = acl["Grants"]
print("Bucket owned by ", owner)
print("Bucket grants:")
for grant in grants:
    grantee = grant["Grantee"]
    permission = grant["Permission"]
    print("Grantee=", grantee, ", Permission=", permission)
Helper function to list the objects in a bucket:
We'll use this function to browse through the bucket contents
def list_bucket_objects(**kwargs):
    response = s3_client.list_objects_v2(**kwargs)
    # the token to pass back in to fetch the next page of results
    continuation_token = response.get("NextContinuationToken")
    # "Contents" is absent when nothing matches, so default to an empty list
    for obj in response.get("Contents", []):
        key = obj.get("Key")
        size = obj.get("Size")
        storageclass = obj.get("StorageClass")
        print("Object found with key=", key, ", size=", size, ", S3 storage class=", storageclass)
    return continuation_token
Check what's inside the bucket:
args = dict(Bucket=publicBucket, MaxKeys=10)
continuation = list_bucket_objects(**args)
The number of objects is huge, so let's browse 10 objects at a time.
If you keep re-running the next code block, the listing will advance one page of 10 objects at a time.
args["ContinuationToken"] = continuation
continuation = list_bucket_objects(**args)
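Passing continuation tokens around by hand works, but boto3 also ships a built-in paginator that handles this loop for you. A minimal sketch of the same listing using get_paginator (the PaginationConfig values are just illustrative):
# Sketch: let boto3 manage continuation tokens via its built-in paginator
paginator = s3_client.get_paginator("list_objects_v2")
pages = paginator.paginate(Bucket=publicBucket, PaginationConfig={"MaxItems": 20, "PageSize": 10})
for page in pages:
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])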
Of course, it makes no sense to page through tens of thousands of files 10 at a time. We can do this a bit more efficiently.
Let's add a prefix.
Prefixes give us a way of pre-filtering the objects in the bucket.
For more information on this public dataset, see the covid19-lake datasets.
args = dict(Bucket=publicBucket, MaxKeys=50, Prefix='static-datasets')
list_bucket_objects(**args)
args = dict(Bucket=publicBucket, MaxKeys=50, Prefix='rearc-covid-19')
list_bucket_objects(**args)
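Combining a Prefix with a Delimiter lets you browse the bucket like a directory tree: S3 groups everything below the delimiter into CommonPrefixes. A minimal sketch listing the top-level "folders" of the bucket:
# Sketch: list the top-level "folders" in the bucket using a delimiter
response = s3_client.list_objects_v2(Bucket=publicBucket, Delimiter="/")
for prefix in response.get("CommonPrefixes", []):
    print(prefix["Prefix"])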
Let's load one of these files into a pandas DataFrame.
obj = s3_client.get_object(Bucket=publicBucket, Key="rearc-covid-19-testing-data/csv/us_daily/us_daily.csv")
df = pd.read_csv(obj.get("Body"))
display(df)
OK, we can clearly see that this file contains daily updated counts of positive, negative, and pending tests, as well as hospitalized, recovered, and deceased patients.
A simpler way to load data from S3 is the s3fs library.
Internally, pandas replaced boto3 with s3fs for loading data from S3.
# import libraries
import os
from s3fs.core import S3FileSystem
Just connect to S3 and treat it as a file system
s3 = S3FileSystem(anon=True)
List files:
s3.ls(path=publicBucket, detail=False)
path = '/covid19-lake/rearc-covid-19-testing-data'
s3.ls(path=path, detail=False)
Disk usage:
s3.disk_usage(path=path, total=False)
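Because s3fs exposes S3 as a file system, familiar operations like glob also work for finding files by pattern. A minimal sketch (the pattern is just an example):
# Sketch: find all CSV files under the testing-data prefix
csv_files = s3.glob('covid19-lake/rearc-covid-19-testing-data/csv/**/*.csv')
print(csv_files)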
Read a comma-delimited file directly into a pandas DataFrame:
file = 'covid19-lake/rearc-covid-19-testing-data/csv/us_daily/us_daily.csv'
df = pd.read_csv(s3.open(file, mode='rb'))
display(df)
Ok, same result. Looks good.
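Because pandas uses s3fs under the hood, recent pandas versions (1.2 and up) can even read the S3 URL directly, without opening the file yourself; storage_options is passed straight through to s3fs:
# Sketch: let pandas open the S3 object itself (assumes pandas >= 1.2)
df = pd.read_csv('s3://' + file, storage_options={'anon': True})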
We'll use Altair to display the data
import altair as alt
# by default altair will only render a visualization if the number of records <=5000
# if you want to disable that behaviour uncomment next line
# alt.data_transformers.disable_max_rows()
First, check the datatypes of the DataFrame:
df.dtypes
We'll convert the int64 date column to an actual datetime:
df["timestamp"] = pd.to_datetime(df["date"], format="%Y%m%d")
Set the index of the DataFrame to a datetime index:
df.set_index("timestamp", drop=False, inplace=True)
display(df)
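With a datetime index in place, time-based operations become straightforward. As a minimal sketch, here is how you could resample the daily counts to weekly averages (the column selection is just illustrative):
# Sketch: resample daily counts to weekly averages via the datetime index
weekly = df[['positive', 'negative', 'death', 'hospitalized']].resample('W').mean()
display(weekly)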
So let's plot the data on a timeline.
Let's look at the trends in positive/negative tests and in recovered, deceased, and hospitalized patients for COVID-19 in the USA.
source = df
base = alt.Chart(source).properties(width=1200, height=400).transform_fold(
    ['recovered', 'death', 'hospitalized', 'positive'], as_=['category', 'counts'])
area = base.mark_area(opacity=0.5).encode(
    alt.X('timestamp:T'),
    alt.Y('counts:Q', stack=None),
    alt.Color('category:N')
)
mark = base.mark_point(color='red', shape='circle').encode(x='timestamp:T', y='positive', tooltip=['timestamp', 'positive', 'negative', 'recovered', 'death','hospitalized'])
display(area+mark)
Let's also plot the daily increase numbers.
source = df
base = alt.Chart(source).properties(width=1200, height=400).transform_fold(
    ['deathIncrease', 'hospitalizedIncrease', 'positiveIncrease'], as_=['category', 'growth'])
area = base.mark_area(opacity=0.5).encode(
    alt.X('timestamp:T'),
    alt.Y('growth:Q', stack=None),
    alt.Color('category:N')
)
mark = base.mark_point(color='red', shape='circle').encode(x='timestamp:T', y='positiveIncrease', tooltip=['timestamp', 'deathIncrease', 'hospitalizedIncrease', 'positiveIncrease'])
display(area+mark)
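If you want to keep a visualization outside the notebook, an Altair chart can be saved as a standalone HTML file (the filename is just an example):
# Sketch: save the combined chart as a self-contained HTML file
(area + mark).save('covid_increase_chart.html')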