---
title: README
emoji: π
colorFrom: yellow
colorTo: yellow
sdk: static
pinned: false
---
BigBanyanTree is an initiative to empower engineering colleges to set up their own data engineering clusters and drive interest in data processing and analysis using tools such as Apache Spark.
As part of that initiative, we have open-sourced datasets processed from CommonCrawl data.
The datasets are offered as two subsets with the following columns:

- `script_extraction`: ["ip", "host", "server", "script_src_attrs", "year"]
- `ipmaxmind`: ["ip", "host", "server", "postal_code", "latitude", "longitude", "accuracy_radius", "continent_code", "continent_name", "country_iso_code", "subdivision_code", "city_name", "metro_code", "time_zone", "year"]
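Both subsets share the `ip`, `host`, `server`, and `year` columns, which is what would make it possible to combine them. The sketch below is a minimal, standard-library-only illustration, assuming you want to check records against the column lists above before processing them. The constant names, the helper functions `shared_columns` and `missing_columns`, and the sample record (with made-up values) are hypothetical and not part of any published tooling for these datasets.

```python
# Column lists for the two subsets, copied from the README.
# The constant names are hypothetical, chosen for this example.
SCRIPT_EXTRACTION_COLUMNS = ["ip", "host", "server", "script_src_attrs", "year"]
IPMAXMIND_COLUMNS = [
    "ip", "host", "server", "postal_code", "latitude", "longitude",
    "accuracy_radius", "continent_code", "continent_name", "country_iso_code",
    "subdivision_code", "city_name", "metro_code", "time_zone", "year",
]


def shared_columns(a, b):
    """Return the columns present in both lists, in the order they appear in `a`."""
    b_set = set(b)
    return [col for col in a if col in b_set]


def missing_columns(record, expected):
    """Return the expected column names that are absent from `record` (a dict)."""
    return [col for col in expected if col not in record]


if __name__ == "__main__":
    # Columns common to both subsets: candidate keys for combining them.
    print(shared_columns(SCRIPT_EXTRACTION_COLUMNS, IPMAXMIND_COLUMNS))
    # Hypothetical record with made-up values, for illustration only.
    row = {"ip": "192.0.2.1", "host": "example.com", "server": "nginx", "year": 2024}
    print(missing_columns(row, SCRIPT_EXTRACTION_COLUMNS))
```

Running it prints `['ip', 'host', 'server', 'year']` followed by `['script_src_attrs']`. In a Spark job, the same shared columns would be the natural candidates for a join key, though which key suits a given analysis is up to the user.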