Why log in to get public data?
The Brazilian government is making it difficult to download data (requiring logins or captchas, or serving it from slow servers), and yet they say it is a MUST to check this data before trading with farms. So why would you not let people download the data if they have no choice but to use it?
Try Playwright to grab the data. Do us a favor and throw it into the Internet Archive if you grab it.

https://news.ycombinator.com/item?id=39442740

Thanks for pointing it out, I will try it when needed. I could put it on the Internet Archive; the problem is that it is updated on a daily basis, so whatever we upload there would more likely be useful for temporal analysis than for real use. The other problem is that some datasets are fairly big, between 1 GB and 300 GB.
To stop excessive use by bots. You don't want everyone just scripting and pulling data at insane rates. Also to validate that you are in the country: it's paid for by the government, so you don't want the whole world leeching off it.
I understand, but the problem is that we need bots, or someone will have to do it manually, and believe me, we deal with a lot of layers, daily.

There are other methods to limit bad behavior, like setting usage limits and returning a 429 Too Many Requests to clients that do not respect them.
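To make the usage-limit idea concrete, here is a minimal sketch of a per-client token bucket that answers with 429 and a Retry-After header once a client runs out of tokens. The names (TokenBucket, handle_request), the rate, and the capacity are my own placeholders, not anything the government portals actually use.

```python
import math
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        # Refill according to elapsed time, then try to spend one token.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def retry_after(self):
        # Whole seconds until one token is available again.
        missing = max(0.0, 1 - self.tokens)
        return math.ceil(missing / self.rate)


def handle_request(buckets, client_id, rate=1.0, capacity=5, clock=time.monotonic):
    """Return (status, headers) for one request from `client_id` (hypothetical handler)."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate, capacity, clock)
    bucket = buckets[client_id]
    if bucket.allow():
        return 200, {}
    return 429, {"Retry-After": str(bucket.retry_after())}
```

A well-behaved bot just sleeps for the Retry-After value and tries again, so scripted access keeps working without hammering the server.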

To keep other countries from accessing the data, they could issue an API key or create a geofence to block external IPs.
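A rough sketch of what that check could look like, combining both options in one function. Everything here is assumed for illustration: the key is a dummy value, and the allowed network is a documentation-only range standing in for real in-country ranges, which would have to come from a GeoIP dataset.

```python
import ipaddress

# Placeholder values: real keys would live in a database, and real
# in-country ranges would come from a GeoIP dataset.
API_KEYS = {"demo-key-123"}
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range


def check_access(api_key, client_ip):
    """Return an HTTP status: 401 for a bad key, 403 outside the geofence, else 200."""
    if api_key not in API_KEYS:
        return 401
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in net for net in ALLOWED_NETWORKS):
        return 403
    return 200
```

Either check alone would already do the job; with an API key you also get a per-user identity to attach usage limits to.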

There are developer-friendly ways to do all of this, but they seem to be moving toward a pattern where a massive number of real people spend their time downloading the data instead of analyzing it. What a waste!
