Why I'm building this: there are a lot of tools out there for discovering meta and security data relating to a website, IP, or server, but currently there isn't anything that does everything, all in one place, without a paywall or user sign-up.
It's still a WIP, and I'm working on a new version, with some more comprehensive checks, so any feedback would be much appreciated :)
Does this come down to cramming a bunch of domain data into a single presentation and information-gathering tool for websites?
For cases where something cannot be determined, it would be best to say "cannot be determined" rather than "No", because the last thing anyone needs is some PHB giving people grief because, for example, the WAF in use doesn't expose itself to this detector.
I checked out another startup I know of (https://highlight.io) and it listed the same results.
Maybe I’m misinterpreting what this section means?
- Google Safe Browsing
- URLhaus (malware distribution)
- PhishTank (phishing URLs)
- Cloudmersive (website virus check)
Happy hunting I guess? lol, put the links there for ya:
1: https://github.com/Lissy93/web-check/blob/master/api/threats...
2:
- https://safebrowsing.google.com/
- https://urlhaus.abuse.ch/
- https://phishtank.org/
- https://cloudmersive.com/products
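For anyone curious what these lookups actually involve, here's a minimal sketch (TypeScript, assuming Node 18+ for the global fetch) of a Safe Browsing v4 Lookup API call. The environment variable name and client fields are placeholders, not web-check's actual code:

```typescript
// Hypothetical sketch of a Google Safe Browsing v4 Lookup API call.
// GOOGLE_CLOUD_API_KEY and the client fields are placeholder names.
const apiKey = process.env.GOOGLE_CLOUD_API_KEY;

async function checkSafeBrowsing(url: string): Promise<boolean> {
  const res = await fetch(
    `https://safebrowsing.googleapis.com/v4/threatMatches:find?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        client: { clientId: "web-check-demo", clientVersion: "1.0" },
        threatInfo: {
          threatTypes: ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
          platformTypes: ["ANY_PLATFORM"],
          threatEntryTypes: ["URL"],
          threatEntries: [{ url }],
        },
      }),
    },
  );
  const data = await res.json();
  // An empty response object means no known threats; "matches" lists any hits.
  return Array.isArray(data.matches) && data.matches.length > 0;
}

checkSafeBrowsing("https://example.com").then((flagged) =>
  console.log(flagged ? "Listed as unsafe" : "No matches"),
);
```

The other sources are, broadly, similar single-request lookups keyed on the URL or hostname.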
FYI, your account creation page is pushing a special offer that expired 6/2/2024.
We definitely offer evaluation plans & free tiers. If you want to give me a shout over at grant@embolt.app, I can help set you up with an account to try us, and we'll see if we can help!
Ended up cloning the project to see for myself what URL it uses... turns out that the Google API was returning a JSON document with instructions to enable the PageSpeed Insights API! I'd never used Google Cloud before, so I had been a bit clueless until that point :-)
My suggestion is that the "Show Error" button show the actual output of the API calls, because otherwise this very useful JSON from Google gets lost in translation.
Now that I've checked the code, it's clear that there are actually two things to enable that are accessed with the API key:
* PageSpeed Insights API: https://console.cloud.google.com/apis/library/pagespeedonlin...
* Safe Browsing API: https://console.cloud.google.com/apis/api/safebrowsing.googl...
So I'd suggest adding this info to either or both of the README and the app itself.
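To illustrate the suggestion, here's a rough sketch of the kind of PageSpeed Insights v5 call involved, with Google's raw error body surfaced rather than swallowed; the variable names are illustrative, not web-check's actual implementation:

```typescript
// Illustrative sketch: call PageSpeed Insights v5 and surface Google's raw
// error JSON instead of a generic failure message. Not web-check's actual code.
const key = process.env.GOOGLE_CLOUD_API_KEY; // placeholder variable name

async function runPageSpeed(url: string) {
  const endpoint =
    "https://www.googleapis.com/pagespeedonline/v5/runPagespeed" +
    `?url=${encodeURIComponent(url)}&key=${key}`;
  const res = await fetch(endpoint);
  const body = await res.json();
  if (!res.ok) {
    // Google's error payload typically explains exactly which API to enable in
    // the Cloud console -- exactly the JSON worth showing behind "Show Error".
    throw new Error(JSON.stringify(body.error ?? body, null, 2));
  }
  return body.lighthouseResult;
}

runPageSpeed("https://example.com")
  .then((result) => console.log(result?.categories?.performance?.score))
  .catch((err) => console.error(err.message));
```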
Otherwise, a very very cool project! I've been checking several of my sites for the last hour.
Looks like a super promising project! Thanks for building and sharing.
I've been working on a project [1] that probably wants to become a live crawler like this, but it's currently batch based. I'm focused on RSS feeds and microformats [2]. Can you share any details on what kind of performance / operational costs you're seeing while you're on the HN front page? The fly.toml looks like $5/month could suffice?
Over what length of time? Even if it's just over an hour, that's just under 30 rps; over a day it's a little over 1 rps.
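For a back-of-the-envelope check (assuming the figure under discussion is on the order of 100,000 requests, which is what the estimates above imply):

```typescript
// Back-of-the-envelope rates, assuming roughly 100,000 requests -- the figure
// implied by the "just under 30 rps" and "a little over 1 rps" estimates.
const requests = 100_000;
console.log(`over one hour: ${(requests / 3_600).toFixed(1)} rps`);  // ~27.8
console.log(`over one day:  ${(requests / 86_400).toFixed(1)} rps`); // ~1.2
```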
Great site btw
It shows my DNSSEC as not present, even though https://dnssec-analyzer.verisignlabs.com/ (which it links to) shows all green for my test site.
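One lightweight way to cross-check this kind of result is to ask a validating resolver over DNS-over-HTTPS for the domain's DS records and look at the AD (authenticated data) flag; this sketch uses dns.google's JSON API purely for illustration:

```typescript
// Sketch: ask a validating DNS-over-HTTPS resolver for DS records and check
// the AD (authenticated data) flag. Uses dns.google's JSON API as an example.
async function dnssecLooksEnabled(domain: string): Promise<boolean> {
  const res = await fetch(
    `https://dns.google/resolve?name=${encodeURIComponent(domain)}&type=DS`,
  );
  const data = await res.json();
  const hasDs = Array.isArray(data.Answer) && data.Answer.length > 0;
  // AD === true means the resolver validated the DNSSEC chain of trust.
  return hasDs && data.AD === true;
}

dnssecLooksEnabled("example.com").then(console.log);
```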
The DNS Records panel seems a bit broken: it shows my SPF record as the NS ("NS v=spf1 mx -all").
The Server Records panel has a "ports" entry, but it only shows the first open port (for me, 22).
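A sketch of how checking a handful of common ports (rather than stopping at the first hit) could look; the port list and timeout are arbitrary choices, not what web-check does:

```typescript
import net from "node:net";

// Sketch: probe a handful of common ports instead of stopping at the first
// open one. Port list and timeout are arbitrary choices for illustration.
function isOpen(host: string, port: number, timeoutMs = 2000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => { socket.destroy(); resolve(true); });
    socket.once("timeout", () => { socket.destroy(); resolve(false); });
    socket.once("error", () => resolve(false));
  });
}

const ports = [21, 22, 25, 80, 443, 993, 3306, 8080];
Promise.all(ports.map((p) => isOpen("example.com", p))).then((open) =>
  console.log("open ports:", ports.filter((_, i) => open[i])),
);
```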
When showing Response Time, it's pretty critical to show where you requested it from. Since you're showing the "location" of the server, you could even subtract out/show what part of the response time is due to distance latency (or ping the server and use the RTT).
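A rough sketch of that idea: use the TCP connect time as a proxy for one round trip and compare it against the full fetch time. The helper names are made up, and this ignores DNS and TLS setup, so it's only an approximation:

```typescript
import net from "node:net";

// Rough sketch: use TCP connect time as a proxy for one round trip, then
// compare it against the full fetch time. Ignores DNS and TLS setup, so it is
// only an approximation; names here are made up for illustration.
function tcpRttMs(host: string, port = 443): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    const socket = net.connect({ host, port }, () => {
      socket.destroy();
      resolve(performance.now() - start);
    });
    socket.once("error", reject);
  });
}

async function timings(url: string) {
  const { hostname } = new URL(url);
  const rtt = await tcpRttMs(hostname);
  const start = performance.now();
  await fetch(url);
  const total = performance.now() - start;
  // total - rtt ~= the part of the response time not explained by distance.
  console.log({ rttMs: Math.round(rtt), totalMs: Math.round(total) });
}

timings("https://example.com");
```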
It'd be useful to show things like what protocol is used (http, h2, h3), what cipher was used, etc.
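Node exposes most of this directly on a TLS socket; a small sketch (HTTP/3 would need a separate QUIC client and isn't covered here):

```typescript
import tls from "node:tls";

// Sketch: report the negotiated ALPN protocol, TLS version, and cipher.
// (HTTP/3 would require a QUIC client and isn't covered here.)
const socket = tls.connect(
  {
    host: "example.com",
    port: 443,
    servername: "example.com",
    ALPNProtocols: ["h2", "http/1.1"],
  },
  () => {
    console.log("ALPN protocol:", socket.alpnProtocol);  // e.g. "h2"
    console.log("TLS version:  ", socket.getProtocol()); // e.g. "TLSv1.3"
    console.log("Cipher:       ", socket.getCipher().name);
    socket.end();
  },
);
```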
The Global Ranking chart should perhaps be inverted? Currently it goes down as the site becomes more popular.
TLS Security Issues & TLS Cipher Suites just send undefined to the tls-observatory site (https://tls-observatory.services.mozilla.com/api/v1/results?...).
HSTS without subdomains shows as "No"; there should probably be different levels for "none", "without subdomains", "without preload", "with preload", and "in the preload list".
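A sketch of grading the header along those lines; note that actual preload-list membership would need a separate lookup (e.g. against hstspreload.org), which is only hinted at here:

```typescript
// Sketch: grade a Strict-Transport-Security header into the levels suggested
// above. Actual preload-list membership would need a separate check (e.g.
// against hstspreload.org) and is not performed here.
function hstsLevel(header: string | null): string {
  if (!header) return "none";
  const value = header.toLowerCase();
  const subdomains = value.includes("includesubdomains");
  const preload = value.includes("preload");
  if (!subdomains) return "enabled, without subdomains";
  if (!preload) return "with subdomains, without preload";
  return "with subdomains and preload directive (list membership unchecked)";
}

fetch("https://example.com").then((res) =>
  console.log(hstsLevel(res.headers.get("strict-transport-security"))),
);
```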
For my site it shows a "root authority" under "Site Features". Okay, that's new to me; let's see what that means. The full explanation is: "Checks which core features are present on a site." That's like answering "water" when someone asks "what's water?"
The use cases section of the info is similarly useless, and additionally hyperbolic in most instances, such as: "DNSSEC information provides insight into an organization's level of cybersecurity maturity and potential vulnerabilities". If DNSSEC for one domain can tell me about the overall security maturity of an organisation as well as reveal potential vulnerabilities, please enlighten me, because that'd be very useful for red-teaming assignments.
The thing detects "January 1st 2008" as the page's content type, which makes no sense (I checked with curl; it's indeed incorrect).
Server location is "undefined" at the top of the page (first impression; the section with the map), but later, in the server info section, it guesses a random city in the right country.
It reports page energy consumption in KWg. Kelvin×Watt×grams? Is this a typo for kWh? One kWh is about as much energy as 50 smartphone batteries can hold, as if a page (as measured by its size in bytes) would ever use that amount of energy. You can download many 4K movies on one smartphone charge (even accounting for the power consumption of routers), so surely that's not the unit being used to judge HTML weight?
The raw JSON results, where I was hoping the fields might have clearer (technical) labels than the page, remain blank when I try to open them.
Overall, I'm not sure what the intended use of this site is. It presents random pieces of information with misleading contextualisation and no technical explanation; some values are incorrect and many sections simply don't work (failing to load or showing error values like "undefined"). Maybe tackle it in sections: rethink what the actual goal is, and once you've identified one, write that goal into the "use cases" section and implement it, then write in the "what is this" section what the site is actually checking for; repeat for the next useful piece of information you can come up with, and so on.
The energy consumption metric (KWg) should be more clearly defined, with some context, as it's not even remotely standardized or even commonly used--it took some effort to track down what it's actually measuring. According to another site[1] dedicated to sustainability, "KWg" is "kilowatts consumed per gigabyte" (presumably per gigabyte transferred), so it should probably be marked as "kWGB", if it's going to exist at all.
The data seems to be drawn from the Website Carbon Calculator API, which states that "If your hosting provider has not registered the IP address of your server with The Green Web Foundation, we will not be able to detect it."[2] I visited the Green Web Foundation's website[3], which appears to provide the exact same services and data as the Website Carbon Calculator--an ironically wasteful arrangement: I'm making requests to three separate endpoints just to get an apparently arbitrary number back? I ran the test on my website, and it correctly identified my host, but strangely it did not offer any kind of quantitative values, instead giving just a binary "Green" or "Not Green" determination and badge. It did at least provide some additional context, in the form of OVHCloud's Universal Registration Document[4] from FY2023, which includes a chapter on sustainability efforts. While that was far more helpful than anything else this exercise had revealed up to that point, it notably did not provide any "kWGB" measurements, or any other site-specific energy consumption data I could find that would allow calculating any sort of per-unit energy figure, let alone one attributable to (or derivable for) a single website served from a virtual machine running on a dedicated bare-metal server in one of their global data centers.
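For reference, the underlying "green hosting" signal appears to boil down to a single boolean from the Green Web Foundation's greencheck API. A hedged sketch follows; the exact endpoint path and response fields are my best guess and may differ from what web-check or the Website Carbon Calculator actually call:

```typescript
// Hedged sketch of the Green Web Foundation greencheck lookup that appears to
// back these badges. Endpoint path and response fields are a best guess and
// may differ from what web-check or the Website Carbon Calculator actually use.
async function greenCheck(domain: string) {
  const res = await fetch(
    `https://api.thegreenwebfoundation.org/api/v3/greencheck/${domain}`,
  );
  const data = await res.json();
  // Typical fields: green (boolean) and hosted_by (provider name, if matched).
  console.log(data.green ? `Green (host: ${data.hosted_by})` : "Not green");
}

greenCheck("example.com");
```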
TL;DR: I'm fairly certain this is just meaningless filler data from a service that's probably just a corporate green-washing badge, backed by little more than the faint whiff of due diligence.
---
1. https://s2group.cs.vu.nl/2022-08-04-web-emissions/
2. https://www.websitecarbon.com/faq/
3. https://www.thegreenwebfoundation.org/green-web-check/
4. https://corporate.ovhcloud.com/sites/default/files/2023-11/o...
For example, https://www.whatsmydns.net/#A/www.bispebjerghospital.dk shows that the address is only resolvable from some locations.
I contacted the hostmaster and they admitted they have blocking in the DNS server.
It would be nice to see this on this site as well.
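A partial version of that check can be done without extra infrastructure by comparing answers from a couple of public DNS-over-HTTPS resolvers. This only varies the resolver, not the geographic vantage point, so it's weaker than whatsmydns.net, but it does catch resolver-dependent blocking:

```typescript
// Sketch: compare A records from two public DNS-over-HTTPS resolvers. This
// varies the resolver, not the geographic vantage point, so it's a weaker
// check than whatsmydns.net, but it catches resolver-dependent blocking.
async function resolveA(resolver: string, name: string): Promise<string[]> {
  const res = await fetch(`${resolver}?name=${encodeURIComponent(name)}&type=A`, {
    headers: { accept: "application/dns-json" },
  });
  const data = await res.json();
  return (data.Answer ?? []).map((a: { data: string }) => a.data);
}

const host = "www.bispebjerghospital.dk";
Promise.all([
  resolveA("https://dns.google/resolve", host),
  resolveA("https://cloudflare-dns.com/dns-query", host),
]).then(([google, cloudflare]) => console.log({ google, cloudflare }));
```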
example URL "with" malware: Https://cnn.com example URL without malware: https://cnn.com
You're missing subdomains & certs, a very crucial part of investigations IMO.
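A common starting point for that kind of data is certificate transparency; here's a sketch pulling subdomains from crt.sh's JSON output (crt.sh rate-limits and can be slow, so treat it as best-effort):

```typescript
// Sketch: enumerate subdomains from certificate transparency logs via crt.sh's
// JSON output. crt.sh rate-limits and can be slow, so treat it as best-effort.
async function ctSubdomains(domain: string): Promise<string[]> {
  const res = await fetch(
    `https://crt.sh/?q=${encodeURIComponent("%." + domain)}&output=json`,
  );
  const entries: { name_value: string }[] = await res.json();
  const names = entries.flatMap((e) => e.name_value.split("\n"));
  return [...new Set(names)].filter((n) => !n.startsWith("*")).sort();
}

ctSubdomains("example.com").then((subs) => console.log(subs.length, "names"));
```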
I don't have anything to add. Nicely done.
Thanks!