Show HN: I made tool that let's you see everything about any website
Yes, it's open source: https://github.com/lissy93/web-check :)

Why am I building this? There are a lot of tools out there for discovering meta and security data relating to a website, IP or server. But currently, there isn't anything that does everything, all in one place and without a paywall or user sign-up.

It's still a WIP, and I'm working on a new version, with some more comprehensive checks, so any feedback would be much appreciated :)

Checking two websites/domains I'm responsible for, this information is really confusing or just plain wrong. The "DNS Records" card for MX is not the IP addresses of the actual MX records (nor am I sure why it would be -- why wouldn't the MX records be shown here?). "DNS Server" is the addresses of the webservers, not the DNS servers for the domain from whois or from the SOA record. It can show certificate information, but not the cipher suites? Traceroute fails because traceroute isn't available/isn't in path (the error shown is "/bin/sh: line 1: traceroute: command not found"). Firewall seems to be looking specifically for a web-application-firewall, but "firewall" is a somewhat generic term that includes a number of different technologies. Email configuration is wrong, probably because a website is not the same as a domain -- I don't have SPF or DKIM records for the www subdomain, because that's not where we send email from. The "Redirects" card says it followed one redirect, but there is no redirect on the address I provided.

Does this come down to trying to cram a bunch of domain-level checks into a presentation and information-gathering method built for websites?

For cases where it can not be determined, it would be best to say "can not be determined" rather than "No", because the last thing anyone needs is some PHB giving people grief because, for example, the WAF in use doesn't expose itself to this detector.
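The suggestion above amounts to reporting three states instead of a boolean. A minimal Python sketch of the idea; the names `Detection`, `classify_waf` and `probe_is_exhaustive` are hypothetical and not taken from the web-check code:

```python
from enum import Enum


class Detection(Enum):
    YES = "Yes"
    NO = "No"
    UNKNOWN = "Cannot be determined"


def classify_waf(matched_signature: bool, probe_is_exhaustive: bool = False) -> Detection:
    """Map a detector outcome to a three-state result.

    A missing signature only means "No" when the probe can actually rule
    the technology out (hypothetical flag); otherwise it is unknown.
    """
    if matched_signature:
        return Detection.YES
    return Detection.NO if probe_is_exhaustive else Detection.UNKNOWN


print(classify_waf(False).value)  # a silent WAF is reported as unknown, not "No"
```

With this shape, a card can only show "No" when the check is able to rule the feature out.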

This service is scraping data from somewhere else: it reports us as being on Amazon, but we migrated to GCP a year ago.
I’m a bit confused by the “Threats” section: entering my member management startup https://embolt.app shows malware detected with a timestamp dating back to 2018 (we launched this year).

I checked out another startup I know of (https://highlight.io) and it listed the same results.

Maybe I’m misinterpreting what this section means?

Looking at the code [1], your site failed one of four checks [2]:

  - Google safe browsing
  - URL Haus (malware distribution)
  - Phish Tank (phishing URLs)
  - Cloudmersive (website virus check)
Happy hunting I guess? lol, put the links there for ya

1: https://github.com/Lissy93/web-check/blob/master/api/threats...

2:
  - https://safebrowsing.google.com/
  - https://urlhaus.abuse.ch/
  - https://phishtank.org/
  - https://cloudmersive.com/products

Off-topic: Embolt looks interesting. I am an I.T. consultant and sysadmin for a chiropractic trade association, and have been looking for something to help manage and grow their memberships. Do you offer any evaluation plans or free tiers (or are the transaction fees the only monetization involved)?

FYI, your account creation page is pushing a special offer that expired 6/2/2024.

Oops, looks like we didn’t expire our offer banner correctly. Thanks for the heads up.

We definitely offer evaluation plans & free tiers. If you want to give me a shout over at grant@embolt.app, I can help set you up with an account to try us and we’ll see if we can help!

j1elo · 4 weeks ago
For some reason the Quality check was always failing with an error 403, even though I had followed the link to create a Google API key and passed it as an env var to the Docker container.

Ended up cloning the project to see for myself what URL it uses... turns out that the Google API was returning a JSON document with instructions to enable the PageSpeed Insights API! I'd never used Google Cloud before, so I had been a bit clueless until that point :-)

My suggestion is that the "Show Error" button should show the actual output of the API calls, because otherwise this very useful JSON from Google gets lost in translation.

Now that I checked the code it's clear that there are actually 2 things to enable that are accessed with the API key:

* PageSpeed Insights API: https://console.cloud.google.com/apis/library/pagespeedonlin...

* Safe Browsing API: https://console.cloud.google.com/apis/api/safebrowsing.googl...

So I'd suggest adding this info to either or both of the README and the app itself.

Otherwise, a very very cool project! I've been checking several of my sites for the last hour.

>https://web-check.xyz/check/http%3A%2F%2F127.0.0.1

>City: undefined, undefined, undefined

Heh

that's my ip address!
The request is coming from inside the house!
The same is also displayed for https://social.immibis.com/, which is definitely in a very popular Hetzner data center near Helsinki.
Should display "City: Home (:"
The docker version[1] worked better for me to test out. The free website version does not have all the features (like Chromium) enabled which is why some of the report data is missing or incorrect.

Looks like a super promising project! Thanks for building and sharing.

[1] https://hub.docker.com/r/lissy93/web-check

From the first of 3 previous submissions to HN: https://news.ycombinator.com/item?id=36839603
What's the difference between the link in this post (https://v1.web-check.xyz/) and on your Github (https://web-check.xyz/)?
Beautiful! Thanks for open sourcing this!

I've been working on a project [1] that probably wants to become a live crawler like this, but it's currently batch based. I'm focused on RSS feeds and microformats [2]. Can you share any details on what kind of performance / operational costs you're seeing while you're on the HN front page? The fly.toml looks like $5/month could suffice?

[1] https://alexsci.com/rss-blogroll-network/

[2] https://microformats.org/wiki/Main_Page

I'm not OP; I received ~100 thousand requests being on the front page once. It was an AI app and I quickly got rate limited by GCP Vertex AI lol.
> 100 thousand requests

Over how long? Even if it's just over an hour that's just under 30 rps; over a day it's a little over 1 rps.
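The arithmetic behind those two figures, as a quick Python check. The function name is just for illustration, and it assumes the 100 thousand requests are spread evenly over the window:

```python
def requests_per_second(total_requests: int, window_seconds: float) -> float:
    """Average request rate over a window, assuming an even spread."""
    return total_requests / window_seconds


HOUR = 60 * 60
DAY = 24 * HOUR

per_hour = requests_per_second(100_000, HOUR)  # all traffic within one hour
per_day = requests_per_second(100_000, DAY)    # same traffic spread over a day

print(f"{per_hour:.1f} rps over an hour, {per_day:.2f} rps over a day")
# prints "27.8 rps over an hour, 1.16 rps over a day"
```

Real front-page traffic is bursty, so peaks would be higher than these averages.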

A correction to the post's title: it should have been "lets" (see https://youryoure.com/?apostrophe). Those are two different words with different meanings!

Great site btw

Looks nice, some feedback though:

It shows my dnssec as not present even though https://dnssec-analyzer.verisignlabs.com/ which it links to shows all green for my test site.

The DNS records panel seems a bit broken, it shows my SPF record as the NS ("NS v=spf1 mx -all").

The Server Records panel has a "ports" entry, but that only shows the first open port (for me 22).

When showing Response Time it's pretty critical to show where you requested it from. Since you're showing the "location" of the server you could even subtract/show what part of the response time is due to distance latency (or ping the server and use the RTT).

It'd be useful to show things like what protocol is used (http, h2, h3), what cipher was used, etc.

Global Ranking chart should perhaps be inverted? Currently it goes down the more popular the site becomes.

TLS Security Issues & TLS Cipher Suites just send undefined to the tls-observatory site (https://tls-observatory.services.mozilla.com/api/v1/results?...).

HSTS without subdomains shows as "No"; there should probably be different levels for "none", "without subdomains", "without preload", "with preload", and "in the preload list".
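A rough sketch of how the first four of those levels could be derived from the `Strict-Transport-Security` header value alone. The function name `hsts_level` is made up for illustration and this is not how web-check does it; "in the preload list" is left out because it needs a lookup against the browser preload list rather than the header:

```python
from typing import Optional


def hsts_level(header_value: Optional[str]) -> str:
    """Classify a Strict-Transport-Security header value into a coarse level."""
    if not header_value:
        return "none"

    # Directives are separated by ';' and directive names are case-insensitive.
    directives = {}
    for part in header_value.split(";"):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip().strip('"')

    # A missing, malformed or zero max-age means HSTS is effectively off.
    max_age = directives.get("max-age", "")
    if not max_age.isdigit() or int(max_age) == 0:
        return "none"
    if "includesubdomains" not in directives:
        return "without subdomains"
    if "preload" not in directives:
        return "without preload"
    return "with preload"


print(hsts_level("max-age=31536000"))  # prints "without subdomains"
```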

So broken that it’s probably just a tool to collect URLs
I have been using this and I have to say it is one of the best open source projects, at least for me, as I need to look up URL reputation and the way everything is organized as cards is highly helpful. One screen to get all the helpful information you need. I'm looking forward to the API version and to seeing whether I could use this as a replacement for VT. I did notice one thing: sometimes when you look up a URL you don't get back any response, and when you check the network activity tab in a browser you see the requests are getting rejected.
Every section has little (i) icons and all of them are useless.

For my site it shows under "Site Features" a "root authority". Okay that's new to me, let's see what that means. The full explanation is: "Checks which core features are present on a site." That's like answering "water" when someone asks "what's water?"

The use cases section of the info is similarly useless and additionally hyperbolic in most instances, such as: "DNSSEC information provides insight into an organization's level of cybersecurity maturity and potential vulnerabilities". If DNSSEC for one domain can tell me about the overall security maturity of an organisation as well as reveal potential vulnerabilities, please enlighten me because that'd be very useful for redteaming assignments

The thing detects January 1st 2008 as the page's content type, which makes no sense (checked with curl, that's indeed incorrect)

Server location is undefined at the top of the page (first impression; the section with the map) but later in the server info section it guesses a random city in the right country

It reports page energy consumption in KWg. Kelvin×Watt×grams, is this a typo for kWh? One kWh is about as much energy as 50 smartphone batteries can hold, as if a page (as measured by its size in bytes) would ever use that amount of energy. You can download many 4k movies on one smartphone charge (also when considering the power consumption of routers), surely that's not the unit being used to judge html weight?

The raw JSON results, where I was hoping fields might have clearer (technical) labels than the page, remain blank when trying to open them

Overall, I'm not sure what the intended use of this site is. It presents random pieces of information with misleading contextualisation and no technical explanation, some of which show incorrect values and many of which don't work (failing to load or showing error values like undefined). Maybe tackle it in sections, rethinking what the actual goal is here and, once you've identified one, writing that goal into the "use cases" section and implementing it, finally writing in the "what is this" section what it is the site is checking for, then repeat for the next useful piece of information you can come up with, etc.

Well this was an entertaining 15-minute rabbit hole.

The energy consumption metric (KWg) should be more clearly defined with some context info, as it's not even remotely standardized, or even commonly used--it took some effort to track down what it's actually measuring. According to another site[1] dedicated to sustainability, "KWg" is "kilowatts consumed per gigabyte" (presumably per gigabyte transferred), so should probably be marked as "kWGB", if it's going to exist at all.

The data seems to be drawn from the Website Carbon Calculator API, which states that "If your hosting provider has not registered the IP address of your server with The Green Web Foundation, we will not be able to detect it."[2] I visited the Green Web Foundation's website[3], which appears to provide the exact same services and data as the Website Carbon Calculator, which is an ironically wasteful endeavor--I'm making requests to three separate endpoints just to get an apparently arbitrary number back? I ran the test on my website, and it correctly identified my host, but strangely did not offer any kind of quantitative values and instead just gave a binary "Green" or "Not Green" determination and badge. It did at least provide some additional context, in the form of OVHCloud's Universal Registration Document[4] from FY2023, which includes a chapter on sustainability efforts. While that was far more helpful than anything else this exercise had revealed up to this point, it notably did not provide any "kWGB" measurements, or any other site-specific energy consumption data that I could find which would facilitate calculating any sort of per-unit energy figure, especially not one that could then be attributed to a single website that's being served from a virtual machine running on a dedicated baremetal server in one of their global data centers.

Tldr; I'm fairly certain this is just meaningless filler data from a service that's probably just a corporate green-washing badge backed by little more than the faint whiff of due diligence.

EDIT: Formatting

---

1. https://s2group.cs.vu.nl/2022-08-04-web-emissions/

2. https://www.websitecarbon.com/faq/

3. https://www.thegreenwebfoundation.org/green-web-check/

4. https://corporate.ovhcloud.com/sites/default/files/2023-11/o...

I have an issue with the website background: on a high refresh rate display at 240Hz, the background animation is incredibly fast and it's super distracting.
Seems like there may be some issue with the crawl rules. What is it looking for that would lead to the error "t.robots is not defined"?
How do I see how a site is handled in DNS?

For example https://www.whatsmydns.net/#A/www.bispebjerghospital.dk shows that the address is only resolvable from some locations.

I contacted the hostmaster and they admitted they have blocking in the DNS server.

Would be nice to see this also on this site.

If the scheme is not lowercase it seems to erroneously detect malware and provides a zip file URL for some malware which does not exist on the page. Seems like a bug!

example URL "with" malware: Https://cnn.com
example URL without malware: https://cnn.com
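URL schemes and hostnames are case-insensitive, so one plausible fix is to normalize the input before passing it to any of the checks. A small Python sketch using only the standard library; `normalize_url` is a hypothetical helper, not something from the web-check code:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host of a URL, leaving the rest untouched."""
    parts = urlsplit(url.strip())
    # urlsplit already lowercases the scheme. Lowercase the host and port
    # part as well, but keep any userinfo before '@' as typed.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{hostport.lower()}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(normalize_url("Https://cnn.com"))  # prints "https://cnn.com"
```

After normalization both example URLs above are the same string, so any lookup keyed on the URL should give the same answer for them.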

> everything about any website

you're missing subdomains & certs, a very crucial part of investigations imo

Seems like the hostname section detected a different site entirely to the one I input (some site that shared the same IP long ago?), and the mail section failed to detect my (valid, according to gmail) DKIM records entirely...
The UI is slick and presents a lot of info in an easy to parse way, but something is going on with Sentry and FancyBackground.tsx that's causing my laptop fans to spin up while idling on the page.
butz · 4 weeks ago
Neat, bonus points for colorful log messages in the console. One thing though: any idea what is causing a horizontal scrollbar to appear in Firefox? I observe this issue on several websites, but never figured out the cause.
Gotta use the inspector and find the too-wide element. That or a CSS rule for overflow is causing it.
It doesn't work very well. I put in my own web address, which is definitely behind cloudfront, and it said it's unprotected, as well as a bunch of other vulns it doesn't have.
g4zj · 4 weeks ago
The AAAA record listing seems to only display the A record value(s).
Would be nice to be able to compare results between dates.
Agreed. Also, that could be a paid feature
KomoD · 4 weeks ago
Maybe I'm misunderstanding but I think there's been a mistake with the "Bad URLs Count", it shows a date instead of what I'd expect (a number)
oooh let me look into that. You're right; it should be a number.
The tech stack check seems to fail every time. Would love to see it with the tech stack details included. Nice and fast, otherwise!
Today I learned: https://securitytxt.org/
It says my domain "grepular.com" doesn't have dnssec. It does. It also says I don't use DKIM or DMARC. I do.
Amazing! Reminds me that I need to learn a bunch of stuff I know nothing about.
johng · 4 weeks ago
This is really neat, kudos!
6510 · 4 weeks ago
Typing example.com should be fine, but it didn't work. I tried www.example.com, which also didn't work; it had to be https:/ /www.example.com (I didn't try https:/ /example.com).
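Accepting a bare hostname could be as simple as prepending a default scheme when none is present. A Python sketch of that idea; `with_default_scheme` and the choice of `https` as the default are assumptions for illustration, not web-check's behavior:

```python
import re

# Matches a leading "scheme://" such as "http://" or "https://".
_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def with_default_scheme(user_input: str, default: str = "https") -> str:
    """Return the input unchanged if it has a scheme, else prepend one."""
    text = user_input.strip()
    if _HAS_SCHEME.match(text):
        return text
    return f"{default}://{text}"


print(with_default_scheme("example.com"))  # prints "https://example.com"
```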
Host is reporting some domain I've never heard of.
In the tech-stack it gives me "Chromium not found"...
breck · 4 weeks ago
Hey that was a pleasantly great experience.

I don't have anything to add. Nicely done.

Thanks!

"Energy Usage for Load" is specified in "KWg". What does that mean? Is it a typo for "kWh"?
ffhhj · 4 weeks ago
it's nuclear powered, g is for grams of uranium
I enjoyed the UI, cool aesthetic.
man this is beautiful and fast. good job!
Very cool tool!
Very cool
Let’s add a function which lists all subdomains.
That's usually not possible. If they are not listed in the cert SAN (often you'd just have a wildcard for subdomains there) you'd need to enumerate them all which is not feasible.
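To illustrate what the cert SAN can and cannot tell you, here is a Python sketch that pulls DNS names out of a certificate dict in the format returned by `ssl.SSLSocket.getpeercert()`. The helper names and the sample certificate are made up for the example:

```python
def san_dns_names(cert: dict) -> list:
    """Return the sorted DNS entries from a getpeercert()-style dict."""
    entries = cert.get("subjectAltName", ())
    return sorted({value for kind, value in entries if kind == "DNS"})


def concrete_subdomains(cert: dict, domain: str) -> list:
    """Keep only explicitly named subdomains; a wildcard names none of them."""
    suffix = "." + domain
    return [
        name
        for name in san_dns_names(cert)
        if name.endswith(suffix) and not name.startswith("*.")
    ]


# Hypothetical certificate data, shaped like ssl.SSLSocket.getpeercert() output.
sample_cert = {
    "subjectAltName": (
        ("DNS", "example.com"),
        ("DNS", "*.example.com"),
        ("DNS", "mail.example.com"),
        ("IP Address", "192.0.2.1"),
    )
}

print(concrete_subdomains(sample_cert, "example.com"))  # prints ['mail.example.com']
```

The wildcard entry covers every subdomain without naming any, which is why the SAN is at best a partial list.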
leobg · 4 weeks ago
+1. And really well done. I like how you can scroll through to get a good overview without any one section being so long that it breaks that high-level overview flow. Excellently executed.