Complete domain name research framework. Thinking beyond OSINT tools.
Learn to anonymously footprint domains and find out who hides behind a website. Dig up information you didn't think was accessible.
Learn how to uncover detailed information about domains and their owners while staying anonymous. This tutorial is ideal for anyone conducting web investigations and researching online entities. By the end of the tutorial, you will be able to gather domain information without revealing your identity.
Before we move into practical techniques, you have to understand the types of information gathering. There are three: passive, semi-passive, and active. The choice between using one method over another depends on the specific scenario and the type of data you are interested in gathering.
Reconnaissance types
Passive reconnaissance
Passive reconnaissance involves gathering information without directly interacting with the target’s environment. It is the preferred method for collecting OSINT intelligence, as it relies on publicly available resources. The goal of passive collection is to obtain information about the target while avoiding any direct contact. Passive information gathering techniques in this article are marked with a ninja emoji 🥷.
Semi-passive reconnaissance
Semi-passive collection involves sending limited traffic to the target to gather information while avoiding detection. This method mimics typical Internet traffic to minimise the risk of triggering alerts or drawing attention. It provides a lighter investigation of the target's resources compared to more intrusive methods.
Active reconnaissance
Active collection involves direct interaction with the target system to gather detailed technical data, such as open ports and vulnerabilities. This method can alert the target as it generates suspicious or malicious traffic. Active collection includes techniques like scanning and social engineering, which are likely to be detected by the target’s security systems.
Registrar records 🥷
When a domain name is registered, the registrant has to provide personal details to the registrar. These details can be queried through WHOIS lookup services. To tie a domain name to a specific person or business, use services such as ICANN Lookup or Who.is.
My preferred method is the WHOIS Command-Line Interface (CLI) tool, as it provides comprehensive and up-to-date information:
whois example.com
WHOIS CLI Command | WHOIS Website |
---|---|
✅ Pros: Provides direct access to WHOIS databases. Allows for automated queries and integration into scripts or tools. Can retrieve detailed information such as registrar, registration dates, nameservers, and contact details. | ✅ Pros: User-friendly interface, accessible via web browser. Consolidates WHOIS data from multiple sources, potentially providing a broader view. Offers additional features such as domain availability checks, historical WHOIS records, and domain tracking. |
❌ Cons: Requires familiarity with command-line interfaces. May vary in functionality and output depending on the implementation and the WHOIS server queried. | ❌ Cons: May not always have the most up-to-date information. Limited to the features provided by the website. |
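The command-line lookup can also be scripted. The sketch below is a minimal illustration, not a full WHOIS client: it sends a raw query over TCP port 43 and collects `Key: value` lines into a dictionary. The default server `whois.iana.org` and the helper names `whois_query` and `parse_whois` are assumptions made for this example; record formats differ between registries, so treat the parser as a starting point.

```python
import socket


def whois_query(domain: str, server: str = "whois.iana.org", timeout: float = 10.0) -> str:
    """Send a raw WHOIS query over TCP port 43 and return the text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        # The protocol is a single line terminated by CRLF; the server replies and closes.
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines; comment lines and empty values are skipped."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", ">>>")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            # Keys can repeat (e.g. several name servers), so keep a list per key.
            fields.setdefault(key.strip().lower(), []).append(value)
    return fields
```

A typical use would be `parse_whois(whois_query("example.com"))`. If the reply contains a `refer` field, repeating the query against that server usually returns the more detailed record.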
However, registrar records might be false or not available. Domain owners often hide registration information from the public, which complicates the investigation. To check historical registrar records, use whoxy, whoisxmlapi, and osint.sh. These services might show previously archived records if the current ones are redacted.
Once you have established who owns the website, you can use Reverse Whois to find other domains with the same organisation name or email address on their Whois record. For example, if you are investigating a company named “Alice Doe LLC”, you can see all the other domains registered under “Alice Doe LLC”. One of my favorite reverse search services is viewdns.info, as it has an extensive toolkit.
Registrar records can also be hidden with Cloudflare. It will make Cloudflare info appear in the WHOIS record, hiding the real identity of the owner. Such websites can be de-anonymised using a combination of OSINT techniques, which Nixintel neatly described in one of his disinformation investigations.
Tech Stack 🥷
Knowing which technologies support the domain will help you better investigate the target and map out the attack surface. You can perform the lookup manually by using the "Inspect" tool in your browser. HTTP headers and various HTML elements will usually give away the tech stack behind the website.
For example, inspecting a website's code can reveal a number of technologies, including Google Analytics, WordPress CMS, and the Yoast SEO plugin.
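The manual check can be approximated in a few lines. This is a rough sketch that assumes you already have the response headers and HTML source (for example, saved from the browser). The `SIGNATURES` table is a deliberately tiny, hypothetical rule set written for this example; real fingerprinting tools rely on far larger rule collections.

```python
import re

# Hypothetical, illustrative signatures only; not taken from any real tool.
SIGNATURES = {
    "WordPress": [r"/wp-content/", r"/wp-includes/"],
    "Yoast SEO": [r"yoast seo", r"yoast-schema-graph"],
    "Google Analytics": [r"googletagmanager\.com/gtag/js", r"google-analytics\.com/analytics\.js"],
}
# Response headers that often name the server software or framework.
HEADER_HINTS = ("server", "x-powered-by", "x-generator")


def fingerprint(headers: dict[str, str], html: str) -> dict[str, list[str]]:
    """Match the HTML against known markers and collect revealing headers."""
    found = sorted(
        name
        for name, patterns in SIGNATURES.items()
        if any(re.search(pattern, html, re.IGNORECASE) for pattern in patterns)
    )
    lowered = {key.lower(): value for key, value in headers.items()}
    hints = [f"{name}: {lowered[name]}" for name in HEADER_HINTS if name in lowered]
    return {"technologies": found, "header_hints": hints}
```

Keeping the signatures in a plain dictionary makes it easy to add markers for technologies you meet during an investigation.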
There is a faster way than searching manually. Automated tools crawl websites and determine the tech stack in seconds. Here is a list of tools:
- BuiltWith – is the most popular tool for discovering technology providers. Apart from a detailed technology profile, it also displays basic information on the linked entities, average spending on technology, and performance indicators.
- Tiny Scan – free online service that provides a comprehensive technology profile. Additionally, it displays DNS Records, SSL certificates, HTTP Headers, Cookies, and more. It also offers export to PDF, which is useful for generating reports.
- Wappalyzer – free online service with an option to install a browser plugin that conveniently displays the tech stack of every website you visit. However, in my experience, it doesn't detect all technologies and misses a lot of things.
- Genelify – provides a decent technology lookup, without any information on the company or connected entities. Good for simple lookups, without diving deeper into who is connected to the website.
- Whatruns – a Chrome plugin that provides technology lookups and has a useful feature that notifies you about changes in a website's tech stack.
Knowing the tech stack, you can consult vulnerability databases and find the vulnerabilities associated with each technology. For example, if the target is using WordPress, you can go to CVE Details and search for relevant vulnerabilities.
Google Dorks 🥷
Google Dorking (also known as Google Hacking) is the use of advanced search queries that help to narrow down results and find specific information. Smart use of various Google operators allows you to uncover sensitive information about the domain.
- site:example.com "classification: sensitive" "classified" filetype:pdf filetype:doc – this dork is designed to find exposed documents, potentially revealing sensitive information.
- site:example.com filetype:config inurl:web.config inurl:ftp – is used to locate exposed web.config files within FTP directories on the domain. These configuration files may contain sensitive information, such as server settings and credentials.
- site:example.com intitle:"index of" inurl:ftp – is used to locate publicly accessible FTP directories on the example.com domain that display their contents via directory listings. This can help identify exposed files and folders that may not be intended for public access.
- intitle:index.of "id_rsa" -"id_rsa.pub" – is used to find publicly accessible directories that contain private SSH keys (id_rsa) while excluding public keys (id_rsa.pub).
There are many precomposed lists (such as the Google Hacking Database) and tools that create Google Dorks. However, it is better to write custom queries that target the vulnerabilities and sensitive information relevant to your investigation.
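Custom queries are easy to assemble programmatically. The sketch below is one possible way to do it; `build_dork` and `search_url` are hypothetical helper names. It joins multiple file types with `OR`, on the assumption that the goal is to match any of them.

```python
from urllib.parse import urlencode


def build_dork(site=None, phrases=(), filetypes=(), intitle=None, inurl=(), exclude=()):
    """Assemble a Google query string from common search operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    parts.extend(f"inurl:{fragment}" for fragment in inurl)
    parts.extend(f'"{phrase}"' for phrase in phrases)
    if filetypes:
        # Assumption: any of the listed file types is acceptable, so join with OR.
        parts.append(" OR ".join(f"filetype:{ext}" for ext in filetypes))
    # A leading minus excludes results containing the quoted term.
    parts.extend(f'-"{term}"' for term in exclude)
    return " ".join(parts)


def search_url(dork):
    """Turn a dork into a URL-encoded Google search link."""
    return "https://www.google.com/search?" + urlencode({"q": dork})
```

For instance, `build_dork(site="example.com", intitle="index of", inurl=["ftp"])` reproduces the directory-listing dork shown earlier.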
Automated tools
- Dorky – is a tool to automate compilation of advanced Google queries during pentesting or bug bounty hunting.
- DorkGPT – ChatGPT powered tool developed by PredictaLab for creating Google Dorks.
- Dork Search – free tool that automates Google Dorking with ChatGPT to speed up your searching.
- Pentest tools – helps to compile advanced search operators, using pre-built templates to find juicy information about target websites.
- DorkStorm – helps to create Google Dorks to easily search and discover hidden information on the Internet.
Beware of deprecated and unreliable Google operators. The most recent change in 2024 was about cache: sadly, Google said goodbye to the operator that helped to restore missing web pages. Also, both the link and info operators were deprecated in 2017. Parentheses were useful in constructing nested queries, but no longer work as well. The operators inanchor and allinanchor are not yet deprecated, but deliver inconsistent results.
Metadata Analysis 🥷
You may have acquired documents hosted on the domain using the aforementioned Google Dorks or automated tools. They may contain not only sensitive data but also metadata (data about the data). Analyse PDFs hosted on the domain for metadata and images for EXIF data. Metadata provides information about the creation, modification, and other characteristics of a file. You might extract usernames, emails, and other essential selectors to pivot from.
The tools below extract documents and analyse them using passive recon. However, depending on the configuration they might use active methods. Check the settings before you run them.
Automated tools
- FOCA – is a tool used for gathering information from metadata in documents found on a target website. It scans domains for hosted files and then downloads them to extract the metadata. Its primary goal is to identify software versions, server types, and other details that can be useful for security assessments.
- Metagoofil – is another tool used for gathering information from metadata in documents on a target website, similar to FOCA. It focuses on extracting metadata from various file types like PDFs, Microsoft Office documents, and other formats to uncover details such as software versions, usernames, and other potentially sensitive information.
- Dork Dump – is a tool designed to automate the process of finding sensitive information using Google dorks. It queries a specified domain name and searches for a variety of file extensions (pdf, doc, docx, etc). Then it downloads the files and runs ExifTool on them to enumerate metadata.
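To see what such tools look for, here is a small sketch that reads author-related metadata from a Microsoft Office file you have already downloaded. It relies on the fact that .docx, .xlsx, and .pptx files are ZIP archives that store core properties in `docProps/core.xml`. The function name `office_metadata` and the selection of fields are choices made for this example; PDFs and images need other parsers.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
# Output label -> element that holds the value.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def office_metadata(source) -> dict[str, str]:
    """Read core properties from an Office file given a path or a binary file object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            result[label] = node.text.strip()
    return result
```

The `creator` and `last_modified_by` values are often usernames, which makes them useful selectors to pivot from.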
IP Geolocation 🥷
Various online tools allow you to geolocate an IP address, some more accurately than others. Use tools like IPinfo or GeoIPTool to obtain the location data, then plot the coordinates you obtained in Google Maps to pinpoint the location.
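Turning coordinates into a map link is a small step that is easy to automate. The sketch below assumes the coordinates arrive as a single "latitude,longitude" string, which is how I understand the `loc` field in IPinfo's JSON output to be formatted; adjust the parsing if your tool reports them differently. The function name `maps_url` is hypothetical.

```python
from urllib.parse import urlencode


def maps_url(loc: str) -> str:
    """Convert a 'lat,lon' string into a Google Maps search link."""
    lat_text, lon_text = loc.split(",")
    lat, lon = float(lat_text), float(lon_text)
    # Reject values outside the valid coordinate ranges.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {loc}")
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": f"{lat},{lon}"})
```

Keep in mind that IP geolocation usually points to a provider's data centre or a city centre rather than a precise address.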
Reverse IP lookup 🥷
By performing a reverse IP lookup, you might find sub-domains, development sites, or links between companies. This technique focuses on domains hosted on a single server or IP address. It differs from the previous methods, where we researched domains belonging to a certain person. Be mindful of false positives: the domain you query might be running on a shared hosting package, in which case unrelated websites share the same IP address.
- Dnslytics - this tool helps to research malicious IP addresses and discover ownership relations between domain names. You can also monitor new and removed domains on IP addresses, and identify domains on the same subnet.
- Hacker Target - the tool helps to find all A records associated with an IP address. The results can pinpoint virtual hosts being served from a web server.
- ViewDNS - a comprehensive set of domain tools, including reverse IP lookup.
- Domaintools - helps to find all the domains hosted on a given IP address. In cases of Whois privacy on a target domain, knowing other connected domains might surface one with valid owner information.
Domain Name Servers (DNS)
DNS servers are responsible for turning domain names into IP addresses. DNS enumeration involves identifying and listing all related DNS records, like hostnames and IP addresses.
- Web Check – all-in-one open-source tool for analysing websites. It gives a plethora of information, including technological lookups. Moreover, it has extensive documentation on GitHub with use cases for each data point.
- DNSDumpster – is a free domain research tool that can discover hosts related to a domain.
- CloudPeler – GitHub tool for deanonymising websites behind Cloudflare.
- Domain Digger – is an advanced, web-based tool designed to provide detailed insights into domain-related data.
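DNS records can also be listed with a short script. The sketch below queries Google's public DNS-over-HTTPS JSON endpoint at `dns.google/resolve`, so the lookups go to a public resolver rather than to the target's own infrastructure. The helper names and the chosen record types are assumptions for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DOH_ENDPOINT = "https://dns.google/resolve"
# Numeric DNS record types mapped to their usual names.
TYPE_NAMES = {1: "A", 2: "NS", 5: "CNAME", 15: "MX", 16: "TXT", 28: "AAAA"}


def doh_url(name: str, record_type: str) -> str:
    """Build the query URL for one name and record type."""
    return f"{DOH_ENDPOINT}?{urlencode({'name': name, 'type': record_type})}"


def parse_answers(payload: dict) -> list[tuple[str, str]]:
    """Return (record type, data) pairs; a non-zero Status means no usable answer."""
    if payload.get("Status") != 0:
        return []
    return [
        (TYPE_NAMES.get(answer["type"], str(answer["type"])), answer["data"])
        for answer in payload.get("Answer", [])
    ]


def enumerate_records(name: str, types=("A", "AAAA", "MX", "NS", "TXT")):
    """Query each record type in turn and collect all answers."""
    records = []
    for record_type in types:
        with urlopen(doh_url(name, record_type), timeout=10) as response:
            records.extend(parse_answers(json.load(response)))
    return records
```

Calling `enumerate_records("example.com")` returns a flat list of pairs that can be sorted or grouped by record type.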
Passive DNS 🥷
Passive DNS data is especially useful for tracking how a domain changes and connects with other domains and IP addresses over time. It lets you check all the names that have resolved to a given IP address, so you can build a useful history of resolutions.
- RiskIQ Community Edition - gives a lot of useful information about domains, including passive DNS data.
- VirusTotal - includes passive DNS data as part of its threat intelligence platform, showing historical DNS resolutions.
- Censys - while it is primarily a search engine for internet-connected devices, it also provides passive DNS data.
Subdomains
To fully understand a website's digital presence, you need to look beyond just its main address. The ability to uncover subdomains is crucial, as they hold valuable insights into an organisation's technology, structure, and business operations.
There are several methods available to capture and map the entire extent of the domain, from brute-force discovery to more efficient subdomain scanners.
There are subdomain enumeration tools that use active scanning and interact directly with a target. These tools typically send DNS queries or HTTP requests to discover subdomains by directly querying the target's DNS servers or by making requests to potential subdomains and observing the responses. For example, Sublist3r is primarily a passive tool but can be configured to use brute force, which involves actively querying DNS servers for potential subdomains.
The tools listed below perform passive recon 🥷:
- Sublist3r – is a Python subdomain discovery tool that returns valid subdomains for websites, using passive online sources. It has a simple, modular architecture and is optimised for speed.
- DNSTwist – this tool generates a comprehensive list of permutations based on a provided domain name, and subsequently verifies whether any of these permutations are in use. Additionally, it can generate fuzzy hashes of web pages to detect ongoing phishing attacks or brand impersonation, and much more!
- crt.sh – is a tool that leverages SSL/TLS certificates to extract subdomain information. By querying Certificate Transparency logs, it can reveal subdomains associated with a target domain that have been registered or issued certificates. This method helps in identifying subdomains that might not be found through other passive reconnaissance techniques.
- SecurityTrails – helps in passive reconnaissance by offering detailed domain and IP information, historical DNS records, and WHOIS data. This allows you to analyse a target's digital footprint and past configurations without direct interaction.
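Certificate Transparency lookups are straightforward to script. This sketch assumes crt.sh's JSON output (`output=json`), in which each entry carries a `name_value` field with one or more newline-separated host names. The function names are hypothetical, and the service may rate-limit or time out on large domains.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def crtsh_url(domain: str) -> str:
    """Build a crt.sh query for every certificate name ending in .domain."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def extract_subdomains(entries: list, domain: str) -> list[str]:
    """Collect unique subdomains from crt.sh entries, stripping wildcard prefixes."""
    suffix = "." + domain.lower()
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]
            # Keep only real subdomains of the target, not the apex or lookalikes.
            if name.endswith(suffix):
                names.add(name)
    return sorted(names)


def fetch_subdomains(domain: str) -> list[str]:
    """Download the certificate list and return the subdomains found in it."""
    with urlopen(crtsh_url(domain), timeout=30) as response:
        return extract_subdomains(json.load(response), domain)
```

Because the data comes from public certificate logs, the query never touches the target's servers.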
Ports and services
It is crucial to understand open ports and services running on a target’s infrastructure. Open ports can expose various services and vulnerabilities, which help in mapping an attack surface. Scanning ports is useful for identifying what is running on a target machine.
When performing port scanning, use throttling to slow down the scan, reducing the chance of detection by IDS or firewalls. Obfuscation techniques, like randomising the order of scanned ports or using decoy IPs, also help mask the scan's origin and pattern.
Obfuscation and detection evasion are techniques used by advanced security researchers. If you don't possess such skills, stick with passive recon techniques. They are safer and usually easier to perform.
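To make the throttling and randomisation ideas concrete, here is a minimal sketch of a TCP connect scan that shuffles the port order and pauses between probes. It assumes you are authorised to scan the host; decoy addresses are not covered. The function names and default delays are arbitrary choices for this example, not recommended values.

```python
import random
import socket
import time


def scan_order(ports, seed=None):
    """Return the ports in a shuffled order so they are not probed sequentially."""
    order = list(ports)
    random.Random(seed).shuffle(order)
    return order


def throttled_scan(host, ports, delay=1.0, jitter=0.5, timeout=2.0, seed=None):
    """TCP connect scan with a randomised port order and a pause between probes."""
    rng = random.Random(seed)
    order = scan_order(ports, seed)
    open_ports = []
    for index, port in enumerate(order):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising on a refused connection.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
        if index < len(order) - 1:
            # Throttle: wait a base delay plus random jitter before the next probe.
            time.sleep(delay + rng.uniform(0, jitter))
    return sorted(open_ports)
```

A full connect scan is still active reconnaissance and can be logged by the target, however slowly it runs.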
Passive recon tools 🥷
- Shodan – a specialised search engine for internet-connected devices, allowing you to find specific devices (e.g., webcams, routers) based on various parameters.
- PassiveTotal - part of the RiskIQ platform, it provides passive DNS data and other intelligence, including details about ports and services gathered from internet-wide scanning data.
- GreyNoise – collects and analyses internet-wide scan data, helping you identify common internet background noise and offering insights into open ports and services without active scanning.
Connected entities 🥷
Research individuals connected to the domain, as they might offer new investigative leads. Tools like Hunter.io allow you to search domains for associated emails, helping you build a complete picture. Collecting these profiles can open new avenues for your investigation. Once you have an email, you can pivot using the techniques described in the following article:
Online resumes can disclose a wealth of sensitive information about your target. People often describe the technologies they used to protect the organisation in their resumes. Find company employees and research their work history for juicy details. Use the following guide to navigate socials, including essential tips for investigating the LinkedIn business network:
Passive recon tools 🥷
- Maltego - helps to visualise relationships between entities like domains and IPs. It also works well for researching people and nicely ties all your leads on the interactive graph. Maltego automates the collection of publicly available data, uncovering hidden connections without direct interaction with the target.
- MXToolbox – is a versatile online tool that provides detailed DNS and email server information, including MX records, SPF records, and blacklist status. It aids in passive reconnaissance by allowing you to gather information about a target's email infrastructure, DNS configurations, and potential vulnerabilities without directly interacting with their systems.
Google Analytics code 🥷
Searching for specific Google Analytics tracking codes, you can identify which sites are using particular versions of Google Analytics or related scripts. This can be useful for identifying sites associated with a particular organisation or campaign.
- NerdyData - offers Google Analytics code tracking across the web by indexing and aggregating web technologies used on various websites.
- Analyze ID - offers reverse lookup Google Analytics, Google Adsense, Amazon affiliates, Emails, IPs, and other third party IDs.
- DomainIQ - a comprehensive domain intelligence platform that does reverse Google Analytics lookups. It also offers reverse MX, IP and DNS lookups.
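Before using a reverse lookup service, you need the tracking ID itself. The sketch below pulls Google Analytics IDs out of saved page source with a regular expression. The pattern covers the classic `UA-XXXXX-Y` format and the newer `G-XXXXXXXXXX` format; the exact length limits are my assumptions, so loosen them if you miss matches.

```python
import re

# Assumed ID shapes: UA-<account>-<property> and G-<alphanumeric measurement ID>.
GA_ID_PATTERN = re.compile(r"\b(UA-\d{4,10}-\d{1,4}|G-[A-Z0-9]{6,12})\b")


def extract_ga_ids(html: str) -> list[str]:
    """Return the unique tracking IDs found in the page source."""
    return sorted(set(GA_ID_PATTERN.findall(html)))


def account_id(tracking_id: str) -> str:
    """Strip the property suffix from a UA ID, leaving the account part."""
    match = re.fullmatch(r"(UA-\d+)-\d+", tracking_id)
    return match.group(1) if match else tracking_id
```

In a `UA-XXXXX-Y` ID, the middle number identifies the account and the final number the property, so searching for the account part can surface sibling sites managed under the same account.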
Archives and changes 🥷
Use the Wayback Machine to search for previous versions of webpages. It will show you how websites looked earlier and might help to recover deleted pages. Archive.today is another web archive service with the ability to manually add website snapshots.
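Checking whether a snapshot exists can be automated through the Internet Archive's availability endpoint. The sketch below assumes the JSON layout that endpoint returns, with an `archived_snapshots.closest` object; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def availability_url(page: str, timestamp: str = "") -> str:
    """Build the availability query; timestamp is optional, in YYYYMMDD form."""
    params = {"url": page}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(payload: dict):
    """Return (snapshot URL, timestamp) for the closest capture, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def find_snapshot(page: str, timestamp: str = ""):
    """Query the endpoint and return the closest snapshot, if any."""
    with urlopen(availability_url(page, timestamp), timeout=15) as response:
        return closest_snapshot(json.load(response))
```

Passing a timestamp such as `"20200101"` asks for the capture closest to that date, which helps when you want to see the page as it was before a change.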
Researching archives might point you to information that has since been redacted. Your target may have removed contact information from the website, but it might still be accessible in the archive. Once you have found it, follow this guide:
Try visualping.io, a monitoring service that takes screenshots of the webpage at the selected time and sends you an email alert if something changes. Bitreading's DeltaFeed is also useful for monitoring visual changes. It specialises in tracking and alerting users to changes in web page content by comparing snapshots of web pages over time.
Conclusion
Your workflow highly depends on your investigative needs. Follow this guide as a blueprint that you can modify at any point. Start by defining your objectives and choosing the appropriate reconnaissance type based on your goals.