Search Security Strategies
orion
03-05-2006, 06:12 AM
Everyone in an administrative position should know that careless use of keyword terms in documents can be exploited by search engine users. Such users may have a legitimate interest, but they may also be hackers looking for vital information or for specific targets.
The following search commands in Google often provide hackers with invaluable information. A detailed description is given in Google Hacking: Ten Simple Security Searches That Work (http://www.ethicalhacker.net/content/view/41/2/) and in the book Google Hacking for Penetration Testers, by Johnny Long and Ed Skoudis (Syngress; ISBN: 1931836361; published June 2001; 528 pages).
I have added to the list other queries I tested for an intelligence project.
site: - Provides all sorts of information about a site.
intitle:index.of - Universal search for directory listing, especially Apache-style directory listings.
error | warning - Error messages are revealing in just about every context. Warning text in search results can provide important insight into the behind-the-scenes code used by a target.
login | logon - Locates login portals fairly effectively and can be used to harvest usernames and troubleshooting procedures.
username | userid | employee.ID | "your username is" - The most generic searches for username harvesting. The context around these words can reveal procedural information an attacker can use in later offensive action.
password | passcode | "your password is" - This query reflects common uses of the word password and can reveal documents describing login procedures, password change procedures, and clues about password policies in use on the target.
admin | administrator - This query can be used to reveal procedural information ("contact your administrator") and even admin login portals.
-ext:html -ext:htm -ext:shtml -ext:asp -ext:php - This query, when combined with the site operator, gets the most common files out of the way to reveal more interesting documents. It should be modified to exclude other common file types on a target-by-target basis.
inurl:temp | inurl:tmp | inurl:backup | inurl:bak - This query locates backup or temporary files and directories.
List of Sites - This gives you site community information.
intranet | help.desk - This query locates intranet sites (which are often supposed to be protected from the general public) and help desk contact information and procedures.
And these are the ones I added.
extranet | help.desk - Same as previous query.
mailto - This gives you email addresses.
phone - This gives you phone information.
ssn - This can return Social Security number information, although the numbers found are not necessarily active.
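For administrators who want to run the queries above against a site they are responsible for, a minimal sketch of a query builder follows. The function names and the example.com domain are assumptions made for illustration; the code only assembles query strings and does not talk to any search engine.

```python
# Hypothetical helpers for auditing a site you administer.
# They only build query strings; nothing is sent anywhere.

COMMON_EXTS = ("html", "htm", "shtml", "asp", "php")


def exclusion_query(domain, exts=COMMON_EXTS):
    """Hide the most common page types so unusual documents stand out."""
    exclusions = " ".join(f"-ext:{ext}" for ext in exts)
    return f"site:{domain} {exclusions}"


def term_query(domain, terms):
    """OR several terms together, restricted to one site."""
    return f"site:{domain} " + " | ".join(terms)


if __name__ == "__main__":
    print(exclusion_query("example.com"))
    print(term_query("example.com",
                     ["inurl:temp", "inurl:tmp", "inurl:backup", "inurl:bak"]))
```

Pasting the resulting strings into a search box for your own domain is a quick way to see what an outsider would see.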
Of these searches, in my opinion the most troubling are searches for social security numbers (SSNs). The exposure of SSNs through search results by search engines, directories, university sites and other sites is a symptomatic problem across the Internet. Clicking on a record from a search engine results page can direct one to a document with more incidents. If the search engine distributes its search results to web partners, then the number of incidents is multiplied.
This problem is not unique to search engines. Many web properties are at fault: from government agencies and universities to churches, non-profit organizations and financial institutions. Often public documents from city halls, public minutes/sessions, or senate transcripts are found containing SSNs, phone numbers, email addresses, and physical addresses of citizens. One need only search in Google or other search engines to find the incidents.
I have more comprehensive coverage of other techniques relevant to searches and security in general, which you might find useful, at
Security Optimization Strategies in the Workplace (http://www.miislita.com/searchito/security-optimization-strategies.html)
Feel free to comment on the importance of search security in your workplace.
Cheers
Orion
dannysullivan
03-06-2006, 09:32 AM
to be clear, I don't believe Google has an "ssn" command.
I haven't read the book, but I'm betting what it says is that if you use SSN along with a number range, you're likely to find SSN numbers on the web because the letters SSN might be on those pages. But Google doesn't to my knowledge try to catalog pages as an SSN type, so that you can find only those pages similar to how you could do a filetype:pdf search, for example.
login | logon is the same thing. It's not a command. It's a search for words that appear on pages.
orion
03-06-2006, 02:20 PM
Danny, I believe you are misreading my post and the article it points to. This is not about Google commands (perhaps the expression "commands" is loosely used here), but about how to data mine a database by querying specific terms. It happens that some of these are commands (in the search sense).
Regarding the number range, you don't need to use ranges.
Regarding the SSN part, this can return false matches, such as a submarine id or invalid or inactive numbers (e.g. deceased individuals). However, because of bad business practices or ignorance, it might return active ones.
In 2002-3 I did comparative work on strings of the form (think of them as pattern matching expressions)
"SSNd" a + k
Note the quotes in "SSNd". Try "SSN:", "SSN:a", "SSN#a", "SSN: a", "SSN# a" alone or with "k" where
d = a delimiting character such as ":", "#", "-", ".", a space, etc
a = up to 1973, a state assigned code (not a number range, but you could try that, too). List by states and territories is available at the Social Security Administration site, in books, scripts and elsewhere.
k = a keyword(s), like court, report, lawsuit, case, divorce, bank, bankruptcy, state, etc.
Try the above with or without k and check records.
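Turned around for defensive use, the same "label + delimiter + digits" idea can flag SSN-like strings in your own documents before they are published. The sketch below is an assumption-laden illustration, not the method used in the comparative work: the regular expression, the function names and the sample text are all made up here, and the sample uses area number 000, which is never issued.

```python
import re

# Label ("SS" or "SSN"), an optional delimiter, then nine digits
# grouped 3-2-4 with optional dashes or spaces between the groups.
SSN_LIKE = re.compile(
    r"\bSSN?\s*[:#.\-]?\s*\d{3}[- ]?\d{2}[- ]?\d{4}\b",
    re.IGNORECASE,
)


def find_ssn_like(text):
    """Return every labelled SSN-like string found in the text."""
    return [m.group(0) for m in SSN_LIKE.finditer(text)]


def redact(text):
    """Replace every labelled SSN-like string with a placeholder."""
    return SSN_LIKE.sub("[REDACTED]", text)


if __name__ == "__main__":
    sample = "Filed by J. Doe, SSN: 000-12-3456, on 03/02/2006. Ref ss#000123456."
    print(find_ssn_like(sample))
    print(redact(sample))
```

A pattern like this will miss unlabelled numbers and can flag false matches, so it is a first pass before a human review, not a replacement for one.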
I did the comparative work a few years ago in Google, Yahoo, MSN, AlltheWeb, AltaVista, etc. Back then, of these, MSN did not show incidents, but I could not say the same about the rest.
As I mentioned in the link at my site,
"These incidents are mostly due to bad business practices or ignorance. Considering that with a valid SSN other forms of identification can be obtained, this is a problem relevant to law enforcement and homeland security. It is also relevant to immigration agencies since it can be more evident in states and territories affected by undocumented people and illegal aliens such as California, Arizona, Texas, Florida, New York and Puerto Rico. Stealing SSNs is also an enabling crime since it can lead to identity theft or who knows what sort of terrorist or criminal activities."
"These type of combinations provide interesting possibilities for both law enforcement, profilers, private eyes and unfortunately, for some with criminal intentions. I strongly recommend university administrators, schools, churches, non-profit organizations, government agencies and companies to never assign and use SSNs for any sort of transaction. And how about the practice of using the last four digits of a social security number? Don't even think about it. With the first three digits (given by states prior to 1973) and the last four one only need to guess the two digits in the middle of a SSN."
Orion
dannysullivan
03-06-2006, 03:05 PM
Yep, it was the issue about commands. I just wanted to make it clear to people that Google doesn't provide some type of ssn: style command to find social security numbers. Absolutely, by crafting smart queries, you can find lots of information people probably don't realize can be located through Google or any search engine. It's long been a problem and one that webmasters continue to need to be aware of. Hope they learn from what you've posted.
orion
03-06-2006, 03:13 PM
I never published the work, which was ready to be sent to the Social Security Administration.
The problem, and my motivation for studying it, arose after 9/11. Search engines, users, webmasters, and web properties are all at fault. One can craft a search, add a state name, and find city hall, gov agency and court record documents, etc., containing incidents. (btw, "ssn:a" or any query term when double quoted and combined with keywords can be viewed as acting as a command)
Search security strategies are interesting from the academic, commercial, intelligence and practical standpoints. They show how exposed and fragile virtual and physical properties are.
Orion
orion
03-06-2006, 07:01 PM
Thank you for featuring the thread. I bet search security strategies or search intelligence in general could be a good topic for a SES.
Looking at my original work, I also compared queries of the form
ss:aDELgeolocation where
a = as mentioned, the first three digits
DEL = "-", "#", ":", etc.
geolocation = state name or abbreviation.
Note I start with ss (not with ssn). Test with/without quotes.
Intelligence searches are different from regular searches in the sense that one needs to know (a) where to search and (b) how to search.
(a) is not limited to search engines; one can also search university, enterprise, vertical portal collections, etc.
(b) deals with mapping/matching, i.e., when crafting an intelligence search think in terms of regular expressions and concept mapping, rather than general search combinations using natural language expressions.
With regard to geolocation tables, the SS Administration has stated, and I quote:
"Since 1973, social security numbers have been issued by our central office. The first three (3) digits of a person's social security number are determined by the ZIP Code of the mailing address shown on the application for a social security number. Prior to 1973, social security numbers were assigned by our field offices. The number merely established that his/her card was issued by one of our offices in that State."
That is, the first three digits of a U.S. Social Security number MAY or MAY NOT indicate the state or territory from which you applied for the number.
Such tables are available elsewhere. For example, JavaScript expert Danny Goodman has written an SS-geolocation lookup table script (The JavaScript Bible, 3rd edition, 1998, IDG Books Worldwide, Inc.; see also companion CD-ROM file \Scripts\ListWind\Chap48\ssn3.htm).
What surprises me is that one can still find gov, university and other web properties exposing such information (SSNs). Large sites are more susceptible.
Back when I researched this, Patrick P. O'Carroll, at the time Assistant Inspector General for Investigations, Social Security Administration (SSA), stated in Congressional testimony entitled "The Homeland Security and Terrorism Threat from Document Fraud, Identity Theft and Social Security Number Misuse":
"The issuance of SSNs and driver's licenses based on invalid documentation creates a homeland security risk, and any failure to protect the integrity of the SSN can have enormous consequences. Identity theft is the fastest-growing form of white-collar crime in the United States. Many expect that incidents of identity theft will more than triple from .5 million in 2000, to 1.7 million in 2005. While identity theft existed prior to the advent of the Internet, there is no question that in recent years, criminals have taken advantage of all of the readily available confidential information on the Internet. Some studies indicate that 10 percent of identity theft currently originates through the Internet. It is projected that by 2005 that number will rise to 25 percent. "
We are now in 2006 and still the problem persists.
Instead of the gov and search engines (e.g. USA Gov vs Google) squaring off in court over query logs, why not just take action and address the more obvious, bigger and symptomatic problems like this one?
Perhaps inviting concerned parties to a SES track about the subject could accomplish something.
Orion
orion
03-06-2006, 11:37 PM
Before moving to other types of search security subjects in this thread, here is a very disturbing report from a few days ago in USA TODAY relevant to SSNs
Social Security numbers found on state websites (http://www.usatoday.com/tech/news/internetprivacy/2006-03-02-social_x.htm)
"The disclosure of Ohio residents' Social Security numbers on the state government's website highlights what many privacy experts — and criminals — already know: Such information is readily available to anyone with an Internet connection."
"It is common for the websites of the USA's secretaries of state to contain personal information, including Social Security numbers (SSNs) and home addresses, in business statements. Besides Ohio, the data is available in New York, Florida and at least seven other states, say privacy experts who provided USA TODAY with links to public websites."
"When you have state agencies putting this stuff online, you are spoon feeding criminals valuable information," says Betty Ostergren, a privacy activist whose husband was a victim of identity theft in 1987 and 1989. "And they can be anywhere in the world — an Internet cafe in Pakistan or a library in Mexico."
"Because of the incidents, Ohio Secretary of State Ken Blackwell is under fire after The Cincinnati Enquirer reported this week that an unknown number of business filings posted on the state's website include the SSNs of filers. "
Great!
Now if the SSN is also your driver license ID number, then that's even worse.
Crafting regular expressions for getting on the fly specific chunks of data from public databases is not a difficult task and indeed has been documented.
I don't understand why someone would want to use authentication systems with such strings. But, yes, some webmasters, network administrators and human resource managers don't get it. The not so funny thing is that these are government sites.
Orion
orion
03-07-2006, 02:03 PM
I guess the title of this thread should be "Intelligence Search Strategies", "Smart Searches" or something like that since "Search Security" is a bit limited in scope, but that's OK.
When conducting intelligence searches one needs to know
(a) where to search
(b) how to search
As mentioned, one needs to use a variety of resources, in addition to traditional search engines. It also means thinking in terms of regular expressions when crafting queries, rather than searching using natural language searches.
When interpreting results, don't go by the total number of search results or by what the snippets show, as neither tells the whole story. Total counts do not equate to incidents, and snippets might not show incidents occurring in the landing document, which might or might not contain multiple incidents.
You would need to construct or have two harvesting systems. One to harvest snippets (a quick bot) and the other to harvest landing documents.
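To make the snippet-versus-landing-document point concrete, here is a small sketch that counts how often a pattern occurs in each. The pattern, the sample strings and the function names are assumptions chosen for illustration, and the code works on text you already have (for instance, documents from your own site) rather than fetching anything.

```python
import re


def count_incidents(pattern, text):
    """Count non-overlapping, case-insensitive matches of a pattern."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def compare(pattern, snippet, document):
    """Show how a snippet can under-report what the full document holds."""
    return {
        "snippet": count_incidents(pattern, snippet),
        "document": count_incidents(pattern, document),
    }


if __name__ == "__main__":
    pattern = r"vin\s+no\."
    snippet = "... vehicle listed under VIN No. ..."
    document = (
        "Item 1: vehicle listed under VIN No. (first entry). "
        "Item 2: second vehicle, VIN no. (second entry). "
        "Item 3: trailer, vin No. (third entry)."
    )
    print(compare(pattern, snippet, document))
```

Here the snippet shows one incident while the landing document holds three, which is why counting results alone says little about exposure.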
If the system you target has a predesigned command, that facilitates things. But what if the target does not have the one you need?
Construct one.
Craft a string in EXACT mode and include it in your FINDALL query. In some cases you could map it to geolocation data; in others, you need additional information, like date/time stamps, url stamps or name-value pair patterns. To illustrate, in the previous examples
"ss #", "ssn #" and derivatives were the "command" and the state code was the geolocation piece of information. k was a specific query.
So what type of searches other than SSN are possible? Well how about:
vehicle identification numbers (VINs)
firearm serial numbers
court docket numbers
real estate mortgages
loans and financial information
emails and WallStreet (ENRON Corpus, anyone?)
Let's limit this for now to VINs. Fortunately, Google already facilitates searching for VIN records (http://www.google.com/help/features.html#number). These are long identifiers in which the meaning of each character depends on its position, running from position 1 to 17, per ISO standards: 17 characters drawn from A-Z and 0-9 (the letters I, O and Q are not used). Sounds to me like a simple regular expression script can be written here to harvest results.
Positions 12-17 are the actual serial number and previous positions are manufacturer and geolocation data. There are many sites with tutorials explaining VIN positions. Just do the homework.
Several sites already provide tools for decoding these numbers and associated positions. These tools are trivial regular expression exercises, easy to create. But if you want to skip the effort, just use the ones provided at the AnalogX site (http://www.analogx.com/contents/vinview.htm) and CarFax (http://www.vehicleidentificationnumber.com/carfax2.html).
Here is a quick exercise. For the sample VIN number given at the AnalogX link, SCCFE33C9VHF65358, I obtained all this information (http://www.analogx.com/cgi-bin/cgivin.exe?Mode=Decode&VIN=SCCFE33C9VHF65358&Submit=Decode)
Vehicle Identification Number:
Position:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Character:  S  C  C  F  E  3  3  C  9  V  H  F  6  5  3  5  8
Information goes by Description / Position / Raw Data / Decoded Data
Description      Position  Raw Data  Decoded Data
Region:          1         S         Europe
Country:         1-2       SC        United Kingdom
Manufacturer:    2-3       CC        Lotus
Model Specific:  4-8       FE33C     Unknown
Check Digit:     9         9         Valid
Year:            10        V         1997
Assembly Plant:  11        H         H
Serial Number:   12-17     F65358    F65358
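The positional breakdown above is easy to reproduce. The sketch below splits a VIN into the same fields the AnalogX output shows and recomputes the check digit in position 9. Two things are assumptions not stated in the thread: the check-digit arithmetic is the weighted mod-11 scheme used for North American VINs, and the function and field names are made up for this example. Decoding region, manufacturer or year codes into names would need lookup tables that are not included.

```python
import re

# 17 characters: digits and capital letters, never I, O or Q.
VIN_SHAPE = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")

# Letter values and per-position weights of the mod-11 check-digit scheme.
LETTER_VALUES = dict(zip("ABCDEFGH", range(1, 9)))
LETTER_VALUES.update(zip("JKLMN", range(1, 6)))
LETTER_VALUES.update({"P": 7, "R": 9})
LETTER_VALUES.update(zip("STUVWXYZ", range(2, 10)))
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def check_digit(vin):
    """Compute the expected character for position 9."""
    total = sum(
        (int(c) if c.isdigit() else LETTER_VALUES[c]) * w
        for c, w in zip(vin, WEIGHTS)
    )
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def split_vin(vin):
    """Split a VIN into the positional fields listed in the table above."""
    vin = vin.upper()
    if not VIN_SHAPE.match(vin):
        raise ValueError("not a 17-character VIN")
    return {
        "region": vin[0],
        "country": vin[0:2],
        "manufacturer": vin[1:3],
        "model_specific": vin[3:8],
        "check_digit": vin[8],
        "check_digit_valid": check_digit(vin) == vin[8],
        "year_code": vin[9],
        "assembly_plant": vin[10],
        "serial_number": vin[11:17],
    }


if __name__ == "__main__":
    for field, value in split_vin("SCCFE33C9VHF65358").items():
        print(f"{field}: {value}")
```

For the sample VIN the weighted sum is 405, and 405 mod 11 is 9, which agrees with the "Valid" check digit reported by the tool.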
Limiting a search to one tool may not be enough.
If I want to know a bit more, say, the body style, I can copy and paste that VIN in the CarFax tool (here) (http://www.carfax.com/cfm/check_order.cfm?vin=SCCFE33C9VHF65358&partner=BFR_Q_DEC_27&zipcode=&ClickID=gYAlP6sM7s&SiteID=qIa2RA0iXB&PopUpStatus=0) and find out that it is a 1997 Lotus Esprit coupe.
Moreover, if I want, with the information already collected I can do a more comprehensive history search on the vehicle or combine the information with Docket document searches, loans and financial information searches. (never heard of private eyes outsourcing IR search services?)
Let's just hope that the driver license and the SSN are not the same, as is the case in some States and countries, or that a dumb network administrator did not use the Holy Grail Number for authentication purposes at an online web property or enterprise intranet.
I guess search security strategies and intelligence searches in general are inherent to the nature of the Web, so are here to stay.
Still at some point there must be an equilibrium state of social responsibility between Web properties and users. I just hope others wake up and understand the times we are living in. Don't put online what you don't want others to see. You and they will wake up only when incidents hit home.
Orion
orion
03-08-2006, 01:30 PM
Yahoo! too allows users to search for VIN numbers (http://help.yahoo.com/help/us/ysearch/tips/tips-01.html#VIN) (and even aircraft N-numbers), provided you already know the number you are searching for. But how about if you don't know the number? What if you just need to map results to specific needs? Well, as mentioned before you could craft a "command" and an intelligence search using the following search strategy and convention
"command" + geolocation + k
which is what we did for SSNs. The same technique could be used with VINs by using
"command" = "vin no.", "vin #", "vin:", etc
geolocation = state, city, etc.
k = keyword(s)
To illustrate, try in Yahoo "vin no." california honda (http://search.yahoo.com/search?p=%22vin+no.%22+california+honda&ei=UTF-8&fr=slv1-&x=wrt)
That will return many results to be harvested and data to be collected from snippets or landing documents.
Why would anyone want to collect such data? There are plenty of reasons and motivations.
VIN numbers can be used to map other car components. For instance, in one of the results returned from Yahoo using the above query many seem to be interested in unlocking radio codes (http://www.honda-tech.com/zerothread?id=1084720&page=5). Often with these and other components VIN numbers are used as reference points.
And what about that used car you bought a few months ago? What kind of previous owner's history could you map to or unlock? Why should you bother? Check with your local MVD and they can give you plenty of reasons to chew on. Believe that.
To illustrate, one of the results returned from Yahoo using the above query (result #12 at search time) links to an OSCN database result in which a VIN number is featured in a court case:
OSCN Found Document:State ... (http://www.oscn.net/applications/oscn/DeliverDocument.asp?citeID=15836)
Need more reasons? Try this "vin no." texas stolen (http://search.yahoo.com/search?p=%22vin+no.%22+texas+stolen&ei=UTF-8&fr=slv1-&x=wrt). Add some license plate data and craft the query that suits your needs.
SSN, VIN, etc, etc, the point to make is that on the Web specific pieces of data can be mapped to specific scenarios and paper trails by crafting smart searches. Unfortunately, this can include other unpleasant scenarios like bankruptcy, divorce, dui cases etc, etc. You just need to know how and where to search. Scanning intelligence from urls is not that hard to do.
Orion
orion
03-09-2006, 12:36 PM
Greedy Techniques
With intelligence searches, a greedy approach often comes in handy, especially when different versions of a "command" map to the same concept. This can be accomplished by using the OR operator. In Google, you can use a pipe ("|") as a shortcut. (However, keep in mind that Google's OR implementation has been questioned many times in the past.)
So, for the VIN case, one could try something like this:
"vin no."|"vin #"|"vin number" + geolocation + k
At this point of the thread, examples are not necessary.
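For anyone who would rather generate greedy strings than type them, a minimal sketch follows. The function name and argument layout are assumptions; it simply quotes each "command" variant, joins the variants with the pipe shortcut, and appends the geolocation and keyword terms.

```python
def greedy_query(commands, geolocation="", keywords=()):
    """Join quoted "command" variants with | and append the other terms."""
    quoted = "|".join(f'"{c}"' for c in commands)
    parts = [quoted, geolocation, *keywords]
    # Skip empty pieces so optional arguments leave no stray spaces.
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(greedy_query(["vin no.", "vin #", "vin number"],
                       geolocation="california",
                       keywords=["honda"]))
```

Leaving out the geolocation or the keywords gives the shorter "command" + k and "command"-only forms.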
Email Techniques
Searching with the "mailto" protocol as a "command" has been documented elsewhere. Being a protocol, it can be used with email addresses and link searches. For conducting intelligence searches on actual email document contents, however, it is not as effective as strings from email headers, from which we can try searches of the general form
"command" + geolocation/date-timestamp + k
"command" + geolocation
"command" + k
etc...
To illustrate, let's try the last one, "command" + k. Do this first.
1. Open your Outlook Express client, right click on any email from your inbox and select Properties from the pop-up menu (or select Properties from the File menu).
2. Click on the Details tab and the Message Source button. You should see the source code of the email. The top portion has the email headers section.
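If you save the message source to a file or paste it into a script, the header section can also be listed programmatically. This is a sketch using Python's standard email package; the raw message below is invented for the example (example.org and example.com addresses, a documentation IP range), and header_names is a hypothetical helper.

```python
from email import message_from_string

# Invented message source; the second line is a folded (continued) header.
RAW = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.com with ESMTP; Thu, 9 Mar 2006 12:00:00 -0500
Message-ID: <20060309.0001@example.org>
X-Mailer: ExampleMailer 1.0
From: sender@example.org
To: office@example.com
Subject: Test

Body text.
"""


def header_names(raw_message):
    """Return the header names in the order they appear in the source."""
    msg = message_from_string(raw_message)
    return [name for name, _value in msg.items()]


if __name__ == "__main__":
    print(header_names(RAW))
    print(message_from_string(RAW)["Message-ID"])
```

Running this against a message from your own mail system shows which header strings, including any branded X- headers, your organization emits.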
There are plenty of regexps to choose from to craft a smart search and feed customized extractors. I tried these in Google
"Message-ID:" office (http://www.google.com/search?q=%22Message-ID:%22+office&hl=en&lr=&start=10&sa=N)
"X-RCPT TO:" admin (http://www.google.com/search?q=%22X-RCPT+TO:%22+admin&hl=en&lr=&start=10&sa=N)
I used k = office and k = admin but you can try other generic term(s), a Date or time stamp, the name of a company or individual or combination of these.
Feel free to craft also searches with any of the following:
"received: from" ( or received-from: )
"x-original to:" ( or x-original-to: )
"delivered to:" ( or x-delivered-to: )
These types of smart searches, involving email header strings, can be used to identify spam sources and, ironically, for harvesting email addresses and the content of email documents. Unfortunately, from time to time one can also find email content that should not be online at all. As expected, large public and private organizations, schools, and gov agencies are more susceptible.
To be sure that email content from your company or organization is not featured in search engine results, craft a search involving your own organization. Better that you find any incident and fix the problem before a competitor finds it.
Don't assume this would never be the case based on the size or nature of your organization. Dumb or reckless employees are everywhere and mistakes happen all the time.
To illustrate the point, why would anyone want to have email content from ecommerce transactions (some intended to be in secured servers) delivered through newsgroup accounts? And indeed, such incidents do occur by mistake or intentionally.
At some point one may ask for some form of social responsibility guidelines or common sense best practices from all; from companies, organizations, search engines, web properties, webmasters, network administrators and users.
Let's make clear that the goal of this thread is not to encourage illegal behaviors but to ensure you are educated and understand the risk of not using common sense in your workplace.
If you never heard about these issues, you or your organization would probably be making the same mistakes. Whether such incidents are the result of ignorance, indifference or dumb business practices, only you know.
Orion
claus
03-09-2006, 01:44 PM
Nice posts. It also shows that you don't always want the search engines to index everything - sometimes you have to think about how to hide information from them instead. In this respect email addresses are a "classic" example, but still, to this date many people just post email addresses in the open.
Also, Johnny Long (j0hnny), the author of the book mentioned in the first post has a web site with more information (http://johnny.ihackstuff.com/). I don't have more to add right now, except - "don't try this at home" :) I just thought a link would be appropriate as he's been exploring this field for a few years now, AFAIK.
orion
03-10-2006, 02:58 PM
Other applications
Other possible searches are those involving
1. "command" = "account number:", "account number", etc
2. "command" = "case number:", "case no.", "file number:", "file no.", "file case", "lab report:", "docket number:", etc
followed by geolocation and/or k. Here k can be anything that would help to narrow the search, like "dui", "divorce", "bankruptcy", "sex offender", first and last names, etc, or a combination of these.
3. searching web properties that require disclosure of public records.
4. querying search engines for technical manuals and forums that explain enterprise implementations and flaws.
For instance,
1 are common regexps found in utility bills, phone bills and many other forms.
2 are common regexps found in medical, insurance, financial, legal and other scenarios.
1 and 2 don't need further examples, but feel free to try them.
Believe it or not, it is not out of the ordinary to find documents containing such pieces of data. One reason is that some blindly pull data from databases to fill in documents or include such information with emails and attachments. Spyware and desktop search bars do not necessarily help, either.
As for 3, here is a scary example involving SEC filings.
About a year ago The Wall Street Journal published the article Some Mutual Funds Reveal Clients' Data (http://www.privacytoday.com/wsj032305.htm) (reprinted by the PrivacyToday site). The article mentions
"Some industry executives blame a fairly simple mistake: In putting together disclosure statements, fund companies or their outside administrators sometimes pick up account ownership information from a computer database. It often includes the customer's bank, mutual-fund or brokerage-firm account number."
And anyone querying a search engine for relevant regexps can find them. Specific incidents are quoted in the article, showing that even such information is available via SEC filings.
Incidents derived from Point 4, above, occur in part because some IT administrators cannot resist discussing their own enterprise implementations and flaws online.
Consider the case of medical databases with their PID, PV1 (http://www.medinfo.rochester.edu/hl7/v2.3/ch300076.htm) implementers' specifications or PDQ documentation (http://72.14.207.104/search?q=cache:SoZM0fopZAAJ:secure.cihi.ca/cihiweb/en/downloads/v2xIntro_e+%22PID.5%22+patient&hl=en&gl=us&ct=clnk&cd=14) in general. Such technical manuals even give outsiders how-to tips (http://64.233.179.104/search?q=cache:2mNA5wt7JjUJ:www.ihe.net/Technical_Framework/upload/IHE_ITI_Patient_Demo_Query_2004_08-15.pdf+%22PID-18%22+patient&hl=en&gl=us&ct=clnk&cd=10):
"@PID.5.1^SMITH@PID.8^F" requests all patients whose family name (first component of PID-5-Patient Name) matches the value SMITH and whose sex (PID-8-Sex) matches the value ‘female’."
HL7 Discharge Referral Message (http://www.ciap.health.nsw.gov.au/hospolic/gp/docs/Dis-Ref%20Implementers%20Specs%20V0.41.doc) has a lot of regexps to chew on. If an architecture leaks urls containing such strings, that might be indicative of the nature of the http requests.
If an administrator configures an enterprise to recognize specific query patterns but not others, why then discuss the how-to and implementation flaws (http://ihe.rsna.org/showthread.php?threadid=396&goto=nextnewest) in the open? Even if not illegal, how would that improve security?
Often improving security is a matter of common sense practices in the workplace.
Orion
orion
03-11-2006, 04:09 AM
Email Headers Revisited
Searches involving email headers are more powerful than those involving "mailto".
These headers can be used to identify email/ip addresses of spammers, for example by testing headers like
"x-mailer", "x-declude-sender" and those previously mentioned.
Unfortunately, when querying email headers one can retrieve all sorts of email addresses through search engines, in addition to the actual content of such emails.
To top it off, since some web properties use their own email headers, querying a search engine with branded headers has the detrimental effect of exposing the identity of users, subscribers or associates of such properties, as in:
"X-gmail-received" (http://www.google.com/search?num=100&q="x-gmail-received")
"x-yahoo-group" (http://www.google.com/search?num=100&q="x-yahoo-group")
"x-yahoo-profile" (http://www.google.com/search?num=100&q="x-yahoo-profile")
"x-amazon-track" (http://www.google.com/search?num=100&q="x-amazon-track")
"x-amazon-gauge" (http://www.google.com/search?num=100&q="x-amazon-gauge")
"x-amazon-corporate-relay" (http://www.google.com/search?num=100&q="x-amazon-corporate-relay")
Try also with the user groups or newsgroup sections of search engines. These headers uncover all sorts of possibilities for competitors, spammers and others with questionable intentions.
Naive Strategies
Exposing users associated with a particular web property does not require complex queries. Indeed, naive queries can do the trick.
Some of these are so naive that one might think "Nah, you won't get anything from that". Wrong!
A naive query of the form "command" + k like
"userid=*" + k
"userid #" + k
"userid#" + k
"associate id" + k
"associate-id" + k
"associate id=*" +k
etc, etc, can produce surprising results. In fact, even without defining k you can get interesting records, as in
"userid=*" (http://www.google.com/search?num=100&q="userid=*")
"userid #" (http://www.google.com/search?num=100&q="userid #")
"userid#" (http://www.google.com/search?num=100&q="userid#")
"associate id" (http://www.google.com/search?num=100&q="associate id")
"associate-id" (http://www.google.com/search?num=100&q="associate-id")
"associate id=*" (http://www.google.com/search?num=100&q="associate id=*")
If you want to include a k with these, feel free to try any of the following:
k = mil, gov, dhs, edu, amazon, ebay, hotmail, etc.
Combined Strategies
If you want to combine strategies, there are many ways to proceed. For instance, if you use Google, set num=100 in the query string by pointing your browser to
(1) for general searches, to: http://www.google.com/search?num=100&q="command" + k
(2) for group searches, to: http://www.google.com/groups?num=100&q="command" + k
with "command" and k as previously defined. This should return a maximum of 100 results per page.
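The two URL forms above can also be assembled programmatically, which takes care of percent-encoding the quotes and other special characters. What follows is only a minimal Python sketch: the helper name build_query_url and the sample term example.org are illustrative assumptions; only the /search and /groups paths and the num=100 parameter come from this post.

```python
from urllib.parse import urlencode

GOOGLE = "http://www.google.com"  # host taken from the URLs in this post


def build_query_url(command, k="", section="search", num=100):
    """Return a query URL for "command" + k (hypothetical helper).

    section is "search" for general searches or "groups" for group searches.
    """
    if section not in ("search", "groups"):
        raise ValueError("section must be 'search' or 'groups'")
    # The command is wrapped in double quotes, as in the examples above.
    query = '"' + command + '"'
    if k:
        query += " " + k
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return GOOGLE + "/" + section + "?" + urlencode({"num": num, "q": query})


print(build_query_url("userid=*", "example.org"))
# http://www.google.com/search?num=100&q=%22userid%3D%2A%22+example.org
print(build_query_url("associate id", section="groups"))
# http://www.google.com/groups?num=100&q=%22associate+id%22
```

An administrator could point such a helper at their own domain to see what a stranger would find.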
If you want to combine naive and greedy strategies, then use something like
"associate id=*"|"associate id#" + k
"userid=*"|"userid #"|"userid#" + k
Define k as above or to your heart's content.
There might be other scenarios and applications, but the general approach is more or less the same.
The bottom line: sometimes there are certain things you don't want search engines to index, so don't put such stuff online to begin with.
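On the defensive side, one way to act on this is to scan a document for the kind of phrases these queries target before it goes online. The sketch below is an assumption-laden illustration, not a complete filter: the term list is a small sample drawn from the queries in this thread, and the name find_risky_terms is hypothetical.

```python
# Sample of phrases targeted by the naive queries in this thread.
# The list is illustrative only and far from exhaustive.
RISKY_TERMS = [
    "userid=", "userid #", "userid#",
    "associate id", "associate-id",
    "your username is", "your password is",
]


def find_risky_terms(text):
    """Return the risky terms that occur in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in RISKY_TERMS if term in lowered]


draft = "Welcome aboard! Your Associate ID = 4471 and your password is on the memo."
print(find_risky_terms(draft))
# ['associate id', 'your password is']
```

A plain substring check like this will miss reworded phrases, so it complements common sense rather than replacing it.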
I hope this discussion has helped some to understand the gravity of the issues presented here. Search security in a corporate environment, the workplace or at home is often a matter of common sense.
This concludes my presentation on the subject. Thank you for your attention.
Orion
sebastian
03-11-2006, 04:22 PM
This is an amazing thread.
My faith in SEW is restored by this one thread. New or lesser known issues like these are what drive our industry to further growth and keep us fresh with new information/techniques with which to protect our clients.
I was able to find passwords all over Google doing "password" searches for files that were of type .xls
I always knew spammers were using engines to mine email addresses:
Example of "email" search of file type .xls on Google (http://64.233.179.104/search?q=cache:GvfzMz7yOikJ:oire.utpa.edu/Facilitators/FacilitatorTeams.xls++%22email%22+filetype:xls&hl=en&gl=us&ct=clnk&cd=3)
but never thought to think through the other possible security issues...
Orion - great post, man ...really good information.
AussieWebmaster
03-11-2006, 08:22 PM
another amazing thread orion.... I always learn a bunch from your threads... keep teaching....
orion
03-12-2006, 03:28 AM
Thanks.
Sorry, I omitted these notes.
Discovering specific headers
When we want to test specific headers but don't know where to start, a minimalistic approach comes in handy. This can be accomplished by starting with short headers, as in
1. "x-rcpt" - can yield "x-rcpt-from" and "x-rcpt-to"
2. "x-envelope" - can yield "x-envelope-from", "x-envelope-to", "old-x-envelope-to" etc
3. "x-virus" - can yield "x-virus-scanned", "x-virus-checked", "x-virus-status", etc.
4. "x-pgp" - can yield "x-pgp-fingerprint", "x-pgp-rsa-fingerprint", "x-pgp-sig", "allow-x-pgp-sig", "require-x-pgp-sig", etc.
5. "x-spam" - can yield "x-spam-checker-version", etc
Searches using these headers plus a k are very useful when k is a topic, company, agency, a name or combination of these. Try with
k = virus, paypal, ebay, osama, iraq, irak, linux, cisco, debian, etc.
Searches involving these headers provide useful information regarding the degree (or lack) of protection of specific configurations or of individuals associated with such architectures or web properties.
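To see how a short header prefix fans out into longer header names, an administrator can run the same idea against a message from their own mail system. The following is a rough sketch using Python's standard email module; the function name headers_with_prefix and the sample message (addresses, header values) are made up for illustration.

```python
from email import message_from_string


def headers_with_prefix(raw_message, prefix):
    """Return the header names in raw_message that start with prefix."""
    msg = message_from_string(raw_message)
    wanted = prefix.lower()  # header names are matched case-insensitively
    return [name for name in msg.keys() if name.lower().startswith(wanted)]


# Hypothetical message; every value below is invented for the example.
sample = (
    "From: alice@example.org\n"
    "X-Virus-Scanned: by hypothetical-scanner\n"
    "X-Spam-Checker-Version: 0.0 (made-up value)\n"
    "X-Virus-Status: Clean\n"
    "\n"
    "Body text.\n"
)

print(headers_with_prefix(sample, "x-virus"))
# ['X-Virus-Scanned', 'X-Virus-Status']
```

Any header listed this way is one that would become searchable if the message ended up in a public archive.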
A similar approach can be used with branded headers, as in
1. "x-ebay" (http://www.google.com/search?num=100&q="x-ebay") - can yield "x-ebay-mailtracker", etc.
2. "x-paypal" (http://www.google.com/search?num=100&q="x-paypal") - can yield "X-paypal-mailtracker", etc.
3. "x-amazon" (http://www.google.com/search?num=100&q="x-amazon") - can yield "x-amazon-track", etc.
4. "x-debian" (http://www.google.com/search?num=100&q="x-debian") - can yield "x-debian-pr-message", etc.
allowing competitors and others to collect all sorts of intelligence (emails, ip addresses, names, email content, decisions taken, events, paper trails, etc.) and to associate user interactions/happenings with specific architectures or web properties.
Automation of the intelligence searches discussed here is possible via customized extractors and is not that hard to do. In fact, preformatted reports can be obtained in a matter of seconds.
Now I'm done. R.I.P.
Orion
orion
03-18-2006, 03:37 PM
I thought the previous R.I.P. was well deserved. However, the more one digs, the more one realizes this is not the case.
The same day (March 12) I stated that this series of posts had run its course, the Chicago Tribune (http://www.chicagotribune.com/news/nationworld/chi-060311ciamain-story,1,123362.story) published an embarrassing report in which they were able to uncover CIA "secrets" and more than 2,600 CIA employees by just searching across the Internet.
This illustrates that smart searches are not limited to search engines. One can also uncover a lot of data from dating services and job search sites, college campus websites, newspaper & court sites, etc.
Additional Notes
Along the same lines, here are some additional risks I feel deserve to be pointed out to web properties, search engine users, investigators and the public. The piece should be read twice, since its exposition is deliberately a bit convoluted: it overlaps with many different areas already discussed.
First, let me start with smart searches and their risks. When searching, the searcher needs to understand that intelligence queries are different from regular queries in the sense that
1. when crafting a query, we are not concerned with natural language expressions, popular term usage (keyword popularity), or even default query commands, but with devising regular expressions that allow one to conduct concept mappings. This is why crafting pseudo commands works so well.
2. document relevancy is not based on matching queries to terms, but on the degree of metadiscovery. A document is considered relevant if it uncovers contextual information that can be used to discover new information. Here we are talking about high-order relevancy or in-transit relevancy. This is why naive and greedy techniques can produce surprising results.
3. even if the discovered data is old or outdated, it can provide important clues about past records or paper trails relevant to an investigation.
More generally, smart searches can be applied to areas such as encryption environments, crawls, taxes, loans, and even more obscure ones like city permits, city hall/senate minutes, and gambling and liquor licenses. Try with
"x-google" (http://www.google.com/search?num=100&q="x-google")
"x-md5" (http://www.google.com/search?num=100&q="x-md5")
"set-cookie" (http://www.google.com/search?num=100&q="set-cookie")
Here
queries involving "x-google" can return data scenarios containing "x-google-crawl", "x-google-token", etc.
queries involving "x-md5" can return data scenarios containing hashed data, to which md5 decoders can be applied.
queries involving "set-cookie" can yield user, domain, and path values.
How about tax scenarios? Well, consider this.
Many non-profit and for-profit organizations publish their TAX ID numbers online. This is often done to encourage and simplify contributions from the public.
Thus, a name-to-TAX-ID mapping might be in principle possible by simply searching for an organization's name.
The reverse process, queries leading to a TAX-ID-to-name mapping, is also possible. An example is given below using Google general searches
"tax id" + california (http://www.google.com/search?num=100&q="tax id" + california)
A similar mapping can be done for organizations that use a person's name as in
"tax id" + smith (http://www.google.com/search?num=100&q="tax id" + smith)
These two queries SHOULD NOT return incidents associated with private taxpayers, unless they or others were stupid enough to make their tax information available online, or unless, by accident or for some equally stupid reason, the information ended up in a search engine index. Given that some gov sites have exposed SSNs in the past, either by accident or because some dumb network administrators used them for authentication purposes, I would not be surprised to find such incidents.
The point to make here is that results retrieved from smart queries are frequently surrounded by useful contextual intelligence data. To convince yourself, check any of the above smart query results and draw your own conclusions.
Do not limit your search experience to Google. To get the most out of the serps, set the url queries to the maximum possible number of results, as follows
in Google, set num=100
in Yahoo, set n=100
in MSN set count=100
in Altavista set nbq=100
in Gigablast set n=200
However, all these max serp settings might not work in some sections of a given search engine or directory. Also note that nbq=100 overrides the limit of 50 in Altavista and n=200 overrides the limit of 100 in Gigablast. Obviously these are the result of current setting specification flaws.
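The settings above boil down to a small lookup table. Here is a minimal sketch of that table in Python; the parameter names and values are the ones listed in this post (as of 2006, so they may no longer work), while the function name query_string and the use of q as the query parameter for every engine are assumptions made for the example.

```python
from urllib.parse import urlencode

# Results-per-page parameter and value for each engine, as listed above.
MAX_RESULTS = {
    "google": ("num", 100),
    "yahoo": ("n", 100),
    "msn": ("count", 100),
    "altavista": ("nbq", 100),
    "gigablast": ("n", 200),
}


def query_string(engine, query, query_param="q"):
    """Build the query string for engine with its max-results setting.

    query_param defaults to "q", which is only known to be right for
    Google here; other engines may use a different name.
    """
    name, value = MAX_RESULTS[engine.lower()]
    return urlencode({query_param: query, name: value})


print(query_string("google", '"tax id" california'))
# q=%22tax+id%22+california&num=100
print(query_string("Gigablast", "smith"))
# q=smith&n=200
```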
As mentioned in one of the posts, do not go by snippet entries alone, as the landing document might contain incidents not shown in the snippets, or might contain or link to multiple incidents and other in-transit intelligence data.
Also, do not limit searches to general queries in Google. Check Google UncleSam (http://www.google.com/unclesam?num=100&q=%22tax+id%22+california), Google Scholar, Blogsearch, Local, etc. as well. I found searching in the Newsgroups or Groups (http://groups.google.com/groups?num=100&q=%22tax+id%22+california) sections of search engines really revealing (for gathering emails, phones, and all sorts of data).
Browser-based software for producing smart search reports and for analyzing sites was developed months ago and is currently being tested.
Orion
AussieWebmaster
03-20-2006, 12:49 PM
Interesting results on the Groups link.... wonder if Tom Cruise and John Travolta know the Tax Id of the Scientologists in Cal is up for grabs....
orion
03-20-2006, 02:53 PM
Great catch, aussie.
Replace california with the desired term(s).
Amazing the stuff people put online and incredible the retrieval power of Google.
Orion
orion
03-20-2006, 05:46 PM
Or replace "tax id" with "ssn #" using
http://www.google.com/groups?num=100&q="command" + k
where
command = "ssn #"
k = your desired term(s)
Some records might show multiple incidents. I hope they remove that from the Web. I'm not sure why some want to facilitate that information online.
Orion
AussieWebmaster
03-20-2006, 06:03 PM
I think threads like this will bring these faults to their attention.....
orion
03-20-2006, 06:05 PM
All this raises serious privacy issues and concerns.
Along that line, publication of certain information can be of value to law enforcement, but might also irritate rights advocacy groups or later be found to be an unnecessary exposure. To illustrate, the query in Google Groups
"ssn #" inmate (http://groups.google.com/groups/search?num=100&q=%22ssn+%23%22+inmate)
exposes many SSNs of inmates to the public. Same if you do a general search in this or other search engines, not just Google.
While I'm not sure who might have reasons to publish or use those SSNs, one cannot ignore arguments against their publication, especially if later on the individual is vindicated in a court of law.
And now that I'm on the subject, how about the following queries
"FBI number" inmate (http://groups.google.com/groups/search?num=100&q=%22FBI+number%22+inmate)
"FDLE number" inmate (http://groups.google.com/groups/search?num=100&q=%22FDLE+number%22+inmate)
Compare results with Google Search (http://www.google.com/search?num=100&q=%22FBI+number%22+inmate) and Google UncleSam (http://www.google.com/unclesam?num=100&q=%22FBI+number%22+inmate). These list more comprehensive research resources and pseudo commands to chew on.
What do you think?
Orion
orion
03-22-2006, 05:59 PM
Whois Intelligence
Querying a domain registrar company is the simplest way of mapping a domain name to a specific piece of information.
The reverse case, mapping a specific piece of information to a domain name is also possible.
The following queries in Google Groups do the trick. Here I use a pseudo command and geolocation, but you can set k to your heart's content or not use a k at all; then compare results. In all cases, pay attention to the in-transit information that is discovered.
"registrant id" arizona (http://www.google.com/groups?num=100&q="registrant id" arizona)
"domain id" arizona (http://www.google.com/groups?num=100&q="domain id" arizona)
Once discovered, the new piece of information can be used as input for other intelligence searches.
Postal Mail Box Intelligence
Similarly, to map PMBs to other data, try something like
"PMB #" texas (http://groups.google.com/groups/search?num=100&q=%22PMB+%23%22+texas)
or try with k = zipcode, phone numbers, last names, etc.
These types of searches expose data that can be harvested by serious investigators and, unfortunately, by marketing spammers and hackers. Such bits of discovered data are not trivial and can be used by these and others when gaming someone at the other end via the "buying of confidence" game. Thus, a good human resource administrator might want to train customer support staff in proper conversation detection/profiling techniques.
Orion
AussieWebmaster
03-23-2006, 11:00 AM
Keep this thread going mate, it is providing invaluable information.
orion
03-23-2006, 05:35 PM
Yes, but for whom? :rolleyes:
The thing about IR and search engine platforms is that these level the field for all types of information seekers: the good guys (law enforcement, gov analysts) and the bad guys (spammers, hackers, stalkers, etc).
Intelligence from urls
Another great source of information is searches conducted on specific document identifiers (titles, urls, etc). In Google, fortunately, we can use the true command allinurl and then combine it with some of the above queries. Here are some examples
1. allinurl:"bin/passwd" (http://www.google.com/search?num=100&hl=en&lr=&q=allinurl%3A%22bin%2Fpasswd%22)
2. allinurl:"username=*" (http://www.google.com/unclesam?num=100&q=allinurl%3A%22username%3D*%22)
1 finds bin authentication architectures.
2 should match user names. Try also with password.
My advice: Think twice about, or avoid altogether, passing name=value pairs between pages when these might contain sensitive information. Such urls can find their way into a searchable index.
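A site owner can check their own links for this problem before a crawler does. The following is a small Python sketch under stated assumptions: the set of parameter names is a sample chosen for illustration, and sensitive_params is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qsl

# Illustrative sample of parameter names that should not travel in a URL.
SENSITIVE_NAMES = {"username", "userid", "password", "passwd", "ssn"}


def sensitive_params(url):
    """Return the query parameter names in url that look sensitive."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return [name for name, _ in pairs if name.lower() in SENSITIVE_NAMES]


print(sensitive_params("http://example.org/page?username=jdoe&lang=en&Password=x"))
# ['username', 'Password']
```

Values flagged this way are better sent in a POST body or kept in a server-side session, so that they never appear in a link that can be indexed.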
Intelligence from incident reports
Whenever possible, an administrator with common sense should not make incident reports available online, especially if these interface with security.
I will illustrate this with Google University, but you will get the idea of why making information available can be risky business. University network administrators, like other administrators, might want to read this.
All kinds of resources from universities listed in Google University are available online. Due in part to bureaucracy and contradictory guidelines, some univ sites expose information that common sense dictates should not be in the public domain, like usernames, calls to action, remedies, etc. For instance, try searches of the form
"http://www.google.com/univ/uni?num=100&q=k"
where
uni = university listed in Google University
k = "sos system", "incident report", "police report", "crime report", etc.
For illustration purposes consider the following case. If the url is old, you might want to click the Google Cached link.
sos system (http://www.google.com/univ/asu?num=100&q=%22sos+system%22)
And now that I'm on the university topic: why are some universities selling student information (http://www.asu.edu/registrar/general/ferpaFAQ.html#12) in batches, several times per academic year? Isn't that another source of intelligence gathering? And even if it is not illegal, who is buying or eventually reselling it?
Orion
AussieWebmaster
03-23-2006, 06:22 PM
Without a doubt some of the novice hackers may learn something but realistically it is a better wake up to novice webmasters
orion
03-23-2006, 11:46 PM
Well, if we revisit the above ASU link, and many of the previous links, savvy and seasoned webmasters and administrators are responsible as well. In their case one can often see the consequences, not of lack of knowledge, but of bad judgement.
Often experts are very good at assessing situations but are very bad at dealing with specifics.
Now that I mention ASU, they claim they sell student information as little as 4 times a year. Considering the traditional academic sessions (fall, spring, summer I and II), that averages one sale per session.
Universities are hypocritical on this issue. It is not a secret that many of these are the largest generators and sellers of information. Check search results for
universities "selling student information" (http://www.google.com/search?num=100&q=universities+%22selling+student+information%22)
Thus, it is not a surprise that so many well known companies have alliances with universities to "manage" "student information" (http://news.google.com/news?num=100&tab=wn&ie=UTF-8&q=%22student+information%22)
Orion
orion
03-24-2006, 04:14 AM
Advanced Intelligence Searches
Advanced searches (http://www.searchengineshowdown.com/features/google/) in Google using the wildcard * are well documented. These allow one to conduct string range searches. To ensure selectivity, submit these in EXACT mode by double quoting them. Thus,
"k1 * k2" retrieves a sequence starting with k1 and ending with k2. Here k1 and k2 can be words, single characters or a combination of these. Multiple instances of * in a query act as placeholders. Try with
1. words; as in
"a * mischief" (http://www.google.com/search?num=100&q="a * mischief")
"a * * mischief" (http://www.google.com/search?num=100&q="a * * mischief")
"a * * * mischief" (http://www.google.com/search?num=100&q="a * * * mischief")
etc.
2. characters; as in
"a * s" (http://www.google.com/search?num=100&q="a * s")
"a*s" (http://www.google.com/search?num=100&q="a*s")
"1 * 8" (http://www.google.com/search?num=100&q="1 * 8")
"1*8" (http://www.google.com/search?num=100&q="1*8")
etc.
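To get a feel for how the placeholders behave, the pattern can be emulated locally with a regular expression. This is only a rough sketch and not a description of how Google actually evaluates the query: it assumes that each standalone * stands for exactly one whitespace-delimited token, and the name placeholder_regex is hypothetical. A * glued to other characters, as in "a*s", is treated as a literal here.

```python
import re


def placeholder_regex(query):
    """Compile a "k1 * k2" style query into a regular expression.

    Assumption: each standalone * matches exactly one token.
    """
    parts = [r"\S+" if token == "*" else re.escape(token)
             for token in query.split()]
    # The lookarounds keep the match from starting or ending mid-word.
    return re.compile(r"(?<!\w)" + r"\s+".join(parts) + r"(?!\w)", re.IGNORECASE)


text = "It was a little mischief, nothing more."
print(bool(placeholder_regex("a * mischief").search(text)))    # True
print(bool(placeholder_regex("a * * mischief").search(text)))  # False
```

Adding a second * fails on the sample text because only one token sits between "a" and "mischief", which is the sense in which each * acts as a placeholder.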
The good thing about this type of search is that it allows one to search for specific nuggets of information when one has only partial knowledge of what to search for. For instance,
"admin@*.com" (http://groups.google.com/groups?num=100&q=%22admin%40*.com%22)
retrieves documents containing a string that starts with "admin@" and ends in ".com". Great for discovering admin email addresses ending in ".com".
However, since this is a greedy technique it can also match non relevant strings, as in
"admin@*.it" (http://groups.google.com/groups?num=100&q=%22admin%40*.it%22)
To make it more selective refine either k1 or k2 or both. An example is given below.
"mailto:admin@*.it" (http://www.google.com/groups?num=100&q=%22mailto:admin%40*.it%22)
This tells us that it is possible to combine these types of searches with all sorts of smart searches, but a tradeoff between greediness and specificity is required. Otherwise we might end up retrieving noisy results.
Orion
orion
03-24-2006, 05:32 PM
Proceeding in this way the query in Google Search
"mailto:*@google.com" (http://www.google.com/search?q=%22mailto:*%40google.com%22&num=100&hl=en&lr=&start=0&sa=N&filter=0)
discovers email addresses ending in "@google.com"
Submitting the query through the software described in post #17 generates the corresponding preformatted reports of email & ip addresses, and all sort of in-transit data. The tool effectively turns any search engine SERPs into an intelligence generator.
Similar reports can be obtained from all sorts of searchable web properties. Thus, scanning intelligence from urls becomes a one-click-away task.
Orion
orion
03-25-2006, 04:23 AM
Practical Applications of Advanced Intelligence Searches: Clustering Results by Commonalities
Here I want to present several applications of using the wildcard with intelligence searches. To ensure secondary pages from a given property are displayed, set filter=0 in the query url when querying Google.
Common phone numbers
The following example discovers phone numbers with a common area code and ending numbers.
"(602) * 7122" (http://www.google.com/search?num=100&filter=0&q=%22%28602%29+*+7122%22)
Common file urls
The following example uses allinurl to discover file urls with a common ending.
allinurl:"/login/*index.html" (http://www.google.com/search?num=100&filter=0&q=allinurl%3A%22%2Flogin%2F*index.html%22)
allinurl:"/passwords/*index.html" (http://www.google.com/search?num=100&filter=0&q=allinurl%3A%22%2Fpasswords%2F*index.html%22)
You get the idea.
Common domain email addresses
The following example discovers email addresses from a given domain.
This is a great way for conducting intel on competitors. Again, note that in the query url I set filter=0 in order to display prospective secondary pages from a given web property.
"mail:*ibm.com" (http://www.google.com/search?num=100&filter=0&q=%22mail%3A*ibm.com%22)
"mail:*whitehouse.gov" (http://www.google.com/search?num=100&filter=0&q="mail:*whitehouse.gov")
"mail:*cia.gov" (http://www.google.com/search?num=100&filter=0&q=%22mail%3A*cia.gov%22)
"mail:*mil.gov" (http://www.google.com/search?num=100&filter=0&q=%22mail%3A*mil.gov%22)
"mail:*ebay.com" (http://www.google.com/search?num=100&filter=0&q=%22mail%3A*ebay.com%22)
Here I use k1=mail, but you can use k1=email or k1=mailto. To ensure all possible matches are covered, I recommend using a greedy technique, for instance by setting
"mail:*google.com"|"email:*google.com"|"mailto:*google.com" (http://www.google.com/search?num=100&q="mail:*google.com"|"email:*google.com"|"mailto:*google.com")
In my next posts I will explain how we can use smart searches with on-topic analysis to, for instance, optimize web documents. I'll not get too much into the details to avoid a diversion from this thread. I just want to mention it to show the importance of crafting these types of searches.
Orion
orion
03-26-2006, 11:14 PM
I've changed my mind. Since there is already an On-Topic Analysis thread (http://forums.searchenginewatch.com/showthread.php?p=53176#post53176), I posted at that thread on optimizing doc content with smart searches. So, let's change the subject to
Smart Searches and Copyright Security
In addition to the many excellent products commercially available, smart searches can be used to detect plagiarism (copyright infringement). There are many reasons for using smart searches for testing copyright security.
Teachers, school administrators and commercial sites always struggle with cases of plagiarism. According to the University of Kentucky, the most common forms of copyright infringement (http://www.chem.uky.edu/courses/common/plagiarism.html#Examples) in schools involve:
1. Direct copying from original sources
2. Direct copying from original sources, but with footnotes
3. Rewording a sentence (paraphrasing)
4. Borrowing organization
5. Submitting someone else's work
6. Failing to reference/footnote source material
The ugly face of plagiarism can also show up in a marketing environment.
Imagine the waste of time, effort, and budget invested in crafting a catchphrase, strapline or slogan for a TV, radio or website ad, only to find out that it was already available online. Imagine also that afterwards someone intentionally uses your original material without your permission or without giving you credit.
So, how could we use smart searches to address copyright security?
Well, querying your candidate line in EXACT mode is one way. However, this will not work if someone has reworded or tweaked your material to "walk the thin line".
You might want to try the * operator to check who is using your line or whether the candidate line is already owned by someone (though for the latter, alternate solutions (http://www.adslogans.co.uk/) are available online). If necessary, use several * as placeholders with your slogan.
For instance, suppose that you invented and trademarked the phrase "a rose is a rose" (ironically, a common phrase found in docs that discuss duplicated content). Let's say you want to find out if someone has copied your material.
In addition to searching for
"a rose is a rose"
you could also search for
"a rose is * rose", "a rose * rose", "a rose * * rose", "a * is * a rose", etc.
You get the idea.
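Writing out such variants by hand gets tedious for longer slogans, so they can be generated. The sketch below is one simple way to do it, assuming it is enough to swap a single interior word for * at a time; the function name wildcard_variants is hypothetical, and the multi-placeholder forms shown above are not covered.

```python
def wildcard_variants(phrase):
    """Return quoted variants of phrase with one interior word replaced by *.

    The first and last words are kept so that each query stays anchored.
    """
    words = phrase.split()
    variants = []
    for i in range(1, len(words) - 1):
        candidate = words[:i] + ["*"] + words[i + 1:]
        variants.append('"' + " ".join(candidate) + '"')
    return variants


for query in wildcard_variants("a rose is a rose"):
    print(query)
# "a * is a rose"
# "a rose * a rose"
# "a rose is * rose"
```

Each printed line can be pasted into a search box as is, since the double quotes are part of the query.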
Here are several exercises.
1. Check in press release databases for possible instances of plagiarism (assuming these support "" or * operators)
2. Identify a TV ad or radio ad in which a proprietary slogan is used. Then do a smart search for this in Google. Check both organic and paid results to see if someone is using the slogan.
Orion
Sarclaimer = Sarcastic disclaimer: I'm just using the above as a demonstrative scenario, certainly not claiming that current Google serps for "a rose is a rose" are the result of plagiarism. I hope nobody copies, registers or trades "sarclaimer" after me. Wishful thinking.
orion
03-28-2006, 04:07 AM
I think threads like this will bring these faults to their attention.....
Most definitely. SSA is already paying attention. They also visited me.
VISITOR ANALYSIS
Referring Link No referring link
Host Name
IP Address 199.173.226.228
Country United States
Region Maryland
City Halethorpe
ISP Social Security Administration
VISITOR SYSTEM SPECS
Browser Firefox 1.5.0
Operating System Windows XP
Resolution 1152x864
Javascript Enabled
Let's hope they fix the many things wrong with the publication of SSNs across the Web.
Orion
orion
04-11-2006, 11:51 PM
It's interesting what a mere search for a given ip address can give you. Using the logged IP from the SSA visitor I got this in Google for:
"199.173.226.228" (http://www.google.com/search?num=100&q="199.173.226.228")
I got more than 800 results. What are the odds that more than one "user" is coming from the same ip address, or that some employees at SSA.gov are using their web property for activities unrelated to their work? Gov employees using taxpayer resources for things unrelated to their workplace?
Like this one (http://www.busconversions.com/newsboard/articles/82939.html)?
Or this one from this Google cache (http://72.14.203.104/search?q=cache:uWVB9yPR8PsJ:www.pangasinan.org/Htm_bulresult/stamariaresult.html+%22199.173.226.228%22&hl=en&ct=clnk&cd=35)?
In this last case, scroll down (way down) to find the highlighted ip in the cached page. Interesting contextual information one discovers.
And how about that visit from DoJ I received several days ago?
VISITOR ANALYSIS
Referring Link No referring link
Host Name wdcsun18.usdoj.gov
IP Address 149.101.1.118
Country United States
Region District Of Columbia
City Washington
ISP Us Dept Of Justice
VISITOR SYSTEM SPECS
Browser Netscape 7.2
Operating System Windows XP
Resolution 1024x768
Well, a search in Google for:
"149.101.1.118" (http://www.google.com/search?num=100&q="149.101.1.118")
reveals some in-transit information that can be used to expose or at least go after some identities "visiting" blogs and sites all over the Web.
Too many results? No problem. Try this:
"149.101.1.118" + justice (http://www.google.com/search?num=100&hl=en&lr=&q="149.101.1.118"+justice)
That's the real power of search engines: the ability to discover contextual data to further datamine new results.
To sum up, search security is something to pay attention to. It is not a trivial thing.
At some point, it would be nice to see a summit discussion with all concerned parties, and I mean the gov, the academic sector and search engine companies, on the role of Internet searches within a homeland security or legal environment. If that is not possible, I will settle for the social responsibility of a search engine utility.
Let's hope all concerned parties pay attention to the issues I have raised here.
Considering that many government and university sites are the first offenders, they should care.
Orion
AussieWebmaster
04-12-2006, 12:48 PM
You keep adding great info to this thread... great work orion!!!
orion
04-12-2006, 09:20 PM
Thanks.
The procedures described above are a no-brainer and are the kind one could use in support of Carnivore or a similar technology running at the ISP level.
In the last case above, the procedure could be expanded as follows
1. Capture visitors' IP addresses and other data.
2. Use search engines and smart queries to find matches.
3. Use contextual data to get more in-transit data through smart searches.
BTW, I wonder who tipped them off to remove the cached page about that "denzel" mercado.
Based on the above records, either some gov employees are misusing taxpayers' money while at the workplace or are "posing" in an investigation.
Like this "anon" (http://www.cardreport.com/wwwboard40/messages/23982.html)?
I doubt the latter, since an investigator could at least use a dynamic ip from a commercial ISP rather than leaving a trail on the Web for the entire world to see.
Like here:
"149.101.1.118" + "anon" (http://www.google.com/search?num=100&hl=en&lr=&q="149.101.1.118"+"anon")
Orion