Search Engine Watch
Old 03-05-2006   #1
orion
 
 
Join Date: Jun 2004
Posts: 1,044
Search Security Strategies

Everyone in an administrative position should know that poor usage of keyword terms in documents can be targeted by search engine users. Such users may have a legitimate interest, but they may also be hackers looking for vital information or for specific targets.

The following search commands in Google often provide hackers with invaluable information. A detailed description is given in Google Hacking: Ten Simple Security Searches That Work and in the book Google Hacking for Penetration Testers, by Johnny Long, Ed Skoudis; Published by Syngress; ISBN: 1931836361; Published: June 2001; Copyright; Pages: 528.

I have added to the list other queries I tested for an intelligence project.

site: - Provides all sorts of information about a site.

intitle:index.of - Universal search for directory listing, especially Apache-style directory listings.

error | warning - Error messages are revealing in just about every context. Warning text in search results can provide important insight into the behind-the-scenes code used by a target.

login | logon - Locates login portals fairly effectively and can be used to harvest usernames and troubleshooting procedures.

username | userid | employee.ID | "your username is" - The most generic searches for username harvesting. The context around these words can reveal procedural information an attacker can use in later offensive action.

password | passcode | "your password is" - This query reflects common uses of the word password and can reveal documents describing login procedures, password change procedures, and clues about password policies in use on the target.

admin | administrator - This query can be used to reveal procedural information ("contact your administrator") and even admin login portals.

-ext:html -ext:htm -ext:shtml -ext:asp -ext:php - This query, when combined with the site operator, gets the most common files out of the way to reveal more interesting documents. It should be modified to reduce other common file types on a target-by-target basis.

inurl:temp | inurl:tmp | inurl:backup | inurl:bak - This query locates backup or temporary files and directories.

List of Sites - This gives you site community information.

intranet | help.desk - This query locates intranet sites (which are often supposed to be protected from the general public) and help desk contact information and procedures.


And these are the ones I added.

extranet | help.desk - Same as previous query.

mailto - This gives you email addresses.

phone - This gives you phone information.

ssn - This gives you social security number information, though the numbers returned are not necessarily active ones.

Of these searches, in my opinion the most troubling are searches for social security numbers (SSNs). The exposure of SSNs through search results by search engines, directories, university sites and other sites is a symptomatic problem across the Internet. Clicking on a record from a search engine results page can direct one to a document with more incidents. If the search engine distributes its search results to web partners, then the number of incidents is multiplied.

This problem is not unique to search engines. Many web properties are at fault: from government agencies and universities to churches, non-profit organizations and financial institutions. Often public documents from city halls, public minutes/sessions, or senate transcripts are found containing the SSNs, phone numbers, email addresses and physical addresses of citizens. One need only search in Google or other search engines to find the incidents.

I have more comprehensive coverage of other techniques relevant to searches and security in general, which you might find useful, at
Security Optimization Strategies in the Workplace


Feel free to comment on the importance of search security in your workplace.

Cheers

Orion

Last edited by orion : 03-05-2006 at 05:18 AM.
Old 03-06-2006   #2
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
to be clear, I don't believe Google has an "ssn" command.

I haven't read the book, but I'm betting what it says is that if you use SSN along with a number range, you're likely to find SSN numbers on the web because the letters SSN might be on those pages. But Google doesn't to my knowledge try to catalog pages as an SSN type, so that you can find only those pages similar to how you could do a filetype:pdf search, for example.

login | logon is the same thing. It's not a command. It's a search for words that appear on pages.
Old 03-06-2006   #3
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by dannysullivan
to be clear, I don't believe Google has an "ssn" command.

I haven't read the book, but I'm betting what it says is that if you use SSN along with a number range, you're likely to find SSN numbers on the web because the letters SSN might be on those pages. But Google doesn't to my knowledge try to catalog pages as an SSN type, so that you can find only those pages similar to how you could do a filetype:pdf search, for example.

login | logon is the same thing. It's not a command. It's a search for words that appear on pages.
Danny, I believe you have misread my post and the article it points to. This is not about Google commands (perhaps the expression "commands" is loosely used here), but about how to data mine a database by querying specific terms. It happens that some of these are commands (in the search sense).

Regarding the number range, you don't need to use ranges.

Regarding the SSN part, this can return false matches, such as a submarine id or invalid or inactive numbers (e.g. deceased individuals). However, because of bad business practices or ignorance, it might return active ones.

I did a 2002-3 comparative work on strings of the form (think of it as pattern matching expressions)

"SSNd" a + k

Note the quotes in "SSNd". Try "SSN:", "SSN:a", "SSN#a", "SSN: a", "SSN# a" alone or with "k" where

d = a delimiting character such as ":", "#", "-", ".", a space, etc

a = up to 1973, a state assigned code (not a number range, but you could try that, too). List by states and territories is available at the Social Security Administration site, in books, scripts and elsewhere.

k = a keyword(s), like court, report, lawsuit, case, divorce, bank, bankruptcy, state, etc.


Try the above with or without k and check records.

I did the comparative work a few years ago in Google, Yahoo, MSN, AlltheWeb, AltaVista, etc. Back then, of these, MSN did not show incidents, but I could not say the same about the rest.

As I mentioned in the link at my site,

"These incidents are mostly due to bad business practices or ignorance. Considering that with a valid SSN other forms of identification can be obtained, this is a problem relevant to law enforcement and homeland security. It is also relevant to immigration agencies since it can be more evident in states and territories affected by undocumented people and illegal aliens such as California, Arizona, Texas, Florida, New York and Puerto Rico. Stealing SSNs is also an enabling crime since it can lead to identity theft or who knows what sort of terrorist or criminal activities."

"These type of combinations provide interesting possibilities for both law enforcement, profilers, private eyes and unfortunately, for some with criminal intentions. I strongly recommend university administrators, schools, churches, non-profit organizations, government agencies and companies to never assign and use SSNs for any sort of transaction. And how about the practice of using the last four digits of a social security number? Don't even think about it. With the first three digits (given by states prior to 1973) and the last four one only need to guess the two digits in the middle of a SSN."
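For administrators who want to act on that recommendation, the same label-plus-delimiter pattern can be turned inward to audit your own documents before they go online. The following is only a rough sketch: the function name find_ssn_like, the list of labels and the accepted delimiters are my assumptions for illustration, and a real audit would need to be tuned to the documents in question. It reports line numbers only and never echoes the digits.

```python
import re

# Assumed pattern: a label ("SSN", "SS", "Social Security ...") followed by an
# optional delimiter and nine digits written as 123-45-6789, 123 45 6789 or
# 123456789. Illustrative only; tune labels and delimiters to your documents.
SSN_LIKE = re.compile(
    r"\b(?:ssn|ss|social\s+security(?:\s+(?:number|no\.?))?)\s*[:#.\-]?\s*"
    r"(\d{3}[- ]?\d{2}[- ]?\d{4})\b",
    re.IGNORECASE,
)


def find_ssn_like(text):
    """Return the 1-based line numbers that contain a labelled SSN-shaped string."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SSN_LIKE.search(line):
            flagged.append(lineno)
    return flagged


if __name__ == "__main__":
    # Dummy data: area number 000 is never a valid SSN.
    sample = (
        "Minutes of the meeting\n"
        "Filed by J. Doe, SSN: 000-00-0000\n"
        "SS# 000000000 on record\n"
        "Phone 555-1234\n"
    )
    print(find_ssn_like(sample))
```

Anything it flags still needs a human look, since a match can be a false positive such as a part number that happens to follow the letters "SS".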

Orion

Last edited by orion : 03-06-2006 at 01:53 PM. Reason: fixing typos
Old 03-06-2006   #4
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
Yep, it was the issue about commands. I just wanted to make it clear to people that Google doesn't provide some type of ssn: style command to find social security numbers. Absolutely, by crafting smart queries, you can find lots of information people probably don't realize can be located through Google or any search engine. It's long been a problem and one that webmasters continue to need to be aware of. Hope they learn from what you've posted.
Old 03-06-2006   #5
orion
 
 
Join Date: Jun 2004
Posts: 1,044

I never published the work, which was ready to be sent to the Social Security Administration.

The problem, and my motivation for studying it, arose after 9/11. Search engines, users, webmasters, and web properties are all at fault. One can craft a search, add a state name, and find city hall, gov agency and court record documents, etc., containing incidents. (btw, "ssn:a" or any query term when double quoted and combined with keywords can be viewed as acting as a command)

Search security strategies are interesting from the academic, commercial, intelligence and practical standpoints. They show how exposed and fragile virtual and physical properties are.


Orion

Last edited by orion : 03-06-2006 at 07:15 PM. Reason: fixing typo
Old 03-06-2006   #6
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Thank you for featuring the thread. I bet search security strategies or search intelligence in general could be a good topic for a SES.

Looking at my original work I also compare queries of the form

ss:aDELgeolocation where

a = as mentioned, the first three digits

DEL = "-", "#", ":", etc.
geolocation = state name or abbreviation.

Note I start with ss (not with ssn). Test with/without quotes.

Intelligence searches are different from regular searches in the sense that one needs to know (a) where to search and (b) how to search.

(a) is not limited to search engines, one can search university, enterprise, vertical portal collections, etc.

(b) deals with mapping/matching, i.e., when crafting an intelligence search think in terms of regular expressions and concept mapping, rather than general search combinations using natural language expressions.

With regard to geolocation tables, the SS Administration has stated, and I quote:

"Since 1973, social security numbers have been issued by our central office. The first three (3) digits of a person's social security number are determined by the ZIP Code of the mailing address shown on the application for a social security number. Prior to 1973, social security numbers were assigned by our field offices. The number merely established that his/her card was issued by one of our offices in that State."

That is, the first three digits of a U.S. Social Security number MAY or MAY NOT indicate a state or territory: per the statement above, they reflect the office that issued the card (before 1973) or the mailing address shown on the application (since 1973).

Such tables are available elsewhere. For example, JavaScript expert Danny Goodman has written a lookup SS-geolocation table script (The JavaScript Bible, 3rd edition, 1998, IDG Books Worldwide, Inc; see also companion CD ROM file \Scripts\ListWind\Chap48\ssn3.htm).

What surprises me is that one can still find gov, university and other web properties facilitating such information (SSNs). Large sites are more susceptible.

Back when I researched this, I came across Congressional testimony presented by Patrick P. O'Carroll, at the time Assistant Inspector General for Investigations, Social Security Administration (SSA), entitled "The Homeland Security and Terrorism Threat from Document Fraud, Identity Theft and Social Security Number Misuse". He stated:

"The issuance of SSNs and driver's licenses based on invalid documentation creates a homeland security risk, and any failure to protect the integrity of the SSN can have enormous consequences. Identity theft is the fastest-growing form of white-collar crime in the United States. Many expect that incidents of identity theft will more than triple from .5 million in 2000, to 1.7 million in 2005. While identity theft existed prior to the advent of the Internet, there is no question that in recent years, criminals have taken advantage of all of the readily available confidential information on the Internet. Some studies indicate that 10 percent of identity theft currently originates through the Internet. It is projected that by 2005 that number will rise to 25 percent. "

We are now in 2006 and still the problem persists.

Instead of the gov and search engines (e.g. USA Gov vs Google) squaring off in court over query logs, why not just take action and address the more obvious, bigger and symptomatic problems like this one?

Perhaps inviting concerned parties to a SES track about the subject could accomplish something.


Orion

Last edited by orion : 03-06-2006 at 07:14 PM. Reason: typos
Old 03-06-2006   #7
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Before moving to other types of search security subjects in this thread, here is a very disturbing report from a few days ago in USA TODAY relevant to SSNs

Social Security numbers found on state websites

"The disclosure of Ohio residents' Social Security numbers on the state government's website highlights what many privacy experts — and criminals — already know: Such information is readily available to anyone with an Internet connection."

"It is common for the websites of the USA's secretaries of state to contain personal information, including Social Security numbers (SSNs) and home addresses, in business statements. Besides Ohio, the data is available in New York, Florida and at least seven other states, say privacy experts who provided USA TODAY with links to public websites."

"When you have state agencies putting this stuff online, you are spoon feeding criminals valuable information," says Betty Ostergren, a privacy activist whose husband was a victim of identity theft in 1987 and 1989. "And they can be anywhere in the world — an Internet cafe in Pakistan or a library in Mexico."

"Because of the incidents, Ohio Secretary of State Ken Blackwell is under fire after The Cincinnati Enquirer reported this week that an unknown number of business filings posted on the state's website include the SSNs of filers. "

Great!

Now if the SSN is also your driver license ID number, then that's even worse.

Crafting regular expressions for getting on the fly specific chunks of data from public databases is not a difficult task and indeed has been documented.

I don't understand why someone would want to use authentication systems with such strings. But, yes, some webmasters, network administrators and human resource managers don't get it. The not so funny thing is that these are government sites.


Orion
Old 03-07-2006   #8
orion
 
 
Join Date: Jun 2004
Posts: 1,044

I guess the title of this thread should be "Intelligence Search Strategies", "Smart Searches" or something like that since "Search Security" is a bit limited in scope, but that's OK.

When conducting intelligence searches one needs to know

(a) where to search
(b) how to search

As mentioned, one needs to use a variety of resources, in addition to traditional search engines. It also means thinking in terms of regular expressions when crafting queries, rather than searching using natural language searches.

When interpreting results, don't go by the total number of search results or by snippets alone, as they might not tell the whole story. Total counts do not equate to incidents, and snippet entries might not show incidents occurring in the landing document, which might or might not contain multiple incidents.

You would need to construct or have two harvesting systems: one to harvest snippets (a quick bot) and the other to harvest landing documents.

If the system you target has a predesigned command, that facilitates things. But what if the target does not have the one you need?

Construct one.

Craft a string in EXACT mode and include it in your FINDALL query. In some cases you could map it to geolocation data; in others, you need additional information, like date/time stamps, url stamps or name-value pair patterns. To illustrate, in the previous examples

"ss #", "ssn #" and derivatives were the "command" and the state code was the geolocation piece of information. k was a specific query.


So what type of searches other than SSN are possible? Well how about:

vehicles identification numbers (VIN)
fire arms serial numbers
court Docket numbers
real estate mortgages
loans and financial information
emails and WallStreet (ENRON Corpus, anyone?)


Let's limit this for now to VIN numbers. Fortunately, Google already facilitates searching for VIN records. These are long identifiers in which the meaning of each character depends on its position, running from positions 1 to 17, per ISO standards: 17 characters, A-Z and 0-9 (the letters I, O and Q are not used). Sounds to me like a simple regular expression script can be written here to harvest results.
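As a rough idea of what such a regular expression could look like, here is a minimal sketch that picks VIN-shaped tokens out of a block of text. The names VIN_SHAPE and find_vin_like are mine, and the character class assumes the usual convention that I, O and Q never appear in a VIN. A match only means "17 characters of the right alphabet", not that the token is a real VIN.

```python
import re

# 17 characters drawn from A-Z and 0-9, minus I, O and Q (assumed convention).
VIN_SHAPE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")


def find_vin_like(text):
    """Return every VIN-shaped token found in text, upper-cased."""
    return VIN_SHAPE.findall(text.upper())


if __name__ == "__main__":
    snippet = "vin no. SCCFE33C9VHF65358 listed; ref ABCDEFGHIJKLMNOPQ"
    print(find_vin_like(snippet))
```

The second token in the sample is rejected because it contains I, O and Q.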

Positions 12-17 are the actual serial number, and the preceding positions carry manufacturer and geolocation data. There are many sites with tutorials explaining VIN positions. Just do the homework.

Several sites already provide tools for decoding these numbers and associated positions. These tools are trivial regular expression exercises, easy to create. But if you want to skip the effort, just use the ones provided at the AnalogX site and CarFax.

Here is a quick exercise. For the sample VIN number given at the AnalogX link, SCCFE33C9VHF65358, I obtained all this information


Vehicle Identification Number:


Position:   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17
Character:  S  C  C  F  E  3  3  C  9  V  H  F  6  5  3  5  8

Description      Position  Raw Data  Decoded Data
Region           1         S         Europe
Country          1-2       SC        United Kingdom
Manufacturer     2-3       CC        Lotus
Model Specific   4-8       FE33C     Unknown
Check Digit      9         9         Valid
Year             10        V         1997
Assembly Plant   11        H         H
Serial Number    12-17     F65358    F65358
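The positional split and the "Valid" check digit in that breakdown are easy to reproduce. Below is a small sketch, not the AnalogX or CarFax implementation: split_vin slices the string by the positions listed above, and check_digit applies the widely documented weighted mod-11 scheme (letters transliterated to numbers, position 9 weighted zero). The function and field names are my own, and the check digit is only enforced in some markets, so treat a mismatch as a hint rather than proof.

```python
# Letter-to-number transliteration used by the mod-11 check (I, O, Q unused).
TRANSLIT = dict(zip("ABCDEFGH", range(1, 9)))
TRANSLIT.update(zip("JKLMN", range(1, 6)))
TRANSLIT.update({"P": 7, "R": 9})
TRANSLIT.update(zip("STUVWXYZ", range(2, 10)))
TRANSLIT.update({str(d): d for d in range(10)})

# One weight per position; position 9 (the check digit itself) weighs 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def check_digit(vin):
    """Compute the expected character for position 9 of a 17-character VIN."""
    total = sum(TRANSLIT[ch] * w for ch, w in zip(vin.upper(), WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def split_vin(vin):
    """Slice a VIN into the position groups used in the table above."""
    vin = vin.upper()
    if len(vin) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    return {
        "region": vin[0],
        "country": vin[0:2],
        "manufacturer": vin[1:3],
        "model_specific": vin[3:8],
        "check_digit": vin[8],
        "year_code": vin[9],
        "assembly_plant": vin[10],
        "serial_number": vin[11:],
    }


if __name__ == "__main__":
    vin = "SCCFE33C9VHF65358"
    parts = split_vin(vin)
    print(parts)
    print("check digit ok:", parts["check_digit"] == check_digit(vin))
```

Turning the raw codes into "Lotus" or "1997" needs lookup tables, which are left out here.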


Limiting a search to one tool may not be enough.

If I want to know a bit more, say, the body style, I can copy and paste that VIN number into the CarFax tool (here) and find out that it is a COUPE, 1997 Lotus Esprit.


Moreover, if I want, with the information already collected I can do a more comprehensive history search on the vehicle or combine the information with Docket document searches, loans and financial information searches. (never heard of private eyes outsourcing IR search services?)


Let's just hope that the driver license and the SSN are not the same as is the case in some States and countries, or that a dumb network administrator did not use the Holy Grail Number for authentication purposes at an online web property or enterprise intranet.


I guess search security strategies and intelligence searches in general are inherent to the nature of the Web, so are here to stay.

Still at some point there must be an equilibrium state of social responsibility between Web properties and users. I just hope others wake up and understand the times we are living in. Don't put online what you don't want others to see. You and they will wake up only when incidents hit home.


Orion

Last edited by orion : 03-07-2006 at 03:18 PM. Reason: fixing typos
Old 03-08-2006   #9
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Yahoo! too allows users to search for VIN numbers (and even aircraft N-numbers), provided you already know the number you are searching for. But how about if you don't know the number? What if you just need to map results to specific needs? Well, as mentioned before you could craft a "command" and an intelligence search using the following search strategy and convention

"command" + geolocation + k

which is what we did for SSN. The same technique could be used with VIN numbers by using

"command" = "vin no.", "vin #", "vin:", etc

geolocation = state, city, etc.

k = keyword(s)


To illustrate, try in Yahoo "vin no." california honda


That will return many results to be harvested and data to be collected from snippets or landing documents.


Why would anyone want to collect such data? There are plenty of reasons and motivations.

VIN numbers can be used to map other car components. For instance, in one of the results returned from Yahoo using the above query many seem to be interested in unlocking radio codes. Often with these and other components VIN numbers are used as reference points.


And what about that used car you bought a few months ago? What kind of previous owner's history could you map to or unlock? Why should you bother? Check with your local MVD and they can assist you with plenty of reasons to chew on. Believe that.

To illustrate, one of the results returned from Yahoo using the above query (result #12 at search time) links to an OSCN database result in which a VIN number is featured in a court case:

OSCN Found Document:State ...


Need more reasons? Try this: "vin no." texas stolen. Add some license plate data and craft the query that suits your needs.


SSN, VIN, etc, etc, the point to make is that on the Web specific pieces of data can be mapped to specific scenarios and paper trails by crafting smart searches. Unfortunately, this can include other unpleasant scenarios like bankruptcy, divorce, DUI cases, etc, etc. You just need to know how and where to search. Scanning intelligence from urls is not that hard to do.



Orion
Old 03-09-2006   #10
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Greedy Techniques


With intelligence searches, a greedy approach often comes in handy, especially when different versions of a "command" map to the same concept. This can be accomplished by using the OR operator. In Google, you can use a pipe ("|") as a shortcut. (However, keep in mind that Google's OR implementation has been questioned many times in the past.)



So, for the VIN case, one could try something like this:


"vin no."|"vin #"|"vin number" + geolocation + k
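If you test many variants, assembling the string by hand gets tedious. Here is a minimal sketch of a helper that builds a query of this form; build_query and its parameter names are hypothetical, and it only produces the text you would paste into a search box. It does not talk to any search engine.

```python
def build_query(commands, geolocation="", keywords=()):
    """Join quoted "command" variants with a pipe, then append the other terms."""
    quoted = "|".join('"%s"' % command for command in commands)
    parts = [quoted]
    if geolocation:
        parts.append(geolocation)
    parts.extend(keywords)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(["vin no.", "vin #", "vin number"], "california", ["honda"]))
```

Leaving out geolocation or keywords gives the shorter "command" + k and "command"-only forms.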


At this point of the thread, examples are not necessary.



Email Techniques

Searching with the "mailto" protocol as a "command" has been documented elsewhere. Being a protocol, it can be used with email addresses and link searches. For conducting intelligence searches on actual email document contents, however, it is not as effective as strings from email headers, from which we can try searches of the general form


"command" + geolocation/date-timestamp + k
"command" + geolocation
"command" + k

etc...


To illustrate, let's try the last one, "command" + k. Do this first.

1. Open your Outlook Express client, right click on any email from your inbox and select Properties from the pop-up menu (or select Properties from the File menu).

2. Click on the Details tab and the Message Source button. You should see the source code of the email. The top portion has the email headers section.


There are plenty of regexps to choose from to craft a smart search and feed customized extractors. I tried these in Google

"Message-ID:" office


"X-RCPT TO:" admin


I used k = office and k = admin but you can try other generic term(s), a Date or time stamp, the name of a company or individual or combination of these.

Feel free to craft also searches with any of the following:


"received: from" ( or received-from: )

"x-original to:" ( or x-original-to: )

"delivered to:" ( or x-delivered-to: )



These types of smart searches, involving email header strings, can be used to identify spam sources and, ironically, for harvesting email addresses and the content of email documents. Unfortunately, from time to time one can also find email content that should not be online at all. As expected, large public and private organizations, schools and gov agencies are more susceptible.

To be sure that email content from your company or organization is not featured in search engine results, craft a search involving your own organization. Better that you find any incidents and fix the problem before a competitor finds them.

Don't assume this would never be the case based on the size or nature of your organization. Dumb or reckless employees are everywhere and mistakes happen all the time.
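Searching the engines covers what is already indexed. You can also check pages before they are published. The sketch below flags text that contains raw email header lines; the list HEADER_NAMES is an assumption based on the headers mentioned in this post and should be extended to match your own mail system, and exposed_headers is a name I made up.

```python
import re

# Header names taken from the examples in this thread (assumed, not exhaustive).
HEADER_NAMES = [
    "Message-ID",
    "Received",
    "X-RCPT-TO",
    "X-Original-To",
    "Delivered-To",
    "X-Mailer",
]

# A header name at the start of a line, followed by a colon.
HEADER_LINE = re.compile(
    r"^\s*(%s)\s*:" % "|".join(re.escape(name) for name in HEADER_NAMES),
    re.IGNORECASE | re.MULTILINE,
)


def exposed_headers(text):
    """Return the sorted, lower-cased header names that appear in text."""
    return sorted({match.group(1).lower() for match in HEADER_LINE.finditer(text)})


if __name__ == "__main__":
    page = (
        "Welcome to our archive\n"
        "Message-ID: <abc@example.org>\n"
        "received: from mail.example.org\n"
        "Hello: world\n"
    )
    print(exposed_headers(page))
```

A non-empty result is a sign that a whole message source, not just its body, ended up on the page.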


To illustrate the point, why would anyone want to have email content from ecommerce transactions (some intended to be in secured servers) delivered through newsgroup accounts? And indeed, such incidents do occur by mistake or intentionally.


At some point one may ask for some form of social responsibility guidelines or common sense best practices from all; from companies, organizations, search engines, web properties, webmasters, network administrators and users.


Let's make clear that the goal of this thread is not to encourage illegal behaviors but to ensure you are educated and understand the risk of not using common sense in your workplace.


If you never heard about these issues, you or your organization would probably be making the same mistakes. Whether such incidents are the result of ignorance, indifference or dumb business practices, only you know.


Orion

Last edited by orion : 03-09-2006 at 12:08 PM. Reason: fixing typo
Old 03-09-2006   #11
claus
It is not necessary to change. Survival is not mandatory.
 
Join Date: Dec 2004
Location: Copenhagen, Denmark
Posts: 62
Nice posts. It also shows that you don't always want the search engines to index everything - sometimes you have to think about how to hide information from them instead. In this respect email addresses are a "classic" example, but still, to this date many people just post email addresses in the open.

Also, Johnny Long (j0hnny), the author of the book mentioned in the first post has a web site with more information. I don't have more to add right now, except - "don't try this at home" I just thought a link would be appropriate as he's been exploring this field for a few years now, AFAIK.
Old 03-10-2006   #12
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Other applications

Other possible searches are those involving


1. "command" = "account number:", "account number", etc

2. "command" = "case number:", "case no.", "file number:", "file no.", "file case", "lab report:", "docket number:", etc

followed by geolocation and/or k. Here k can be anything that would help to narrow the search, like "dui", "divorce", "bankruptcy", "sex offender", first and last names, etc, or a combination of these.

3. searching web properties that require disclosure of public records.

4. querying search engines for technical manuals and forums that explain enterprise implementations and flaws.


For instance,


1 are common regexps found in utility bills, phone bills and many other forms.

2 are common regexps found in medical, insurance, financial, legal and other scenarios.


1 and 2 don't need further examples, but feel free to try them.

Believe it or not, it is not out of the ordinary to find documents containing such pieces of data. One reason is that some people blindly pull data from databases to fill in documents or include such information with emails and attachments. Spyware and desktop search bars do not necessarily help, either.


As for 3, here is a scary example involving SEC filings.

About a year ago the Wall Street Journal published the article Some Mutual Funds Reveal Clients' Data (reprinted by the PrivacyToday site). The article mentions

"Some industry executives blame a fairly simple mistake: In putting together disclosure statements, fund companies or their outside administrators sometimes pick up account ownership information from a computer database. It often includes the customer's bank, mutual-fund or brokerage-firm account number."

And anyone querying a search engine for relevant regexps can find them. Specific incidents are quoted in the article, showing that even such information is available via SEC filings.


Incidents derived from Point 4, above, occur in part because some IT administrators cannot resist discussing their own enterprise implementations and flaws online.


Consider the case of medical databases with their PID, PVI implementers' specifications or PDQ documentation in general. Such technical manuals even give outsiders how-to tips:

"@PID.5.1^SMITH@PID.8^F" requests all patients whose family name (first component of PID-5-Patient Name) matches the value SMITH and whose sex (PID-8-Sex) matches the value ‘female’."


HL7 Discharge Referral Message has a lot of regexps to chew on. If an architecture leaks urls containing such strings, that might be indicative of the nature of the http requests.
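An administrator can check for that kind of leak in their own web server logs. The following is a rough sketch that looks for strings shaped like the query parameters quoted above ("@PID.5.1^SMITH@PID.8^F") inside a URL. The regular expression is my own guess at the shape based only on that one example, the function name hl7_params_in_url is made up, and the example.org URL is a placeholder.

```python
import re
from urllib.parse import unquote

# Assumed shape: "@" + three-character segment name + dotted positions + "^" + value.
QUERY_PARAM = re.compile(r"@([A-Z][A-Z0-9]{2}(?:\.\d+)+)\^([^@~&\s]*)")


def hl7_params_in_url(url):
    """Return (field, value) pairs for query-parameter-like strings in a URL."""
    return QUERY_PARAM.findall(unquote(url))


if __name__ == "__main__":
    leaked = "http://example.org/q?qpd=%40PID.5.1%5ESMITH%40PID.8%5EF"
    print(hl7_params_in_url(leaked))
```

Any hit in a publicly reachable URL deserves a closer look, since it suggests patient search criteria are travelling in the open.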


If an administrator configures an enterprise to recognize specific query patterns but not others, why then discuss the how-to and implementation flaws in the open? Even if not illegal, how would that improve security?

Often improving security is a matter of common sense practices in the workplace.



Orion
Old 03-11-2006   #13
orion
 
Join Date: Jun 2004
Posts: 1,044
EXPOSING USERS' DATA

Email Headers Revisited


Searches crafted around email headers are more powerful than those involving "mailto".

These headers can be used to identify email/ip addresses of spammers, for example by testing headers like

"x-mailer", "x-declude-sender" and those previously mentioned.


Unfortunately, when querying email headers one can retrieve all sorts of email addresses through search engines, in addition to the actual content of such emails.


To top it off, since some web properties use their own email headers, querying a search engine with branded headers has the detrimental effect of exposing the identity of users, subscribers or associates of such properties, as in:


"X-gmail-received"

"x-yahoo-group"

"x-yahoo-profile"

"x-amazon-track"

"x-amazon-gauge"

"x-amazon-corporate-relay"


Also try the user groups or newsgroup sections of search engines. These headers open up all sorts of possibilities for competitors, spammers and others with questionable intentions.



Naive Strategies


Exposing users associated with a particular web property does not require complex queries. Indeed, naive queries can do the trick.


Some of these are so naive that one might think "Nah, you won't get anything from that". Wrong!


A naive query of the form "command" + k like


"userid=*" + k
"userid #" + k
"userid#" + k
"associate id" + k
"associate-id" + k
"associate id=*" + k


etc, etc, can produce surprising results. In fact, even without defining k you can get interesting records, as in


"userid=*"
"userid #"
"userid#"
"associate id"
"associate-id"
"associate id=*"


If you want to include a k with these, feel free to try any of the following:


k = mil, gov, dhs, edu, amazon, ebay, hotmail, etc.



Combined Strategies


If you want to combine strategies, there are many ways to proceed. For instance, if you use Google, set num=100 in the query string by pointing your browser to


(1) for general searches, to: http://www.google.com/search?num=100&q="command" + k

(2) for group searches, to: http://www.google.com/groups?num=100&q="command" + k


with "command" and k as previously defined. This should return a maximum of 100 results per page.
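
Typed by hand, these URLs are easy to get wrong, because characters such as # and & have a special meaning inside a URL (everything after an unencoded # is treated as a fragment and never sent to the server). Below is a minimal Python sketch that builds them with proper encoding. The function name build_url is hypothetical, and I am assuming that the + between "command" and k simply joins the two as space-separated terms.

```python
from urllib.parse import urlencode

# The two entry points mentioned above
BASE = {
    "general": "http://www.google.com/search",
    "groups": "http://www.google.com/groups",
}

def build_url(command, k="", section="general", num=100):
    """Build a search URL for the quoted phrase `command` plus optional term k."""
    query = f'"{command}"'
    if k:
        query += f" {k}"
    # urlencode percent-encodes quotes, '#', '=' and turns spaces into '+'
    return BASE[section] + "?" + urlencode({"num": num, "q": query})

print(build_url("userid #", "edu"))
print(build_url("x-mailer", section="groups"))
```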



If you want to combine naive and greedy strategies, then use something like


"associate id=*"|"associate id#" + k

"userid=*"|"userid #"|"userid#" + k


Define k as above or to your heart's content.
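
For those who prefer to generate such combined queries rather than type them, here is a small Python sketch. The helper name combine is hypothetical; it assumes, as in the queries above, that | separates alternatives and that k is appended as an extra space-separated term.

```python
def combine(commands, k=""):
    """Join quoted alternatives with | and append the optional term k."""
    alternatives = "|".join(f'"{c}"' for c in commands)
    return f"{alternatives} {k}".strip()

print(combine(["associate id=*", "associate id#"], "edu"))
print(combine(["userid=*", "userid #", "userid#"]))
```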


There might be other scenarios and applications, but the general approach is more or less the same.



The bottom line: sometimes there are certain things you don't want search engines to index, so don't put such stuff online to begin with.
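
One way to act on this is to check a document for the kinds of strings discussed in this thread before it goes online. The Python sketch below is only a starting point: the pattern list is my own selection based on the queries above, the function name flag_lines is hypothetical, and a real audit would need patterns tuned to your own data.

```python
import re

# Patterns modeled on the queries in this thread (an illustrative selection)
PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"^\s*x-[\w-]+\s*:",      # email headers such as X-Mailer:
    r"\buserid\s*[=#]",       # userid= / userid #
    r"\bassociate[ -]id\b",   # associate id / associate-id
    r"\btax id\b",
    r"\bssn\s*#",
)]

def flag_lines(text):
    """Return (line_number, line) for every line matching a risky pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), 1):
        if any(p.search(line) for p in PATTERNS):
            hits.append((number, line.strip()))
    return hits

sample = "Hello all\nX-Mailer: ExampleMail 1.0\nuserid=42\nSee you Monday"
for number, line in flag_lines(sample):
    print(number, line)
```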


I hope this discussion has helped some to understand the gravity of the issues presented here. Search security in a corporate environment, the workplace or at home is often a matter of common sense.


This concludes my presentation on the subject. Thank you for your attention.



Orion

Last edited by orion : 03-11-2006 at 03:14 AM.
Old 03-11-2006   #14
sebastian
Deep in the trenches
 
Join Date: Jun 2004
Location: atlanta
Posts: 187
what a kick-ass thread

This is an amazing thread.

My faith in SEW is restored by this very thread. New or lesser-known issues like these are what drive our industry to further growth and keep us fresh with new information and techniques with which to protect our clients.

I was able to find passwords all over Google doing "password" searches for files that were of type .xls

Always knew spammers were using engines to mine email addresses:

Example of "email" search of file type .xls on Google

but never thought to think through the other possible security issues...

Orion - great post, man ...really good information.
Old 03-11-2006   #15
AussieWebmaster
Forums Editor, SearchEngineWatch
 
Join Date: Jun 2004
Location: NYC
Posts: 8,153
another amazing thread orion.... I always learn a bunch from your threads... keep teaching....
Old 03-12-2006   #16
orion
 
Join Date: Jun 2004
Posts: 1,044

Thanks.


Sorry, I omitted these notes.


Discovering specific headers


When we want to test specific headers but don't know where to start, a minimalistic approach comes in handy. This can be accomplished by starting with short headers, as in


1. "x-rcpt" - can yield "x-rcpt-from" and "x-rcpt-to"

2. "x-envelope" - can yield "x-envelope-from", "x-envelope-to", "old-x-envelope-to" etc

3. "x-virus" - can yield "x-virus-scanned", "x-virus-checked", "x-virus-status", etc.

4. "x-pgp" - can yield "x-pgp-fingerprint", "x-pgp-rsa-fingerprint", "x-pgp-sig", "allow-x-pgp-sig", "require-x-pgp-sig", etc.

5. "x-spam" - can yield "x-spam-checker-version", etc


Searches using these headers plus a k are very useful when k is a topic, company, agency, a name or combination of these. Try with

k = virus, paypal, ebay, osama, iraq, irak, linux, cisco, debian, etc.


Searches involving these headers provide useful information about the degree (or lack) of protection of specific configurations, or of individuals associated with such architectures or web properties.
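
The short-header idea can also be tried on text you already have, for instance a page from your own mail archive that ended up online. Here is a minimal Python sketch; the function name expand_header and the sample message are hypothetical, and the pattern assumes one header per line in the usual "Name: value" form.

```python
import re

def expand_header(prefix, text):
    """List the full header names in text that contain the short prefix."""
    pattern = re.compile(
        r"^([\w-]*" + re.escape(prefix) + r"[\w-]*)\s*:",
        re.IGNORECASE | re.MULTILINE,
    )
    # lower-case and de-duplicate so X-Envelope-To and x-envelope-to count once
    return sorted({match.group(1).lower() for match in pattern.finditer(text)})

# Hypothetical message headers, for illustration only
SAMPLE = """X-Envelope-From: alice@example.com
X-Envelope-To: bob@example.com
Old-X-Envelope-To: carol@example.com
Subject: test
"""

print(expand_header("x-envelope", SAMPLE))
```

Allowing characters before the prefix is what lets the sketch pick up variants such as "old-x-envelope-to" alongside the plain "x-envelope-..." names.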


A similar approach can be used with branded headers, as in


1. "x-ebay" - can yield "x-ebay-mailtracker", etc.

2. "x-paypal" - can yield "X-paypal-mailtracker", etc.

3. "x-amazon" - can yield "x-amazon-track", etc.

4. "x-debian" - can yield "x-debian-pr-message", etc.


allowing competitors and others to collect all sorts of intelligence (emails, IP addresses, names, email content, decisions taken, events, paper trails, etc.) and to associate user interactions/happenings with specific architectures or web properties.


Automation of the intelligence searches discussed here is possible via customized extractors and is not that hard to do. In fact, preformatted reports can be obtained in less than a few seconds.

Now I'm done. R.I.P.



Orion
Old 03-18-2006   #17
orion
 
Join Date: Jun 2004
Posts: 1,044

I thought the previous R.I.P. was well deserved. However, the more one digs, the more one realizes this is not the case.

The same day (March 12) I stated that this series of posts had run its course, the Chicago Tribune published an embarrassing report in which they were able to uncover CIA "secrets" and more than 2,600 CIA employees just by searching across the Internet.

This illustrates that smart searches are not limited to search engines. One can also uncover a lot of data from dating services and job search sites, college campus websites, newspaper & court sites, etc.


Additional Notes

Along the same lines, here are some additional risks I feel deserve to be pointed out to web properties, search engine users, investigators and the public. The piece should be read twice, since its exposition is deliberately a bit convoluted and overlaps with many different areas already discussed.


First, let me start with smart searches and their risks. When searching, the searcher needs to understand that intelligence queries are different from regular queries in the sense that

1. when crafting a query, we are not concerned with natural language expressions, popular term usage (keyword popularity), or even default query commands, but with devising regular expressions that allow one to conduct concept mappings. This is why crafting pseudo commands works so well.

2. document relevancy is not based on matching queries to terms, but on the degree of metadiscovery. A document is considered relevant if it uncovers contextual information that can be used to discover new information. Here we are talking about high-order relevancy or in-transit relevancy. This is why naive and greedy techniques can produce surprising results.

3. Even if the discovered data is old or outdated, it can provide important clues of past records or paper trails relevant to an investigation.


And so on. Smart searches can be applied to areas such as encryption environments, crawls, taxes and loans, and even to more obscure ones like city permits, city hall/senate minutes, and gambling and liquor licenses. Try with

"x-google"

"x-md5"

"set-cookie"


Here

queries involving "x-google" can return data scenarios containing "x-google-crawl", "x-google-token", etc.

queries involving "x-md5" can return data scenarios containing hashed data, to which md5 "decoders" (hash lookup tools) can be applied.

queries involving "set-cookie" can yield user, domain, and path values.
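
To see what a single exposed "set-cookie" line gives away, here is a minimal Python sketch using the standard library's http.cookies module. The function name cookie_fields and the sample header are hypothetical; the sketch only reads the cookie value plus the Domain and Path attributes mentioned above.

```python
from http.cookies import SimpleCookie

def cookie_fields(header_line):
    """Extract value, domain and path from a Set-Cookie header line."""
    # Drop the header name if the whole line was pasted in
    if header_line.lower().startswith("set-cookie:"):
        header_line = header_line.split(":", 1)[1].strip()
    jar = SimpleCookie()
    jar.load(header_line)
    return {
        name: {"value": morsel.value,
               "domain": morsel["domain"],
               "path": morsel["path"]}
        for name, morsel in jar.items()
    }

# Hypothetical header, for illustration only
print(cookie_fields("Set-Cookie: user=jdoe; Domain=example.com; Path=/account"))
```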



How about tax scenarios? Well, consider this.


Many nonprofit and for-profit organizations publish their TAX ID numbers online. This is often done to encourage and simplify contributions from the public.

Thus, a name-to-TAX-ID mapping might in principle be possible by simply searching for an organization's name.

The reverse process, queries leading to a TAX-ID-to-name mapping, is also possible. An example is given below using Google general searches:

"tax id" + california

A similar mapping can be done for organizations that use a person's name as in

"tax id" + smith


These two queries SHOULD NOT return incidents associated with private taxpayers, unless they or others were stupid enough to post their tax information online, or unless the information ended up in a search engine index by accident or for some equally stupid reason. Given that some gov sites have exposed SSNs in the past, either by accident or because some dumb network administrators used them for authentication purposes, I would not be surprised to find such incidents.


The point to make here is that results retrieved from smart queries are frequently surrounded by useful contextual intelligence data. To convince yourself, check any of the above smart query results and draw your own conclusions.

Do not limit your search experience to Google. To get the most out of the SERPs, set the URL queries to the maximum possible number of results, as follows:


in Google, set num=100
in Yahoo, set n=100
in MSN, set count=100
in Altavista, set nbq=100
in Gigablast, set n=200


However, all these max serp settings might not work in some sections of a given search engine or directory. Also note that nbq=100 overrides the limit of 50 in Altavista and n=200 overrides the limit of 100 in Gigablast. Obviously these are the result of current setting specification flaws.
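
If you switch between engines a lot, the settings above can be kept in a small table and applied to whatever results URL you are already looking at. This is a minimal Python sketch: the function name with_max_results is hypothetical, the parameter names and values are just the ones listed above, and nothing here guarantees that a given engine still honors them.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Engine -> (parameter name, maximum value), as listed in this post
MAX_RESULTS = {
    "google": ("num", 100),
    "yahoo": ("n", 100),
    "msn": ("count", 100),
    "altavista": ("nbq", 100),
    "gigablast": ("n", 200),
}

def with_max_results(url, engine):
    """Return url with the engine's results-per-page parameter set to its maximum."""
    name, value = MAX_RESULTS[engine]
    parts = urlsplit(url)
    # Keep every existing parameter except an earlier results-per-page setting
    params = [(key, val) for key, val in parse_qsl(parts.query, keep_blank_values=True)
              if key != name]
    params.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(params)))

print(with_max_results("http://www.google.com/search?q=tax+id&num=10", "google"))
```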

As mentioned in one of the posts, do not just go by snippet entries, as the landing document might contain incidents not shown in the snippets, or might contain or link to multiple incidents and other in-transit intelligence data.

Also, do not limit searches to just general queries in Google. Check Google UncleSam, Google Scholar, Blogsearch, Local, etc. as well. I found searching in the Newsgroups or Groups sections of search engines really revealing (for gathering emails, phones, and all sorts of data).


Browser-based software for producing smart search reports and for analyzing sites was developed months ago and is currently being tested.



Orion

Last edited by orion : 03-18-2006 at 02:45 PM.
Old 03-20-2006   #18
AussieWebmaster
Forums Editor, SearchEngineWatch
 
Join Date: Jun 2004
Location: NYC
Posts: 8,153
Interesting results on the Groups link.... wonder if Tom Cruise and John Travolta know the Tax Id of the Scientologists in Cal is up for grabs....
Old 03-20-2006   #19
orion
 
Join Date: Jun 2004
Posts: 1,044

Great catch, aussie.

Replace california with the desired term(s).

Amazing the stuff people put online and incredible the retrieval power of Google.


Orion
Old 03-20-2006   #20
orion
 
Join Date: Jun 2004
Posts: 1,044

Or replace "tax id" with "ssn #" using

http://www.google.com/groups?num=100&q="command" + k

where

command = "ssn #"
k = your desired term(s)

Some records might show multiple incidents. I hope they remove that from the Web. I'm not sure why some want to make that information available online.

Orion

Last edited by orion : 03-20-2006 at 05:08 PM.