Authors of malicious e-mails, beware.
A team of researchers in Canada has developed a new and highly accurate technique to identify those who send anonymous e-mails. And best of all, the results can be used in a court of law—unlike most other methods of ascertaining e-mail authorship.
“For evidence to be admissible, investigators need to explain how they have reached their conclusions. Our method allows them to do this,” says study co-author Benjamin Fung, a professor of Information Systems Engineering at Concordia University in Montreal.
“The expert witness can actually pinpoint exactly what features in the e-mail make him believe that the person is the true author of the anonymous e-mail,” explains Fung, an expert in data mining—extracting useful, previously unknown knowledge from a large volume of raw data.
While police can use the IP address to locate the house or apartment where an e-mail originated, there may be many people living there. The Concordia researchers’ method provides an effective way to determine who exactly wrote the e-mails under investigation.
Based on techniques used in speech recognition and data mining, the method relies on the identification of repeated patterns such as typos or spelling errors—unique combinations of features that recur in a suspect’s e-mails.
After filtering out any patterns found in the e-mails of other suspects, the remaining patterns constitute the author’s “write-print”—a distinctive identifier, like a fingerprint.
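The two steps described above (find feature combinations that recur in a suspect's e-mails, then discard any that also recur for other suspects) can be illustrated with a small Python sketch. This is not the researchers' published algorithm, which the article does not spell out. The feature names, the short misspelling list, the 60 percent recurrence threshold, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical misspelling list; a real system would use far richer features.
COMMON_TYPOS = {"teh", "recieve", "definately", "alot"}


def extract_features(email: str) -> set[str]:
    """Map one e-mail to a set of illustrative stylistic features."""
    feats = set()
    if any(c.isalpha() for c in email) and email == email.lower():
        feats.add("all_lowercase")
    if not email.rstrip().endswith((".", "!", "?")):
        feats.add("no_final_punct")
    if "!!" in email:
        feats.add("double_exclaim")
    for word in re.findall(r"[a-z']+", email.lower()):
        if word in COMMON_TYPOS:
            feats.add(f"typo:{word}")
    return feats


def frequent_patterns(emails, min_support=0.6, max_size=2):
    """Feature combinations that recur in at least min_support of the e-mails."""
    counts = Counter()
    for email in emails:
        feats = sorted(extract_features(email))
        for size in range(1, max_size + 1):
            for combo in combinations(feats, size):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(emails)
    return {pattern for pattern, n in counts.items() if n >= threshold}


def write_prints(suspects, min_support=0.6):
    """Keep, per suspect, only the frequent patterns no other suspect shares."""
    frequent = {name: frequent_patterns(mails, min_support)
                for name, mails in suspects.items()}
    prints = {}
    for name, patterns in frequent.items():
        others = set().union(*(p for n, p in frequent.items() if n != name))
        prints[name] = patterns - others
    return prints


def attribute(anonymous_email, prints):
    """Pick the suspect whose write-print matches the most patterns."""
    feats = extract_features(anonymous_email)
    scores = {name: sum(1 for p in patterns if p <= feats)
              for name, patterns in prints.items()}
    return max(scores, key=scores.get)


suspects = {
    "alice": ["i will recieve teh package tomorrow",
              "did you recieve my note",
              "please recieve this"],
    "bob": ["see you at noon!!", "great work!!", "call me!!"],
}
prints = write_prints(suspects)
likely_author = attribute("can you recieve the fax", prints)
```

Because each surviving pattern is a named combination of features, an investigator could point to the specific matches behind an attribution, which is the kind of explainability the researchers say admissible evidence requires.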
“Let’s say the anonymous e-mail contains typos or grammatical mistakes, or is written entirely in lowercase letters,” says Fung. “We use those special characteristics to create a write-print. Using this method, we can even determine with a high degree of accuracy who wrote a given e-mail, and infer the gender, nationality, and education level of the author.”
To test the accuracy of their technique, Fung and his colleagues used the Enron Email Dataset, a collection of over 200,000 real-life e-mails from 158 Enron Corporation employees.
Using a sample of 100 e-mails, 10 from each of 10 employees, they were able to identify authorship with an accuracy rate of 80 to 90 percent.
Fung says the past few years have seen an alarming increase in the number of cybercrimes involving anonymous e-mails—e-mails that can carry viruses, transmit threats or child pornography, and facilitate communication between criminals.
In developing the identification technique, Fung worked with Mourad Debbabi, a Concordia expert in cyberforensics, and doctoral student Farkhund Iqbal.
“This is the result of interdisciplinary research,” says Fung. “We were trying to use data mining techniques to solve a real-life problem in cyberforensics.”
Their findings were published in the journal Digital Investigation.