Experimental Security Analysis of a Modern Automobile

Link to the Paper: http://www.autosec.org/pubs/cars-oakland2010.pdf

Summary:

This paper traces how cars, once entirely mechanical devices, have made the transition into the 21st century to become digital and computerized. The scale of this transformation shows in the number of subsystems that now contain computers. The first computers were introduced into cars in response to regulation such as the Clean Air Act, which required pollution control, but since then most cars have grown to 60-70 Electronic Control Units (ECUs) containing thousands of lines of code. This paper looks at how to exploit two 2009 vehicles in three different settings.
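
The core technical point underneath those exploits is that the car's internal CAN bus has no sender authentication: any node that can transmit can impersonate any other. As a rough illustration only (not the paper's actual tooling - the interface name, arbitration ID, and payload below are placeholders), injecting a frame from a Linux machine wired into the bus looks like this with the python-can library:

```python
# Hedged sketch: requires python-can and a SocketCAN interface (e.g. can0).
# The arbitration ID and payload are placeholders, not real ECU codes.
import can

bus = can.interface.Bus(channel="can0", bustype="socketcan")

# CAN has no sender authentication, so any node can claim any ID.
msg = can.Message(arbitration_id=0x1A0,
                  data=[0x01, 0x02, 0x03, 0x04],
                  is_extended_id=False)
bus.send(msg)
print("Frame sent; every ECU on the bus will see and trust it.")
```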

 

What I didn't like:

  1. The study only explored two cars, both from 2009, which suggests the findings could be outdated.

  2. The study noted that modern EVs are prone to having many more ECUs given their specific hardware requirements, but never really fleshed this out. I think that was a miss, given that EVs are supposed to be the future of automobiles.

  3. The study says it's not clear whether auto designers built their systems in expectation of an adversary - I think this could have been answered with a very simple survey included in the study.

  4. The study is really general at points, e.g., "Looking forward, we discuss the complex challenges in addressing vulnerabilities in the context of the automotive ecosystem" (loosely quoting).

  5. At the same time, the study is very specific and doesn't go after industry-wide trends when noting cybersecurity challenges - this makes it very hard to draw lessons for the general industry.

 

 

What I liked:

  1. The study paints a clear story of why cybersecurity hasn't kept up: we went from no computers to 50-70 computers really fast.

  2. The study explained many of the different ECUs and how they differ across a bunch of different cars.

  3. The study explores many facets of car security that aren't just the automatic go-tos when you think of a car, i.e., not just OnStar.

  4. The study makes sure to consider actors such as "car tuners" who might not be malicious but want more custom control of their car.

  5. The study ran experiments on the car in three different settings (bench, stationary, and on the road).

 

 

Points for Discussion:

  1. How easy is it to infiltrate ECUs as a regular user?

  2. Are modular cars a thing right now? How long until I see one on the road in mass production?

  3. Has regulation in the car industry, which led to the introduction of ECUs, been a positive or a negative in the realm of cybersecurity?

  4. What actions in the short term can be taken to help secure cars against future attacks?

  5. How long until we see cars as a major vector for cyber attacks against people?

 

New Ideas:

  1. How hard would it be to patch ECU bugs with over-the-air updates?

  2. Given the vulnerabilities in cars, is it best for some critical car systems to stay analog?

  3. Would consolidation of the vehicle software industry improve cybersecurity for automobiles?

  4. Can OnStar be leveraged to build new cyber protections for cars? Can we learn anything from how they keep their data safe?

  5. How does the necessity for cars to be serviced by 3rd parties compromise the vehicle's security?

Comprehensive Experimental Analyses of Automotive Attack Surfaces

Link to the Paper: http://www.autosec.org/pubs/cars-usenixsec2011.pdf

Summary: This paper takes a different view on how cars can be compromised in the modern era. Instead of focusing on internal threats arising from physical access to the car, it pivots the academic discourse toward external attacks that can happen wirelessly. That the study finds so many similarities across models of attack shows there is a lot of work left to be done in this field.

 

What I liked:

  1. By staying away from internal issues with cars, the paper went for a harder study that brought a genuinely different view.

  2. The paper made sure to stick to very practical attacks and stayed away from the very abstract.

  3. The study identifies common similarities between different vulnerabilities, which means the paper isn't narrowly tailored to one product.

  4. The paper put realistic limits on the adversary - honestly, it makes me more concerned about physical attacks, but I think they portrayed an accurate attacker.

  5. The paper gives a lot of different examples of how a 21st-century vehicle could be attacked.

 

What I didn't like:

  1. My takeaway from reading this paper is that the biggest threats to my car come from people having physical access to it.

  2. I've read a lot about how vehicle software updates are hampered by the dealer relationship in the US - I'd love to read more about that from a cyber point of view, which really wasn't touched on in this paper.

  3. I don't really consider the short-range wireless attacks different from having physical access, as both are proximity-based - and that would deter most attackers.

  4. The paper focuses on attacks that could gain arbitrary automotive control, which I think is kind of a low bar to set for an attack paper.

  5. The study doesn't really dive into how integrated auto supply chains are, e.g., one company could be providing chips to multiple companies that make cars.

 

 

Points for Discussion:

  1. The paper mentions the tradeoff in distributed computer systems between efficiency, safety, and cybersecurity - in the context of autos how should this tradeoff be weighed?

  2. Is there a hierarchy in what to defend first in an automobile, e.g., audio system vs. drivetrain? Does the security approach differ?

  3. By when will we see a rise in these types of auto attacks?

  4. How available are auto manuals and parts online? Is it easy to reverse engineer with all the documentation out there?

  5. How does the development cycle for auto security differ as compared to other tech industries?

 

 New Ideas:

  1. How can we make cars less vulnerable to long-range attacks?

  2. Is there a way to randomize the addresses of every vehicle to prevent targeted attacks on vehicles? Would this deter people from carrying out attacks?

  3. How much time does the average car patch take to deploy compared to, say, mobile devices? How does this affect the safety of vehicles?

  4. Can we add secure hardware to augment vehicle security through the federally mandated OBD-II ports?

  5. What steps can the auto industry take as a collective to find future vulnerabilities in their autos?

Rethinking Access Control and Authentication for the Home Internet of Things (IoT)

Link to the Paper: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-he.pdf

Summary:

This paper tries to ascertain how an IoT-connected home of the future might look - especially in a world that no longer grants privileges per device, as in the past, but per capability (e.g., unless you are a teenager or older, you may not unlock the front door). While the paper does a good job of gathering many opinions on how people might feel about different roles getting different capabilities, it seems to have had some issues with the generalizability of the study.

 

What I Liked:

  1. The study recruited 425 participants to carry out the survey.

  2. The study shows early on how home IoT devices are inherently different from the devices other security models are tailored to, because IoT devices are often shared.

  3. The paper suggests a new security paradigm tailored specifically to IoT devices - a capability-based one (a toy sketch follows this list).

  4. The paper prepares for insider attacks - a threat that is often overlooked and will play a major role going forward, e.g., for domestic abuse survivors when IoT devices are used in the home.

  5. The study deliberately focuses on the general population as opposed to early adopters, to avoid biasing the study.
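
To make the capability-based idea concrete, here is a toy Python sketch of the paradigm the paper argues for: permissions attach to (role, capability) pairs rather than to whole devices. The role names and policy entries are illustrative, loosely inspired by the paper's examples, not taken from its data:

```python
# Toy capability-based policy: access is decided per (role, capability),
# not per device. Entries below are made up for illustration.
POLICY = {
    ("adult", "unlock_front_door"): "allow",
    ("teenager", "unlock_front_door"): "allow",
    ("child", "unlock_front_door"): "deny",
    ("child", "lights_on_off"): "allow",
    ("visitor", "lights_on_off"): "allow",
}

def is_allowed(role: str, capability: str) -> bool:
    # Default-deny anything the policy doesn't explicitly grant.
    return POLICY.get((role, capability)) == "allow"

print(is_allowed("child", "unlock_front_door"))    # False
print(is_allowed("teenager", "unlock_front_door")) # True
```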

 

What I Didn't Like:

  1. The study was done online, meaning it is very hard to verify that the participants were real and actually put thought into their responses.

  2. I feel like the study's focus on creating some type of default settings is misguided, because a lot of these permissions really depend on the individual family.

  3. The study says it evaluated future capabilities that were "likely" to be deployed, which really doesn't seem sound.

  4. The use of free-text responses, which are very hard to evaluate and display in a concise manner.

  5. The study itself concludes that it has issues with the ecological validity and generalizability of its data, given that the surveys were done online.

 

 


New Ideas:

  1. Can one device or hub act as the quarterback for authentication across an entire IoT ecosystem?

  2. Why don't all new security features focus on having the hub authenticate the orders from the human? Or would this hub create a single point of failure?

  3. Tie feature controls to a specific cellphone, and then use that phone as the identifier for a specific user in an environment with multiple users.

  4. For children who are still developing, will having less control over the environment around them (e.g., lights or temperature) affect how they grow up?

  5. Repeat the study in person instead of with an online survey.

 

 

Discussion:

  1. What obstacles does voice based security need to overcome in order to be implemented in the future?

  2. How do database systems limit capabilities on a per-user basis and enforce those permissions? Can those techniques be shared with IoT systems?

  3. Does creating security defaults for permissions open IoT companies to any liability in the future?

  4. Should we grant general access to anyone in close physical proximity to the device for certain cases, e.g., lights?

  5. What percentage of households are actually moving toward an IOT type house?

BlackIoT: IoT Botnet of High Wattage Devices Can Disrupt the Power Grid

Link to the Paper: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-soltan.pdf

Summary:

This paper focuses on a theoretical attack in which an IoT botnet takes over an entire population of high-wattage smart devices. This seems implausible to me given the lack of a common framework across these devices, but the paper points out that even small manipulations of power demand can create major issues for the power grid as a whole.

 

What I liked:

 

  1. The study focused not just on ways the grid could be disrupted but also on ways costs could be increased for an actor.

  2. These types of attacks are almost impossible to detect because of how distributed they are.

  3. Good cost analysis, e.g., simulations showing that 5% more energy demand costs 20% more.

  4. Good historical examples of times this type of attack could have occurred, i.e., historical blackouts.

  5. A realistic modeling of how many devices the attacker would need to carry out a successful attack (a rough back-of-the-envelope version follows this list).
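
To make that concrete, here is a rough back-of-the-envelope version of that sizing in Python. Both constants are my own assumed numbers for illustration, not figures from the paper:

```python
# Back-of-the-envelope botnet sizing. Both constants are assumptions.
per_device_kw = 3.0     # assumed draw of one high-wattage device (e.g. water heater)
target_step_mw = 300.0  # assumed synchronized demand jump needed to stress a grid

devices_needed = target_step_mw * 1000 / per_device_kw
print(f"~{devices_needed:,.0f} compromised devices needed")  # ~100,000
```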

 

 

What I didn't like:

  1. I think this type of attack needs a specific name, especially as we start fleshing out the different types of IoT attacks.

  2. Specific instances of grid attacks happen on peak days, e.g., Poland in 2008. What other devices could be enlisted to drive energy usage up?

  3. There is a lack of discussion of how solar or other green energy might help mitigate these issues in the grid.

  4. The paper didn't talk about how new products such as Tesla's home batteries might mitigate these types of attacks, at least until more generation capacity comes online.

  5. I don't think it's plausible to compromise this many devices, especially when there might not be a common framework connecting them.

 

 

Points for Discussion:

  1. Why do things like ovens have a Wi-Fi connection? Is there such a thing as over-connection in IoT?

  2. Would it be better for an adversary to take out the grid or to just increase the cost if they were able to economically benefit?

  3. What would the recovery time be for this type of attack?

  4. What key services would be taken out by this attack that do not have backup systems? What would the transition time be?

  5. What would it cost to retrofit the grid to protect against these attacks?

 

New Ideas:

  1. What steps need to be taken to enable dynamic power demand - more batteries?

  2. What areas have the IoT device density and wattage to take down the grid?

  3. Are there protocols at the home-hub level that could detect this type of attack and prevent it from happening?

  4. What mitigation techniques can be used to keep the frequency up? Are there technologies that tolerate a wider band of frequencies?

  5. It'd be really cool to build a mini grid and test this out, at least as a small-scale model.


IoT Goes Nuclear: Creating a Zigbee Chain Reaction

Link to the Paper: https://eprint.iacr.org/2016/1047.pdf

Summary:

First thoughts on this paper: this is really, really cool. The abstract paints a picture of a near-future dystopia where worms infect IoT devices left and right, and the paper transitions that idea into real life, showing that a worm infection like that could take over all the smart lights of a typical city like Paris (105 square kilometers). But while there are a lot of novel approaches to compromising IoT devices here, much of the broad industry-level information has already been covered in past papers, and this paper mainly focuses on a single product and how to exploit it. That said, I really enjoyed this paper - I just don't think much can be taken away from it now that the manufacturers have patched what made the exploits so effective.

 

What I liked:

  1. The paper identifies the key (flawed) design assumption in Philips products: that an attacker will never get into close physical proximity to one of them.

  2. As I read how the smart lamps can be reset, it was a natural progression in my mind that these attacks could be carried out from some type of moving platform. I really loved that the paper addressed this and ran its own tests.

  3. The entire attack was done with off-the-shelf equipment - some custom implementation was needed, but nothing beyond the reach of a hobbyist.

  4. Very in-depth explanations that don't go too deep into the weeds, e.g., how they loaded the OTA image into the chip by only setting a flag and knowing the offsets.

  5. The paper paints a picture of the future in an IoT world and shows how future threats of network jamming, data exfiltration, and denial-of-service attacks might happen.

 

 

What I didn't like:

  1. For the calculations of the attack on Paris, the paper assumes the smart lamps are randomly distributed around the city, which I don't think is likely. My assumption would be that newer districts have smart lamps concentrated in their area while older ones have none at all - that would keep the worm localized and prevent it from spreading.

  2. The paper points to issues with Philips' implementation of security measures, but doesn't really address what the industry could do better to prevent these types of attacks in the future.

  3. It seems the possibility of a worm was already raised in O'Flynn's research. I think this is a specific implementation of that idea, and I don't see how this paper is dramatically different from past work.

  4. Since the exploits have been patched, I'm not really sure how this paper benefits the community as a whole now.

  5. They didn't actually build the worm and see how it would have worked, even in an isolated setting.

 

Points for discussion:

  1. What does it mean to really be "adjacent" in an interconnected world of IoT devices?

  2. How will the industry standard for IoT device communication cryptography change in light of these exploits?

  3. How did O'Flynn propose that a worm attack IoT devices in his past papers, and how does that differ from this paper?

  4. How did Philips reduce the infection range to less than a meter to prevent these attacks from happening in the future?

  5. Is this type of attack limited to devices all on one network, or is it adaptable to many different networks?

New Ideas:

  1. Research what other IOT devices are similar to the lamps and can fall victim to similar types of attacks

  2. Is there a way to dynamically change the master key for ZLL-certified products so that if the key gets leaked, it doesn't open the products up to new exploits?

  3. The paper mentions that the Harvard architecture of the IoT devices in this case made it extremely hard to deliver a software exploit through traditional means - what percentage of IoT devices have this? Are there alternatives that are more cost-effective?

  4. It seems like the worm is only possible due to fake firmware updates - is there a way to create a per-device secret key within the Zigbee protocol that can act as a backstop against leaked master keys?

  5. Is a common update framework helpful or hurtful in the spread of worms?

SoK: Exploiting Network Printers

Link to the Paper: https://oaklandsok.github.io/papers/muller2017.pdf

Summary:

In a room full of people, with each person representing a cybersecurity threat, this paper is probably the quietest of them all - but it has perhaps the most near-term potential to do a lot of damage if not taken seriously. The paper brings attention to how printers are usually very unprotected, both on the network and physically, leaving them open to a large number of attack vectors. The team uses open-source software on a variety of printers to steal or corrupt data in ways most people wouldn't imagine a printer could.

 

What I liked:

  1. The study brings a lot of attention to a problem most people don't think about: printers are unprotected yet carry a lot of confidential information.

  2. The paper went for common trends in attacks, i.e., most of the printer attacks it describes target the implemented interpreters, PostScript and PJL (a toy probe is sketched after this list).

  3. There are a lot of different attacks examined in the paper - a really broad offering.

  4. The creation of the Printer Exploitation Toolkit seems like a very novel way of building an attack framework - I'm glad they made it open source instead of proprietary.

  5. The paper definitely builds on a lot of prior work - as seen in the references, there are some 65 other papers.
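
To make item 2 concrete, here is a minimal sketch of probing a printer's PJL interpreter over the standard raw-printing port 9100 (the IP is a placeholder, and this is an illustration rather than the paper's PRET toolkit). `@PJL INFO ID` just asks the device to identify itself, and many printers answer with no authentication at all:

```python
import socket

PRINTER = ("192.0.2.15", 9100)  # placeholder address; 9100 is the raw print port

with socket.create_connection(PRINTER, timeout=5) as s:
    # UEL (Universal Exit Language) escape sequences bracket the PJL command.
    s.sendall(b"\x1b%-12345X@PJL INFO ID\r\n\x1b%-12345X")
    print(s.recv(4096).decode(errors="replace"))  # e.g. the printer's model string
```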

 

What I didn't like:

  1. The overuse of arbitrary acronyms made the paper hard to read - I had to constantly flip between pages to remember what each acronym stood for.

  2. The study was constrained to donated, older printers, which begs the question of whether these printers are representative of printers currently in use.

  3. The paper considers certain attacks out of scope, such as an active network attacker controlling the communication between the end user and the printer.

  4. Again, the paper goes after specific brands in places but doesn't really touch on what the industry as a whole could do to improve.

  5. I don't think the paper focused enough on how malicious fake firmware updates could become an attack vector in the future - especially after reading the Zigbee paper.

 

Points for Discussion:

  1. What other unprotected high access devices are in company networks?

  2. Would industry-specific rewards incentivize more white-hat hackers to work in traditionally neglected cybersecurity fields?

  3. Why can factory resets be done over the network? Why not restrict it to physical proximity?

  4. How and why are credentials stored on printers?

  5. How big of an issue are printer attacks currently?

 

New Ideas:

  1. Does the diversity in printer manufacturers and their implementations make it harder to find a silver bullet hack?

  2. I think this study needs to be redone with modern printers that are sold today and kept supported through firmware updates.

  3. It might be interesting to classify printer attacks into hardware and software attacks to see if there are any other common trends to extrapolate.

  4. Are there different levels or tiers of security across price points? We could do a study comparing personal and enterprise printers.

  5. Could proprietary information be stolen from 3d printers in an industrial setting?

3rd Place: UCLA IdeaHacks


This project was made at the IDEA Hacks hardware hackathon hosted at UCLA - this year’s theme revolved around household appliances.

The night the hackathon began, I was grabbing dinner at In-N-Out with the rest of my team and remarked, “I wish I made burgers like these back at home, but let’s be honest, I’d probably burn down the dorms at USC by accident.” My buddies, knowing firsthand how bad of a cook I am, went on about all the bad times I’ve had with cooking and concluded that I should never be allowed near any hot surface. And right there the SafeStove was born.

The SafeStove revolves around the idea that there are certain people (i.e., kids, Joels of the world, etc.) who should never be in front of a flame.

Cool Features:

1) Facial Recognition Technology - We trained the camera using a facial recognition API on our Raspberry Pi to detect my face from a couple of photos of me. Essentially, when I wander near the stove (as seen in the demo), the system notices and automatically turns the stove off.

2) Ultrasonic Sensors - Sometimes you forget to turn the stove off; you’re human and it happens. We put ultrasonic sensors on our stove so it can detect when you are no longer in the kitchen and automatically turn the stove off.

3) Capacitive Sensors - When your hands get too close to the stove, the flame automatically turns off.

4) Interfacing between 2 Arduinos and a Raspberry Pi - This was my first time hooking up two different systems together, and it was a pretty interesting experience. My team spent an hour troubleshooting why signals were not being communicated between the boards before I realized the boards were not sharing a common ground line. (A minimal sketch of the serial link is below.)

5) Completely Joel-Proof - This is probably the most impressive feature, if we’re being honest. I could not figure out a way to get burned on this stove even when I tried. And if I couldn’t figure out a way to get hurt on this stove, I can almost guarantee no one else could.
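
For the curious, here is roughly what the Pi side of that serial link looks like - a minimal sketch, not our exact hackathon code; the port name and message strings are made up, and it assumes pyserial (and a shared ground line!):

```python
import serial  # pyserial

# Port name and the message protocol are assumptions for illustration.
arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=1)

while True:
    # The sensor Arduino streams one event per line over USB serial.
    event = arduino.readline().decode(errors="ignore").strip()
    if event in ("HAND_TOO_CLOSE", "NOBODY_IN_KITCHEN", "JOEL_DETECTED"):
        arduino.write(b"STOVE_OFF\n")  # the relay-side Arduino cuts the burner
```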

Here’s a video of the first-round presentation we gave (click here) - keep in mind we had gotten something like 8 hours of sleep over 3 days:

If you’d like to see the pitch deck click here!

At the end of the hackathon, my team of USC freshmen and sophomores took 3rd - the two teams above us were made up entirely of UCLA graduate students :(

But overall it was an amazing experience building this and really having fun with the rest of my team.

Our teammate Jan had to leave early because she caught a cold at the hackathon. But honestly, I loved my team so much. From left to right: Chris Hailey, Joel Joseph, Ishan Shah, Aditya Bellathur.

Wall-E By Blockchain

I present WALL-E, outfitted with sensors that allow it to drive autonomously and not crash into things.

Inspiration

When walking over to the hackathon yesterday morning, our team noticed how many parked cars sat unused on the side of the road. It made us really think about how much cars are actually used and whether, going further into the 21st century, we as college students would really need to buy a car in the future.

After all, in an age of Uber and Lyft, it's now a hassle to own your own car. But the problem is that every time we call a car, a decent amount of our fare doesn't actually go toward the cost of the car and gas, but instead to the companies, Lyft and Uber. That added price could be the difference between someone buying a car they rarely use for convenience and a world with fewer cars on the road.

So we thought of a way to run an autonomous car-sharing service with accountability and trust, in a way that is decentralized and immutable.

What it does

Essentially we have a pool of autonomous cars (WALL-Es) in waiting, scattered throughout a community or city. When someone wants to use a car to get somewhere, they simply log onto our web app, which is tied into the blockchain, and rent out the car. From there the car is under their control from the web app. They have the ability to move it wherever they want - keep in mind, though, that the car is continuously collecting data from its sensors (ultrasound, GPS, sound), so if you crash the car it will be clearly logged in the blockchain in a way that cannot be tampered with. When you are done with your car, simply tell the web app and the car is no longer your responsibility.

We wanted a system that did not need a centralized body controlling everything, because we believe that in a sharing economy no one person or entity should be making the lion's share of the profits.

How we built it

The server is responsible for handling all the data sent from the sensors on the car and saving it to the blockchain. Further, the web server serves the web app that displays the data and manages ownership of the car.

Blockchain. All the data is saved on our Ethereum blockchain using smart contracts. One checkout is considered one entity in the contract; we save the checkout and return times as well as the sensor data as part of one contract.

We ran a Python script with a GPIO library to access the pins on the Raspberry Pi for standard time processing, location gathering, sending GET requests, etc. We then used an Arduino to run the motors for the car in a way that allowed it to read data from the ultrasound sensor and drive autonomously.
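
Roughly, the Pi-side loop looked like the sketch below (illustrative, not our exact code: the endpoint URL, pin numbers, and payload format are assumptions). It measures distance with an HC-SR04-style ultrasonic sensor and POSTs the reading to the server, which records it in the smart contract:

```python
import time
import requests
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24                       # assumed BCM pin numbers
SERVER = "http://example.com/api/sensor"  # placeholder endpoint

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def read_distance_cm():
    # A 10-microsecond trigger pulse starts an HC-SR04 measurement.
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)
    start = end = time.time()
    while GPIO.input(ECHO) == 0:   # wait for the echo line to go high
        start = time.time()
    while GPIO.input(ECHO) == 1:   # measure how long it stays high
        end = time.time()
    return (end - start) * 17150   # half the speed of sound, in cm/s

while True:
    reading = {"distance_cm": read_distance_cm(), "timestamp": time.time()}
    requests.post(SERVER, json=reading)  # server writes it to the contract
    time.sleep(1)
```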

Challenges we ran into

  1. Security. The CCI talk made it very clear that things like DDoS attacks are now a reality in a world of IoT. We needed a solution for these types of extremely serious attacks, and we found our answer in the blockchain. With blockchain the data is always accessible and distributed, meaning that if someone takes down our node the entire system will still function.

  2. Saving data as part of the smart contract was really tricky. I ran into issues with mining and getting some ether transferred into the main account. Developing a simple smart contract was easy; creating all the events - checkout, return, and adding sensor data - was tricky.

  3. Getting the Arduino and Raspberry Pi to talk to each other - troubleshooting and working through the interfacing.

  4. Working with WALL-E and getting him to move and talk the way we wanted could be pretty difficult at times.

One thing that took away from the complexity of the project was the fact that we only had one WALL-E, which made things just a little bit easier to implement.

Accomplishments that we're proud of

In terms of technology, we are really proud we built something on the blockchain. It was a lot of fun creating smart contracts in the online IDE, Remix. We are proud that we have a scalable solution to get cars off the road in a way that's cheaper and more sustainable than the way the world currently runs ride shares. Our team seriously sees this as a mobility solution not only for everyday people in cities but for the elderly and disabled, who currently have a hard time moving around cities with existing public transportation.

What we learned

How to build WALL-E. Developing smart contracts using Solidity. How to get cars off the road while making public transport more available to people who really need it.

What's next for WALL-E By Blockchain

Next steps are spending a little more time creating a mobile application that works a bit like a Lyft or Uber app, making it really easy to request cars. We could also add more WALL-Es, or actual cars, to the system. We also think that with AI we could get these cars running fully autonomously, meaning we could lower the price point and make the service more affordable for anyone who wants to use it.

Built With

My team and I taking over a TV in the building to set up our Raspberry Pi for WALL-E

A side shot of our WALL-E

Cloudy with a Chance of Breach: Forecasting Cyber Security Incidents

Link to the paper: https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-liu.pdf

Summary:

This study continues the trend of papers branching out of the usual cyber research and into the prediction of cyber events. Specifically, the study looked at 258 externally measurable features that make up a security-posture profile, then built a model that tries to predict future cyberattacks from that profile. One major flaw is the data: in the end, they only had enough data to train and test one event type, web-application incidents. Furthermore, the study had a higher false-positive rate than other prediction methods, such as RiskTeller, proposed in other papers.

 

What I liked:

  1. The study encompasses 258 externally measurable features, meaning the model has a large amount of observational data to draw on.

  2. I really like the analogy to the patient in prediction vs. detection, but I think there's a distinction to be made: prediction is far more valuable than detection. If a patient is sick and a doctor detects the sickness, he or she can give medicine to make the problem go away, while in cyber, by the time you detect a problem the damage may already be done. Being able to predict where cyberattacks will occur, and to shore up defenses cost-effectively, is far more needed than detection.

  3. The large number of hacks from different incident databases creates diversity for the model to learn from.

  4. The study does a good job of weeding out attacks from the data that had nothing to do with security posture, e.g., internal attacks.

  5. The training and testing data were split chronologically, which makes the evaluation feel more true to real life (a toy illustration follows this list).
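
Here's a toy illustration of that chronological split (the file and column names are hypothetical): train on incidents before a cutoff date and test on everything after, so the model never gets to "see the future" the way a random split would allow:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical file and column names for illustration.
df = pd.read_csv("incidents.csv", parse_dates=["date"])
cutoff = pd.Timestamp("2014-01-01")  # train on the past, test on the future

train, test = df[df["date"] < cutoff], df[df["date"] >= cutoff]
features = [c for c in df.columns if c not in ("date", "incident")]

model = RandomForestClassifier(random_state=0)
model.fit(train[features], train["incident"])
print("held-out accuracy:", model.score(test[features], test["incident"]))
```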

 

What I didn’t like:

  1. The study reports a 90% true positive rate and a 10% false positive rate, which is less effective than a lot of the other papers - specifically, the RiskTeller paper reports a 95% true positive rate.

  2. I don't think the study defines how it evaluates what counts as malicious activity in its security-posture data, which kind of begs the question of what exactly it is predicting.

  3. I don't like that the study uses a collection of datasets offset from each other by a couple of months - the data seems disjointed and might not paint the right picture.

  4. One of the major issues with this study is the claim to offer a snapshot of security that doesn't change much month to month, as opposed to the day-to-day snapshots in other studies like RiskTeller. I think even this snapshot is flawed, because the datasets don't overlap in time, meaning the snapshot might not be coherent.

  5. The study says the only incident type it had enough data to train and test on was web-app incidents, which means they really didn't get much out of their data.

Points to talk about:

  1. This study has a higher false positive rate than the RiskTeller study - what methods from that study made it more effective at reducing false positives?

  2. Do cyberattack types vary from country to country, or are they globalized, i.e., the same across countries?

  3. Are hosting companies like GoDaddy following best cyber practices? The study omits web hosts' names to avoid biasing its model, but it begs the question of why their names show up so many times in the attack details.

  4. Why are attacks from the WHID Database detected less often than other attacks in Figure 6?

  5. Does size of the network increase or decrease the risk of an attack?

 New Ideas:

  1. Look into a study that uses all three of their datasets - mismanagement symptoms, malicious activities, and incident reports - over the same time window instead of staggered like in this study.

  2. How would this study change if it included internal attacks, which were left out? I'm sure there are posture techniques that restrict an internal attacker's ability to really hurt the company.

  3. How would the model change if the study kept the hosting information?

  4. Does relying on multiple databases make this system more reliable at prediction? Can we create novel test data and compare outcomes with other prediction methods, such as RiskTeller, that use only one data source?

  5. Redo this study with a bigger dataset so there are valid models for different event types.

RiskTeller: Predicting the Risk of Cyber Incidents

Link to the paper: https://acmccs.github.io/papers/p1299-bilgeA.pdf

*Also, just a heads up: there is some crazy cool data being used in this paper, courtesy of Symantec Research Labs.

Summary:

The real novelty of this paper is that it went down the road less traveled in cybersecurity - prediction - as opposed to the three main paths: analysis, detection, and prevention. The paper is also really timely: as cyberattacks create more and more economic harm, companies are in the market for cybersecurity insurance, which often lacks accurate, up-to-date models. The paper therefore creates a novel analysis tool able to flag so-called "infected" computers at a rate of 95% with a relatively low false positive rate. The problem is that their definition of "infected" might not actually be valid, given how heavily the model weights the number of unique files on a device - which just isn't realistic for specific user types such as developers.

What I liked:

  1. The study uses very comprehensive information: one year of data spread over 18 enterprises and 600k machines, with over 4.4 billion binary file events.

  2. The paper is unique in that past cybersecurity research focused on analysis, detection, and prevention, while this paper focuses on prediction, which has not seen much work.

  3. For the risk prediction they use a model that prioritizes low false positives, because that is what enterprise industry has demanded before a solution will be deployed (a toy threshold-tuning sketch follows this list).

  4. The study goes into the factors surrounding malware incidents, e.g., checking whether the user downloaded the files from home or outside office hours, which is an interesting facet.

  5. The study uses and builds on NIST's vulnerability database instead of creating its own scoring system, which makes it simpler to evaluate.
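
To make item 3 concrete, here is a toy sketch (my construction, not RiskTeller's actual procedure) of tuning a decision threshold to respect a false-positive budget: sweep the ROC curve and keep the threshold with the best TPR whose FPR stays under the budget:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and model scores, purely illustrative.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.8, 0.7, 0.4, 0.9, 0.2, 0.6, 0.5])

fpr, tpr, thresholds = roc_curve(y_true, scores)

budget = 0.10       # say the enterprise tolerates at most a 10% FPR
ok = fpr <= budget  # roc_curve returns FPR in ascending order
print("threshold:", thresholds[ok][-1], "TPR at that threshold:", tpr[ok][-1])
```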

What I didn’t like:

  1. Very early on, the paper acknowledges that ground truth - based on observing malicious files and infection records - is the most important part of this study, yet it also acknowledges that a perfect ground truth is nearly impossible to obtain the way the study was conducted.

  2. In the setup details they only go into depth about how the study was set up for Windows, and they never really clarify whether the study encompasses other operating systems or just Windows.

  3. Probably my biggest issue with this study is how they classify clean and infected devices. Essentially, their methodology goes through the files on a device and penalizes the device for having unique files. But this really doesn't make sense: if the device belongs to a developer or someone similar, you are simply flagging it for normal use, which undermines the study.

  4. Likewise, the prevalence-based features of the user profiles penalize users like developers who create their own types of files.

  5. Even though the study had so much data, it really didn't produce many visualizations. Out of the 82 factors or profile details, they only displayed around 9 graphs.

Points to talk about:

  1. The paper references a technique to figure out who is vulnerable to phishing emails and put extra layers of protection around those users, which is very similar to what one of my classmates is working on.

  2. How does this specific method detect new forms of malware when it only knows what it has seen?

  3. How does the emergence of new malware end up affecting this prediction model? Is it something statically priced in at the end, or is there other information that can be drawn on to make the model more dynamic?

  4. Did the system weight any of the factors more than others in determining whether a computer was infected or not?

  5. Is simply updating the dataset enough to prevent concept drift in anti-virus machine learning?

New Ideas:

  1. Maybe we could use the same dataset but reveal the industries these 18 enterprises are in, and map how certain industries are susceptible to different types of malware.

  2. The study clearly states that it does not seek to figure out the exact causes of infection - but with the binary event log files it would not be too hard to extrapolate certain causes for future study.

  3. Given that there are different user types (e.g., developers), could there be more user profiles in this study to create a more accurate model of what an infected computer looks like? Would this lower effectiveness and/or false positives?

  4. If the system weighted some factors more than others in determining which devices were infected, could that be translated into a priority list for cybersecurity practitioners?

  5. Possibly look into exporting this form of analysis to login data, to detect false users signing into portals.