Link to the paper: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-becker.pdf
Summary:
The study followed 100,000 staff and students at a major university and examined the strength of their passwords under a new password policy. The novelty of the university's policy was that a password's lifetime varied with its strength. Under this policy the study observed over 200k password resets over the course of 14 months and came up with some interesting insights which didn't exactly line up with what other studies out of CMU had observed. The study found that the stronger a password was, the more likely it was to be reset, and that users who forgot their passwords more than once a year showed a deficit in password strength.
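To make the idea of a strength-dependent lifetime concrete, here is a minimal sketch of what such a rule could look like. The function name `lifetime_days`, the linear shape, and all four numeric defaults are my own placeholders for illustration; they are not values or logic taken from the paper or from the university's system.

```python
def lifetime_days(entropy_bits, min_bits=50.0, max_bits=120.0,
                  min_days=100, max_days=350):
    """Map an estimated password strength (in bits) to a lifetime in days.

    All defaults are hypothetical placeholders, and linear interpolation
    is an assumed shape, not the university's actual rule.
    """
    if entropy_bits < min_bits:
        raise ValueError("password is below the minimum accepted strength")
    # Clamp so very strong passwords do not exceed the maximum lifetime.
    clamped = min(entropy_bits, max_bits)
    fraction = (clamped - min_bits) / (max_bits - min_bits)
    return round(min_days + fraction * (max_days - min_days))
```

Under these assumed numbers, a password halfway between the minimum and maximum strength would get a lifetime halfway between the shortest and longest allowed, which is the kind of incentive the policy relies on.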
What I liked:
I thought that the following was a very interesting insight: "Users who reset their password more than once per year (27% of users) choose passwords with over 10 days fewer lifetime, and while they also respond to the policy, maintain this deficit".
The fact that they were using 100,000 enrolled users and 200k passwords over 14 months gave the study a really good data set to go off of - they noted they are probably the largest study of this type.
The study did not alter any of the university's regular systems: "We were not involved in the design of the policy or the choice of password strength estimator".
New users are constantly coming into the system, which creates different test groups: students who transitioned over from the old system, and new users. It would be interesting to see how their password strength differs.
I really liked the fact that they looked at different password tiers, i.e., certain passwords are a lot more valuable to hackers than others.
What I didn’t like:
The study acknowledges right off the bat that Shannon entropy isn't the best way to measure password strength, but uses it as the basis of the study anyway.
The study makes use of 93 anecdotal interviews, and I'm not sure how reliable anecdotal information is.
Why wasn't an industry-standard password strength estimator like zxcvbn used as the basis of this study, as opposed to Shannon entropy, which doesn't necessarily correlate with resistance to password cracking?
The data collected in the study wasn't the users' passwords but instead a single number for each password's strength. That indicator of strength really doesn't get a lot of scrutiny in the study.
The study has very different outcomes from a similar study done by CMU. This underscores how weak their data was, given that they acknowledge that the strength measure the system gave them "only weakly correlates to password strength".
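To illustrate why an entropy-style number can be a poor proxy for cracking resistance, here is a small sketch of a naive character-pool estimate (length times log2 of the pool size). This is my own illustrative stand-in, not the university's estimator, whose internals I don't know; the function name `naive_entropy_bits` and both example passwords are made up.

```python
import math
import string

def naive_entropy_bits(password):
    """Estimate strength as length * log2(size of the character pool used).

    A generic character-pool formula, assumed for illustration only.
    Characters outside the four ASCII classes below are not counted.
    """
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)
    if pool == 0:
        return 0.0
    return len(password) * math.log2(pool)

# A predictable pattern and a random-looking string of the same length
# and character classes receive exactly the same score.
predictable = naive_entropy_bits("Password1!")
random_looking = naive_entropy_bits("xQ7#mK2$pL")
print(predictable == random_looking)
```

Both passwords use all four character classes and have ten characters, so this kind of formula cannot tell them apart, even though a dictionary-based attack would likely try the first one very early. Pattern-aware estimators such as zxcvbn are designed to catch that difference.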
Points to talk about:
"The new policy took over 100 days to gain traction, but after that, average entropy rose steadily" - why did it take so long to gain traction? What could be done to shorten that amount of time?
The study casts doubt on some common myths about when it is time to change passwords. It raises the question of what merits a mandated password change, perhaps taking into account that there could be unknown compromises on the system.
For password strength estimation, why does zxcvbn's error increase as the password's bit strength gets larger? Are there any other algorithms that can determine password strength?
The study makes the point that users don't want to go through a wide-ranging security check unless something of value, i.e., money or data, is stored in that account. But doesn't every account that you have contain data on the user logging in?
The average age of the staff members surveyed in this study is 34.6, which strikes me as pretty young. Did they get a good representation of the entire university faculty, or was it skewed toward younger members?
New Ideas:
How significant is the tradeoff between security and convenience? Do companies who employ stricter security measures actually end up with fewer customers?
The study makes a distinction in password strength between systems with hard and soft security transitions. How does the transition type affect security outcomes?
The study makes distinctions between different types of users in the system. I think it would be interesting to track how strong the passwords are among the different user groups.
I think the very basis of how we define password strength is flawed. Some use zxcvbn, this study uses Shannon entropy, and others use different methods, which raises the question of what the best way to measure password strength is going forward. This study says there are more intensive methods, but that they were infeasible to deploy in real time while users are creating their passwords.
How does password length correlate with the tier of the user who made the password? I think this would be an interesting follow-up study.