musings between the lines

there's more to life than code

User Profiling

Nearly every system that has any type of persistent user identification needs a profiling system. I’m in the process of writing a framework (more on that at another time) that will let me kickstart future Java projects with a sound base: data storage and retrieval, JSON-based REST APIs, and agnostic web framework containment. So naturally, a starter base for a user system would be nice to have.

The question then becomes how to write it generically enough that it won’t be tied to one particular application, but not so generically that it becomes nebulous and too loose to be useful. So choices have to be made.

Part of that will be that the user subsystem will probably be an abstract implementation. It’ll contain the basics that every profiling system should have, and then leave the rest of the details to a higher level implementation. This should save time getting the structure in place, so that the rest becomes a matter of filling in the implementation details. The reason for even wanting a user subsystem in the first place is to start enabling some basic editing authorizations for the other systems in the core framework. I’m hoping that having a more concrete user system, at least in abstract form, will allow me to issue these authorizations at a lower framework level that can then float up higher as more pieces are implemented. Less work later on if a foundation is laid for the basics. Or something like that.


I just wanted to start reviewing the general pieces of a usable user subsystem and what it should have in order to act as a reasonable base for most projects. I’m willing to have it be a little opinionated in order to satisfy most projects that might need this type of thing, and to willingly exclude some of the edge-case projects. I figure that for those, I can fork and refactor the code as needed instead of extending and implementing it.

I’ve designed and built profiling systems in the past, but they were rebuilt each time for each project as needed. Also, being internal to a company, there were vastly different data association requirements (no need for name/email verification, tertiary data store for person information, etc).

At the moment, this subsystem is just getting off the ground, so I haven’t decided yet how granular it will be. Do I store some very generic preferences with the main object? Do I split those out to a separate named object? A generic preference association system? How detailed should it become - track login times? Track it with location information? Keep a history of all login timestamps and duration?

It’s going to be futile to try to accommodate every possible combination of projects that want that type of granular information and projects that don’t, so I’ll probably just decide on what most of my projects may need and start from there. In the end, if I build in a generic preference storage system, it may suffice for most cases. We’ll see.
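As a rough illustration of what a generic preference storage system could look like, here is a minimal Java sketch. The class name `Preferences`, its methods, and the choice of plain string keys and values are all assumptions made for the example, not part of the framework described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a string-keyed preference store attached to one user.
final class Preferences {
    private final Map<String, String> values = new HashMap<>();

    // Store a preference, overwriting any earlier value for the same key.
    void put(String key, String value) {
        values.put(key, value);
    }

    // Look up a preference; the result is empty if the key was never set.
    Optional<String> get(String key) {
        return Optional.ofNullable(values.get(key));
    }

    // Typed convenience accessor that falls back when the key is missing.
    boolean getBoolean(String key, boolean fallback) {
        return get(key).map(Boolean::parseBoolean).orElse(fallback);
    }

    // Forget a preference entirely.
    void remove(String key) {
        values.remove(key);
    }
}
```

Keeping everything as strings means a project that wants something more granular (login history, location data) could layer it on under its own keys without changing the core object, at the cost of doing its own parsing.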


So, here are some of the data points about a user that would be convenient to have in a profiling system:

  • Unique System ID - Internal identifier
  • Unique Username - External identifier
  • Display Name - Common societal name
  • Email - More for contact/news delivery purposes
  • Email Verified - Just to make sure the user is who they say they are
  • Is Admin - Simple administrator flag for special access
  • Date Created - Date account was created
  • Date Updated - Date account was updated by the user
  • Date Touched - Date account was accessed by the user
  • Date Deleted - Date of deletion request (if deletion is delayed or if the account is simply removed from visibility)
  • Date Logged In - Date of last login.
  • Associated Accounts - 3rd party accounts used for login. From the Authentication Module
  • Preferences - Simple key/value preference pair storage system
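
To make the list above a bit more concrete, here is one way the fields might map onto an abstract Java base class. This is only a sketch under assumptions: the names `AbstractUserProfile` and `BasicUserProfile`, the use of `UUID` for the system ID, and `Instant` for the dates are illustrative choices, and Associated Accounts is left out to keep it short.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical base class: every name here is illustrative, not the framework's API.
abstract class AbstractUserProfile {
    private final UUID id = UUID.randomUUID();  // Unique System ID (internal)
    private final String username;              // Unique Username (external)
    private String displayName;
    private String email;                       // null until the user provides one
    private boolean emailVerified;
    private boolean admin;
    private final Instant dateCreated;
    private Instant dateUpdated;
    private Instant dateTouched;
    private Instant dateLoggedIn;               // null until the first login
    private Instant dateDeleted;                // null unless deletion was requested
    private final Map<String, String> preferences = new HashMap<>();

    protected AbstractUserProfile(String username, String displayName, Instant now) {
        this.username = Objects.requireNonNull(username);
        this.displayName = Objects.requireNonNull(displayName);
        this.dateCreated = Objects.requireNonNull(now);
        this.dateUpdated = now;
        this.dateTouched = now;
    }

    // Changing the address invalidates any earlier verification.
    void changeEmail(String newEmail, Instant now) {
        this.email = Objects.requireNonNull(newEmail);
        this.emailVerified = false;
        this.dateUpdated = now;
    }

    void markEmailVerified() { this.emailVerified = true; }

    void setAdmin(boolean admin) { this.admin = admin; }

    // A login also counts as an access, so it moves Date Touched too.
    void recordLogin(Instant now) {
        this.dateLoggedIn = now;
        this.dateTouched = now;
    }

    // Soft delete: keep the record but stamp when deletion was requested.
    void requestDeletion(Instant now) { this.dateDeleted = now; }

    boolean isDeleted() { return dateDeleted != null; }

    UUID getId() { return id; }
    String getUsername() { return username; }
    String getDisplayName() { return displayName; }
    String getEmail() { return email; }
    boolean isEmailVerified() { return emailVerified; }
    boolean isAdmin() { return admin; }
    Instant getDateCreated() { return dateCreated; }
    Instant getDateUpdated() { return dateUpdated; }
    Instant getDateTouched() { return dateTouched; }
    Instant getDateLoggedIn() { return dateLoggedIn; }
    Map<String, String> getPreferences() { return preferences; }
}

// Minimal concrete profile; a real project would add its own fields here.
final class BasicUserProfile extends AbstractUserProfile {
    BasicUserProfile(String username, String displayName, Instant now) {
        super(username, displayName, now);
    }
}
```

Passing the current time in as a parameter, rather than calling `Instant.now()` inside the class, is just a choice that makes the sketch easier to test.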

Kill it with fire

One normally salient piece of information is missing: “Password”. I’m not going to use one, nor offer an option for a traditional login. Everything is going to depend on some form of 3rd party login that the user should already have access to. I do not want to be in the business of storing and securing password information, especially since it raises a lot of security concerns. It also feels like a disservice to store it, even if salted, because it forces the user to create and maintain yet another password, which will undoubtedly be a copy of a password from another site.

Eliminating it protects this system from compromises elsewhere, and protects other systems from compromises to this one. So instead, I’ll rely on login via sites like Twitter, Google+, Facebook and even Mozilla’s Persona system in lieu of having a direct login. They also offer the user the ability to revoke the project’s access should a compromise occur. It’s a tradeoff: reliance on a 3rd party system in exchange for better (perhaps falsely optimistic) security. It’s the day and age of the interwebs, we’re all going to be connected and networking is almost ubiquitous, so it’s a good time to start taking advantage of it.

Will this cause some issues further down the road should these systems be offline or meet their hype demise? Possibly, but I think some of that can be mitigated by enabling the user to tie in several systems together to offer a variety of ways to get into their account should their preferred one go the way of the dodo.
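
The idea of tying several login systems to one account can be sketched roughly as follows. Everything here is hypothetical: the `LinkedAccounts` class, the string user IDs, and the `provider:externalId` key format are assumptions for illustration, and the actual 3rd party authentication handshake is out of scope.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry mapping external identities to internal user IDs.
final class LinkedAccounts {
    // Key is "provider:externalId"; value is the internal user ID.
    private final Map<String, String> byExternalIdentity = new HashMap<>();

    private static String key(String provider, String externalId) {
        return provider + ":" + externalId;
    }

    // Returns false if that external identity already belongs to another user.
    boolean link(String userId, String provider, String externalId) {
        String existing = byExternalIdentity.putIfAbsent(key(provider, externalId), userId);
        return existing == null || existing.equals(userId);
    }

    // Find the internal user for an identity that a provider has vouched for.
    Optional<String> resolve(String provider, String externalId) {
        return Optional.ofNullable(byExternalIdentity.get(key(provider, externalId)));
    }

    // Refuses to remove a user's last remaining identity, since with no
    // password there would be no other way back into the account.
    boolean unlink(String userId, String provider, String externalId) {
        String k = key(provider, externalId);
        long linked = byExternalIdentity.values().stream().filter(userId::equals).count();
        if (linked <= 1 || !userId.equals(byExternalIdentity.get(k))) {
            return false;
        }
        byExternalIdentity.remove(k);
        return true;
    }
}
```

With something like this, a user who signed up through one provider could add a second one as a backup, and losing a provider would only lock out users who had linked nothing else.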

At any rate, this will be one of the opinionated ways in which I’ll be designing this system to see if it’ll be something that can be sustainable.


Let me know of any additional thoughts on what should be here. I’m sure there’s a lot to add, or perhaps some things that are just plain wrong that I’m blind to.

It’s been eons since I’ve written a profiling system… Image sourced from somewhere completely random