Tim Bourguignon 0:05
What is a good software developer? What do excellent developers do? There are probably as many answers to these questions as developers in the world. So let's ask veterans and newcomers what their stories look like. Let's learn directly from them. Welcome to Developer's Journey. Hello, and welcome to Developer's Journey, the podcast shining a light on developers' lives from all over the world. My name is Tim Bourguignon, and today I receive Markus Harrer. Hi, Markus.

Markus Harrer 0:43
Hi, Tim.

Tim Bourguignon 0:44
And we're live at the new offices of my company. There's still construction work all around, but it should be fine. The workers are gone, and they're not supposed to make too much noise, so we should be fine for recording. Nice to finally have you. I think we pushed the interview twice, but finally we have it. I really wanted to have you. Markus and I worked on the same project a few years ago; we just realized it's been more than two years. Time flies, time really flies. And then we saw each other at a few conferences, but now we've kind of lost track. I think the last talk I wanted to see from you was last year, and I couldn't attend. I had just busted my ankle.

Markus Harrer 1:29
Was it this year? I think it was, yeah.

Tim Bourguignon 1:31
It was, yes. You're right. But I think I saw your talk afterwards, so I know what you're talking about, and I want to get to it. It's about software analytics, and I'm sure we'll have to talk about it at some point. But first things first: we're here to hear your story. So can you tell us briefly about yourself, how you came to this world of software analytics, or maybe to being on the same project as I was at that time, and then to working for a new company? Now we can reveal the name, right? You've been working for INNOQ for a few weeks.

Markus Harrer 2:11
Since, I think, 12 days now. So 12 days is fairly new.

Tim Bourguignon 2:17
So I guess that's just the beginning.

Markus Harrer 2:21
Yeah, hi, my name is Markus Harrer. I'm a software developer, and I have some passion around legacy code. I like to really get my hands dirty in all the mess, all the chaotic structures of old software systems. And to do that very efficiently, I leverage some data analysis tools, which I call, on the whole, software analytics, to find out how the software could be structured in a better way, or how we can identify severe problems in software systems. This is what I love, and I do this now full time. So how did I get started with software analytics? Well, phew, that's a really deep question. Actually, I am doing software analytics for the second time. My first experience was during my master's thesis, where I tried to find the ultimate software quality management dashboard, you know, where the manager can see some gauges and traffic lights with red, yellow and green to see how the projects are going. But during my thesis I realized that it doesn't work; you can't build such a dashboard. The nice thing was that I learned some data analytics tools like Python, pandas, and so on. And with this stack, I could really figure out the big problems in software systems. That's how I came to today's software analytics topic. And I have had such good experiences now with all the tooling, all the techniques, all the approaches, that I want to spread the word in the software community, so that developers can also do it by themselves.

Tim Bourguignon 4:24
Were you always interested in this meta topic of software development?

Markus Harrer 4:30
No, no, no. I'm kind of a fan of refactoring, reengineering, reverse engineering. But there were times when I didn't speak the language of management, so for all these problems in software systems I just couldn't get any budget or money for fixing them. So I decided to take some additional classes at a business school to get some new knowledge about software controlling and management in general. There I found that you have to translate all the technical problems that are in the system into something that management can really see. When you look at non-technical areas like controlling, or at mechanical engineering, you see that many people use data analysis or statistics to make the invisible more visible to people who are non-technical. For me this is a good approach, and I want to do the same for software developers, or software development in general.

Tim Bourguignon 5:53
And can you give us an example, just to put some kind of concrete idea behind it?

Markus Harrer 6:00
Okay. Yeah, I was once in a project where we had many different developers from different providers. There came a time when we had to downsize the project team. The challenge was to find out which parts of the software system were documented well enough to let developers go, and which parts needed real documentation or at least knowledge sharing from one developer to another. The data analysis looked like this: you take your version control system data to figure out which parts of the system are well known to the software developers who will stay in the company or in the project, and which parts need some real documentation because the developers who mainly maintain that source code are leaving. Such an analysis is really key to steering your resources in a software company and to really finding out where the hotspots are. Another analysis I did was a performance analysis with a profiler in a Java software system. There, we had the challenge of spotting severe performance hotspots. What we did was run a performance or stress test and record all the performance measures with a profiler tool. Then I loaded the so-called call graph into a graph database, Neo4j, to spot the problems of our application and to find the parts of the software that really caused all the performance issues.
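
The knowledge analysis described here can be sketched in a few lines of Python. This is only an illustration under assumptions, not the analysis from the project: the sample log, the author names, the file paths and the 50% threshold are all made up, and the log text merely mimics the shape of `git log --numstat --format="--%an"` output.

```python
from collections import defaultdict

# Hypothetical sample in the shape of `git log --numstat --format="--%an"`:
# an author marker line, followed by "<added>\t<deleted>\t<path>" lines.
SAMPLE_LOG = """\
--alice
120\t10\tsrc/billing/Invoice.java
30\t5\tsrc/core/Utils.java
--bob
200\t40\tsrc/billing/Invoice.java
--carol
15\t0\tsrc/core/Utils.java
80\t20\tsrc/report/Export.java
"""


def added_lines_per_author(log_text):
    """Parse the log text into {path: {author: added_lines}}."""
    ownership = defaultdict(lambda: defaultdict(int))
    author = None
    for line in log_text.splitlines():
        if line.startswith("--"):
            author = line[2:]
        elif line.strip():
            added, _deleted, path = line.split("\t", 2)
            if added != "-":  # binary files report "-" instead of a number
                ownership[path][author] += int(added)
    return ownership


def knowledge_at_risk(ownership, leaving, threshold=0.5):
    """Return (path, share) pairs where leaving developers wrote more than
    `threshold` of all added lines, highest share first."""
    risky = []
    for path, per_author in ownership.items():
        total = sum(per_author.values())
        if total == 0:
            continue
        share = sum(n for a, n in per_author.items() if a in leaving) / total
        if share > threshold:
            risky.append((path, share))
    return sorted(risky, key=lambda item: item[1], reverse=True)


for path, share in knowledge_at_risk(added_lines_per_author(SAMPLE_LOG), {"bob", "carol"}):
    print(f"{path}: {share:.0%} of added lines come from leaving developers")
```

Added lines are a crude proxy for knowledge; the same grouping could be done on commit counts or on recent changes only, and with pandas it would reduce to a `groupby` over a table of commits.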

Tim Bourguignon 7:57
So it's mostly, or always, some form of mixing and matching of different systems that give you different aspects of one problem, and then trying to make sense of the combination, or the product of the whole thing?

Markus Harrer 8:15
Yeah. But in general, the real start is a concrete need, or some deficiency of the software. You think about what data in your software development process or in your software development company can show the problems. So you always start with a concrete problem, then you search for the data that could support your case. And then you figure out how you can transform the data so that it really shows the pain points.

Tim Bourguignon 8:59
How do you come up with ideas? Are you just trying to find something in the dark, or do you have...

Markus Harrer 9:09
No, no. As I said, this is the second time I'm trying out software analytics. This time, the start was really to automate some tedious things. I had to do some architecture or code reviews in my company, and the job was to check whether the given architecture rules fit the actual source code. So we had to open source code file after source code file to see if there were annotations in the right places, and whether all the namespaces and package names were as they should be. And this was, as I said, really tedious. And what do developers do when they are bored by some kind of task? They automate it. So what I did was search for tooling that could scan the whole code base of a larger software system and automatically spot all the pain points. And this worked pretty well with the tooling I had just newly discovered at that time. This is where I wanted to know more about it. Could it be that it works for other cases than the one I actually use it for? Could it be that the tooling also supports spotting those performance issues? Can I identify knowledge islands, where just a few developers commit to some part of the code base? And as I delved more and more into this area of software analytics, I found that there are so many things possible with it. And this is how I got started with the whole topic.
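
An automated architecture check of this kind might look roughly like the following Python sketch. Everything in it is an assumption for illustration: the rule table (an annotation implies the last segment of the package name), the sample sources and the function name are hypothetical, and a regular expression stands in for the real code scanner that a proper tool would use.

```python
import re

PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)

# Hypothetical rules: a file using the annotation (key) must live in a
# package whose last segment is the given name (value).
RULES = {"@Entity": "model", "@Controller": "web"}

# Hypothetical source files, keyed by file name.
SOURCES = {
    "Customer.java": "package com.example.shop.model;\n\n@Entity\npublic class Customer {}\n",
    "Order.java": "package com.example.shop.web;\n\n@Entity\npublic class Order {}\n",
    "OrderController.java": "package com.example.shop.web;\n\n@Controller\npublic class OrderController {}\n",
}


def check_architecture(sources, rules=RULES):
    """Return (file, annotation, package) for every file that breaks a rule."""
    violations = []
    for path, text in sorted(sources.items()):
        match = PACKAGE_RE.search(text)
        package = match.group(1) if match else ""
        for annotation, expected in rules.items():
            # The trailing \b keeps "@Entity" from matching "@EntityListeners".
            uses_annotation = re.search(re.escape(annotation) + r"\b", text)
            if uses_annotation and package.split(".")[-1] != expected:
                violations.append((path, annotation, package))
    return violations


for path, annotation, package in check_architecture(SOURCES):
    print(f"{path}: {annotation} found in package {package}")
```

The result is exactly the kind of list a reviewer would otherwise build by opening one file after another.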

Tim Bourguignon 10:52
In case you put one thing in there and what for y'all arm yet,

Markus Harrer 10:58
There were some meetings where someone raised a problem or said, hey, we have some severe issues here in a software system. And then my brain connected all the different data sources that were available and checked for correlations between them. Bringing them together to show the pain points was really interesting. So a kind of data-oriented thinking really started in me, and since then I have seen many, many problems from a data perspective.

Tim Bourguignon 11:30
So interesting. This is fun, because I was probably in some of those meetings as well. But to hear that you were sitting in those meetings and looking at the topics being discussed from a completely different angle, saying, well, how can I automate this thing? That's really interesting to me, actually.

Markus Harrer 11:51
Yeah, this "how can I automate this" is really, I think, the key. If you realize that you as a developer are in an integrated development environment, browsing through the source code, maybe along specific kinds of structures like your inheritance hierarchy, and you do that for the whole code base, then you can also automate this. There are tools out there that can do it. Just leverage those tools and, well, automate all the things.

Tim Bourguignon 12:25
Let's speak a bit about these tools. Did you have to learn some completely new things to be able to leverage these tools?

Markus Harrer 12:31
Yeah, you have to go into the data science area. There is a standard technology stack that everybody is using nowadays for doing analyses. These are tools like Python, the programming language; pandas, the data analysis framework; plotting libraries like matplotlib; and the core of all my analysis tasks, a notebook system called Jupyter. And those are really agnostic of software development, kind of, so you can use them for analyzing some...

Tim Bourguignon 13:08
You're making big moves with your arms, to say...

Markus Harrer 13:12
Everything. Well, you can analyze biological data, chemistry data, mechanical engineering data, process data, whatever you like. And I just realized that you can also use those tools for software data. And that's it.

Tim Bourguignon 13:32
Whatever, you can fit it in there.

Markus Harrer 13:34
Yeah. And this is what makes it really fun. You're trying to figure out how you can transform the data that is available in the software area to fit into those standard data analysis frameworks. It's a really nice thinking exercise after work, for example, and it's also a great motivator for me. So this is the kind of data science standard stack that I'm using. But there's also more. If you get into the source code itself, you see that everything in the source code is kind of related to each other. You have, for example, Java packages; the packages contain Java classes. The Java classes themselves have a specific hierarchy, an inheritance hierarchy, for example. The classes have fields or methods, and each of those sub-elements is connected. This is where I started to use tools like jQAssistant and the graph database Neo4j, which really store all that structural information that you have in your source code. And then it's like an automation of an integrated development environment, where you just say: hey, I want to navigate on those structures; please, Neo4j or jQAssistant, do it for me and spot the awkward issues that I have in my mind.
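
To make the idea of code as a graph concrete, here is a minimal in-memory Python sketch. It does not use jQAssistant or Neo4j and does not claim to mirror their data model; the relationship names (`CONTAINS`, `EXTENDS`, `DECLARES`), the classes and the helper functions are all hypothetical. It only shows how a question like "which classes sit deep in an inheritance hierarchy?" becomes a traversal over stored structural facts.

```python
# Hypothetical facts of the kind a code scanner could extract,
# stored as (source, relationship, target) triples.
EDGES = [
    ("com.example.shop", "CONTAINS", "BaseEntity"),
    ("com.example.shop", "CONTAINS", "Customer"),
    ("com.example.shop", "CONTAINS", "PremiumCustomer"),
    ("com.example.shop", "CONTAINS", "VipCustomer"),
    ("Customer", "EXTENDS", "BaseEntity"),
    ("PremiumCustomer", "EXTENDS", "Customer"),
    ("VipCustomer", "EXTENDS", "PremiumCustomer"),
    ("Customer", "DECLARES", "getName()"),
]


def targets(source, relationship, edges=EDGES):
    """Follow one kind of relationship from a node, like one click in an IDE."""
    return [t for s, r, t in edges if s == source and r == relationship]


def inheritance_depth(cls, edges=EDGES):
    """Count superclasses above `cls` (Java classes have at most one)."""
    depth = 0
    seen = {cls}
    parents = targets(cls, "EXTENDS", edges)
    while parents and parents[0] not in seen:
        depth += 1
        seen.add(parents[0])
        parents = targets(parents[0], "EXTENDS", edges)
    return depth


def deep_classes(package, min_depth, edges=EDGES):
    """Classes of a package whose inheritance depth is at least `min_depth`."""
    return sorted(
        c for c in targets(package, "CONTAINS", edges)
        if inheritance_depth(c, edges) >= min_depth
    )


print(deep_classes("com.example.shop", 2))
```

In a graph database the same question would be a single declarative query over the stored nodes and relationships instead of hand-written loops.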

Tim Bourguignon 15:20
Am I getting this right? You're kind of doing a code analysis with these tools, and then taking the output of that to mix and match with something else?

Markus Harrer 15:30
Almost. The start is, again, the issue that you have with your source code, and you can find your issue manually with the integrated development environment. But how did you come to this issue in your source code? You navigated almost always on structural information. You click, for example, on which method calls which method.

Tim Bourguignon 15:59
You just navigate from one call to the other and try to get your way through multiple stacks, until you get to the real problem.

Markus Harrer 16:09
Yes, yeah. So what you do is, for example, navigate between all the methods until you find the spot in your source code that contains the issue. And one result of such an analysis is, for example, a list of methods that contain a certain issue. So you check for the problem manually the first time, and then you query your code to find all the other problematic spots in your source code base.
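
The step from "found once by hand" to "queried for the whole code base" can be illustrated with a small Python sketch over a call graph. The call graph, the method names and the function are invented for the example; a real analysis would load the calls from a scanner or profiler instead of a hard-coded dictionary.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
CALLS = {
    "OrderController.submit": ["OrderService.place"],
    "OrderService.place": ["OrderRepository.save", "Mailer.send"],
    "ReportJob.run": ["OrderRepository.findAll"],
    "OrderRepository.save": ["Db.execute"],
    "OrderRepository.findAll": ["Db.execute"],
    "Mailer.send": [],
}


def callers_of(target, calls=CALLS):
    """Return all methods that reach `target` directly or transitively."""
    # Invert the graph once so we can walk from callee to caller.
    reverse = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)
    found, queue = set(), deque([target])
    while queue:  # breadth-first walk up the call chain
        for caller in reverse.get(queue.popleft(), ()):
            if caller not in found:
                found.add(caller)
                queue.append(caller)
    return sorted(found)


# The method identified manually as problematic; the query lists every
# other place in the code base that ends up calling it.
for method in callers_of("Db.execute"):
    print(method)
```

The printed list is the kind of result that can be handed to a developer together with a recipe for the fix.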

Tim Bourguignon 16:45
Basically, you find it once and then try to replicate it and see where else it happens?

Markus Harrer 16:49
Yep, exactly.

Tim Bourguignon 16:50
And then you get a kind of list as a result, your data set. Either you can work with this data set right away, or you have to match it, again in Neo4j, with something else, just to get a different structure and see where it comes from, or whether it's correlated to something.

Markus Harrer 17:08
Yeah, exactly. Basically, it is often enough to just have this list of potential problems. Then you can see what the real problem is and start developing a recipe for the solution. And then you can take the list of problems together with the recipe and give it to a developer, and he or she can fix all the issues.

Tim Bourguignon 17:34
That's cool. And how is the acceptance from the developer community? I mean, from the point of view of a developer, you're kind of doing meta work over the code base, which is fun, certainly, but not necessarily what I would like to be doing in my day-to-day job. I'm trying to produce something that solves a problem for a customer, and you're kind of going meta on top of this, maybe overseeing or checking for problems in somebody else's code base. How is the acceptance, both on the user end, meaning when you come with an idea, and also in the sense of "hey, I could be doing this myself"?

Markus Harrer 18:19
Yeah, so the acceptance on the developer side is a little bit tricky, because there's a high barrier to getting into the topic. You have to learn all the tooling around data analytics, or software analytics in general. There's certainly a need for tutorials or workshops that enable developers to do it in their own environment. But I think once developers have done such an analysis, they get hooked, like they are taking drugs for the first time. They move on to identifying and fixing more and more problems, enhancing their analytic skills, and it becomes a kind of self-fulfilling prophecy if they keep doing it in their daily work. And I think once you are over this barrier of using the data analytics tools for the first time, you really get the idea and you will really love it. The other question was how the results are accepted when you do such analyses. That's actually a good story. In my talk, I always have a kind of "danger" slide. On the one hand, you spot really severe problems that could cause some disturbance in the employment of some people, because you are delivering real data and real facts that are maybe uncomfortable. On the other hand, you should also be very careful, because if you analyze data that is private or has some relationship to your employees' work, this is a thing where a works council will not look away. So always be careful not to present the data in a way that allows any performance tracking of the people you're working with. But overall, if you present the results, there is huge acceptance if they really cut the Gordian knot or something like this. So I'm always trying to figure out first what the elephant in the room is: a problem that nobody wants to talk about, but that could really make your software project fail. You take this big problem and make it more understandable and more addressable by doing a data analysis on it, to figure out what's really behind it. And you also try to give some safety that the problem can be solved. Because if you have identified a problem in an automated way, you are really familiar with the problem, and it's easier to solve it afterwards.

Tim Bourguignon 22:09
No, it makes sense. What you were searching for... I like the sentence "you never look good by making someone look bad", which is a kind of mantra that I like to work with. Making somebody look bad never does good things. And this is one step further, because there are some legal grounds behind it. Yeah, it makes sense. That's interesting. You mentioned before that there's a lack of tutorials. I see you've just put your foot in the door in this direction: your blog is kind of a tutorial in itself, and this talk you've been giving at many conferences, I have no idea how many, is kind of the first step in this direction. Do you have something else planned? Or maybe first: how would you describe your blog?

Markus Harrer 23:08
Yeah, on my blog I'm just kind of writing up coding katas. I have an idea, and I want to figure out how I can grasp the problem I'm looking into with all the data that is lying around in software development. So I'm just coding, like after work, trying out some new algorithms from the data mining area. Somehow results occur, and I want to figure out whether they are correct. So I write about it, to make sure that they are kind of correct, and blog and publish it on my website to get some discussion started. So that's the main point of my blog: to learn in public.

Tim Bourguignon 23:58
So what I read was that you just take a public repository, some data that is publicly available, crunch it on your own, and then make a cool result out of it. I think that's mostly what I've read.

Markus Harrer 24:16
Yeah, but most of the blog posts originate from real problems in a real software system that I can't really write about in public. So I'm always searching for similar data out in the open, and I take that to show what you can do and how you can make problems visible.

Tim Bourguignon 24:43
It's kind of a showcase, I think. It's showing what's possible with all this.

Markus Harrer 24:48
That's cool. But I also have some tutorials on my blog, well, I think it's one. Okay, I wanted to try my luck, and I should certainly do more. But I don't know exactly how to start or what's needed. So what I'm trying now is to give another version of my original talk. It's more like "data science with software data", where I want to give software developers first hints about what is needed when you do such kinds of analyses, how you have to arrange your data so that it fits into the data analysis tools, and how you can move on and make real progress on your own.

Tim Bourguignon 25:45
I think you should do more.

Markus Harrer 25:48
Yes, sure. But the problem is, there are so many cool problems out there that I want to analyze to see whether it works with the standard data analysis tools that I'm using. And there's not much time to go back and explain how you can do it on your own.

Tim Bourguignon 26:13
Just thinking out loud right now: would there be a way to put your Jupyter notebooks online, read-only, visible for everyone, so people can just browse through all the cases you have analyzed already and see how you did it?

Markus Harrer 26:31
Yeah, actually, I do this. I'm writing my blog posts as Jupyter notebooks. Then I push them publicly to GitHub, take the HTML output and place it on my blog.

Tim Bourguignon 26:51
Okay.

Markus Harrer 26:53
And there's always a link to the original notebook. In most cases the data is also publicly available, or at least the steps are shown for how you can get the kind of data set that I'm analyzing. So the core idea is indeed to enable others to repeat my analysis.

Tim Bourguignon 27:15
Okay, I realize that I'm always reading you via RSS, so I'm kind of three steps down the drain. I didn't realize how cool they are.

Markus Harrer 27:25
You have to read to the very end, because there's often the link to the GitHub repository.

Tim Bourguignon 27:32
And I didn't pay attention to it.

Markus Harrer 27:34
But sometimes I just forget it. So I would be happy if you gave me a hint. I'll double check that.

Tim Bourguignon 27:42
Okay, we're now reaching the end of the time box already. One thing I wanted to ask you, as I ask all the guests: if you had some new developers around, some new colleagues, or somebody not necessarily new on their journey and not necessarily a mentee, but somebody that you should kind of take care of or guide on their journey, what would be your advice? As a senior developer who has a lot of experience in many different frameworks, what would be your advice for new developers?

Markus Harrer 28:24
Well, I think the most important advice would be: never stop learning, be open for new things. I mean, this is what we really need as software developers. I would also support my new colleague in figuring out how he or she can learn efficiently, so, some tricks for how you can organize your day, or what resources would be helpful for learning the topic that he or she works on. Because I'm really into books and such, I think I know some literature in the software development area. And I would always be there when questions arise. Or I would also say: hey, I'm not sure about this either.

Tim Bourguignon 29:28
If I may: you had a very interesting blog post about how you crunch audiobooks and podcasts into your week. I think it was at least two years ago. When you listen to podcasts, at which speed, and how. And that's where I realized that you're just making the most of every minute of the week to read some new books or listen to new books and stuff. I find it very interesting. I'll link that in the show notes. I should read it again. Cool. Never stop learning, be open for new things, and then how to learn efficiently, organize yourself, etc. That's good advice. What's on your plate right now? Do you have some talks coming up, or some other stuff coming up? Are you going to be publicly visible somewhere where people can just come to you and tell you great things about your work?

Markus Harrer 30:35
is running again?

Markus Harrer 30:39
Well, I don't think that I'm in a position where someone would come to me and say, hey, cool.

Markus Harrer 30:52
So I think there's much work to be done before people really get into the topic. As I said, I'm working right now on content that is more for beginners, to help them really get into software analytics. I also have planned a blog series that introduces different kinds of software data sources, or different kinds of techniques that you can use on your journey to become a software analytics person. And this is what I'm focusing on right now.

Tim Bourguignon 31:37
Okay, so that's what's on your plate right now. And where should the listeners look for that? On your blog, probably, or on Twitter? Where should they reach you best?

Markus Harrer 31:49
Yeah, you can certainly follow me on Twitter. This is where I often announce publicly what I'm doing right now. It's kind of hard for non-Germans, I think, because my Twitter handle is feststelltaste. So as an English-speaking person, you should probably translate "caps lock" into German, and then take the one with the F at the beginning. That's Feststelltaste, and this is my handle on Twitter. I also have a blog at feststelltaste.de, where I write about all things legacy code, software analytics, and software development in general.

Tim Bourguignon 32:37
Okay, is there a story behind feststelltaste?

Markus Harrer 32:41
Well, it was a nickname that was free on most social platforms. So I took it.

Tim Bourguignon 32:48
That's fun. Things always come with insignificant stories. Yeah. You just take it and it becomes fully yours. That's cool. Did you forget to speak about something?

Markus Harrer 33:04
No, I think we have it all covered.

Tim Bourguignon 33:09
Cool, then I think we have a show. Thank you very much for coming. That was very insightful. Dear listeners, talk to Markus, reach him on Twitter and on his blog. I will put the exact spelling of his handle in the show notes, so you can go in there now and have a look. And we'll see each other in two weeks. Thank you. Bye-bye. Dear listener, if you haven't subscribed yet, you can find this podcast on iTunes, Stitcher, Google Music and many more. And if you like what we do, please help your fellow developers discover the podcast by rating it and writing a comment on those platforms. Thanks again. See you in two weeks.