This meeting is specifically to address items in CASSANDRA-15536.


In today's meeting, we have three dependencies that are still sitting with the owner as 'Unassigned'. From talking to folks, we may have more progress than is noted. Let's go through these and see if we can update the status or assign new owners.


CASSANDRA-15580

CASSANDRA-15582

CASSANDRA-15585


September 29, 2020, 12:30 PM PST

Zoom information: Apache Cassandra Contributor Meeting

Video Link:


Transcript of the call:

Patrick McFadin 00:25
Welcome, everyone. Welcome to a contributor meeting. We're all family and friends here. If anyone would like to start with yelling at somebody, that'd be a great way; that's how my family starts a meeting. So maybe not yours. Okay, not yours. The purpose of the meeting in the agenda was around Cassandra 4.0, getting that out the door, but specifically there's an epic out there, CASSANDRA-15536. My impression was there's a lot of activity, but there's some stuff that needs to get cleaned up or discussed, specifically things that are unassigned right now. So I will pause there. Would anybody like to take an initial stab at where we're going here?

Josh McKenzie 01:30
I can talk; I created the epic, so I'm guilty. We've got two things. Number one is assignees, and number two is shepherds, and whether or not the shepherds that are on there have the bandwidth to engage. I talked to Jordan and Jon last week about this, because we were gonna get the project management emails back up and going; we kind of dropped those after the beta hit. So the shared point of view I'm going to propose is: let's figure out if the shepherds that are on there have the bandwidth to help mentor and guide those tickets, and if not, we'll pull them off. That way we have a view of which of these tickets need shepherds and which need assignees, then we know where the gaps are in terms of resourcing and go from there. Does that seem sane?

Patrick McFadin 02:14
Yeah. What I was hearing is that the energy of getting those moving is important. I mean, having a ticket sit there unassigned seems like it'll never get done.

Josh McKenzie 02:27
Yeah. And that's gonna require just constant attention, essentially; somebody's just gonna have to have a checkbox where every day or every other day they go through and make sure that things are still moving along. And to Paulo's point from the dev list, there's an opportunity to use the status in JIRA to surface this stuff. So for blocked or awaiting-feedback, we can just create quick filters and see when things are stalling, and just poke and keep them going.
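As a rough sketch of that quick-filter idea (illustrative only: the status names and the use of JIRA's Epic Link field here are assumptions, not details confirmed in the meeting), a query against JIRA's public REST search API might look like this:

```python
# Surface potentially stalled tickets under the 4.0 quality epic so a
# human can poke them. The status name and the "Epic Link" field usage
# are assumptions about this JIRA's workflow.
import requests

SEARCH_URL = "https://issues.apache.org/jira/rest/api/2/search"
JQL = (
    'project = CASSANDRA AND "Epic Link" = CASSANDRA-15536 '
    'AND (assignee is EMPTY OR status = "Awaiting Feedback")'
)

resp = requests.get(SEARCH_URL, params={"jql": JQL, "fields": "summary,status,assignee"})
resp.raise_for_status()
for issue in resp.json()["issues"]:
    f = issue["fields"]
    owner = f["assignee"]["displayName"] if f["assignee"] else "Unassigned"
    print(issue["key"], f["status"]["name"], owner, "-", f["summary"])
```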

Scott Andreas 02:52
Yeah. So, Paulo went through several of the tickets and asked folks for a quick update. It's nice to see that there is some progress on those.

Jordan West 03:04
And I think what we talked about too, which is important, is that even if right now there's no progress on some of them, we don't have to make progress on all of them; progress on some is progress. So I think Paulo's picture is also nice to see, because we see that three of them have had some work done. I think those can be a model for how the others get done.

Scott Andreas 03:29
Actually, yeah, to that point, Yifan put an update on the upgrade and diff test ticket a little bit ago today, which has some good information on it.

Patrick McFadin 03:40
So, one of the things I noticed is, like, inside 15538 (I'm sharing my screen, if you didn't see that), there's really nothing going on in here that I can see. It looks like there was some activity, but there's nothing attached to it. Is that the updates that we were talking about, Josh?

Scott Andreas 04:06
If you don't want activity, I'm sorry.

Jordan West 04:08
Oh, I think Scott was referring to a slightly different ticket, the upgrade and diff test one, instead of other areas. But actually, what I was going to add is some background that may be helpful: a lot of these, like it says, are referenced from the doc from NGCC, which may be helpful in figuring out where we broke all these out. We were talking about a lot of things, and I think a lot of them were areas we knew we needed to focus on, but part of the work really was to drill down and figure out what we needed to focus on specifically. Which I guess is kind of what I see as step one: the shepherd and the people who are going to work on it sit down and figure out what reasonable testing for that component is.

Patrick McFadin 04:57
Yeah. Is Paulo on here? As advertised, here he is. Hey, what's going on? Um, is it a matter of just making sure that we start pinging people and reminding them, at this point?

Scott Andreas 05:12
That would go a really long way, yeah. And in cases where folks are assigned to something but are currently working on something else, if they can offer an estimate of when they expect to be able to pick it up, I think that'll help us all plan as contributors to the project, or figure out if it might be good for somebody else to pick it up, if that's too

Jordan West 05:26
far out. And Patrick, can you pull up the metrics one?

Patrick McFadin 05:30
Which one's the metrics one?

Jordan West 05:32
If you go back, it's a few down.

Adam Holmberg 05:35
Over here?

Jordan West 05:37
Back? Yeah, there was that list of subtasks you had. I don't know if that's all the way

Patrick McFadin 05:41
Yes, right here.

Jordan West 05:42
Yes.

Patrick McFadin 05:44
This one?

Jordan West 05:45
Yeah, 15582? I think the metrics one has been a good example of what, ideally, would happen, where a few people got together. I don't know if David and Dinesh and the others that were involved want to speak to this, but they got together and sort of figured out, you know: here's a proposal for some testing, here's how we can take that and make it a little bit better, here's the ideal world. And this is one that's had good progress; I think it can be a model for future tickets.

Patrick McFadin 06:12
Well, we still have the 'Unassigned' on here, though, Jordan. And maybe that's where I'm getting hung up: when I see it unassigned, I'm like, oh, no one's responsible for it. There's no shepherd.

Jordan West 06:23
Yeah. I think an area we could improve is probably project management hygiene, both in this case and in another one that Josh, Jon and I talk a lot about, which is fix versions; 4.0-beta just tends to be the default for everybody. So I think there's a few areas where we could be, as a group, a little bit more on top of that.

Patrick McFadin 06:47
Yeah, okay. And I mean, as much as No, I'm not going to do.

Jordan West 06:52
So I would go down and look. I mean, there are people, if you go through these comments, who would be the logical assignee. And also, assignee isn't singular; there can be multiple, I believe, in JIRA.

Josh McKenzie 07:04
Yeah, shepherd is kind of a new concept, but I think a very fitting one for what we're facing here, which is, due to the fact that we don't have empirical data to really help point our attention toward where defects are surfacing (we don't RCA things, we don't tie them to subsystems), what we're leaning on is the qualitative assessment and synthesis of practitioners, of operators, of people who may not know the code base but run it. But that means we've got to figure out a way to articulate and then kind of codify what that is: where's the uncertainty, where's the fear, where are the defects seen? And I think that's the big gap that we've got to bridge on these tickets, right? Where the statement is "we don't trust subsystem X," there's a massive amount of value in that signal. But the question is: what aspects of it, how do we test it, how do we get to a point of confidence, how do we bridge it to be quantitative and empirical? Knowing we're never going to reach perfection in a timeframe that's sustainable, but knowing we've got to do better. And I think,

Jordan West 08:01
again, for some background on the shepherd, to point that out: that was a proposal Joey made to address, I think, kind of exactly what you're talking about, which is, we may not need subsystem experts to be working on the tests all the time, but there does need to be a subsystem expert, the shepherd, who has an idea of sort of where the valuable tests are, etc., because of that knowledge, because we don't have that data to drive those decisions out in the public.

Patrick McFadin 08:27
Well, most of the issues that are dependencies on here, I mean, as I look, if you go into the ticket it looks like: wow, this is pretty much almost done. You know, that's

Jordan West 08:38
what I'm saying. This is one that is actually, I believe, very close. David, I don't know if you're on here and want to comment; I believe you've been pretty heavily involved.

Patrick McFadin 08:44
Yep. Um, I can comment on this one. The original shepherd for this even commented at the very bottom that the reviews were taking too long and no one was reviewing things, so he kind of just stopped contributing.

Adam Holmberg 08:59
You must be talking about Stephen.

Ekaterina Dimitrova 09:01
Yeah, so

Adam Holmberg 09:04
the project. So he's left for no other reason than that he's gone to work on something else. Ah, okay. Yeah, when I saw this, it was like, almost done, but not quite.

Josh McKenzie 09:15
Yeah, I think, and we talked about this a little bit last week, the definition of done is pretty crucial. And

Rahul Singh 09:21
exactly what I was gonna say is that

Scott Andreas 09:22
what is the acceptance

Rahul Singh 09:24
criteria? Right, like, if it's clear what the goal is, then?

Ekaterina Dimitrova 09:34
No, no.

Patrick McFadin 09:36
Rahul cut off just at the important part.

Josh McKenzie 09:40
Yeah, I think that right there is the bridge for each of these: if we can figure out what the acceptance criteria are. It's going to require some exploration, it's going to require expert input, it's going to require that dialogue, but then, once we have that, anybody that can work on it should be off to the races. Absolutely.

Jordan West 09:55
And again, besides Stephen leaving, I think metrics was a good example of that, in that David had some good knowledge of sort of where we've been bitten before as, you know, an operator, and Dinesh as well, and Stephen went in and did a bunch of work in that back and forth. It was a while ago now, and it tapered off, but I think prior to that it was a good example of forward progress and finding that point.

Adam Holmberg 10:20
Yeah, it worked out. I will say, Stephen struggled with that in the beginning, and I think it was crucial for someone with some pre-existing expertise in a given domain to sort of define the outline of this thing.

Jordan West 10:35
And I felt that too. When I first went back to working on Cassandra, because I had taken a break in between, I started with testing, and I kind of said: I need a domain expert to tell me what's valuable to test, because I've been out of it so long. So I definitely think that's

Adam Holmberg 10:49
important. And that's still true. Yeah.

Patrick McFadin 10:53
What's a way to make that easier? It's hard, because not only are we distributed, we all work in different places. So, I mean, I'm just trying to think more meta here. Because this is what I see happen: we get a good set of energy, there's a lot of excitement, a lot of things happen, and then, just because we're disconnected, it tapers off. And the shepherd seems like a good move there, you know, someone who keeps things moving. But to your point, Jordan, it's like: hey, I want to dig in here, where do I even start?

Josh McKenzie 11:32
Well, people come and go on open source projects, right, Stephen being a great example of that. Essentially, somebody has to take on the mantle of applying energy into the system: of continuing to check that shepherds are engaging, of seeing if things are blocked for a long time, of seeing that people aren't responding and things are stuck awaiting feedback, and of helping navigate transitioning to different shepherds if somebody doesn't have the time to take on that role. And I think that's really all this subset of work actually needs: just that energy put into making sure the shepherds are available and engaged for the timeframe being worked on, that expectations are clear on availability, and that we get acceptance criteria for everything that's on here. And then the role of the shepherd becomes significantly less necessary, because we've codified the value, we've explained the domain, and then the work can be done. And it's a utopian ideal, right? It's a lot more complex than that, and then you get into, like, here's all the reasons why none of this works, like we've all been through before on this code base. But I don't think there's anything too terribly, externally threatening about this; I just think it's gonna require someone doing it. Which is easier said than done with open source projects and volunteers,

Adam Holmberg 12:39
and it's gonna require some experts making time upfront to define these things, to unlock the energy of these other people who are willing to contribute, but maybe not able to at that level yet.

Jordan West 12:55
Although I would add that I don't think it's solely up to those people. Like, I think having a shepherd who has a few volunteers that they know are going to work on the project, and having that dialogue,

Josh McKenzie 13:04
Yeah, it's helpful. None of these subsystems is bus factor one, right? Like, we can start, yeah, beating the weeds and the bushes and rallying up people that know these areas, and see if we can get some time out of them.

Rahul Singh 13:15
Also, another question I had: you know, I looked at the epic, I saw all these unassigned tickets. At the very least, what I would want to know is which one do we want to hack away at first, right? What is more critical? Like, I don't believe in "everything is important"; that's total bullshit. There's always going to be something that's slightly more important than something else, right? So I look at these tickets and, like, I want to help, but which one should I help with? You know, and it could even be like, oh, some of them are easy, they're not as important, but I'd be like, okay, sure, I'm lazy, I'll work on the easy one. You know, at least knowing the priority gives me a signal. And I'm being selfish.

Adam Holmberg 13:53
You say that, but I think it actually gives

Rahul Singh 13:56
other people a signal too. It's like: hey, these are the things that I think can be done, in order; here's an easy one, here's a hard one. I think that's hard to know just by looking. Somebody who's an expert can say: this is probably going to be difficult, because you've got to go through the code and look at which metrics are in each of the versions, make an Excel file, compare; there's real work involved just to come up with the acceptance criteria there, right? But looking at it with a one-sentence description, I'm just like: I don't know, I don't know if I can do this. Sure.

Scott Andreas 14:28
Yep. Good clarity, though, sounds super important. We can use the priority field to indicate what we assess as the relative priority for the project. I know that I have a couple of personal biases, in terms of, like, correctness of the read/write path, repair, compaction, that type of thing, and a little less concern about, like, first-party and bundled tooling.

Adam Holmberg 14:47
But yeah, absolutely.

Patrick McFadin 14:50
Is there a way to use this epic to communicate that? Because this

Josh McKenzie 14:54
is kind of opening Pandora's box, but I got Gavin to fix our project to where we can actually rank things on backlogs, on kanban boards, for instance. But, you know, here be dragons, right? Because the relative priority of different tickets to each other on a granular basis, to queue work, is something that we probably have a lot of different opinions on. So I think Scott's kind of more broad brush of "let's just set the priority field for how important it is to the fundamentals of what the database offers" is probably the right way to approach it, without starting a holy war of dragging things around in backlogs. So

Patrick McFadin 15:27
just real quick, I'd also encourage anyone that wants to comment to use the chat that is available. So if you want to plus-one, or tell Josh he's stupid, whatever you're gonna do, you know, that's a great place to do

Josh McKenzie 15:38
it. So "stupid" is your word, not mine.

Patrick McFadin 15:42
I'm kidding. But, I mean, I want to make sure that everyone knows that there's a chat; you don't have to join by voice if you don't want to. I do save the chats after and post those. So.

Josh McKenzie 15:53
So, one of the things, thanks, Chris, Jordan alluded to this: with the fix version, we kind of have this default bucket that we're dumping everything into, and we don't really have a muscle as a project for going through and grooming that backlog and saying, you know, this is okay to go out in 4.0.1, for instance. So, from an optics perspective, the backlog of work for 4.0 is growing at a pretty intimidating rate, but from a reality perspective, we could probably serve to groom that stuff out. I'm not sure how we would want to approach that.

Jordan West 16:49
The most effective way I've seen to approach that is to ping each ticket.

Scott Andreas 16:52
To be honest, it's a lot of pinging.

Ekaterina Dimitrova 16:59
It's better than before; there were tickets that we had been pinging for three years before the alpha. So now it's way better.

Jordan West 17:07
There's a lot less.

Josh McKenzie 17:13
I mean, just brainstorming here, so probably stupid ideas forthcoming. But: dropping everything into some kind of "4.0-triage" fix version, or giving them a label, and then having people actually go through and select from that bucket, opting them into something where we say, okay, a human being that is on the project right now cares about this being done. That might be a way to approach it, instead of having a single person go through and do the contra of that, negatively excluding things. Yeah,

Jordan West 17:44
it may cause some waves along the way, but I do think it'll be more effective. We've tried the other way, of asking people to opt things out; that's what we did with the spreadsheet a while back, and we ended up keeping all the tickets. So

Josh McKenzie 17:58
we can make a really simple query where it's like, if label is in blah, and then just label all these things and have a board of that, and ask everybody to go through it in a week's timeframe, for instance, and then anything that didn't get flipped, we just remove. I mean, within reason, right? If it's, like, massive data loss in the storage engine and nobody opts for it, we should probably be like, oh hey, we'll do this. But

Jordan West 18:19
maybe the middle road there would be to use the multi fix version: leave the current one and set the triage one, which would allow us to find them, but also not unilaterally change people's

Josh McKenzie 18:33
Oh, yeah. I was thinking label, not fix version, for instance. Okay. Yeah, either way. But not losing the metadata of where they're flagged right now would be key, whatever we do. I think

Scott Andreas 18:43
one of the things that I think we see a lot, pattern-wise, when looking at the burn-up charts that Josh has shared, is that we're making rapid progress on new issues that are identified and fixed quickly as part of different contributors' upgrade programs and their quality and testing work. But we probably have a pretty substantial backlog that isn't moving much, hasn't moved much in a long time, and isn't really being actively worked on. That would probably be good for a reality check, asking: is it essential? Do we really think it's acceptable?

Patrick McFadin 19:15
I'm gluing together what you're saying, Scott, with what Rahul is saying, and I think that's really important: there's an overwhelm factor. When you show up and you look at a kanban or something like that, it's like, I may have the ability, I'm just a little nervous about where to start, or I'm not sure where to start. And yes, you can go onto the dev list, or you can go somewhere and ask, but, you know, how do you lower the barrier? We've had low hanging fruit for a long time, and I know that was just a discussion in Slack, but lhf was great, because for someone like me in community, when someone says "how can I start contributing?": go look at lhf. Is there a tag that we could use, or something in that realm, that I could point to? Like, whenever I'm talking to folks, and, Rahul, you're going to be doing a workshop soon: hey, you want to contribute? Go look up this tag. You know, we can start rallying community around a few things, like, like you said, that backlog. Oh, sorry.

Scott Andreas 20:23
Sorry, go ahead. That's all.

Jordan West 20:25
I was gonna ask: are you talking about a tag in general, or the lhf tag, for 4.0 specifically, now? Well, if

Patrick McFadin 20:33
the lhf tag is appropriate in this case, I don't know. Because, I mean, I know there is some definition of what exactly is low hanging

Josh McKenzie 20:45
there. So we could infer lhf from some of the metadata on these issues, right? We could say priority low-or-normal plus complexity low equals lhf, and then just create a query for new contributors to come in, and they can go to a board where it's defined. It's like, look,

Jordan West 20:57
the kanban has one already. I think it's just not exactly well exposed; I mentioned that, buried in a thread. But if you pull up the kanban,

Josh McKenzie 21:06
like reading the dev list, and it's a mailing thread, which is not much better,

Jordan West 21:09
yeah, actually. I'll dig it out, I think,

Patrick McFadin 21:13
this is something I'm thinking about. I think everyone here is probably much more in the know about a lot of things in the project, but I'm thinking, like, Melissa, you and I come together on a blog post that talks about: hey, you want to help get 4.0 across the finish line? Here's some stuff that would be really helpful right now. And it's these, like this backlog that we clear out, but be more forthcoming about it. Like, you know, shepherd that effort,

Jon Meredith 21:44
for lack of a better word. I think the prerequisite for that is having the shepherds sort of fill those out a little bit more, with the specificity that Scott talked about. Otherwise you end up just saying: there's low hanging fruit, come work on it, we'll let you know what it is when we're ready.

Jordan West 21:58
I'm also not sure any of those epic tickets are really low hanging fruit. I'm trying to see what is, but JIRA is being very slow for me right now. The link's in the chat if folks want to pull it up; it's lhf with no assignee, open only, but you can pull the "no assignee" off if you want to see what other tickets are in there, as an example.

Jeremy Hannah 22:18
I don't know what's in there; it may be completely not valuable to new contributors or people that can help. As we come closer to the 4.0 release, though, I think lhf is going to be decreasingly valuable in terms of how a new contributor can jump in and contribute to something that's critical to the release. As we come down to the wire, if we can finish off the 4.0-critical stuff, what's left is either controversial and needs some conversations around it, or it's not really trivial.

Jon Meredith 23:04
But the things I would think of in that bucket are possibly the non-coding tasks. So perhaps things like reviewing release notes, reviewing documentation and checking that you can actually do the things that it says you can in the docs, looking for holes: where, you know, when you try to do something, raise it as a flag, saying, hey, I tried to install Cassandra 4.0 and I'm in this terrible mess. Yeah, those sorts of things could smooth the path for other upgrades later.

Jordan West 23:29
Also, sorry, Scott, go ahead. No? You're good. It's also actually a wonderful opportunity, I think, to bring people in, if you have an active shepherd. Because I've always found, at least, that pairing someone with a fresh set of eyes with an experienced person leads to some really effective testing. And it gives the person who's new a chance to learn a subsystem that they're probably going to need to learn deeply to work on it anyway. So I think it depends, in part, on how it's framed. Although, yes, over time that will diminish, but I think the shepherds can, with what Jon and Scott mentioned, with better definition, improve that.

Scott Andreas 24:11
Yeah, what I was going to add as well was that I think one of the best opportunities for new contributors to contribute to the project would be to help prepare for their own upgrade, via things like executing diff tests, or any performance tests that they'd like to run at their site. That will both help them gain confidence in the quality of the build and their ability to upgrade to it, and also help us identify performance issues or potential correctness issues before we get the release out. So that's also a hugely valuable area of contribution that doesn't require deep knowledge of the Cassandra codebase or contribution process.

Patrick McFadin 24:46
Yeah, I just think there's a world of amazing distributed developers out there, not that they themselves are distributed, but that they work with distributed systems. And, you know, to the point of some of the discussion we were having last week, and this is some of the more touchy part of it: we're not doing a really good job of encouraging new people to join the project. That's one of the things I want to try to help stimulate a bit, and the way we do that is create these nice on-ramps and open the door. And I think this is a good way. We can have shepherds that can create some lhf, and we can go out and promote that better, to get more contributors. We have a lot of people around the world that are, let's face it, not going anywhere; that's how we get 12,000 people to show up for a workshop, because they're stuck at home. So let's put them to work. I think this is an idea that could have some legs,

Scott Andreas 25:44
Curating a good set of lhf tickets and publicizing it sounds like a really good first step.

Jon Meredith 25:48
Yep. So one of the other ways that we could perhaps push those is through the status updates that the three of us Js were sending out, and we can include things that aren't being worked on right now and would be good to be picked up. Which is an interesting segue into asking the community more broadly how useful they found them, and whether we would be able to pick up the pace by producing them slightly more frequently, like two or three times a week, and listing what we think people are blocked on, just to remind people that they're not working in a vacuum and other tickets are blocked on their responses, those sorts of things.

Jordan West 26:21
And we'd love to get some ideas, too, on what content format would benefit people, because we've played with many.

Jon Meredith 26:33
Was it useful for anyone? I guess that's step one: did anyone read it? Or were they motivated by it to do things they wouldn't have otherwise?

Erick Ramirez 26:48
It's also a good signal for the community, to say that, hey, we're accepting new contributors and things haven't come to a standstill.

Jordan West 26:59
Yeah, I did partially see it that way from the beginning: trying to communicate that we were making progress, and to have measurable progress for us to look at, recorded somewhere.

Jon Meredith 27:13
Yeah, but you could achieve that in other ways as well, by posting a bi-monthly state-of-the-project to the community where you talk about all the new things that happened. I guess the reason we're doing it specifically now is to try and accelerate the completion of the 4.0 project, as much as the general community health, which I do think is important. But everyone seems very hung up on completing 4.0 and being able to move on past it. So, to me, it feels like it's most important to achieve some success there. And then, look, if we're smart, we'll build on that momentum and continue to keep the community strong by pushing on past that.

Patrick McFadin 27:55
This is a good line to pick up on. Okay, well, I think this is really good; this is a nice direction we're getting out of this, a good discussion. I know Melissa and I will talk about how we can do a blog post, and I'm gonna go tap on people. If I'm going to be the person who's going to rattle the cage on "hey, who's got some lhf for me?", I need to make sure that I'm not just bugging, you know, Jordan or Josh or Jon all the time. Or is that who I'm bothering all the time for now?

Josh McKenzie 28:40
There's a lot of people in the Cassandra dev Slack room. You raise a bat signal there, the community can rally, right?

Patrick McFadin 28:46
Yeah. All right. I will throw the signal.

Jon Meredith 28:51
And again, the more we can engage those shepherds, the more we're likely to have things written up that we can mine and share around. So that's, in fact, priority number one for me.

Rahul Singh 29:02
And Patrick, I know how to commit to the blog post.

Patrick McFadin 29:06
You know, I'm gonna commit to the block. You're on the team, man, you're on the team. thread on? Yeah, that's a rare skill.

Rahul Singh 29:16
No joke.

Mellissa Logan 29:18
I can second that, Rahul. It's crazy.

Rahul Singh 29:21
It's like, really, this is all we have to do to put one page up? But I'm serious, Patrick, that kind of stuff you can just, you know, put over the wall. Now that I've got my light set up I can really crank it out. And now that I know exactly what to do, it's not that hard.

Patrick McFadin 29:39
He's a blog machine.

Scott Andreas 29:42
That's awesome. Thank you. I think blog posts are roughly the extent of my contribution, at least in terms of patches, to the project as well.

Patrick McFadin 29:51
A contribution is a contribution. Let's do this. All right: different topics, more topics? Nobody has any more? That's fine, I don't have to use the whole hour. Okay. Well, go ahead, Scott.

Scott Andreas 30:20
I was going to highlight the update that Yifan got onto 15537, because it is something that I'm glad we were able to get up on the ticket. Yifan's update on the status of upgrade and diff testing was that dozens of clusters he's been working with so far have passed the diff test, which compares a 3.0 build with the latest 4.0, with the number of clusters increasing each week, and tested clusters with data sizes ranging from gigabytes to tens of terabytes. The diff test does a comparison on the value of every row and every column, with randomized forward and reverse scans, to assert complete identity between the data. So being able to have a linearly increasing number of clusters, giving an increasingly larger volume of data, has helped us gain a lot of confidence in the build.
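As a rough illustration of that comparison strategy (a toy sketch, not the actual cassandra-diff implementation; the hosts, keyspace, table, and key columns below are all placeholders):

```python
# Toy version of the diff-test idea described above: read every row and
# column of a partition from a 3.0 cluster and a 4.0 cluster and assert
# they are identical, randomizing forward vs. reverse scan order.
# Hosts, keyspace "ks", table "tbl", and columns pk/ck are placeholders.
import random
from cassandra.cluster import Cluster

old = Cluster(["cluster-30.example.com"]).connect("ks")  # 3.0 build
new = Cluster(["cluster-40.example.com"]).connect("ks")  # latest 4.0 build

partition_keys = ["alpha", "beta"]  # placeholder partition keys to check

def scan(session, pk, reverse):
    order = "DESC" if reverse else "ASC"
    query = f"SELECT * FROM tbl WHERE pk = %s ORDER BY ck {order}"
    return [tuple(row) for row in session.execute(query, (pk,))]

for pk in partition_keys:
    reverse = random.choice([False, True])  # randomized scan direction
    rows_old, rows_new = scan(old, pk, reverse), scan(new, pk, reverse)
    assert rows_old == rows_new, f"mismatch in partition {pk}"
print("all partitions identical")
```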

Josh McKenzie 31:10
I've mentioned this to random people in random contexts, but a couple of contributors are looking at open sourcing something that does similar things: it uses cassandra-diff, it uses workload generation tools to compare different clusters spanning different versions, things of that nature, to make it available for the community. Pretty nascent, but cleared through legal; we just have to find a name that is not terrible. And naming things is hard. So you can expect that to be around whenever we can think of a good name.

Scott Andreas 31:38
Awesome.

Patrick McFadin 31:42
Naming things is hard. Logos are even harder, by the way, just gonna tell you that right now. Because once you come up with a project, you've got to have a logo, and then you get the t-shirts. That's the way it works.

Rahul Singh 31:56
Well, a name is a lot easier. You don't have to be an artist, you just have to think. When it comes to the design activity for a logo, man, that's hard. For the

Josh McKenzie 32:08
name, you've got to pass the Urban Dictionary test, which is becoming more and more difficult. So effing unfortunate; I discovered that earlier today. So it's a real thing. Anyway,

Patrick McFadin 32:18
I just threw shade at cass-operator, because, first of all, it's the most unoriginal name, and it doesn't have a logo. That was in my talk, by the way. Come on, we're not even trying anymore; we'll just use a fixed font and move on with life. So yeah, I mean, just a general feel: the thing that I felt after many conversations last week is that we're a lot closer to shipping 4.0 than we thought. Is that fair?

Scott Andreas 32:54
I feel like it is.

Patrick McFadin 32:59
Does anyone dissent on that? Like, "oh my God, you're delusional"? Because I hear that a lot, so I'm okay with it.

Jon Meredith 33:06
I would certainly say that we're not finding things faster than we're fixing things, which means we're going in the right direction.

Patrick McFadin 33:14
Because that was serious. And that email, I really loved it: stick it to 2020 and ship this year, just because I need some good this year. I don't know about you all, but it's been a bit of a year. I feel like I need to end on a good note, and this is pretty much what I'm pinning all my hopes on

Josh McKenzie 33:32
right now. If you are tying this project to 2020 in any way, you are jinxing everything we care about on this call. So

Patrick McFadin 33:40
I know, that could be, yeah, it could be bad. But I'm just saying: all I want for Christmas. You know.

Josh McKenzie 33:48
I think, in terms of next steps, I don't want to volunteer the other two Js for things they don't want to volunteer for, but we were putting energy into the project management side of things before, and I'd be happy to do that again. In terms of the whole "add a new fix version, let's opt in to what we want to do, let's get our status updates out and the shepherds", all that, I'm okay helping take on that mantle if other people are as well. So I see that smile. That's very fuzzy. Yeah,

Patrick McFadin 34:20
start sending daily emails to the dev list out of JIRA: "here's your latest report."

Josh McKenzie 34:25
That was actually Jon's first suggestion. I said, you know,

Patrick McFadin 34:29
Jon, it's not a bad idea. And lately I've been getting into Zapier zaps. I can do some stupid stuff with that now, so just let me know if you want me to turn that loose, because we can. I could ping individuals. That'd be amazing: hey, shepherd, how you doing today?

Josh McKenzie 34:47
It needs that human touch on Slack, where you're like: I'm so sorry I'm bothering you, and I know you're so busy, but for real, what's going on here?

Patrick McFadin 34:54
So let me tell you about the Zapier Slack bot.

Jordan West 35:00
Does it come with a language model that delivers the poke in a nice way, if you pair it with something like Watson NLU?

Rahul Singh 35:09
and

Patrick McFadin 35:10
It will do that, actually. It'll take some personal information, twist it up. It's actually kind of like Siri, I don't know if you've ever heard of it. But yeah, it's like, "I'm sorry..."

Scott Andreas 35:22
If a very motivated new contributor shows up in Slack, volunteers to help with project management, and very aggressively begins reaching out to everyone at exactly the same second, I guess then we'll know.

Patrick McFadin 35:32
And I'll have passed the Turing test; I will take my award.

Josh McKenzie 35:37
Take, like, everything Scott's ever written, feed it to a GPT-3 model, and just have that go and be the

Patrick McFadin 35:42
project management bot for the project, and we're good. Oh.

Jeremy Hannah 35:47
So can I just ask something really quick? We have Harry, we have NoSQLBench that can generate data based on data models. What about a broad ask of the community to say: as we lead up to the 4.0 release, if you want to find a way to contribute, one thing you can do is anonymize your data model and contribute that to a GitHub repo that's separate from the project. And then, and I don't know if this is too broad, maybe have some selection criteria from a shepherd, like what you guys have been talking about, to say: let me go and see these different data models, and see what's representative of things that we may not have considered, and then do some NoSQLBench or Harry type of testing around those data models. I don't know what's been done already, but I thought maybe if we can get some more exotic data model representation, we can broaden the testing. Because I just feel like with previous releases, we get as far as we can with the testing, but one of the things that is more exotic comes in in the 4.0.1, .2, .3 bugs: oh wow, we didn't realize that this upgrade scenario with this data model would be problematic, or something like that. Is that grasping at straws here? Or is that something that would be helpful, now that we have data generation tools and testing tools that allow us to do some of this?

Jordan West 37:34
I think that's an ideal goal we want to get to. I would even add that, on top of capturing data models, we'd want to capture the shape of people's writes and queries, because often we find those edge cases are something like reverse iteration on some weird boundary, mixed with exactly the things that you mentioned. So I think it would be great to get to a place where we have that. Like you mentioned, we're just getting the primitives out to build that, like Harry and NoSQLBench.

Jeremy Hannah 38:09
So you're saying, okay, I'm trying to think of how to cast a wide net without making a shepherd's life a living misery of trying to curate this. Just saying: okay, you have a data model, which is kind of just at the table level; a query model based on a single table; a read and write query pattern and percentages of each; and some idea of the data that's being written into the table. So not just saying "field foo, which is text," but giving some idea of what that data's distribution is.

Jordan West 38:50
Absolutely. And I think the idea there is, right now Harry just pulls from a uniform distribution, and I don't work on it every day, so Alex may tell me I'm missing something horrible here, but in my mind, from the parts that I have used, it should be possible to feed it a different class that pulls from a distribution that, exactly as you said, has been built from scanning a user's data.
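To make that concrete (this is not Harry's actual API, just the shape of the idea): instead of sampling uniformly, a generator samples from weights learned by scanning real data. A minimal sketch:

```python
# Sketch of a value generator whose distribution is learned from a scan
# of real user data, instead of sampling uniformly. Illustrative only;
# Harry's real extension points may look nothing like this.
import bisect
import random
from collections import Counter

class EmpiricalDistribution:
    """Samples values with the frequencies observed during a data scan."""

    def __init__(self, observed):
        counts = Counter(observed)
        self.values = list(counts)
        self.cumulative = []
        running = 0
        for v in self.values:
            running += counts[v]
            self.cumulative.append(running)
        self.total = running

    def sample(self):
        r = random.randrange(self.total)
        return self.values[bisect.bisect_right(self.cumulative, r)]

# e.g. cell sizes observed while scanning a user's table (placeholder data)
scanned = [1, 1, 1, 1, 8, 8, 64, 4096]
dist = EmpiricalDistribution(scanned)
print([dist.sample() for _ in range(10)])  # skewed like the source data
```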

Josh McKenzie 39:13
Yeah. I expect that tool may give us the ability to do live profiling of things like cardinality and iteration direction, without actually dumping the binary content itself. I haven't looked at it, so I'm arguably very wrong on some points there as well. But profiling those aspects of a workload to feed into a generator shouldn't be rocket science; someone's just got to apply energy to it.

Jordan West 39:36
No, we just needed the library first that lets you do it, right? But I think those would be awesome places, yeah, to get to.

Josh McKenzie 39:47
The as-yet-unnamed project includes a script that will dump out an anonymized schema. So it should be like one click, one run: if you want to contribute your schema to a GitHub repo, this is how you can totally scrub the thing, without having to do any extra work.

Jeremy Hannah 40:02
Yeah, and I'm more familiar with NoSQLBench, but NoSQLBench has some distribution types of stuff built in that maybe we could share across and kind of cross-pollinate there as well.

Josh McKenzie 40:14
Yeah, I think we're probably going to need to figure out as a community what we want to do on that front, because my understanding is that Gianluca, the engineer that's working on this, took a look at NoSQLBench, and at least the usability in its initial incarnation of trying to get generative workloads based on schemas was prohibitively complex, assuming the functionality was actually there. I think Jonathan Shook is actually working on that right now. So there are moving parts that we need to get lined up, and we need to figure out what the right way to do it is. But we'll get there for sure.

Jeremy Hannah 40:53
But somebody would want to shepherd that, this data-model-collection kind of thing; I think there's some management to it to make it useful. And then there's feeding it into the machinery of Harry and other things. So I think there's the collection, and that's the easy part, I think, but it's important to make it simple enough for people to contribute on a wide basis. And then there's the curation, and there's putting it in and adapting it to the machinery that is the testing.

Rahul Singh 41:31
I have data models that I've been collecting, because, you know, if you work in practice, essentially 80% of the bullshit in failures is the data model, right? So I can lend a hand with the shepherding. I don't have the bandwidth to do the Harry or NoSQLBench kind of test cases around it; I'm not familiar with those, I've used Gatling, so I can make test cases that generate data. So it really comes down to what the goal is. We've got to have some constraints, you know; are we talking maybe use cases, right? Time series use case, customer record use case? With constraints like that, I can get it done in a week or so. And in terms of the data models, I have a bunch of them that are fairly generic and not really customer-specific. That is, it's mine, right? So I can give that away.

Jeremy Hannah 42:37
And the intent is not to add more work creep into the finalization of the release, but I was thinking of things that I've seen in the past that have been problematic on an initial release that we could mitigate, and also encourage contribution in a simple way as we get closer. So I'm not trying to add to the work that needs to be done for the release; that's absolutely not the goal. I was just trying to think of ways that we could cast a wider net, since we're talking about new contributors, and then marshal that into something useful for the release.

Patrick McFadin 43:14
I think this might be a good opportunity to take this to a DISCUSS thread on the dev list, just to broaden the net. And, Jeremy, I think what you've outlined is great. It may be too much of an effort too soon, or I'm not sure how it could get in the way, but, you know, there are very few people on this call; I wouldn't say that this is the deciding factor. So let's open it up as a DISCUSS.

Jeremy Hannah 43:43
I'll make a DISCUSS thread and just see where that goes. And I'll explicitly say, you know, this is not trying to add more 4.0 work, but to see what would be useful to mitigate certain things that we hadn't foreseen in the past.

Patrick McFadin 43:57
I think the key, what we would hope to find, is maybe a potential sponsor for that. Because you're right, it is an effort. But who would be willing to take that on? And who knows who would be willing to do that?

Jordan West 44:15
The two things I'd add: I think this is a goal that we should get to; it may not be in the 4.0 cycle. But regardless, I think it will help towards this idea of us wanting a better definition of done. Saying that we run whatever changes through an arsenal of real users' schemas and queries, etc., would really up the game in terms of Cassandra quality and stability. And the other thing: I do think there is a significant amount of work to get to that point from the sort of primitives we have right now; it's non-trivial. But we may find, if we better define the work in the testing epics, that a lot of what we want to test is covered by that, so we kind of knock out a whole bunch of testing by accomplishing it. So I think it goes multiple ways.

Patrick McFadin 45:13
I would also pile Fallout into this same type of thing, to find something that would break Cassandra. It's not data model specifically, but Fallout, or any kind of programmatic, defined testing tool. There's some cool bug bounty kind of stuff that you could do there: here, define it; oh, you found something that really broke Cassandra? Congratulations, here's some dough.

Jeremy Hannah 45:37
Yeah, and I think it's a good point to say that this isn't on the critical path to the release. But this could be a broader effort to just say, you know, we could add to our library of things, like with Fallout and with this data generation, that this could help with. Because I think for anybody in the project, and I'm hoping this comes out in that discussion, this isn't like a new feature, this isn't a CEP; this is more of a "let's broaden the scope of things that are known that can be tested," so that we can broaden the library of what we're trying to do. So it may actually be

Jordan West 46:12
good as a CEP, if we decide not to do it as 4.0 work, or maybe a good example of one. Just, I guess, in the sense of saying: I think there's something to be said for putting more effort into testing as we go forward, and not just saying, okay, we've reached this new status quo for 4.0 and it's what we need for the future. But either way, I think it would be a great addition over time; I don't know where it falls in terms of release cycle.

Patrick McFadin 46:47
It sounds like fun; I can see myself spending a Saturday morning breaking Cassandra. But maybe no one else is okay with that.

Jordan West 46:55
Yeah, kind of. It's kind of why I got into it: breaking things.

Patrick McFadin 46:59
Yeah. Because I kept breaking it.

Jordan West 47:01
It's Jon's fault. He showed me QuickTheories, and then it was over from there.

Patrick McFadin 47:09
Alright, so, Jeremy, looking forward to the DISCUSS thread you'll be starting.

Jeremy Hannah 47:17
Okay, cool. Thanks.

Jordan West 47:20
So the action items and takeaways are: shepherds, shepherds, shepherds. Is that what I'm getting from this? I mean, there were the other things we discussed, but

Patrick McFadin 47:33
So yeah, shepherds, shepherds, shepherds. Let's get on that: who are they, where are they? I think that's the overall thing I'm looking at. Also, how can we find the low hanging fruit towards 4.0? That's another big one; Melissa and I will be talking about that shortly today, we've already been pinging back and forth, like, let's get on this. And then the other one was Jeremy's DISCUSS thread topic on the testing: how do we improve testing and user-contributed tests? Awesome. Good call. All right, thanks, everyone. I will have the transcripts, well, my virtual assistant will have the transcripts done shortly, hopefully, and then I'll get those posted up onto the cwiki, and I'll get this posted on the YouTube channel, Planet Cassandra, so everyone can see what happened. Thanks again, everyone, for showing up. Good participation today. We'll do it when needed; we won't schedule it just yet, but as needed. Thanks, everyone.

Josh McKenzie 48:49
Thanks, everybody.


Transcribed by https://otter.ai
