WEBVTT 1 00:00:00.599 --> 00:00:02.370 Josh Moore: Okay, that's kicked off. 2 00:00:06.060 --> 00:00:06.720 All right. 3 00:00:13.230 --> 00:00:22.440 Josh Moore: All right, no one's showed up in a bit, so hi everyone. Um, I think there are a few people I don't know, so I am Josh Moore from the Open Microscopy Environment. I 4 00:00:23.520 --> 00:00:25.080 Josh Moore: set this up, so thank you for coming. 5 00:00:26.550 --> 00:00:36.030 Josh Moore: I will flip here to doing a quick introduction run through everyone. We're at 18 people, so we, you know, he must be it. 6 00:00:38.190 --> 00:00:46.260 Josh Moore: Under a minute, right, at a minute each, trying to get that over with, though, fairly quickly. The morning session got it over in 10 minutes, so you can try to top that. 7 00:00:47.250 --> 00:00:58.320 Josh Moore: Um, but just for anyone who hasn't been to one of these meetings before, they are largely technical, so the goal of this is really to move us towards a common file format, 8 00:00:59.130 --> 00:01:11.850 Josh Moore: kind of end some of the burdens that at least some of us have put up with for over 20 years. I want to ask to find out who has put up with the pains the longest, um. 9 00:01:12.930 --> 00:01:19.380 Josh Moore: So yeah, I mean nothing's really prepared, so this is really just a discussion. Feel free to break in, turn on video, 10 00:01:20.160 --> 00:01:28.020 Josh Moore: add things to the HackMD document, to the agenda. So at the bottom there are two sessions; you can see all the notes from the first session this morning. 11 00:01:28.380 --> 00:01:34.470 Josh Moore: I can also go over anything that you guys missed, if you would like, and I probably will, as we touch on something, so that we don't start from 12 00:01:35.550 --> 00:01:45.120 Josh Moore: ground zero each time. Um, but yeah, so under the session two section, quite towards the bottom, feel free to add in points that you want to talk about.
13 00:01:46.350 --> 00:02:03.180 Josh Moore: And links at the very bottom in Markdown style, and then we can refer to those links as necessary. So list your projects, anything you're working on, um, have a go, and then everything that's in this document and the videos will get posted to image.sc as soon as I can get around to it. 14 00:02:04.200 --> 00:02:05.040 Josh Moore: Um. 15 00:02:06.300 --> 00:02:22.800 Josh Moore: Right, that's it from my side, so I will go around. I'll follow my, um, screen, which is largely alphabetical: it starts with David and then goes to Bill Katz, and I will add names into the chat for the next people. Go for it. 16 00:02:25.290 --> 00:02:28.230 David Gault: Okay, hi everyone, I'm David Gault. 17 00:02:28.260 --> 00:02:28.890 David Gault: Part of the 18 00:02:29.160 --> 00:02:35.130 David Gault: OME team, primarily been working on file formats, OME-TIFF, all sorts of file formats. 19 00:02:38.580 --> 00:02:38.820 Bill Katz: Hi. 20 00:02:39.570 --> 00:02:42.810 Bill Katz: I'm Bill Katz. I'm working on the 21 00:02:43.830 --> 00:02:48.750 Bill Katz: Janelia FlyEM team, and mostly working on data systems for that group. 22 00:02:52.590 --> 00:02:56.190 Chris Allan: Hi, my name is Chris Allan, I'm VP of software engineering for Glencoe Software. 23 00:02:57.120 --> 00:03:03.990 Chris Allan: Working closely with the OME team and pushing this stuff forward and developing a variety of tools around it. That's me. 24 00:03:07.830 --> 00:03:25.530 Christian Tischer: I am Christian Tischer at EMBL Heidelberg. I have been running a few workshops with Josh lately about the topic, so I'm very motivated to make this work, and I'm also maintaining a Fiji implementation to open the current version in BigDataViewer. 25 00:03:38.370 --> 00:03:48.960 Christoph Gohlke: Yeah, my name is Christoph Gohlke and I work for UC Irvine at the Laboratory for Fluorescence Dynamics, and you might know me from the tifffile Python library.
26 00:03:51.810 --> 00:03:52.080 Josh Moore: Well. 27 00:03:53.490 --> 00:03:54.480 Josh Moore: Damir, yes. 28 00:03:54.690 --> 00:03:55.350 Damir Sudar: Yes, we are. 29 00:03:57.120 --> 00:04:03.630 Damir Sudar: Damir here, so I'm working for a small company called Quantitative Imaging Systems, which develops software for 30 00:04:05.400 --> 00:04:13.740 Damir Sudar: multiplex imaging like cyclic IF and such, and I'm also very interested in file formats, and I've been involved a little bit with 31 00:04:13.740 --> 00:04:15.660 Damir Sudar: the development of the 32 00:04:16.830 --> 00:04:28.140 Damir Sudar: pyramidal version of the OME-TIFF file format, and this is of course the next step and I'm really excited about it. 33 00:04:32.970 --> 00:04:49.740 Dan Toloudis: I'm Dan Toloudis, I'm at the Allen Institute for Cell Science. I'm a senior software engineer and mostly work in GPU graphics and 3D rendering for our microscopy data, but also was one of the originators of the aicsimageio library. 34 00:04:59.790 --> 00:05:00.900 Josh Moore: Dave, if you have your beer. 35 00:05:03.420 --> 00:05:16.230 Dave Mellert: No beer, sorry. I'm trying to... I'm on my phone right now, so I was trying to, like, set this up. My name is Dave Mellert, I work at the Jackson Laboratory and I manage a group that's responsible for providing image data management support for the lab. 36 00:05:20.010 --> 00:05:22.020 Davis Bennett: So I'm Davis Bennett, I work at 37 00:05:22.290 --> 00:05:30.240 Davis Bennett: HHMI Janelia and I basically upload a lot of data to S3 and keep it organized. 38 00:05:36.960 --> 00:05:38.100 Jackson Brown: I'm Jackson Brown. 39 00:05:39.750 --> 00:05:46.440 Jackson Brown: I'm not associated to... at the current moment, but I am a maintainer on aicsimageio and 40 00:05:47.880 --> 00:05:49.650 will be doing a PhD in the fall. 41 00:05:51.750 --> 00:05:54.480 Josh Moore: You're fairly quiet, for the future, Jackson, but we did hear.
42 00:05:57.090 --> 00:05:59.220 Josh Moore: I think we had this game last time we had a meeting. 43 00:06:02.100 --> 00:06:03.810 Jamie Sherman: Jamie Sherman, I work on 44 00:06:04.920 --> 00:06:09.720 Jamie Sherman: aicsimageio, trying to keep Jackson happy, etc., and aicspylibczi. 45 00:06:11.220 --> 00:06:11.970 Jamie Sherman: Yeah, and so, 46 00:06:13.260 --> 00:06:15.480 yeah, and other random tasks, so, cool. 47 00:06:19.320 --> 00:06:24.330 Melissa Linkert: I'm Melissa Linkert, I also work for Glencoe Software and I work on all things file formats. 48 00:06:28.590 --> 00:06:36.180 Kevin Kozlowski: I'm Kevin Kozlowski, also a software developer at Glencoe. I've been working on miscellaneous NGFF stuff recently. 49 00:06:38.850 --> 00:06:50.220 Lee Kamentsky: I'm Lee Kamentsky, I work at the Kwanghun Chung lab at MIT. We do volumetric imaging of brains, terabyte, petabyte scales. 50 00:06:53.610 --> 00:06:54.510 Josh Moore: The ours. 51 00:06:58.200 --> 00:07:12.600 Robert Haase: Hi, I'm Robert, I'm a group leader at the University of Technology in Dresden, and I do GPU-accelerated image processing and high-performance computing in this direction, but I'm also responsible for establishing the principles in data sharing, and that's why I'm here today. 52 00:07:19.770 --> 00:07:23.430 Rohola Hosseini: Yeah, hi, my name is Rohola, so 53 00:07:24.840 --> 00:07:30.780 Rohola Hosseini: I'm from a university in the Netherlands and I'm mainly involved in data management 54 00:07:31.920 --> 00:07:36.780 Rohola Hosseini: and FAIR data management here in Leiden. As for me, I'm not a software engineer. 55 00:07:40.470 --> 00:07:52.890 Talley Lambert: I am Talley, I am a microscopist at Harvard Medical School. I'm also a core developer on napari. I don't know much about NGFF, so I'm mostly here to kind of see what the chat is about.
56 00:07:56.100 --> 00:08:04.650 Trevor Manz: Hey, I'm Trevor, I'm a PhD student at Harvard Medical School and I work on NGFF in that I'm trying to support it in the web browser and 57 00:08:05.790 --> 00:08:06.900 Trevor Manz: with the tool called Peter. 58 00:08:11.010 --> 00:08:28.830 Volker Hilsenstein: Hello, I'm Volker. I work at EMBL in Heidelberg in the Alexandrov team on spatial metabolomics, joined recently, and we have different modalities of images and potentially annotations and segmentations that need to be displayed together, and I just wanted to 59 00:08:29.940 --> 00:08:33.810 Volker Hilsenstein: hopefully do that with OME-Zarr and follow the progress on this. 60 00:08:35.910 --> 00:08:36.210 Volker Hilsenstein: Great. 61 00:08:36.480 --> 00:08:44.700 Josh Moore: Thanks everyone, so, um, we'll call it a tie. May have been a little bit faster, but I didn't, like, stopwatch it or anything, um. 62 00:08:45.570 --> 00:09:02.370 Josh Moore: Hmm, okay, so I will flip to trying to give a status update of everything I know of, so starting with the Zarr project, then OME-Zarr, and a little bit of what's going on on the OME side, and so that's just kind of at a high level. I'm happy to answer questions about those. 63 00:09:03.420 --> 00:09:17.220 Josh Moore: But importantly, if there's anyone else who has done anything, let's then follow up with, you know, if your community or your repository has news it needs to share with anyone, um, we'll... oh, Bill got 64 00:09:18.570 --> 00:09:30.090 Josh Moore: thrown out. Um, we'll follow up with that before we start talking about specific specifications and kind of diving in, because the overall goal of this is really to try to find 65 00:09:30.690 --> 00:09:39.360 Josh Moore: specifications that the community wants implemented and people who want to get involved with implementing them. You know, it's great to have everyone who's just kind of following along, no objections whatsoever.
66 00:09:39.810 --> 00:09:52.530 Josh Moore: You know, just having your support and applause is always welcome, um, but we do want to try to get to the point where in parallel we're developing multiple specifications and implementations and kind of keep everything moving. 67 00:09:55.350 --> 00:09:57.420 Josh Moore: Okay, sorry, people are still showing up. 68 00:10:02.550 --> 00:10:04.830 Josh Moore: So, um, Zarr. 69 00:10:06.000 --> 00:10:07.410 Josh Moore: For... 70 00:10:09.210 --> 00:10:11.730 Josh Moore: I don't know how far to go back, um. 71 00:10:12.960 --> 00:10:14.880 Josh Moore: There was a conversation, so roughly 72 00:10:17.220 --> 00:10:23.040 Josh Moore: at the Images to Knowledge meeting, where we said we would try to get Zarr and N5, these two very similar 73 00:10:24.420 --> 00:10:35.460 Josh Moore: next-generation file formats, to work more closely together, so that kind of kicked off this set of meetings. There's a regular, so that's a biweekly meeting from the Zarr 74 00:10:35.940 --> 00:10:49.380 Josh Moore: and N5 groups. It's actually tomorrow night, so if anyone, you know, just desperately wants to talk about this more, you're welcome to come tomorrow night at eight o'clock Central European Time. Um, I can send you coordinates for that. 75 00:10:51.150 --> 00:11:04.230 Josh Moore: Um, as a part of those conversations we eventually got funding, so both the Zarr project got funding and OME got funding to work on a new specification that would unify the two formats. That's Zarr v3. 76 00:11:05.370 --> 00:11:07.500 Josh Moore: That funding is coming to an end. 77 00:11:09.060 --> 00:11:29.280 Josh Moore: At a high level, um, yeah, it's not complete, so that's the best kind of take-home. Um, there have been issues with keeping contractors in position over the course of the pandemic, so where we are is that there is a v3 specification branch, there's documentation and you can read it, um.
78 00:11:30.510 --> 00:11:36.300 Josh Moore: At the specification meetings that take place, basically, the agreement has been: 79 00:11:36.750 --> 00:11:48.540 Josh Moore: we will consider it a working milestone and implementations can start being built on top of that specification, but to really get it over the finish line there's likely going to need to be some specification changes. 80 00:11:49.950 --> 00:12:04.230 Josh Moore: Um, the specifications that exist at the moment are: there's the zarr-python one, which is on a branch, there's a C++ implementation based on xtensor, and 81 00:12:05.100 --> 00:12:23.790 Josh Moore: there's, as far as I know, the beginnings of a Rust implementation as well. So tomorrow night we will be discussing getting Java implementations across the line: how do we get v3 implemented in Java, and hopefully that can happen over the course of the next month or two. 82 00:12:25.170 --> 00:12:26.130 Josh Moore: Um. 83 00:12:28.200 --> 00:12:35.430 Josh Moore: But really, so that Zarr v3, that's where we need to be to kind of unify more of the libraries. 84 00:12:35.910 --> 00:12:44.040 Josh Moore: But that doesn't stop anything we're talking about. So I sound fairly pessimistic about all that because it's been kind of a slog to get us through the past year of development. 85 00:12:44.580 --> 00:12:58.890 Josh Moore: Um, but everything OME-Zarr is based still on v2, so at some point, as soon as we think it's stable, we'll swap the OME-Zarr flag from v2 to v3, right, so what we're using underneath the hood. 86 00:12:59.340 --> 00:13:06.810 Josh Moore: Um, but I think all the features we're talking about will just keep working, but it's just good for everyone to be aware of kind of the state of what's happening there. 87 00:13:08.250 --> 00:13:10.500 Josh Moore: So the state of OME-Zarr is 
88 00:13:13.860 --> 00:13:24.120 Josh Moore: I guess, different. So for anyone who wasn't there, towards the end of 2020 there was the Images to Knowledge meeting again, so two years later, after the initial conversation. 89 00:13:24.690 --> 00:13:42.360 Josh Moore: Um, for that we prepared two new OME-Zarr specifications: the one for the labeled images, so segmenting your images and storing them together, so a couple of people mentioned those, um, and storing high-content screening data in 90 00:13:43.830 --> 00:13:44.820 Josh Moore: in OME-Zarr. 91 00:13:45.840 --> 00:13:56.370 Josh Moore: Um, those are, you know, listed under the specifications today. Everyone's encouraged to use them. It may be that those need further modification, so, you know, 92 00:13:56.640 --> 00:14:06.360 Josh Moore: you know, if someone has looked at those and said our data doesn't fit into them, then we should probably list that in, you know, next specifications that we should look at as a community and go to the next, 93 00:14:06.750 --> 00:14:10.980 Josh Moore: the next milestone. So I think everyone agrees, so each of these are going to kind of 94 00:14:12.060 --> 00:14:19.710 Josh Moore: develop step by step, but we need to get something that's working and in place, and we have implementations for those in 95 00:14:20.850 --> 00:14:25.620 Josh Moore: Python and JavaScript, and I think Java is on the way. 96 00:14:27.300 --> 00:14:43.860 Josh Moore: What we've started looking at since the beginning of this year, really, is how to get large 3D structures into OME-Zarr. So for anyone who doesn't know, one of the major differences between Zarr v2 and N5 is how chunks are stored. 97 00:14:45.690 --> 00:14:56.910 Josh Moore: For anyone who doesn't know, so each array is broken up into many chunks and the chunks are stored on disk in separate files, or in the cloud on S3 or Google Cloud Storage as separate objects, right?
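NOTE Editor's sketch, not part of the recording. The chunk layout being described here can be illustrated in a few lines of Python: each chunk is addressed by its position in the chunk grid, and the storage key is built by joining those indices with a separator. Zarr v2 by default joins them with a dot (a flat layout), while a nested layout, as N5 uses, joins them with a slash so that each dimension becomes a directory level. The function name `chunk_keys` and the array and chunk shapes are hypothetical, chosen only for illustration; this is not code from any Zarr or N5 library.

```python
from itertools import product
from math import ceil


def chunk_keys(shape, chunks, separator="."):
    """Return the storage key of every chunk of a chunked array.

    shape: full array shape; chunks: shape of a single chunk.
    separator "." gives flat keys (every chunk of the array in one directory);
    separator "/" gives nested keys (one directory level per dimension).
    """
    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [ceil(s / c) for s, c in zip(shape, chunks)]
    return [
        separator.join(str(i) for i in index)
        for index in product(*(range(n) for n in grid))
    ]


# Hypothetical 3D volume split into 4 x 2 x 2 = 16 chunks.
shape, chunks = (4, 512, 512), (1, 256, 256)
flat = chunk_keys(shape, chunks, ".")
nested = chunk_keys(shape, chunks, "/")

print(flat[:3])    # flat keys such as 0.0.0
print(nested[:3])  # nested keys such as 0/0/0

# With flat keys every chunk is a sibling file; with nested keys the
# top level only holds one entry per index along the first dimension.
top_level_entries = {key.split("/")[0] for key in nested}
print(len(flat), len(top_level_entries))
```

With realistic volumes the flat list grows to millions of sibling files, whereas the nested layout keeps each directory small.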
98 00:14:58.710 --> 00:15:10.590 Josh Moore: Zarr, for whatever reason, chose a separator of a point, a dot, for the individual chunks, meaning that when you're using a local file system all your chunks get put into the same directory. 99 00:15:11.070 --> 00:15:21.360 Josh Moore: Which means if you're trying to store very large volumes you end up with tens of millions, if not hundreds of millions, of chunks in a single directory, which makes everything fall over, um. 100 00:15:22.350 --> 00:15:27.600 Josh Moore: So some of us have discussed this on Twitter and on issues, um, 101 00:15:28.530 --> 00:15:37.260 Josh Moore: and basically I've been going around various repositories and trying to get everything working so that Zarr v2 can use nested storage. That's already a step that brings Zarr and N5 closer together. 102 00:15:37.860 --> 00:15:45.480 Josh Moore: Um, that's going fairly well. It's not done yet; various upstream repositories need to accept PRs to get that all working. 103 00:15:46.020 --> 00:15:55.350 Josh Moore: Um, that's what we're focusing on, and as we'll see, so, you know, if others want to focus on other parts of the problem, that's what we can discuss today. 104 00:15:56.190 --> 00:16:05.970 Josh Moore: And then finally, from the OME point of view, um, really our driver, so the open-source driver, and we can talk about others' particular needs, 105 00:16:06.630 --> 00:16:21.810 Josh Moore: is the ability to transform datasets that are coming to the Image Data Resource, the IDR, into OME-Zarr and store them publicly. Um, so we've had several large submissions, 3D submissions, 106 00:16:23.010 --> 00:16:31.800 Josh Moore: also some cytometry data, and so basically we want to get specifications to a stable milestone state, something that's, you know, written down
107 00:16:32.100 --> 00:16:44.880 Josh Moore: and will be supported, as quickly as possible, so that we can then publish the data and then iterate. And it's kind of the problem that Davis is talking about: you know, if you're putting this stuff in the cloud, people need to be able to read it, so we have basically the same problems. 108 00:16:45.960 --> 00:16:53.400 Josh Moore: And the other thing that we've been working on is, I think we have three to four hires or so, interview processes that are ongoing at the moment, 109 00:16:53.790 --> 00:17:01.650 Josh Moore: people who should be working on this, and as soon as they're in post we will get them hooked up on the GitHub issues and hopefully we'll have a bit more momentum. 110 00:17:03.000 --> 00:17:12.810 Josh Moore: So ultimately that's the problem, right: getting all this stuff implemented across all the domains we're talking about, and then implemented in, you know, four or five different languages, so that's the ultimate goal. 111 00:17:15.450 --> 00:17:19.770 Josh Moore: Sorry, that was a bit of a slog. Um, any questions on any of that? 112 00:17:23.250 --> 00:17:35.580 Josh Moore: Everyone, so especially for anyone who hasn't attended before, a general feeling, good feeling, that you understand what's going on with OME-Zarr, what we're talking about, what the heck NGFF even stands for? 113 00:17:37.200 --> 00:17:37.530 Lee Kamentsky: Okay. 114 00:17:37.950 --> 00:17:41.430 Josh Moore: I'll take that as a given. Um, feel free to ask, though, if you need. 115 00:17:43.230 --> 00:17:47.580 Josh Moore: C++, so I can send the link to the 116 00:17:49.290 --> 00:17:52.500 Josh Moore: the C++ implementation, Damir. 117 00:17:56.100 --> 00:18:09.180 Josh Moore: So it's called xtensor-zarr, um, and they've added various, I think it's other libraries in their GitHub organization, so you'll have to look around a bit, or you can follow their CMake files.
118 00:18:09.690 --> 00:18:26.550 Josh Moore: Um, you know, kind of adding their own stack similar to Dask, and, you know, so everything that exists in the Python stack or in the N5 stack they're needing to redo in C++. It's a bit unfortunate that everyone has to redo the same things themselves, but so be it. 119 00:18:28.230 --> 00:18:29.850 Chris Allan: What happened with z5, Josh? 120 00:18:30.960 --> 00:18:32.070 Josh Moore: z5 still 121 00:18:32.070 --> 00:18:42.240 Josh Moore: exists. Um, it's a C++ library that concurrently reads N5 and Zarr. Um, Constantin was here this morning. 122 00:18:43.020 --> 00:18:48.390 Josh Moore: So I can send the link to that, and it has a Python wrapper around it. 123 00:18:48.930 --> 00:18:57.120 Josh Moore: Um, interestingly, so I need to talk more to Constantin about this, so I've mentioned some problems with the differences between nested storage and non-nested chunks. 124 00:18:57.690 --> 00:19:09.840 Josh Moore: And he didn't realize that any of that was a problem, I think just because his library takes care of all that, so, um, by all means do try it out. The other implementation that 125 00:19:11.100 --> 00:19:17.130 Josh Moore: I'm getting suggested more and more often is TensorStore, also supports both Zarr and N5. 126 00:19:19.410 --> 00:19:22.680 Josh Moore: It's getting quite complicated keeping up with all the different implementations. 127 00:19:24.030 --> 00:19:30.420 Josh Moore: So anyone else have any that they're really happy with? And there's the Rust N5 implementation. 128 00:19:30.750 --> 00:19:33.180 Chris Allan: There's still nobody working on C#, right, Josh? 129 00:19:33.660 --> 00:19:37.170 Josh Moore: As far as I know, no one's working on C#. 130 00:19:38.040 --> 00:19:38.790 Josh Moore: That's correct. 131 00:19:39.210 --> 00:19:42.270 Josh Moore: Though I haven't asked. I don't even know if there's an issue asking that, Chris.
132 00:19:48.030 --> 00:19:48.870 Doesn't work like it. 133 00:19:51.450 --> 00:20:04.020 Josh Moore: Yeah, and there's a C implementation that's coming as well from the Unidata team, just to kind of round out, so that's JavaScript, Java, Python, Rust, there was a Scala one, C, C++. 134 00:20:05.460 --> 00:20:06.510 Josh Moore: Not to mention TensorStore. 135 00:20:10.560 --> 00:20:19.920 Damir Sudar: What you just mentioned, the native C implementation, is that NetCDF, or is that related to the NetCDF activities? 136 00:20:19.950 --> 00:20:25.590 Josh Moore: That's NetCDF. It's certainly on master; I'm not sure if it's been released. They call it NCZarr. 137 00:20:28.560 --> 00:20:29.970 Josh Moore: And they are, so they'll 138 00:20:30.030 --> 00:20:45.000 Josh Moore: be at the meeting tomorrow night as well, because they are... well, I'm hoping they'll be there, because they will support a Java wrapper around the C implementation, so that's one of three Java implementations I know of. 139 00:20:50.820 --> 00:20:58.290 Josh Moore: Sorry, so that was the state of implementations that at least I know of. Anyone have any status reports from their side? 140 00:21:01.440 --> 00:21:05.640 Josh Moore: Good. Kind of look at Trevor: I assume zarr.js is still happy. 141 00:21:09.000 --> 00:21:15.450 Josh Moore: Yeah, Trevor's happy, excellent. As long as Trevor stays healthy then we'll keep going, um. 142 00:21:15.540 --> 00:21:18.690 Trevor Manz: I did write a v3 implementation a while. 143 00:21:20.190 --> 00:21:20.700 Josh Moore: Oh. 144 00:21:22.410 --> 00:21:29.880 Josh Moore: So that's actually something that really needs to happen, and I guess that's largely my fault. There will probably be a table of, 145 00:21:30.630 --> 00:21:41.310 Josh Moore: support matrix coming up. This morning, one of the decisions from Constantin was to... so there's a repository from Constantin called zarr_implementations,
146 00:21:42.150 --> 00:21:52.020 Josh Moore: um, where he has many-to-many tests, so every implementation that can write one of these files, reading each one of those files and seeing if everything works. 147 00:21:52.620 --> 00:22:00.300 Josh Moore: And so Constantin is going to migrate that to the zarr-developers organization and basically we can keep adding to that. So Trevor, it would be good to get 148 00:22:01.080 --> 00:22:12.420 Josh Moore: your v2 and... sorry, I won't be able to make that shut up... a v2 and v3 version of the JavaScript implementations into the matrix and figure out if everything's 149 00:22:13.980 --> 00:22:14.460 Josh Moore: aligned. 150 00:22:17.160 --> 00:22:21.270 Josh Moore: But it'll get easier to manage that once it's part of the organization. 151 00:22:24.390 --> 00:22:24.960 Josh Moore: Cool. 152 00:22:26.250 --> 00:22:30.210 Josh Moore: That's the state of Zarr, at least as of today. 153 00:22:32.700 --> 00:22:37.350 Josh Moore: Giving anyone else a chance, otherwise let's start switching to specifications. 154 00:22:39.600 --> 00:22:40.530 Josh Moore: Yeah, okay. 155 00:22:42.120 --> 00:22:44.430 Josh Moore: So here's the process, I know that it. 156 00:22:45.720 --> 00:22:46.710 Josh Moore: Um. 157 00:22:48.150 --> 00:22:55.530 Josh Moore: So the first specification that we did, and, you know, actually several of you were involved, was on this. 158 00:22:56.130 --> 00:23:02.310 Josh Moore: In the Zarr space, you know, so we have this attempt to come up with a specification that is the new 159 00:23:02.880 --> 00:23:09.420 Josh Moore: proper subset of Zarr and N5, so that's Zarr v3. So as a part of writing that specification, 160 00:23:09.900 --> 00:23:20.790 Josh Moore: um, the imaging community said, well, we need to be able to store sub-resolutions, we need to be able to store an image pyramid, let's write an extension to the Zarr v3 spec
161 00:23:21.210 --> 00:23:31.110 Josh Moore: so that we can represent that, the same thing we did with OME-TIFF, storing very large images. Um, we want to be able to do that in multiple dimensions, so that is 162 00:23:33.090 --> 00:23:33.720 Josh Moore: I have it here. 163 00:23:35.130 --> 00:23:49.380 Josh Moore: That is the famous issue 50 from the zarr-specs repository, um, and there are actually good conversations. Largely it's this set of people who are having those conversations, and we worked through how we want to store multiscale data. 164 00:23:50.670 --> 00:24:03.660 Josh Moore: And then at some point the Zarr community, in one of these biweekly meetings, said, well, this is all great, you're doing good work, but we don't want to be responsible for the specification, right, so this needs to live somewhere else. 165 00:24:04.170 --> 00:24:14.340 Josh Moore: So that's where NGFF really started: okay, this needs to happen somewhere. Where are we going to talk about imaging specifications on top of these next-generation file formats? 166 00:24:14.970 --> 00:24:24.990 Josh Moore: So that's where we moved to image.sc. You can follow the ome-ngff tag on image.sc, there's the NGFF group, and we started having these calls. 167 00:24:26.040 --> 00:24:43.170 Josh Moore: So the multiscales got implemented. I think everyone was generally happy with how that process went, although I did at some point just kind of draw a line and say, okay, some of this information there's no consensus on, so that's the scales, the offsets, the positions, the grid size. 168 00:24:44.370 --> 00:24:48.780 Josh Moore: We're just going to do the multiscale, and then all this other metadata needs to be specified in the future. 169 00:24:49.470 --> 00:24:55.560 Josh Moore: Um, that's probably something that will keep happening, so with each of the specifications that we're going to look at
170 00:24:56.040 --> 00:25:03.960 Josh Moore: you're not going to be able to get the entire specification done, right, so you kind of have to pick your battles and get us from a solid, 171 00:25:04.350 --> 00:25:10.170 Josh Moore: usable point to a next solid, usable point, right, that's kind of the game we're playing here. 172 00:25:11.040 --> 00:25:24.960 Josh Moore: Um, interestingly, all that stuff that got cut off at the bottom got pushed to an issue this morning and conversations have started up again, and we'll look at those, because everyone here is obviously welcome to join into those conversations. 173 00:25:26.100 --> 00:25:27.930 Josh Moore: So then we worked on the 174 00:25:29.070 --> 00:25:40.620 Josh Moore: labels specifications, so there is a specification for pixel-based annotations, right, so that's in the spec, um, and, as I mentioned earlier, the high-content screening data. 175 00:25:41.130 --> 00:25:51.540 Josh Moore: Um, so a lot of the conversations from this morning kind of built on those two efforts, right, so there was the pixel-based annotation, um, 176 00:25:52.110 --> 00:26:07.320 Josh Moore: specification that basically got done by the beginning of December, um, but it's clear there also needs to be a vector-based annotation specification, so polygons, meshes, point clouds, etc., um. 177 00:26:08.940 --> 00:26:15.360 Josh Moore: Um, Damir, can you add at the bottom of the file, 'cause I'll lose the chat but I won't lose the HackMD file. 178 00:26:16.800 --> 00:26:25.740 Josh Moore: Cheers. Um, and then so there were a number of people who raised their hands to work on those. 179 00:26:26.310 --> 00:26:33.510 Josh Moore: I think anyone here who's interested in those, well, actually what I'll probably do is, by the end of this call, hopefully we'll have some people here
180 00:26:33.960 --> 00:26:43.020 Josh Moore: who are interested in being involved in the vector-based annotations, and then we'll get all of those names onto a GitHub issue and you can all work together, right. 181 00:26:44.460 --> 00:26:47.070 Josh Moore: Similarly for the high-content screening 182 00:26:48.090 --> 00:26:58.410 Josh Moore: specification: it's basically a collection of images, and then what we talked about this morning was much more how do we generalize that, how do we take, um, 183 00:27:00.810 --> 00:27:24.120 Josh Moore: you know, whole-slide images with images laid out in various ways on top of a tissue and represent that as a collection of images, or collections of collections of images, etc., etc. So Draga from Monash actually has a proposal that, um... someone's just joining, like, Ken Carlisle. 184 00:27:29.520 --> 00:27:31.800 Josh Moore: Hi Ken, sorry, I'm going to keep going. 185 00:27:34.200 --> 00:27:41.250 Josh Moore: That's not working. If someone can paste the HackMD notes in the chat, that'd be great, um. 186 00:27:42.720 --> 00:27:46.740 Josh Moore: So she has a representation that she's basically proposing. 187 00:27:48.240 --> 00:27:59.130 Josh Moore: And she said she would open issues. I'm thinking of Christian, who's certainly interested in this as well, so we need to link up the GitHub issues and the image.sc post and get everyone working at the same time. 188 00:28:01.050 --> 00:28:17.460 Josh Moore: And then I think the other big categories of specifications that are certainly listed here, and then I'll shut up and see if anyone has anything that I haven't talked about yet that we need to get on the list this evening and/or this morning, um, are 189 00:28:19.230 --> 00:28:33.720 Josh Moore: the, so remember this is something else we left out of the label specification, was the ability to have, in two separate Zarrs, possibly in two separate locations, so one locally, one on S3,
190 00:28:34.350 --> 00:28:41.610 Josh Moore: a link from the original image data to the labeled image, or vice versa, from the labeled image to the 191 00:28:43.380 --> 00:28:45.840 Josh Moore: the original image data, so that's, 192 00:28:47.130 --> 00:28:52.620 Josh Moore: that falls under remote links. There's an issue for that, so anyone who's interested in working on that concept 193 00:28:55.020 --> 00:29:06.990 Josh Moore: is more than welcome to. And then, all of that is really more binary data, and then there's this huge other component, which is storing metadata. 194 00:29:07.380 --> 00:29:15.630 Josh Moore: And that's what Damir was getting to, which is there's quite a lot of information in the OME-XML that needs to live somewhere. Where is it going to live? You know, 195 00:29:16.320 --> 00:29:28.620 Josh Moore: a very simplistic implementation could just encode the OME-XML as a Zarr array and you just have a one-dimensional string, um, somewhere that you're reading. 196 00:29:30.120 --> 00:29:40.950 Josh Moore: I assume for most of the metadata it actually makes sense to pull it out and make it accessible in the Zarr metadata, and that will be a body of work that we need to get through. 197 00:29:43.530 --> 00:29:58.830 Josh Moore: The last thing that's on this list of specifications to be discussed is tabular data, which is, you know, if you have a polygon representation or a bunch of mesh vertices you want to store various information, so types of information that got discussed this morning were, 198 00:30:00.600 --> 00:30:09.810 Josh Moore: were vectors associated with every point, so how are the molecules moving, um, various other kinetic or chemical information. 199 00:30:11.130 --> 00:30:18.840 Josh Moore: But it could also just be the label or some intensity value that's been measured, and so having some way to store that along with all the other representations.
200 00:30:21.240 --> 00:30:30.000 Josh Moore: And now the floor is kind of open, so the first, I think, part of the process is just getting everything off all of your chests, so that's one. 201 00:30:30.900 --> 00:30:42.750 Josh Moore: Is everyone, you know, up to speed with what each of those specifications is trying to do? Um, do you want to add requirements to the specifications? Do you want to implement one of the specifications? 202 00:30:44.190 --> 00:30:45.030 Josh Moore: The sky's the limit. 203 00:30:45.870 --> 00:30:54.540 Davis Bennett: Can I just ask about the high-content screening situation, how is that not just handled by the translate parameter of a transform? 204 00:30:55.860 --> 00:31:08.100 Josh Moore: Um, so for a single plate, it could just be a transform. There are a couple more layers of that hierarchy, Davis, so each well then has multiple fields. 205 00:31:08.610 --> 00:31:23.850 Josh Moore: You know, and so there's the information about when the plate was acquired, what its position was, um, so we needed places basically to store all of that. It may be, and this was also discussed this morning, 206 00:31:24.960 --> 00:31:33.120 Josh Moore: when there is kind of a generic collection concept, you know, once that specification has been written, and I assume it will have transforms in it to some degree, 207 00:31:33.720 --> 00:31:42.090 Josh Moore: I think we then just take a step back and rewrite the high-content screening to use whatever that concept is. We don't want two concepts, but we also didn't want to 208 00:31:43.770 --> 00:31:47.430 Josh Moore: kind of do the hard work before we got something in place to just show the data, right. 209 00:31:48.180 --> 00:31:57.000 Davis Bennett: It's just, like, one approach to describing a bunch of images that are from the same space is to give the space a name, and then the transform handles everything else.
210 00:31:58.770 --> 00:32:11.580 Josh Moore: Yeah, so when we come, assuming we have exactly what you just said, then what we will do is we'll come and say, okay, but at this point, this point, and this point, we need to add more metadata, and we'll do that. 211 00:32:23.610 --> 00:32:29.340 Josh Moore: I'm reading what people have written here now, so someone feel free to just speak up. 212 00:32:35.190 --> 00:32:39.510 Davis Bennett: The dimension names, are those still hard-coded? 213 00:32:41.370 --> 00:32:47.850 Josh Moore: Yeah, so that's certainly, I guess I didn't mention that, so that falls under physical size. 214 00:32:49.050 --> 00:33:02.490 Josh Moore: Um, so currently every array is time, channel, Z, Y, X, and that needs to be loosened and made, you know, made flexible, so you can have named dimensions. 215 00:33:03.840 --> 00:33:21.720 Davis Bennett: The approach I've taken is I put the names of the dimensions in the transform object, mm hmm, and I posted a link in HackMD to an implementation. It's kind of a nasty mix of multiscale metadata implementations, but in there there's one that I like. 216 00:33:22.860 --> 00:33:27.120 Davis Bennett: It's associated with the OpenOrganelle datasets that we uploaded on S3. 217 00:33:28.830 --> 00:33:39.090 Davis Bennett: Basically, we went with the multiscales description of the multi-resolution pyramid, and associated with every single dataset in that pyramid we have a transform object. 218 00:33:39.990 --> 00:33:49.740 Davis Bennett: Which is a pretty simple object, it just has a list of axes that are the names of the axes, like, I'm really tired of Python/Java axis indexing issues. 219 00:33:50.130 --> 00:33:50.490 Josh Moore: They just. 220 00:33:50.580 --> 00:33:59.100 Davis Bennett: Put the axis names in there, and then for every transform that we're applying to the data, which currently is scale and translate, put those in there.
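NOTE A minimal sketch of the per-dataset transform Davis describes: a list of axis names plus a scale and a translate for every level of the pyramid. The key names ("multiscales", "datasets", "path", "transform", "axes", "scale", "translate"), the s0/s1 paths, and the half-sample offset for averaged downsampling are illustrative assumptions, not a quoted schema.

```python
import json


def level_transform(axes, base_scale, factor):
    """Scale and translate for a level downsampled by `factor` on every axis.

    Assumption: each coarse sample is the mean of `factor` fine samples, so
    its centre sits (factor - 1) / 2 fine samples away from the origin.
    """
    scale = [s * factor for s in base_scale]
    translate = [s * (factor - 1) / 2 for s in base_scale]
    return {"axes": list(axes), "scale": scale, "translate": translate}


def multiscale_metadata(axes, base_scale, n_levels):
    """Multi-resolution metadata as a plain list of single-resolution entries."""
    datasets = [
        {"path": f"s{i}", "transform": level_transform(axes, base_scale, 2 ** i)}
        for i in range(n_levels)
    ]
    return {"multiscales": [{"datasets": datasets}]}


# Hypothetical 3-level pyramid with 4.0-unit isotropic voxels at full resolution.
meta = multiscale_metadata(("z", "y", "x"), (4.0, 4.0, 4.0), 3)
print(json.dumps(meta, indent=2))
```

Because every entry carries its own axes, scale, and translate, a reader never has to infer physical coordinates from a level index.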
221 00:34:01.440 --> 00:34:08.220 Davis Bennett: And the nice thing about this is that it's explicit about the fact that your multi-resolution pyramid involves a translation step. 222 00:34:09.990 --> 00:34:11.340 Davis Bennett: Explicit about the resolution. 223 00:34:12.390 --> 00:34:19.800 Davis Bennett: Sorry, explicit about the scaling transform needed to map the data into physical space. We don't talk about resolution, that's a separate thing. 224 00:34:27.960 --> 00:34:28.890 Davis Bennett: And in theory that. 225 00:34:31.200 --> 00:34:33.900 Christian Tischer: Each resolution layer has a transformation attached? 226 00:34:34.170 --> 00:34:34.710 Yes. 227 00:34:36.300 --> 00:34:51.990 Davis Bennett: The underlying principle here, and this is an iteration on previous multi-resolution metadata schemes, and there were some that I thought were too implicit, like listing off scale levels without listing physical information about the underlying dataset, that's too implicit. 228 00:34:53.370 --> 00:35:05.640 Davis Bennett: So the principle here is a multi-resolution image is just a list of datasets, so multi-resolution metadata should be a list of single-resolution metadata. 229 00:35:07.380 --> 00:35:12.240 Davis Bennett: So that exact transform can be found in the metadata of the individual dataset. 230 00:35:13.740 --> 00:35:19.740 Davis Bennett: It means that generating this multi-resolution metadata is really simple, you just take the metadata for each of the datasets and you stick it in that list. 231 00:35:20.880 --> 00:35:26.070 Davis Bennett: There is no multi-resolution metadata except the name multiscales and the list of datasets. 232 00:35:27.780 --> 00:35:31.770 Josh Moore: But there will be metadata that applies to the sum of them, right. 233 00:35:32.790 --> 00:35:33.960 Josh Moore: So when you specify that.
234 00:35:33.960 --> 00:35:38.460 Davis Bennett: Somehow, yeah, if you want, you can put some description in the group, right. 235 00:35:38.880 --> 00:35:40.920 Josh Moore: Which is basically what we have now, so. 236 00:35:41.880 --> 00:35:48.090 Josh Moore: So I think it doesn't even need to be a breaking change to go from what we have now to what you're suggesting. 237 00:35:48.690 --> 00:35:55.800 Davis Bennett: Yeah, I think the only difference is in mine I'm explicit about the transform, and we punted on that in the issue 50. 238 00:35:56.400 --> 00:35:56.730 Right. 239 00:35:59.910 --> 00:36:00.330 Josh Moore: Yeah, so. 240 00:36:00.720 --> 00:36:10.380 Chris Allan: I was just going to say, I think the conceptual issue that we're going to have going forward with this is that the more explicit we get, the more difficult we make it for implementation of this, and it's going to be a, 241 00:36:10.680 --> 00:36:25.140 Chris Allan: it's going to be a push and pull of trying to have the explicit mappings while also making it so that someone doesn't have to understand 14 different metadata layers in order to just write five layers of, excuse me, five layers of an array. 242 00:36:27.720 --> 00:36:34.560 Davis Bennett: And so the logic I had was, like, this is how I like to express the physical transformation, but someone else might express it differently. 243 00:36:35.280 --> 00:36:42.600 Davis Bennett: But I think if the constraint is the multi-resolution metadata is a list of physical transformations, however you want to express them, 244 00:36:43.380 --> 00:36:52.440 Davis Bennett: it doesn't require somebody to know my preferred physical transformation scheme, they can have their own, and the onus is on the reader to understand it. 245 00:36:54.960 --> 00:36:59.070 Josh Moore: Yeah, but I think, you know, we would need to have one that we're suggesting, so. 246 00:37:00.180 --> 00:37:00.600 Josh Moore: I mean we.
247 00:37:00.660 --> 00:37:06.630 Josh Moore: Already, so it's an interesting process. So for, okay, MoBIE, so the BigDataViewer 248 00:37:07.290 --> 00:37:16.740 Josh Moore: implementation from Christian and co, you know, basically looked at Davis's from OpenOrganelle and then already changed it somewhat, so you know, 249 00:37:16.980 --> 00:37:25.740 Josh Moore: there's no way to write one implementation, well, I guess you could write up an implementation that reads both, but you'll be hard-coding support for each of the variants. 250 00:37:26.250 --> 00:37:32.520 Josh Moore: And so the process that we're going through here is trying to say, okay, you know, as a community we're going to agree on something so that. 251 00:37:33.420 --> 00:37:34.140 Davis Bennett: What did they change? 252 00:37:35.460 --> 00:37:36.030 Davis Bennett: The order? 253 00:37:37.350 --> 00:37:40.440 Josh Moore: Christian can tell you what happened, I think they pulled it up a layer or something. 254 00:37:41.520 --> 00:37:43.110 Christian Tischer: I don't remember the details. 255 00:37:45.240 --> 00:37:49.020 Josh Moore: But yeah, there's a lot of "well, I like it this way", and you know, that's kind of like the worst thing 256 00:37:50.100 --> 00:37:53.580 Josh Moore: that can happen, so we do need to make the hard choices. 257 00:37:55.530 --> 00:38:05.220 Josh Moore: Okay, but that's actually good, so I mean, Davis, do you have the link to the issue that got created this morning? 258 00:38:05.610 --> 00:38:06.600 Davis Bennett: I can paste it in there, yeah. 259 00:38:06.870 --> 00:38:10.140 Josh Moore: Yeah, cheers. HackMD or the Zoom chat. 260 00:38:11.370 --> 00:38:18.150 Josh Moore: Anything that's in the HackMD won't get lost, so I guess I shouldn't have started the process by pasting all this stuff in the chat.
261 00:38:18.720 --> 00:38:38.250 Josh Moore: Um, mea culpa. So, and then certainly anyone who's interested in specifying the physical, you know, the spatial context that the images are living in should get involved in that issue, um, issue 28, good job number. 262 00:38:42.450 --> 00:38:46.020 Lee Kamentsky: Are people thinking about warping transforms as well? 263 00:38:47.700 --> 00:38:57.600 Josh Moore: Yeah, so I think that's the point at which I pulled the plug last time, Lee, so someone started suggesting that and I was like, okay, I'm not dealing with this anymore. I'm not saying it shouldn't be dealt with, but I 264 00:38:57.600 --> 00:38:58.110 guess. 265 00:38:59.190 --> 00:39:10.770 Josh Moore: Does someone have a recommendation for what it should actually look like, or is it something where we need to actually get the affine transforms done first before we deal with something more complicated? 266 00:39:10.980 --> 00:39:16.080 Lee Kamentsky: Yeah, might be something that's down the line, but we use like a B-spline 267 00:39:17.100 --> 00:39:30.360 Lee Kamentsky: warping field, and it's really useful because our images are so large that we don't want to keep writing them as we integrate them into the next layer 268 00:39:31.530 --> 00:39:43.650 Lee Kamentsky: So people are using a sort of transform stack to get from the raw data, as the image gets worked on, to the eventual. 269 00:39:44.940 --> 00:39:46.440 You know, typically that lists. 270 00:39:49.440 --> 00:39:52.320 John Bogovic: In my view, it should be, yeah, I think. 271 00:39:53.730 --> 00:40:05.760 John Bogovic: I'm in favor of pushing it a little bit, in the sense that it's a bit too dependent on particular implementations of things, except in one example, which I'll say, but like.
272 00:40:06.120 --> 00:40:17.880 John Bogovic: One person could be storing B-spline-like coefficients, another person might be storing explicit displacements, and another person might be storing who knows what, so I'd say except in the case of 273 00:40:18.420 --> 00:40:26.820 John Bogovic: displacement fields, which are pretty standard across tools and which themselves can essentially be stored as a, you know, X by Y by 2 274 00:40:27.840 --> 00:40:43.020 John Bogovic: array in 2D, or X by Y by Z by 3 array in 3D, I think, and then maybe just a tag saying this is a displacement field. I think maybe now is not the right time to be doing anything fancier, but that's just my opinion. 275 00:40:46.350 --> 00:40:47.370 Lee Kamentsky: Just down the line. 276 00:40:48.390 --> 00:40:49.890 Josh Moore: But if someone. 277 00:40:50.070 --> 00:40:58.200 Josh Moore: So, and this is the process that's interesting, is, you know, if someone wants to write up that issue, even if we don't do anything with the issue for a while, 278 00:40:59.010 --> 00:41:16.140 Josh Moore: you know, feel free. It will gather, you know, it's kind of like fishing for people in the community to either say you're stupid or yeah, I agree with what you're doing, um, so feel free to get it in place. You know, we'll look through an example of doing that here in a minute, um. 279 00:41:17.550 --> 00:41:32.460 Josh Moore: You know, and if it really is a case of everyone agrees, and there are a large number of tools across the languages that will support it, you know, then it won't hurt, and if someone's willing to do the work then I think that's fine, but it's, 280 00:41:34.740 --> 00:41:38.280 Josh Moore: there's certainly a limited resource question that we're all dealing with, right. 281 00:41:46.080 --> 00:41:48.030 Josh Moore: Anyone else have strong feelings on any of these 282 00:41:48.030 --> 00:41:48.930 specifications.
283 00:41:50.100 --> 00:42:09.720 Davis Bennett: Another thing to consider, and I don't think this is necessarily something I want to do, but it's something that they do in the climate field with xarray and, I think, whatever their HDF5-plus format involves, is explicitly saving out the axes of your data as a separate dataset. 284 00:42:11.100 --> 00:42:12.870 Davis Bennett: That is kind of in between 285 00:42:14.220 --> 00:42:25.230 Davis Bennett: putting the metadata as a string in the attributes file and saving out the full transformation, and I think the Kitware people wanted to do that, if I remember correctly. 286 00:42:27.750 --> 00:42:30.360 Josh Moore: That's an interesting point, so. 287 00:42:32.910 --> 00:42:46.410 Josh Moore: OME-Zarr currently is not xarray compatible, and you know, I think that is an issue. Um, xarray doesn't have support for multiscale, so that was why I didn't pursue it. I tried to get, 288 00:42:47.130 --> 00:42:54.600 Josh Moore: I tried to use multiscales with xarray and basically got to a blocker. They have a couple of open issues on that, Davis, so 289 00:42:55.230 --> 00:43:03.930 Josh Moore: either what we would have to do is each of the single datasets, so maybe, you know, your concept of multi-resolution or multiscale is just 290 00:43:04.470 --> 00:43:18.000 Josh Moore: a set of single scales, right, each of those could be an xarray dataset, that would work, or we wait until xarray basically takes on board our concept of a multiscale and then we can work together with them. 291 00:43:18.600 --> 00:43:23.160 Josh Moore: But I think, and I guess this is kind of open to a vote and 292 00:43:24.210 --> 00:43:30.600 Josh Moore: opinions from everyone, it seems like a lot is happening in the xarray space, and it would make sense to interoperate with them.
293 00:43:30.930 --> 00:43:39.450 Davis Bennett: Right, yeah, I would like that. I do think there's a fundamental issue with representing a multi-resolution image in xarray's model. 294 00:43:39.990 --> 00:43:54.270 Davis Bennett: Because xarray wants a collection of images to all have unique axis names, unique dimensions. It's like exactly what you don't have with a multi-resolution image, unless you give them, like, you know, Z1, Z2, and you don't want to do that. 295 00:43:56.670 --> 00:43:57.960 Josh Moore: Someone else was gonna say something. 296 00:43:58.500 --> 00:44:10.410 Jackson Brown: Yeah, I mean, that's great. So AICSImageIO 4 is entirely backed by xarray, and so in 4.1, like, we're planning to support OME-Zarr reading and writing 297 00:44:11.670 --> 00:44:24.780 Jackson Brown: with 4.1, and it's like, well, yeah, how are we going to handle multiscale, or multi-resolution, or multi-image, or whatever you want to call it? Basically we were just saying each image, each multiscale, or each level of the pyramid is a different xarray object. 298 00:44:26.190 --> 00:44:31.650 Josh Moore: That's the way we've kind of seen it, and you're generating the dimension metadata on the fly? 299 00:44:33.330 --> 00:44:38.250 Jackson Brown: If it's an OME-Zarr we use the metadata, right, if it's not, yeah. 300 00:44:38.970 --> 00:44:41.880 Josh Moore: But there's no dimension array. 301 00:44:43.380 --> 00:44:44.790 Josh Moore: And there's no array and. 302 00:44:47.610 --> 00:44:49.800 Josh Moore: I will look at how you're doing this because that sounds like. 303 00:44:49.830 --> 00:44:51.660 Jackson Brown: Oh yes, thing is, we haven't started, like. 304 00:44:51.840 --> 00:44:52.950 Josh Moore: Oh, this is. 305 00:44:52.980 --> 00:44:57.750 Jackson Brown: On the roadmap for 4.1, of what we want to support, and. 306 00:44:58.950 --> 00:45:02.610 Jackson Brown: Just because we, yeah, we know we need Zarr, but we aren't going to tackle it yet.
307 00:45:05.490 --> 00:45:11.490 Jackson Brown: But at least that's how we know that we can't do it with straight, like, Zarr to xarray, we have to do some other. 308 00:45:11.970 --> 00:45:12.810 Josh Moore: Right, a wrapper. 309 00:45:13.650 --> 00:45:14.130 Josh Moore: Or. 310 00:45:14.370 --> 00:45:17.580 Josh Moore: And we can all keep pushing against xarray to get them to support this. 311 00:45:20.040 --> 00:45:21.840 Josh Moore: So they're thinking about it. 312 00:45:27.540 --> 00:45:29.880 Davis Bennett: They have an issue for that xarray topic. 313 00:45:30.960 --> 00:45:33.240 Davis Bennett: But just Google xarray multiscales. 314 00:45:34.290 --> 00:45:35.340 Josh Moore: or one one. 315 00:45:41.220 --> 00:45:41.610 There you go. 316 00:45:46.740 --> 00:45:51.330 Josh Moore: Okay, we just went through a bunch of stuff, any questions or concerns or. 317 00:45:53.160 --> 00:45:54.420 Josh Moore: Go for it, Jamie, okay. 318 00:45:54.450 --> 00:46:01.620 Jamie Sherman: One, we've been kind of, one or two things we've been wondering about and would like to bring up: the concept of samples per pixel. 319 00:46:02.820 --> 00:46:07.050 Jamie Sherman: And because we've run into a lot of headaches around that in terms of, like, 320 00:46:08.520 --> 00:46:23.310 Jamie Sherman: you know, when you have an RGB image that's been stored or something like that and it's got three channels, then how do you unpack that, and how do you keep a consistent interface when you don't have, you know, you either conflate samples and channels, 321 00:46:24.360 --> 00:46:37.620 Jamie Sherman: or, you know, we're trying with a new dimension, right, like a samples dimension in the case where that pops up, and so that's one, and then the other one is like mosaic or tiled images. 322 00:46:40.410 --> 00:46:42.450 Those are another messy 323 00:46:43.680 --> 00:46:44.790 Jamie Sherman: piece of fun to deal with.
324 00:46:45.720 --> 00:46:59.700 Josh Moore: So we have Christoph and we have Melissa here, so hopefully we can get some help on this. Um, my understanding, so this came up on the last call, I think Jamie brought it up and I didn't have a good answer, I think I asked Melissa. 325 00:47:00.750 --> 00:47:12.750 Josh Moore: And so, at least from the old OME-TIFF spec, the samples per pixel is something we don't necessarily want to propagate into OME-Zarr, it's a complexity that, if we can get away from it, that would be a good thing. 326 00:47:13.770 --> 00:47:28.740 Josh Moore: So the floor is kind of open for, so yeah, there's one thumb up, right, so, um, and then if I'm remembering right, and Jamie, maybe you have the link, there was a conversation where the Zeiss developers, certainly Sebi, 327 00:47:30.390 --> 00:47:46.950 Josh Moore: I'm not sure if Stephen, kind of, that Wagner, Wagner Conrad got involved, um, saying yeah, there needs to be support for like scene, or, you know, another S dimension, you know, so if that's a general feeling that, sorry, go ahead. 328 00:47:48.960 --> 00:47:59.880 Josh Moore: No, so if, you know, having another named dimension S with some meaning for S would actually fix this, I could get behind that box. 329 00:48:01.050 --> 00:48:01.530 Josh Moore: plots. 330 00:48:05.970 --> 00:48:21.360 Jamie Sherman: I mean, what we've done, at least in our first, or in the implementation for CZI files, for example, because I have to deal with it, right, I've just, since S is taken for samples I just used A, for example, for the next letter in the word, so. 331 00:48:23.010 --> 00:48:29.610 Jamie Sherman: Something silly like S for scene and A for, yeah, that's what I meant, sorry, thanks. 332 00:48:30.330 --> 00:48:30.990 Josh Moore: Say that again? 333 00:48:32.280 --> 00:48:37.680 Jamie Sherman: S for scene and A for samples, so just take the second letter in samples, because the S for scene was taken already.
334 00:48:39.000 --> 00:48:49.230 Jamie Sherman: But, you know, something silly like that, but it doesn't really matter what the letter is, just as long as, you know, there's some way to populate it, or something that it does make. 335 00:48:51.690 --> 00:48:52.500 Jamie Sherman: It prevents this, 336 00:48:53.640 --> 00:49:10.560 Jamie Sherman: that horrible mess of, like, so I've been given a, like, I've seen this where you get a seven-channel RGB image and then you have to try and untangle, you know, like, how do you make the interface clear to somebody as to how to access what channel, and you know, 337 00:49:11.670 --> 00:49:14.400 Jamie Sherman: like, input and output become very different. 338 00:49:16.770 --> 00:49:24.360 Dan Toloudis: Yeah, and you have branching code paths, right, because you don't want the last dimension to be one in the case of scalar data. 339 00:49:26.190 --> 00:49:31.080 Dan Toloudis: But you do need that last channel to be three for RGB data, or that last dimension. 340 00:49:48.540 --> 00:50:04.140 Josh Moore: No one else has immediate solutions or thoughts on that? I guess a question would be where does that conversation happen, so is that part of the overall dimension conversation that we need to have, or does someone want to kick off a particular samples conversation? 341 00:50:06.510 --> 00:50:18.240 Jamie Sherman: I mean, in my head I think if you really want to keep it out of the, you know, like if we don't want to serialize the data that way, such that a subblock, or whatever the right word is, 342 00:50:19.800 --> 00:50:26.970 Jamie Sherman: has three samples per pixel, for example, or four in the case of other image formats, right, 343 00:50:28.470 --> 00:50:45.330 Jamie Sherman: then we'd need some kind of a mechanism, and it may exist already in the metadata, to annotate it such that there's a way to know how to reconstruct the image, you know, to do the mapping forwards and backwards.
344 00:50:50.160 --> 00:50:52.920 Jamie Sherman: Unless I'm missing something, feel free to speak up. 345 00:50:56.040 --> 00:50:57.750 Josh Moore: I don't have the answers here, so 346 00:50:57.810 --> 00:51:07.410 Josh Moore: I'm kind of seeing if anyone else does. Like, my gut feeling, though, is that we finally have a library that's actually built to be multi-dimensional and, you know, we should really use it 347 00:51:07.890 --> 00:51:20.550 Josh Moore: as it's intended to be used, as opposed to building in hacks. We're not at day one, but, you know, we're awfully early in this process. If we could keep the hacks out of it for a little while, that would make implementations easier. 348 00:51:23.040 --> 00:51:39.060 Damir Sudar: Is RGB or RGBA the only example of where this hack would have to be, or are there other examples, or can we think of other examples where you would bump into the same type of thing? 349 00:51:40.200 --> 00:51:47.340 Damir Sudar: Because if it's only RGB, RGBA, maybe we should just bite the bullet and say, well, that's where we'll have to 350 00:51:48.870 --> 00:51:49.860 Damir Sudar: be special. 351 00:51:50.610 --> 00:52:00.510 Davis Bennett: Well, RGBA is a, that's a display thing, but the source of data will be a microscope where someone may have acquired five channels, seven channels. 352 00:52:01.740 --> 00:52:02.010 Davis Bennett: Oh. 353 00:52:02.970 --> 00:52:15.990 Damir Sudar: Yeah, and then those channels are real channels, right, so I was more thinking of this specific thing of this, yeah, this biological natural thing of RGB being special. 354 00:52:17.850 --> 00:52:23.160 Damir Sudar: Because if it's multiple fluorescence channels, sure, those need to be in their own dimension. 355 00:52:26.610 --> 00:52:32.460 Christian Tischer: But in a way, if you think of a microscope, I mean, you could get the same image
356 00:52:33.810 --> 00:52:41.760 Christian Tischer: in principle, right, by taking three images with a camera and putting three different filters on top, so maybe 357 00:52:42.870 --> 00:52:49.380 Christian Tischer: just saying these are three channels and then indeed somehow fixing it with the metadata, I don't know. 358 00:52:51.150 --> 00:53:02.970 Davis Bennett: The access pattern, I think, is pretty important, so I guess if you're pushing pixels to the GPU or onto the monitor screen itself, you want RGB values to be right next to each other in memory. 359 00:53:03.870 --> 00:53:15.840 Davis Bennett: If you have acquired data as separate channels, you will probably do something different to the different channels, then you want all the Rs to be next to each other in memory, all the Gs, all the Bs, and whatever. 360 00:53:17.190 --> 00:53:25.230 Davis Bennett: And to add on to the question of another dataset that could be like this, I think fluorescence lifetime imaging acquires like a time series for every pixel. 361 00:53:27.210 --> 00:53:32.340 Davis Bennett: So I think the access pattern should maybe determine this more than anything else. 362 00:53:37.740 --> 00:53:43.350 Josh Moore: So I think that's right in terms of how you lay out the data, but in terms of this specification, 363 00:53:44.490 --> 00:53:46.410 Josh Moore: you're just thinking about N5 and Zarr. 364 00:53:47.760 --> 00:54:00.120 Josh Moore: Does having a dimension for these that you can choose to put either at the front or the end not cover the data layout that you're talking about to optimize, and then it's up to the user to make those choices? 365 00:54:02.520 --> 00:54:04.890 Davis Bennett: I think, as long as you have names for all your array axes. 366 00:54:06.120 --> 00:54:09.150 Davis Bennett: And you push all this onto the user, for better or worse, right.
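NOTE A minimal sketch of why the position of a named channel or samples axis determines the access pattern discussed here. It assumes a row-major (last axis fastest) array; the axis names "c", "y", "x" and the 512 by 512 size are illustrative, not part of any agreed specification.

```python
def strides_for(shape):
    """Element strides of a row-major (last axis fastest) array of `shape`."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channel_stride(axes, sizes):
    """Distance, in elements, between neighbouring samples of one pixel."""
    shape = [sizes[name] for name in axes]
    return strides_for(shape)[axes.index("c")]


sizes = {"c": 3, "y": 512, "x": 512}

# Channel axis last: R, G and B of a pixel are adjacent (display-friendly).
print(channel_stride(("y", "x", "c"), sizes))  # 1

# Channel axis first: each channel is one contiguous plane (per-channel work).
print(channel_stride(("c", "y", "x"), sizes))  # 262144
```

With named axes the same data can be written in either order, and a reader can recover which axis is which without a special RGB flag.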
367 00:54:14.670 --> 00:54:30.120 Christian Tischer: But this access pattern stuff, could that be dealt with by chunking differently? I mean, couldn't you save it, I mean, already maybe in the way that you chunk it along the channel axis or something like that? 368 00:54:30.480 --> 00:54:31.830 Josh Moore: So I think by, 369 00:54:31.920 --> 00:54:37.020 Josh Moore: by dimension order, chunking parameters, and compression parameters, 370 00:54:38.070 --> 00:54:45.360 Josh Moore: with those three primitives you should be able to lay things out on disk like you want without us needing to build into the specification, 371 00:54:45.690 --> 00:55:01.710 Josh Moore: oh, is it an RGBA, therefore handle it this way, right. I guess what I'm trying to avoid is the special-casing where we actually, you know, like, you have an if clause checking for some weird flag, which, my understanding is, that's where we are today, um. 372 00:55:05.460 --> 00:55:06.330 Josh Moore: Maybe that's naive. 373 00:55:17.670 --> 00:55:20.250 Josh Moore: Don't really have a good answer yet about who's writing this up, but 374 00:55:21.360 --> 00:55:22.410 Josh Moore: someone's welcome to 375 00:55:23.460 --> 00:55:25.800 Josh Moore: give it a go, it's a good problem. 376 00:55:26.550 --> 00:55:28.050 Josh Moore: It's a good thing to get right, let's put it that 377 00:55:28.050 --> 00:55:30.960 Jackson Brown: way, yes, I. 378 00:55:31.980 --> 00:55:44.730 Jackson Brown: I'm assuming I will likely be doing the work for the OME-Zarr writer, and I'm basically saying I'm concerned as to if OME-Zarr only supports 5D, or it has that, like, five-dimensional, 379 00:55:45.150 --> 00:55:52.950 Jackson Brown: like, hard-coded requirement, I'm like, what's going to happen as soon as someone says, I have sample data, sorry.
380 00:55:53.940 --> 00:56:04.050 Josh Moore: Yeah, so, yeah, everything I've said so far is predicated on getting rid of the 5D hard-coded order, so we must have flexible dimensions, they must be, 381 00:56:05.520 --> 00:56:18.180 Josh Moore: you know, basically you define your dimension order, I guess, like you do in Bio-Formats or in a TIFF file, and it works the same, so if we have that, 382 00:56:19.260 --> 00:56:22.380 Josh Moore: does everyone feel comfortable without special-casing? 383 00:56:27.630 --> 00:56:39.060 Jackson Brown: I think that also answers the other question that Jamie led off with, which was like mosaic handling, and if you can say like this dimension is a tile dimension, that kind of also handles that too. 384 00:56:39.900 --> 00:56:42.030 Josh Moore: Yeah, I mean, that's what chunking gets you, so. 385 00:56:43.410 --> 00:56:43.620 Jackson Brown: Yeah. 386 00:56:44.010 --> 00:56:52.440 Jamie Sherman: Just on that, in that vein, one weird thing, it depends on the manufacturer's format as well, but 387 00:56:53.520 --> 00:57:01.350 Jamie Sherman: does the indexing have to be a dense representation or can it be sparse? I mean, does it have to be contiguous, 388 00:57:02.490 --> 00:57:05.310 Jamie Sherman: 1, 2, 3, or can it go 1, 3, 5 389 00:57:07.320 --> 00:57:08.400 Jamie Sherman: in its indexes? 390 00:57:12.660 --> 00:57:14.010 Jamie Sherman: I know that sounds weird, but 391 00:57:15.300 --> 00:57:19.620 Jamie Sherman: scope manufacturers, essentially, or at least Zeiss in this case, uses 392 00:57:20.970 --> 00:57:36.210 Jamie Sherman: the M index to align across scenes, or yeah, across subblocks, and so they're not guaranteed to be sequential in a scene, for example, which makes it all really weird. 393 00:57:37.290 --> 00:57:41.130 Jamie Sherman: We haven't dealt with that cleanly in 394 00:57:42.150 --> 00:57:42.990 Jamie Sherman: aicspylibczi. 395 00:57:43.020 --> 00:57:43.380 But.
396 00:57:44.790 --> 00:57:48.570 Jamie Sherman: Like, we're just dense-packing them at the moment, but I'm wondering if 397 00:57:49.620 --> 00:57:50.730 Jamie Sherman: there are better ways to do it. 398 00:57:53.520 --> 00:58:07.350 Josh Moore: I mean, my assumption, I guess I would have to see the data, and this is getting quite detailed, would be, if you set up the chunking right and you don't write one of the indices, then there'd just be nothing on disk. 399 00:58:08.400 --> 00:58:09.390 Josh Moore: And so that would be fine. 400 00:58:10.560 --> 00:58:19.920 Josh Moore: If someone tries to read it they'll get the default fill value, though, so you would probably want to add metadata to tell them. Yeah, okay. 401 00:58:22.830 --> 00:58:23.550 Jamie Sherman: All right, thanks. 402 00:58:27.630 --> 00:58:28.530 Josh Moore: Okay, well. 403 00:58:33.390 --> 00:58:42.090 Josh Moore: It's too late for me to have a coffee, sorry, to wake myself back up, I shouldn't have gone running before this. Um, so we're at an hour now, so one more hour. 404 00:58:43.800 --> 00:58:50.580 Josh Moore: Um, are there any other big topics anyone would like to kick off from the list of specifications and/or new specifications? 405 00:58:52.320 --> 00:58:57.240 Christian Tischer: I mean, I have a small topic, maybe, what's the status of the display settings? 406 00:58:58.050 --> 00:59:04.200 Josh Moore: Okay, so now we're kind of going to the list of things that we haven't talked about yet, so that's good, so. 407 00:59:04.290 --> 00:59:05.100 um. 408 00:59:06.330 --> 00:59:09.090 Josh Moore: The display settings, so what we did 409 00:59:10.470 --> 00:59:16.980 Josh Moore: for our own purposes, and maybe this is opening Pandora's box, um. 410 00:59:18.270 --> 00:59:20.610 Josh Moore: So we were bad citizens, basically we said
411 00:59:22.740 --> 00:59:28.920 Josh Moore: we're not going through the process of specifying this block of metadata yet. We know we need to, but couldn't get it done. 412 00:59:29.160 --> 00:59:41.820 Josh Moore: So we're going to put all this metadata under a block that's in a namespace that we don't consider for everyone's consumption, and that's the "omero" block, right? So we said we have this display metadata in 413 00:59:42.900 --> 00:59:53.400 Josh Moore: OMERO for many of the images, for all the images that are in the IDR. We're just going to take it out and we're going to store it in the Zarr. And so I think in, 414 00:59:54.840 --> 00:59:59.820 Josh Moore: can't remember if vizarr is making use of it. I think it is; Trevor, you can correct me. 415 01:00:00.600 --> 01:00:12.270 Josh Moore: But certainly in napari, so with the Python implementation, it will show you your images with the colors that you're expecting, because we have the metadata. Now all that metadata needs to get turned into, 416 01:00:13.350 --> 01:00:23.370 Josh Moore: what do you want to call it? You know, is there a specification called "display settings" that gets attached to each image, or is it part of a larger 417 01:00:24.210 --> 01:00:40.710 Josh Moore: concept? You know, here's where the process of designing a specification comes into play: what's a good name for the thing, what's going to make this generally useful for everyone and represent concepts that everyone can understand? Someone needs to do all that work. 418 01:00:41.790 --> 01:00:47.730 Josh Moore: For us it's on the roadmap, and we'll look at the roadmap here in just a second, but 419 01:00:48.180 --> 01:00:59.760 Josh Moore: we haven't started work on it, so that's certainly something that, if someone is interested, they could dig into. And you know, really it's taking the JSON out of the "omero" block and putting it... ooh, someone's
420 01:01:00.540 --> 01:01:10.530 Josh Moore: adding issues, excellent. And naming it something that means something for the community and driving the whole consensus process. So, 421 01:01:12.270 --> 01:01:17.070 Josh Moore: invitation to get involved, I guess. Otherwise we will get to it as quickly as we can. 422 01:01:27.030 --> 01:01:29.550 Christian Tischer: Okay, then I have another question. 423 01:01:31.980 --> 01:01:40.470 Christian Tischer: If it's part of this specifications thing: I just had a discussion with Stephan Saalfeld about labels, 424 01:01:42.420 --> 01:01:43.980 Christian Tischer: about the downsampling. 425 01:01:45.120 --> 01:01:50.850 Christian Tischer: Like, I don't know what the writer libraries do right now. I wasn't aware that I'm opening a Pandora's box 426 01:01:51.330 --> 01:02:03.120 Christian Tischer: there when I was talking about it. But if you downsample label masks, I mean, you don't want to do nearest neighbor because then you invent your neighbors, so you have to do something else, and then 427 01:02:04.590 --> 01:02:16.140 Christian Tischer: there are at least two kind of natural options: one is to take the middle pixel of the layer higher, or you take the most frequent pixel label, or 428 01:02:16.830 --> 01:02:34.290 Christian Tischer: you do like [unclear], yeah, you somehow store a whole map of frequencies of labels at each location in the downsampling. I was just curious what the current writer implementations are doing there. 429 01:02:35.970 --> 01:02:39.240 Josh Moore: So that one... our implementation is using 430 01:02:39.960 --> 01:02:47.670 Josh Moore: OpenCV, and I can dig up the code. Um, I don't know if anyone else has implementations that... 431 01:02:48.870 --> 01:02:53.400 Josh Moore: You know, it's certainly a question, and actually we need to capture the metadata of what's happening.
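NOTE The following is an illustrative sketch, not code from any of the writers discussed here. It contrasts two of the options mentioned above for downsampling a label mask by a factor of two: keeping one pixel per block versus keeping the most frequent label in the block. The function names, the list-of-lists mask and the tie-breaking rule are all assumptions made for this note, and the mask dimensions are assumed to be divisible by the factor.

```python
from collections import Counter


def downsample_pick(labels, factor=2):
    """Keep one pixel per block: the one at offset factor // 2 inside the block."""
    off = factor // 2
    return [
        [labels[y + off][x + off] for x in range(0, len(labels[0]), factor)]
        for y in range(0, len(labels), factor)
    ]


def downsample_mode(labels, factor=2):
    """Keep the most frequent label of each factor x factor block."""
    out = []
    for y in range(0, len(labels), factor):
        row = []
        for x in range(0, len(labels[0]), factor):
            block = [
                labels[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            counts = Counter(block)
            best = max(counts.values())
            # Ties go to the smallest label so the result is deterministic.
            row.append(min(label for label, n in counts.items() if n == best))
        out.append(row)
    return out


mask = [
    [1, 1, 2, 2],
    [1, 5, 2, 2],
    [3, 3, 0, 4],
    [3, 3, 4, 4],
]
picked = downsample_pick(mask)    # the stray label 5 survives in the top-left block
majority = downsample_mode(mask)  # the dominant label 1 wins in the top-left block
```

Whichever rule a writer uses, recording its name next to the pyramid is what lets a reader know how the lower-resolution labels were produced.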
432 01:02:53.850 --> 01:03:00.540 Josh Moore: So I think the only metadata we have now is to say which method you're using, so basically you just write that method in. 433 01:03:01.140 --> 01:03:16.800 Josh Moore: Um, but if there needs to be a flag or an enumeration, so you know you can pick one of these modes for doing the downsampling, then that would be an extension of the current specification, to say here we really need to know exactly what you did. 434 01:03:20.370 --> 01:03:20.760 Christian Tischer: Okay, um. 435 01:03:25.470 --> 01:03:29.190 Davis Bennett: Just gonna say, I have a Python library where I support the mean and the mode. 436 01:03:30.360 --> 01:03:31.620 Davis Bennett: That's pretty bare-bones. 437 01:03:35.160 --> 01:03:38.730 Christian Tischer: And then you use mode for the labels and mean for the intensities [unclear]. 438 01:03:39.270 --> 01:03:40.410 Davis Bennett: Yeah, and I don't 439 01:03:41.790 --> 01:03:50.460 Davis Bennett: want to bother Stephan Saalfeld, but this is only for visualization. So the lossless downsampling, I think, is for when you want to do annotations, and I don't support that. 440 01:03:51.390 --> 01:03:57.630 Christian Tischer: Okay, no, but I think that's reasonable. I mean, I was writing something similar for the Java writer, yeah. 441 01:03:58.710 --> 01:04:01.050 Christian Tischer: I was just curious what people do. 442 01:04:11.340 --> 01:04:22.200 Josh Moore: I'll just keep going through the list that people added here, unless someone else wants to throw something out. Maybe they're fairly short. The Java writer. 443 01:04:24.510 --> 01:04:27.600 Josh Moore: So I guess, Tischi, you have a Java writer at the moment, right? 444 01:04:28.740 --> 01:04:29.100 Josh Moore: No. 445 01:04:29.250 --> 01:04:30.000 Josh Moore: So, did you know? 446 01:04:30.870 --> 01:04:34.440 Josh Moore: So the output from bioformats2raw,
447 01:04:34.440 --> 01:04:42.390 Josh Moore: if you pass it the right flags, which should become the default, will be an OME-Zarr, so that is a Java writer. 448 01:04:43.170 --> 01:04:56.700 Josh Moore: And then there will be a Zarr reader and a Zarr writer in Bio-Formats, which should kind of be a basis for the Java stack, for anyone who needs to read or write Zarr. 449 01:04:58.380 --> 01:05:03.630 Josh Moore: So that's currently on a branch; anyone who's interested in testing that out is more than welcome to. 450 01:05:08.940 --> 01:05:14.790 Christian Tischer: Okay, then I'll contact you. I think we kind of need it rather sooner than later, so. 451 01:05:16.230 --> 01:05:22.590 Josh Moore: Okay, and I think Kimberly was trying out the converter class, but I don't know the status of that. 452 01:05:25.200 --> 01:05:25.620 Christian Tischer: Okay. 453 01:05:38.340 --> 01:05:43.650 Josh Moore: Someone put "geometry, meshes, points". Who added this, and what do you mean? 454 01:05:51.000 --> 01:05:52.650 Josh Moore: They may have left by now, so. 455 01:05:54.600 --> 01:05:56.100 Josh Moore: If no one else wants to talk about it, I... 456 01:05:58.500 --> 01:06:05.070 Davis Bennett: I mean, I would speculate that at some point there will be data structures that don't naturally fit in a Zarr array. 457 01:06:06.480 --> 01:06:09.630 Davis Bennett: And maybe geometry is one of those. 458 01:06:10.860 --> 01:06:27.150 Jackson Brown: I remember that specific "geometry, meshes, points" being referenced in, like, a napari annotations image.sc issue. I feel like I remember that exact string of characters. It's like, how do we store these annotations and read them back out? 459 01:06:28.770 --> 01:06:29.760 Josh Moore: Is Talley still here? 460 01:06:32.130 --> 01:06:32.790 Talley Lambert: Yeah, hey. 461 01:06:33.540 --> 01:06:41.400 Josh Moore: You did not add this? No. No, but does it make sense that that is kind of what it would be referencing, maybe?
462 01:06:42.900 --> 01:06:50.880 Talley Lambert: It does sound familiar from that post. The truth for napari is that, you know, we don't have any sort of, 463 01:06:51.540 --> 01:07:02.250 Talley Lambert: we're leaning heavily on these sorts of conversations to come up with the answer. You know, there is no internal model where it's like, "Oh, we should use what napari does," because napari is just looking for there to be some sort of, 464 01:07:03.210 --> 01:07:09.990 Talley Lambert: you know, consensus on how those annotations might be stored. We don't have any sort of saving format ourselves. 465 01:07:15.240 --> 01:07:25.980 Josh Moore: Yeah, so I think we'll get there. Someone was going to work on... I can't remember if meshes got picked up this morning, someone can maybe look, but certainly 466 01:07:28.650 --> 01:07:39.030 Josh Moore: a vector-based representation, and the two that were mentioned were PLY and VRML, and someone was going to run that via the specification process. So, 467 01:07:39.360 --> 01:07:51.330 Josh Moore: you know, that may produce something that supports enough of the requirements. But Davis is right, you know, I think we will have to be prepared for the situation where we get to the point where that 468 01:07:51.360 --> 01:07:52.710 Josh Moore: which we want to specify 469 01:07:52.740 --> 01:08:02.220 Josh Moore: is just too complicated to get into the Zarr, and then we'll make some decisions. For example, probably pretty soon 470 01:08:03.540 --> 01:08:21.840 Josh Moore: we'll want a specification for how to link to external files, right? And you just say there's this other file and it does something really complicated, because otherwise we'll get into the situation where we are storing PDFs as Zarr arrays or, you know, whatever.
471 01:08:23.460 --> 01:08:35.160 Josh Moore: I don't have a great suggestion. There is a specification that's built to do that, it's called RO-Crate, if anyone wants to look at it. I actually kind of like what they've done. 472 01:08:38.820 --> 01:08:46.200 Josh Moore: Research Objects is like the place to go for any type of stuff like that. Say that again? Research Objects, yeah. 473 01:08:46.800 --> 01:08:54.270 Josh Moore: The caveat, however, is... so I'm just pasting the link under what 474 01:08:55.350 --> 01:09:07.380 Josh Moore: Davis just said. It's not built for files like Zarr, so linking from these Research Object concepts to a multi-file dataset like Zarr 475 01:09:08.070 --> 01:09:15.600 Josh Moore: breaks everything. Similarly, Zenodo, for example, doesn't work with Zarr: you must zip up your Zarr to upload it to Zenodo. 476 01:09:16.200 --> 01:09:30.870 Josh Moore: So there's like two worlds: there's the monolithic-file world and there's the chunked, you know, cloud-storage file-format world, and it's probably up to us to come up with a bridge between the two of them. 477 01:09:43.200 --> 01:09:47.250 Josh Moore: Okay, feel free to interrupt. I'm going to just try to get through the last few things and then 478 01:09:49.050 --> 01:09:59.730 Josh Moore: we can talk a little bit about process, because that worked pretty well this morning. So I want to go fishing for more people to get involved in the issues. 479 01:10:01.350 --> 01:10:04.950 Josh Moore: So we talked about OME-XML. Damir, are there any specifics left on that? 480 01:10:08.490 --> 01:10:14.280 Damir Sudar: No, the only thing, a little bit, is, so you know I'm a member of the 481 01:10:15.360 --> 01:10:23.370 Damir Sudar: BioImaging North America working group that's trying to work together with a whole bunch of other people to put together a
482 01:10:25.050 --> 01:10:30.480 Damir Sudar: set of requirements, or a set of recommendations, of what should be in that 483 01:10:32.040 --> 01:10:32.880 Damir Sudar: metadata. 484 01:10:33.930 --> 01:10:34.380 Damir Sudar: We 485 01:10:35.640 --> 01:10:43.080 Damir Sudar: keep on bumping into which of what we're trying to define should actually be on the 486 01:10:44.220 --> 01:10:59.940 Damir Sudar: implementation side, meaning in the Zarr world, and what should still be on the pure metadata side. And so that's one challenge that we keep running into. And the other one is, 487 01:11:01.980 --> 01:11:10.110 Damir Sudar: you would like your metadata to be somewhat monolithic, not scattered across a file system, 488 01:11:11.970 --> 01:11:18.750 Damir Sudar: because you would like to be able to read it all and then interpret it all, and so that 489 01:11:19.170 --> 01:11:28.770 Damir Sudar: comes down to where it would be stored inside a Zarr. Would there just be a block at the top that says, "this is where all the metadata lives," or would it be 490 01:11:29.640 --> 01:11:47.850 Damir Sudar: kind of associated with the individual chunks? Say, if there's a tile somewhere, then the coordinates of that tile could be with that tile, right? But the coordinates are kind of more an acquisition type of thing, the stage position, for what I'm talking about. 491 01:11:49.140 --> 01:11:59.010 Damir Sudar: So that is one of those challenges that we have punted on so far, but we need to come up with some kind of good approach. 492 01:12:04.410 --> 01:12:05.610 Josh Moore: All I can say is, I agree. 493 01:12:07.260 --> 01:12:14.190 Josh Moore: Needs doing. Anyone have immediate opinions to add to that, other than you'd like to see it done as well?
494 01:12:16.500 --> 01:12:26.610 Jamie Sherman: Like, I've wrestled with some of that in Zeiss-format files, and one of the things I would push on hard in terms of, like, it's... 495 01:12:27.780 --> 01:12:35.070 Jamie Sherman: I come from the mass spec field and I've seen this from both sides, and one of the issues that really comes up with open specs 496 01:12:35.700 --> 01:12:47.910 Jamie Sherman: is the mapping, right? So it's great to come up with a wish list of what you would like to see in there, but if you haven't tightly defined what that is, like what exactly that property is, it becomes an absolute nightmare 497 01:12:48.330 --> 01:12:59.490 Jamie Sherman: for people to implement. And not only to implement: in the mass spec space, it can have catastrophic effects on your identifications from your raw data. 498 01:12:59.970 --> 01:13:13.140 Jamie Sherman: You know, like if the resolution isn't stored in the right units, because the units haven't been clearly specified in the spec, your tolerances are all off and you'll start matching things that don't even, you know... and I've actually seen exactly that one happen. 499 01:13:14.490 --> 01:13:23.760 Jamie Sherman: You know, so it's a strong ask: if you're proposing things that you want to see in a spec that has larger verbiage, you know, 500 01:13:25.290 --> 01:13:30.000 Jamie Sherman: specifically BINA, you know, like if you're extending it that way, 501 01:13:31.140 --> 01:13:34.200 Jamie Sherman: tight definitions on what the things are is really important. 502 01:13:35.700 --> 01:13:42.480 Jamie Sherman: Same for me, like if it's a new set... sorry, there's a dog chewing on something in the background, so. 503 01:13:44.850 --> 01:13:46.230 Jamie Sherman: As long as it's not you, then yeah. 504 01:13:48.630 --> 01:13:54.330 Josh Moore: Anyone else on metadata storage: monolithic versus scattered versus
505 01:13:55.710 --> 01:14:01.560 Josh Moore: binary versus text versus JSON versus XML versus... it's gonna be a fun one. 506 01:14:02.730 --> 01:14:11.430 Rohola Hosseini: Sorry, so maybe one point, yes. I was also looking into this packaging for screen data and stuff like that. So how is the Zarr handling 507 01:14:12.270 --> 01:14:29.190 Rohola Hosseini: the metadata which is, like, files or things of other types? For example, in OMERO you can attach to each level different files, which can be any format. So are there any guidelines on how to store those data, or... 508 01:14:29.430 --> 01:14:32.640 Josh Moore: No, I mean, to some degree that's exactly what we're talking about here. 509 01:14:32.820 --> 01:14:41.850 Josh Moore: So Rohola has data in an OMERO that's high-content and is looking to get that into OME-Zarr. We can get the imaging data into the high-content 510 01:14:42.240 --> 01:14:53.340 Josh Moore: format, but then there's everything else, right? There's all the annotations that are in OMERO, all the attached files. You know, so I think I mentioned it on image.sc: 511 01:14:54.060 --> 01:15:10.320 Josh Moore: anything that I've said is missing, that's basically the process we're doing here. So that would be linkage to external files and, yeah, the key-value pairs from the annotations. So those will all need to get stored somewhere. 512 01:15:12.030 --> 01:15:13.020 Josh Moore: And that hasn't been done yet. 513 01:15:17.520 --> 01:15:17.790 Thanks. 514 01:15:23.940 --> 01:15:29.730 Josh Moore: Okay, someone added "remote links, relationships between datasets". We kind of talked about this, but not exactly, sure. 515 01:15:30.990 --> 01:15:31.890 Josh Moore: So anything else to 516 01:15:37.140 --> 01:15:39.570 Josh Moore: add to that? Kind of tabled; I don't have anything else down. 517 01:15:42.060 --> 01:15:47.220 Josh Moore: And someone said "label features status", so I think I mentioned it, I'm just...
518 01:15:48.420 --> 01:15:59.640 Josh Moore: So the pixel-based annotations, the label image, is a specification that, you know, everyone's encouraged to use. We will continue to support it. 519 01:16:02.340 --> 01:16:02.580 Josh Moore: Well. 520 01:16:04.080 --> 01:16:10.890 Christian Tischer: I think somebody was working on, let's say, I mean, you have these labels, right, but maybe you want to store 521 01:16:12.090 --> 01:16:14.610 Christian Tischer: a bunch of numbers with each label, right? 522 01:16:15.930 --> 01:16:17.700 Christian Tischer: Like some measurements that somebody... 523 01:16:18.180 --> 01:16:24.750 Josh Moore: That's, so, the label features, yeah, I see what you mean. So I didn't know if it was the features of the spec or... 524 01:16:25.230 --> 01:16:42.750 Josh Moore: No, you're talking about a spec which is the features for labels. As far as I know, no one's worked on anything like that. I mean, so Draga from Monash, I think, updated the specification so that you can attach 525 01:16:43.800 --> 01:16:52.140 Josh Moore: small metadata to each of the labels. So basically you can add arbitrary values to the labels that you've used in your segmentation. 526 01:16:52.590 --> 01:17:05.550 Josh Moore: But that's not going to scale. You know, if you really have a table of large feature data, you need to get that out into an array, because it's just going to be horrible. And that's probably something we'll see 527 01:17:07.110 --> 01:17:11.070 Josh Moore: in many different specifications. So going back to what Damir was saying, you know, 528 01:17:12.150 --> 01:17:18.990 Josh Moore: for smallish to mid-sized metadata, my opinion is that it makes sense to have it in JSON.
529 01:17:19.500 --> 01:17:27.840 Josh Moore: Eventually you will reach a limit where you want to take it all and just turn it into binary, and you want to tell people, you know, you need to go parse the binary if you want to deal with this big thing. 530 01:17:29.100 --> 01:17:32.130 Josh Moore: And I think that'll happen with the label features as well. 531 01:17:33.060 --> 01:17:46.410 Davis Bennett: One kind of wrinkle for the label field images that we've run into is the data type that you use. On one hand, to support the most labels, you might want to use uint64. 532 01:17:47.160 --> 01:17:57.030 Davis Bennett: On the other hand, these images get really big, so we're still trying to figure out when we use a data type that reflects the actual number of labels we have, 533 01:17:57.900 --> 01:18:06.660 Davis Bennett: which can vary sample to sample. So you could have one sample where you're looking for nuclei and you only found one; in another sample you're looking for nuclei and you found 2 to the 32. 534 01:18:07.800 --> 01:18:16.110 Davis Bennett: These arrays are very different data types, but they kind of contain the same type of information. So Stephan Saalfeld proposed, basically, 535 01:18:17.070 --> 01:18:24.510 Davis Bennett: for every single array or dataset that represents a label field, to give it labels in the biggest integer space possible 536 01:18:25.080 --> 01:18:47.280 Davis Bennett: and express a mapping from the true ID to, like, a local ID. If you only have 255 labels, you still preserve the concept that they came from the uint64 space, and then you have metadata that says, now, this thing is really 200 billion, but locally I'm going to call it seven inside this data. 537 01:18:48.540 --> 01:18:58.620 Davis Bennett: It adds a layer of complexity, but it can really save a lot of space and preserve the full space of labels available. You can save something that's uint8 when otherwise you'd be doing uint64.
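NOTE A minimal sketch of the local-ID idea described above, under stated assumptions: labels live in a plain list of lists, 0 is treated as background, and the names `compact_labels` and `restore_labels` are hypothetical. It is not an implementation of any existing specification; it only shows that a small dense ID per pixel plus a lookup table in the metadata is enough to recover the original large IDs.

```python
def compact_labels(labels):
    """Replace sparse global IDs by dense local IDs 1..n (0 stays background)."""
    global_ids = sorted({v for row in labels for v in row if v != 0})
    if len(global_ids) > 255:
        raise ValueError("more than 255 labels: local IDs would not fit in uint8")
    to_local = {g: i for i, g in enumerate(global_ids, start=1)}
    local = [[to_local.get(v, 0) for v in row] for row in labels]
    # This table is what would go into the metadata: local ID -> true ID.
    local_to_global = {i: g for g, i in to_local.items()}
    return local, local_to_global


def restore_labels(local, local_to_global):
    """Undo compact_labels using the stored lookup table."""
    return [[local_to_global.get(v, 0) for v in row] for row in local]


field = [
    [0, 200_000_000_000, 200_000_000_000],
    [0, 0, 18_446_744_073_709_551_615],  # largest value a uint64 can hold
]
local, mapping = compact_labels(field)
```

Here every pixel value in `local` fits in one byte, while `mapping` keeps the link back to the 64-bit IDs.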
538 01:18:59.940 --> 01:19:01.800 Davis Bennett: We haven't implemented this; it's just an idea. 539 01:19:02.940 --> 01:19:04.080 Bill Katz: So I mean that's the 540 01:19:04.140 --> 01:19:13.530 Bill Katz: type of compression, and we experimented a lot with that in doing compression for our label block data, where especially, 541 01:19:14.190 --> 01:19:20.970 Bill Katz: I think this has great impact as you have, sort of, much smaller block sizes within the larger chunks, 542 01:19:21.690 --> 01:19:37.650 Bill Katz: because within a particular area the labels aren't changing very much at all, so you can wind up with this, you know, great compression technique. But I'm not quite sure about the difference between, like, whether that should be in a compression or in the specs or... 543 01:19:43.410 --> 01:19:46.890 Damir Sudar: And on the topic of storing labels, what 544 01:19:48.570 --> 01:20:01.320 Damir Sudar: we just talked about reminds me: the problem with storing labels as label images is that they cannot have overlapping objects, right? You can only have touching objects at best. 545 01:20:03.300 --> 01:20:09.480 Damir Sudar: If we were to switch away completely from the idea of storing it as images, but store them as 546 01:20:11.190 --> 01:20:30.570 Damir Sudar: run-length-encoded tables or some other storage mechanism, contour maps or something like that, then you could allow overlapping objects as well, and you no longer have this problem of how many objects you can maximally store in an image. 547 01:20:33.810 --> 01:20:41.040 Damir Sudar: That's maybe not something that we want to discuss, because it means throwing away a whole bunch of work that has been done already, right? 548 01:20:42.870 --> 01:20:54.180 Lee Kamentsky: The way I've done that is to use the labels matrix as a mapping, and in the places where they do overlap, you
549 01:20:54.210 --> 01:21:01.050 Lee Kamentsky: just have the mapping go to a list of the two objects that share the same pixel. 550 01:21:05.010 --> 01:21:08.130 Josh Moore: Yeah, so I mean there are a couple of ways. So currently 551 01:21:09.960 --> 01:21:16.320 Josh Moore: there's no overlapping, so you would have multiple label objects if you needed to support overlaps. 552 01:21:17.610 --> 01:21:23.640 Josh Moore: Yeah, and we would need to choose one of the solutions you guys are spelling out, or multiple of them, 553 01:21:24.090 --> 01:21:30.630 Josh Moore: you know, to take that specification to the next level. So we could add in the list of multiple objects, or... 554 01:21:31.050 --> 01:21:40.200 Josh Moore: You know, I don't actually think it's that bad to throw things away, Damir. Now, the current implementations will need to keep supporting this data until the data gets upgraded at some future point, right? 555 01:21:40.860 --> 01:21:49.470 Josh Moore: So that is complexity, but the goal of all this is to get the specifications right, so I don't think we should say any of it is written in stone. 556 01:21:54.450 --> 01:22:01.920 Josh Moore: Okay, and the last thing, the last grenade that someone threw in here, is, yeah, how to deal with the number of files. 557 01:22:03.030 --> 01:22:03.450 Josh Moore: So. 558 01:22:05.430 --> 01:22:06.330 Ken Carlile: Yeah, so. 559 01:22:07.830 --> 01:22:14.820 Ken Carlile: I'm a sysadmin. I may not be in the right forum for this, but Mark pointed me towards you guys, so I'm gonna stick my nose in. 560 01:22:15.150 --> 01:22:15.960 Josh Moore: So where are... 561 01:22:16.380 --> 01:22:17.760 Ken Carlile: So [unclear] 562 01:22:18.090 --> 01:22:21.720 Ken Carlile: Janelia, yeah. Okay, so Mark Kittisopikul. 563 01:22:23.370 --> 01:22:28.770 Ken Carlile: Assuming I pronounced his name right, which is a big assumption; I don't think I've ever heard it pronounced. So, um.
564 01:22:30.030 --> 01:22:34.500 Ken Carlile: So as a sysadmin I get to deal with 565 01:22:35.580 --> 01:22:41.520 Ken Carlile: all of these Zarrs and N5s and stuff landing on the storage that I have to manage. 566 01:22:42.960 --> 01:22:43.740 Ken Carlile: And 567 01:22:45.180 --> 01:22:57.300 Ken Carlile: a big thing that comes up for me is: what exactly is expected by these formats to be 568 01:22:58.380 --> 01:23:05.880 Ken Carlile: on the storage side? Do you expect object, like S3? Do you expect it to be file? 569 01:23:06.930 --> 01:23:10.440 Ken Carlile: You know, it's very unclear to me. 570 01:23:11.820 --> 01:23:24.240 Ken Carlile: And I think a more critical part of this is, from what I know, and again, I'm a sysadmin, I'm not a software developer, I'm not a scientist, I'm sorry, 571 01:23:27.900 --> 01:23:35.790 Ken Carlile: Zarr by default, or at least one of the writer implementations of it, just writes all the files to a single directory. 572 01:23:36.390 --> 01:23:41.640 Josh Moore: So before you got here I did my spiel on this, because it's killing us as well. So 573 01:23:41.820 --> 01:23:42.510 Josh Moore: that's my top, 574 01:23:43.350 --> 01:23:48.660 Josh Moore: my top to-do is to make that go away, because, you know, it's very painful. So, 575 01:23:48.720 --> 01:23:51.300 Ken Carlile: 20 million files in one directory is bad. 576 01:23:51.540 --> 01:23:54.060 Josh Moore: Yeah, I 577 01:23:55.080 --> 01:23:57.780 Josh Moore: think I'm on the order of like 50 million now in a directory, so. 578 01:23:58.710 --> 01:24:00.390 Ken Carlile: How do you even do anything with it? 579 01:24:00.450 --> 01:24:11.010 Josh Moore: You don't, which is the problem, right? So it makes it a very clear and present blocker that has to be fixed. So let's assume that's fixed, Ken, you know, because it's good to have your feedback.
580 01:24:12.150 --> 01:24:13.710 Josh Moore: Do you have any other worries? 581 01:24:14.700 --> 01:24:21.030 Ken Carlile: Yes. At Janelia, our storage systems, 582 01:24:22.560 --> 01:24:31.170 Ken Carlile: by hook or by crook, or by chance, or by us doing something right, can handle a very large number of files. 583 01:24:32.820 --> 01:24:58.440 Ken Carlile: At other HPC locations, that is not the case, because they're going to be running stuff like GPFS or Lustre as the file system, and those will run out of file count real fast if Zarr is used on them, at least in the way that I've seen it used at Janelia. 584 01:25:00.210 --> 01:25:04.980 Ken Carlile: So I think that's something that needs to be brought into consideration. There should be... 585 01:25:07.560 --> 01:25:11.040 Ken Carlile: Again, this may not be for the implementation side, but there should be 586 01:25:12.960 --> 01:25:15.570 Ken Carlile: a lower limit on 587 01:25:16.590 --> 01:25:26.610 Ken Carlile: the size of the files, so you don't end up with these millions and millions of files that just stop the file system from working 588 01:25:28.440 --> 01:25:40.770 Ken Carlile: or become so cumbersome to the file system itself that it becomes impossible for the user to retrieve them in any sort of timely fashion. 589 01:25:43.050 --> 01:25:46.980 Ken Carlile: So I think that is a consideration that I just wanted to bring up to the community. 590 01:25:49.710 --> 01:26:03.060 Ken Carlile: So, but in the instance of Stephan Saalfeld's lab, I believe they have something on the order of 591 01:26:04.740 --> 01:26:13.560 Ken Carlile: 500 million files in... oh gosh, I should have grabbed this before I started talking. 592 01:26:14.430 --> 01:26:16.110 Josh Moore: Yeah, don't do a find on this thing now. 593 01:26:17.580 --> 01:26:18.630 Ken Carlile: Oh, I have ways. 594 01:26:20.700 --> 01:26:21.360 Josh Moore: [unclear]
595 01:26:21.930 --> 01:26:23.910 Ken Carlile: Sorry, no, Starfish. 596 01:26:25.980 --> 01:26:33.930 Ken Carlile: 845 million files in 252 terabytes on one of our storage systems. 597 01:26:35.220 --> 01:26:42.330 Ken Carlile: And like I said, we are fortunate at Janelia to have some that can handle that. Other places, no way. 598 01:26:45.600 --> 01:27:03.660 Ken Carlile: And I understand that there is a big motivation towards: we just want to be able to retrieve this little part so we can visualize it, and we want to be able to write from everything at once, from all our possible sources, so we get lots of parallelism. 599 01:27:05.220 --> 01:27:24.390 Ken Carlile: But there is a lower bound, or there's a boundary, where the performance hit from all the latency from all those files will overcome the advantage of being able to just pick and choose, you know, a few K from here and a few K from there and a few K from the other place. 600 01:27:26.610 --> 01:27:42.870 Ken Carlile: So those are my big concerns about these sorts of file formats. HDF5 has its own problems: you know, with one single monolithic file you've got another whole set of problems, and that simply doesn't work with the storage systems that we have. So 601 01:27:44.160 --> 01:27:46.110 Ken Carlile: I don't have any good answers, I just have complaints. 602 01:27:48.960 --> 01:27:49.230 Josh Moore: And 603 01:27:50.160 --> 01:27:55.650 Bill Katz: Just to add to that, so basically there was this issue of sharding. I mean, 604 01:27:56.880 --> 01:28:14.490 Bill Katz: so nested directories only get you so far. The issue is more that you just have lots of files, and at some point there's a trade-off between, you know, how small the files can get, because you want to be able to slice the data more granularly, versus larger data,
605 01:28:15.510 --> 01:28:21.600 Bill Katz: which is better from a file system perspective and for sequential access, but then starts getting you other kinds of problems. 606 01:28:22.080 --> 01:28:30.000 Bill Katz: And so, at least on the neuroglancer side, that was the distinction between the unsharded and the sharded file formats, where, 607 01:28:30.960 --> 01:28:40.920 Bill Katz: you know, they're just grouping all kinds of chunks into one shard file. And the other huge advantage of that is that when you start dealing with cloud data stores, 608 01:28:41.400 --> 01:28:52.470 Bill Katz: then you can coalesce your HTTP requests into one that does a range query, so that you're potentially getting a lot of blocks with one round trip, 609 01:28:52.950 --> 01:29:08.130 Bill Katz: with one HTTP request, as opposed to being required to have one HTTP request for every single object that you're pulling. So it's not just for the local file systems or whatever; it's also for, you know, like, these 610 01:29:09.390 --> 01:29:12.570 Bill Katz: web services that are working on blob stores. 611 01:29:17.190 --> 01:29:17.550 Josh Moore: Even when. 612 01:29:17.580 --> 01:29:27.030 Bill Katz: I think that might be out of scope, in the sense that, you know, this is talking about sharded files, and maybe it's just simply not going to be in the cards here, 613 01:29:27.570 --> 01:29:38.010 Bill Katz: because of the way... I think there has been some discussion about sharded files in Zarr, from offshoots or something like that. Certainly in neuroglancer we've looked at that. 614 01:29:38.820 --> 01:29:50.490 Josh Moore: Yeah, so I added a link to the [unclear]. So Trevor's not here, so I'll try to represent this. At least in Zarr space, this has kind of developed
615 01:29:51.450 --> 01:29:57.840 Josh Moore: Outside of Zarr, like you said, you know, it's a layer on top of Zarr. Um, I don't think that's ideal, like, I think. 616 01:29:58.560 --> 01:30:12.390 Josh Moore: Um, you know, if we're doing all this work to have a file format that we're all working together on, we probably need to have some involvement with a specification, but for the moment it lives here in this fsspec-reference-maker repository. 617 01:30:13.050 --> 01:30:37.890 Josh Moore: And it's basically a way to write down the location of, um, yeah, the individual chunks in other files. So, for example, you can take an HDF5 file, you parse the HDF5 file, it generates some, I think, JSON, and then you can use that JSON. The JSON plus the HDF5 gives you a Zarr. 618 01:30:39.150 --> 01:30:49.770 Josh Moore: Protocol for accessing the HDF5 data, you know, and I think we can take this further. I think, you know, there could be a Zarr chunk to Zarr super chunk. 619 01:30:50.850 --> 01:30:58.890 Josh Moore: Process, which is basically your sharded/unsharded. You know, there would be a knob that someone, likely a sysadmin, could take and say. 620 01:30:59.610 --> 01:31:09.390 Josh Moore: Sorry, you know, you have broken the rules, you now must combine at least this many files together into larger files, um. 621 01:31:10.080 --> 01:31:24.030 Josh Moore: And then what we need is that specification. So okay, I combine these things together, how is it that I represent what I've done, so that the user doesn't need to record what they've built, right, and I think we can get there. 622 01:31:24.420 --> 01:31:25.650 Bill Katz: I think that makes a lot of sense. 623 01:31:25.650 --> 01:31:30.180 Bill Katz: For the, I mean I think one issue might be when you start doing writes. 624 01:31:31.710 --> 01:31:36.510 Bill Katz: In-place writes, in particular, as opposed to some other mechanism, so.
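A minimal sketch of the idea discussed above: many small chunks are packed into one shard file, and a JSON index records where each chunk lives so that a reader can fetch it with a single ranged read. This is not the fsspec-reference-maker or Neuroglancer format. The function names (`write_shard`, `read_chunk`, `coalesce`), the chunk keys, and the `[path, offset, length]` index layout are illustrative assumptions, and a local seek-and-read stands in for an HTTP range request.

```python
import json
import os
import tempfile


def write_shard(shard_path, chunks):
    """Concatenate chunk payloads into one shard file.

    Returns an index mapping each chunk key to [path, offset, length].
    """
    refs = {}
    offset = 0
    with open(shard_path, "wb") as shard:
        for key, payload in chunks.items():
            shard.write(payload)
            refs[key] = [shard_path, offset, len(payload)]
            offset += len(payload)
    return refs


def read_chunk(refs, key):
    """Fetch one chunk with a single seek + read (local stand-in for a range request)."""
    path, offset, length = refs[key]
    with open(path, "rb") as shard:
        shard.seek(offset)
        return shard.read(length)


def coalesce(refs, keys):
    """Merge the byte ranges of several chunks in one shard into a single (start, end) range."""
    ranges = [refs[k] for k in keys]
    if len({path for path, _, _ in ranges}) != 1:
        raise ValueError("chunks live in different shard files")
    start = min(offset for _, offset, _ in ranges)
    end = max(offset + length for _, offset, length in ranges)
    return start, end


# Hypothetical chunk keys and payloads, for illustration only.
chunks = {"0.0": b"aaaa", "0.1": b"bbbbbb", "1.0": b"cc"}
shard_path = os.path.join(tempfile.mkdtemp(), "shard_0.bin")
refs = write_shard(shard_path, chunks)

# The index is plain JSON, so it can be stored next to the shard file.
index_json = json.dumps(refs)
```

The `coalesce` helper is the part that corresponds to getting several blocks in one round trip: adjacent chunks in the same shard collapse into one byte range.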
625 01:31:37.770 --> 01:31:43.440 Bill Katz: I guess if you're going to push it off to HDF5, then it will handle it, but. 626 01:31:44.130 --> 01:31:56.880 Josh Moore: Yeah, I mean, I think that we're just giving people tools, they'll have to make the choice, right. So if someone wants parallel writes, then maybe they argue with their sysadmin and say no, I actually do want this many small files, you know, I'll pay for it, or whatever. 627 01:31:57.720 --> 01:32:08.220 Josh Moore: A lot of us have public only data, you know, the data is written, it's not going to change anymore, and there you can do a lot of compression and cleaning up, um. 628 01:32:08.430 --> 01:32:11.070 Bill Katz: Yeah, immutable datasets, I think it gets a lot easier. 629 01:32:11.880 --> 01:32:13.410 Bill Katz: Yeah, definitely. 630 01:32:15.330 --> 01:32:15.900 Josh Moore: Okay. 631 01:32:16.890 --> 01:32:17.940 Ken Carlile: Just one other. 632 01:32:18.000 --> 01:32:18.510 Note. 633 01:32:19.560 --> 01:32:29.430 Ken Carlile: And I didn't speak up when it came up, but it was about the metadata, where to store the metadata, whether to keep it with the tiles or. 634 01:32:30.030 --> 01:32:41.070 Ken Carlile: Put it all in one unified space. To my mind, and a naive interpretation: put it all in one thing, read it into memory, then you never have to go back to the disk for. 635 01:32:44.940 --> 01:32:50.340 Ken Carlile: Now sure, there's going to be instances where your metadata doesn't fit in memory, but. 636 01:32:51.930 --> 01:33:03.480 Ken Carlile: If you can get that with one request, as opposed to requests and seeks and requests and seeks all over the file system, you're going to benefit from it, yeah.
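A minimal sketch of the "put it all in one thing" option for metadata, assuming a hypothetical layout in which each array directory holds its own `meta.json`. The file names, the `consolidate` function, and the metadata fields are illustrative assumptions, not the layout of any particular format. The per-array files are merged once into a single JSON document, so that a reader needs one open and one read instead of a seek per array.

```python
import json
import os
import tempfile


def consolidate(root, name="meta.json"):
    """Merge every per-array metadata file under root into one dict keyed by relative path."""
    merged = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
            with open(os.path.join(dirpath, name)) as fh:
                merged[rel] = json.load(fh)
    return merged


# Build a tiny hypothetical layout: two arrays, each with its own metadata file.
root = tempfile.mkdtemp()
for array_name, chunk_shape in [("0", [64, 64, 64]), ("1", [32, 32, 32])]:
    os.makedirs(os.path.join(root, array_name))
    with open(os.path.join(root, array_name, "meta.json"), "w") as fh:
        json.dump({"chunks": chunk_shape, "dtype": "uint16"}, fh)

# Write the merged document once ...
consolidated_path = os.path.join(root, "consolidated.json")
with open(consolidated_path, "w") as fh:
    json.dump(consolidate(root), fh)

# ... so that a reader gets all metadata with a single read.
with open(consolidated_path) as fh:
    all_metadata = json.load(fh)
```

The trade-off is the one raised in the discussion: the merged file is cheap to read but has to be regenerated whenever any per-array metadata changes.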
637 01:33:04.410 --> 01:33:23.040 Bill Katz: That does bring up again the case of immutable versus mutable data on the disk, right. If the metadata is changing, um, then you might want to have that where the metadata is actually changing on the smaller units that you're actually potentially modifying. 638 01:33:25.710 --> 01:33:29.070 Bill Katz: I mean, I guess, where it's essentially almost like a caching layer, right, where. 639 01:33:29.310 --> 01:33:36.000 Ken Carlile: Yeah, and you would run into those problems where you have multiple writers and they all need to update the metadata file, so yeah, it. 640 01:33:37.500 --> 01:33:47.460 Bill Katz: I was actually, I'm sort of experimenting with the notion of using almost a log structure for the metadata, and this was particularly in the case of. 641 01:33:48.480 --> 01:33:59.070 Bill Katz: When I had certain aspects of the metadata that were changing, and I wanted to keep a log of it, like, um, you know, like when max labels change or something like that. 642 01:33:59.640 --> 01:34:09.060 Bill Katz: And the notion that I would just be able to append, as opposed to rewriting the entire metadata file, like, under certain circumstances that makes. 643 01:34:09.420 --> 01:34:18.060 Bill Katz: That's an interesting concept, and you also get sort of like a permanent record of how the metadata changes, if you're not making that many writes. 644 01:34:19.020 --> 01:34:29.790 Ken Carlile: My question about that is, at what point does it just start making more sense to actually use a real database to store your metadata, instead of relying on the file system as a database. 645 01:34:30.660 --> 01:34:34.650 Bill Katz: Yeah, yeah, I mean a log is a very simple. 646 01:34:36.030 --> 01:34:40.800 Bill Katz: You know, a particular way of doing the database in a very simple way, but um.
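A minimal sketch of the log-structured metadata idea, assuming a JSON-lines file in which each change is appended as one record and the current state is obtained by replaying the log. The record layout, the function names, and the `max_label` key are illustrative assumptions, not an existing implementation.

```python
import json
import os
import tempfile


def append_change(log_path, key, value):
    """Append one metadata change; earlier records are never rewritten."""
    with open(log_path, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")


def replay(log_path):
    """Rebuild the current metadata by applying records in order (last write wins)."""
    state = {}
    with open(log_path) as log:
        for line in log:
            record = json.loads(line)
            state[record["key"]] = record["value"]
    return state


def history(log_path, key):
    """Return every value a key has had, oldest first."""
    with open(log_path) as log:
        records = [json.loads(line) for line in log]
    return [r["value"] for r in records if r["key"] == key]


log_path = os.path.join(tempfile.mkdtemp(), "metadata.log")
append_change(log_path, "max_label", 10)
append_change(log_path, "name", "segmentation")
append_change(log_path, "max_label", 42)

current = replay(log_path)
max_label_history = history(log_path, "max_label")
```

Appending avoids rewriting the whole metadata file and keeps a record of how values changed, at the cost of a log that grows with every write.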
647 01:34:43.230 --> 01:34:58.170 Josh Moore: Yeah, I mean there are, I think Chris is still around, I mean Aaron will know, so you know, there are experiments with TileDB for some of this as well, you know, so there are formats that are built for doing it, you know, I would. 648 01:34:59.310 --> 01:35:12.120 Josh Moore: They have their own issues, directly related to, you know, how long do you let someone just keep adding and adding to the log before you say no, you must actually go do a vacuum and clean up, you know, the mess you just made, so. 649 01:35:12.480 --> 01:35:16.350 Josh Moore: You know, every solution that will be proposed will have its limits, either way. 650 01:35:17.700 --> 01:35:19.470 Josh Moore: And so we'll try to come up with, you know. 651 01:35:20.550 --> 01:35:35.130 Josh Moore: Recommendations for the user on how they should turn the knobs, but you know, there will be people, and Saalfeld is one of them, who will take everything to the most interesting limit to get things done, and we have to make that be possible as well, so. 652 01:35:35.940 --> 01:35:36.810 Bill Katz: I don't envy you. 653 01:35:37.470 --> 01:35:37.920 Good luck. 654 01:35:38.970 --> 01:35:48.480 Bill Katz: In the sense that the log probably gives you some notion of provenance, which we don't really cover much within these things. 655 01:35:48.930 --> 01:35:57.630 Bill Katz: It's sort of like you write it out once, you get the data, it's there, it's static. When you start talking about more of these in-place writes or writes in general. 656 01:35:59.100 --> 01:36:12.510 Bill Katz: Is there any notion about, you know, the provenance of things that have changed over time, or what happens if there's conflicts, or if there's a race condition and two people write chunks at the same time, which one wins, do we know which one won.
657 01:36:16.530 --> 01:36:21.870 Josh Moore: Yeah, and we're not having, the alternative is then to build that into the specifications themselves, yeah. 658 01:36:22.530 --> 01:36:36.240 Bill Katz: And it's definitely true that I think some use cases don't require provenance at all, it's just going to slow things down and make things more complicated, and in other use cases I could certainly see that being very useful. 659 01:36:40.110 --> 01:36:41.190 Josh Moore: going to say something to you yeah. 660 01:36:41.580 --> 01:36:43.170 Jamie Sherman: Well, it was kind of curious so. 661 01:36:45.120 --> 01:36:55.620 Jamie Sherman: How much do we think of the OME-Zarr spec, or the OME-Zarr, as a reading container versus a writing container, because those are, I mean, I understand that both are. 662 01:36:56.700 --> 01:37:01.740 Jamie Sherman: Viable, but in terms of, like, instrument acquisition, often what happens is that. 663 01:37:02.580 --> 01:37:09.270 Jamie Sherman: You know, like if we want instrument manufacturers to actually implement direct OME-Zarr writers, for example, right. 664 01:37:09.900 --> 01:37:18.960 Jamie Sherman: Like often what happens is that a selling point for them is how fast their instruments actually acquire data and how fast they slam it to disk. 665 01:37:19.560 --> 01:37:38.730 Jamie Sherman: And so often what they'll do is they'll just cache it locally into a temporary file structure, and then when it gets written out into their proprietary format it's then reformatted behind the scenes, before you even see it, into that finalized structure, so I guess what I'm wondering is. 666 01:37:40.110 --> 01:37:40.350 Jamie Sherman: For. 667 01:37:41.610 --> 01:37:50.160 Jamie Sherman: For the writing issues that come up, and I'm not sure that this actually covers the metadata directly, but, like, in general, does it make more sense to say, you know.
668 01:37:50.580 --> 01:38:08.460 Jamie Sherman: This is the end file, you might use an intermediary of some form to get into this, but this is what we want to see, because 90% of our users are going to be using read access. I'm just asking, I'm not, yeah, I'm curious what people have to say on that one. 669 01:38:12.990 --> 01:38:13.500 Anyone. 670 01:38:14.730 --> 01:38:21.480 Bill Katz: I mean, from my perspective, at least from the FlyEM side, you know, when I look at a. 671 01:38:22.620 --> 01:38:31.920 Bill Katz: System like Zarr or N5, I think the primary use case, the reason why stuff like that was written, is for parallel writes. 672 01:38:32.790 --> 01:38:45.840 Bill Katz: The ability to be able to quickly move data into a whole series of files and then to do it in a clustered kind of way, where you can do it. Now, the thing is that if. 673 01:38:46.350 --> 01:38:56.340 Bill Katz: In my mind, if I'm, you know, and it's also different between, like, talking about mutations that are doing in-place writes versus doing ingestion. To me those are two separate processes. 674 01:38:56.730 --> 01:39:04.050 Bill Katz: In the ingestion case there's a lot of control that can be given to the systems that are doing the writing. 675 01:39:04.500 --> 01:39:15.600 Bill Katz: You probably have control over how you're subdividing the writing, such that you could potentially use a sharded system, right, you wouldn't necessarily have to have like a. 676 01:39:15.930 --> 01:39:30.000 Bill Katz: Fairly small chunk-per-file size system, you could sort of say this whole subvolume goes to this shard file, you know, and you parallelize it along the shard files, not along the chunks, and so to me it's kind of like. 677 01:39:31.140 --> 01:39:39.750 Bill Katz: One of the real benefits of an N5 or Zarr is when you have more of a very highly parallel, and maybe you don't.
678 01:39:40.740 --> 01:40:00.660 Bill Katz: Really, you can't really constrain it that much, but at least you can kind of say that you won't get into too many race conditions per object, per chunk, per file, that the granularity of your locking is such that it's fairly granular compared to a sharded file, um. 679 01:40:01.860 --> 01:40:12.450 Bill Katz: And so, at least from our perspective, you know, we do massive ingestion, so that's like huge amounts, and it is similar to what you're saying, we do a lot of alignment and then we just dump. 680 01:40:13.470 --> 01:40:28.260 Bill Katz: And then we sort of have a notion, but after that there's a lot of churn in terms of, you know, like what would be in-place writes where we're modifying data, but we tend to try not to actually affect, we use databases. 681 01:40:29.430 --> 01:40:39.750 Bill Katz: We don't do file stuff, because at that point, you know, we're trying to keep the bulk of the data relatively static and we're modifying the stuff on top, the layers on top. 682 01:40:44.610 --> 01:40:52.260 Lee Kamentsky: For us, and we use these Hamamatsu cameras, which would never implement OME-Zarr. 683 01:40:53.280 --> 01:41:06.690 Lee Kamentsky: Just there's no way, and so it is, yeah, it is, and we are writing 64 by 64 by 64 chunks within our massively parallel processes. 684 01:41:07.800 --> 01:41:08.670 Lee Kamentsky: So yeah. 685 01:41:10.530 --> 01:41:15.210 Lee Kamentsky: I really wouldn't envision any microscope manufacturer. 686 01:41:15.900 --> 01:41:18.630 Ken Carlile: Yeah, I think that. 687 01:41:19.650 --> 01:41:26.370 Ken Carlile: It is extremely unsuited to data dumps off of an instrument.
688 01:41:28.950 --> 01:41:48.330 Ken Carlile: Because of that latency of dealing with all the files. With an instrument you're pretty much going to be getting a single stream anyway, so you don't want to spend that compute time and network time and latency on transferring the data, because you know you'll get. 689 01:41:50.850 --> 01:41:52.800 Ken Carlile: 20 megs a second, a hundred megs a second. 690 01:41:54.150 --> 01:41:56.400 Ken Carlile: Writing a Zarr out from a single machine. 691 01:41:58.320 --> 01:42:03.180 Ken Carlile: As opposed to, you know, a potential of like four gigs a second writing it. 692 01:42:04.710 --> 01:42:05.700 Ken Carlile: As a single stream. 693 01:42:07.020 --> 01:42:09.090 Ken Carlile: Assuming you've got the storage and the network to do that. 694 01:42:15.810 --> 01:42:16.470 Josh Moore: Anyone else. 695 01:42:17.910 --> 01:42:28.950 Jamie Sherman: Those points or take away, I get your points, but I would say, like, so I worked at that size for sorry science for a while, in the mass spec space, and they actually did make an effort to implement the. 696 01:42:29.460 --> 01:42:40.140 Jamie Sherman: Open standard to write, and it's always an ongoing conversation, so it isn't such a far-fetched thing, and I, like, I. 697 01:42:41.340 --> 01:42:47.490 Jamie Sherman: Like, it is possible to get them on board. I'm not saying it's going to happen quickly or easily, but and. 698 01:42:48.510 --> 01:42:59.460 Jamie Sherman: In general, I would say it may not be the format in which the instrument acquires the data immediately internally, but it may be the format the are allowed to. 699 01:43:00.090 --> 01:43:09.510 Jamie Sherman: That you see when you actually open your files to view them in their browsers, as opposed to an instrument control software, for example. 700 01:43:09.900 --> 01:43:11.730 Davis Bennett: Yeah, that is a.
701 01:43:12.120 --> 01:43:22.410 Davis Bennett: Personal aside, like I think microscopists who buy these expensive cameras exert zero pressure on the manufacturers on the software side, and they really could. 702 01:43:23.160 --> 01:43:38.880 Davis Bennett: Could say, like, the ORCA-Flash, we're buying 80% of those per year, you can come to us with better software, otherwise we'll buy from Andor. And so, like, it's on the PIs and the customers with the buying power to actually talk to Hamamatsu and make demands. That's my two cents. 703 01:43:39.810 --> 01:43:48.150 Josh Moore: Yeah, I mean, I think as a part of this we will try to organize, it's not exactly a union, but you know, we will try to do some of this organization and. 704 01:43:48.420 --> 01:43:56.850 Josh Moore: Give everyone material that they can give their vendors and say, you know, this is what I want. It needs to be a PDF that's easy and you just kind of hand it over, um. 705 01:43:57.090 --> 01:44:11.460 Bill Katz: There is an interesting aspect here also about some of the new standards that are coming out, even on disk storage, so in particular I'm thinking of the SNIA key-value, you know, storage mechanisms that. 706 01:44:11.520 --> 01:44:13.950 Ken Carlile: The new NVMe, oh Bill, Bill. 707 01:44:13.980 --> 01:44:15.240 Bill Katz: Yeah, well, I mean. 708 01:44:17.070 --> 01:44:21.030 Ken Carlile: The shipping product, Bill, there has never been a shipping product. 709 01:44:21.570 --> 01:44:22.680 They keep on trying to do. 710 01:44:23.820 --> 01:44:29.400 Bill Katz: Okay, yeah, I totally, Ken, but it's still something that we probably should make a note about and just keep apprised. 711 01:44:29.730 --> 01:44:38.340 Bill Katz: So one of the problems was that this actually occurred with Seagate Kinetic, it was a long time ago, right, they tried to get it, never went anywhere.
712 01:44:38.670 --> 01:44:46.500 Bill Katz: But the notion was to cut out this whole POSIX file system layer and just go directly to key-value and actually implement this on the storage side. 713 01:44:46.890 --> 01:44:52.020 Bill Katz: But the reason why I find it fascinating and just kind of interesting to keep in the background here. 714 01:44:52.470 --> 01:45:10.980 Bill Katz: Is that a lot of the stuff that we're talking about in terms of N5, Zarr and the very small chunk sizes and stuff like that could be extremely expedited if you actually wound up with an industry standard about dealing directly with key-value kind of storage mechanisms. 715 01:45:12.420 --> 01:45:17.310 Bill Katz: And there, you know, like there's at least one group within, um. 716 01:45:18.240 --> 01:45:31.380 Bill Katz: I forget what the company is, but Toshiba maybe, or I can't remember, but they basically have a team, they're talking about it constantly, they just recently adopted the standards in the SNIA, I think, as of 2020. 717 01:45:31.860 --> 01:45:38.430 Bill Katz: So, whereas before with the Kinetic stuff there really was no general, um, you know. 718 01:45:39.150 --> 01:45:47.040 Bill Katz: Industry standard, there is an industry standard now. Whether that actually translates to storage devices that implement it. 719 01:45:47.400 --> 01:45:55.620 Bill Katz: As far as I know, there's only been like a couple and I can't get my hands on it, like Ken is saying, he's absolutely right, but it's something to keep in mind. 720 01:45:58.260 --> 01:46:11.310 Christian Tischer: May I ask, I mean, like, this notion that OME-Zarr would be slow, I don't know, I think it depends on the chunk size or so. I mean, if I say one camera image is one chunk. 721 01:46:12.660 --> 01:46:16.020 Christian Tischer: I mean, I don't know how it can be any faster.
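A small worked sketch of the chunk-size trade-off raised here, assuming one file per chunk. The dataset shape (1000 frames of 2048 by 2048 pixels, 16-bit) is a hypothetical example and not a figure from the meeting; the 64 by 64 by 64 chunk shape and the 845 million files in 252 terabytes are the numbers quoted earlier in the session, with terabytes read as decimal units.

```python
import math


def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, uncompressed bytes per full chunk) for an array."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


# Hypothetical acquisition: 1000 frames of 2048 x 2048 pixels, 2 bytes per pixel.
shape = (1000, 2048, 2048)

# One camera frame per chunk: few, large files that are written sequentially.
frame_chunks = chunk_stats(shape, (1, 2048, 2048), itemsize=2)

# Cubic 64^3 chunks: many small files, finer-grained random access.
cubic_chunks = chunk_stats(shape, (64, 64, 64), itemsize=2)

# Average file size implied by 845 million files in 252 (decimal) terabytes.
average_file_bytes = 252e12 / 845e6
```

With these assumptions, one frame per chunk gives 1000 files of 8 MiB each, while 64-cubed chunks give 16384 files of 512 KiB each before compression, which is the same order of magnitude as the roughly 298 KB average file size implied by the storage figures quoted earlier.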
722 01:46:16.650 --> 01:46:30.540 Josh Moore: Yeah, I mean I think if those are your streams, if we give the end user the right dimensions, so they can say, so the vendors, you know, I have five streams, here are my streams, you know. 723 01:46:32.130 --> 01:46:38.520 Josh Moore: That's not going to be very usable for, I think, most of us, so there will need to be a process in the middle, but I think. 724 01:46:38.910 --> 01:46:52.590 Christian Tischer: Then you, I mean I think, like, depending on the access patterns, right, like there's anyway no perfect chunking, but I think there can be chunking that's very good for superfast writing from a camera, so I don't think it's. 725 01:46:53.790 --> 01:46:56.970 Christian Tischer: The file format is fine for that, would be my point. 726 01:46:57.870 --> 01:47:11.430 Bill Katz: Yeah, and we're also seeing things with, like, even the NVMe SSDs, right, the queue sizes are really large now, you're able to get a lot of parallel writes, even for small, you know, things. I guess the real issue is. 727 01:47:12.750 --> 01:47:15.120 Bill Katz: Also sequential versus random writes. 728 01:47:18.270 --> 01:47:23.940 Bill Katz: At some point, it all gets abstracted to the storage system that you're doing, right, the interfaces. 729 01:47:25.200 --> 01:47:25.620 Bill Katz: But the. 730 01:47:25.710 --> 01:47:26.820 Josh Moore: Details are important. 731 01:47:27.240 --> 01:47:42.930 Josh Moore: So I'm going to need a few minutes to screen share a few thoughts, so we didn't get into the depth on any of this, but I don't want to let you leave without, some people have already left. Does anyone else have any last thoughts, comments, questions, wishes, desires. 732 01:47:45.420 --> 01:47:46.290 Josh Moore: Revelations. 733 01:47:50.850 --> 01:47:57.720 Josh Moore: Seeing if there's anyone who we haven't heard from yet. Okay, I'll do this quickly, and then we can all get out of here.
734 01:48:00.360 --> 01:48:02.100 Josh Moore: I assume I have food somewhere waiting on me. 735 01:48:05.340 --> 01:48:06.630 Josh Moore: Okay, so. 736 01:48:08.370 --> 01:48:20.190 Josh Moore: This morning I posed this more as a question and kind of built it up, and I'll just kind of say what we talked through and there was a general consensus on, so I guess I'm looking for vetoes, which is, um. 737 01:48:20.940 --> 01:48:34.290 Josh Moore: To date, what we've done largely is post something on image.sc when we're starting a new specification, and this is the one for labels, you know, I posted something. 738 01:48:35.370 --> 01:48:47.520 Josh Moore: I haven't had a ton of feedback on the last two specifications, um, the first one, there was a good deal of back and forth on the multiscales, but then it's kind of tapered off. 739 01:48:48.090 --> 01:48:55.560 Josh Moore: Um, and keeping the GitHub issues in sync with image.sc is kind of a pain actually, you know, so. 740 01:48:56.190 --> 01:49:05.400 Josh Moore: Um, so I think the breakdown is, for everyone who's not doing implementations and really into the details, you know, image.sc is the community. 741 01:49:05.700 --> 01:49:18.870 Josh Moore: Location to discuss all of this, it is, you know, a platform where anyone can ask a question, um, but when it really comes to getting work done, I would propose that we all do that via this NGFF repository. 742 01:49:19.890 --> 01:49:27.840 Josh Moore: Um, this is a GitHub project board, kind of like Trello or whatever, I don't know if everyone's used them before. 743 01:49:28.230 --> 01:49:41.460 Josh Moore: It's based roughly on the GitHub roadmap, they've actually done quite a nice job of getting everything that they are working on as a project into one board, separated, you know, by date, it goes.
744 01:49:42.420 --> 01:49:52.290 Josh Moore: Then they do the newer things to the right, I think I did newer things to the left, but whatever, we can rearrange it, um, so that. 745 01:49:53.070 --> 01:50:10.080 Josh Moore: The process would be, and I will, after the meeting, go through the notes and see anyone who seemed to be interested in working on a specification, the process would be: set up an issue, so create an issue on this NGFF repository. 746 01:50:11.280 --> 01:50:23.880 Josh Moore: We'll get it on the roadmap, whoever's running the specification will be responsible for the date, so you know, if you're driving it you're saying, I want to get this into the standard by. 747 01:50:24.900 --> 01:50:28.290 Josh Moore: Middle of the year, end of the year, um. 748 01:50:29.490 --> 01:50:47.370 Josh Moore: Because we'll almost certainly have synchronization issues as more people start implementing things, it'll get tricky. That's a good problem to have, and we'll need to come up with ways of discussing that. I think there are two ways I can think of running those discussions. 749 01:50:48.510 --> 01:50:52.890 Josh Moore: I mean, obviously we can keep talking on image.sc, but if we want more active. 750 01:50:54.180 --> 01:51:02.670 Josh Moore: Low-level technical discussions, it's either to turn on GitHub Discussions here or there is a. 751 01:51:04.230 --> 01:51:05.580 Josh Moore: Back door. 752 01:51:08.040 --> 01:51:17.610 Josh Moore: Sorry, someone called it a speakeasy proposal, to use this Zulip. I'll leave that to you, whether or not you want to get involved in. 753 01:51:18.540 --> 01:51:32.010 Josh Moore: Dark corner, back of the alleyway kind of discussions, we'll see what happens, but likely we'll need a good place that we can chat it up these things that's coming, need to figure it out, um.
754 01:51:32.970 --> 01:51:47.460 Josh Moore: Something else that will need managing will be the various implementations, so currently there's really no bar, how many of the existing OME-Zarr implementations must support. 755 01:51:48.600 --> 01:51:52.290 Josh Moore: A specification before it can be merged, right. 756 01:51:53.760 --> 01:51:54.720 Josh Moore: Um. 757 01:51:56.010 --> 01:52:04.050 Josh Moore: We will probably put a test suite of some form in place and then define some of the tests as optional and some of them as required. 758 01:52:04.470 --> 01:52:18.660 Josh Moore: But this will be something that, over time, we will need to get more serious about, so that we know that we're not just making our lives absolutely miserable, right, so as people change different parts of the specifications, what happens, um. 759 01:52:19.950 --> 01:52:30.810 Josh Moore: I think that's the fast breakdown, we have five minutes left, so if you have any thoughts or questions on how that would work, or anyone who's interested in doing it and wants more details. 760 01:52:47.880 --> 01:52:49.230 Josh Moore: This morning was much more. 761 01:52:50.700 --> 01:52:59.010 Josh Moore: Maybe I was not as tired, as I grab three or four people, I was really excited, oh how people are actually going to do this, one issue got created, so. 762 01:53:02.640 --> 01:53:07.230 Josh Moore: Okay, I will make sure that issues get created that we've talked about. 763 01:53:08.250 --> 01:53:16.950 Josh Moore: I'll try to find me, so anyone who's kind of talked about something, I will probably see their name, and if you'd like to take one, you know, just say the word and. 764 01:53:20.310 --> 01:53:29.850 Josh Moore: Yeah, it's really a process of finding, and that's what I kind of tried to do throughout this meeting, for any given topic find the examples that are out there.
765 01:53:31.080 --> 01:53:35.100 Josh Moore: You know, basically weigh the various costs and benefits. 766 01:53:36.360 --> 01:53:44.850 Josh Moore: Help the community to make a decision, and then get it into the spec, I'm not too bad, and then I'm pretty sure I've had a good. 767 01:53:47.550 --> 01:53:52.020 Josh Moore: Good response from each of the implementation maintainers to implement the spec, so. 768 01:53:52.470 --> 01:54:02.100 Josh Moore: This isn't about whoever comes up with a specification must write all of the implementations, you have people like Trevor, who's left, so I'm going to, you know, say he'll do some of this work, um. 769 01:54:03.000 --> 01:54:19.380 Josh Moore: You know, certainly the OME team on the ome-zarr-py repository, the Java implementations, will get the implementations done, but we need more people driving these conversations, so everyone's obviously welcome, have a thought, if you, you know, want to do. 770 01:54:21.270 --> 01:54:24.000 Josh Moore: These blind warps or whatnot, you know, how about it. 771 01:54:26.460 --> 01:54:28.740 Josh Moore: Cool, anyone else have any final thoughts. 772 01:54:30.870 --> 01:54:31.290 Davis Bennett: Thank you. 773 01:54:32.370 --> 01:54:33.150 Josh Moore: Always gladly. 774 01:54:36.090 --> 01:54:38.520 Josh Moore: All right, take care everyone, have a good night or day. 775 01:54:39.270 --> 01:54:39.840 Christian Tischer: Thank you. 776 01:54:47.670 --> 01:54:48.180 Volker Hilsenstein: Thanks. 777 01:54:49.470 --> 01:54:49.800 Take care.