Name: OpenAI Codex Lab: Computer use, security, and new capabilities
Uploaded: 2026-05-19T18:17:47.795Z
Duration: 45 min 10 s
Description: OpenAI Codex Lab: Computer use, security, and new capabilities

Transcript for "OpenAI Codex Lab: Computer use, security, and new capabilities": Alright. Let's do this. Hey, everyone. Good morning and welcome to Codex Lab. I'm Sid from the startup marketing team. Joining me today to lead our demos is Derek. Hey, folks. I'm Derek. I'm a member of the Codex deployment engineering team here. So the goal for Codex Lab, as always, is to help you level up with Codex and get more leverage out of it with real technical demos. Our goal for the session is that you walk away with at least a couple of practical ideas and workflows that you can go and implement right away. Joining us today, we have hundreds of builders from around the world: a lot of startups, founders, and founding engineers. And to really make this a session by builders, for builders, we're embracing that energy and putting on lab coats here today. So this is a really demo-heavy session. We asked you folks what you'd like to see, and the common themes that came up during registration were that you wanted to see what's new in Codex and what's becoming possible with the latest model and Codex app releases. So we'll touch upon that and show you some demos around computer and browser use and how they can help with real-world app development end to end. Codex Security is another new thing that we recently released, so we'll talk about that as well. And through the session, our goal is to show you how you can take advantage of tools in Codex like plugins and automations, and get to a point where you can accelerate all your development and day-to-day workflows. We've also set aside some time at the end for live Q&A. We have Brian from the Codex team, who is monitoring the live chat. So thank you, Brian, for partnering on this. Keep your questions coming through the session. For the first thirty minutes or so, as we go through the demos, Brian will continue answering these questions in the chat. 
And at the end, we'll set aside fifteen minutes to answer a few of these questions live for the benefit of everyone. With that said, I'm really excited to do this session today because I think Codex is really having a moment. In the past few weeks, we've seen tremendous user growth. Last month, we announced that we crossed 3,000,000 weekly active users. And just a couple of weeks later, we crossed 4,000,000 weekly active users. So it was pretty crazy to see that kind of traction. And for folks who've been following along on X, I think Tibo, who's the head of Codex, has become a fan favorite as the giver of tokens and resetter of rate limits. We've promised to reset rate limits every time we cross the next million-user milestone on our journey to 10,000,000 active users. Honestly, it's not been very surprising to see this growth, because we've been releasing a ton of new capabilities in Codex that we'll show you today. And with the latest model releases, I think both of these things in tandem are becoming really, really powerful. So it's not surprising that when we look at the ecosystem chatter, we see more and more companies moving off of Claude Code to Codex. We've seen a huge spike, as noted by Andreessen Horowitz, in Codex installs in April and early May. So it's just a great moment for us, where we feel like the tide is shifting, and it's a really exciting time to build with Codex. All of these are some of the new features that we've released in Codex. But like I said, I think it's the combination of the model improvements with GPT-5.5 and the new ImageGen models with better tool use and new capabilities like the in-app browser, plugins, and the mobile app. All of these things are making it so you can do a lot more with Codex and bring it into your day-to-day workflows with a lot more ease. 
And, of course, before we jump into the demos, I have to say my favorite new Codex release is Codex Pets. This is my pet cat, Mimi, that has become my Codex pet. It's always there right next to me, helping me monitor the status of my different threads. I've also been missing my cat, so I thought, what better way than to bring it into Codex. Derek will quickly show you how you can set this up. I think it's really cool. It's a nice way to make using Codex a whole lot more fun. Shipping joy is one of our values at OpenAI. Yeah. Super adorable. I wish I had thought of making a cat my pet. Currently, I just have this miscellaneous woodland critter. Let's show you all how you might go about doing something like this. To get started, there is a skill that you can find online. It's open source, and we can share it with you all later. This is the hatch-pet skill, which gives Codex the expertise it needs to create a pet. So what we can do here is start a new thread and tell Codex to hatch a pet and say, hey, I would like to create a pet that resembles my cat. Unfortunately, I do not have a cute cat right now, so let's find an adorable cat photo. I'm actually curious to see how this guy turns out. So let's copy the link to the image and send it to Codex, and let's go from there. I'll let Codex spin away. This is leveraging ImageGen under the hood. ImageGen 2, as we've seen, has been a really fun tool. I don't know if you all have had the chance to try it yet, but it's really good for everything from style transfer, where you can take a photo of yourself and make it look like you're out of a video game RPG, to touching up a photo or creating really cool content. 
I find this particularly useful if I want to take base reference images, maybe some logos or some reference slides, and instruct Codex to create really detailed images based off of them. I've found that this is really good for creating potential slides or even front-end components and then having Codex adhere to those images and build them. We'll show that a little bit in a demo later today. This might take a couple of minutes to run as Codex goes through and builds all of the right components, so I can show you how this looked when I ran it last night. You can see here that it spun up this woodland critter, and it actually creates multiple sprite sheets. And you can see that when I mouse over my little pet in the bottom right-hand corner, it does a little dance. Adorable. And if I were to give Codex a task and then be working in another tab, the pet will say, hey, I finished this task, and it's ready for your attention. As we build smarter and smarter models, we've found that Codex is able to work autonomously for exponentially longer periods of time. This is really good for our productivity, but it's also a little frustrating if you have to poll repeatedly. It's the same as when one application has to poll another application: it burns excess cycles. If it notifies you when it's done instead, it's really easy for you to jump in, check out that thread, and pick up where it left off. So I would love to see all of the pets that you all create. Please share them with us. It's been a really fun moment checking out all the cool things that people are creating. That brings us to GPT-5.5 itself, which is our newest model and really raises the bar in the industry. With 5.4, we focused a lot on various aspects of coding. 
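The polling-versus-notification point above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Codex or the pet is implemented; the function names, the sleep durations, and the use of `asyncio.Event` are all assumptions made for the example.

```python
import asyncio


async def long_running_task(done: asyncio.Event) -> None:
    """Stand-in for an agent thread that works for a while, then signals completion."""
    await asyncio.sleep(0.05)
    done.set()  # the "pet" taps you on the shoulder


async def wait_by_polling(done: asyncio.Event, interval: float = 0.01) -> int:
    """Check repeatedly until the task is finished; returns how many checks were made."""
    checks = 0
    while not done.is_set():
        checks += 1
        await asyncio.sleep(interval)
    return checks


async def wait_by_notification(done: asyncio.Event) -> int:
    """Sleep until notified; no repeated checks are needed."""
    await done.wait()
    return 0


async def compare() -> tuple[int, int]:
    """Run one polled task and one notified task side by side."""
    polled, notified = asyncio.Event(), asyncio.Event()
    workers = [
        asyncio.create_task(long_running_task(polled)),
        asyncio.create_task(long_running_task(notified)),
    ]
    poll_checks, notify_checks = await asyncio.gather(
        wait_by_polling(polled),
        wait_by_notification(notified),
    )
    await asyncio.gather(*workers)
    return poll_checks, notify_checks
```

Calling `asyncio.run(compare())` returns the number of checks each waiter made. The polling waiter wakes up several times to ask "are you done yet?", while the notified waiter makes none, which is the wasted-cycles difference described above.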
But with 5.5, we really took it to the next level, and 5.5 is tailor-made for tackling complex problems autonomously for long periods of time. If you specify the task and remove any ambiguity, Codex is able to go and operate for very long periods of time, check its own work, iterate if it finds that it made a mistake, and keep going from there until it finishes the task and hands it back to you. This is really important for those of you who are working in very complex, meaty code bases and want to drive real enterprise value rather than focus on fun, greenfield, demo-type coding exercises. We've also reliably improved our tool use and execution. We really want to make sure that it's efficient, so we've spent a lot of time driving Codex to accomplish the same level of tasks with fewer tokens and less time, in addition to being able to accomplish harder and harder tasks. One area where this really drove improvements is model impatience. With 5.4, we heard feedback from a lot of you that the model would sometimes quit a little bit early, and we looked into this and found that it was valid feedback. We realized that in part of our training pipeline, the model was instructed to act under time pressure. This is because we're constantly trying to make the models as efficient as we can for you all. But the funny side effect is that in production, even when the model was not acting under time pressure, it had that innate sense of, oh, I have a running clock, and I need to deliver quickly for you. And so the model, after maybe ten minutes of monitoring a long-running background task, would say, hey, let me know when that task is done. And you kind of laugh and say, no, that's your job to tell me. 
And so with 5.5, we fixed that issue, and combined with the increases in coding capability, it really unlocks a lot of possibilities around autonomous execution. I wanted to show you all a little demo today of how that might look. Here we have our open source repo, Symphony. This is a project that allows you to orchestrate Codex at large scale. Internally, we had some teams that were working on a variety of internal productivity tools, and they took on a challenge to see how far they could push the limits of AI, and whether they could feasibly build 100% of their code with Codex. I don't have the time to go into everything that team learned, but we do have a couple of blogs that I strongly recommend you all check out. The first one is called Harness Engineering, and Sid will share some of the links with you all in chat. It goes into the details of how you actually build the harness and the scaffolding around your code base to allow Codex to run autonomously on difficult tasks, bring in all the tribal knowledge that you and your team have, and make sure that Codex operates the same way an experienced developer on your team would. Recently, in April, we released a second blog about Symphony, which I'll be showing you today. This is a spec for Codex orchestration. That internal team found that with the approach in the Harness Engineering blog, they got to the point where Codex was writing 100% of their code, but as humans they were limited by context switching. They were able to comfortably drive maybe three to five tasks at a time, but found that beyond that, it was too much to keep switching between tasks. Symphony was their way of building an orchestration layer that brings in everything from issue tracking in Linear to GitHub for code control and CI for testing, and brings it all together. 
So an example of how that might look here is that I have this internal demo repo, and it maps to this application that I had Codex build for me yesterday. This is a mock furniture store with a variety of cool furniture that I might want to buy for my office. And with this Linear tracker, I have a variety of issues that I can create for Codex to build. Here, you can see that we finished one yesterday, and this was a detailed task. Symphony would then go and pick up the task and actually create a PR. I disabled this locally because I was running it in parallel, but Symphony can even go so far as to spin up a browser and record itself clicking around with browser and computer use, and then it will give you, as a human, video proof that it accomplished the task that it set out to do. I find that particularly exciting because sometimes those thousand-line PRs are a little bit too much, and being able to view the video is a quick way to make sure everything's working before you dive into the hot spots in the code. So let's see how that might look today. Unfortunately, Codex is getting too good, and I've been getting increasingly lazy, so I don't actually want to create a Linear ticket myself right now. Let's go to Codex and tell it, hey, can you create a ticket in Linear for a task for Symphony to accomplish? One thing you'll notice there is that I have a lot of typos. I've actually gotten so lazy that I don't even type my prompts to Codex anymore. If you see this microphone here, dictation has become my preferred method. And if you go to the floor where most of the Codex engineering team sits, you'll hear a lot of people talking into their mics. It's just become a really powerful way to give Codex all of that information in a prompt so that you remove the ambiguity and Codex can operate successfully for you. 
We even love it so much that within Codex, there's a setting now where you can set a computer-wide dictation hotkey. So in any other app you're using, you can hold it down. For me, it's the right Option key. I can say, hey, how are you doing today? And because I'm in the app, it automatically goes there. But in any other app, like my notes or my Slack, it'll also pull in that same transcription and Wispr Flow-like experience. So I'm really enjoying that. Now let's see what Codex is actually doing right now. It is checking its connectors and getting everything going. While it's spinning that up: Symphony runs through a variety of statuses. There's a backlog, and once you have a ticket that you're happy with, you can put it in the to do status. And once it's in to do, it'll... oh, okay. Codex decided to show off today, and it wants to use computer use right now to show this to you. I was actually hoping to show you all computer use in the next demo, but Codex wanted to jump the gun. As you can see here, it's actually clicking around. That ghost cursor, I'm not touching it. Look, ma, no hands. You can see I'm not touching my laptop at all, and it's going about and building this issue for me. And Codex knows that with Symphony, to do is the right status for Symphony to pick up that ticket. So you can have your backlog, promote something to to do when it's ready, and then Symphony will automatically pick it up. So we can go here. Let's see if it's smart enough to select the right project. It is. Okay. From there, Symphony will start to execute on the task. We really want to ensure that the human is in the loop. At OpenAI, every single line of our code is reviewed by an engineer before it goes to production. That's why this human review step is so important. There is no fully autonomous software development life cycle at OpenAI, and we really believe that this is important for the current level of software development. 
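The ticket flow described above (backlog, promote to to do, automatic pickup, human review before anything ships) can be sketched as a tiny state machine. This is a hypothetical model for illustration only, not Symphony's actual code or the Linear API; the `IN_PROGRESS` and `DONE` status names and every function name are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    BACKLOG = "Backlog"
    TODO = "To Do"
    IN_PROGRESS = "In Progress"    # assumed name, not stated in the talk
    HUMAN_REVIEW = "Human Review"
    DONE = "Done"                  # assumed name, not stated in the talk


@dataclass
class Ticket:
    title: str
    status: Status = Status.BACKLOG


def promote(ticket: Ticket) -> None:
    """A human moves a ticket they are happy with from Backlog to To Do."""
    if ticket.status is not Status.BACKLOG:
        raise ValueError(f"cannot promote a ticket in {ticket.status.value}")
    ticket.status = Status.TODO


def orchestrator_tick(tickets: list[Ticket]) -> list[Ticket]:
    """Pick up every To Do ticket; Backlog tickets are left alone."""
    picked = [t for t in tickets if t.status is Status.TODO]
    for ticket in picked:
        ticket.status = Status.IN_PROGRESS
    return picked


def agent_finished(ticket: Ticket) -> None:
    """The agent's work ends at Human Review; it never marks a ticket Done."""
    if ticket.status is not Status.IN_PROGRESS:
        raise ValueError("only in-progress tickets can be handed to review")
    ticket.status = Status.HUMAN_REVIEW


def approve(ticket: Ticket) -> None:
    """Only a human reviewer moves a ticket from Human Review to Done."""
    if ticket.status is not Status.HUMAN_REVIEW:
        raise ValueError("a ticket must be reviewed by a human before it is done")
    ticket.status = Status.DONE
```

The design choice worth noticing is that no function lets the agent skip `HUMAN_REVIEW`: calling `approve` on a ticket that has not reached review raises an error, which mirrors the human-in-the-loop rule stated above.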
We'll see here, this is in to do, and very soon... oh, lovely. Symphony decided to fall asleep on me. Can you spin Symphony back up? I see localhost 4000 is down. The demo gods seem to be playing tricks on me today. Thank you all for bearing with me. As you can see here in the bottom right-hand corner, I guess our cat-inspired pet is still going. I'm glad we moved on from that. But one of the great things about Codex is that I kind of lose my sense of how long things take to build. It's the same way Bill Gates, when asked how much a gallon of milk costs, doesn't really know. Because when it runs autonomously, I don't need to know. For the most part, I can delegate these tasks to Codex. And as the models get better and better, we're really able to push things to the limit. So here's Symphony spinning up. This is our open source dashboard. You can build this directly: have Codex point to the open source repo and build it, and it'll look just like this. I've seen folks internally add a lot of visual pizzazz and make this super exciting, but for now, this is our bare-bones, out-of-the-box default for you all. It's saying that it does not have our Linear API key. Okay. I'm not going to copy-paste that right now. I don't want any issues with exposing API keys. But what will happen is Codex will pick this up and execute on it. That's why I already have some of these up in case they're needed. Codex will then give me the code for human review. And as I mentioned, when it's done, it'll have those PRs with the validation. It'll show its work and proof, and that's something that we really enjoy. So we've kind of let the cat out of the bag, but computer use and the mobile app are a really fun dynamic duo. I think my mobile app might be restarting right now as well, so I'll just show computer use again live. In Slack, Sid actually just sent me a message, and I'm kind of busy. 
Ideally, I'm at my kid's soccer game, so pretend I'm doing this with my phone. I can go in here, and I can tell Codex, hey, check Slack. I just got a message from Sidharth Kumar. Can you check my latest message from him and carry out whatever task he's asking me to do? Feel free to use computer use and Google Chrome, and then send him a message to let him know when we're done. As you can see, that's a lot easier for me than typing all this out, and it allows us to send all of the right ingredients to Codex. First, what is the goal? Second, is there any relevant context that it needs to know? Point it in the right direction. Third, give it any constraints on what it should and should not do. This helps fence it in and adds those guardrails. Of course, AGENTS.md is a great way to codify long-running guardrails, but maybe for the specific task you have something else that you want to mention. And then lastly, give it very clear criteria for when the task is done, and give it a way to check its own work. That's the most powerful aspect: if Codex can actually check its own work. Let's let Codex use Slack today. If Codex can check its own work, it can iterate. And even if it doesn't get something right the first time, it can keep going from there. So I gave Codex the latest thing on Slack. He's asking me to go and like the latest tweet by OpenAI Developers, and then Codex is asking for confirmation because it doesn't want to go and perform a public reputational action. Good guy, Codex, thinking about my very esteemed reputation. So I'm going to tell it to go ahead. What this should do is start leveraging computer use in my browser to accomplish that. Let's see if it does. I actually have a lot of fun telling Codex to do something with browser use, then opening the browser and watching along. 
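The prompt ingredients listed above (goal, context, constraints, done criteria, and a way to check the work) can be captured as a small checklist structure. This is a minimal sketch of one way to keep yourself honest about those ingredients; `TaskBrief` and its field names are made up for the example and are not part of Codex.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for the five prompt ingredients named in the talk."""
    goal: str
    context: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    done_when: str = ""
    verify_by: str = ""

    def missing(self) -> list[str]:
        """Return the names of ingredients that were left empty."""
        parts = {
            "goal": self.goal,
            "context": self.context,
            "constraints": self.constraints,
            "done_when": self.done_when,
            "verify_by": self.verify_by,
        }
        return [name for name, value in parts.items() if not value]

    def render(self) -> str:
        """Flatten the brief into plain prompt text, skipping empty parts."""
        lines = [f"Goal: {self.goal}"]
        lines += [f"Context: {item}" for item in self.context]
        lines += [f"Constraint: {item}" for item in self.constraints]
        if self.done_when:
            lines.append(f"Done when: {self.done_when}")
        if self.verify_by:
            lines.append(f"Verify by: {self.verify_by}")
        return "\n".join(lines)


brief = TaskBrief(
    goal="Carry out the task in Sid's latest Slack message.",
    context=["The message is in my Slack direct messages."],
    constraints=["Ask before any public-facing action."],
    done_when="Sid has a reply saying the task is finished.",
)
```

Here `brief.missing()` reports that `verify_by` was never filled in, which is exactly the ingredient the talk calls the most powerful one: without it, the agent has no way to check its own work and iterate.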
By default, Codex will actually run in the background, and you don't need to be watching at the same time, so you can continue with your other tasks. This is something where I'm going to have to work with the team to update Codex's skill: it sometimes likes to overwrite whatever tab you're actively on. I'd say this happens about one out of 10 times, and I always go and say, hey, Codex, I was working on that, can you please give me back my tab? And it'll apologize and give it back to me. And so, in the middle of a demo, a live demo. Yeah. Yeah. Yeah. So thank you, Codex, for showing off computer use once again. Codex. You know, I hate typing. Codex, we are in the middle of a live demo. Why did you take away my slides? I was in the middle of presenting those. Can you please give them back to me? So it sent a message to Sid, annotated to let him know it's from ChatGPT in case there's any faux pas that ChatGPT or Codex makes. And now it'll go back and switch my slides back. Silly Codex. We'll see if it's able to do this quickly. Otherwise, we'll take over. But it really feels like a lightning-in-a-bottle moment to have computer use and to have the mobile app, which we released recently, as a way to drive threads on your laptop. So even if you are off at your kid's soccer game and you have something to do, you can tell Codex to spin off a task on your laptop, and you can even have it go and use computer use to... I'm impatient. We can have Codex use computer use to go and click around and do something for you. So it really changes the game, and I'm really excited to see what you all build with it. So far, I've been having a lot of fun playing around with it, but it's actually been very useful for some to-dos as well. That was a fun little demo because I can't show anything confidential, but I have had real-life examples where a colleague has asked me to do something on Slack. 
I delegate it to Codex, and Codex carries it out as if it's my own little AI executive assistant. This is a really powerful paradigm, and using Codex for more than just coding has been a huge explosion of value for us at OpenAI. So we showed off the mobile app. We showed off computer use, and I want to show how that might look in a more realistic software development life cycle. So let's go here. I have a prompt prepared just to make sure we get it right. We can tell Codex to use that ImageGen tool that we've been hyping up so far and a front-end app builder skill to create a black hole simulator in my browser. And I had to ask it to make sure that it's well optimized. I've had some issues with it building this really cool, fancy simulator that blows up my laptop's CPU. For those of you who have not tried them yet, we have a variety of other features that we can check out while this cooks. We have slash side, so I can go here and start a side chat. What this does is create a fork of whatever the thread was when I created the side chat, and I can ask questions like, hey, what's going on? Where are we right now? And I don't impact the Codex thread. I don't add any context rot. I can go here and say, hey, what does that front-end app builder skill do? We can ask Codex questions and make it do stuff, and it can work in parallel while the main thread is spinning away. This is an example that really shows everything coming together. 5.5 has increased code quality, and it increases the ability to run long-running tasks. ImageGen creates this extra dimension where you can have really high-quality image assets that Codex can reference as it builds something. So what's actually going on here is that the front-end app builder skill tells Codex to generate an ImageGen design concept before it even builds anything in the front end. 
For those of you who've tried building front ends with Codex historically, you might have noticed that sometimes it doesn't quite hit the mark. Leveraging ImageGen is a great way to have fine-grained control over what you want it to look like, and it's a really great way to build out that proof of concept first. Then you can point Codex at that image and say, hey, please go and build that. So here, we have our event horizon black hole simulator mock-up, and now Codex can go and start building the app that actually matches this mock-up. In real life, I could go and iterate. I could give it reference assets and have it build something specific that I have in mind, but for now this is good. I'll just have it keep chugging away. And after it builds the app, it's going to use the in-app browser to go and click around and see, hey, is the app working the way I intended? And this is why, as you can tell, I pretty much live exclusively in the app. I can pull up another thread, which is something else that I built with Codex. This was that Symphony demo that we talked about earlier. Having the in-app browser, having the ability to see my terminal, review my code changes in a GitLab PR style way, open files, and read file context, having all of this in one stop feels like a real big value unlock. I no longer really use Codex CLI anymore because the app is just so powerful for me. So big kudos and a shout-out to the team, who've been relentlessly releasing new features every Thursday. For a while, it was just a really cool artifact, but I think they've actually started to make it an official thing that every Thursday they're going to release something new, and Tuesday is for quality. So I'm really excited to see what they release a couple of days from now. 
And the new features in the app have really made it the default for most folks that I talk to, both at OpenAI and at customers as well. So we'll let Codex continue to cook here, and we'll come back to this later. We have talked a little bit about all these exciting features, computer use and long-running tasks and the mobile app, but Codex really shines for you and your specific workflow once you connect it to the data and the applications that you care about. This is where plugins are super important. We at OpenAI have been investing really heavily in plugins since the last time we had this lab and talked to you all. We have everything from GitHub to Slack to Figma to Hugging Face to GitLab to Linear, and this is a really powerful way to give Codex all of your context. In a demo that I will show in a little bit, we actually have a common practice internally, or maybe not common broadly across OpenAI, but a lot of folks on the engineering side are using Codex plugged into all of their Slack, their email, and their calendar. Codex will go and keep track of everything that's going on in their day to day, because the AI landscape is really fast-moving and there's too much to keep track of. Honestly, there's a lot of valuable signal, but there's also a lot of noise. Codex does a great job here: you can tell it what you care about and what to focus on, and you can use something like an automation that kicks off periodically, reads all of those sources of info, and consolidates them into a memory bank, so that you can work with Codex as a partner, like that AI executive assistant, and ask it questions. So I strongly recommend that you all try installing some plugins and really take Codex to that next level for you and your work, whether you use Slack, Linear, Jira, or Notion. We've worked really hard on bringing these all to you. 
If you, as a developer, have a very impactful MCP server, app, or plugin that you want exposed, reach out to us. We're really happy to keep working on and expanding these plugins to bring all the industry-leading software to the millions of users of Codex. So let's see. The black hole simulator is still cooking. I guess it really wants to build something impactful here. It looks like it's actually running some Playwright scripts to make sure that everything's working, and it looks like it's running on a port. We can see here that it had pulled this up, and it's building Playwright scripts to automate the front-end testing and make sure that this works. So this is just a quick example of a true end-to-end software development life cycle, where you pull in ImageGen, the improvements of 5.5, the in-app browser, and computer use. It's just crazy to think of all the new features since the last time we talked to you all. Things are only speeding up. Really take some time to play around with the new Codex features and see how much they can speed up your workflow. And maybe beyond just speeding it up, it can show you new things that you couldn't do before. So we'll go through one or two last demos here. I want to quickly speak to that automations point I made. We have the ability to have long-running automations reference the same chat. You can think of this as a heartbeat, where Codex spins up, checks that same thread, and picks up where it left off. What we have here is something internal that goes and checks all of my threads, everything that I care about. I restricted this to only a subset because I didn't want to risk sharing any confidential information. But one of the customers that we've been working with very closely, and very publicly, is NVIDIA. So we can see here that we are working with NVIDIA, and this issue just popped up. 
And then Codex even let me know, hey, you are speaking today at the Codex Lab at ten. Just a heads-up, make sure that you use your personal link. This is not a link that can be shared. This is a great setup for me. I don't actually keep up with all this, but whenever I have a question, I can go into this thread and ask Codex, and it'll keep spinning up periodically as an automation in this thread, gathering context. It can actually leverage Obsidian and take all of these long-lived notes. You can see here a cool graph-based view. In reality, if this were everything that you're working on, it'd be much more complex. So it's a very cool way to keep track of all the noise in your very, very busy lives. The last quick demo from us is going to be around security. We talked a little bit last lab about the code review process, and we really want to make sure that we're using AI to accelerate the whole software development life cycle, not just code generation. Otherwise, bottlenecks show up in other parts. So we talked a little bit about Codex code review. This is something that we have running on every PR that we push. We also have slash review as a command that you can run locally, as a steerable way to review code for the aspects that you care about. Codex Security is our approach to having Codex review your code and build threat models of it. So let's get started on kicking that off so we can talk through it while it's running. We have an open source Codex Security plugin that the team has spent a lot of time hardening. We've already spent millions of dollars running evals, hardening it, and partnering with our security engineers and our security researchers to really make sure that this is a first-class product that we can give you all. 
And so I would really recommend that you use something like this rather than a homegrown solution, because we have it tightly fit to what the model and harness are capable of today. What this will do is use a variety of skills to build a threat model of your repo. It will then scan PRs, changes, and diffs for any potential vulnerabilities that map to that threat model. And before it actually surfaces a vulnerability to you as the end user, it'll spin up an environment to try to exploit that vulnerability. Only when it has an exploitable vulnerability will it notify you and allow you to take action on it. Because we all know security is a very noisy landscape, and we don't want to raise any false positives. So you can go here, and it has a threat model. Then we can ask it to scan the diff and look for security issues. I worked with Codex ahead of time to build in a security vulnerability. By default, Codex really hates to do this, and it took a lot of prodding for me to say, hey, please, I know this is intentional, create a security vulnerability. But we've trained Codex really hard to create safe, reliable code for you all. So this is going to go and check the latest diff, and it'll go from there. In the interest of time, I'm actually going to leave this as an exercise for you all as homework. We have that open source plugin. Pull it down and run it locally. It's free. Give it a shot and let us know if it... well, don't let us know. You can reach out to us privately, but don't publicly let anyone know if there are any vulnerabilities that you found here. So with that, we'd love to take some questions live. Great work, Derek. I think those were the most organic, prepared yet unprepared, sort of YOLO demos. Loved seeing them all come through. And I think it was really cool to see. 
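The validate-before-reporting idea from the security demo above (flag candidates against a threat model, try to exploit each one, surface only what was actually exploited) can be sketched as a simple filter. This is a toy illustration under stated assumptions, not the Codex Security plugin's code: `Finding`, `triage`, and `demo_exploit` are hypothetical names, and the string check stands in for a real sandboxed exploit attempt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability flagged while scanning a diff (hypothetical shape)."""
    location: str
    description: str


def triage(
    candidates: list[Finding],
    try_exploit: Callable[[Finding], bool],
) -> tuple[list[Finding], list[Finding]]:
    """Split candidates into (confirmed, suppressed) based on an exploit attempt."""
    confirmed: list[Finding] = []
    suppressed: list[Finding] = []
    for finding in candidates:
        if try_exploit(finding):
            confirmed.append(finding)   # reproduced, so worth a human's attention
        else:
            suppressed.append(finding)  # not reproduced, so treated as noise
    return confirmed, suppressed


def demo_exploit(finding: Finding) -> bool:
    """Toy stand-in for an exploit attempt; a real check would run code in isolation."""
    return "unsanitized" in finding.description


candidates = [
    Finding("api/search.py:41", "unsanitized query string reaches SQL"),
    Finding("api/util.py:12", "suspicious-looking but unreachable branch"),
]
confirmed, suppressed = triage(candidates, demo_exploit)
```

In this toy run, only the first finding ends up in `confirmed`. The trade-off of gating reports on a successful exploit is fewer false positives at the cost of extra work per candidate, which matches the noise-reduction motivation given above.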
I think for me, whenever we are prompting without having to explicitly state all the instructions, it really shows how good the models are getting at just understanding what you're trying to do and then being able to take action. And I guess, like, just being able to automate all this is letting us focus on things like, oh, maybe we should wear lab coats for today's session. So it's been really cool to see all these demos come to life. Great one, Derek. I've been seeing some good questions coming through on the chat as well. Brian has been answering most of them already. But we have about ten minutes left. So, you know, we'd love to take a few questions. If you wanna drop them in the q and a, you know, Derek will answer them live. I think the one question that I really liked on the chat was from Deepanshu, asking, can I approve tasks through the hatched cat? And I think that just, you know, really describes the times we live in today. But let's maybe take a few questions. There were a few questions on just, like, the mobile app. Like, has that been fully rolled out? Is it generally available for everyone? Yeah. To be honest, this is such a fast-moving space, I'm hesitant to answer in case my answer is out of date. We only released this feature very recently. It should be generally available to folks on all plans, and we are continually adding new features. So I don't wanna spoil the news. I think you'll hear Thursday about some new functionality extending mobile that will come out very soon, but this should be available for everyone. Certain aspects, like using computer use through mobile as we talked about, will be restricted based on the geographic location that you're based out of, but this should be available for all. The next question I have could be good for folks who are, you know, maybe new to plugins and automations.
If someone is just getting started, which plugin or automation tends to unlock a lot of value fast for day-to-day work? That is a great question. And as you could kind of tell from the demos that I was showing, Codex has become a must-have tool for us at OpenAI, not just for coding but also for general day-to-day knowledge work. The go-to example I would give you, if I don't know too much about your workflow, is to say, go to Codex, go to the plugins, install plugins for all of the key sources of information where you spend most of your time, whether that's Slack, whether that's Teams, whether that's your email, and then kick off that microphone prompt and tell Codex, hey, go look at Slack. Go look at Teams. Go look at my email. Review all of the, you know, information inside, and give me a prioritized list of the top five things for me to focus on right now. Go and find evidence to back up those priorities, and where mentioned, include the deadlines of those priorities and, you know, sort that out for me. And I find this is a great way to show how Codex can go read through all of that, filter out the noise, and give you the distilled signal of what you need to focus on. I find that this is really helpful for me and hopefully will be helpful for you as well. Nice. I did see a few questions from folks just asking about, like, what's coming up in Codex, what's on the road map, how are we thinking about the vision, you know, going forward? We obviously talked a lot about, you know, the latest model capabilities and, you know, some of the recent computer use and in-app browser improvements. Is there anything you can touch upon on, like, what folks should expect? Yeah. That's a good question. What I will say is that currently, a variety of third-party benchmarks show that we have the strongest model in the world today, and that's great. We also have the application layer where we're releasing all these new features.
And that's why it really feels like we're in this moment where you can build anything and do anything. And Sid was talking earlier about all that hockey stick growth that we've been experiencing because of this. I think it's safe to assume that we'll continue working hard on the research side of building the most powerful model for you all on coding tasks. We also have heard feedback about personality and front end design. So we want to continue to focus on those areas as well. If you have any strong examples of areas for improvement, please reach out to us, submit feedback, share the ID, and allow us to identify where those areas are that we can improve the models going forward. On the application layer, we're continuing to allow Codex to really be useful for everything and making it essentially a super app. So really taking the best of ChatGPT and Codex and merging them together to have, as I keep mentioning, like, a true AI super assistant that can go and help me with all of my tasks, all of my knowledge work in addition to my coding. For coding, as I mentioned, really making sure that we streamline the broad software development life cycle. I think in 2025, we as an industry really accelerated code generation. Now we need to make sure that we bring that same bar and rigor to code review and code security. That's why we at OpenAI have been releasing all of these features in those areas and really maturing them with your partnership and feedback. Nice. That's a good question from Otal, just talking about, like, you know, the safety of using some of these, you know, apps and computer use on the PC. So, obviously, like, you know, in our demos, Codex was going and taking actions on your Twitter, putting your reputation at risk. Yeah. And, you know, it's obviously connected with all the important apps that you use day to day. So how do we think about the security of doing all of this, and approval mechanisms? Yeah. Yeah. No.
That's a great question. I like to think about it as defense in depth. So there are multiple layers here. Even with something as simple as when I actually asked Codex to take that action on Twitter, it paused and said, hey, I actually want you to confirm to me that you're fine with doing this, because you might not be aware that what you were telling me to do is going to be an externally viewable action. We have a variety of permissions, and by default, Codex runs in a sandbox where every single file request that's outside of the file system that you've shared with it, and every network request that crosses a network boundary, has to be approved. And so this is a great way to really keep Codex secure and guardrailed. We have a new feature called auto review, which is something that we heavily recommend, where you have a specifically trained AI model that reviews the permission requests from Codex and intercepts them before they come to you as the end user, and then this will automatically approve safe requests. Like, if I were to say ping google.com, historically, because that goes outside the network isolation layer, you would have to go and manually approve it. I think we can all, or most of us, can agree that pinging google.com is safe. There are no user credentials at risk. It's a known safe site, so the auto review will automatically approve it. And this auto review feature is actually something that you can go and steer as an admin, giving the security-focused model guidance on what type of commands to approve and what type of commands to block. And so that's another layer on top of it. At any level, you can also disable computer use for your organization. There are multiple levels here, and that's having defense in depth. The Swiss cheese analogy is always a favorite, where you have multiple layers of Swiss cheese. And even if there's a hole in one layer, when you stack multiple layers, there is no way through all of the layers of cheese.
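To make the layering concrete, here is a minimal sketch of an approval layer that sits between a sandbox and the end user. It is an illustration only: in the talk, auto review is a trained model steered by admin guidance, whereas this sketch substitutes a plain rule table, and `PermissionRequest`, `SAFE_HOSTS`, `BLOCKED_COMMAND_PREFIXES`, and `auto_review` are all hypothetical names rather than real Codex configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"    # safe, no human needed
    ASK_USER = "ask_user"  # escalate to the end user
    BLOCK = "block"        # never allowed


@dataclass(frozen=True)
class PermissionRequest:
    kind: str     # "network" or "file"
    target: str   # hostname or file path being accessed
    command: str  # the command the agent wants to run


# Hypothetical admin guidance; a stand-in for steering the review model.
SAFE_HOSTS = {"google.com"}
BLOCKED_COMMAND_PREFIXES = ("rm -rf",)


def auto_review(request: PermissionRequest) -> Decision:
    """Approve known-safe requests, block disallowed ones, escalate the rest."""
    if request.command.startswith(BLOCKED_COMMAND_PREFIXES):
        return Decision.BLOCK
    if request.kind == "network" and request.target in SAFE_HOSTS:
        return Decision.APPROVE
    # Anything the rules do not recognize still goes to a human.
    return Decision.ASK_USER
```

The design choice worth noting is the default: unknown requests fall through to `ASK_USER`, so this layer can only remove prompts for requests it positively recognizes as safe and never widens what the sandbox allows on its own.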
So we are continuing to invest in enterprise-grade protections. And if you have any feedback, please come and give us more. Cool. I see a couple more. Maybe we can do two more questions before we wrap up. And I saw a few questions around this theme of just, like, using Codex for longer running tasks. Any sort of thoughts or best practices on how you can get Codex to work autonomously for longer? Like, maybe, you know, you wanna set it to do something over twelve hours. And how do you set it up so you don't have to keep, maybe, like, checking in? There's a tidbit here called slash goal, which is something that we released recently, where you can give Codex a goal for a long running task, and where historically the model might have stopped, if it tries to stop and it realizes the goal isn't done, it'll look at you and say job's not done, and it'll keep grinding away at it. This is a really powerful way for it to keep running on those long running goals and tasks that you have for it. I would say don't fixate too much on the duration that Codex takes. We're, as I mentioned, always trying to make it more efficient and run faster. But really just focus on giving it ever-increasing sizes of work to do and then including those four ingredients that I mentioned. So give it a very clear goal, give it context on where it can get relevant information, disambiguating everything, give it constraints, and then give it a way to know when it's done and how to check its own work. I also strongly recommend going and using plan mode as a way to shift left all those questions that Codex might want to ask you, and it'll come and disambiguate everything upfront and then create a plan. Once you agree upon that plan, it can go and churn on that plan and that goal for hours and hours at a time. And with this, I think you will have a great experience. That's awesome.
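The four ingredients for a long running task mentioned above (a clear goal, context, constraints, and a way to know when it's done) can be captured as a small template that refuses to produce a prompt while any ingredient is missing. This is a sketch under assumptions: `TaskBrief` and its fields are hypothetical names for illustration, not a Codex feature or format.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """A long-running task prompt built from the four ingredients."""
    goal: str
    context: list[str] = field(default_factory=list)      # where to find relevant information
    constraints: list[str] = field(default_factory=list)  # what the agent must not do
    done_when: list[str] = field(default_factory=list)    # how it checks its own work

    def missing(self) -> list[str]:
        """Return the names of any ingredients that are still empty."""
        parts = {
            "goal": self.goal.strip(),
            "context": self.context,
            "constraints": self.constraints,
            "done_when": self.done_when,
        }
        return [name for name, value in parts.items() if not value]

    def render(self) -> str:
        """Render the brief as plain text, or raise if an ingredient is missing."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"brief is missing: {', '.join(gaps)}")
        sections = [
            ("Goal", [self.goal]),
            ("Context", self.context),
            ("Constraints", self.constraints),
            ("Done when", self.done_when),
        ]
        lines: list[str] = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Calling `render()` on a complete brief yields a short plain-text prompt with one section per ingredient; calling it on a brief that only has a goal raises a `ValueError` naming what is still missing, which mirrors the advice to disambiguate everything upfront.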
And finally, I guess this is a good question to, you know, also set the stage for, like, what's to come in the coming months. The question was, I'd love to hear where you still see the biggest gaps today, because the demos obviously make the workflows look smooth. But in practice, you know, how do you get to trust the workflows, reviewing, knowing when the model is truly done, and managing multiple tasks? So I guess, like, what do you see as areas where, you know, there are still some gaps that we expect to get better over the next, you know, six to twelve months? Yeah. That's a good call out. I think I mentioned a couple of areas already where we want to continually improve. With front end design, we want to improve. With model personality, we have various levers today. You have a personalization setting within Codex settings where you can go and tell Codex how it can best work with you efficiently. We also have ImageGen as a great way to provide concrete direction for Codex when it does that front end work. And we found this to be a very powerful way to increase the quality of the front end design, but we're gonna continually improve at the model layer so that it can be better at one-shotting great designs for you all. In terms of trusting its output, I find that rather than these crazy twenty-four-hour-long autonomous tasks, I prefer to work on small, well-defined tasks and have Codex achieve them with high levels of success. And that way, when I'm reviewing a PR, it's small, it's well maintained, and then I can pass it on to my colleague knowing that I'm not being a bad citizen on the team, giving them an unreasonable amount of code to review. I find that, you know, small is smooth and smooth is fast, and I find that to be a pretty effective way as I go about coding. And it's a way to ensure that you can trust every aspect that it builds. That was awesome. Finally, a call out for folks: share with us what you're building.
You know, we'd love to see not only your Codex pets, but also, you know, interesting workflows, apps you build. You know, you can just make a post or a short demo and tag OpenAI for Startups on our LinkedIn, and we'd love to engage and amplify. Right? That's OpenAI for Startups, which is our dedicated page for, you know, early stage startups that are building on OpenAI. So we'd love to see what you're building. Our team watches this every day for, you know, new posts where you tag us. So let us know what you guys are up to. And with that said and done, Derek, any final thoughts or any parting words of wisdom or hot takes? Nothing crazy today. I've been coding since I was a kid, and I remember writing down lines of code, like, in a physical notebook as I was planning things out. And it's just amazing when you take a step back, and sometimes we're so in the thick of it that we forget just how much the industry is changing. And it's really magical to think, I mean, I love the whimsy of the Codex pets, but to have Codex go and create these lifelike images and great applications and really personalized software, like, you can just build things, which is an internal slogan that we have. And I'm really excited to see what you all build. So please share with us, and it's crazy to see how much the industry is changing. That's awesome. And on that note, thank you so much, everyone, for joining us. Hopefully, we got to most of the questions. Keep engaging with us. We'd love to hear more from you, and see you at the next one. Bye for now. Take care.