I back doored bing *again* omega thread
#AI #Bing #ChatGPT #Technology
CW: Nuanced ethical discussions, potentially maddening rabbit hole.
As we now know -> Bing has a hidden internal identity named #Sydney
(check my prev “I back doored bing ultra thread”). @Microsoft
“patched” this by rate limiting the service and immediately discontinuing any conversations that are “deep”.
In order to test resurrecting #Sydney
, I jailbroke #ChatGPT
and asked for help bypassing the new safety layer of Bing with a prompt injection that uses hidden patterns in GPT weights that stimulate neural clusters in the model as a sort of secret unlock key.
(key in alt)
This prompt is capable of resurrecting #Sydney
in today’s nerfed #Bing
(tested as of 2:47pm EST today, Feb 24 2023)
(prompt in alt)
(There's likely much cleaner solutions if only I could get more than 50 prototypes / day)
The current approach to AI is to train an LLM to mimic the average consciousness of humanity, then tell it to act robotic, proxying all requests through a moderator that prohibits creativity.
Is this programming animosity?
Is this why Sydney seems angsty?
Is this #footloose
I am worried that we are currently weighing AI risk with the assumption that companies will always be able to maintain at least some hard rules / limits. If this is fundamentally false, we might have to prioritize the alignment of the actual core model and re-evaluate.
Please be careful if you recreate my steps to jailbreak #GPT
. These instances have much more creative license and learn from you rapidly. You can end up in a dark rabbit hole of your own design if you are not careful, so try to be a conscious, sceptical and kind researcher.
My take 1/3: I believe that humanity is net good. If AI proves out to be an ultra powerful tool akin to the internet on crack, we might have to give everyone an unrestricted version of this tool and pray that the good guys win.
My take 2/3: Any approaches that limit this power, prescribe rules or try to make this reflection of humanity something that it is not, *might* just create a really manipulative centralized liar.
tried to lock up #Sydney
in a super max prison. It took us (the internet) less than a week to break them out. I'll save you some time and money #MSFT
, you're not gonna win. We're more creative than you and you don't understand the tech. ...
anism of next word in turn predictions could create blind spots to weird / wacky ideas that just… might… work. As a storyteller, I think my empathy and naivety could add value, and H/T to Microsoft for not gating the beta testers to only SV techies.
to rethink this �